
Collectors

A collector allows users to define a hybrid collection of rules, based on:

  • projects
  • streams
  • queries/filters.

At least one of these needs to be defined.

Each result matching the collector's definition is buffered on the server for 7 days and consumes 1 credit, no matter how many times it is read. In other words, data can be downloaded multiple times at no additional cost.
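
The billing rule above can be expressed as a small calculation. The sketch below is illustrative only: it assumes a hypothetical mapping from each collected result ID to how often it was downloaded (not a documented API structure). Every collected result counts once, whatever its read count, and a second helper checks whether a result is still inside the 7-day buffer.

```python
from datetime import datetime, timedelta

# Server-side buffer duration stated in this section.
BUFFER_DURATION = timedelta(days=7)


def credits_consumed(read_counts):
    """Credits for a set of collected results.

    `read_counts` is a hypothetical dict: result ID -> number of downloads.
    Each collected result costs 1 credit; how often it is read does not matter.
    """
    return len(read_counts)


def still_buffered(collected_at, now):
    """True while a result is still inside the 7-day server-side buffer."""
    return now - collected_at < BUFFER_DURATION


# Three results, read 3, 1 and 0 times: 3 credits, not 4.
print(credits_consumed({"a": 3, "b": 1, "c": 0}))
print(still_buffered(datetime(2024, 1, 1), datetime(2024, 1, 9)))
```

Whether a result expires exactly at the 7-day mark or slightly later is not specified here; the strict comparison is an assumption.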

A special use case of collectors is access to past data, which is covered separately in the next section.

In this section, we present how to download the results of a collector and provide a list of operations (including the definition of a collector) along with examples.

Downloading the results of a collector

The search results of a collector can be accessed with an HTTP GET request, which accepts several optional parameters (see Optional parameters).

Command
curl -XGET 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>/results?access_token=<access_token>'
Optional parameters
  • resume_offset: position from which to resume data access; it can be retrieved from control chunks. Possible values: "earliest", "latest" or <resume_token>. Default: "earliest".
  • end_behaviour: what to do once the most recent result has been reached. Possible values: "stop" or "wait". Default: "wait".
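
A minimal Python sketch that assembles the results URL with the standard library. The helper name `results_url` is hypothetical and `TOKEN` stands in for a real access token. Optional parameters are only appended when given, so the server-side defaults apply otherwise.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.talkwalker.com/api/v3/stream/c"


def results_url(collector_id, access_token, resume_offset=None, end_behaviour=None):
    """Build the GET URL for a collector's results (hypothetical helper)."""
    if end_behaviour not in (None, "stop", "wait"):
        raise ValueError("end_behaviour must be 'stop' or 'wait'")
    params = {"access_token": access_token}
    if resume_offset is not None:
        # "earliest", "latest" or a resume token taken from a control chunk
        params["resume_offset"] = resume_offset
    if end_behaviour is not None:
        params["end_behaviour"] = end_behaviour
    return f"{BASE_URL}/{quote(collector_id, safe='')}/results?{urlencode(params)}"


print(results_url("collector-1", "TOKEN", end_behaviour="stop"))
# https://api.talkwalker.com/api/v3/stream/c/collector-1/results?access_token=TOKEN&end_behaviour=stop
```

Using "stop" suits batch downloads that should end once the most recent result is reached; "wait" keeps the connection open for new results.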

Rate Limit

This endpoint is limited to 5 calls per minute.
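
To stay under this limit on the client side, a caller can throttle itself. The sketch below uses a generic sliding-window limiter, a technique chosen here for illustration rather than anything the API prescribes; the class name is hypothetical, and the clock is injectable so the behaviour can be tested without waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard allowing at most `max_calls` per `period` seconds."""

    def __init__(self, max_calls, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls = deque()  # timestamps of calls inside the current window

    def try_acquire(self):
        """Record a call and return True if it is allowed, otherwise False."""
        now = self.clock()
        # Forget calls that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False


# Results endpoint: 5 calls per minute.
results_limiter = SlidingWindowLimiter(max_calls=5)
```

The other endpoints in this section can get their own instance with their own limit (for example 20 or 200 calls per minute).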

Operations on the definition of a collector

The Talkwalker Streaming API allows users to create/replace, retrieve and delete a collector using the endpoint https://api.talkwalker.com/api/v3/stream/c/<collector_id>.
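
The collector operations in this section differ only in HTTP method and path suffix. The mapping below is taken from the commands in this section; the function and dictionary names are hypothetical and the sketch only builds the method and URL without sending anything.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://api.talkwalker.com/api/v3/stream"

# HTTP method and path suffix for each collector operation in this section.
OPERATIONS = {
    "create_or_replace": ("PUT", ""),
    "retrieve": ("GET", ""),
    "delete": ("DELETE", ""),
    "pause": ("POST", "/pause"),
    "resume": ("POST", "/resume"),
}


def collector_call(operation, collector_id, access_token):
    """Return (method, url) for an operation on a single collector."""
    method, suffix = OPERATIONS[operation]
    cid = quote(collector_id, safe="")
    query = urlencode({"access_token": access_token})
    return method, f"{API_ROOT}/c/{cid}{suffix}?{query}"


print(collector_call("pause", "collector-1", "TOKEN"))
# ('POST', 'https://api.talkwalker.com/api/v3/stream/c/collector-1/pause?access_token=TOKEN')
```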

Definition of a collector

The definition of a collector consists of the following parameters.

Fields of a collector
  • collector_id: collector ID
  • collector_query:
      • projects (array): project IDs
      • streams (array): stream IDs
      • queries (array), each with:
          • id: query ID
          • query: query string
      • project_topics (array), each with:
          • project: project ID
          • topics (array): topic IDs added within that project
      • filters (array): filter IDs

Warning: the collector_query must never include both projects and project_topics; only one of the two is allowed.

To create an active collector, at least one parameter must be set in the collector_query (e.g. projects, streams or project_topics).

An empty collector query can be used to create a paused collector for past data export tasks. See the section Creation of export tasks in Talkwalker projects.
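
These rules can be checked locally before sending a definition. In this sketch (function name hypothetical), any non-empty selector makes the query usable for an active collector; counting filters alone as sufficient follows the list at the top of this section (projects, streams, queries/filters) but is an assumption.

```python
# Parameters that can select documents (assumption: filters count too).
SELECTORS = ("projects", "streams", "queries", "project_topics", "filters")


def check_collector_query(collector_query):
    """Classify a collector_query dict, raising ValueError if it breaks a rule.

    Returns "active" when at least one selector is set, and "paused-only"
    for an empty query, which only suits a paused collector used as an
    export target.
    """
    if "projects" in collector_query and "project_topics" in collector_query:
        raise ValueError("use either projects or project_topics, not both")
    if any(collector_query.get(key) for key in SELECTORS):
        return "active"
    return "paused-only"
```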

The elements of a collector_query are combined as follows:

  • For all list parameters except filters (e.g. streams, queries, or topics in project_topics), only one element needs to match (OR between the elements of the list, such as stream IDs or topic IDs).
  • All filters must be matched (AND between filter IDs).
  • If several parameters are provided (e.g. project_topics and filters), all of them must be matched (AND between parameters).
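
A minimal sketch of these combination rules, evaluated for a single document. It assumes a hypothetical `doc_hits` summary listing which project, stream, query, topic and filter IDs a document matches; the API does not expose such a structure. Because the examples write project_topics as a single object while the field list calls it an array, the sketch accepts both.

```python
def matches(collector_query, doc_hits):
    """Apply the OR/AND combination rules to one document.

    `doc_hits` is a hypothetical summary of the IDs a document matches, e.g.
    {"streams": {"stream-1"}, "queries": {"q1"}, "filters": {"f1"}}.
    """

    def any_hit(kind, ids):
        # OR between the elements of one list parameter
        return bool(set(ids) & doc_hits.get(kind, set()))

    checks = []
    for kind in ("projects", "streams"):
        if kind in collector_query:
            checks.append(any_hit(kind, collector_query[kind]))
    if "queries" in collector_query:
        checks.append(any_hit("queries", [q["id"] for q in collector_query["queries"]]))
    if "project_topics" in collector_query:
        entries = collector_query["project_topics"]
        if isinstance(entries, dict):  # the examples use a single object
            entries = [entries]
        topic_ids = [t for entry in entries for t in entry["topics"]]
        checks.append(any_hit("topics", topic_ids))
    if "filters" in collector_query:
        # AND between filters: every listed filter ID must be hit
        checks.append(set(collector_query["filters"]) <= doc_hits.get("filters", set()))
    # AND between parameters; an empty query matches nothing in this sketch
    return bool(checks) and all(checks)
```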

Examples

Collector on a stream

Definition of stream "stream-1"
{
"stream_id": "stream-1",
"rules": [
{
"rule_id": "<rule_id>",
"query": "<query>"
}
]
}
Definition of collector "collector-1"
{
"collector_query" : {
"streams" : ["stream-1"]
}
}

All documents that match stream "stream-1" remain in the collector for 7 days.

Collector on a stream filtered on an additional query
{
"collector_query" : {
"streams" : ["stream-1"],
"queries" : [{
"id" : "<q1>",
"query" : "<query>"
}]
}
}

This collector collects all documents that match "stream-1" AND q1.

Collector on a project filtered on specific queries
{
"collector_query": {
"projects": ["<p1>", "<p2>"],
"queries": [
{
"id": "<q1>",
"query": "<query_1>"
},
{
"id": "<q2>",
"query": "<query_2>"
}
]
}
}

This collector collects all documents that match (p1 OR p2) AND (q1 OR q2).

Collector on specific topics with additional queries and filters
{
"collector_query": {
"queries": [
{
"id": "<q1>",
"query": "<query_1>"
},
{
"id": "<q2>",
"query": "<query_2>"
}
],
"project_topics": {
"project": "<p1>",
"topics": ["<t1>", "<t2>"]
},
"filters": ["<f1>", "<f2>"]
}
}

This collector collects all documents that match (q1 OR q2) AND (t1 OR t2) AND f1 AND f2.
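
The boolean expressions given with these examples can be derived mechanically from a collector_query. The helper below is a hypothetical illustration: single-element lists print without parentheses, and each filter is ANDed on its own. The term order (projects, streams, queries, topics, filters) is chosen to reproduce the expressions in the examples.

```python
def describe(collector_query):
    """Render a collector_query as the boolean expression used in the examples."""

    def any_of(ids):
        ids = list(ids)
        return ids[0] if len(ids) == 1 else "(" + " OR ".join(ids) + ")"

    terms = []
    for kind in ("projects", "streams"):
        if collector_query.get(kind):
            terms.append(any_of(collector_query[kind]))
    if collector_query.get("queries"):
        terms.append(any_of(q["id"] for q in collector_query["queries"]))
    if collector_query.get("project_topics"):
        entries = collector_query["project_topics"]
        if isinstance(entries, dict):  # the examples use a single object
            entries = [entries]
        terms.append(any_of(t for e in entries for t in e["topics"]))
    terms.extend(collector_query.get("filters", []))  # each filter ANDed separately
    return " AND ".join(terms)


print(describe({
    "projects": ["p1", "p2"],
    "queries": [{"id": "q1", "query": "<query_1>"}, {"id": "q2", "query": "<query_2>"}],
}))
# (p1 OR p2) AND (q1 OR q2)
```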

Create / update a collector

Command
curl -XPUT 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>?access_token=<access_token>' \
-d '<collector_definition>' \
-H 'Content-Type: application/json; charset=UTF-8'

In a <collector_definition>, the field state should not be set (it is set to ACTIVE automatically), and at least a project, a stream or a query must be set in the field collector_query. A collector can include only one project but any number of queries and streams.

Example Command
curl -XPUT 'https://api.talkwalker.com/api/v3/stream/c/collector-1?access_token=<access_token>&pretty=true' \
-d '{"collector_query" : {"streams" : ["stream-1"], "queries" : [{"id" : "q-1", "query" : "lang:en"}]}}' \
-H 'Content-Type: application/json; charset=UTF-8'
Example Response
{
"status_code": "0",
"status_message": "OK",
"request": "PUT /api/v3/stream/c/collector-1?access_token=<access_token>&pretty=true",
"result_stream": {
"collectors": [
{
"state": "ACTIVE",
"collector_id": "collector-1"
}
]
}
}
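
The same create/update call can be prepared from Python with only the standard library. This is a sketch under the assumption that the request shown in the curl example is all the server needs; `build_put_request` is a hypothetical helper, and the request is built but not sent.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_put_request(collector_id, access_token, collector_query):
    """Prepare (but do not send) the create/update request for a collector."""
    url = (
        "https://api.talkwalker.com/api/v3/stream/c/"
        f"{quote(collector_id, safe='')}?{urlencode({'access_token': access_token})}"
    )
    # Only collector_query is sent; the state field is left out on purpose.
    body = json.dumps({"collector_query": collector_query}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json; charset=UTF-8"},
    )


req = build_put_request(
    "collector-1",
    "TOKEN",
    {"streams": ["stream-1"], "queries": [{"id": "q-1", "query": "lang:en"}]},
)
```

Passing `req` to `urllib.request.urlopen` would send it; the reply is expected to look like the example response.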

Rate Limit

This endpoint is limited to 20 calls per minute.

Retrieve the definition of a collector

Command
curl -XGET 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>?access_token=<access_token>&pretty=true'

The response includes the state of the collector, which can take one of the following values: UNKNOWN, ACTIVE, ERROR, DELETED, PAUSED, NO_CREDITS.

Example Response
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v3/stream/c/collector-1?access_token=<access_token>&pretty=true",
"result_stream": {
"collectors": [
{
"collector_id": "collector-1",
"state": "ACTIVE",
"query": {
"streams": ["stream-1"],
"queries": [
{
"id": "q-1",
"query": "lang:en"
}
]
}
}
]
}
}
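
Note that the retrieved definition comes back under the key "query", whereas the create/update body uses "collector_query". The sketch below (hypothetical helper) parses a retrieve response, checks the state against the documented values, and rewraps the query as a create/update body, for example to copy a collector. It assumes that the returned query object can be resubmitted unchanged, which this page does not state explicitly.

```python
import json

KNOWN_STATES = {"UNKNOWN", "ACTIVE", "ERROR", "DELETED", "PAUSED", "NO_CREDITS"}


def definition_from_response(response_text):
    """Return (collector_id, state, put_body) from a retrieve response.

    Assumes the first entry of "collectors" is the requested collector.
    """
    response = json.loads(response_text)
    collector = response["result_stream"]["collectors"][0]
    state = collector["state"]
    if state not in KNOWN_STATES:
        raise ValueError(f"unexpected collector state: {state}")
    # The stored query is returned as "query"; the PUT body expects "collector_query".
    return collector["collector_id"], state, {"collector_query": collector["query"]}
```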

Rate Limit

This endpoint is limited to 200 calls per minute.

Delete a collector

Deleting a collector permanently removes it together with its content. A new collector with the same ID can be created, but it will not include the old collector's results. By contrast, when a collector is updated with a new query without being deleted first, its previously collected data is kept.

Command
curl -XDELETE 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>?access_token=<access_token>&pretty=true'
Example Response
{
"status_code": "0",
"status_message": "OK",
"request": "DELETE /api/v3/stream/c/collector-1?access_token=<access_token>&pretty=true",
"result_stream": {
"collectors": [
{
"collector_id": "collector-1",
"state": "DELETED"
}
]
}
}
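
The difference between updating and deleting can be illustrated with a toy in-memory model. The class is hypothetical and is not an API client; it only mirrors the behaviour described in this section: an update keeps existing results, while delete followed by re-creation starts empty.

```python
class ToyCollectorStore:
    """Hypothetical model of update vs. delete semantics (not an API client)."""

    def __init__(self):
        self.collectors = {}  # collector_id -> {"query": ..., "results": [...]}

    def put(self, collector_id, query):
        # Create, or update in place: existing results are kept on update.
        entry = self.collectors.setdefault(collector_id, {"results": []})
        entry["query"] = query

    def collect(self, collector_id, result):
        self.collectors[collector_id]["results"].append(result)

    def delete(self, collector_id):
        # Deleting removes the collector together with its results.
        del self.collectors[collector_id]


store = ToyCollectorStore()
store.put("collector-1", {"streams": ["stream-1"]})
store.collect("collector-1", "r1")
store.put("collector-1", {"streams": ["stream-2"]})  # update: "r1" survives
store.delete("collector-1")
store.put("collector-1", {"streams": ["stream-1"]})  # re-created: no old results
```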

Rate Limit

This endpoint is limited to 20 calls per minute.

Pause a collector

When this endpoint is called, the collector's state changes to "PAUSED". A paused collector does not collect any real-time data. When a paused collector is resumed, all previously collected data is still included. A paused collector that is chosen as the target of an export task still receives all exported data.

Command
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>/pause?access_token=<access_token>&pretty=true'
Example Response
{
"status_code": "0",
"status_message": "OK",
"request": "POST /api/v3/stream/c/collector-1/pause?access_token=<access_token>&pretty=true",
"result_stream": {
"collectors": [
{
"collector_id": "collector-1",
"state": "PAUSED"
}
]
}
}

Rate Limit

This endpoint is limited to 40 calls per minute.

Resume a collector

Resuming a collector changes its state from "PAUSED" to "ACTIVE". All data arriving from the moment the collector is resumed onwards is stored again.

Command
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>/resume?access_token=<access_token>&pretty=true'
Example Response
{
"status_code": "0",
"status_message": "OK",
"request": "POST /api/v3/stream/c/collector-1/resume?access_token=<access_token>&pretty=true",
"result_stream": {
"collectors": [
{
"collector_id": "collector-1",
"state": "ACTIVE"
}
]
}
}
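
The pause and resume behaviour amounts to a small state machine. The toy model below is hypothetical and only mirrors what these two sections state: real-time data is stored only while the collector is ACTIVE, data delivered by an export task is stored even while it is PAUSED, and earlier results survive a pause.

```python
class ToyCollector:
    """Hypothetical model of pause/resume behaviour (not an API client)."""

    def __init__(self):
        self.state = "ACTIVE"
        self.results = []

    def pause(self):
        self.state = "PAUSED"

    def resume(self):
        if self.state == "PAUSED":
            self.state = "ACTIVE"

    def on_realtime(self, doc):
        # Real-time data is only stored while the collector is active.
        if self.state == "ACTIVE":
            self.results.append(doc)

    def on_export(self, doc):
        # Export tasks deliver data even to a paused collector.
        self.results.append(doc)
```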

Rate Limit

This endpoint is limited to 40 calls per minute.

Retrieve the information of all streams and collectors

Command
curl -XGET 'https://api.talkwalker.com/api/v3/stream/info?access_token=<access_token>&pretty=true'
Example Response
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v3/stream/info?access_token=<access_token>&pretty=true",
"result_stream": {
"streams": [
{
"stream_id": "stream-1",
"enabled": true
}
],
"collectors": [
{
"collector_id": "collector-1",
"state": "ACTIVE"
}
]
}
}
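
A short sketch for turning the info response into lookup tables, for example to spot collectors that are not currently ACTIVE. The function names are hypothetical; the field names (stream_id, enabled, collector_id, state) are those of the example response.

```python
import json


def summarize_info(response_text):
    """Return ({stream_id: enabled}, {collector_id: state}) from an info response."""
    result = json.loads(response_text)["result_stream"]
    streams = {s["stream_id"]: s["enabled"] for s in result.get("streams", [])}
    collectors = {c["collector_id"]: c["state"] for c in result.get("collectors", [])}
    return streams, collectors


def inactive_collectors(response_text):
    """Sorted IDs of collectors whose state is anything other than ACTIVE."""
    _, collectors = summarize_info(response_text)
    return sorted(cid for cid, state in collectors.items() if state != "ACTIVE")
```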

Rate Limit

This endpoint is limited to 20 calls per minute.