
Past data export through collectors

Exports create asynchronous tasks that copy a selection of past data into a collector; the copied results can then be accessed through that collector.

There are three options to define the source of past data:

  • existing stream definitions (via a stream ID)
  • Talkwalker projects (via project ID)
  • specifying a query in the request body
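The three sources map onto the three URL paths used by the commands later in this section. As a rough illustration, the helper below picks the path from whichever ID is given; the function name `export_url` and its signature are hypothetical and not part of the Talkwalker API.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Base path taken from the curl commands in this section.
BASE_URL = "https://api.talkwalker.com/api/v3/stream"


def export_url(access_token: str,
               project_id: Optional[str] = None,
               stream_id: Optional[str] = None) -> str:
    """Return the export endpoint for a project, a stream, or a bare query.

    Hypothetical helper: it only assembles the URL, it sends nothing.
    """
    if project_id and stream_id:
        raise ValueError("pass either project_id or stream_id, not both")
    if project_id:
        path = f"/p/{quote(project_id, safe='')}/export"
    elif stream_id:
        path = f"/s/{quote(stream_id, safe='')}/export"
    else:
        # No ID given: the query-based endpoint, where "query" is required
        # in the request body.
        path = "/export"
    return f"{BASE_URL}{path}?{urlencode({'access_token': access_token})}"
```

For instance, `export_url("<token>", stream_id="stream-1")` yields the stream export path, while calling it without any ID yields the query-based path.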

An ongoing task can be checked or aborted by using its task ID, included in the response.

There are 3 POST endpoints, presented in the following subsections, which can execute an export task. These 3 endpoints share the following parameters, set inside the body:

Parameters for creating an export task

| parameter | description | required? | default |
| --- | --- | --- | --- |
| start | timestamp (milliseconds since 1970-01-01, e.g. 1539302400000) or date (e.g. 2018-10-12) of the timeframe's start | required | |
| stop | timestamp or date of the timeframe's end | optional | |
| target | ID of the collector | required | |
| query | the query to search for (conjunctive to existing queries, i.e. matching all) | optional | |
| limit | the maximum number of results to export before interrupting | optional | 1,000,000 |

Each exported result consumes 1 credit. Exporting the same result multiple times due to overlapping export tasks therefore requires multiple credits.
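The shared parameters and the credit rule can be sketched together. The snippet below is a minimal illustration, assuming the body is plain JSON as in the curl commands; `build_export_body` and `max_credits` are hypothetical helper names, and the credit figure is only an upper bound derived from the `limit` parameter.

```python
import json
from typing import Optional, Union

DEFAULT_LIMIT = 1_000_000  # default value of "limit" from the parameter table


def build_export_body(start: Union[int, str],
                      target: str,
                      stop: Union[int, str, None] = None,
                      query: Optional[str] = None,
                      limit: Optional[int] = None) -> str:
    """Serialize the shared export parameters; optional ones are left out."""
    if start is None or start == "":
        raise ValueError("start is required")
    if not target:
        raise ValueError("target is required")
    body = {"start": start, "target": target}
    if stop is not None:
        body["stop"] = stop
    if query is not None:
        body["query"] = query
    if limit is not None:
        body["limit"] = limit
    return json.dumps(body)


def max_credits(limit: Optional[int] = None) -> int:
    """Upper bound on credits for one task: 1 credit per exported result."""
    return DEFAULT_LIMIT if limit is None else limit
```

Setting an explicit `limit` is therefore one way to cap the credits a single task can consume.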

Creation of export tasks for Talkwalker projects

An export task for a Talkwalker project is started with a POST request to the project's export endpoint.

Command
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>' \
-d '{"start": "<date>", "stop": <timestamp>, "target": "<target>"}'

For Talkwalker project export tasks, the result set can be narrowed down further. To match only a selection of the project's topics instead of the complete project, list them in the topics parameter.

| parameter | description | required? |
| --- | --- | --- |
| topics | IDs of the topics taken into consideration | optional |

Example Command
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>' \
-d '{"start": "2018-11-15", "stop": 1545127673884, "target": "testcollector", "topics": ["topic1_id","topic2_id"]}'

Tags can also be included or excluded through the query parameter. In this case, provide the IDs of the tags.

Example Command
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>' \
-d '{"start": "2018-11-15", "stop": 1545127673884, "target": "testcollector", "topics": ["topic1_id","topic2_id"], "query": "tag:tag_id"}'

It is also possible to retrieve the datasets that are defined in the Talkwalker Customer Intelligence project. This can be specified by using the datasets parameter. Datasets older than 7 days can no longer be queried.

| parameter | description | required? |
| --- | --- | --- |
| datasets | IDs of the datasets defined in the Talkwalker Customer Intelligence project | optional |

Example Command
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<CI_project_id>/export?access_token=<access_token>' \
-d '{"start": "2022-01-01", "stop": 1641830679000, "target": "testcollector", "datasets": ["datasets1_id","datasets2_id"]}'
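The project-specific options above (topics, a tag filter in the query, datasets) all end up as extra keys in the same request body. A small sketch of how such a body could be assembled follows; `project_export_body` is a hypothetical helper, and only the `tag:<id>` form shown in the example command is used, since the syntax for excluding tags is not spelled out here.

```python
from typing import Any, Dict, List, Optional, Union


def project_export_body(start: Union[int, str],
                        target: str,
                        stop: Union[int, str, None] = None,
                        topics: Optional[List[str]] = None,
                        datasets: Optional[List[str]] = None,
                        tag_id: Optional[str] = None) -> Dict[str, Any]:
    """Build the body for a project export, adding only the options given."""
    body: Dict[str, Any] = {"start": start, "target": target}
    if stop is not None:
        body["stop"] = stop
    if topics:
        body["topics"] = list(topics)      # restrict to selected topics
    if datasets:
        body["datasets"] = list(datasets)  # Customer Intelligence datasets
    if tag_id:
        body["query"] = f"tag:{tag_id}"    # same form as the example command
    return body
```

Serializing the returned dictionary with `json.dumps` gives a payload of the same shape as the `-d` arguments in the commands above.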

Rate Limit

This endpoint is limited to 40 calls per minute.
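A client can stay under a limit of this kind by throttling itself. The class below is a minimal sliding-window sketch, not something provided by the API: `MinuteRateLimiter` is a hypothetical name, and how the server actually counts calls (fixed or sliding window) is not documented here, so the sliding window is an assumption.

```python
import time
from collections import deque
from typing import Callable, Deque


class MinuteRateLimiter:
    """Allow at most max_calls calls within any window of `period` seconds."""

    def __init__(self, max_calls: int = 40, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._max_calls = max_calls
        self._period = period
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()  # timestamps of recent calls

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if none)."""
        now = self._clock()
        # Forget calls that have left the window.
        while self._calls and now - self._calls[0] >= self._period:
            self._calls.popleft()
        if len(self._calls) < self._max_calls:
            return 0.0
        return self._period - (now - self._calls[0])

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        delay = self.wait_time()
        while delay > 0:
            self._sleep(delay)
            delay = self.wait_time()
        self._calls.append(self._clock())
```

Calling `acquire()` immediately before each request to the endpoint keeps the client at or below 40 calls in any 60-second window.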

Creation of export tasks for existing streams

An export based on an existing stream definition is started, as for projects, by sending a POST request to the stream's export endpoint.

Command
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/s/<stream_id>/export?access_token=<access_token>' \
-d '{"start": "<date>", "target": "<collector_id>"}'

Creation of export tasks based on query parameter

With a third endpoint, it is possible to create an export task without providing a project or stream ID. This endpoint depends on the query parameter, which consequently becomes required instead of optional.



Command
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/export?access_token=<access_token>' \
-d '{"start": "<date>", "target": "<collector_id>", "query": "<query>"}'

Rate Limit

This endpoint is limited to 40 calls per minute.

Example

In this example, we wish to export all data from September 2018.

We start by creating an empty collector.

curl -XPUT 'https://api.talkwalker.com/api/v3/stream/c/collector-1?access_token=<access_token>' \
-d '{"collector_query" : {}}' \
-H "Content-Type: application/json; charset=UTF-8"

The newly created collector is then used as target for the export task, where the time frame is limited to September 2018 using the start (as date) and stop (as timestamp, without quotes) parameters in the request body.
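The two values for such a monthly time frame can be derived rather than typed by hand. The sketch below assumes timestamps are interpreted in UTC, which the text does not state explicitly; under that assumption the stop value for September 2018 is the first instant of October 2018. The function name `month_timeframe` is hypothetical.

```python
from datetime import date, datetime, timezone
from typing import Tuple


def month_timeframe(year: int, month: int) -> Tuple[str, int]:
    """Return (start date string, stop timestamp in ms) for a calendar month.

    Assumes UTC; the stop value is midnight on the first day of the
    following month.
    """
    start = date(year, month, 1)
    # December rolls over into January of the next year.
    next_year = year + month // 12
    next_month = month % 12 + 1
    stop = datetime(next_year, next_month, 1, tzinfo=timezone.utc)
    return start.isoformat(), int(stop.timestamp() * 1000)


start, stop = month_timeframe(2018, 9)
print(start, stop)  # 2018-09-01 1538352000000
```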

Command
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/s/stream-1/export?access_token=<access_token>' \
-d '{"start": "2018-09-01", "stop": 1538352000000, "target": "collector-1"}'
Example response
{
  "status_code": "0",
  "status_message": "OK",
  "request": "POST /api/v3/stream/export?access_token=<access_token>",
  "result_tasks": {
    "tasks": [
      {
        "creation_date": "2018-12-31T15:24:34.069Z",
        "type": "export",
        "id": "task-1",
        "status": "queued",
        "processed": 0,
        "progress": 0.0,
        "target": "collector-1"
      }
    ]
  }
}

The response includes the status of the export task, which can take one of the following values: UNKNOWN, QUEUED, RUNNING, FINISHED, FAILED, DELETED, ABORTED, RESULT_LIMIT_REACHED. The example responses in this section show these values in lowercase (e.g. "queued").
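When handling responses in code, it can help to normalize the status string once and work with a fixed set of values afterwards. The following is a sketch under a few assumptions: the lowercase spellings mirror the example responses (the lowercase form of RESULT_LIMIT_REACHED is a guess), treating only QUEUED and RUNNING as still in progress is an interpretation, and the names `ExportStatus`, `IN_FLIGHT` and `task_statuses` are hypothetical.

```python
import json
from enum import Enum
from typing import Dict


class ExportStatus(Enum):
    UNKNOWN = "unknown"
    QUEUED = "queued"
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"
    DELETED = "deleted"
    ABORTED = "aborted"
    RESULT_LIMIT_REACHED = "result_limit_reached"  # assumed lowercase form

    @classmethod
    def parse(cls, raw: str) -> "ExportStatus":
        """Case-insensitive lookup; unrecognized values map to UNKNOWN."""
        try:
            return cls(raw.strip().lower())
        except ValueError:
            return cls.UNKNOWN


# Assumption: only these two states mean the task is still being worked on.
IN_FLIGHT = frozenset({ExportStatus.QUEUED, ExportStatus.RUNNING})


def task_statuses(response_text: str) -> Dict[str, ExportStatus]:
    """Map task ID to status for every task in a response body."""
    payload = json.loads(response_text)
    tasks = payload.get("result_tasks", {}).get("tasks", [])
    return {task["id"]: ExportStatus.parse(task.get("status", ""))
            for task in tasks}
```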

Best practice: When exporting results for a long time period, split the export into multiple smaller tasks (e.g. one task per month when exporting half a year of results). This gives a better estimate of the credit cost and the number of results for the remaining time frame.
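Splitting a long time frame into monthly pieces is easy to automate. The generator below is one possible sketch; `monthly_windows` is a hypothetical name, and each yielded pair would become the start and stop of one export task. Adjacent windows share their boundary date, and since it is not stated whether stop is inclusive, results falling exactly on a boundary might be exported (and charged) twice.

```python
from datetime import date
from typing import Iterator, Tuple


def monthly_windows(start: date, stop: date) -> Iterator[Tuple[date, date]]:
    """Yield (window_start, window_stop) pairs covering start..stop by month."""
    cursor = start
    while cursor < stop:
        # First day of the month after the cursor (December rolls over).
        next_month = date(cursor.year + cursor.month // 12,
                          cursor.month % 12 + 1, 1)
        window_stop = min(next_month, stop)
        yield cursor, window_stop
        cursor = window_stop


# Half a year, April through September 2018, becomes six tasks.
for window_start, window_stop in monthly_windows(date(2018, 4, 1),
                                                 date(2018, 10, 1)):
    print(window_start.isoformat(), window_stop.isoformat())
```

Running the tasks one after another and checking the processed count of each finished task gives a running estimate of the cost for the months that remain.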

Status of an export

Using the task ID, which can be obtained from the response when creating a new task, the status of an export can be accessed.

Command
curl -XGET 'https://api.talkwalker.com/api/v3/tasks/export/<task_id>?access_token=<access_token>'
Example response
{
  "status_code" : "0",
  "status_message" : "OK",
  "request" : "GET /api/v3/tasks/export/task-1?access_token=<access_token>",
  "result_tasks" : {
    "tasks" : [{
      "creation_date" : "2018-03-21T08:23:00.335Z",
      "type" : "export",
      "id" : "task-1",
      "status" : "finished",
      "processed" : 3,
      "progress" : 1.0,
      "target" : "coll-01"
    }]
  }
}
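Because export tasks are asynchronous, a client typically polls this status request until the task stops changing. The loop below is a sketch only: `wait_for_export` and the injected `fetch_task` callable (which would perform the GET request and return the task object from `result_tasks.tasks`) are hypothetical, and the set of statuses treated as final is an assumption based on the status names.

```python
import time
from typing import Any, Callable, Dict

# Assumption: these statuses mean the task will not progress any further.
DONE = frozenset({"finished", "failed", "deleted", "aborted",
                  "result_limit_reached"})


def wait_for_export(fetch_task: Callable[[str], Dict[str, Any]],
                    task_id: str,
                    interval: float = 5.0,
                    max_polls: int = 100,
                    sleep: Callable[[float], None] = time.sleep
                    ) -> Dict[str, Any]:
    """Poll a task until it reaches a final status and return it."""
    for attempt in range(max_polls):
        task = fetch_task(task_id)
        if str(task.get("status", "")).lower() in DONE:
            return task
        if attempt < max_polls - 1:
            sleep(interval)  # pause between polls, not after the last one
    raise TimeoutError(f"task {task_id} still running after {max_polls} polls")
```

With the default interval of 5 seconds, a single polling loop issues about 12 calls per minute.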

The same request returns the list of all recent tasks if the task ID is omitted.

Command
curl -XGET 'https://api.talkwalker.com/api/v3/tasks/export?access_token=<access_token>'
Example response
{
  "status_code" : "0",
  "status_message" : "OK",
  "request" : "GET /api/v3/tasks/export?access_token=<access_token>",
  "result_tasks" : {
    "tasks" : [{
      "creation_date" : "2018-03-21T08:28:35.469Z",
      "type" : "export",
      "id" : "task-1",
      "status" : "queued",
      "processed" : 0,
      "progress" : 0.0,
      "target" : "collector-1"
    },
    {...},
    {...}]
  }
}

Rate Limit

This endpoint is limited to 40 calls per minute.

Abort a task

Using the task ID, a currently running export task can be aborted.

Command
curl -XDELETE 'https://api.talkwalker.com/api/v3/tasks/export/<task_id>?access_token=<access_token>'
Sample response
{
  "status_code" : "0",
  "status_message" : "OK",
  "request" : "DELETE /api/v3/tasks/export/task-1?access_token=<access_token>",
  "result_tasks" : {
    "tasks" : [{
      "id" : "task-1",
      "status" : "deleted"
    }]
  }
}
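The same abort call can be prepared from Python's standard library. The sketch below only builds the request object so it can be inspected before anything is sent; `abort_request` is a hypothetical helper name, and the URL layout is copied from the curl command above.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

TASKS_URL = "https://api.talkwalker.com/api/v3/tasks/export"


def abort_request(task_id: str, access_token: str) -> Request:
    """Build (but do not send) the DELETE request that aborts a task."""
    url = (f"{TASKS_URL}/{quote(task_id, safe='')}"
           f"?{urlencode({'access_token': access_token})}")
    return Request(url, method="DELETE")
```

Passing the returned object to `urllib.request.urlopen` would send it; the response body should then have the shape of the sample response above.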

Rate Limit

This endpoint is limited to 40 calls per minute.