Past data export through collectors
Exports create asynchronous tasks that copy a selection of past data into a collector. The copied results can then be retrieved from that collector.
There are three options to define the source of past data:
- existing stream definitions (via a stream ID)
- Talkwalker projects (via project ID)
- specifying a query in the request body
An ongoing task can be checked or aborted by using its task ID, included in the response.
Three POST endpoints, presented in the following subsections, can start an export task. They share the following parameters, which are set in the request body:
parameter | description | required? | default |
---|---|---|---|
start | start of the timeframe, either as a timestamp (milliseconds since 1970-01-01, e.g. 1539302400000) or as a date (e.g. 2018-10-12) | required | |
stop | timestamp or date of the timeframe's end | optional | |
target | ID of the collector | required | |
query | the query to search for (conjunctive to existing queries, i.e. matching all) | optional | |
limit | the maximum number of results to export before the task is interrupted | optional | 1,000,000 |
Each exported result consumes 1 credit. Exporting the same result multiple times due to overlapping export tasks therefore requires multiple credits.
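The shared parameters above can be assembled programmatically. The following is a minimal sketch, not part of the API: the helper names `to_millis` and `build_export_body` are hypothetical, and treating dates as UTC midnight is an assumption (it happens to reproduce the example timestamp from the table).

```python
import json
from datetime import datetime, timezone


def to_millis(date_str: str) -> int:
    """Convert a YYYY-MM-DD date to epoch milliseconds.

    Assumption: the date is interpreted as midnight UTC.
    """
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def build_export_body(start, target, stop=None, query=None, limit=None) -> str:
    """Return the JSON request body shared by the export endpoints.

    'start' and 'target' are required; the other parameters are only
    included when a value is given.
    """
    if start is None or not target:
        raise ValueError("'start' and 'target' are required")
    body = {"start": start, "target": target}
    for key, value in (("stop", stop), ("query", query), ("limit", limit)):
        if value is not None:
            body[key] = value
    return json.dumps(body)


# Example: a one-month window into a hypothetical collector "collector-1".
print(build_export_body("2018-10-12", "collector-1", stop=to_millis("2018-11-12")))
```

Setting an explicit `limit` per task is one way to cap the number of credits a single export can consume.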
Creation of export tasks for Talkwalker projects
An export task for a Talkwalker project is started with a POST request to the endpoint.
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>' \
-d '{"start": "<date>", "stop":<timestamp>, "target":"<target>"}'
For Talkwalker project export tasks, the result set can be narrowed down further.
To match only a selection of the project's topics instead of the complete project, list them in the topics parameter.
parameter | description | required? |
---|---|---|
topics | IDs of the topics taken into consideration | optional |
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>' \
-d '{"start": "2018-11-15", "stop":1545127673884, "target":"testcollector", "topics":["topic1_id","topic2_id"]}'
Tags can also be included or excluded through the query parameter. In this case, the IDs of the tags must be provided.
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>' \
-d '{"start": "2018-11-15", "stop":1545127673884, "target":"testcollector", "topics":["topic1_id","topic2_id"], "query":"tag:tag_id"}'
It is also possible to retrieve the datasets that are defined in the Talkwalker Customer Intelligence project.
This can be specified by using the datasets parameter. Datasets older than 7 days can no longer be queried.
parameter | description | required? |
---|---|---|
datasets | IDs of the datasets defined in the Talkwalker Customer Intelligence project | optional |
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<CI_project_id>/export?access_token=<access_token>' \
-d '{"start": "2022-01-01", "stop":1641830679000, "target":"testcollector", "datasets":["datasets1_id","datasets2_id"]}'
Rate Limit
This endpoint is limited to 40 calls per minute.
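A client can stay below this limit by throttling its own calls. The sketch below is one possible approach, a sliding-window limiter. The class name `RateLimiter` is hypothetical, and whether the server counts calls per sliding window or per fixed minute is not stated here, so treat the window model as an assumption.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most max_calls calls within any period of `period` seconds."""

    def __init__(self, max_calls=40, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock    # injectable for testing
        self.sleep = sleep    # injectable for testing
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record that call."""
        now = self.clock()
        # Forget calls that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)


# Usage: call limiter.wait() immediately before each request to the endpoint.
limiter = RateLimiter(max_calls=40, period=60.0)
limiter.wait()
```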
Creation of export tasks for existing streams
Exporting data based on an existing stream definition works like a project export: send a POST request to the stream's export endpoint.
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/s/<stream_id>/export?access_token=<access_token>' \
-d '{"start": "<date>", "target":"<collector_id>"}'
Creation of export tasks based on query parameter
With a third endpoint, it is possible to create an export task without providing a project or stream ID.
This endpoint relies on the query parameter, which consequently becomes required instead of optional.
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/export?access_token=<access_token>' \
-d '{"start": "<date>", "target":"<collector_id>", "query":"<query>"}'
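The choice between the three export endpoints can be captured in a small helper. This is an illustrative sketch only: `export_request` is a hypothetical function, it builds the URL and validates the body without sending anything, and the validation rules simply mirror the required parameters described in this section.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.talkwalker.com/api/v3/stream"


def export_request(access_token, body, project_id=None, stream_id=None):
    """Return (url, body) for the matching export endpoint.

    - project_id given: project export   (/p/<project_id>/export)
    - stream_id given:  stream export    (/s/<stream_id>/export)
    - neither given:    query export     (/export), 'query' becomes required
    """
    if project_id and stream_id:
        raise ValueError("pass either project_id or stream_id, not both")
    for key in ("start", "target"):
        if key not in body:
            raise ValueError(f"'{key}' is required")
    if project_id:
        path = "/p/" + quote(project_id, safe="") + "/export"
    elif stream_id:
        path = "/s/" + quote(stream_id, safe="") + "/export"
    else:
        if not body.get("query"):
            raise ValueError("'query' is required without a project or stream ID")
        path = "/export"
    query_string = urlencode({"access_token": access_token})
    return BASE_URL + path + "?" + query_string, body


url, body = export_request(
    "demo-token",
    {"start": "2018-09-01", "target": "collector-1", "query": "example"},
)
print(url)
```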
Rate Limit
This endpoint is limited to 40 calls per minute.
Example
In this example, we wish to export all data from September 2018.
We start by creating an empty collector.
curl -XPUT 'https://api.talkwalker.com/api/v3/stream/c/collector-1?access_token=<access_token>' \
-d '{"collector_query" : {}}' \
-H "Content-Type: application/json; charset=UTF-8"
The newly created collector is then used as the target of the export task. The time frame is limited to September 2018 using the start (as date) and stop (as timestamp, without quotes) parameters in the request body.
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/s/stream-1/export?access_token=<access_token>' \
-d '{"start": "2018-09-01", "stop": 1538352000000, "target":"collector-1"}'
{
"status_code": "0",
"status_message": "OK",
"request": "POST /api/v3/stream/s/stream-1/export?access_token=<access_token>",
"result_tasks": {
"tasks": [
{
"creation_date": "2018-12-31T15:24:34.069Z",
"type": "export",
"id": "task-1",
"status": "queued",
"processed": 0,
"progress": 0.0,
"target": "collector-1"
}
]
}
}
The response includes the status of the export task, which can assume the following values: UNKNOWN, QUEUED, RUNNING, FINISHED, FAILED, DELETED, ABORTED, RESULT_LIMIT_REACHED.
Best practice: When exporting results for a longer time period, split the export into multiple smaller tasks (e.g. one task per month when exporting half a year of results). This allows a better estimate of the credit cost and of the number of results for the remaining time frame.
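Splitting a long period into monthly tasks can be sketched as follows. The helpers `month_windows` and `to_millis` are hypothetical, and dates are assumed to be UTC. Whether the stop boundary is inclusive or exclusive is not stated here, so a result lying exactly on a boundary might be exported by two adjacent tasks and then cost two credits.

```python
from datetime import date, datetime, timezone


def month_windows(start: date, stop: date):
    """Yield consecutive (window_start, window_stop) pairs that cover the
    range from start to stop, cut at the first day of each month."""
    current = start
    while current < stop:
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        window_stop = min(next_month, stop)
        yield current, window_stop
        current = window_stop


def to_millis(d: date) -> int:
    """Epoch milliseconds for midnight UTC of the given date (assumption)."""
    return int(datetime(d.year, d.month, d.day, tzinfo=timezone.utc).timestamp() * 1000)


# Half a year of results, one request body per month, into "collector-1".
for w_start, w_stop in month_windows(date(2018, 4, 1), date(2018, 10, 1)):
    body = {"start": w_start.isoformat(), "stop": to_millis(w_stop), "target": "collector-1"}
    print(body)
```

Running the tasks one after another, and checking the processed count of each finished task, gives a running estimate of the credits needed for the remaining months.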
Status of an export
Using the task ID, which can be obtained from the response when creating a new task, the status of an export can be accessed.
curl -XGET 'https://api.talkwalker.com/api/v3/tasks/export/<task_id>?access_token=<access_token>'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v3/tasks/export/task-1?access_token=<access_token>",
"result_tasks" : {
"tasks" : [{
"creation_date" : "2018-03-21T08:23:00.335Z",
"type" : "export",
"id" : "task-1",
"status" : "finished",
"processed" : 3,
"progress" : 1.0,
"target" : "coll-01"
}]
}
}
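Since export tasks run asynchronously, a client typically polls this endpoint until the task stops. The sketch below shows one way to do that. The names `task_status` and `wait_for_task` are hypothetical, `fetch` stands for any function that returns the raw JSON text of the status response, and the set of final states is an assumption derived from the status names.

```python
import json
import time

# Assumption: a task in one of these states will not change any more.
FINAL_STATES = {"finished", "failed", "deleted", "aborted", "result_limit_reached"}


def task_status(response_text: str) -> dict:
    """Extract the fields of the first task from a status response."""
    task = json.loads(response_text)["result_tasks"]["tasks"][0]
    return {
        "id": task["id"],
        "status": task["status"].lower(),
        "processed": task.get("processed", 0),
        "progress": task.get("progress", 0.0),
    }


def wait_for_task(fetch, interval=5.0, max_polls=100, sleep=time.sleep):
    """Poll until the task reaches a final state and return its last status.

    An interval of 5 seconds results in 12 calls per minute.
    """
    for _ in range(max_polls):
        info = task_status(fetch())
        if info["status"] in FINAL_STATES:
            return info
        sleep(interval)
    raise TimeoutError("task did not reach a final state in time")
```

For example, `wait_for_task(lambda: get_status_text("task-1"))` would block until the task stops, where `get_status_text` is a placeholder for the caller's own HTTP GET against the status endpoint.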
The same request returns the list of all recent tasks if the task ID is omitted.
curl -XGET 'https://api.talkwalker.com/api/v3/tasks/export?access_token=<access_token>'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v3/tasks/export?access_token=<access_token>",
"result_tasks" : {
"tasks" : [{
"creation_date" : "2018-03-21T08:28:35.469Z",
"type" : "export",
"id" : "task-1",
"status" : "queued",
"processed" : 0,
"progress" : 0.0,
"target" : "collector-1"
},
{...},
{...}]
}
}
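The task list can be summarized on the client side, for instance to see how many exports are still queued or running. This is a small sketch: `status_counts` is a hypothetical helper and the sample response is made-up illustration data that follows the structure shown above.

```python
import json
from collections import Counter


def status_counts(response_text: str) -> Counter:
    """Count the tasks of a task-list response by their status."""
    payload = json.loads(response_text)
    tasks = payload.get("result_tasks", {}).get("tasks", [])
    return Counter(task.get("status", "unknown").lower() for task in tasks)


# Illustrative response with three tasks (not real data).
sample = json.dumps({
    "status_code": "0",
    "result_tasks": {"tasks": [
        {"id": "task-1", "status": "queued"},
        {"id": "task-2", "status": "running"},
        {"id": "task-3", "status": "queued"},
    ]},
})
print(status_counts(sample))
```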
Rate Limit
This endpoint is limited to 40 calls per minute.
Abort a task
Using the task ID, a currently running export task can be aborted.
curl -XDELETE 'https://api.talkwalker.com/api/v3/tasks/export/<task_id>?access_token=<access_token>'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "DELETE /api/v3/tasks/export/task-1?access_token=<access_token>",
"result_tasks" : {
"tasks" : [{
"id" : "task-1",
"status" : "deleted"
}]
}
}
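The same abort call can be prepared with Python's standard library. The sketch below only builds the request object and does not send it. `abort_request` is a hypothetical helper, and "demo-token" is a placeholder for a real access token.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

TASKS_URL = "https://api.talkwalker.com/api/v3/tasks/export/"


def abort_request(task_id: str, access_token: str) -> Request:
    """Build (but do not send) the DELETE request that aborts a task."""
    url = TASKS_URL + quote(task_id, safe="") + "?" + urlencode({"access_token": access_token})
    return Request(url, method="DELETE")


req = abort_request("task-1", "demo-token")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```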
Rate Limit
This endpoint is limited to 40 calls per minute.