Export from a project

Create an empty collector to receive export results

The results of an export task follow the same structure as the streaming API. To push the results to a persisted queue, we first need to create a collector without rules or filters.

Command
curl -L -X PUT 'https://api.talkwalker.com/api/v3/stream/c/<collector id>?access_token=<access_token>'
Request
{}

The collector ID is used as the target in the export task definition.

Response
{
"status_code": "0",
"status_message": "OK",
"request": "PUT /api/v3/stream/c/export_collector?access_token=***",
"result_stream": {
"collectors": [
{
"collector_id": "export_collector",
"state": "PAUSED"
}
]
},
"request_id": "#rosk74h74rno#"
}
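
As a rough illustration of the step above, the same call can be prepared from Python with the standard library. This is only a sketch: the helper names `build_create_collector_request` and `collector_state` are hypothetical, and the `Content-Type` header is an assumption that the curl command above does not show.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.talkwalker.com/api/v3"


def build_create_collector_request(collector_id, access_token):
    """Build (but do not send) the PUT request that creates an empty collector."""
    query = urllib.parse.urlencode({"access_token": access_token})
    path_id = urllib.parse.quote(collector_id, safe="")
    return urllib.request.Request(
        f"{API_BASE}/stream/c/{path_id}?{query}",
        data=json.dumps({}).encode("utf-8"),  # empty body: no rules, no filters
        headers={"Content-Type": "application/json"},  # assumption, not in the curl sample
        method="PUT",
    )


def collector_state(response_body, collector_id):
    """Return the state of the given collector from a parsed response, or None."""
    collectors = response_body.get("result_stream", {}).get("collectors", [])
    for collector in collectors:
        if collector.get("collector_id") == collector_id:
            return collector.get("state")
    return None
```

The request object can be passed to `urllib.request.urlopen`, and the decoded JSON body to `collector_state`, which returns `"PAUSED"` for the sample response above.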

Create an export task

In this sample, we export documents from a project and a topic between 1 April 2022 and 30 April 2022. Additional query rules restrict the export to documents from Reddit and Twitter.

Command
curl -L -X POST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>'
Request
{
"start": "2022-04-01",
"stop": "2022-05-01",
"target": "<collector_id>",
"query": "domainurl:reddit.com OR sourcetype:SOCIALMEDIA_TWITTER",
"topics": ["<topic_id>"]
}
Response
{
"status_code": "0",
"status_message": "OK",
"request": "POST /api/v3/stream/p/a2ce1ade-a513-413f-868c-fafc25ba9a23/export?access_token=***",
"result_tasks": {
"tasks": [
{
"creation_date": "2023-01-20T15:56:09.073Z",
"type": "export",
"id": "b870dfc9-2b97-41d2-924a-c6fcb0494562",
"processed": 0,
"progress": 0.0,
"target": "export_collector",
"status": "queued"
}
]
},
"request_id": "#rosk9kt3kfq0#"
}
Note: The id field is the task ID. Save it so that you can query the task status.
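
The request body and the task ID handling can be sketched in Python as follows. The function names are hypothetical, and the check that `start` precedes `stop` is a client-side sanity check added for illustration, not something the API is documented to require.

```python
from datetime import date


def build_export_payload(start, stop, target, query, topics):
    """Assemble the JSON body of an export task from ISO dates (YYYY-MM-DD)."""
    # Client-side sanity check (an addition for this sketch): catch swapped
    # or mistyped dates before the task is submitted.
    if date.fromisoformat(start) >= date.fromisoformat(stop):
        raise ValueError(f"start {start} must be earlier than stop {stop}")
    return {
        "start": start,
        "stop": stop,
        "target": target,        # ID of the empty collector created earlier
        "query": query,          # additional query rules
        "topics": list(topics),  # topic IDs within the project
    }


def extract_task_id(response_body):
    """Return the id of the first task listed in a parsed export response."""
    tasks = response_body.get("result_tasks", {}).get("tasks", [])
    if not tasks:
        raise ValueError("response contains no task")
    return tasks[0]["id"]
```

Serialising the payload with `json.dumps` gives the body for the POST request, and `extract_task_id` applied to the parsed response yields the ID to keep for the status query.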

Retrieve task status

Before reading the export task results, you need to wait for the task to finish.

Command
curl -L -X GET 'https://api.talkwalker.com/api/v3/tasks/export/<task_id>?access_token=<access_token>'
Response
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v3/tasks/export/b870dfc9-2b97-41d2-924a-c6fcb0494562?access_token=***",
"result_tasks": {
"tasks": [
{
"creation_date": "2023-01-20T15:56:08.949Z",
"type": "export",
"id": "b870dfc9-2b97-41d2-924a-c6fcb0494562",
"processed": 10684,
"progress": 1.0,
"target": "export_collector",
"status": "finished"
}
]
},
"request_id": "#roske9em4pj2#"
}
  • id is the task ID.
  • processed is the number of documents which were exported.
  • progress is the fraction of the export task that has completed; a value of 1 means 100%.
  • status is the state of the export task, which can take the following values: UNKNOWN, QUEUED, RUNNING, FINISHED, FAILED, DELETED, ABORTED, RESULT_LIMIT_REACHED. The sample responses on this page show the value in lowercase (for example, finished).
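
Waiting for the task usually means polling the status endpoint until the task stops changing. The sketch below shows one way to do that. `fetch_status` is a hypothetical caller-supplied function that performs the GET request and returns the parsed JSON body. Treating FINISHED, FAILED, DELETED, ABORTED and RESULT_LIMIT_REACHED as final states is an assumption based on their names, and the polling interval is arbitrary.

```python
import time

# Assumption: these statuses are final; UNKNOWN, QUEUED and RUNNING are not.
TERMINAL_STATUSES = {"FINISHED", "FAILED", "DELETED", "ABORTED", "RESULT_LIMIT_REACHED"}


def task_from_response(response_body, task_id):
    """Find the task with the given id in a parsed status response."""
    for task in response_body.get("result_tasks", {}).get("tasks", []):
        if task.get("id") == task_id:
            return task
    raise LookupError(f"task {task_id} not found in response")


def wait_for_task(fetch_status, task_id, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll until the task reaches a terminal status, then return the task dict."""
    for _ in range(max_polls):
        task = task_from_response(fetch_status(task_id), task_id)
        # Compare case-insensitively: sample responses use lowercase values.
        status = str(task.get("status", "UNKNOWN")).upper()
        if status in TERMINAL_STATUSES:
            return task
        sleep(poll_seconds)
    raise TimeoutError(f"task {task_id} still not finished after {max_polls} polls")
```

The caller should still inspect the returned status, since a task that ends as FAILED or ABORTED also stops the loop.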

Read export task result

To retrieve the results, you can read the target collector as a queue.

Command
curl -L -X GET 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>/results?access_token=<access_token>&resume_offset=earliest&end_behaviour=stop'
  • The parameter resume_offset is set to earliest so that reading starts from the beginning of the collector.
  • The parameter end_behaviour is set to stop so that the stream connection closes once the last document in the queue has been read.
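
A minimal Python sketch of reading the collector is shown below. It assumes the stream delivers one JSON object per line; that framing is an assumption and should be checked against the streaming API format. The helper names are hypothetical.

```python
import json
import urllib.parse

API_BASE = "https://api.talkwalker.com/api/v3"


def build_results_url(collector_id, access_token,
                      resume_offset="earliest", end_behaviour="stop"):
    """Build the URL that reads a collector from its start and stops at the end."""
    query = urllib.parse.urlencode({
        "access_token": access_token,
        "resume_offset": resume_offset,
        "end_behaviour": end_behaviour,
    })
    path_id = urllib.parse.quote(collector_id, safe="")
    return f"{API_BASE}/stream/c/{path_id}/results?{query}"


def iter_json_lines(lines):
    """Yield one parsed object per non-empty line (assumes line-delimited JSON)."""
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        line = raw.strip()
        if line:
            yield json.loads(line)


# Possible usage (performs a real HTTP request, so it is left commented out):
# import urllib.request
# with urllib.request.urlopen(build_results_url("<collector_id>", "<access_token>")) as resp:
#     for chunk in iter_json_lines(resp):
#         print(chunk)
```

Because end_behaviour is set to stop, the loop in the usage comment ends when the server closes the connection after the last document.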