Export from a project
Create an empty collector to receive export results
The result of an export task follows the same structure as the streaming API. To push the results to a persisted queue, first create a collector without rules or filters.
Command
curl -L -X PUT 'https://api.talkwalker.com/api/v3/stream/c/<collector id>?access_token=<access_token>'
Request
{}
The collector ID is used as the target in the export task definition.
Response
{
"status_code": "0",
"status_message": "OK",
"request": "PUT /api/v3/stream/c/export_collector?access_token=***",
"result_stream": {
"collectors": [
{
"collector_id": "export_collector",
"state": "PAUSED"
}
]
},
"request_id": "#rosk74h74rno#"
}
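As a minimal sketch, the same call can be prepared from Python with only the standard library. The helper name build_create_collector_request is hypothetical, and the Content-Type header is an assumption that does not appear in the curl command above. The request is only built here; nothing is sent until it is passed to urllib.request.urlopen.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.talkwalker.com/api/v3"


def build_create_collector_request(collector_id, access_token):
    """Build (but do not send) the PUT request that creates an empty collector."""
    query = urllib.parse.urlencode({"access_token": access_token})
    path_id = urllib.parse.quote(collector_id, safe="")
    url = f"{API_BASE}/stream/c/{path_id}?{query}"
    # An empty JSON object: the collector has no rules and no filters.
    body = json.dumps({}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        # Assumption: the API expects a JSON content type for the body.
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_create_collector_request("export_collector", "<access_token>")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```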
Create an export task
In this sample, we export documents from a project and a topic between 1 April 2022 and 30 April 2022. An additional query rule restricts the export to documents from Reddit and Twitter.
Command
curl -L -X POST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>'
Request
{
"start": "2022-04-01",
"stop": "2022-05-01",
"target": "<collector_id>",
"query": "domainurl:reddit.com OR sourcetype:SOCIALMEDIA_TWITTER",
"topics": ["<topic_id>"]
}
Response
{
"status_code": "0",
"status_message": "OK",
"request": "POST /api/v3/stream/p/a2ce1ade-a513-413f-868c-fafc25ba9a23/export?access_token=***",
"result_tasks": {
"tasks": [
{
"creation_date": "2023-01-20T15:56:09.073Z",
"type": "export",
"id": "b870dfc9-2b97-41d2-924a-c6fcb0494562",
"processed": 0,
"progress": 0.0,
"target": "export_collector",
"status": "queued"
}
]
},
"request_id": "#rosk9kt3kfq0#"
}
note
The id field is the task ID. Save it so that you can query the task status later.
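A short sketch of how the task ID could be pulled out of the response shown above. The function name extract_task_id is hypothetical, and treating any status_code other than "0" as a failure is an assumption based only on the sample responses.

```python
import json

# Trimmed copy of the sample response above.
SAMPLE_RESPONSE = """
{
  "status_code": "0",
  "status_message": "OK",
  "result_tasks": {
    "tasks": [
      {
        "type": "export",
        "id": "b870dfc9-2b97-41d2-924a-c6fcb0494562",
        "target": "export_collector",
        "status": "queued"
      }
    ]
  }
}
"""


def extract_task_id(response):
    """Return the id of the first task in a decoded export response."""
    # Assumption: "0" is the only status_code that signals success.
    if response.get("status_code") != "0":
        raise RuntimeError(f"export request failed: {response.get('status_message')}")
    tasks = response.get("result_tasks", {}).get("tasks", [])
    if not tasks:
        raise ValueError("response contains no task")
    return tasks[0]["id"]


if __name__ == "__main__":
    print(extract_task_id(json.loads(SAMPLE_RESPONSE)))
```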
Retrieve task status
Before reading the export task result, wait for the task to finish.
Command
curl -L -X GET 'https://api.talkwalker.com/api/v3/tasks/export/<task_id>?access_token=<access_token>'
Response
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v3/tasks/export/b870dfc9-2b97-41d2-924a-c6fcb0494562?access_token=***",
"result_tasks": {
"tasks": [
{
"creation_date": "2023-01-20T15:56:08.949Z",
"type": "export",
"id": "b870dfc9-2b97-41d2-924a-c6fcb0494562",
"processed": 10684,
"progress": 1.0,
"target": "export_collector",
"status": "finished"
}
]
},
"request_id": "#roske9em4pj2#"
}
- id is the task ID.
- processed is the number of documents that were exported.
- progress is the progress of the export task as a fraction; the value 1 means 100%.
- status is the state of the export task, which can take the following values: UNKNOWN, QUEUED, RUNNING, FINISHED, FAILED, DELETED, ABORTED, RESULT_LIMIT_REACHED.
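Waiting for the task usually means polling the status endpoint. The loop below is a sketch under stated assumptions: fetch_task is a hypothetical callable that performs the GET request above and returns the decoded JSON, the polling interval is arbitrary, and every status other than QUEUED and RUNNING is treated as final.

```python
import time

# Assumption: only these two statuses mean the task is still in progress.
IN_PROGRESS = {"QUEUED", "RUNNING"}


def wait_for_task(fetch_task, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll until the task leaves QUEUED/RUNNING, then return the task dict.

    fetch_task: hypothetical callable returning the decoded status response.
    sleep: injectable so the loop can be exercised without real waiting.
    """
    for _ in range(max_polls):
        response = fetch_task()
        task = response["result_tasks"]["tasks"][0]
        # The sample responses use lowercase values, so normalise the case.
        status = str(task.get("status", "")).upper()
        if status not in IN_PROGRESS:
            return task
        sleep(poll_seconds)
    raise TimeoutError("export task did not finish within the polling budget")
```

The caller should still check the returned status, since a task that ends as FAILED or ABORTED also stops the loop.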
Read export task result
To retrieve the results, read the target collector as a queue.
Command
curl -L -X GET 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>/results?access_token=<access_token>&resume_offset=earliest&end_behaviour=stop'
- The parameter resume_offset is set to earliest to start reading from the beginning of the collector.
- The parameter end_behaviour is set to stop to close the stream connection once the last document in the queue has been read.
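The read step can be sketched in Python as follows. The helper names are hypothetical, and the parser assumes the stream delivers one JSON object per line; that framing is not stated in this section, so check it against the streaming API description before relying on it.

```python
import json
import urllib.parse

API_BASE = "https://api.talkwalker.com/api/v3"


def build_results_url(collector_id, access_token):
    """Build the results URL with the two parameters described above."""
    params = urllib.parse.urlencode({
        "access_token": access_token,
        "resume_offset": "earliest",  # read from the beginning of the collector
        "end_behaviour": "stop",      # close the connection after the last document
    })
    path_id = urllib.parse.quote(collector_id, safe="")
    return f"{API_BASE}/stream/c/{path_id}/results?{params}"


def iter_json_lines(lines):
    """Yield one decoded object per non-empty line.

    Assumption: the stream is newline-delimited JSON.
    """
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if raw:
            yield json.loads(raw)


# Possible usage (performs a real HTTP request, so it is left commented out):
# import urllib.request
# with urllib.request.urlopen(build_results_url("export_collector", "<access_token>")) as resp:
#     for item in iter_json_lines(resp):
#         print(item)
```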