Export from a project
Create an empty collector to receive export results
The result of the export task follows the same structure as the streaming API. To push the result to a persisted queue, create a collector without rules or filters.
Command
curl -L -X PUT 'https://api.talkwalker.com/api/v3/stream/c/<collector id>?access_token=<access_token>'
Request
{}
The collector ID is used as the target in the export task definition.
Response
{
"status_code": "0",
"status_message": "OK",
"request": "PUT /api/v3/stream/c/export_collector?access_token=***",
"result_stream": {
"collectors": [
{
"collector_id": "export_collector",
"state": "PAUSED"
}
]
},
"request_id": "#rosk74h74rno#"
}
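The same call can be made from a script. The following is a minimal sketch using only the Python standard library; the helper names (collector_url, create_empty_collector) are hypothetical, and the Content-Type header is an assumption, since the curl command above does not show any headers.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.talkwalker.com/api/v3"


def collector_url(collector_id, access_token):
    """Build the collector URL used by the curl command above."""
    query = urllib.parse.urlencode({"access_token": access_token})
    return f"{API_BASE}/stream/c/{urllib.parse.quote(collector_id, safe='')}?{query}"


def create_empty_collector(collector_id, access_token):
    """PUT an empty JSON object so the collector has no rules or filters.

    Performs a network call and returns the decoded JSON response.
    """
    request = urllib.request.Request(
        collector_url(collector_id, access_token),
        data=json.dumps({}).encode("utf-8"),
        # Assumption: the API expects a JSON content type for the body.
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling create_empty_collector("export_collector", "<access_token>") should return a dictionary shaped like the response above, with the new collector listed under result_stream.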
Create an export task
In this sample, we export documents from a project and a topic between the 1st of April 2022 and the 30th of April 2022. Additional query rules restrict the export to documents from Reddit and Twitter.
Command
curl -L -X POST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>'
Request
{
"start": "2022-04-01",
"stop": "2022-05-01",
"target": "<collector_id>",
"query": "domainurl:reddit.com OR sourcetype:SOCIALMEDIA_TWITTER",
"topics": ["<topic_id>"]
}
Response
{
"status_code": "0",
"status_message": "OK",
"request": "POST /api/v3/stream/p/a2ce1ade-a513-413f-868c-fafc25ba9a23/export?access_token=***",
"result_tasks": {
"tasks": [
{
"creation_date": "2023-01-20T15:56:09.073Z",
"type": "export",
"id": "b870dfc9-2b97-41d2-924a-c6fcb0494562",
"processed": 0,
"progress": 0.0,
"target": "export_collector",
"status": "queued"
}
]
},
"request_id": "#rosk9kt3kfq0#"
}
Note: the id field is the task ID. Save it so that you can query the task status.
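As an illustration of the request and response above, here is a hedged Python sketch that assembles the export body and pulls the task ID out of the response. The function names are hypothetical, treating query and topics as optional is an assumption, and the Content-Type header is not shown in the original curl command.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.talkwalker.com/api/v3"


def build_export_body(start, stop, target, query=None, topics=None):
    """Assemble the JSON body of the export request.

    Assumption: query and topics are optional, so they are omitted when unset.
    """
    body = {"start": start, "stop": stop, "target": target}
    if query:
        body["query"] = query
    if topics:
        body["topics"] = list(topics)
    return body


def extract_task_id(response):
    """Return the id of the first task in a response shaped like the sample above."""
    return response["result_tasks"]["tasks"][0]["id"]


def create_export_task(project_id, access_token, body):
    """POST the export definition and return the new task ID (network call)."""
    url = (
        f"{API_BASE}/stream/p/{urllib.parse.quote(project_id, safe='')}/export?"
        + urllib.parse.urlencode({"access_token": access_token})
    )
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the API expects a JSON content type for the body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return extract_task_id(json.load(response))
```

Keeping build_export_body and extract_task_id free of network access makes them easy to check against the sample payloads before sending a real request.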
Retrieve task status
Before reading the export task result, you need to wait for the task to finish.
Command
curl -L -X GET 'https://api.talkwalker.com/api/v3/tasks/export/<task_id>?access_token=<access_token>'
Response
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v3/tasks/export/b870dfc9-2b97-41d2-924a-c6fcb0494562?access_token=***",
"result_tasks": {
"tasks": [
{
"creation_date": "2023-01-20T15:56:08.949Z",
"type": "export",
"id": "b870dfc9-2b97-41d2-924a-c6fcb0494562",
"processed": 10684,
"progress": 1.0,
"target": "export_collector",
"status": "finished"
}
]
},
"request_id": "#roske9em4pj2#"
}
- id is the task ID.
- processed is the number of documents that were exported.
- progress is the progress of the export task as a fraction; the value 1 means 100%.
- status is the state of the export task, which can take the following values: UNKNOWN, QUEUED, RUNNING, FINISHED, FAILED, DELETED, ABORTED, RESULT_LIMIT_REACHED.
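Waiting for the task can be automated with a simple polling loop. The sketch below is one possible approach, not an official client: the function names and the polling interval are hypothetical, and treating FINISHED, FAILED, DELETED, ABORTED, and RESULT_LIMIT_REACHED as terminal states is an assumption. The comparison is case-insensitive because the sample response reports the status in lowercase.

```python
import json
import time
import urllib.parse
import urllib.request

API_BASE = "https://api.talkwalker.com/api/v3"

# Assumption: these states mean the task will not make further progress.
TERMINAL_STATES = {"FINISHED", "FAILED", "DELETED", "ABORTED", "RESULT_LIMIT_REACHED"}


def fetch_task(task_id, access_token):
    """GET the task status and return the first task entry (network call)."""
    url = (
        f"{API_BASE}/tasks/export/{urllib.parse.quote(task_id, safe='')}?"
        + urllib.parse.urlencode({"access_token": access_token})
    )
    with urllib.request.urlopen(url) as response:
        return json.load(response)["result_tasks"]["tasks"][0]


def wait_for_task(fetch, poll_seconds=10.0, sleep=time.sleep):
    """Call fetch() repeatedly until the task reaches a terminal state.

    fetch is any zero-argument callable returning a task dictionary;
    sleep is injectable so the loop can be exercised without real delays.
    """
    while True:
        task = fetch()
        if task["status"].upper() in TERMINAL_STATES:
            return task
        sleep(poll_seconds)
```

A typical call would be wait_for_task(lambda: fetch_task("<task_id>", "<access_token>")), after which the caller checks whether the returned status is finished before reading the collector.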
Read export task result
To retrieve the results, you can read the target collector as a queue.
Command
curl -L -X GET 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>/results?access_token=<access_token>&resume_offset=earliest&end_behaviour=stop'
- The parameter resume_offset is set to earliest to start reading from the beginning of the collector.
- The parameter end_behaviour is set to stop to close the stream connection once you have read the last document in the queue.
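Reading the collector from a script could look like the following sketch. It assumes the stream delivers one JSON object per line, which this page does not state explicitly; the function names are hypothetical, and only the URL and its two parameters come from the command above.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.talkwalker.com/api/v3"


def results_url(collector_id, access_token):
    """Build the results URL with the two parameters described above."""
    params = urllib.parse.urlencode({
        "access_token": access_token,
        "resume_offset": "earliest",  # start from the beginning of the collector
        "end_behaviour": "stop",      # close the connection after the last document
    })
    return f"{API_BASE}/stream/c/{urllib.parse.quote(collector_id, safe='')}/results?{params}"


def iter_json_lines(lines):
    """Yield one decoded JSON object per non-empty line.

    Assumption: the stream is newline-delimited JSON.
    """
    for raw in lines:
        text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        text = text.strip()
        if text:
            yield json.loads(text)


def read_collector(collector_id, access_token):
    """Stream the collector content until the server closes the connection (network call)."""
    with urllib.request.urlopen(results_url(collector_id, access_token)) as response:
        yield from iter_json_lines(response)
```

Because read_collector is a generator, documents can be processed one at a time with a for loop instead of loading the whole export into memory.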