
Frequently asked questions

How to get all new documents from a Talkwalker project?

The following command accesses data from a Talkwalker project. The data is delivered in real time, and the rules defined in the given Talkwalker project are applied to it.

curl -XGET 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/result?access_token=<access_token>'

Using a POST request instead of a GET request allows for additional filtering of the results, for example by setting a query.

curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/result?access_token=<access_token>' \
  -d '{"q":"<query>"}' \
  -H "Content-Type: application/json; charset=UTF-8"
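
As a rough illustration of the difference between the two calls, the following Python sketch builds (but does not send) the same requests using only the standard library. The function name build_project_result_request and the sample project ID, token and query are hypothetical and not part of the Talkwalker API.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.talkwalker.com/api/v3/stream"


def build_project_result_request(project_id, access_token, query=None):
    """Build, without sending, a request for the project result endpoint.

    Without a query this is a plain GET. With a query, the body {"q": ...}
    is sent via POST, mirroring the two curl commands.
    """
    url = "{}/p/{}/result?{}".format(
        API_BASE,
        urllib.parse.quote(project_id, safe=""),
        urllib.parse.urlencode({"access_token": access_token}),
    )
    if query is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps({"q": query}).encode("utf-8")
    headers = {"Content-Type": "application/json; charset=UTF-8"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Hypothetical sample values, for illustration only.
req = build_project_result_request("my_project", "my_token", query="-is:comment")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would perform the actual call; this is left out here so that the sketch runs without credentials.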

What does an empty collector query mean?

It is possible to leave the collector_query empty when creating a new collector:

curl -XPUT 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>?access_token=<access_token>' \
  -d '{"collector_query" : {}}' \
  -H "Content-Type: application/json; charset=UTF-8"

This collector does not collect any live data; it only serves as a target for export tasks.

How to get all documents from a Talkwalker project for a certain time period in the past?

In order to access past results from a Talkwalker project, an export task is used, and the exported results are sent to a collector. First, an empty collector is created. Empty collectors do not collect data in real time, so the newly created collector remains paused while receiving all data from the export task.

curl -XPUT 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>?access_token=<access_token>' \
  -d '{"collector_query" : {}}' \
  -H "Content-Type: application/json; charset=UTF-8"

Once the collector is created, it is used as the target of a project export task. Export tasks require a start date or timestamp and accept an optional stop date or timestamp, both in the format yyyy-MM-dd['T'HH:mm:ss[.SSS][xxx]].

curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>' \
  -d '{"start":"<start>", "stop":"<stop>", "target":"<collector_id>"}' \
  -H "Content-Type: application/json; charset=UTF-8"

An example can be found in the Streaming API documentation.
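
For illustration, the body of such an export task could be assembled from Python datetime values as sketched below. The helper export_task_body and the collector ID my-collector are hypothetical; the check that stop lies after start is a sanity check added for this sketch, not a documented API rule.

```python
import json
from datetime import datetime, timezone


def export_task_body(target, start, stop=None):
    """Return the JSON body of an export task as a string.

    start and stop are timezone-aware datetimes. isoformat() renders them as
    yyyy-MM-ddTHH:mm:ss+HH:MM, one of the forms allowed by the date pattern.
    """
    if start.tzinfo is None or (stop is not None and stop.tzinfo is None):
        raise ValueError("use timezone-aware datetimes")
    if stop is not None and stop <= start:
        # Sanity check of this sketch only (assumption, not an API rule).
        raise ValueError("stop must be after start")
    body = {"start": start.isoformat(timespec="seconds"), "target": target}
    if stop is not None:
        body["stop"] = stop.isoformat(timespec="seconds")
    return json.dumps(body)


# Hypothetical collector ID and dates, for illustration only.
example_body = export_task_body(
    "my-collector",
    datetime(2018, 1, 1, tzinfo=timezone.utc),
    datetime(2018, 2, 1, tzinfo=timezone.utc),
)
print(example_body)
```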

How to get past and live data for my project?

One collector can collect both past and live data by combining two approaches.

First, a new collector is created for the project in question:

curl -XPUT 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>?access_token=<access_token>' \
  -d '{"collector_query" : {"project" : <project_id>}}' \
  -H "Content-Type: application/json; charset=UTF-8"

This collector, from the moment of its creation, collects live data. Thus, the only part missing is the past data.

In order to access the past data, an export task is needed, which is given the newly created collector as its target. The start value uses the format yyyy-MM-dd['T'HH:mm:ss[.SSS][xxx]].

curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>' \
  -d '{"start":"<start>", "target":"<collector_id>"}' \
  -H "Content-Type: application/json; charset=UTF-8"

What to do when out of credits during an export task?

If all credits are consumed during an export task, the task is interrupted. All data exported up to that point is sent to the target collector, so the amount of exported data always matches the amount of consumed credits.

How to get an estimation for the number of results of an export task?

There is no means for estimating the number of results beforehand.

However, the cost of the full export task can be estimated by subdividing the complete time period into multiple chunks. Exporting a fraction of the whole first gives a rough picture of the size of the whole. Similarly, one can go chunk by chunk through a larger time frame and adapt the estimation after each finished task.

Imagine an export task that is supposed to export result data for a period of two years:

curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>' \
  -d '{"start":"2016-09-01", "stop":"2018-09-01","target":"<collector_id>"}' \
  -H "Content-Type: application/json; charset=UTF-8"

This can also be subdivided into 24 tasks of one month each, starting with:

curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>' \
  -d '{"start":"2016-09-01", "stop":"2016-10-01","target":"<collector_id>"}' \
  -H "Content-Type: application/json; charset=UTF-8"

Still, the resulting estimation is never guaranteed to fully reflect the real outcome. Spikes in the data, for example, can't be estimated this way.
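
The chunk-by-chunk approach can be sketched in Python as follows. The helpers monthly_chunks and estimate_total, the collector ID and the result counts are hypothetical; the extrapolation is a plain average over the chunks exported so far, which is only one possible way to adapt the estimate.

```python
from datetime import date


def monthly_chunks(start, stop):
    """Yield (chunk_start, chunk_stop) pairs of at most one calendar month."""
    current = start
    while current < stop:
        # First day of the month following the current chunk start.
        if current.month == 12:
            following = date(current.year + 1, 1, 1)
        else:
            following = date(current.year, current.month + 1, 1)
        chunk_stop = min(following, stop)
        yield current, chunk_stop
        current = chunk_stop


def estimate_total(finished_counts, total_chunks):
    """Extrapolate the total from the average of the chunks exported so far."""
    if not finished_counts:
        raise ValueError("at least one finished chunk is needed")
    return round(sum(finished_counts) / len(finished_counts) * total_chunks)


# One export task body per month of the two-year period.
bodies = [
    {"start": s.isoformat(), "stop": e.isoformat(), "target": "my-collector"}
    for s, e in monthly_chunks(date(2016, 9, 1), date(2018, 9, 1))
]
print(len(bodies), bodies[0])

# Made-up result counts of the first two finished chunks.
print(estimate_total([1200, 900], len(bodies)))
```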

Is it possible to get results for multiple projects with one collector?

No, this is not possible. In order to receive results for multiple projects, one collector per project is needed.

How to limit the number of results of an export task?

When creating an export task, it is possible to add the limit parameter to the body:

curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>' \
  -d '{"start":"2016-09-01","target":"<collector_id>", "limit":1000}' \
  -H "Content-Type: application/json; charset=UTF-8"

If there are 1000 results or fewer, the export task succeeds. Otherwise, it fails with status result_limit_reached; the first 1000 results are written to the collector and 1000 credits are consumed.

How to get the results of a stream from a certain point onward?

When creating a collector, a stream can be connected to it. All data found by that stream from this point on is collected by the collector.

curl -XPUT 'https://api.talkwalker.com/api/v3/stream/s/<stream_id>?access_token=<access_token>' \
  -d '{"rules" : [{"rule_id" : "<some_rule_id>", "query":"<some_query>"}]}' \
  -H "Content-Type: application/json; charset=UTF-8"

curl -XPUT 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>?access_token=<access_token>' \
  -d '{"collector_query" : {"streams" : ["<stream_id>"]}}' \
  -H "Content-Type: application/json; charset=UTF-8"

When displaying the results, there are two types of chunks: CT_RESULT, containing the matching documents, and CT_CONTROL, containing control information.

{
  "chunk_type": "CT_CONTROL",
  "chunk_control": {
    "connection_id": "#pch41wmpsxsh#",
    "resume_offset": "<resume_token>",
    "collector_id": "<collector_id>"
  }
}

Among other fields, each CT_CONTROL chunk contains an offset value, "resume_offset", which can be used as a request parameter. The result access then starts at the beginning of the slice with that resume_token and returns all results from that point on.

curl -XGET 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>/results?access_token=<access_token>&resume_offset=<resume_token>'
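
A client that wants to resume after a disconnect has to remember the most recent resume_offset it has seen. The Python sketch below shows one way to do this, assuming the chunks arrive as one JSON object per line (an assumption of this sketch). The helpers last_resume_offset and resume_url and the sample tokens are hypothetical.

```python
import json
from urllib.parse import urlencode


def last_resume_offset(lines):
    """Return the resume_offset of the last CT_CONTROL chunk seen, or None."""
    offset = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        chunk = json.loads(line)
        if chunk.get("chunk_type") == "CT_CONTROL":
            control = chunk.get("chunk_control", {})
            # Keep the previous offset if this chunk carries none.
            offset = control.get("resume_offset", offset)
    return offset


def resume_url(collector_id, access_token, resume_offset):
    """Build the collector results URL that continues from resume_offset."""
    params = urlencode({"access_token": access_token, "resume_offset": resume_offset})
    return "https://api.talkwalker.com/api/v3/stream/c/{}/results?{}".format(
        collector_id, params
    )


# Made-up chunks standing in for a previously read response.
sample_lines = [
    '{"chunk_type": "CT_CONTROL", "chunk_control": {"resume_offset": "tok1"}}',
    '{"chunk_type": "CT_RESULT", "chunk_result": {}}',
    '{"chunk_type": "CT_CONTROL", "chunk_control": {"resume_offset": "tok2"}}',
]
offset = last_resume_offset(sample_lines)
print(resume_url("my-collector", "my_token", offset))
```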

How to get only certain topics from a project?

The following command creates a collector that returns, in real time, all documents from a list of topics of a Talkwalker project.

curl -XPUT 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>?access_token=<access_token>' \
  -d '{"collector_query" : {"project_topics" : {"project": <project_id>, "topics":["<topic_id>"]}}}' \
  -H "Content-Type: application/json; charset=UTF-8"

Topics that were removed from the project, but included in the request, are not taken into consideration.

How to get the IDs of Talkwalker topics?

To get a list of the topics defined in a Talkwalker project, call the https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/resources endpoint with the project_id and the access_token. Optionally, set the filter type=search to obtain only search topics:

curl -XGET 'https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/resources?access_token=<access_token>&type=search'

The result, using the above filter, has the form:

{
  "status_code": "0",
  "status_message": "OK",
  "request": "GET /api/v2/talkwalker/p/<project_id>/resources?access_token=<access_token>&type=search",
  "result_resources": {
    "projects": [
      {
        "id": "<project_id>",
        "title": "Air France",
        "topics": [
          {
            "id": "2p1nevfo_121244b12ade",
            "title": "Category 1",
            "nodes": [
              {
                "id": "l9gb1vj7_9utd4cawszq7",
                "title": "topic 1"
              },
              {
                "id": "g8wf5sd4_8svs0cfghje8",
                "title": "topic 2"
              }
            ]
          },
          {
            "id": "kj241kj4_h214jhv21l2a",
            "title": "Category 2",
            "nodes": [
              {
                "id": "w6fc8sf4_4fds6hdgsjd1",
                "title": "topic 1"
              }
            ]
          }
        ]
      }
    ]
  }
}

To get results for all projects in 'search', use search as the topic ID. To use a single topic, use the ID of that topic (for example, w6fc8sf4_4fds6hdgsjd1 for topic 1 of category 2).
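
Picking the topic IDs out of such a response by hand is error-prone, so a small helper can flatten it. The Python sketch below assumes the response has already been parsed into a dictionary shaped like the sample; the function name list_topics is hypothetical and the sample data is a shortened copy of that response.

```python
def list_topics(response):
    """Return (category_title, topic_title, topic_id) for every topic node."""
    topics = []
    for project in response.get("result_resources", {}).get("projects", []):
        for category in project.get("topics", []):
            for node in category.get("nodes", []):
                topics.append((category["title"], node["title"], node["id"]))
    return topics


# Shortened copy of the sample response.
sample_response = {
    "status_code": "0",
    "result_resources": {
        "projects": [
            {
                "id": "<project_id>",
                "topics": [
                    {
                        "id": "2p1nevfo_121244b12ade",
                        "title": "Category 1",
                        "nodes": [
                            {"id": "l9gb1vj7_9utd4cawszq7", "title": "topic 1"},
                            {"id": "g8wf5sd4_8svs0cfghje8", "title": "topic 2"},
                        ],
                    },
                    {
                        "id": "kj241kj4_h214jhv21l2a",
                        "title": "Category 2",
                        "nodes": [
                            {"id": "w6fc8sf4_4fds6hdgsjd1", "title": "topic 1"},
                        ],
                    },
                ],
            }
        ]
    },
}

for category_title, topic_title, topic_id in list_topics(sample_response):
    print(category_title, topic_title, topic_id)
```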

How to eliminate comments from a stream?

To remove comments and retrieve only the original documents, add -is:comment to the rules of a stream.

curl -XPUT 'https://api.talkwalker.com/api/v3/stream/s/<stream_id>?access_token=<access_token>' \
  -d '{"rules" : [{"rule_id" : "<rule_id>", "query":"-is:comment"}]}' \
  -H "Content-Type: application/json; charset=UTF-8"

Alternatively, use -is:comment as a query parameter when reading results from a stream.

curl -XGET 'https://api.talkwalker.com/api/v3/stream/s/<stream_id>/results?access_token=<access_token>&q=-is:comment'

How to get past documents of a Talkwalker project that include special keywords?

First, an empty collector is created.

curl -XPUT 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>?access_token=<access_token>' \
  -d '{"collector_query" : {}}' \
  -H "Content-Type: application/json; charset=UTF-8"

This collector is set as the target of a project export task.

curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>' \
  -d '{"start":"<date>", "stop":"<date>","target":"<collector_id>", "query":"keyword-1 AND keyword-2"}' \
  -H "Content-Type: application/json; charset=UTF-8"

The above export task sends to the collector all documents that were published in the given time frame and that match the query. The query in this example requires documents to include both keywords.

How to use a single stream for multiple applications / clients?

In order to use a single stream to retrieve data for more than one application / client, we set one separate rule per application.

curl -XPUT 'https://api.talkwalker.com/api/v3/stream/s/<stream_id>?access_token=<access_token>' \
  -d '{"rules":[{"rule_id" : "rule-app-1", "query" : "<query>"},{"rule_id" : "rule-app-2", "query" : "<query>"}]}' \
  -H "Content-Type: application/json; charset=UTF-8"

After creating this stream, we receive in real time the results that match either of the two queries.

curl -XGET 'https://api.talkwalker.com/api/v3/stream/s/<stream_id>/results?access_token=<access_token>'

The returned results are in the format below. The documents can be separated using the matched field, indicating which rule the result belongs to and thus, which application is concerned.

{
  "chunk_type": "CT_RESULT",
  "chunk_result": {
    "data": {
      "data": {}, // <default result data (see simple search)>
      "highlighted_data": [
        {
          "matched": {
            "rule_id": "rule-app-1",
            "stream_id": "<stream_id>"
          },
          "title_snippet": "<title_snippet_for_rule>",
          "content_snippet": "<content_snippet_for_rule>"
        }
      ]
    }
  }
}
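
On the client side, the separation by matched rule could look like the Python sketch below. It assumes each chunk has already been parsed into a dictionary of the shape shown for CT_RESULT chunks; the function name route_by_rule and the sample documents are hypothetical. A document that matches several rules is handed to each of the corresponding applications.

```python
from collections import defaultdict


def route_by_rule(chunks):
    """Group the result data of CT_RESULT chunks by the rule_id that matched."""
    routed = defaultdict(list)
    for chunk in chunks:
        if chunk.get("chunk_type") != "CT_RESULT":
            continue  # skip CT_CONTROL and any other chunk types
        result = chunk["chunk_result"]["data"]
        for hit in result.get("highlighted_data", []):
            routed[hit["matched"]["rule_id"]].append(result["data"])
    return dict(routed)


def make_result(doc, rule_ids):
    """Build a made-up CT_RESULT chunk for the demonstration below."""
    hits = [{"matched": {"rule_id": r, "stream_id": "s1"}} for r in rule_ids]
    return {
        "chunk_type": "CT_RESULT",
        "chunk_result": {"data": {"data": doc, "highlighted_data": hits}},
    }


sample_chunks = [
    make_result({"url": "doc-a"}, ["rule-app-1"]),
    {"chunk_type": "CT_CONTROL", "chunk_control": {"resume_offset": "tok"}},
    make_result({"url": "doc-b"}, ["rule-app-1", "rule-app-2"]),
]
by_rule = route_by_rule(sample_chunks)
for rule_id, docs in by_rule.items():
    print(rule_id, [d["url"] for d in docs])
```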

How to get the number of results grouped by media types?

The Talkwalker API provides only documents and histograms. To group results into custom sets, you have to get all the results and then compute those sets locally. Alternatively, you can perform separate searches (or histograms) for each of the groups you want to create, using the Talkwalker query syntax to restrict the results to those matching a single group.
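
Computing the sets locally can be as simple as counting over the retrieved documents, as in the Python sketch below. The function count_by_field is hypothetical, and the field name source_type as well as the sample values are assumptions made for illustration; check the result data of your own project for the field that carries the media type.

```python
from collections import Counter


def count_by_field(documents, field):
    """Count documents per value of the given field.

    The field may hold a single value or a list of values. A document with
    several values is counted once for each of them.
    """
    counts = Counter()
    for doc in documents:
        value = doc.get(field)
        values = value if isinstance(value, list) else [value]
        for item in values:
            counts[item if item is not None else "UNKNOWN"] += 1
    return counts


# Made-up documents; "source_type" is an assumed field name.
sample_docs = [
    {"url": "doc-a", "source_type": ["BLOG"]},
    {"url": "doc-b", "source_type": ["ONLINENEWS"]},
    {"url": "doc-c", "source_type": ["BLOG"]},
    {"url": "doc-d"},
]
media_counts = count_by_field(sample_docs, "source_type")
for media_type, number in media_counts.most_common():
    print(media_type, number)
```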

How to use the Histogram API to reproduce topic widgets?

See the Guide to using Histogram API to reproduce topic widgets.

How to add and manage custom tags on the documents?

See the Guide to managing custom tags by API.

How to add and manage custom metrics?

See the Guide to using custom metrics by API.

How to upload documents in the project?

See the Guide to uploading the documents via API.

How to use the image detection API?

The image detection API allows you to apply Talkwalker AI to a single picture. See the Guide to using the Talkwalker image API.