How to monitor live data based on keywords.
Use-case description
You want to display, in real time, new mentions and articles retrieved by Talkwalker based on specific keywords and filter rules you defined. In this use-case, your system waits for articles and mentions pushed by Talkwalker so that they can be displayed directly.
Social media monitoring is not applicable to this use-case, because social media content is collected in the context of pages monitored inside your Talkwalker project. To collect this data, monitor it from your projects: see the section How to monitor your Talkwalker’s project.
Create/update the stream
This stream monitors, in real time, new articles crawled by Talkwalker in the Talkwalker global DB (no social media content), based on keywords or query filters.
More about query syntax.
To create a stream you need to provide:
- A stream ID (defined by the consumer).
- One or multiple query filters (called rules in the Streaming API).
If the stream ID already exists, the request will override the rules.
curl -X PUT 'https://api.talkwalker.com/api/v3/stream/s/<stream_id>?access_token=<access_token>' \
--header 'Content-Type: application/json' \
--data-raw '{
"rules": [
{
"rule_id": "rule_1",
"query": "cats AND dogs"
},
{
"rule_id": "rule_2",
"query": "electric AND lang:en AND sentiment:negative"
}
]
}'
In this sample, all new articles which match rule_1 OR rule_2 will be pushed to the stream.
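The same request can be prepared from Python. The following is a minimal sketch using only the standard library; the helper name build_stream_request and the choice to pass rules as a dictionary are illustrative assumptions, not part of the Talkwalker API. The function only builds the request, it does not send it.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl sample above.
API_BASE = "https://api.talkwalker.com/api/v3/stream/s"


def build_stream_request(stream_id, access_token, rules):
    """Build (but do not send) the PUT request that creates or overrides a stream.

    `rules` maps a rule ID to its query string (hypothetical convenience format).
    """
    url = "{}/{}?{}".format(
        API_BASE,
        urllib.parse.quote(stream_id, safe=""),
        urllib.parse.urlencode({"access_token": access_token}),
    )
    payload = {
        "rules": [
            {"rule_id": rule_id, "query": query}
            for rule_id, query in rules.items()
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_stream_request(
    "stream_1",
    "<access_token>",
    {
        "rule_1": "cats AND dogs",
        "rule_2": "electric AND lang:en AND sentiment:negative",
    },
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

Because an existing stream ID overrides the rules, calling this again with the same stream_id and a different rules dictionary replaces the previous definition.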
Listen to the stream
General use-case
Once created, you need to listen to the stream in order to receive new articles which match your keywords.
curl -X GET 'https://api.talkwalker.com/api/v3/stream/s/<stream_id>/results?access_token=<access_token>'
{
"chunk_type": "CT_CONTROL",
"chunk_control": {
"stream": [
{
"id": "stream_1",
"status": "active"
}
],
"connection_id": "#rome42r1ymtx#"
}
}
{
"chunk_type": "CT_RESULT",
"chunk_result": {
"data": {
"data": {
"url": "https://www.bbc.co.uk/news/education-64265029",
"indexed": 1673890612733,
"search_indexed": 1673890625896,
"published": 1673888410000,
"title": "Teachers from NEU union to strike in England and Wales",
"content": "Teachers from NEU union to strike in England and Wales\n\n6 minutes ago\n\nPA Media\n\nBy Vanessa Clarke\n\nEducation reporter\n\nTeach...",
"title_snippet": "Teachers from NEU union to strike in England and Wales",
"content_snippet": "Teachers from NEU union to strike in England and Wales\n\n6 minutes ago\n\nPA Media\n\nBy Vanessa Clarke\n\nEducation...",
"root_url": "https://www.bbc.co.uk/",
"domain_url": "http://bbc.co.uk/",
"host_url": "http://www.bbc.co.uk/",
"parent_url": "https://www.bbc.co.uk/news/education-64265029",
"lang": "en",
"porn_level": 0,
"fluency_level": 100,
"DEPRECATED_spam_level": 0,
"sentiment": -5,
"source_type": ["ONLINENEWS", "ONLINENEWS_TVRADIO"],
"post_type": ["TEXT"],
"tokens_title": [
"NEU union",
"and Wales",
"Teachers",
"Teachers",
"Wales",
"Wales",
"England",
"England",
"strike",
"union"
],
"tokens_content": [
"E-ACT Parkwood Academy",
"Branwen Jeffreys Matthew",
"National Education Union",
"Vanessa Clarke Education",
"Jeffreys Matthew Tyers",
"Jeffreys Matthew",
"Parkwood Academy",
"Vanessa Clarke",
"E-ACT Parkwood",
"Clarke Education"
],
"tokens_mention": ["@BBC_HaveYourSayUpload"],
"images": [
{
"url": "https://ichef.bbci.co.uk/news/976/cpsprodpb/0DAA/production/_128289430_matthew.jpg"
},
{
"url": "https://ichef.bbci.co.uk/news/976/cpsprodpb/D66A/production/_128309845_4f4a8a862119bcf8e92eba7dbf73489d0f2723f0.jpg"
}
],
"cluster_id": "http://twitter.com/FionaSimpson21/status/1615032006005358594",
"tags_internal": ["containsImage", "hasImage", "isQuestion"],
"article_extended_attributes": {
"facebook_shares": 736,
"facebook_likes": 1745,
"twitter_shares": 610,
"num_comments": 1675,
"facebook_reactions_total": 1745
},
"source_extended_attributes": {
"alexa_pageviews": 286000000,
"alexa_unique_visitors": 69417480,
"semrush_pageviews": 635249915,
"semrush_unique_visitors": 125264954
},
"extra_author_attributes": {
"id": "ex:www.bbc.co.uk-147942660",
"name": "vanessa clarke education reporter",
"gender": "FEMALE"
},
"extra_source_attributes": {
"world_data": {
"continent": "Europe",
"country": "United Kingdom",
"region": "Greater London",
"city": "London",
"longitude": -0.1153564453125,
"latitude": 51.50115966796875,
"country_code": "uk",
"resolution": "COUNTRY"
},
"id": "ex:www.bbc.co.uk",
"name": "www.bbc.co.uk"
},
"engagement": 4766,
"reach": 125264954,
"entity_url": [
{
"url": "https://www.bbc.co.uk/news/education-64238689"
},
{
"url": "http://www.bbc.co.uk/usingthebbc/privacy-policy/"
},
{
"url": "https://www.gov.uk/government/publications/handling-strike-action-in-schools",
"resolved_url": "https://www.gov.uk/government/publications/handling-strike-action-in-schools"
},
{
"url": "http://www.bbc.co.uk/usingthebbc/terms/"
},
{
"url": "https://ichef.bbci.co.uk/news/976/cpsprodpb/D66A/production/_128309845_4f4a8a862119bcf8e92eba7dbf73489d0f2723f0.jpg"
},
{
"url": "https://api.whatsapp.com/send?phone=+447756165803"
},
{
"url": "https://www.bbc.co.uk/news/uk-northern-ireland-63291611"
},
{
"url": "http://twitter.com/BBC_HaveYourSay"
},
{
"url": "https://www.bbc.co.uk/news/uk-scotland-64209973"
},
{
"url": "https://ichef.bbci.co.uk/news/976/cpsprodpb/0DAA/production/_128289430_matthew.jpg"
},
{
"url": "https://www.bbc.co.uk/send/u16904890?ptrt=https://www.bbc.co.uk/news/10725415"
}
],
"matched": {
"appearance": "UPDATED"
},
"word_count": 697,
"trending_score": 4
},
"highlighted_data": [
{
"title_snippet": "Teachers from NEU union to strike in England and Wales",
"content_snippet": "...work not worrying about bills, gas, [or] <b>electric</b> but we can come with our sole focus - of improving students'...",
"matched": {
"rule_id": "rule_2",
"stream_id": "stream_1"
}
}
]
}
}
}
{
"chunk_type": "CT_RESULT",
"chunk_result": {
"data": {
"data": {
"url": "https://theathletic.com/4099180/2023/01/17/tom-brady-bucs-cowboys-recap/",
"indexed": 1673942271920,
"search_indexed": 1673942281434,
"published": 1673939645000,
"title": "Jones: Tom Brady didn’t get his storybook ending, and the next chapter is a mystery",
"content": "Monday’s 31-14 throttling at the hands of the Dallas Cowboys in an NFC wild-card playoff game was not what Brady envisioned when he ended his 40-day retirement late last winter for one more stab at glory with the Tampa B...",
"title_snippet": "Jones: Tom Brady didn’t get his storybook ending, and the next chapter is a mystery",
"content_snippet": "Monday’s 31-14 throttling at the hands of the Dallas Cowboys in an NFC wild-card playoff game was not what Brady envisioned when he ended his 40-day retirement late last winter for one more stab at glory with the Tampa Bay Buccaneers.\n\nBut storybook...",
"root_url": "https://theathletic.com/",
"domain_url": "http://theathletic.com/",
"host_url": "http://theathletic.com/",
"parent_url": "https://theathletic.com/4099180/2023/01/17/tom-brady-bucs-cowboys-recap/",
"lang": "en",
"porn_level": 0,
"fluency_level": 100,
"DEPRECATED_spam_level": 0,
"sentiment": -5,
"source_type": [
"ONLINENEWS",
"ONLINENEWS_OTHER"
],
"post_type": [
"TEXT"
],
"tokens_title": [
"Tom Brady didn",
"Tom Brady",
"Tom Brady",
"next chapter",
"his storybook",
"Brady didn",
"Jones",
"Jones",
"Brady",
"Brady"
],
"tokens_content": [
"Raymond James Stadium",
"New York Jets",
"Tampa Bay Buccaneers",
"Public Service Announcement",
"Tampa Bay",
"Tampa Bay",
"Tampa Bay",
"Dan Quinn",
"Tom Brady",
"Tom Terrific"
],
"tokens_mention": [
"@NFL",
"@NFLResearch",
"@SportsCenter"
],
"images": [
{
"url": "https://cdn.theathletic.com/app/uploads/2023/01/17011150/GettyImages-1456930345-1024x683.jpg"
}
],
"tags_internal": [
"containsImage",
"hasImage",
"isQuestion"
],
"source_extended_attributes": {
"alexa_pageviews": 13900000,
"alexa_unique_visitors": 6587678,
"semrush_pageviews": 31925992,
"semrush_unique_visitors": 18330606
},
"extra_author_attributes": {
"id": "ex:theathletic.com-1508819341",
"name": "mike jones",
"gender": "MALE"
},
"extra_source_attributes": {
"world_data": {
"continent": "North America",
"country": "United States",
"region": "Washington, D.C.",
"city": "Washington, D.C.",
"longitude": -77.0086669921875,
"latitude": 38.89984130859375,
"country_code": "us",
"resolution": "COUNTRY"
},
"id": "ex:theathletic.com",
"name": "theathletic.com"
},
"engagement": 0,
"reach": 18330606,
"entity_url": [
{
"url": "https://theathletic.com/nfl/player/mike-evans-zEpIAizdGk76yimT/"
},
{
"url": "https://twitter.com/SportsCenter/status/1615206213276389376?ref_src=twsrc%5Etfw"
},
{
"url": "https://twitter.com/NFL/status/1615208038721363970?ref_src=twsrc%5Etfw"
},
{
"url": "https://theathletic.com/nfl/team/raiders/"
},
{
"url": "pic.twitter.com/pSqCW5xzUD"
},
{
"url": "https://theathletic.com/nfl/team/buccaneers/"
},
{
"url": "https://theathletic.com/4086768/2023/01/16/cowboys-buccaneers-nfl-wild-card-streaming-takeaways/"
},
{
"url": "https://theathletic.com/nfl/"
},
{
"url": "https://t.co/pSqCW5xzUD"
},
{
"url": "https://theathletic.com/nfl/player/julio-jones-kakQorhjd9fNgZTu/"
},
{
"url": "https://theathletic.com/nfl/team/cowboys/"
},
{
"url": "https://theathletic.com/nfl/player/dak-prescott-1NrLfCowWevnKYEe/"
},
{
"url": "https://t.co/gv5PmG7iCO"
},
{
"url": "https://theathletic.com/nfl/team/jets/"
},
{
"url": "https://theathletic.com/nfl/team/patriots/"
},
{
"url": "https://theathletic.com/nfl/player/cameron-brate-jZpNv26wSUZ165Ct/"
},
{
"url": "https://theathletic.com/3998150/2022/12/16/tom-brady-2023-free-agency/"
},
{
"url": "https://twitter.com/NFLResearch/status/1615171025477701632?ref_src=twsrc%5Etfw"
},
{
"url": "pic.twitter.com/gv5PmG7iCO"
}
],
"matched": {
"appearance": "UPDATED"
},
"word_count": 1545
},
"highlighted_data": [
{
"title_snippet": "Jones: Tom Brady didn’t get his storybook ending, and the next chapter is a mystery",
"content_snippet": "...better part of the last two decades), the atmosphere at Raymond James Stadium was <b>electric</b>.\n\n“Allow me to reintroduce myself,” Jay-Z boomed over the sound system, and Brady jogged along the home sideline and into the North end zone, where he punched...",
"matched": {
"rule_id": "rule_2",
"stream_id": "stream_1"
}
}
]
}
}
}
If you defined multiple rules in your stream definition, the matching rule is returned in the object highlighted_data, and the searched keywords are surrounded by the HTML tag <b>:
"highlighted_data": [
{
"title_snippet": "Teachers from NEU union to strike in England and Wales",
"content_snippet": "...work not worrying about bills, gas, [or] <b>electric</b> but we can come with our sole focus - of improving students'...",
"matched": {
"rule_id": "rule_2",
"stream_id": "stream_1"
}
}
]
The snippet can be used for direct display in your system.
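As a rough illustration, the matching rule and the highlighted keywords can be pulled out of a decoded chunk as follows. This sketch assumes the chunk has already been parsed into a Python dictionary with the layout shown in the sample response; the function name highlighted_matches is hypothetical.

```python
import re

# Keywords are wrapped in <b>...</b> inside the snippets.
BOLD = re.compile(r"<b>(.*?)</b>", re.DOTALL)


def highlighted_matches(chunk):
    """Return one entry per highlighted_data item of a CT_RESULT chunk.

    Chunks of any other type (for example CT_CONTROL) yield an empty list.
    """
    if chunk.get("chunk_type") != "CT_RESULT":
        return []
    data = chunk.get("chunk_result", {}).get("data", {})
    results = []
    for entry in data.get("highlighted_data", []):
        snippet = entry.get("content_snippet", "")
        results.append(
            {
                "rule_id": entry.get("matched", {}).get("rule_id"),
                "snippet": snippet,
                "keywords": BOLD.findall(snippet),
            }
        )
    return results
```

The snippet is returned unchanged so that it can be displayed as is, while the keyword list can drive extra features such as counting hits per rule.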
When you launch this query, the connection remains open. If you configure a timeout, you may therefore receive an error message from your infrastructure even though the stream works properly.
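Since the connection stays open, a client has to decode chunks as they arrive instead of waiting for the end of the response. The sketch below assumes the chunks arrive as JSON objects written one after another, as in the sample output; the exact framing on the wire is not specified here, and the class name ChunkDecoder is hypothetical.

```python
import json


class ChunkDecoder:
    """Incrementally split a stream of concatenated JSON objects into chunks."""

    def __init__(self):
        self._buffer = ""
        self._decoder = json.JSONDecoder()

    def feed(self, text):
        """Add newly received text and return every chunk completed by it."""
        self._buffer += text
        chunks = []
        while True:
            # raw_decode does not skip leading whitespace, so strip it first.
            self._buffer = self._buffer.lstrip()
            if not self._buffer:
                break
            try:
                chunk, end = self._decoder.raw_decode(self._buffer)
            except json.JSONDecodeError:
                # Incomplete object: keep the buffer and wait for more data.
                # (Malformed input would also stop here; this sketch does
                # not try to tell the two cases apart.)
                break
            chunks.append(chunk)
            self._buffer = self._buffer[end:]
        return chunks
```

A possible usage is to iterate over the open HTTP response line by line, pass each decoded line to feed, and handle the returned chunks immediately, with no read timeout configured on the client side.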
Listen to a stream based on additional keywords or query filters
You can add an additional set of keywords or filter rules to an existing stream by adding a “q” parameter to your call:
curl -X GET 'https://api.talkwalker.com/api/v3/stream/s/<stream_id>/results?access_token=<access_token>&q=<rule_3>'
In this sample, all new articles which match (rule_1 AND rule_3) OR (rule_2 AND rule_3) will be pushed to the stream.
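Query filters usually contain spaces and characters such as ":" that must be percent-encoded when placed in a URL. A minimal sketch of building the results URL with an optional q parameter follows; the function name results_url and its argument names are illustrative assumptions.

```python
import urllib.parse

# Endpoint taken from the curl sample above.
RESULTS_URL = "https://api.talkwalker.com/api/v3/stream/s/{}/results?{}"


def results_url(stream_id, access_token, extra_query=None):
    """Return the URL to listen to a stream, optionally narrowed by `q`."""
    params = {"access_token": access_token}
    if extra_query:
        params["q"] = extra_query
    # quote_via=quote encodes spaces as %20 rather than "+".
    query_string = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return RESULTS_URL.format(urllib.parse.quote(stream_id, safe=""), query_string)
```

For example, results_url("stream_1", token, "bitcoin AND lang:en") narrows every rule of stream_1 to articles that also match the additional filter.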
In the result, the additional q parameter will not be part of the object highlighted_data, but the corresponding keywords will be surrounded by the HTML tag <b>.
Control your credits or test a stream
You may want to limit the number of results in the transaction:
- To control the number of credits consumed by your stream.
- To test the outcome of your stream during your tests.
To do so, add the parameter max_hits to control the number of results.
curl -X GET 'https://api.talkwalker.com/api/v3/stream/s/<stream_id>/results?max_hits=<number of results>&access_token=<access_token>'
Once you reach the number of results, the transaction ends. New articles or mentions are no longer pushed to the stream.
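For test runs, the limit can be added when the URL is built. This is a small sketch under the assumption that max_hits must be a positive integer; the function name limited_results_url is hypothetical.

```python
import urllib.parse

# Endpoint taken from the curl sample above.
RESULTS_URL = "https://api.talkwalker.com/api/v3/stream/s/{}/results?{}"


def limited_results_url(stream_id, access_token, max_hits):
    """Return the URL to listen to a stream until `max_hits` results arrive."""
    # Assumption: only positive integers make sense as a result limit.
    if not isinstance(max_hits, int) or max_hits < 1:
        raise ValueError("max_hits must be a positive integer")
    query_string = urllib.parse.urlencode(
        {"max_hits": max_hits, "access_token": access_token}
    )
    return RESULTS_URL.format(urllib.parse.quote(stream_id, safe=""), query_string)
```

Starting with a small value such as 10 is a cheap way to check that the rules return the expected articles before listening without a limit.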
If I want to avoid any loss of data
Persisted data
If it is more important not to lose any article or mention you monitor, you can send them to a collector (which acts like a queue) and listen to the collector like a stream.
The main difference is that once the collector is created and active, all new articles and mentions will be pushed to the collector (and consume credits) even if you are not listening to the collector. The data will remain accessible in the collector for 7 days.
Create/update a collector on a stream
You can specify multiple streams which will be pushed to the collector.
You can add additional keywords to monitor in the collector.
curl -X PUT 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>?access_token=<access_token>' \
--header 'Content-Type: application/json' \
--data-raw '{
"collector_query": {
"queries": [
{
"id": "rule_1",
"query": "bitcoin"
}
],
"streams": [
"stream_1"
]
}
}'
In this sample, all documents which match stream_1 AND the collector rule_1 will be pushed to the collector.
If the collector ID already exists, the request will override the rules.
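The collector request can be prepared in the same way as the stream request. The sketch below mirrors the payload of the curl sample; the helper name build_collector_request and its arguments are illustrative assumptions, and the function builds the request without sending it.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl sample above.
COLLECTOR_BASE = "https://api.talkwalker.com/api/v3/stream/c"


def build_collector_request(collector_id, access_token, stream_ids, queries):
    """Build (but do not send) the PUT request that creates or overrides a collector.

    `stream_ids` lists the streams pushed to the collector; `queries` maps a
    query ID to the additional keywords to monitor (hypothetical format).
    """
    url = "{}/{}?{}".format(
        COLLECTOR_BASE,
        urllib.parse.quote(collector_id, safe=""),
        urllib.parse.urlencode({"access_token": access_token}),
    )
    payload = {
        "collector_query": {
            "queries": [
                {"id": query_id, "query": query}
                for query_id, query in queries.items()
            ],
            "streams": list(stream_ids),
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Keep in mind that an active collector consumes credits even when nothing is listening, so it is worth creating it only once the attached streams have been tested.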
Read the collector.
Please refer to the section How to read the collector.