Talkwalker Search Histogram outside of any project

https://api.talkwalker.com/api/v1/search/histogram/<type>

How it works

With the Talkwalker Search Histogram API, you can retrieve the distribution of the number of search results for a given search query. Histograms can be computed over time or over specific metrics (number of comments, number of shares, reach, retweets, etc.).

Setting min and max limits a histogram to a specific range; min_include and max_include control whether those bounds are included. The range can be a time range for published/search_indexed, or a lower and upper cap for metric histograms such as engagement.

interval defines the width of the bins. Accepted values are long integers for metrics, or duration values (like 7d for 7 days) for published and search_indexed dates. When the bin size is a whole number of days, timezone sets the timezone that determines where each day begins and ends.
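As a rough illustration of how the endpoint and its parameters fit together, the following Python sketch assembles a request URL without sending it. The helper name build_histogram_url is hypothetical; only the endpoint and the parameter names come from this page.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.talkwalker.com/api/v1/search/histogram"

def build_histogram_url(hist_type, access_token, query, **params):
    """Build (but do not send) a histogram request URL.

    query may be a single string or a list of strings (one q per query).
    Parameters set to None are skipped; booleans become true / false.
    """
    fields = {"access_token": access_token, "q": query}
    for name, value in params.items():
        if value is None:
            continue
        fields[name] = str(value).lower() if isinstance(value, bool) else value
    # doseq=True turns a list of queries into repeated q=... entries
    return f"{BASE_URL}/{hist_type}?{urlencode(fields, doseq=True)}"

url = build_histogram_url("published", "demo", "cats",
                          interval="day", timezone="Australia/Perth")
print(url)
```

The slash in the timezone name is percent-encoded (Australia%2FPerth), the same encoding the Asia%2FTokyo example on this page uses.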

note

The 30-day-limitation on global data also holds for histograms.

Histogram types

type | Description | Representation
published | Timestamp of publication (epoch time in milliseconds) | Histogram
search_indexed | Timestamp of indexation in Talkwalker (epoch time in milliseconds) | Histogram
reach | The reach of an article/post represents the number of people who were reached by this article/post. | Histogram
engagement | The engagement of an article/post is the sum of actions made by others on that article/post. | Histogram
facebook_shares | Number of Facebook shares an article has | Histogram
facebook_likes | Number of Facebook likes an article has | Histogram
twitter_retweets | Number of Twitter retweets an article has | Histogram
twitter_shares | Number of Twitter shares an article has | Histogram
twitter_likes | Number of Twitter likes an article has | Histogram
twitter_followers | Number of Twitter followers a source has | Histogram
twitter_impressions | Number of Twitter impressions an article has | Histogram
twitter_video_views | Number of Twitter video views an article has | Histogram
instagram_likes | Number of Instagram likes an article has | Histogram
youtube_views | Number of YouTube views a video has | Histogram
youtube_likes | Number of YouTube likes a video has | Histogram
youtube_dislikes | Number of YouTube dislikes a video has | Histogram
comment_count | Number of comments an article has | Histogram
language | Number of documents written in a language | Top-N Distribution
country | Number of documents with a source from a certain country | Top-N Distribution
source_region | Number of documents with a source from a certain region, depends on geolocation resolution | Top-N Distribution
source_city | Number of documents with a source from a certain city, depends on geolocation resolution | Top-N Distribution
gender | Number of documents written by an author of a particular gender | Top-N Distribution
age | Number of documents written by an author in a predefined age group | Distribution
unique_author | Total number of different authors | Distribution
hashtag | Number of documents containing a particular hashtag | Top-N Distribution
emoji | Number of documents containing a particular emoji code | Top-N Distribution
theme_cloud | Percentage of documents containing a particular word or hashtag | Top-N Distribution
interest | Number of documents within a particular interest group | Top-N Distribution
occupation | Number of documents within a particular occupation group | Top-N Distribution
sentiment | Number of documents with a particular sentiment | Distribution

Parameters

parameter | description | required? | allowed values | default value
access_token | A read/write token specified in the API application | required | |
q | The query to search for | required | Talkwalker query syntax |
min | Minimum value for bins | optional | Long Integer value | For published: tomorrow - 8 days or max - 8 days
max | Maximum value for bins | optional | Long Integer value | For published: tomorrow or min + 8 days
min_include | Include min value | optional | true / false | true
max_include | Include max value | optional | true / false | false
interval | Bin interval | optional | Duration for published and search_indexed / Integer for histogram / not used for distribution | dynamic
timezone | Timezone (for interval) | optional | tz database: timezone name (e.g. Europe/Luxembourg, Australia/Perth) | UTC
breakdown | Nested histogram | optional | sentiment, sourcetype, country | -
value_type | Nested metric for time-based histograms | optional | metric histogram types | -
top_n | Size limiter for demographic distribution | optional | Integer value in ]0, 100] | 10
percentage_relation | Specify the relation for theme clouds | optional | breakdown, query, total | breakdown
tokenizing_mode | Tokenizing mode for theme cloud histograms | optional | normal, two_grams, three_grams, noun_phrase, verb_phrase | normal
time_range | Time range filter in the format number + a time unit character (e.g. 30d for 30 days) | optional | |
  • Possible values for interval when creating a histogram over published or search_indexed: year, quarter, month, week, day, hour, minute, second as well as numeric values with the units w (week), d (day), h (hours), m (minutes), and s (seconds). (e.g. 5d for 5 days or 2w for 2 weeks).
  • The maximum number of histogram bins is 400. If the min, max and interval parameters result in a larger number of bins, an error message (HTTP 400) is returned. In that case, reduce the range or increase the interval.
  • value_type allows specifying a type for nested statistics per bin in a histogram over published or search_indexed.
  • Possible values for time_range as time unit characters are: s for seconds, m for minutes, h for hours, d for days, w for weeks and M for months.
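To avoid the HTTP 400 error for too many bins, the bin count can be estimated on the client before sending a request. The sketch below assumes the number of bins is roughly the range divided by the interval, rounded up; the server may align bins differently (for example to day boundaries in the chosen timezone), so treat the result as an estimate. The function names are hypothetical, and month, quarter and year are left out because their length varies.

```python
import math

MAX_BINS = 400  # limit stated in the notes above

# Fixed-length interval units and names, in milliseconds
UNIT_MS = {"s": 1_000, "m": 60_000, "h": 3_600_000,
           "d": 86_400_000, "w": 604_800_000}
NAMED_MS = {"second": 1_000, "minute": 60_000, "hour": 3_600_000,
            "day": 86_400_000, "week": 604_800_000}

def interval_to_ms(interval):
    """Convert 'day', '6h', '2w', ... to milliseconds."""
    if interval in NAMED_MS:
        return NAMED_MS[interval]
    return int(interval[:-1]) * UNIT_MS[interval[-1]]

def estimated_bin_count(min_ms, max_ms, interval):
    """Estimate the number of bins for a time range given in epoch ms."""
    return math.ceil((max_ms - min_ms) / interval_to_ms(interval))

def fits_bin_limit(min_ms, max_ms, interval):
    return estimated_bin_count(min_ms, max_ms, interval) <= MAX_BINS

# Four-day window used in the examples on this page
print(estimated_bin_count(1601510400000, 1601856000000, "6h"))
print(fits_bin_limit(1601510400000, 1601856000000, "10m"))
```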

Since some parameters are only used by certain histogram types, the following table provides an overview of all working combinations.

access_token qmin max min_include max_include intervaltimezone forecast_days value_typebreakdowntop_npercentage_relation tokenizing_mode
publishedxxxx
search_indexedxxxx
engagementxx
reachxx
facebook_sharesxx
facebook_likesxx
twitter_sharesxx
twitter_retweetsxx
twitter_followersxx
twitter_likesxx
twitter_impressionsxx
twitter_video_viewsxx
youtube_likesxx
youtube_dislikesxx
youtube_viewsxx
instagram_likesxx
cluster_sizexx
comment_countxx
sentimentx
interestxx
occupationxx
theme_cloudxxxx
hashtagxx
emojixx
unique_authorx
languagexx
countryxx
source_regionxx
source_cityxx
genderxx
agex

Credits

10 credits per call.

Histogram Examples

Get a histogram over the last 8 days of results containing the word "cats" for Australian time

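Set q to cats, interval to day and timezone to Australia/Perth. For type published, the default range runs from tomorrow minus 8 days up to tomorrow (see the min and max defaults above), so no explicit min or max is needed to obtain the 8 daily bins.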

Command
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&timezone=Australia/Perth&q=cats&interval=day&pretty=true'
Response
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram?access_token=demo&q=cats&interval=day",
"result_histogram": {
"header": {
"v": ["Number Results"]
},
"data": [
{
"t": 1417478400000,
"v": [4366.0]
},
{
"t": 1417564800000,
"v": [3385.0]
},
{
"t": 1417651200000,
"v": [4233.0]
},
{
"t": 1417737600000,
"v": [4071.0]
},
{
"t": 1417824000000,
"v": [2571.0]
},
{
"t": 1417910400000,
"v": [2191.0]
},
{
"t": 1417996800000,
"v": [3275.0]
},
{
"t": 1418083200000,
"v": [1140.0]
}
]
}
}

t indicates the time-based lower bound of the current bucket, while v is the number of elements inside that bucket.
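A minimal Python sketch of reading such a response: it converts each t (epoch milliseconds) into a UTC datetime and pairs it with the bucket count. The function name time_bins is hypothetical, and the sample dictionary is an excerpt of the response shown above.

```python
from datetime import datetime, timezone

def time_bins(response):
    """Return (bucket start as UTC datetime, count) pairs."""
    bins = []
    for bucket in response["result_histogram"]["data"]:
        start = datetime.fromtimestamp(bucket["t"] / 1000, tz=timezone.utc)
        bins.append((start, bucket["v"][0]))
    return bins

# First two buckets of the response above
sample = {
    "result_histogram": {
        "header": {"v": ["Number Results"]},
        "data": [
            {"t": 1417478400000, "v": [4366.0]},
            {"t": 1417564800000, "v": [3385.0]},
        ],
    }
}

for start, count in time_bins(sample):
    print(start.isoformat(), int(count))
```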

Get a histogram with a resolution of 6 hours over the last 7 days of results containing the word "cats"

Set interval to 6h for 4 values per day.

Command
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&q=cats&interval=6h'

The interval parameter accepts the values year, quarter, month, week, day, hour, minute, second as well as numeric values with the units w (week), d (day), h (hours), m (minutes), and s (seconds).

Get a histogram over a specific time window

important

Due to the 30-day-limitation on global data, please replace the timestamps in the following example by recent values.

Set min to 1601510400000 and max to 1601856000000 to get a histogram of results published between 01.10.2020 and 05.10.2020, with the start timestamp included and the end timestamp excluded (default values).

Command
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&q=cats&min=1601510400000&max=1601856000000'
Response
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/published?access_token=demo&q=cats&min=1601510400000&max=1601856000000",
"result_histogram": {
"header": {
"v": [
"Number Results"
]
},
"data": [
{
"t": 1601510400000,
"v": [
19123.0
]
},
{
"t": 1601596800000,
"v": [
18855.0
]
},
{
"t": 1601683200000,
"v": [
20678.0
]
},
{
"t": 1601769600000,
"v": [
14820.0
]
}
]
}
}

The min and max parameters accept timestamps in epoch format (milliseconds after 1.1.1970 UTC).

Special attention needs to be paid when working with the timezone parameter. In the above example, we get one result value for each started day in the respective timezone, amounting to a total of 4 values.

We repeat the call from before, but this time, we set timezone to Asia/Tokyo (UTC+9).

Command
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&q=cats&min=1601510400000&max=1601856000000&timezone=Asia%2FTokyo'
Response
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/published?access_token=demo&q=cats&min=1601510400000&max=1601856000000&timezone=Asia%2FTokyo",
"result_histogram": {
"header": {
"v": ["Number Results"]
},
"data": [
{
"t": 1601478000000,
"v": [10329.0]
},
{
"t": 1601564400000,
"v": [19244.0]
},
{
"t": 1601650800000,
"v": [22390.0]
},
{
"t": 1601737200000,
"v": [15045.0]
},
{
"t": 1601823600000,
"v": [6468.0]
}
]
}
}

This time, we get 5 result values. This is due to min and max, which resolve to different times. In the first example, min and max resolve to 01.10.2020 00:00:00 UTC and 05.10.2020 00:00:00 UTC respectively. In this second example, they resolve to 01.10.2020 09:00:00 JST and 05.10.2020 09:00:00 JST respectively.

In other words, we get 15 instead of 24 hours' worth of data for 01.10.2020 (09:00 to 24:00 JST) and only 9 hours' worth of data for 05.10.2020 (00:00 to 09:00 JST). When changing timezone, min and max need to be adjusted accordingly.
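One way to adjust min and max is to compute them as local midnight in the target timezone. The sketch below uses a fixed UTC offset, which is only valid for zones without daylight saving time such as Asia/Tokyo; for other zones a tz database lookup (for example Python's zoneinfo module) would be needed. The helper name day_start_ms is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def day_start_ms(year, month, day, utc_offset_hours=0):
    """Epoch milliseconds of local midnight for a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return int(datetime(year, month, day, tzinfo=tz).timestamp() * 1000)

# Window used in the UTC call above
utc_min = day_start_ms(2020, 10, 1)
utc_max = day_start_ms(2020, 10, 5)

# Same calendar days, aligned to midnight in Tokyo (UTC+9)
jst_min = day_start_ms(2020, 10, 1, utc_offset_hours=9)
jst_max = day_start_ms(2020, 10, 5, utc_offset_hours=9)

print(utc_min, utc_max)
print(jst_min, jst_max)
```

With jst_min and jst_max as min and max, the window covers four full Tokyo days instead of cutting the first and last day short. The two JST values equal the t values of the first and last buckets in the Asia/Tokyo response above.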

Get a histogram and statistics over engagement

For types different from published and search_indexed, the histogram API also returns statistics (average, minimum, maximum and sum) over every bin.

(For published and search_indexed, we can specify an additional metric for statistics with the value_type parameter.) We can use the min and max parameters to only consider documents whose engagement lies within a specific range.

Command
curl 'https://api.talkwalker.com/api/v1/search/histogram/engagement?access_token=demo&q=cats&min=1000&max=2000'
Response
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/engagement?access_token=demo&q=cats&min=1000&max=2000",
"result_histogram": {
"data": [
{
"v": [
33.0
],
"k": 1000.0,
"val": [
{
"count": 33,
"min": 1002.0,
"max": 1097.0,
"avg": 1051.121212121212,
"sum": 34687.0
}
]
}, (...) {
"v": [
14.0
],
"k": 1900.0,
"val": [
{
"count": 14,
"min": 1906.0,
"max": 1992.0,
"avg": 1944.9285714285713,
"sum": 27229.0
}
]
}
]
}
}

k indicates the number-based lower bound of the current bucket, while v is the number of elements inside that bucket. The content of val contains additional information about the elements of that bucket.
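As a sketch of how the per-bucket statistics can be consumed, the following Python snippet reads k and val from each bucket and recomputes the mean from sum and count, which should agree with the avg field. The function name bucket_stats is hypothetical; the sample data is taken from the two buckets shown above.

```python
def bucket_stats(response):
    """Return one summary dict per bucket of a metric histogram."""
    rows = []
    for bucket in response["result_histogram"]["data"]:
        stats = bucket["val"][0]
        count = stats["count"]
        rows.append({
            "lower_bound": bucket["k"],
            "count": count,
            # Recomputed mean; guards against empty buckets
            "mean": stats["sum"] / count if count else 0.0,
            "reported_avg": stats["avg"],
        })
    return rows

sample = {
    "result_histogram": {
        "data": [
            {"v": [33.0], "k": 1000.0,
             "val": [{"count": 33, "min": 1002.0, "max": 1097.0,
                      "avg": 1051.121212121212, "sum": 34687.0}]},
            {"v": [14.0], "k": 1900.0,
             "val": [{"count": 14, "min": 1906.0, "max": 1992.0,
                      "avg": 1944.9285714285713, "sum": 27229.0}]},
        ]
    }
}

for row in bucket_stats(sample):
    print(row["lower_bound"], row["count"], round(row["mean"], 2))
```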

Get a histogram with a breakdown over sentiment

For time-based histograms (type: published or search_indexed), it is possible to add a breakdown parameter, set to either sentiment, sourcetype or country. The header then contains the different values for the chosen breakdown type, while the data field v lists in the same order the number of matching elements from a bucket.

Command
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&q=cats&breakdown=sentiment'
Response
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/histogram/published?access_token=demo&q=cats&breakdown=sentiment&pretty=true",
"result_histogram" : {
"header" : {
"v" : [ "POSITIVE", "NEUTRAL", "NEGATIVE" ]
},
"data" : [ {
"t" : 1577923200000,
"v" : [ 3944.0, 10488.0, 1732.0 ]
} ...
// truncated
{
"t" : 1578528000000,
"v" : [ 922.0, 2814.0, 573.0 ]
} ]
}
}
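Because the header lists the breakdown values in the same order as the entries of v, the two can be zipped together. A small Python sketch, with the hypothetical helper breakdown_rows and sample data copied from the two buckets shown above:

```python
def breakdown_rows(response):
    """Label every value of v with the matching header entry."""
    hist = response["result_histogram"]
    labels = hist["header"]["v"]
    rows = []
    for bucket in hist["data"]:
        row = {"t": bucket["t"]}
        row.update(zip(labels, bucket["v"]))
        rows.append(row)
    return rows

sample = {
    "result_histogram": {
        "header": {"v": ["POSITIVE", "NEUTRAL", "NEGATIVE"]},
        "data": [
            {"t": 1577923200000, "v": [3944.0, 10488.0, 1732.0]},
            {"t": 1578528000000, "v": [922.0, 2814.0, 573.0]},
        ],
    }
}

for row in breakdown_rows(sample):
    print(row)
```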

Top N distribution examples

N is set with the top_n parameter and defaults to 10. The output of Top N distributions is very similar to the histogram output; the main differences are:

  • The total hit number is contained in the header
  • The key is stored in the ks field

Get the Top 3 languages

Command
curl 'https://api.talkwalker.com/api/v1/search/histogram/language?access_token=demo&q=cats&top_n=3&pretty=true'
Response
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/language?access_token=demo&q=cats&top_n=3&pretty=true",
"result_histogram": {
"data": [
{
"v": [550242.0],
"ks": "en"
},
{
"v": [6918.0],
"ks": "de"
},
{
"v": [1882.0],
"ks": "ja"
},
{
"v": [1711.0],
"ks": "fr"
}
],
"total_hits": 570382
},
"request_id": "#qx0h3vup7r8d#"
}

ks serves as string-based indicator of a bucket, while v represents the number of elements inside. The number of total results is provided in the total_hits.
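Since total_hits is returned alongside the buckets, each bucket's share of all results can be derived on the client. The following Python sketch does that for the language response above; the function name top_n_shares is hypothetical.

```python
def top_n_shares(response):
    """Return (key, count, share of total_hits) for each bucket."""
    hist = response["result_histogram"]
    total = hist["total_hits"]
    rows = []
    for bucket in hist["data"]:
        count = bucket["v"][0]
        share = count / total if total else 0.0
        rows.append((bucket["ks"], count, share))
    return rows

sample = {
    "result_histogram": {
        "data": [
            {"v": [550242.0], "ks": "en"},
            {"v": [6918.0], "ks": "de"},
            {"v": [1882.0], "ks": "ja"},
            {"v": [1711.0], "ks": "fr"},
        ],
        "total_hits": 570382,
    }
}

for key, count, share in top_n_shares(sample):
    print(f"{key}: {int(count)} results ({share:.2%})")
```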

Get the Top 3 themes

Top N themes differ from other Top N distributions in that they are calculated on a sample of the matching documents. The result therefore contains, for each token, the percentage of sampled documents that include it rather than a document count.

Command
curl 'https://api.talkwalker.com/api/v1/search/histogram/theme_cloud?access_token=demo&q=cats&top_n=3&pretty=true'
Response
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/theme_cloud?access_token=demo&q=cats&top_n=3&pretty=true",
"result_histogram": {
"data": [
{
"v": [0.996],
"ks": "cats"
},
{
"v": [0.265],
"ks": "cat"
},
{
"v": [0.193],
"ks": "dogs"
}
],
"total_hits": 570399
},
"request_id": "#qx0h7rc0i29n#"
}

ks serves as string-based indicator of a bucket, while v represents the share of sampled documents containing that token; in the example above it appears as a fraction of 1 (0.996 for "cats"). The number of total results is provided in total_hits.

Get the Top 3 emoji

Command
curl 'https://api.talkwalker.com/api/v1/search/histogram/emoji?access_token=demo&q=cats&top_n=3&pretty=true'
Response
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/emoji?access_token=demo&q=cats&top_n=3&pretty=true",
"result_histogram": {
"data": [
{
"v": [5289.0],
"ks": "❤"
},
{
"v": [3349.0],
"ks": "😂"
},
{
"v": [2428.0],
"ks": "🐱"
}
],
"total_hits": 570083
},
"request_id": "#qx0br7wh57pp#"
}

ks serves as string-based indicator of a bucket (JAVA source code encoded), while v represents the number of elements inside. The number of total results is provided in the total_hits.

Get the Top 3 regions

Command
curl 'https://api.talkwalker.com/api/v1/search/histogram/source_region?access_token=demo&q=cats&top_n=3&pretty=true'
Response
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/source_region?access_token=demo&q=cats&top_n=3&pretty=true",
"result_histogram": {
"data": [
{
"v": [2006.0],
"key": {
"country_code": "us",
"region": "California",
"short_id": "california_us"
}
},
{
"v": [1599.0],
"key": {
"country_code": "ph",
"region": "Isabela (province)",
"short_id": "isabela_ph"
}
},
{
"v": [786.0],
"key": {
"country_code": "us",
"region": "New York",
"short_id": "newyork_us"
}
}
],
"total_hits": 17914
},
"request_id": "#qyc8lsx8lof6#"
}

region serves as string-based indicator of a bucket, while v represents the number of elements inside. The number of total results is provided in the total_hits.

Get the Top 3 cities

Command
curl 'https://api.talkwalker.com/api/v1/search/histogram/source_city?access_token=demo&q=cats&top_n=3&pretty=true'
Response
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/source_city?access_token=demo&q=cats&top_n=3&pretty=true",
"result_histogram": {
"data": [
{
"v": [1598.0],
"key": {
"country_code": "ph",
"region": "Isabela (province)",
"city": "Jones, Isabela",
"short_id": "jones_isabela_ph"
}
},
{
"v": [948.0],
"key": {
"country_code": "us",
"region": "California",
"city": "San Francisco",
"short_id": "sanfrancisco_california_us"
}
},
{
"v": [737.0],
"key": {
"country_code": "us",
"region": "South Dakota",
"city": "Sioux Falls, South Dakota",
"short_id": "siouxfalls_southdakota_us"
}
}
],
"total_hits": 12971
},
"request_id": "#qyc8uo6qqxiz#"
}

city serves as string-based indicator of a bucket, while v represents the number of elements inside. The number of total results is provided in the total_hits.

Distribution examples

Unlike Top N distributions, distributions need no additional parameter. Other than that, both share the same structure:

  • The total hit number is contained in the header
  • The key is stored in the ks field

Get the sentiment distribution

Command
curl 'https://api.talkwalker.com/api/v1/search/histogram/sentiment?access_token=demo&q=cats&pretty=true'
Response
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/sentiment?access_token=demo&q=cats&pretty=true",
"result_histogram": {
"data": [
{
"v": [327698.0],
"ks": "NEUTRAL"
},
{
"v": [149214.0],
"ks": "POSITIVE"
},
{
"v": [93485.0],
"ks": "NEGATIVE"
},
{
"v": [0.0],
"ks": "NONE"
}
],
"total_hits": 570397
},
"request_id": "#qx0hczxspbq8#"
}

Get the number of unique authors

Although the number of unique authors is a single value, it is returned in the same structure as a distribution, which is why it is listed here.

Command
curl 'https://api.talkwalker.com/api/v1/search/histogram/unique_author?access_token=demo&q=cats&pretty=true'
Response
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/unique_author?access_token=demo&q=cats&pretty=true",
"result_histogram": {
"data": [
{
"v": [306021.0],
"ks": "value"
}
],
"total_hits": 570392
},
"request_id": "#qx0hf66t2vwz#"
}

Rate Limit

This endpoint is limited to 60 calls per minute.
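A client can stay under this limit by spacing its calls. The sketch below is one simple approach, not something prescribed by the API: it enforces a minimum gap of 60 / 60 = 1 second between calls, which is stricter than necessary if short bursts are allowed. The class name Throttle is hypothetical; clock and sleep are injectable so the behaviour can be checked without waiting.

```python
import time

class Throttle:
    """Space calls evenly so that at most calls_per_minute are made."""

    def __init__(self, calls_per_minute=60,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_gap = 60.0 / calls_per_minute
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until the next call is allowed, then record it."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_gap - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now

throttle = Throttle(calls_per_minute=60)
throttle.wait()  # call this before each request to the endpoint
```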

Multiple query examples

It is possible to enter multiple queries when working with histograms. All histogram types are compatible with multiple queries. The result structure is similar to that of a histogram with a breakdown parameter: query contains all entered queries, while v and in some cases val contain the respective results in the same order.

Simple example

Command
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&q=cats&q=dogs'
Response
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/histogram/published?access_token=demo&q=cats&q=dogs&pretty=true",
"result_histogram" : {
"header" : {
"v" : [ "Number Results" ]
},
"data" : [ {
"t" : 1577923200000,
"v" : [ 123.0, 456.0 ]
} ...
// truncated
{
"t" : 1578528000000,
"v" : [ 789.0, 1011.0 ]
} ],
"query" : [ "cats", "dogs" ]
}
}

Top N example

Top N histogram types like language can also be used with multiple queries. The result contains all top N values for the different queries, which may be more than N in total, as shown in the example below.

Command
curl 'https://api.talkwalker.com/api/v1/search/histogram/language?access_token=demo&q=dogs&q=cats&top_n=3'
Response
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/histogram/language?access_token=demo&q=dogs&q=cats&pretty=true",
"result_histogram" : {
"data" : [
{
"v" : [ 377.0, 483.0 ],
"ks" : "en"
}, {
"v" : [ 215.0, 251.0 ],
"ks" : "ja"
}, {
"v" : [ -1.0, 123.0 ],
"ks" : "ar"
}, {
"v" : [ 87.0, -1.0 ],
"ks" : "es"
}
],
"total_hits" : 1700,
"total_query_hits" : [ 721, 1234 ],
"query" : [ "dogs", "cats" ]
}
}

Searching for the top 3 languages for each query, the third place is different for both, so the result contains 4 keys in total. A -1.0 marker is placed where one query does not have a value belonging to the top N. The results are sorted by the sum of the values, not taking the -1.0 markers into consideration.
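The -1.0 markers need special handling on the client so they are not mistaken for counts. The Python sketch below assumes that each bucket's v is a list with one value per query, in the same order as query; it maps the marker to None and reproduces the ordering by sum described above. The function names are hypothetical and the sample values are those of the example.

```python
MISSING = -1.0  # marker for "not in this query's top N"

def per_query_values(result_histogram):
    """Map each key to {query: value or None}."""
    queries = result_histogram["query"]
    table = {}
    for bucket in result_histogram["data"]:
        table[bucket["ks"]] = {
            q: (None if v == MISSING else v)
            for q, v in zip(queries, bucket["v"])
        }
    return table

def rank_keys(result_histogram):
    """Order keys by the sum of their values, ignoring the markers."""
    def total(bucket):
        return sum(v for v in bucket["v"] if v != MISSING)
    ordered = sorted(result_histogram["data"], key=total, reverse=True)
    return [bucket["ks"] for bucket in ordered]

sample = {
    "data": [
        {"v": [377.0, 483.0], "ks": "en"},
        {"v": [215.0, 251.0], "ks": "ja"},
        {"v": [-1.0, 123.0], "ks": "ar"},
        {"v": [87.0, -1.0], "ks": "es"},
    ],
    "query": ["dogs", "cats"],
}

print(per_query_values(sample)["ar"])
print(rank_keys(sample))
```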

Multiple queries + breakdown example

Command
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&q=dogs&q=cats&breakdown=sentiment'
Response
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/histogram/published?access_token=demo&q=dogs&q=cats&pretty=true",
"result_histogram" : {
"header" : {
"v" : [ "POSITIVE", "NEUTRAL", "NEGATIVE" ]
},
"data" : [ {
"t" : 1577923200000,
"v" : [ 41.0, 38.0, 44.0, 152.0, 140.0, 164.0 ]
} ...
// truncated
{
"t" : 1578528000000,
"v" : [ 200.0, 250.0, 339.0, 333.0, 350.0, 328.0 ]
} ],
"query" : [ "dogs", "cats" ]
}
}

The content of v is ordered by query first and breakdown second. For two queries q1, q2 and a breakdown over sentiment, this results in the following ordering: [ <q1, POSITIVE>, <q1, NEUTRAL>, <q1, NEGATIVE>, <q2, POSITIVE>, <q2, NEUTRAL>, <q2, NEGATIVE> ]

As an illustration, compare the values for t = 1577923200000 in the table below with the example above.

Sentiment | Dogs | Cats
Positive | 41.0 | 152.0
Neutral | 38.0 | 140.0
Negative | 44.0 | 164.0
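The query-first, breakdown-second ordering means the flat v list can be cut into one slice per query, each as long as the header. A minimal Python sketch with the hypothetical helper split_by_query, applied to the values for t = 1577923200000:

```python
def split_by_query(values, queries, labels):
    """Turn a flat v list into {query: {breakdown label: value}}."""
    width = len(labels)
    result = {}
    for index, query in enumerate(queries):
        chunk = values[index * width:(index + 1) * width]
        result[query] = dict(zip(labels, chunk))
    return result

labels = ["POSITIVE", "NEUTRAL", "NEGATIVE"]   # header.v
queries = ["dogs", "cats"]                     # query
v = [41.0, 38.0, 44.0, 152.0, 140.0, 164.0]    # v of the first bucket

table = split_by_query(v, queries, labels)
print(table["dogs"])
print(table["cats"])
```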