Talkwalker Search Histogram outside of any project
https://api.talkwalker.com/api/v1/search/histogram/<type>
How it works
With the Talkwalker Search Histogram API, you can retrieve the distribution of the number of search results for a given search query.
Histograms can be made for distribution over time or over specific metrics (number of comments, number of shares, reach, retweets etc.).
By setting min and max, a histogram can be limited to a specific range (min_include and max_include control whether those bounds are included).
This can be a time range for published/search_indexed or just an upper and lower cap for e.g. engagement histograms.
interval defines the width of the bins. The accepted values are long integers for metrics, or duration values (like 7d for 7 days) for published and search_indexed dates.
When using a bin size of entire days, timezone sets the timezone that determines where each day begins and ends.
The 30-day-limitation on global data also holds for histograms.
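As a rough illustration of how these parameters combine into a request, the following Python sketch assembles a histogram URL. The helper name build_histogram_url is hypothetical and not part of any official Talkwalker client; only the endpoint and the parameter names are taken from this page.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.talkwalker.com/api/v1/search/histogram"


def build_histogram_url(histogram_type, access_token, query, **params):
    """Assemble a Search Histogram URL (hypothetical helper, not an official client).

    Optional parameters such as min, max, interval or timezone are passed as
    keyword arguments. None values are skipped and booleans are lowercased.
    """
    fields = {"access_token": access_token, "q": query}
    for name, value in params.items():
        if value is None:
            continue
        if isinstance(value, bool):
            value = "true" if value else "false"
        fields[name] = value
    # urlencode percent-encodes reserved characters, e.g. "/" in timezone names.
    return f"{BASE_URL}/{histogram_type}?{urlencode(fields)}"


url = build_histogram_url(
    "published", "demo", "cats",
    interval="day", timezone="Australia/Perth", max_include=False,
)
```

The resulting string can be passed to curl or to any HTTP client. Percent-encoding the timezone name (Australia%2FPerth) matches the encoded form used in the Asia/Tokyo example further down.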
Histogram types
type | Description | Representation |
---|---|---|
published | Timestamp of publication (epoch time in milliseconds) | Histogram |
search_indexed | Timestamp of indexation in Talkwalker (epoch time in milliseconds) | Histogram |
reach | The reach of an article/post represents the number of people who were reached by this article/post. | Histogram |
engagement | The engagement of an article/post is the sum of actions made by others on that article/post. | Histogram |
facebook_shares | Number of Facebook shares an article has | Histogram |
facebook_likes | Number of Facebook likes an article has | Histogram |
twitter_retweets | Number of Twitter retweets an article has | Histogram |
twitter_shares | Number of Twitter shares an article has | Histogram |
twitter_likes | Number of Twitter likes an article has | Histogram |
twitter_followers | Number of Twitter followers a source has | Histogram |
twitter_impressions | Number of Twitter impressions an article has | Histogram |
twitter_video_views | Number of Twitter video views an article has | Histogram |
instagram_likes | Number of Instagram likes an article has | Histogram |
youtube_views | Number of YouTube views a video has | Histogram |
youtube_likes | Number of YouTube likes a video has | Histogram |
youtube_dislikes | Number of YouTube dislikes a video has | Histogram |
comment_count | Number of comments an article has | Histogram |
language | Number of documents written in a language | Top-N Distribution |
country | Number of documents with a source from a certain country | Top-N Distribution |
source_region | Number of documents with a source from a certain region, depends on geolocation resolution | Top-N Distribution |
source_city | Number of documents with a source from a certain city, depends on geolocation resolution | Top-N Distribution |
gender | Number of documents written by an author of a particular gender | Top-N Distribution |
age | Number of documents written by an author in a predefined age group | Distribution |
unique_author | Total number of different authors | Distribution |
hashtag | Number of documents containing a particular hashtag | Top-N Distribution |
emoji | Number of documents containing a particular emoji code | Top-N Distribution |
theme_cloud | Percentage of documents containing a particular word or hashtag | Top-N Distribution |
trending_top_theme | Percentage of documents containing a particular word or hashtag as compared to the previous week | Top-N Distribution |
smart_theme | Percentage of documents within a particular Smart Theme group | Top-N Distribution |
interest | Number of documents within a particular interest group | Top-N Distribution |
occupation | Number of documents within a particular occupation group | Top-N Distribution |
sentiment | Number of documents with a particular sentiment | Distribution |
Parameters
parameter | description | required? | allowed values | default value |
---|---|---|---|---|
access_token | a read/write token specified in the API application | required | ||
q | The query to search for | required | Talkwalker query syntax | |
min | Minimum value for bins | optional | Long Integer value | For published : tomorrow - 8 days or max - 8 days |
max | Maximum value for bins | optional | Long Integer value | For published : tomorrow or min + 8 days |
min_include | Include min value | optional | true / false | true |
max_include | Include max value | optional | true / false | false |
interval | Bin Interval | optional | Duration for published and search_indexed / Integer for histogram / not used for distribution | dynamic |
timezone | Timezone (for interval) | optional | tz database: timezone name (e.g. Europe/Luxembourg , Australia/Perth ) | UTC |
breakdown | Nested histogram | optional | sentiment , sourcetype , country | - |
value_type | Nested metric for time based histograms | optional | metric histogram types | - |
top_n | Size limiter for demographic distribution | optional | Integer value in ]0, 100] | 10 |
percentage_relation | Specify the relation for theme clouds | optional | breakdown , query , total | breakdown |
tokenizing_mode | Tokenizing mode for theme cloud histograms | optional | normal, two_grams, three_grams, noun_phrase, verb_phrase | normal |
time_range | Time range filter in the format number + a time unit character (e.g. 30d for 30 days.) | optional | ||
smart_theme | Smart Theme | required | brands, celebrities, emotions, events | normal |
- Possible values for interval when creating a histogram over published or search_indexed: year, quarter, month, week, day, hour, minute, second as well as numeric values with the units w (week), d (day), h (hours), m (minutes), and s (seconds) (e.g. 5d for 5 days or 2w for 2 weeks).
- The maximum number of histogram bins is 400. If the min, max and interval parameters result in a larger number of bins, an error message (HTTP 400) is returned. Try reducing the range or increasing the interval.
- value_type allows specifying a type for nested statistics per bin in a histogram over published or search_indexed.
- Possible values for time_range as time unit characters are: s for seconds, m for minutes, h for hours, d for days, w for weeks and M for months.
- The smart_theme parameter is only required for the smart_theme API endpoint.
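Before sending a request, it can help to estimate whether a time range and interval stay within the 400-bin limit. The sketch below is a local approximation only: the function names are hypothetical, calendar units (month, quarter, year) are left out because their length varies, and the estimate assumes the range is aligned with bin boundaries, so the API may return one bin more.

```python
import math
import re

MAX_BINS = 400  # limit stated in the notes above

# Fixed-length units only; month, quarter and year vary in length.
UNIT_MS = {"s": 1_000, "m": 60_000, "h": 3_600_000,
           "d": 86_400_000, "w": 604_800_000}
NAMED_MS = {"second": 1_000, "minute": 60_000, "hour": 3_600_000,
            "day": 86_400_000, "week": 604_800_000}


def interval_to_ms(interval):
    """Convert an interval such as '6h', '2w' or 'day' into milliseconds."""
    if interval in NAMED_MS:
        return NAMED_MS[interval]
    match = re.fullmatch(r"(\d+)([smhdw])", interval)
    if not match:
        raise ValueError(f"unsupported interval for this sketch: {interval!r}")
    return int(match.group(1)) * UNIT_MS[match.group(2)]


def estimate_bins(min_ms, max_ms, interval):
    """Estimate the number of bins for a time range given in epoch milliseconds."""
    return math.ceil((max_ms - min_ms) / interval_to_ms(interval))


def fits_bin_limit(min_ms, max_ms, interval):
    """Return True if the estimated bin count stays within MAX_BINS."""
    return estimate_bins(min_ms, max_ms, interval) <= MAX_BINS
```

For a four-day range, an interval of 6h gives an estimate of 16 bins, while minute gives 5760 and would be rejected.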
Since some parameters are only used by certain histogram types, the following table provides an overview of all working combinations.
type | access_token q | min max min_include max_include interval | timezone forecast_days value_type | breakdown | top_n | percentage_relation tokenizing_mode |
---|---|---|---|---|---|---|
published | x | x | x | x | ||
search_indexed | x | x | x | x | ||
engagement | x | x | ||||
reach | x | x | ||||
facebook_shares | x | x | ||||
facebook_likes | x | x | ||||
twitter_shares | x | x | ||||
twitter_retweets | x | x | ||||
twitter_followers | x | x | ||||
twitter_likes | x | x | ||||
twitter_impressions | x | x | ||||
twitter_video_views | x | x | ||||
youtube_likes | x | x | ||||
youtube_dislikes | x | x | ||||
youtube_views | x | x | ||||
instagram_likes | x | x | ||||
cluster_size | x | x | ||||
comment_count | x | x | ||||
sentiment | x | |||||
interest | x | x | ||||
occupation | x | x | ||||
theme_cloud | x | x | x | x | ||
trending_top_theme | x | x | x | |||
smart_theme | x | x | ||||
hashtag | x | x | ||||
emoji | x | x | ||||
unique_author | x | |||||
language | x | x | ||||
country | x | x | ||||
source_region | x | x | ||||
source_city | x | x | ||||
gender | x | x | ||||
age | x |
Credits
10 credits per call.
Histogram Examples
Get a histogram over the last 8 days of results containing the word "cats" for Australian time
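Set the query to cats and timezone to Australia/Perth. For type published, the Talkwalker Search Histogram API by default returns results for the 8 days up to and including today (min defaults to tomorrow - 8 days, max to tomorrow).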
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&timezone=Australia/Perth&q=cats&interval=day&pretty=true'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram?access_token=demo&q=cats&interval=day",
"result_histogram": {
"header": {
"v": ["Number Results"]
},
"data": [
{
"t": 1417478400000,
"v": [4366.0]
},
{
"t": 1417564800000,
"v": [3385.0]
},
{
"t": 1417651200000,
"v": [4233.0]
},
{
"t": 1417737600000,
"v": [4071.0]
},
{
"t": 1417824000000,
"v": [2571.0]
},
{
"t": 1417910400000,
"v": [2191.0]
},
{
"t": 1417996800000,
"v": [3275.0]
},
{
"t": 1418083200000,
"v": [1140.0]
}
]
}
}
t indicates the time-based lower bound of the current bucket, while v is the number of elements inside that bucket.
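To make the t and v fields easier to read, the response can be turned into (bin start, count) pairs. This is a minimal sketch that assumes the response has already been parsed into a Python dict; time_bins is a hypothetical helper name and the sample holds the first two bins from the response above.

```python
from datetime import datetime, timezone


def time_bins(response):
    """Return (bin start as UTC datetime, count) pairs from a time-based histogram."""
    bins = []
    for entry in response["result_histogram"]["data"]:
        # t is epoch time in milliseconds; datetime expects seconds.
        start = datetime.fromtimestamp(entry["t"] / 1000, tz=timezone.utc)
        bins.append((start, int(entry["v"][0])))
    return bins


sample = {
    "result_histogram": {
        "header": {"v": ["Number Results"]},
        "data": [
            {"t": 1417478400000, "v": [4366.0]},
            {"t": 1417564800000, "v": [3385.0]},
        ],
    }
}

for start, count in time_bins(sample):
    print(start.isoformat(), count)
```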
Get a histogram with a resolution of 6 hours over the last 7 days of results containing the word "cats"
Set interval to 6h for 4 values per day.
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&q=cats&interval=6h'
The interval parameter accepts the values year, quarter, month, week, day, hour, minute, second as well as numeric values with the units w (week), d (day), h (hours), m (minutes), and s (seconds).
Get a histogram over a specific time window
Due to the 30-day-limitation on global data, please replace the timestamps in the following example by recent values.
Set min to 1601510400000 and max to 1601856000000 to get a histogram of results published between 01.10.2020 and 05.10.2020, with the start timestamp included and the end timestamp excluded (default values).
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&q=cats&min=1601510400000&max=1601856000000'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/published?access_token=demo&q=cats&min=1601510400000&max=1601856000000",
"result_histogram": {
"header": {
"v": [
"Number Results"
]
},
"data": [
{
"t": 1601510400000,
"v": [
19123.0
]
},
{
"t": 1601596800000,
"v": [
18855.0
]
},
{
"t": 1601683200000,
"v": [
20678.0
]
},
{
"t": 1601769600000,
"v": [
14820.0
]
}
]
}
}
The min and max parameters accept timestamps in epoch format (milliseconds since 01.01.1970 UTC).
Special attention needs to be paid when working with the timezone parameter.
In the above example, we get one result value for each started day in the respective timezone, amounting to a total of 4 values.
We repeat the call from before, but this time, we set timezone to Asia/Tokyo (UTC+9).
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&q=cats&min=1601510400000&max=1601856000000&timezone=Asia%2FTokyo'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/published?access_token=demo&q=cats&min=1601510400000&max=1601856000000&timezone=Asia%2FTokyo",
"result_histogram": {
"header": {
"v": ["Number Results"]
},
"data": [
{
"t": 1601478000000,
"v": [10329.0]
},
{
"t": 1601564400000,
"v": [19244.0]
},
{
"t": 1601650800000,
"v": [22390.0]
},
{
"t": 1601737200000,
"v": [15045.0]
},
{
"t": 1601823600000,
"v": [6468.0]
}
]
}
}
This time, we get 5 result values.
This is because min and max now resolve to different local times.
In the first example, min and max resolve to 01.10.2020 00:00:00 UTC and 05.10.2020 00:00:00 UTC respectively.
In this second example, they resolve to 01.10.2020 09:00:00 JST and 05.10.2020 09:00:00 JST respectively.
In other words, we get 15 instead of 24 hours' worth of data for 01.10.2020 and only 9 hours' worth of data for 05.10.2020.
When changing timezone, min and max need to be adjusted accordingly.
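One way to adjust min and max is to compute them from local midnight in the target timezone. The sketch below uses a fixed UTC+9 offset as a stand-in for Asia/Tokyo, which is an assumption that only works because Japan has no daylight saving time; for other zones a zoneinfo.ZoneInfo object (Python 3.9+) could be passed instead. local_midnight_ms is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def local_midnight_ms(year, month, day, tz):
    """Epoch milliseconds of 00:00 on the given date in timezone tz."""
    return int(datetime(year, month, day, tzinfo=tz).timestamp() * 1000)


# Fixed offset standing in for Asia/Tokyo (assumption: no daylight saving time).
JST = timezone(timedelta(hours=9))

min_ms = local_midnight_ms(2020, 10, 1, JST)  # 01.10.2020 00:00:00 JST
max_ms = local_midnight_ms(2020, 10, 5, JST)  # 05.10.2020 00:00:00 JST
```

With these values for min and max, the requested range starts and ends on day boundaries in JST. The first value equals the t of the first bin in the Asia/Tokyo response above.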
Get a histogram and statistics over engagement
For types different from published and search_indexed, the histogram API also returns statistics (average, minimum, maximum and sum) over every bin.
(For published and search_indexed, we can specify an additional metric for statistics with the value_type parameter.)
We can use the min and max parameters to only consider documents whose engagement lies within a specific range.
curl 'https://api.talkwalker.com/api/v1/search/histogram/engagement?access_token=demo&q=cats&min=1000&max=2000'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/engagement?access_token=demo&q=cats&min=1000&max=2000",
"result_histogram": {
"data": [
{
"v": [
33.0
],
"k": 1000.0,
"val": [
{
"count": 33,
"min": 1002.0,
"max": 1097.0,
"avg": 1051.121212121212,
"sum": 34687.0
}
]
}, (...) {
"v": [
14.0
],
"k": 1900.0,
"val": [
{
"count": 14,
"min": 1906.0,
"max": 1992.0,
"avg": 1944.9285714285713,
"sum": 27229.0
}
]
}
]
}
}
k indicates the number-based lower bound of the current bucket, while v is the number of elements inside that bucket.
val contains additional statistics about the elements of that bucket.
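The per-bin statistics in val can be flattened into one record per bucket. The following sketch assumes a parsed response dict and reuses the two bins shown above as sample data; metric_bins is a hypothetical helper, and the avg_check field is only a consistency check recomputed from sum and count.

```python
def metric_bins(response):
    """Yield one summary dict per bin of a metric histogram such as engagement."""
    for entry in response["result_histogram"]["data"]:
        stats = entry["val"][0]
        count = stats["count"]
        yield {
            "lower_bound": entry["k"],
            "count": count,
            "min": stats["min"],
            "max": stats["max"],
            "avg": stats["avg"],
            # Recomputed from sum and count; None for an empty bin.
            "avg_check": stats["sum"] / count if count else None,
        }


sample = {
    "result_histogram": {
        "data": [
            {"v": [33.0], "k": 1000.0, "val": [
                {"count": 33, "min": 1002.0, "max": 1097.0,
                 "avg": 1051.121212121212, "sum": 34687.0}]},
            {"v": [14.0], "k": 1900.0, "val": [
                {"count": 14, "min": 1906.0, "max": 1992.0,
                 "avg": 1944.9285714285713, "sum": 27229.0}]},
        ]
    }
}

rows = list(metric_bins(sample))
```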
Get a histogram with a breakdown over sentiment
For time-based histograms (type: published or search_indexed), it is possible to add a breakdown parameter, set to either sentiment, sourcetype or country.
The header then contains the different values for the chosen breakdown type, while the data field v lists, in the same order, the number of matching elements from a bucket.
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&q=cats&breakdown=sentiment'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/histogram/published?access_token=demo&q=cats&breakdown=sentiment&pretty=true",
"result_histogram" : {
"header" : {
"v" : [ "POSITIVE", "NEUTRAL", "NEGATIVE" ]
},
"data" : [ {
"t" : 1577923200000,
"v" : [ 3944.0, 10488.0, 1732.0 ]
} ...
// truncated
{
"t" : 1578528000000,
"v" : [ 922.0, 2814.0, 573.0 ]
} ]
}
}
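Because the header and each v list share the same order, the two can be zipped together to label every value. This is a small sketch on a parsed response dict; label_breakdown is a hypothetical helper name and the sample reuses the two bins shown above.

```python
def label_breakdown(response):
    """Pair each bucket's values with the breakdown labels from the header."""
    result = response["result_histogram"]
    labels = result["header"]["v"]
    return [
        {"t": entry["t"], **dict(zip(labels, entry["v"]))}
        for entry in result["data"]
    ]


sample = {
    "result_histogram": {
        "header": {"v": ["POSITIVE", "NEUTRAL", "NEGATIVE"]},
        "data": [
            {"t": 1577923200000, "v": [3944.0, 10488.0, 1732.0]},
            {"t": 1578528000000, "v": [922.0, 2814.0, 573.0]},
        ],
    }
}

labelled = label_breakdown(sample)
```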
Top N distribution examples
While it is possible to specify N by using the top_n parameter, a default value of 10 is set. The output for Top N distributions is very similar to the histogram output, where the main differences are:
- The total hit number is contained in the header
- The key is stored in the ks field
Get the Top 3 languages
curl 'https://api.talkwalker.com/api/v1/search/histogram/language?access_token=demo&q=cats&top_n=3&pretty=true'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/language?access_token=demo&q=cats&top_n=3&pretty=true",
"result_histogram": {
"data": [
{
"v": [550242.0],
"ks": "en"
},
{
"v": [6918.0],
"ks": "de"
},
{
"v": [1882.0],
"ks": "ja"
},
{
"v": [1711.0],
"ks": "fr"
}
],
"total_hits": 570382
},
"request_id": "#qx0h3vup7r8d#"
}
ks serves as string-based indicator of a bucket, while v represents the number of elements inside.
The number of total results is provided in total_hits.
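Combining v with total_hits gives the share of all results that falls into each bucket. The sketch below assumes a parsed response dict and uses the values from the language example; top_n_shares is a hypothetical helper name.

```python
def top_n_shares(response):
    """Map each ks key to its share of total_hits (a value between 0 and 1)."""
    result = response["result_histogram"]
    total = result["total_hits"]
    return {entry["ks"]: entry["v"][0] / total for entry in result["data"]}


sample = {
    "result_histogram": {
        "data": [
            {"v": [550242.0], "ks": "en"},
            {"v": [6918.0], "ks": "de"},
            {"v": [1882.0], "ks": "ja"},
            {"v": [1711.0], "ks": "fr"},
        ],
        "total_hits": 570382,
    }
}

shares = top_n_shares(sample)
```

The shares need not add up to 1, since results outside the returned buckets still count towards total_hits.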
Get the Top 3 themes
The top N themes differ from other Top N distributions in that they are calculated on a sample of the total documents. As a result, v does not contain the number of sampled documents that include a given token, but the percentage of them.
curl 'https://api.talkwalker.com/api/v1/search/histogram/theme_cloud?access_token=demo&q=cats&top_n=3&pretty=true'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/theme_cloud?access_token=demo&q=cats&top_n=3&pretty=true",
"result_histogram": {
"data": [
{
"v": [0.996],
"ks": "cats"
},
{
"v": [0.265],
"ks": "cat"
},
{
"v": [0.193],
"ks": "dogs"
}
],
"total_hits": 570399
},
"request_id": "#qx0h7rc0i29n#"
}
ks serves as string-based indicator of a bucket, while v represents the percentage of results that fall into that bucket.
The number of total results is provided in total_hits.
Get the Top 3 emoji
curl 'https://api.talkwalker.com/api/v1/search/histogram/emoji?access_token=demo&q=cats&top_n=3&pretty=true'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/emoji?access_token=demo&q=cats&top_n=3&pretty=true",
"result_histogram": {
"data": [
{
"v": [5289.0],
"ks": "❤"
},
{
"v": [3349.0],
"ks": "😂"
},
{
"v": [2428.0],
"ks": "🐱"
}
],
"total_hits": 570083
},
"request_id": "#qx0br7wh57pp#"
}
ks serves as string-based indicator of a bucket (Java source code encoded), while v represents the number of elements inside.
The number of total results is provided in total_hits.
Get the Top 3 regions
curl 'https://api.talkwalker.com/api/v1/search/histogram/source_region?access_token=demo&q=cats&top_n=3&pretty=true'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/source_region?access_token=demo&q=cats&top_n=3&pretty=true",
"result_histogram": {
"data": [
{
"v": [2006.0],
"key": {
"country_code": "us",
"region": "California",
"short_id": "california_us"
}
},
{
"v": [1599.0],
"key": {
"country_code": "ph",
"region": "Isabela (province)",
"short_id": "isabela_ph"
}
},
{
"v": [786.0],
"key": {
"country_code": "us",
"region": "New York",
"short_id": "newyork_us"
}
}
],
"total_hits": 17914
},
"request_id": "#qyc8lsx8lof6#"
}
region (inside key) serves as string-based indicator of a bucket, while v represents the number of elements inside.
The number of total results is provided in total_hits.
Get the Top 3 cities
curl 'https://api.talkwalker.com/api/v1/search/histogram/source_city?access_token=demo&q=cats&top_n=3&pretty=true'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/source_city?access_token=demo&q=cats&top_n=3&pretty=true",
"result_histogram": {
"data": [
{
"v": [1598.0],
"key": {
"country_code": "ph",
"region": "Isabela (province)",
"city": "Jones, Isabela",
"short_id": "jones_isabela_ph"
}
},
{
"v": [948.0],
"key": {
"country_code": "us",
"region": "California",
"city": "San Francisco",
"short_id": "sanfrancisco_california_us"
}
},
{
"v": [737.0],
"key": {
"country_code": "us",
"region": "South Dakota",
"city": "Sioux Falls, South Dakota",
"short_id": "siouxfalls_southdakota_us"
}
}
],
"total_hits": 12971
},
"request_id": "#qyc8uo6qqxiz#"
}
city (inside key) serves as string-based indicator of a bucket, while v represents the number of elements inside.
The number of total results is provided in total_hits.
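Since source_region and source_city return a structured key object rather than a plain ks string, a small helper can turn each key into a readable label. This is a sketch on a parsed response dict; location_counts and the label format are illustrative choices, not part of the API.

```python
def location_counts(response):
    """Map a readable location label to its document count."""
    counts = {}
    for entry in response["result_histogram"]["data"]:
        key = entry["key"]
        # "city" is only present for source_city; otherwise use "region".
        name = key.get("city", key["region"])
        label = f'{name} ({key["country_code"].upper()})'
        counts[label] = int(entry["v"][0])
    return counts


city_sample = {
    "result_histogram": {
        "data": [
            {"v": [1598.0], "key": {"country_code": "ph",
                                    "region": "Isabela (province)",
                                    "city": "Jones, Isabela",
                                    "short_id": "jones_isabela_ph"}},
            {"v": [948.0], "key": {"country_code": "us",
                                   "region": "California",
                                   "city": "San Francisco",
                                   "short_id": "sanfrancisco_california_us"}},
        ],
        "total_hits": 12971,
    }
}

region_sample = {
    "result_histogram": {
        "data": [
            {"v": [2006.0], "key": {"country_code": "us",
                                    "region": "California",
                                    "short_id": "california_us"}},
        ],
        "total_hits": 17914,
    }
}
```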
Distribution examples
Unlike Top N distributions, plain distributions need no additional parameter such as top_n. Other than that, both share the same structure:
- The total hit number is contained in the header
- The key is stored in the ks field
Get the sentiment distribution
curl 'https://api.talkwalker.com/api/v1/search/histogram/sentiment?access_token=demo&q=cats&pretty=true'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/sentiment?access_token=demo&q=cats&pretty=true",
"result_histogram": {
"data": [
{
"v": [327698.0],
"ks": "NEUTRAL"
},
{
"v": [149214.0],
"ks": "POSITIVE"
},
{
"v": [93485.0],
"ks": "NEGATIVE"
},
{
"v": [0.0],
"ks": "NONE"
}
],
"total_hits": 570397
},
"request_id": "#qx0hczxspbq8#"
}
Get the number of unique authors
While the number of unique authors is a single value, it shares the same structure as distributions, hence its categorization here.
curl 'https://api.talkwalker.com/api/v1/search/histogram/unique_author?access_token=demo&q=cats&pretty=true'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/unique_author?access_token=demo&q=cats&pretty=true",
"result_histogram": {
"data": [
{
"v": [306021.0],
"ks": "value"
}
],
"total_hits": 570392
},
"request_id": "#qx0hf66t2vwz#"
}
Rate Limit
This endpoint is limited to 60 calls per minute.
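One simple way to stay under this limit on the client side is to space calls at least one second apart. The class below is only a sketch of that idea, not something the API requires; its name and its injectable clock and sleep arguments are assumptions made here so the behaviour can be tested without waiting.

```python
import time


class MinIntervalThrottle:
    """Space calls so that at most calls_per_minute are issued per minute."""

    def __init__(self, calls_per_minute=60, clock=time.monotonic, sleep=time.sleep):
        self.min_gap = 60.0 / calls_per_minute
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until the next call is allowed, then record the call time."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_gap - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
```

Calling wait() before each request keeps consecutive requests at least 60 / calls_per_minute seconds apart. Even spacing is stricter than the stated limit, which trades some burst capacity for simplicity.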
Multiple query examples
It is possible to enter multiple queries when working with histograms.
All histogram types are compatible with multiple queries.
The result structure is similar to that of a histogram with breakdown parameter: query contains all entered queries, while v and in some cases val contain the respective results in the same order.
Simple example
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&q=cats&q=dogs'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/histogram/published?access_token=demo&q=cats&q=dogs&pretty=true",
"result_histogram" : {
"header" : {
"v" : [ "Number Results" ]
},
"data" : [ {
"t" : 1577923200000,
"v" : [ 123.0, 456.0 ]
} ...
// truncated
{
"t" : 1578528000000,
"v" : [ 789.0, 1011.0 ]
} ],
"query" : [ "cats", "dogs" ]
}
}
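The sketch below shows two things under the assumptions of this example: how a query string with a repeated q parameter can be encoded, and how the positional values in v can be regrouped into one series per query. series_per_query is a hypothetical helper and the sample mirrors the response above.

```python
from urllib.parse import urlencode

# A list of pairs keeps repeated keys, producing q=cats&q=dogs.
query_string = urlencode([("access_token", "demo"), ("q", "cats"), ("q", "dogs")])


def series_per_query(response):
    """Regroup a multi-query histogram into {query: [(t, value), ...]}."""
    result = response["result_histogram"]
    queries = result["query"]
    series = {q: [] for q in queries}
    for entry in result["data"]:
        # v holds one value per query, in the same order as "query".
        for q, value in zip(queries, entry["v"]):
            series[q].append((entry["t"], value))
    return series


sample = {
    "result_histogram": {
        "header": {"v": ["Number Results"]},
        "data": [
            {"t": 1577923200000, "v": [123.0, 456.0]},
            {"t": 1578528000000, "v": [789.0, 1011.0]},
        ],
        "query": ["cats", "dogs"],
    }
}

series = series_per_query(sample)
```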
Top N example
Top N histogram types like language can also be used with multiple queries.
The result contains all top N values for the different queries, which may be more than N in total, as shown in the example below.
curl 'https://api.talkwalker.com/api/v1/search/histogram/language?access_token=demo&q=dogs&q=cats&top_n=3'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/histogram/language?access_token=demo&q=dogs&q=cats&pretty=true",
"result_histogram" : {
"data" : [
{
"v" : [ 377.0, 483.0 ],
"ks" : "en"
}, {
"v" : [ 215.0, 251.0 ],
"ks" : "ja"
}, {
"v" : [ -1.0, 123.0 ],
"ks" : "ar"
}, {
"v" : [ 87.0, -1.0 ],
"ks" : "es"
}
],
"total_hits" : 1700,
"total_query_hits" : [ 721, 1234 ],
"query" : [ "dogs", "cats" ]
}
}
Searching for the top 3 languages for each query, the third place is different for both, so the result contains 4 keys in total.
A -1.0 marker is placed where one query does not have a value belonging to the top N.
The results are sorted by the sum of the values, not taking the -1.0 markers into consideration.
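When processing such a result, the -1.0 markers are best replaced by an explicit "no value" before doing any arithmetic. The sketch below assumes that each bucket's v holds one value per query in the order given by query, as in the simple multi-query example; top_n_by_query and rank_keys are hypothetical helper names.

```python
def top_n_by_query(response, missing=-1.0):
    """Return {key: {query: value or None}}, mapping the -1.0 marker to None."""
    result = response["result_histogram"]
    queries = result["query"]
    table = {}
    for entry in result["data"]:
        table[entry["ks"]] = {
            q: (None if value == missing else value)
            for q, value in zip(queries, entry["v"])
        }
    return table


def rank_keys(table):
    """Order keys by the sum of their values, ignoring missing entries."""
    def total(values):
        return sum(v for v in values.values() if v is not None)
    return sorted(table, key=lambda k: total(table[k]), reverse=True)


sample = {
    "result_histogram": {
        "data": [
            {"v": [377.0, 483.0], "ks": "en"},
            {"v": [215.0, 251.0], "ks": "ja"},
            {"v": [-1.0, 123.0], "ks": "ar"},
            {"v": [87.0, -1.0], "ks": "es"},
        ],
        "query": ["dogs", "cats"],
    }
}

table = top_n_by_query(sample)
```

Ranking the sample with rank_keys reproduces the order of the response, which is consistent with the sorting rule described above.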
Multiple queries + breakdown example
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&q=dogs&q=cats&breakdown=sentiment'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/histogram/published?access_token=demo&q=dogs&q=cats&pretty=true",
"result_histogram" : {
"header" : {
"v" : [ "POSITIVE", "NEUTRAL", "NEGATIVE" ]
},
"data" : [ {
"t" : 1577923200000,
"v" : [ 41.0, 38.0, 44.0, 152.0, 140.0, 164.0 ]
} ...
// truncated
{
"t" : 1578528000000,
"v" : [ 200.0, 250.0, 339.0, 333.0, 350.0, 328.0 ]
} ],
"query" : [ "dogs", "cats" ]
}
}
The content of v is ordered by query first and breakdown second.
For two queries q1, q2 and a breakdown over sentiment, this results in the following ordering:
[ <q1, POSITIVE>, <q1, NEUTRAL>, <q1, NEGATIVE>, <q2, POSITIVE>, <q2, NEUTRAL>, <q2, NEGATIVE> ]
As an illustration, compare the values for t = 1577923200000 in the table below with the example above.
Sentiment | Dogs | Cats |
---|---|---|
Positive | 41.0 | 152.0 |
Neutral | 38.0 | 140.0 |
Negative | 44.0 | 164.0 |
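The same regrouping can be done in code by slicing the flat v list into one chunk per query. This is a minimal sketch of the ordering rule described above; split_by_query is a hypothetical helper name and the input values are those of the first bucket in the example.

```python
def split_by_query(v, queries, breakdown_labels):
    """Split a flat v list (query first, breakdown second) into nested dicts."""
    n = len(breakdown_labels)
    if len(v) != len(queries) * n:
        raise ValueError("v does not match the number of queries and breakdown values")
    return {
        q: dict(zip(breakdown_labels, v[i * n:(i + 1) * n]))
        for i, q in enumerate(queries)
    }


first_bucket = split_by_query(
    [41.0, 38.0, 44.0, 152.0, 140.0, 164.0],
    ["dogs", "cats"],
    ["POSITIVE", "NEUTRAL", "NEGATIVE"],
)
```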