Fields
field_name | datatype | deprecated³ | accepted data format | required | writable | default | Example |
---|---|---|---|---|---|---|---|
url | string | no | url¹ | yes | yes | - | "url": "http://www.example.com/example.html" |
published | long | no | timestamp in ms | yes | yes | - | "published": 1392821902000 |
title | string | no | <500 chars | no | yes | - | "title": "Lorem ipsum dolor" |
content | string | no | <50,000 chars | yes | yes | - | "content": "Lorem ipsum dolor sit amet, consectetur adipiscing elit" |
indexed | long | no | - | no | no | - | "indexed": 1392821902000 |
search_indexed | long | no | - | no | no | - | "search_indexed": 1392821902000 |
title_snippet | string | no | - | no | no | - | "title_snippet": "Lorem ipsum dolor" |
content_snippet | string | no | - | no | no | - | "content_snippet": "Lorem ipsum dolor sit amet, consectetur adipiscing elit" |
root_url | string | no | - | no | no | extracted from url | "root_url": "https://www.example.com/" |
domain_url | string | no | - | no | no | extracted from url | "domain_url": "http://example.com/" |
host_url | string | no | - | no | no | extracted from url | "host_url": "http://www.example.com/" |
parent_url | string | no | - | no | yes | - | "parent_url": "https://www.example.com/example.html" |
lang | string | no | 2 char iso | no | yes | detected from content | "lang": "de" |
porn_level | integer | no | 0..100 | no | yes | - | "porn_level": 25 |
fluency_level | integer | yes | 0..100 | no | yes | - | "fluency_level": 42 |
spam_level | integer | yes | 0..100 | no | yes | - | "spam_level": 31 |
noise_level | integer | no | 0..100 | no | yes | - | "noise_level": 77 |
noise_category | string | no | see list² | no | yes | - | "noise_category": "promotions" |
sentiment | integer | no | -5..5 | no | yes | 0 | "sentiment": -2 |
reach | integer | no | >0 | no | yes | - | "reach": 31415926 |
engagement | integer | no | >0 | no | yes | - | "engagement": 271828 |
rating | integer | no | 0..10 | no | yes | - | "rating": 7 |
fakenews_level | integer | no | 0..100 | no | yes | - | "fakenews_level": 77 |
provider | string | no | a-z0-9_ <100 chars | no | yes | - | "provider": "company X" |
source_type | list of string | no | see list² | no | yes | "OTHER" | "source_type": "ONLINENEWS_NEWSPAPER" |
post_type | list of string | no | see list² | no | yes | "TEXT" | "post_type": "TEXT" |
cluster_id | string | no | - | no | no | - | - |
meta_cluster_id | string | no | - | no | no | - | - |
tags_internal | list of string | no | - | no | no | - | "tags_internal": ["hasComment", "hasImage"] |
tags_marking | list of string | no | see list² | no | yes | - | "tags_marking": ["important", "read"] |
tags_customer | list of string | no | see⁴ | no | yes | - | "tags_customer": ["tag1", "tag2"] |
tags_plugin | list of string | no | see⁴ | no | yes | - | "tags_plugin": ["tag1", "tag2"] |
matched_query | string | no | no | no | - | ||
matched_profile | string | no | no | no | - | ||
images | list of image | no | image object | no | see below | - | |
videos | list of video | no | video object | no | see below | - | |
article_extended_attributes | article_extended_attributes | no | article_extended_attributes object | no | see below | - | |
source_extended_attributes | source_extended_attributes | no | source_extended_attributes object | no | see below | - | |
extra_article_attributes | extra_article_attributes | no | extra_article_attributes object | no | see below | - | |
extra_author_attributes | extra_author_attributes | no | extra_author_attributes object | no | see below | - | |
extra_source_attributes | extra_source_attributes | no | extra_source_attributes object | no | see below | - | |
customer_entities | list of customer_entity | no | customer_entities object | no | see below | - | |
entity_url | list of entities | no | - | no | no | - | "entity_url": [{"url": "pic.twitter.com/ex1"}, {"url": "https://twitter.com/ex2"}] |
word_count | integer | no | >0 | no | no | - | "word_count": 664 |
copyright | string | no | - | no | yes | - | "copyright": "Copyright 2019, example.com, All Rights Reserved." |
See the chapter on Protocols, Encodings and Value Field Options for possible values for the fields `sourcetype`, `lang`, or `geo`.
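The constraints in the table can be mirrored in a client-side pre-check before a document is sent. The Python sketch below is illustrative only: the names `to_epoch_ms` and `validate_document` are hypothetical, it covers just a subset of the table (required, read-only and deprecated fields, the `title`/`content` length limits, and a few integer ranges), and the backend remains the authority on what it accepts.

```python
from datetime import datetime, timezone

# Hypothetical pre-check mirroring a subset of the documented field table.
REQUIRED = ("url", "published", "content")
READ_ONLY = {
    "indexed", "search_indexed", "title_snippet", "content_snippet",
    "root_url", "domain_url", "host_url", "cluster_id", "meta_cluster_id",
    "tags_internal", "matched_query", "matched_profile", "entity_url", "word_count",
}
DEPRECATED = {"fluency_level", "spam_level"}
MAX_LENGTH = {"title": 500, "content": 50_000}  # documented as "<500" and "<50,000" chars
INT_RANGES = {
    "porn_level": (0, 100), "noise_level": (0, 100), "fakenews_level": (0, 100),
    "sentiment": (-5, 5), "rating": (0, 10),
}


def to_epoch_ms(dt: datetime) -> int:
    """Convert a timezone-aware datetime to the millisecond timestamp used by `published`."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return round(dt.timestamp() * 1000)


def validate_document(doc: dict) -> list:
    """Return a list of problems; an empty list means the pre-check passed."""
    problems = []
    for name in REQUIRED:
        if name not in doc:
            problems.append(f"missing required field: {name}")
    for name in doc:
        if name in READ_ONLY:
            problems.append(f"field is not writable: {name}")
        elif name in DEPRECATED:
            problems.append(f"field is deprecated and ignored: {name}")
    for name, limit in MAX_LENGTH.items():
        if name in doc and len(doc[name]) >= limit:
            problems.append(f"{name} must be shorter than {limit} characters")
    for name, (low, high) in INT_RANGES.items():
        if name in doc and not (isinstance(doc[name], int) and low <= doc[name] <= high):
            problems.append(f"{name} must be in {low}..{high}")
    return problems


example = {
    "url": "http://www.example.com/example.html",
    "published": to_epoch_ms(datetime(2014, 2, 19, 14, 58, 22, tzinfo=timezone.utc)),
    "content": "Lorem ipsum dolor sit amet",
    "sentiment": -2,
    "word_count": 664,  # read-only, so the pre-check flags it
}
print(validate_document(example))  # ['field is not writable: word_count']
```

Keeping the rules as plain data (sets and dicts) makes it easy to update the check when new fields are added.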
¹ Cannot be changed after the document has been created.
² See the list of value options.
³ Values of deprecated fields are no longer used by the backend. These fields may be removed in a future release.
⁴ `tags_customer`: `a-zA-Z0-9-_` or space; supports hierarchy using `/`; can only be set in project-specific documents, not in general document import. `tags_plugin`: must be in the form `<vendor_id>_<vendor_field>:<value>`.
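Footnote ⁴ can be expressed as a small format check. This Python sketch rests on assumptions the documentation does not settle: empty hierarchy segments (such as `a//b`) are rejected here, and `vendor_id` is assumed to be alphanumeric with no underscore, so the first `_` separates it from `vendor_field`. The helper names and the vendor `acme` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Allowed characters per footnote 4; "/" separates hierarchy levels.
# Assumption: every "/"-separated segment must be non-empty.
CUSTOMER_TAG = re.compile(r"[A-Za-z0-9_\- ]+(?:/[A-Za-z0-9_\- ]+)*")

# Shape <vendor_id>_<vendor_field>:<value>.
# Assumption: vendor_id has no "_" and vendor_field has no ":"; value may contain anything.
PLUGIN_TAG = re.compile(r"(?P<vendor_id>[A-Za-z0-9]+)_(?P<vendor_field>[^:]+):(?P<value>.+)")


def is_valid_customer_tag(tag: str) -> bool:
    """True if the whole tag matches the customer tag character rules."""
    return CUSTOMER_TAG.fullmatch(tag) is not None


def parse_plugin_tag(tag: str) -> Optional[Tuple[str, str, str]]:
    """Split a plugin tag into (vendor_id, vendor_field, value), or None if malformed."""
    m = PLUGIN_TAG.fullmatch(tag)
    return (m["vendor_id"], m["vendor_field"], m["value"]) if m else None


print(is_valid_customer_tag("brand/product line"))  # True
print(parse_plugin_tag("acme_sentiment:positive"))  # ('acme', 'sentiment', 'positive')
```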
Evolution and stability of document fields
The structure of the documents will not be changed: existing fields will not be removed, and their formatting will not be changed. Occasionally, new fields will be added to the documents and the order of fields may change; take this into account when implementing a custom client.
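In practice this means a client should look fields up by name and ignore keys it does not recognize. A minimal Python sketch of such a tolerant reader follows; the `Document` class and its selection of fields are hypothetical, and only the standard library `json` and `dataclasses` modules are used.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    # Hypothetical subset of fields this client uses; all others are ignored.
    url: str
    published: int
    content: str
    title: Optional[str] = None
    lang: Optional[str] = None
    tags_customer: Optional[List[str]] = None


KNOWN = {f.name for f in fields(Document)}


def parse_document(raw: str) -> Document:
    data: Dict[str, Any] = json.loads(raw)
    # Select by key name (order-independent) and drop fields added after this client was written.
    return Document(**{k: v for k, v in data.items() if k in KNOWN})


raw = ('{"future_field": true, "content": "Lorem ipsum", '
       '"published": 1392821902000, "url": "http://www.example.com/example.html"}')
doc = parse_document(raw)
print(doc.content, doc.title)  # Lorem ipsum None
```

Because unknown keys are filtered out rather than rejected, a newly added field does not break parsing, and the key order in the JSON has no effect.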