API Documentation

Our REST API is a suite of artificial intelligence and blockchain-powered solutions for analyzing and extracting information from unstructured text, video and image data.

This documentation helps you start working with the API and describes the available API methods and their options.

Endpoint

The main endpoint for all API calls:

https://www.summarizebot.com/api/

API Key

To use our API you will need an API key. Please register to get a personal API key for a 14-day trial period.

You should add your API key as a parameter to every request sent to our API:

[main endpoint]/[method]?apiKey=[api key]

Get Started

Once you have your personal API key, you can use the API in the following way:

Select the API method you are interested in from this documentation

Send HTTP GET or POST requests to the main endpoint. For example, the full URL for a document summarization call would be: https://www.summarizebot.com/api/summarize?[options]

You can also test-drive our API methods by importing the Postman collection below. This is a quick and easy way to become familiar with the SummarizeBot API and how it works.

Usage Examples

URLs Processing

You can use the following Python code to process weblinks:

```python
import requests

# API URL
# You can change 'summarize' to a different endpoint: sentiment, keywords, etc.
api_url = "https://www.summarizebot.com/api/summarize?apiKey=YOUR_API_KEY&size=20&keywords=10&fragments=15&url=URL_FOR_PROCESSING"
r = requests.get(api_url)
json_res = r.json()
print(json_res)
```

cURL request:

```shell
curl -X GET "https://www.summarizebot.com/api/summarize?apiKey=YOUR_API_KEY&size=20&keywords=10&fragments=15&url=URL_FOR_PROCESSING"
```

Files Processing

To process files, use our POST API endpoints. The POST body should be specified as 'application/octet-stream' and include the file content in binary form. In Python you can use the following code:

```python
import requests

# Read binary data from the file
with open('test.txt', mode='rb') as file:
    post_body = file.read()

# API URL
# You can change 'summarize' to a different endpoint: sentiment, keywords, etc.
api_url = "https://www.summarizebot.com/api/summarize?apiKey=YOUR_API_KEY&size=20&keywords=10&fragments=15&filename=test.txt"

# HTTP header
header = {'Content-Type': 'application/octet-stream'}
r = requests.post(api_url, headers=header, data=post_body)
json_res = r.json()
print(json_res)
```

cURL request (the URL is quoted so the shell does not interpret the '&' characters):

```shell
curl -H "Content-Type:application/octet-stream" --data-binary @test.txt "https://www.summarizebot.com/api/summarize?apiKey=YOUR_API_KEY&size=20&keywords=10&fragments=15&filename=test.txt"
```

Plain Text Processing

To process text strings, represent them as binary data (bytes) and send the bytes as the body of a POST request. In Python you can use the following code:

```python
import requests

# Text for processing in UTF-8 encoding
text_for_processing = u"Planet has only until 2030 to stem catastrophic climate change, experts warn."

# Create a bytes representation of the text
post_body = text_for_processing.encode('utf-8')

# API URL
# You can change 'summarize' to a different endpoint: sentiment, keywords, etc.
api_url = "https://www.summarizebot.com/api/summarize?apiKey=YOUR_API_KEY&size=20&keywords=10&fragments=15&filename=1.txt"

# HTTP header
header = {'Content-Type': 'application/octet-stream'}
r = requests.post(api_url, headers=header, data=post_body)
json_res = r.json()
print(json_res)
```

cURL request:

```shell
curl -H "Content-Type:application/octet-stream" --data "Planet has only until 2030 to stem catastrophic climate change, experts warn." "https://www.summarizebot.com/api/summarize?apiKey=YOUR_API_KEY&size=20&keywords=10&fragments=15&filename=1.txt"
```

Error Codes

The API methods may return the following errors:

400 - bad request

401 - API key is invalid or expired

402 - maximum file size limit is exceeded

403 - Content-Type header isn't specified as 'application/octet-stream'

404 - Content-Type header isn't specified as 'application/json'

429 - too many requests (rate limit exceeded)

500 - internal server error
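The status codes above can be wrapped in a small client-side helper so that calling code raises a descriptive error instead of inspecting raw codes. This is a minimal sketch, not part of the API itself; the message table simply mirrors the list above:

```python
# Documented SummarizeBot error codes mapped to human-readable messages.
ERROR_MESSAGES = {
    400: "bad request",
    401: "API key is invalid or expired",
    402: "maximum file size limit is exceeded",
    403: "Content-Type header isn't specified as 'application/octet-stream'",
    404: "Content-Type header isn't specified as 'application/json'",
    429: "too many requests (rate limit exceeded)",
    500: "internal server error",
}

def check_status(status_code):
    """Pass through 200; raise RuntimeError for any documented error code."""
    if status_code == 200:
        return
    message = ERROR_MESSAGES.get(status_code, "unknown error")
    raise RuntimeError(f"SummarizeBot API error {status_code}: {message}")
```

A caller would invoke `check_status(r.status_code)` right after each `requests.get` or `requests.post` call.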

Language Support

Document summarization and keywords extraction are available for almost every language, including English, Chinese, Russian, Japanese, Arabic, German, Spanish, French, Portuguese, etc. Please see the full list here.

The sentiment analysis method supports English, French, German, Italian, Portuguese, Spanish and Russian.

The named entity recognition method supports major European and Asian languages, including English, French, German, Italian, Portuguese, Spanish, Russian, Japanese, etc.

The fake news detection method supports English only.

For audio recognition the API supports the following languages: English, Russian, Chinese, French, German, Italian, Spanish, Japanese, Swedish, Finnish, Arabic.

For text extraction from images the API supports the following languages: English, Latvian, French, German, Russian, Italian, Dutch, Spanish, Portuguese, Swedish, Finnish.

File Formats

The text analysis API methods support most text, image and audio formats: .html, .pdf, .doc, .docx, .csv, .eml, .epub, .gif, .jpg, .jpeg, .mp3, .msg, .odt, .ogg, .png, .pptx, .ps, .rtf, .tiff, .tif, .txt, .wav, .xlsx, .xls, .psv, .tsv, .tff, .aif, .aiff, .avr, .cdr, .wv, .au, .flac, .snd, .vox.

The article extraction and language detection methods can only process text files and scanned documents (e.g. PDF files with images). The video identification and comments extraction features deal only with hypertext files (.html, .xml, etc.).

Summarization

The summarization method automatically extracts the most important information, keywords and keyphrases from weblinks, documents, audio files and images. With the help of the summarization API you can create general or topic-oriented summaries for different domains. Just add the 'domain' option with a specific value to your request, and the output summary will consist of the sentences most relevant to the given domain.

Supported Domains

The summarization API supports the following domains: accounting, agriculture, art, automotive, beauty, business, construction, culture, demographics, economics, education, electronics, energy, environment, european_union, finance, fisheries, foods, forestry, gardening, geography, healthcare, human_resources, industries, insurance, intellectual_property, international_organizations, international_relations, investments, it, legal, literature, management, marketing, parliament, pets, politics, production, religion, science, social_issues, sports, taxes, technology, trade, transportation_and_cargo, travel, weather.

Caution: The language of text documents is detected automatically. For audio files and images it must be specified for each request. If the language value is undefined, the default language for audio and image processing is English.

Create a summary from weblinks

GET /summarize

Summarize a file from a given url.

Example URI: GET https://www.summarizebot.com/api/summarize

URI Parameters

apiKey (string, required) - API key
size (integer, optional, default = 16) - Summary length as a percentage of the original document
url (string, required) - Article or web page url
keywords (integer, optional, default = 10) - Maximum count of keywords to return
fragments (integer, optional, default = 15) - Maximum count of key fragments to return
domain (string, optional) - Domain identifier for topic-oriented summarization
language (string, optional for text files, required for audio files and images) - The language of text files is detected automatically. For audio files it should be specified from the list of supported languages, e.g. language=German.
isocr (boolean, optional, default = false) - Use optical character recognition for processing PDF documents with images. If isocr is set to true, the document language should be specified from the list of supported languages, e.g. language=English (see the Language Support section for more details).

Response 200 (Content-Type: application/json):

```json
[
  {
    "summary": [
      {
        "id": 0,
        "weight": 2.43,
        "sentence": "Artificial intelligence (AI, also machine intelligence, MI) is intelligence displayed by machines, in contrast with the natural intelligence (NI) displayed by humans and other animals."
      },
      {
        "id": 1,
        "weight": 2.04,
        "sentence": "AI research is defined as the study of \"intelligent agents\": any device that perceives its environment and takes actions that maximize its chance of success at some goal."
      }
    ]
  },
  {
    "keywords": [
      { "keyword": "artificial intelligence", "weight": 0.87, "ids": [1, 6] },
      { "keyword": "machines", "weight": 0.71, "ids": [0, 4] }
    ]
  },
  {
    "fragments": [
      { "fragment": "optical character recognition", "ids": [5], "weight": 0.15 }
    ]
  }
]
```

Create a summary from binary data

POST /summarize

Summarize a file from binary data. The POST body should include the file content in binary form. The Content-Type header should be 'application/octet-stream'.

Example URI: POST https://www.summarizebot.com/api/summarize

URI Parameters

apiKey (string, required) - API key
size (integer, optional, default = 16) - Summary length as a percentage of the original document
filename (string, required) - Name of the file, e.g. filename=1.pdf
keywords (integer, optional, default = 10) - Maximum count of keywords to return
fragments (integer, optional, default = 15) - Maximum count of key fragments to return
domain (string, optional) - Domain identifier for topic-oriented summarization
language (string, optional for text files, required for audio files and images) - The language of text files is detected automatically. For audio files it should be specified from the list of supported languages, e.g. language=German.
isocr (boolean, optional, default = false) - Use optical character recognition for processing PDF documents with images. If isocr is set to true, the document language should be specified from the list of supported languages, e.g. language=English (see the Language Support section for more details).

Response 200 (Content-Type: application/json): the response body schema is identical to the GET /summarize response above.
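As a minimal sketch of consuming the /summarize response shape documented above, the helper below (an illustrative client-side function, not part of the API) re-orders the summary sentences by their id and joins them into plain text:

```python
def summary_text(response_json):
    """Join the sentences of a /summarize response into plain text.

    The response is a list of blocks; the block holding a 'summary' key
    contains the ranked sentences, each with an 'id' giving its position
    in the source document."""
    for block in response_json:
        if "summary" in block:
            ordered = sorted(block["summary"], key=lambda s: s["id"])
            return " ".join(s["sentence"] for s in ordered)
    return ""

# A toy response in the documented shape (sentences deliberately unordered).
sample = [
    {"summary": [
        {"id": 1, "weight": 2.04, "sentence": "Second sentence."},
        {"id": 0, "weight": 2.43, "sentence": "First sentence."},
    ]},
    {"keywords": [{"keyword": "ai", "weight": 0.9, "ids": [0]}]},
]
print(summary_text(sample))  # First sentence. Second sentence.
```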

Sentiment Analysis

The sentiment analysis method analyzes text and returns the sentiment as positive, negative or neutral. Additionally, it provides an overall score of the aggregate sentiment for the entire text and a list of aspects mentioned in the document (negative or positive words and phrases).

Sentiment analysis API identifies user sentiment not only on document-level, but also detects sentence-level and object-level sentiment. With the help of sentiment analysis API you can correctly detect concrete sentiment objects and opinion phrases and understand the meaning of user reviews.

Caution The sentiment analysis method is available for English, French, German, Italian, Portuguese, Spanish and Russian languages.

Analyze sentiment from weblinks

GET /sentiment

Analyze text from a given url for positive or negative sentiment.

Example URI: GET https://www.summarizebot.com/api/sentiment

URI Parameters

apiKey (string, required) - API key
url (string, required) - Article or web page url
language (string, optional) - Document language in the ISO 639-1 format. If the language value is undefined, the document language will be detected automatically.

Response 200 (Content-Type: application/json):

```json
[
  {
    "document sentiment": { "polarity": "negative", "weight": -1.99 }
  },
  {
    "sentiment aspects": [
      {
        "features": [
          {
            "polarity": "negative",
            "weight": -0.5,
            "sentiment object": { "start offset": 0, "object": "The burger", "end offset": 10 },
            "start offset": 15,
            "end offset": 28,
            "phrase": "uncooked , raw"
          },
          {
            "polarity": "negative",
            "phrase": "left",
            "weight": -0.56,
            "start offset": 34,
            "end offset": 38
          },
          {
            "polarity": "negative",
            "weight": -0.64,
            "sentiment object": { "start offset": 76, "object": "person", "end offset": 82 },
            "start offset": 71,
            "end offset": 75,
            "phrase": "poor"
          },
          {
            "polarity": "negative",
            "phrase": "be severely poisoned",
            "weight": -0.5,
            "start offset": 94,
            "end offset": 114
          }
        ],
        "sentence": "The burger was uncooked, raw, but left out in the sun waiting for some poor person to eat and be severely poisoned."
      }
    ]
  }
]
```

Analyze sentiment from binary data

POST /sentiment

Analyze text from binary data for positive or negative sentiment. The POST body should include the file content in binary form. The Content-Type header should be 'application/octet-stream'.

Example URI: POST https://www.summarizebot.com/api/sentiment

URI Parameters

apiKey (string, required) - API key
filename (string, required) - Name of the file, e.g. filename=1.html
language (string, optional) - Document language in the ISO 639-1 format. If the language value is undefined, the document language will be detected automatically.

Response 200 (Content-Type: application/json): the response body schema is identical to the GET /sentiment response above.
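One common use of the aspect list documented above is surfacing the strongest opinion in a review. The helper below is an illustrative sketch (not part of the API) that scans all 'sentiment aspects' blocks and returns the opinion phrase with the lowest, i.e. most negative, weight:

```python
def worst_aspect(response_json):
    """Return the most negative opinion phrase in a /sentiment response.

    Collects every feature from every 'sentiment aspects' block and picks
    the one with the minimum weight; returns None if no aspects exist."""
    features = []
    for block in response_json:
        for aspect in block.get("sentiment aspects", []):
            features.extend(aspect["features"])
    if not features:
        return None
    return min(features, key=lambda f: f["weight"])["phrase"]

# A toy response following the documented shape.
sample = [
    {"document sentiment": {"polarity": "negative", "weight": -1.99}},
    {"sentiment aspects": [
        {"features": [
            {"polarity": "negative", "weight": -0.5, "phrase": "uncooked , raw"},
            {"polarity": "negative", "weight": -0.64, "phrase": "poor"},
        ],
         "sentence": "The burger was uncooked, raw."},
    ]},
]
print(worst_aspect(sample))  # poor
```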

News Aggregation

The news aggregation method returns news headlines and searches for articles from over 50,000 sources. Retrieval results include details such as the main image of the news article, the article title and direct url, the publication date, and a relevancy score for the search request.

The news API endpoints support 100+ languages, specified in the ISO 639-1 format.

Thousands of news sources have been indexed and analyzed by our custom artificial intelligence modules to provide high search accuracy in natural language mode.

Return latest news for a specific language

GET /news

Return live and top news for different languages.

Example URI: GET https://www.summarizebot.com/api/news

URI Parameters

apiKey (string, required) - API key
language (string, optional, default = en) - Language code in the ISO 639-1 format
count (integer, optional, default = 10, maximum value = 50) - Maximum count of news items to return

Response 200 (Content-Type: application/json):

```json
{
  "results": [
    {
      "url": "https://www.theaustralian.com.au/sport/cricket/jaques-was-last-man-standing-but-a-nsw-pedigree-hard-to-go-past/news-story/86a3ed596aa5766bfb562f912dfa227e",
      "publication_date": "2018-05-29 14:05:46",
      "image_url": "https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcR-v0h1BL_w2ILuDVC07L926nHGIIxb8bGWYdwZAh8K6UJsu-DqnTJ7b9Z1cFLZRQqWHjGPXrNInQ",
      "language": "en",
      "title": "Jaques was last man standing but a NSW pedigree hard to go past"
    },
    {
      "url": "https://gulfnews.com/sport/uae/football/own-goal-sinks-defending-champions-al-taher-1.2228752",
      "publication_date": "2018-05-29 14:00:50",
      "image_url": "https://static.gulfnews.com/polopoly_fs/1.2228830!/image/4040701382.jpg_gen/derivatives/box_460346/4040701382.jpg",
      "language": "en",
      "title": "Own goal sinks defending champions Al Taher"
    },
    {
      "url": "https://www.forbes.com/sites/robinandrews/2018/05/29/this-is-why-han-solo-may-owe-his-life-to-a-polish-donut/",
      "publication_date": "2018-05-29 14:00:00",
      "image_url": "https://blogs-images.forbes.com/robinandrews/files/2018/05/PIA22085large-1200x675.jpg?width=0&height=600",
      "language": "en",
      "title": "This Is Why Han Solo May Owe His Life To A Polish Donut"
    }
  ]
}
```

Search news articles based on a specific query for different languages

POST /news

Returns a list of news articles relevant to the query. The POST body should include the query in JSON format, e.g. { "query": "Donald Trump" }. The Content-Type header should be 'application/json'.

Example URI: POST https://www.summarizebot.com/api/news

URI Parameters

apiKey (string, required) - API key
language (string, optional, default = en) - Language code in the ISO 639-1 format
count (integer, optional, default = 10, maximum value = 50) - Maximum count of news items to return

Response 200 (Content-Type: application/json):

```json
{
  "results": [
    {
      "language": "en",
      "title": "Diplomatic duels: What now for the Donald Trump-Kim Jong Un summit?",
      "url": "https://economictimes.indiatimes.com/news/defence/diplomatic-duels-what-now-for-the-dinald-trump-kim-jong-un-summit/articleshow/64351498.cms",
      "score": 13.17083740234375,
      "image_url": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSv0-NPFkf98_pIa9-1aUMeCksBDD7GdPrN4RdWziokhu1kb1yk7EmtyRlozeQgOMT6bqRIq7yr_0U",
      "publication_date": "2018-05-28 06:59:00"
    },
    {
      "language": "en",
      "title": "US Team In North Korea For Summit Talks, Says Donald Trump",
      "url": "https://www.ndtv.com/world-news/us-team-in-north-korea-for-summit-talks-says-donald-trump-1858532",
      "score": 12.415493965148926,
      "image_url": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTDk0Idr_6tGHP5Ur7U1ZXZxICebGR0K-2kTcWtgJ589b_hLb1BvBIV7dJCbw_wLbgp8oXbyXUPUhU",
      "publication_date": "2018-05-28 05:17:44"
    },
    {
      "language": "en",
      "title": "Donald Trump Jr is in high political demand – for now",
      "url": "https://www.theguardian.com/us-news/2018/may/28/donald-trump-jr-high-demand-conservative-groups-wary",
      "score": 12.384574890136719,
      "image_url": "https://i.guim.co.uk/img/media/5394c2707b62a7a882047907cf3beab4a5e3d2a5/0_126_4200_2519/master/4200.jpg?w=140&q=55&auto=format&usm=12&fit=max&s=1697b507f5ae8b8f7eda9e3c91929d69",
      "publication_date": "2018-05-28 05:00:45"
    }
  ]
}
```
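Since the POST /news endpoint mixes URL parameters (apiKey, language, count) with a JSON body (the query), it is easy to get the two halves confused. The helper below is a minimal sketch of assembling such a request; `build_news_request` is an illustrative name, not part of the API:

```python
import json

def build_news_request(api_key, query, language="en", count=10):
    """Assemble URL, headers and JSON body for the POST /news search
    endpoint: options travel in the query string, the search query in
    the JSON body, and the Content-Type must be 'application/json'."""
    url = (
        "https://www.summarizebot.com/api/news"
        f"?apiKey={api_key}&language={language}&count={count}"
    )
    headers = {"Content-Type": "application/json"}
    body = json.dumps({"query": query})
    return url, headers, body
```

The returned triple can be passed straight to requests, e.g. `requests.post(url, headers=headers, data=body)`.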

Fake News Detection

The fake news detection method analyzes news articles to identify whether they are likely to be real news or not. With the help of custom AI classifiers, it can detect different types of fake information, such as propaganda, conspiracy, pseudoscience, bias and irony.

The news analysis algorithm uses a wide range of components to solve the fake news detection problem: custom machine learning models trained on fake and biased articles; proprietary multi-language summarization technology to extract only the important information and remove noise; historical news data search to check story relevancy and misleading facts; and a database of trusted and biased websites created by our experts.

Detect fake news from weblinks

GET /checkfake

Analyze news content from a given url and detect fake news.

Example URI: GET https://www.summarizebot.com/api/checkfake

URI Parameters

apiKey (string, required) - API key
url (string, required) - Article or web page url

Response 200 (Content-Type: application/json):

```json
{
  "predictions": [
    { "confidence": 0.36, "type": "real" },
    {
      "confidence": 0.64,
      "type": "fake",
      "categories": [
        { "confidence": 0.2, "type": "bias" },
        { "confidence": 0.1, "type": "conspiracy" },
        { "confidence": 0, "type": "propaganda" },
        { "confidence": 0.6, "type": "pseudoscience" },
        { "confidence": 0.1, "type": "irony" }
      ]
    }
  ]
}
```
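A client typically reduces the prediction list documented above to a single label. The sketch below (an illustrative helper, not part of the API; the 0.5 threshold is an arbitrary choice) labels an article 'fake' when the fake prediction's confidence exceeds a threshold and reports its strongest sub-category:

```python
def classify_article(response_json, threshold=0.5):
    """Reduce a /checkfake response to a (label, category) pair.

    Returns ('real', None) when the fake confidence is below the
    threshold; otherwise ('fake', <highest-confidence category>)."""
    fake = next(p for p in response_json["predictions"] if p["type"] == "fake")
    if fake["confidence"] < threshold:
        return "real", None
    top = max(fake["categories"], key=lambda c: c["confidence"])
    return "fake", top["type"]

# The example response from the documentation.
sample = {
    "predictions": [
        {"confidence": 0.36, "type": "real"},
        {"confidence": 0.64, "type": "fake", "categories": [
            {"confidence": 0.2, "type": "bias"},
            {"confidence": 0.1, "type": "conspiracy"},
            {"confidence": 0, "type": "propaganda"},
            {"confidence": 0.6, "type": "pseudoscience"},
            {"confidence": 0.1, "type": "irony"},
        ]},
    ]
}
print(classify_article(sample))  # ('fake', 'pseudoscience')
```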

Linguistic Processor

Linguistic Processor is a custom natural language processing solution for deep linguistic analysis of unstructured data that supports 39+ languages, covering all European, major Asian and Arabic languages. It automatically detects tokens and sentences, identifies part-of-speech (PoS) tags, lemmas and noun phrases, and extracts semantic relations for each sentence.

Linguistic analysis API performs the following steps of text analysis:

- the sentence and word segmentation stage transforms a text into a list of sentences and words with punctuation marks;

- the lemmatization stage canonicalizes words to their initial forms;

- the part-of-speech (POS) tagger annotates each word with a unique part-of-speech tag from the Penn Treebank tagset. The tagger is based on state-of-the-art machine learning algorithms and provides a high level of accuracy across languages;

- the chunker transforms the input sequence of tagged words into high-level word structures such as noun phrases, verb phrases, etc.;

- the semantic relations extraction stage automatically extracts semantic relations between detected word chunks, such as subject-verb(action)-object relations.

Extract linguistic analysis results from weblinks

GET /syntax

Extract linguistic analysis results from a given url.

Example URI: GET https://www.summarizebot.com/api/syntax

URI Parameters

apiKey (string, required) - API key
url (string, required) - Article or web page url
language (string, optional) - Document language in the ISO 639-1 format. If the language value is undefined, the document language will be detected automatically.

Response 200 (Content-Type: application/json):

```json
[
  {
    "tokens": [
      { "lemma": "culture", "tag": "NNP", "word": "Culture", "start offset": 0, "end offset": 7 },
      { "lemma": "minister", "tag": "NNP", "word": "Minister", "start offset": 8, "end offset": 16 },
      { "lemma": "alberto", "tag": "NNP", "word": "Alberto", "start offset": 17, "end offset": 24 },
      { "lemma": "bonisoli", "tag": "NNP", "word": "Bonisoli", "start offset": 25, "end offset": 33 },
      { "lemma": "describe", "tag": "VBD", "word": "described", "start offset": 34, "end offset": 43 },
      { "lemma": "the", "tag": "DT", "word": "the", "start offset": 44, "end offset": 47 },
      { "lemma": "finding", "tag": "NN", "word": "finding", "start offset": 48, "end offset": 55 },
      { "lemma": "as", "tag": "IN", "word": "as", "start offset": 56, "end offset": 58 },
      { "lemma": "a", "tag": "DT", "word": "a", "start offset": 59, "end offset": 60 },
      { "lemma": "discovery", "tag": "NN", "word": "discovery", "start offset": 61, "end offset": 70 },
      { "lemma": "that", "tag": "WDT", "word": "that", "start offset": 71, "end offset": 75 },
      { "lemma": "fill", "tag": "VBZ", "word": "fills", "start offset": 76, "end offset": 81 },
      { "lemma": "him", "tag": "PRP", "word": "him", "start offset": 82, "end offset": 85 },
      { "lemma": "with", "tag": "IN", "word": "with", "start offset": 86, "end offset": 90 },
      { "lemma": "pride", "tag": "NN", "word": "pride", "start offset": 91, "end offset": 96 },
      { "lemma": ".", "tag": ".", "word": ".", "start offset": 96, "end offset": 97 }
    ],
    "chunks": [
      { "chunk": "Culture Minister Alberto Bonisoli", "start index": 0, "end index": 4, "type": "NP", "chunk head": "Bonisoli" },
      { "chunk": "described", "start index": 4, "end index": 5, "type": "VP", "chunk head": "described" },
      { "chunk": "the finding", "start index": 5, "end index": 7, "type": "NP", "chunk head": "finding" },
      { "chunk": "a discovery", "start index": 8, "end index": 10, "type": "NP", "chunk head": "discovery" },
      { "chunk": "fills", "start index": 11, "end index": 12, "type": "VP", "chunk head": "fills" },
      { "chunk": "him", "start index": 12, "end index": 13, "type": "NP", "chunk head": "him" },
      { "chunk": "pride", "start index": 14, "end index": 15, "type": "NP", "chunk head": "pride" }
    ],
    "relations": [
      {
        "verb": { "phrase": "described", "start index": 4, "end index": 5 },
        "object": { "phrase": "the finding", "start index": 5, "end index": 7 },
        "subject": { "phrase": "Culture Minister Alberto Bonisoli", "start index": 0, "end index": 4 }
      }
    ],
    "sentence": "Culture Minister Alberto Bonisoli described the finding as a discovery that fills him with pride."
  }
]
```

Extract linguistic analysis results from binary data

POST /syntax

Extract linguistic analysis results from binary data. The POST body should include the file content in binary form. The Content-Type header should be 'application/octet-stream'.

Example URI: POST https://www.summarizebot.com/api/syntax

URI Parameters

apiKey (string, required) - API key
filename (string, required) - Name of the file, e.g. filename=1.pdf
language (string, optional) - Document language in the ISO 639-1 format. If the language value is undefined, the document language will be detected automatically.

Response 200 (Content-Type: application/json): the response body schema is identical to the GET /syntax response above.
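The 'relations' array documented above is the most directly usable output for downstream tasks. As a minimal sketch (an illustrative helper, not part of the API), the function below flattens a /syntax response into (subject, verb, object) string triples:

```python
def svo_triples(response_json):
    """Extract (subject, verb, object) phrase triples from the
    'relations' array of each sentence in a /syntax response."""
    triples = []
    for sentence in response_json:
        for rel in sentence.get("relations", []):
            triples.append((
                rel["subject"]["phrase"],
                rel["verb"]["phrase"],
                rel["object"]["phrase"],
            ))
    return triples

# One sentence of the documented example response.
sample = [
    {
        "relations": [
            {
                "verb": {"phrase": "described", "start index": 4, "end index": 5},
                "object": {"phrase": "the finding", "start index": 5, "end index": 7},
                "subject": {"phrase": "Culture Minister Alberto Bonisoli", "start index": 0, "end index": 4},
            }
        ],
        "sentence": "Culture Minister Alberto Bonisoli described the finding as a discovery that fills him with pride.",
    }
]
print(svo_triples(sample))
```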

Intent Analysis

The intent analysis method automatically classifies search keywords according to the user search intention. The method identifies the following search intent categories: Transactional, Commercial (Opinion/Quality), Commercial (Comparison), Commercial (Reviews/Complain), Informational, Navigational.

Most solutions available on the market identify only three categories of search intent: Transactional, Navigational and Informational. But marketers need more detail to make the right decisions, which is why our intent analysis API identifies six categories instead of three.

We’ve implemented a custom artificial intelligence classifier based on semantic features extracted from search keywords. Unlike competitors, we've trained the feature-based classification model on the output of our multilingual Linguistic Processor. Using the output of the semantics-oriented Linguistic Processor as input for the machine learning algorithms helped us significantly increase intent classification accuracy. Our feature-based AI classifier uses a wide range of linguistic (part-of-speech tags, lemmas), syntactic (lexical chunks), semantic (action-verb relations) and expert features (intent-relevant keywords and patterns: products, brand names, action words, sentiment/opinion keywords, specific keyword structures, etc.). It supports 35+ languages and achieves intent classification accuracy from 80% to 96% depending on the language.

Detect search intent of a keyword

POST /intents

Detect the intent of search keywords. The POST body should include JSON data in the following format: { "keywords": [ { "keyword": "search keyword", "id": "1" }, { "keyword": "search keyword", "id": "2" }, ... ] }, where "keyword" is the keyword text and "id" is a unique identifier. The maximum keyword count per request is 100. The Content-Type header should be 'application/json'.

Example URI: POST https://www.summarizebot.com/api/intents

URI Parameters

apiKey (string, required) - API key
language (string, optional) - Keywords language in the ISO 639-1 format. If the language value is undefined, the language of each keyword will be detected automatically with the help of our short text language identification method.

Response 200 (Content-Type: application/json):

```json
{
  "keywords": [
    { "category": "Transactional", "confidence": 0.75, "keyword": "online shopping clothes pakistan", "id": "1", "language": "en" },
    { "category": "Commercial>Comparison", "confidence": 0.9, "keyword": "what is the best shampoo", "id": "2", "language": "en" }
  ]
}
```
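Since the POST /intents body requires each keyword to carry a unique string id and is capped at 100 keywords, it is worth generating it programmatically. This is a minimal sketch; `build_intents_body` is an illustrative helper, not part of the API:

```python
import json

def build_intents_body(keywords):
    """Build the documented JSON body for POST /intents from a plain
    list of keyword strings, assigning sequential string ids.

    Enforces the documented limit of 100 keywords per request."""
    if len(keywords) > 100:
        raise ValueError("maximum 100 keywords per request")
    return json.dumps({
        "keywords": [
            {"keyword": kw, "id": str(i + 1)}
            for i, kw in enumerate(keywords)
        ]
    })
```

The result is sent with `requests.post(url, headers={"Content-Type": "application/json"}, data=body)`.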

Named Entity Recognition

The named entity extraction method automatically detects persons, companies, locations, organizations, addresses, phone numbers, emails, currencies, credit card numbers and various other types of entities in any type of text. It supports URL (GET) and file (POST) processing endpoints.

Our named entity detection algorithm combines deep neural network models with linguistic rules optimized for identification of entities in documents. It supports 20+ different languages and covers major European and Asian languages.

Extract named entities from weblinks

GET /entities

Extract named entities from a given url.

Example URI: GET https://www.summarizebot.com/api/entities

URI Parameters

apiKey (string, required) - API key
url (string, required) - Article or web page url
language (string, optional) - Document language in the ISO 639-1 format. If the language value is undefined, the document language will be detected automatically.

Response 200 (Content-Type: application/json):

```json
{
  "entities": {
    "persons": [
      {
        "entity": "Adam Mount",
        "offsets": [ { "start": 55, "end": 65 }, { "start": 223, "end": 226 } ]
      },
      {
        "entity": "Trump",
        "offsets": [ { "start": 0, "end": 5 }, { "start": 1445, "end": 1450 }, { "start": 2658, "end": 2663 } ]
      }
    ],
    "locations": [
      {
        "entity": "Seoul",
        "offsets": [ { "start": 3942, "end": 3947 }, { "start": 4144, "end": 4149 } ]
      }
    ]
  }
}
```

Extract named entities from binary data

POST /entities

Extract named entities from binary data. The POST body should include the file content in binary form. The Content-Type header should be 'application/octet-stream'.

Example URI: POST https://www.summarizebot.com/api/entities

URI Parameters

apiKey (string, required) - API key
filename (string, required) - Name of the file, e.g. filename=1.pdf
language (string, optional) - Document language in the ISO 639-1 format. If the language value is undefined, the document language will be detected automatically.

Response 200 (Content-Type: application/json): the response body schema is identical to the GET /entities response above.
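Because each entity in the response above carries one character-offset pair per mention, counting offsets gives a quick salience measure. The sketch below is an illustrative helper (not part of the API) that counts mentions per entity across all entity types:

```python
def mention_counts(response_json):
    """Count mentions per entity across all entity types in an
    /entities response, using the length of the 'offsets' list."""
    counts = {}
    for entity_list in response_json["entities"].values():
        for item in entity_list:
            counts[item["entity"]] = len(item["offsets"])
    return counts

# The example response from the documentation.
sample = {
    "entities": {
        "persons": [
            {"entity": "Adam Mount", "offsets": [{"start": 55, "end": 65}, {"start": 223, "end": 226}]},
            {"entity": "Trump", "offsets": [{"start": 0, "end": 5}, {"start": 1445, "end": 1450}, {"start": 2658, "end": 2663}]},
        ],
        "locations": [
            {"entity": "Seoul", "offsets": [{"start": 3942, "end": 3947}, {"start": 4144, "end": 4149}]},
        ],
    }
}
print(mention_counts(sample))  # {'Adam Mount': 2, 'Trump': 3, 'Seoul': 2}
```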

Keywords Extraction

The keywords extraction method automatically extracts the most important keywords from weblinks, documents, audio files and images.

Caution The language of text documents will be detected automatically. For audio files and images it must be specified for each request. If the value for language is undefined, the default language for audio and image processing will be set to English.

Extract keywords from weblinks

GET /keywords

Extract keywords from a given url.

Example URI: GET https://www.summarizebot.com/api/keywords

URI Parameters:
apiKey string (required) - API Key
url string (required) - Article or web page url
keywords integer (optional, default = 10) - Maximum count of keywords to return
language string (optional for text files, required for audio files and images) - The language of text files will be detected automatically. For audio files and images it should be specified from the list of supported languages, e.g. language=German.

Response 200
Headers: Content-Type: application/json
Schema:
{ "keywords": [ { "keyword": "artificial intelligence", "weight": 0.87, "ids": [ 1, 6 ] }, { "keyword": "machines", "weight": 0.71, "ids": [ 0, 4 ] } ] }

Extract keywords from binary data

POST /keywords

Extract keywords from binary data. The POST body should include file content in binary form. The HTTP header should be specified as 'application/octet-stream'.

Example URI: POST https://www.summarizebot.com/api/keywords

URI Parameters:
apiKey string (required) - API Key
filename string (required) - Name of the file, e.g. filename=1.pdf
keywords integer (optional, default = 10) - Maximum count of keywords to return
language string (optional for text files, required for audio files and images) - The language of text files will be detected automatically. For audio files and images it should be specified from the list of supported languages, e.g. language=German.

Response 200
Headers: Content-Type: application/json
Schema:
{ "keywords": [ { "keyword": "artificial intelligence", "weight": 0.87, "ids": [ 1, 6 ] }, { "keyword": "machines", "weight": 0.71, "ids": [ 0, 4 ] } ] }
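A minimal Python sketch for the GET /keywords call, following the same pattern as the earlier examples (the helper name and example article URL are illustrative; substitute your own API key before sending):

```python
from urllib.parse import urlencode

def build_keywords_url(api_key, url, keywords=10, language=None):
    """Compose the GET /keywords request URL."""
    params = {"apiKey": api_key, "url": url, "keywords": keywords}
    if language is not None:
        params["language"] = language  # required for audio files and images
    return "https://www.summarizebot.com/api/keywords?" + urlencode(params)

api_url = build_keywords_url("YOUR_API_KEY", "https://example.com/article.html", keywords=5)
# import requests
# for kw in requests.get(api_url).json()["keywords"]:
#     print(kw["keyword"], kw["weight"])
```

Each returned keyword carries a relevance weight, so sorting or thresholding on "weight" is a common post-processing step.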

Article Extraction

The article extraction method is used to extract clean article text from a file that you provide to the API. For hypertext documents it also identifies metadata such as the title, main article image, publish date, author, and meta description.

Caution The article extraction method can handle only text files and scanned documents (e.g. PDF files with images).

Extract plain article text and metadata from weblinks

GET /extract

Extract article text and metadata from a given url.

Example URI: GET https://www.summarizebot.com/api/extract

URI Parameters:
apiKey string (required) - API Key
url string (required) - Article or web page url
language string (optional for text files, required for scanned documents) - For scanned documents (e.g. PDF files with images) it should be specified from the list of supported languages, e.g. language=German.
isocr boolean (optional, default = false) - Use optical character recognition for PDF documents processing. If isocr is set to true, the document language should be specified from the list of supported languages, e.g. language=English (see the Language Support section for more details).

Response 200
Headers: Content-Type: application/json
Schema:
{ "text": "(CNN) Night has fallen on a toe-numbing English winter's day. In a manor house, where spirits of aristocrats are rumored to roam ancient hallways, are some of England's finest young athletes. In a dimly lit, oak-paneled room at Bisham Abbey, 30 miles west of London, these 18 to twentysomethings have gathered for another chapter in their learning. A grand-looking Victorian lady, framed in gold, peers down on the assembled players and coaches. On these same dark walls hang the works of Raphael.", "article title": "How to build a rugby player -- Inside England's Under-20s camp", "meta information": { "meta description": "England's Under-20s give CNN Sport exclusive access as they prepare for the Under-20 Six Nations, a championship they have won six times in 10 years.", "publish date": "2018-02-03T10:28:00Z", "image": "https://cdn.cnn.com/cnnnext/dam/assets/180129105453-owen-farrell-super-tease.jpg", "authors": [ "Aimee Lewis" ], "meta keywords": "sport, Six Nations 2018, training camp" } }

Extract plain article text and metadata from binary data

POST /extract

Extract article text and metadata from binary data. The POST body should include file content in binary form. The HTTP header should be specified as 'application/octet-stream'.

Example URI: POST https://www.summarizebot.com/api/extract

URI Parameters:
apiKey string (required) - API Key
filename string (required) - Name of the file, e.g. filename=1.html
language string (optional for text files, required for scanned documents) - For scanned documents (e.g. PDF files with images) it should be specified from the list of supported languages, e.g. language=German.
isocr boolean (optional, default = false) - Use optical character recognition for PDF documents processing. If isocr is set to true, the document language should be specified from the list of supported languages, e.g. language=English (see the Language Support section for more details).

Response 200
Headers: Content-Type: application/json
Schema:
{ "text": "(CNN) Night has fallen on a toe-numbing English winter's day. In a manor house, where spirits of aristocrats are rumored to roam ancient hallways, are some of England's finest young athletes. In a dimly lit, oak-paneled room at Bisham Abbey, 30 miles west of London, these 18 to twentysomethings have gathered for another chapter in their learning. A grand-looking Victorian lady, framed in gold, peers down on the assembled players and coaches. On these same dark walls hang the works of Raphael.", "article title": "How to build a rugby player -- Inside England's Under-20s camp", "meta information": { "meta description": "England's Under-20s give CNN Sport exclusive access as they prepare for the Under-20 Six Nations, a championship they have won six times in 10 years.", "publish date": "2018-02-03T10:28:00Z", "image": "https://cdn.cnn.com/cnnnext/dam/assets/180129105453-owen-farrell-super-tease.jpg", "authors": [ "Aimee Lewis" ], "meta keywords": "sport, Six Nations 2018, training camp" } }
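The /extract endpoint can be called the same way as the others. This sketch only builds the request URL (the helper name and example URL are illustrative); it also enforces the rule above that OCR processing requires an explicit document language:

```python
from urllib.parse import urlencode

def build_extract_url(api_key, url, isocr=False, language=None):
    """Compose the GET /extract request URL."""
    params = {"apiKey": api_key, "url": url}
    if isocr:
        if language is None:
            raise ValueError("isocr=true requires a document language, e.g. 'English'")
        params["isocr"] = "true"
        params["language"] = language
    return "https://www.summarizebot.com/api/extract?" + urlencode(params)

api_url = build_extract_url("YOUR_API_KEY", "https://example.com/article.html")
# import requests
# json_res = requests.get(api_url).json()
# print(json_res["article title"])
# print(json_res["meta information"]["publish date"])
```

Note that the metadata keys in the response contain spaces ("article title", "meta information"), so they must be accessed with bracket notation rather than attribute access.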

Short Text Language Detection

The short text language detection method analyzes a short piece of text (search keywords, user messages, tweets, etc.) and accurately recognizes its language. The method returns the language code conforming to ISO 639-1 identifiers.

Most language detection solutions work well on full-text documents but perform poorly on short texts, especially search keywords, tweets, and user messages in chats. Short texts are too short for their N-gram features to be extracted reliably; they use "unnatural" language, contain misspellings, and often mix words from multiple languages.

For short text language identification we have implemented an optimized version of a support-vector machine (SVM) classifier. Our classification algorithm takes into account many features specific to short texts, supports 70+ different languages, and achieves language detection accuracy on short messages from 91% to 98%, depending on the language.

Detect language of a short text

POST /shortlang

Detect language of short texts. The POST body should include JSON data in the following format: { "documents": [ { "text": "short message text", "id": "1" }, { "text": "short message text", "id": "2" }, ... ] }, where "text" is the short text and "id" is a unique identifier. The maximum number of short texts per request is 100. The HTTP header should be specified as 'application/json'.

Example URI: POST https://www.summarizebot.com/api/shortlang

URI Parameters:
apiKey string (required) - API Key

Response 200
Headers: Content-Type: application/json
Schema:
{ "documents": [ { "text": "subaru xv prix", "id": "1", "language": "fr" }, { "text": "vendita appartamento lago maggiore", "id": "2", "language": "it" } ] }
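Building the JSON body for /shortlang can be sketched as follows (the helper name is illustrative; the commented lines show how the payload would be posted with a valid API key):

```python
import json

def build_shortlang_payload(texts):
    """Build the POST /shortlang JSON body (at most 100 texts per request)."""
    if len(texts) > 100:
        raise ValueError("maximum of 100 short texts per request")
    documents = [{"text": t, "id": str(i + 1)} for i, t in enumerate(texts)]
    return json.dumps({"documents": documents})

payload = build_shortlang_payload(["subaru xv prix", "vendita appartamento lago maggiore"])
# import requests
# r = requests.post(
#     "https://www.summarizebot.com/api/shortlang?apiKey=YOUR_API_KEY",
#     headers={"Content-Type": "application/json"},
#     data=payload,
# )
# print(r.json()["documents"])
```

The response echoes each document back with its detected "language" field, so the "id" values let you match results to inputs regardless of ordering.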

Language Detection

The language detection method analyzes a full-text document that you provide and recognizes the language of the text. The method returns the language code conforming to ISO 639-1 identifiers.

Detect language of a text from weblinks

GET /language

Detect text language from a given url.

Example URI: GET https://www.summarizebot.com/api/language

URI Parameters:
apiKey string (required) - API Key
url string (required) - Article or web page url

Response 200
Headers: Content-Type: application/json
Schema:
{ "language": "en" }

Detect language of a text from binary data

POST /language

Detect text language from binary data. The POST body should include file content in binary form. The HTTP header should be specified as 'application/octet-stream'.

Example URI: POST https://www.summarizebot.com/api/language

URI Parameters:
apiKey string (required) - API Key
filename string (required) - Name of the file, e.g. filename=1.html

Response 200
Headers: Content-Type: application/json
Schema:
{ "language": "en" }
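For the binary variant, the pattern is the same as the file-processing example in the introduction. A minimal sketch, assuming a local file named test.txt (the commented lines require a valid API key and network access):

```python
from urllib.parse import urlencode

api_url = ("https://www.summarizebot.com/api/language?"
           + urlencode({"apiKey": "YOUR_API_KEY", "filename": "test.txt"}))
# Read the document and POST it as binary data:
# import requests
# with open("test.txt", "rb") as f:
#     post_body = f.read()
# r = requests.post(api_url,
#                   headers={"Content-Type": "application/octet-stream"},
#                   data=post_body)
# print(r.json()["language"])  # e.g. "en"
```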

Face Detection

The face detection method analyzes an image file to find faces. The method returns a list of items, each of which contains the coordinates of a face that was detected in the file.

Caution The face detection method processes only image files (.jpeg, .png, etc.).

Detect faces from image weblinks

GET /faces

Detect faces from a given image url.

Example URI: GET https://www.summarizebot.com/api/faces

URI Parameters:
apiKey string (required) - API Key
url string (required) - Image url

Response 200
Headers: Content-Type: application/json
Schema:
{ "faces": [ { "y": "371", "x": "370", "height": "137", "width": "137" }, { "y": "190", "x": "474", "height": "149", "width": "149" }, { "y": "210", "x": "598", "height": "155", "width": "155" }, { "y": "399", "x": "706", "height": "146", "width": "146" } ] }

Detect faces from image binary data

POST /faces

Detect faces from image binary data. The POST body should include file content in binary form. The HTTP header should be specified as 'application/octet-stream'.

Example URI: POST https://www.summarizebot.com/api/faces

URI Parameters:
apiKey string (required) - API Key

Response 200
Headers: Content-Type: application/json
Schema:
{ "faces": [ { "y": "371", "x": "370", "height": "137", "width": "137" }, { "y": "190", "x": "474", "height": "149", "width": "149" }, { "y": "210", "x": "598", "height": "155", "width": "155" }, { "y": "399", "x": "706", "height": "146", "width": "146" } ] }
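In the schema above, the coordinates are returned as strings, so they need to be converted to integers before drawing bounding boxes. A small illustrative helper (the function name is ours, not part of the API):

```python
def face_boxes(response):
    """Convert a /faces JSON response into (x, y, width, height) integer tuples."""
    return [
        (int(f["x"]), int(f["y"]), int(f["width"]), int(f["height"]))
        for f in response["faces"]
    ]

# Sample response in the format shown in the schema above
sample = {"faces": [{"y": "371", "x": "370", "height": "137", "width": "137"},
                    {"y": "190", "x": "474", "height": "149", "width": "149"}]}
print(face_boxes(sample))  # [(370, 371, 137, 137), (474, 190, 149, 149)]
```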

Image Recognition

The image recognition method classifies the contents of an entire image into thousands of categories (e.g., "basketball", "lion", "shark"). It returns a list of tags (labels) for an image along with a confidence score indicating how confident the system is in each assignment.

Recognize an image content from weblinks

GET /images

Image recognition from a given url.

Example URI: GET https://www.summarizebot.com/api/images

URI Parameters:
apiKey string (required) - API Key
url string (required) - Image url
tags integer (optional, default = 5) - Maximum count of image tags to return

Response 200
Headers: Content-Type: application/json
Schema:
{ "tags": [ { "confidence": 0.9, "name": "great white shark, white shark" }, { "confidence": 0.05, "name": "tiger shark" }, { "confidence": 0.03, "name": "killer whale" } ] }

Recognize an image content from binary data

POST /images

Image recognition from binary data. The POST body should include file content in binary form. The HTTP header should be specified as 'application/octet-stream'.

Example URI: POST https://www.summarizebot.com/api/images

URI Parameters:
apiKey string (required) - API Key
filename string (required) - Name of the file, e.g. filename=1.jpg
tags integer (optional, default = 5) - Maximum count of image tags to return

Response 200
Headers: Content-Type: application/json
Schema:
{ "tags": [ { "confidence": 0.9, "name": "great white shark, white shark" }, { "confidence": 0.05, "name": "tiger shark" }, { "confidence": 0.03, "name": "killer whale" } ] }
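Since the tags come back sorted by confidence is not guaranteed by the schema, a common pattern is to pick the highest-confidence label explicitly. An illustrative helper (the function name is ours):

```python
def top_tag(response):
    """Return the highest-confidence tag name from an /images response."""
    return max(response["tags"], key=lambda t: t["confidence"])["name"]

# Sample response in the format shown in the schema above
sample = {"tags": [{"confidence": 0.9, "name": "great white shark, white shark"},
                   {"confidence": 0.05, "name": "tiger shark"},
                   {"confidence": 0.03, "name": "killer whale"}]}
print(top_tag(sample))  # great white shark, white shark
```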

Stemming

The stemming method automatically reduces inflected words to their base or root form and removes stop words from text documents. It supports English, French, German, Spanish, Italian, Russian, Swedish, Danish, Finnish, Dutch, Hungarian, Norwegian, Portuguese and Romanian languages.

Stem and remove stop words from text data

POST /stem

Stem and remove stop words from text data. The POST body should include the text in JSON format, e.g. { "text": "document text" }. The HTTP header should be specified as 'application/json'. The language of text documents will be detected automatically.

Example URI: POST https://www.summarizebot.com/api/stem

URI Parameters:
apiKey string (required) - API Key

Response 200
Headers: Content-Type: application/json
Schema:
{ "stemmed": "lawyer post video sign languag danger ponzi scheme post went viral hundr deaf peopl got touch legal troubl fraud domest violenc uncov huge communiti need help tang shuai simpli tri improv legal knowledg among deaf communiti post video china wechat messag app februari instant hit mr tang flood mani friend request ask wechat boost friend limit 5,000 10,000 strike chord answer goe way beyond legal difficulti complex world sign languag china", "language": "en" }
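A minimal sketch of the /stem call. The JSON body is built with the stdlib; the commented lines show how it would be posted with a valid API key:

```python
import json

# Text to stem (any supported language; detected automatically)
text = "Lawyers posted videos in sign language about the dangers of Ponzi schemes."
payload = json.dumps({"text": text})
# import requests
# r = requests.post(
#     "https://www.summarizebot.com/api/stem?apiKey=YOUR_API_KEY",
#     headers={"Content-Type": "application/json"},
#     data=payload,
# )
# json_res = r.json()
# print(json_res["stemmed"], json_res["language"])
```

The "stemmed" field returns root forms with stop words removed, which is useful as a preprocessing step for search indexing or keyword matching.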

Comments Extraction

The comments extraction method automatically structures and extracts reviews and comments from web pages.

Extract comments from weblinks

GET /comments

Extract comments from a given url.

Example URI: GET https://www.summarizebot.com/api/comments

URI Parameters:
apiKey string (required) - API Key
url string (required) - Article or web page url

Response 200
Headers: Content-Type: application/json
Schema:
{ "comments": [ "Well, the Hotel is very central, perfect for shopping, sightseeing or nightlife. Friendly welcome on arrival, a complimentary birthday drink brought to us in the comfy lounge area.", "Would definately stay at this hotel again and recommended this to others.", "Cleanliness of bedrooms is always very high. Complimentary breakfast is a welcome feature.", "Nice room on the second floor at the far end of the hall. Very quiet room. Comfortable bed, nice shower with hot, hot water.", "the amazing breakfast! I cannot find a fault with 5his new hotel it competes and is better than most high end expensive hotels in the city!" ] }

Extract comments from binary data

POST /comments

Extract comments from binary data. The POST body should include file content in binary form. The HTTP header should be specified as 'application/octet-stream'.

Example URI: POST https://www.summarizebot.com/api/comments

URI Parameters:
apiKey string (required) - API Key
filename string (required) - Name of the file, e.g. filename=1.html

Response 200
Headers: Content-Type: application/json
Schema:
{ "comments": [ "Well, the Hotel is very central, perfect for shopping, sightseeing or nightlife. Friendly welcome on arrival, a complimentary birthday drink brought to us in the comfy lounge area.", "Would definately stay at this hotel again and recommended this to others.", "Cleanliness of bedrooms is always very high. Complimentary breakfast is a welcome feature.", "Nice room on the second floor at the far end of the hall. Very quiet room. Comfortable bed, nice shower with hot, hot water.", "the amazing breakfast! I cannot find a fault with 5his new hotel it competes and is better than most high end expensive hotels in the city!" ] }
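A minimal sketch for the GET /comments call (the example review-page URL is illustrative; substitute a valid API key before sending):

```python
from urllib.parse import urlencode

api_url = ("https://www.summarizebot.com/api/comments?"
           + urlencode({"apiKey": "YOUR_API_KEY",
                        "url": "https://example.com/reviews.html"}))
# import requests
# for comment in requests.get(api_url).json()["comments"]:
#     print(comment)
```

Combining this endpoint with the sentiment method is a natural next step, e.g. extracting hotel reviews and scoring each one.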

Video Identification

The video identification method automatically extracts detailed video information from hypertext pages: direct video url, video provider, video width and height.

Caution The video identification method processes only hypertext files (.html, .xml, etc.).