5.4 – search

The search API retrieves articles (titles, teasers, URLs, etc.) for complex queries. The query syntax follows Apache Solr standards.

Request

Method	URL
GET	idata/search/

Parameters

Param	Type	Values
app	String	Values: • finsents [default]
q	String	Supports words as well as queries like ( “word1” OR “word2” ) AND ( “word3” NOT “word4” )
count	Number	• Number of results returned • Default value is 50
offset	Number	• Result offset • Default value 0
date_max	String	The maximum date. Format is YYYY-MM-DD
date_min	String	The minimum date. Format is YYYY-MM-DD
last_update	Number	• 0 means there is no valid last-updated data. • Non-zero means the current data is newer than the last-updated data
Region	String	[case insensitive] Values: • Asia • Europe • Africa • North America • South America • Oceana Default: all included
category	String	[case insensitive] Values: • Politics • Economics • Health • Legal • Security • Sports • Technology Default: all included
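
As an illustration of how the parameters above combine into a GET request, here is a minimal Python sketch. The host name in `BASE_URL` and the helper name `build_search_url` are assumptions made for this example; the documentation only specifies the relative path `idata/search/`.

```python
from urllib.parse import urlencode

# Hypothetical host; only the relative path "idata/search/" comes from this document.
BASE_URL = "https://api.example.com/idata/search/"

def build_search_url(q, count=50, offset=0, **filters):
    """Return a GET URL for the search endpoint.

    `filters` may carry optional parameters from the table above,
    such as app, date_min, date_max, last_update or category.
    """
    params = {"q": q, "count": count, "offset": offset}
    # Leave out unset filters so the documented defaults apply.
    params.update({k: v for k, v in filters.items() if v is not None})
    return BASE_URL + "?" + urlencode(params)

url = build_search_url(
    '("inflation" OR "deflation") AND "ECB"',
    count=20,
    date_min="2023-01-01",
    date_max="2023-12-31",
    category="Economics",
)
print(url)
```

`urlencode` percent-encodes the quotes and parentheses in `q`, so the query can be written in its readable form.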

Supported SOLR Query Parameters

Supported Query Operators

Boolean Operator	Alternative Symbol	Description
AND	`&&`	Requires both terms on either side of the Boolean operator to be present for a match.
NOT	`!`	Requires that the following term not be present.
OR	`||`	Requires that either term (or both terms) be present for a match.
	`+`	Requires that the following term be present.
	`-`	Prohibits the following term (that is, matches on fields or documents that do not include that term). The `-` operator is functionally similar to the Boolean operator `!`. Because it’s used by popular search engines such as Google, it may be more familiar to some user communities.

Boolean operators allow terms to be combined through logic operators. Lucene supports AND, “+”, OR, NOT and “-” as Boolean operators.
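
The operators above can be composed programmatically. The following Python sketch builds a `q` string from small helper functions; the helper names (`phrase`, `any_of`, `all_of`, `exclude`) are made up for this example and are not part of the API.

```python
def phrase(term):
    """Wrap a term in double quotes, escaping any embedded quotes."""
    return '"' + term.replace('"', '\\"') + '"'

def any_of(*terms):
    """Match documents containing at least one of the terms (OR)."""
    return "(" + " OR ".join(phrase(t) for t in terms) + ")"

def all_of(*clauses):
    """Match documents satisfying every clause (AND)."""
    return "(" + " AND ".join(clauses) + ")"

def exclude(term):
    """Prohibit a term (NOT)."""
    return "NOT " + phrase(term)

q = all_of(any_of("merger", "acquisition"), exclude("rumor"))
print(q)  # (("merger" OR "acquisition") AND NOT "rumor")
```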

Wildcard Searches

Wildcard Search Type	Special Character	Example
Single character (matches a single character)	?	The search string `te?t` would match both test and text.
Multiple characters (matches zero or more sequential characters)	*	The wildcard search: `tes*` would match test, testing, and tester. You can also use wildcard characters in the middle of a term. For example: `te*t` would match test and text. `*est` would match pest and test.
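
To get a feel for the wildcard semantics without calling the API, the examples from the table can be reproduced locally. This sketch uses Python's `fnmatchcase`, whose `?` and `*` behave the same way for these patterns; it only approximates the matching rules and is not how Solr evaluates wildcards internally.

```python
from fnmatch import fnmatchcase

words = ["test", "text", "testing", "tester", "pest"]

def matches(pattern):
    """Return the words that the wildcard pattern matches."""
    return [w for w in words if fnmatchcase(w, pattern)]

print(matches("te?t"))  # ['test', 'text']
print(matches("tes*"))  # ['test', 'testing', 'tester']
print(matches("te*t"))  # ['test', 'text']
print(matches("*est"))  # ['test', 'pest']
```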

Fuzzy Searches

Solr’s standard query parser supports fuzzy searches based on the Damerau-Levenshtein Distance or Edit Distance algorithm. Fuzzy searches discover terms that are similar to a specified term without necessarily being an exact match. To perform a fuzzy search, use the tilde ~ symbol at the end of a single-word term. For example, to search for a term similar in spelling to “roam,” use the fuzzy search:

roam~

This search will match terms like roams, foam, & foams. It will also match the word “roam” itself.

An optional distance parameter specifies the maximum number of edits allowed, between 0 and 2, defaulting to 2. For example:

roam~1

This will match terms like roams & foam – but not foams since it has an edit distance of “2”.
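
The edit distances quoted above can be checked with a short Python sketch. It computes the optimal string alignment variant of the Damerau-Levenshtein distance (insertions, deletions, substitutions and adjacent transpositions); this is an illustrative implementation, not Solr's own code.

```python
def edit_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]

for term in ["roam", "roams", "foam", "foams"]:
    print(term, edit_distance("roam", term))
# roam 0, roams 1, foam 1, foams 2
```

With `roam~1` only terms at distance 0 or 1 qualify, which is why foams (distance 2) is excluded.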

Proximity Searches

A proximity search looks for terms that are within a specific distance from one another.

To perform a proximity search, add the tilde character ~ and a numeric value to the end of a search phrase. For example, to search for “apache” and “jakarta” within 10 words of each other in a document, use the search:

"jakarta apache"~10

The distance referred to here is the number of term movements needed to match the specified phrase. In the example above, if “apache” and “jakarta” were 10 spaces apart in a field, but “apache” appeared before “jakarta”, more than 10 term movements would be required to move the terms together and position “apache” to the right of “jakarta” with a space in between.
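
The asymmetry described above (reversed terms cost extra movements) can be illustrated with a simplified model. The sketch below handles only two-term phrases where each term occurs once, and the function name `required_slop` is invented for this example; the real scorer handles longer phrases and repeated terms.

```python
def required_slop(tokens, first, second):
    """Smallest distance value for the phrase "<first> <second>" to match.

    Simplified two-term model: each term is assumed to occur once in `tokens`.
    """
    p1 = tokens.index(first)
    p2 = tokens.index(second)
    # In the phrase, `second` sits one position after `first`; the result
    # is how many term movements the document is away from that layout.
    return abs((p2 - 1) - p1)

doc = "the apache software foundation hosts jakarta".split()
print(required_slop(doc, "apache", "jakarta"))  # 3
print(required_slop(doc, "jakarta", "apache"))  # 5
```

Under this model the same document needs a larger distance value for "jakarta apache" than for "apache jakarta", because the terms appear in the opposite order.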

Range Searches

A range search specifies a range of values for a field (a range with an upper bound and a lower bound). The query matches documents whose values for the specified field or fields fall within the range. Range queries can be inclusive or exclusive of the upper and lower bounds. Sorting is done lexicographically, except on numeric fields. For example, the range query below matches all documents whose popularity field has a value between 52 and 10,000, inclusive.

popularity:[52 TO 10000]

Range queries are not limited to date fields or even numerical fields. You could also use range queries with non-date fields:

title:{Aida TO Carmen}

This will find all documents whose titles are between Aida and Carmen, but not including Aida and Carmen.

The brackets around a query determine its inclusiveness.

Square brackets [ & ] denote an inclusive range query that matches values including the upper and lower bound.
Curly brackets { & } denote an exclusive range query that matches values between the upper and lower bounds, but excluding the upper and lower bounds themselves.
You can mix these types so one end of the range is inclusive and the other is exclusive. Here’s an example: count:{1 TO 10]
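
The bracket rules can be captured in a small helper. This Python sketch (the function name `range_query` is an assumption for this example) reproduces the three range queries shown in this section.

```python
def range_query(field, low, high, include_low=True, include_high=True):
    """Build a range clause; [ ] are inclusive bounds, { } are exclusive."""
    open_bracket = "[" if include_low else "{"
    close_bracket = "]" if include_high else "}"
    return f"{field}:{open_bracket}{low} TO {high}{close_bracket}"

print(range_query("popularity", 52, 10000))
# popularity:[52 TO 10000]
print(range_query("title", "Aida", "Carmen", include_low=False, include_high=False))
# title:{Aida TO Carmen}
print(range_query("count", 1, 10, include_low=False))
# count:{1 TO 10]
```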

Boosting a Term with ^

Lucene/Solr provides the relevance level of matching documents based on the terms found. To boost a term use the caret symbol ^ with a boost factor (a number) at the end of the term you are searching. The higher the boost factor, the more relevant the term will be.

Boosting allows you to control the relevance of a document by boosting its term. For example, if you are searching for “jakarta apache” and you want the term “jakarta” to be more relevant, you can boost it by adding the ^ symbol along with the boost factor immediately after the term. You could type:

jakarta^4 apache

This will make documents with the term jakarta appear more relevant. You can also boost Phrase Terms as in the example:

"jakarta apache"^4 "Apache Lucene"

By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (for example, it could be 0.2).
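
A boost clause is easy to build by hand, but a helper can enforce the rule that the factor must be positive. The following is a minimal Python sketch; the function name `boost` is hypothetical.

```python
def boost(term, factor):
    """Append a boost factor to a term or phrase."""
    if factor <= 0:
        raise ValueError("boost factor must be positive")
    # Quote multi-word terms so the boost applies to the whole phrase.
    if " " in term:
        term = f'"{term}"'
    return f"{term}^{factor}"

print(boost("jakarta", 4), "apache")  # jakarta^4 apache
print(boost("jakarta apache", 4))     # "jakarta apache"^4
print(boost("jakarta", 0.2))          # jakarta^0.2
```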

Constant Score with `^=`

Constant score queries are created with <query_clause>^=<score>, which sets the entire clause to the specified score for any documents matching that clause. This is desirable when you only care about matches for a particular clause and don’t want other relevancy factors such as term frequency (the number of times the term appears in the field) or inverse document frequency (a measure across the whole index for how rare a term is in a field).

Example:

(description:blue OR color:blue)^=1.0 text:shoes

Response

Status	Response
200	Success

{
     "last_update" : <integer timestamp>, // value ‘0’ will return current data
     "data" : 
          {
               "total" : <total number of results>,
               "offset" : <offset of the current result set, in the total results>,
               "count" : <number of entries in the current result set>,
               "results" : [
                    { 
                         "title" : <title>,
                         "source_type" : <source_type>,
                         "ticker" : <ticker>,
                         "bloomberg_tickers" : <bloomberg_tickers>,
                         "category" : <category>,
                         "region" : <region>,
                         "entities" : <entities>,
                         "cid" : <cid>,
                         "image_name" : <image_name>,
                         "url" : <url>,
                         "date" : <date string in format dd:mm:yyyy hh:mm>,
                         "teaser" : <teaser text>,
                         "domain" : <domain_name>
                    },
                    { … }
               ]
          }
}
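
A client might consume this response as sketched below in Python. The sample payload is invented for illustration and contains only a few of the fields; the date pattern `%d:%m:%Y %H:%M` is a literal reading of the "dd:mm:yyyy hh:mm" format stated above and should be checked against real responses.

```python
import json
from datetime import datetime

# Hypothetical payload shaped like the template above; all values are made up.
sample = """
{
  "last_update": 0,
  "data": {
    "total": 120,
    "offset": 0,
    "count": 1,
    "results": [
      {"title": "Example headline",
       "url": "https://news.example.com/a/1",
       "date": "05:03:2024 14:30",
       "teaser": "Example teaser text"}
    ]
  }
}
"""

def parse_results(payload):
    """Return (articles, next_offset); next_offset is None on the last page."""
    data = json.loads(payload)["data"]
    articles = [
        {
            "title": r["title"],
            "url": r["url"],
            # Assumes the documented "dd:mm:yyyy hh:mm" format is literal.
            "date": datetime.strptime(r["date"], "%d:%m:%Y %H:%M"),
        }
        for r in data["results"]
    ]
    next_offset = data["offset"] + data["count"]
    has_more = next_offset < data["total"]
    return articles, (next_offset if has_more else None)

articles, next_offset = parse_results(sample)
print(articles[0]["title"], articles[0]["date"], next_offset)
```

Passing the returned `next_offset` as the `offset` parameter of the next request pages through the full result set until it becomes `None`.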