API queries using Elasticsearch

This article provides an overview of the use of Elasticsearch as part of a query within an API call.

Introduction

At Cortex we use Elasticsearch to power queries within API calls. This article provides an overview of its capabilities and syntax. For more detail on Elasticsearch, please refer to the Elasticsearch Guide on the Elastic website.

Important note

Using Elasticsearch with wildcards and the type of complex search structures discussed below can be extremely demanding of computational resources. Elasticsearch is primarily intended for use in powering more orthodox requests, such as the search of a website or for finding a specific text string within an article.

In this respect, it is significantly more efficient to use a search limited to a specific endpoint rather than requesting a search of everything. For example, it is preferable to find articles on the basis of a category slug using the articles endpoint, like this:

GET https://article-cms.cortextech.io/v2/articles?clientId=DEMO&categorySlug=news

than it is to search for the category slug text using a generic search structure like this:

GET https://stage-article-cms.cortextech.io/v2/articles/search?clientId=DEMO&query=news

Using Elasticsearch

A query can be made against specified sort fields as part of an API call’s options, like this:

GET https://{environment-id}/v2/articles/search?clientId={clientId}&query={query_string}

In which:

{environment-id} is the URL for stage (test) or production (live).

{clientId} is your client ID, assigned during onboarding.

{query_string} is the text to search for in selecting articles to fetch.

For example, this call returns all of the articles for the client with ID “DEMO” that contain the search string “Matchday”:

GET https://article-cms.cortextech.io/v2/articles/search?clientId=DEMO&query=Matchday
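
As a rough sketch of how such a call might be assembled in Python, the snippet below builds the search URL and percent-encodes the query string. The function name build_search_url and the constant names are illustrative assumptions, not part of the Cortex API; only the hosts, path and parameter names are taken from the examples in this article. The code builds the URL only and does not send a request.

```python
from urllib.parse import quote, urlencode

# Hosts as they appear in the examples in this article.
PRODUCTION_HOST = "article-cms.cortextech.io"
STAGE_HOST = "stage-article-cms.cortextech.io"


def build_search_url(host: str, client_id: str, query: str) -> str:
    """Return the /v2/articles/search URL for the given client and query.

    build_search_url is a hypothetical helper name used for illustration.
    The query is percent-encoded so that spaces, colons and brackets in
    more complex query strings survive the trip in a URL.
    """
    params = urlencode({"clientId": client_id, "query": query}, quote_via=quote)
    return f"https://{host}/v2/articles/search?{params}"


print(build_search_url(PRODUCTION_HOST, "DEMO", "Matchday"))
print(build_search_url(STAGE_HOST, "DEMO", "title:(quick OR brown)"))
```

Encoding the query in one place means that the more complex query strings described in the rest of this article can be passed in as ordinary text.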

However, because the query={query_string} parameter uses Elasticsearch, there is a range of options that can be applied. Elasticsearch uses a syntax to parse and split the provided query string based on operators, such as AND or NOT. The query then analyses each section of the search independently before returning matching documents.

You can also create a complex search that includes wildcard characters, searches across multiple fields, and more. While versatile, the query is strict and returns an error if the query string includes any invalid syntax.

The query string is parsed into a series of terms and operators.

A term can be a single word, for example quick or brown, or a phrase surrounded by double quotes, such as "quick brown fox", which searches for all the words in the phrase, in the same order.

Operators allow you to customize the search. The key options we use are as follows.

  • status:active returns results where the status field contains active
  • title:(quick OR brown) returns results where the title field contains quick or brown
  • author:"John Smith" returns results where the author field contains an exact phrase, in this case "john smith"
  • book.\*:(quick OR brown) returns results where any of the fields book.title, book.content or book.date contains quick or brown (note that the wildcard character * must be escaped with a backslash)
  • _exists_:title returns results where the field title has any non-null value.
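
The operators listed above can be composed programmatically. The following is a minimal sketch, assuming you build the query string on the client before passing it as the query parameter; the helper names (phrase, any_of, field, exists) are hypothetical and are not part of Elasticsearch or the Cortex API.

```python
def phrase(text: str) -> str:
    """Wrap text in double quotes so it is searched as an exact phrase."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def any_of(*terms: str) -> str:
    """Group terms with OR, for example (quick OR brown)."""
    return "(" + " OR ".join(terms) + ")"


def field(name: str, value: str) -> str:
    """Build field:value, escaping a * in the field name with a backslash."""
    escaped_name = name.replace("*", "\\*")
    return f"{escaped_name}:{value}"


def exists(name: str) -> str:
    """Build the _exists_ check for a field with any non-null value."""
    return f"_exists_:{name}"


print(field("status", "active"))                    # status:active
print(field("title", any_of("quick", "brown")))     # title:(quick OR brown)
print(field("author", phrase("John Smith")))        # author:"John Smith"
print(field("book.*", any_of("quick", "brown")))    # book.\*:(quick OR brown)
print(exists("title"))                              # _exists_:title
```

Each helper returns plain text, so the results can be joined with AND or NOT to form longer query strings.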

Wildcards

Wildcard searches can be run on individual terms, using ? to replace a single character, and * to replace zero or more characters. For example: qu?ck bro*


Be aware that wildcard queries can use a large amount of memory and poorly structured queries will perform badly; for example, think how many terms need to be queried to match the query string `"a* b* c*"`.
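
To make the cost concrete, here is a small local illustration in Python. It is not how Elasticsearch is implemented: it simply applies the ? and * rules described above to a tiny, made-up word list (VOCABULARY) and counts how many terms each wildcard expands to. All names in the snippet are assumptions for illustration.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate ? (exactly one character) and * (zero or more) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def expand(pattern: str, vocabulary: list[str]) -> list[str]:
    """Return every term in the vocabulary that the wildcard pattern matches."""
    regex = wildcard_to_regex(pattern)
    return [term for term in vocabulary if regex.fullmatch(term)]


# A made-up stand-in for the terms held in an index.
VOCABULARY = ["quick", "quack", "quirk", "brown", "broth", "bro", "apple", "bread", "cat"]

print(expand("qu?ck", VOCABULARY))  # ['quick', 'quack']
print(expand("bro*", VOCABULARY))   # ['brown', 'broth', 'bro']

# Each part of "a* b* c*" expands separately; the totals add up quickly
# once the vocabulary holds thousands of terms rather than nine.
total = sum(len(expand(p, VOCABULARY)) for p in ["a*", "b*", "c*"])
print(total)  # 6
```

Even with nine words, three short prefixes expand to six terms; on a real index, a single-letter prefix can match a very large share of all indexed terms.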

Pure wildcards `*` are rewritten to `exists` queries for efficiency. As a consequence, the wildcard `"field:*"` matches documents with an empty value, such as `{"field": ""}`, but does **not** match documents where the field is missing or set to an explicit null value, such as `{"field": null}`.
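
The three cases above can be modelled with a few lines of Python. This is only a simplified sketch of the behaviour described in this section, not Elasticsearch code; the function name matches_pure_wildcard is hypothetical, and Python's None stands in for a JSON null.

```python
def matches_pure_wildcard(document: dict, field_name: str) -> bool:
    """Model of "field:*": true when the field is present and not null."""
    return document.get(field_name) is not None


print(matches_pure_wildcard({"field": ""}, "field"))    # True: empty value still matches
print(matches_pure_wildcard({"field": None}, "field"))  # False: explicit null
print(matches_pure_wildcard({}, "field"))               # False: field is missing
```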