How To Query Elasticsearch Using Lucene Query

Hello Everyone

Welcome to CloudAffaire and this is Debjeet.

In this series, we will explore one of the most popular log management stacks in DevOps, better known as the ELK (E=Elasticsearch, L=Logstash, K=Kibana) stack.

What Is Lucene Query?

Apache Lucene is a free and open-source search engine library, originally written entirely in Java by Doug Cutting. It is supported by the Apache Software Foundation and released under the Apache License. While suitable for any application that requires full-text indexing and searching, Lucene is best known for its use in Internet search engines and local, single-site search. Lucene can perform fuzzy searches based on edit distance and has also been used to implement recommendation systems. Elasticsearch is a search engine built on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. A Lucene query is a search expression written in Lucene's query string syntax, which Elasticsearch accepts in its query_string query; the rest of this post walks through that syntax.

Elasticsearch Lucene Query Syntax:

Field name:

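You can specify the field to search by prefixing a term with the field name and a colon, for example status:active (where status is a hypothetical field name).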
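As a rough illustration of field-scoped queries, the sketch below wraps a few query strings in a request body for Elasticsearch's query_string query. The field names (status, title, author) and their values are hypothetical, and the body is only built and printed here, not sent to a cluster.

```python
import json

# Hypothetical field names and values, for illustration only.
field_queries = [
    "status:active",            # the status field contains "active"
    "title:(quick OR brown)",   # the title field contains "quick" or "brown"
    'author:"John Smith"',      # the author field contains the exact phrase
    "_exists_:title",           # the title field has any non-null value
]


def build_body(query: str) -> dict:
    """Wrap a Lucene query string in a query_string search body."""
    return {"query": {"query_string": {"query": query}}}


for q in field_queries:
    print(json.dumps(build_body(q)))
```

Each printed line is a JSON body that could be posted to an index's _search endpoint.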

Wildcards:

Wildcard searches can be run on individual terms, using ? to replace a single character, and * to replace zero or more characters.
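To get a feel for how ? and * behave, the sketch below uses Python's fnmatch module as a stand-in. This is only an analogy: fnmatch is not what Elasticsearch runs, but for patterns without brackets its ? and * have the same meaning as described above. The term list is made up for the example.

```python
from fnmatch import fnmatchcase

# Made-up indexed terms, for illustration only.
terms = ["quick", "quack", "quirk", "bro", "brown", "brother"]

# ? stands for exactly one character.
single = [t for t in terms if fnmatchcase(t, "qu?ck")]

# * stands for zero or more characters.
many = [t for t in terms if fnmatchcase(t, "bro*")]

print(single)
print(many)
```

Note that "bro" itself matches bro*, because * may stand for zero characters.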

Regular expressions:

Regular expression patterns can be embedded in the query string by wrapping them in forward-slashes (“/”). Elasticsearch uses Apache Lucene’s regular expression engine to parse these queries.

Reserved characters:

Lucene’s regular expression engine supports all Unicode characters. However, the following characters are reserved as operators:

. ? + * | { } [ ] ( ) " \

Depending on the optional operators enabled, the following characters may also be reserved:

# @ & < > ~
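When one of these characters has to be matched literally, the usual approach is to put a backslash in front of it. The helper below is a minimal sketch of that idea; escape_regex_literal and its include_optional flag are hypothetical names introduced for this example, not part of any Elasticsearch client.

```python
# Characters that are always reserved, and those reserved only when
# the optional operators are enabled (both lists taken from above).
STANDARD_RESERVED = set('.?+*|{}[]()"\\')
OPTIONAL_RESERVED = set("#@&<>~")


def escape_regex_literal(text: str, include_optional: bool = True) -> str:
    """Backslash-escape reserved characters so text matches literally."""
    reserved = (
        STANDARD_RESERVED | OPTIONAL_RESERVED
        if include_optional
        else STANDARD_RESERVED
    )
    return "".join("\\" + ch if ch in reserved else ch for ch in text)


print(escape_regex_literal("user@example.com"))
print(escape_regex_literal("user@example.com", include_optional=False))
```

Escaping the optional characters as well is the safer default, since it works whether or not the optional operators are enabled.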

Standard operators:

Lucene’s regular expression engine does not use the Perl Compatible Regular Expressions (PCRE) library, but it does support the following standard operators.

  1. . Matches any character. For example: ab. # matches ‘aba’, ‘abb’, ‘abz’, etc.
  2. ? Repeat the preceding character zero or one times. For example: abc? # matches ‘ab’ and ‘abc’
  3. + Repeat the preceding character one or more times. For example: ab+ # matches ‘ab’, ‘abb’, ‘abbb’, etc.
  4. * Repeat the preceding character zero or more times. For example: ab* # matches ‘a’, ‘ab’, ‘abb’, ‘abbb’, etc.
  5. {} Minimum and maximum number of times the preceding character can repeat. For example: a{2} # matches ‘aa’, a{2,4} # matches ‘aa’, ‘aaa’, and ‘aaaa’
  6. | OR operator. The match will succeed if the longest pattern on either the left side OR the right side matches. For example: abc|xyz # matches ‘abc’ and ‘xyz’
  7. ( … ) Forms a group. You can use a group to treat part of the expression as a single character. For example: abc(def)? # matches ‘abc’ and ‘abcdef’ but not ‘abcd’
  8. [ … ] Match one of the characters in the brackets. For example: [abc] # matches ‘a’, ‘b’, ‘c’, [a-c] # matches ‘a’, ‘b’, or ‘c’
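The operators above can be tried out locally. The sketch below uses Python's re module, which is a different engine from Lucene's, so treat it as an approximation: for these basic operators the two behave alike, provided the pattern has to match the whole term, which is why re.fullmatch is used. The sample terms are chosen for the example.

```python
import re

# pattern -> (terms that should match, terms that should not)
cases = {
    "ab.":       (["aba", "abb", "abz"], ["ab", "abcd"]),
    "abc?":      (["ab", "abc"], ["abcc"]),
    "ab+":       (["ab", "abb", "abbb"], ["a"]),
    "ab*":       (["a", "ab", "abb"], ["b"]),
    "a{2,4}":    (["aa", "aaa", "aaaa"], ["a", "aaaaa"]),
    "abc|xyz":   (["abc", "xyz"], ["abcxyz"]),
    "abc(def)?": (["abc", "abcdef"], ["abcd"]),
    "[a-c]":     (["a", "b", "c"], ["d", "ab"]),
}


def matches(pattern: str, term: str) -> bool:
    """True when the pattern matches the entire term."""
    return re.fullmatch(pattern, term) is not None


for pattern, (good, bad) in cases.items():
    assert all(matches(pattern, t) for t in good)
    assert not any(matches(pattern, t) for t in bad)
    print(f"{pattern!r} matches {good} but not {bad}")
```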

Fuzziness:

We can search for terms that are similar to, but not exactly like, our search terms using the “fuzzy” operator ~, optionally followed by a maximum edit distance. Fuzziness uses the Damerau-Levenshtein distance to find all terms within a maximum of two changes, where a change is the insertion, deletion or substitution of a single character, or the transposition of two adjacent characters. The default edit distance is 2, but an edit distance of 1 should be sufficient to catch 80% of all human misspellings. The distance is specified after the tilde, for example: cload~1
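To make the notion of edit distance concrete, here is a small sketch that counts insertions, deletions, substitutions and adjacent transpositions between two words. It implements the restricted ("optimal string alignment") variant of Damerau-Levenshtein distance as a plain dynamic programme; it is meant to illustrate the definition, not to reproduce how Lucene computes fuzzy matches internally.

```python
def edit_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(edit_distance("cload", "cloud"))  # one substitution
print(edit_distance("quikc", "quick"))  # one transposition
```

Both misspellings are a single change away from the intended word, so a fuzzy distance of 1 would be enough to reach them.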

Proximity searches:

While a phrase query (e.g. "john smith") expects all of the terms in exactly the same order, a proximity query allows the specified words to be further apart or in a different order. In the same way that fuzzy queries can specify a maximum edit distance for characters in a word, a proximity search allows us to specify a maximum edit distance of words in a phrase:

"fox quick"~5

The closer the text in a field is to the original order specified in the query string, the more relevant that document is considered to be. When compared to the above example query, the phrase “quick fox” would be considered more relevant than “quick brown fox”.

Ranges:

Ranges can be specified for date, numeric or string fields. Inclusive ranges are specified with square brackets [min TO max] and exclusive ranges with curly brackets {min TO max}.

Below are some examples of range queries:

date:[2012-01-01 TO 2012-12-31] # All days in 2012

count:[1 TO 5] # Numbers 1..5

tag:{alpha TO omega} # Tags between alpha and omega, excluding alpha and omega

count:[10 TO *] # Numbers from 10 upwards

date:{* TO 2012-01-01} # Dates before 2012

Curly and square brackets can be combined:

count:[1 TO 5} # Numbers from 1 up to but not including 5

Ranges with one side unbounded can use the following syntax:

age:>10

age:>=10

age:<10

age:<=10
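If range clauses are assembled in code, a small helper keeps the bracket rules in one place. The sketch below is one way to do it; range_clause and its parameters are hypothetical names made up for this example.

```python
def range_clause(field, low="*", high="*", include_low=True, include_high=True):
    """Build a range clause such as count:[1 TO 5} from its parts.

    Square brackets mark an inclusive bound, curly brackets an exclusive
    one, and * leaves that side of the range unbounded.
    """
    opening = "[" if include_low else "{"
    closing = "]" if include_high else "}"
    return f"{field}:{opening}{low} TO {high}{closing}"


print(range_clause("count", 1, 5))
print(range_clause("count", 1, 5, include_high=False))
print(range_clause("date", high="2012-01-01", include_low=False, include_high=False))
```

The three calls reproduce the count:[1 TO 5], count:[1 TO 5} and date:{* TO 2012-01-01} examples from above.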

Boosting:

Use the boost operator ^ to make one term more relevant than another. For instance, if we want to find all documents about foxes, but we are especially interested in quick foxes:

quick^2 fox

The default boost value is 1, but can be any positive floating point number. Boosts between 0 and 1 reduce relevance. Boosts can also be applied to phrases or to groups:

"john smith"^2 (foo bar)^4
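The boost syntax is simple enough to generate programmatically. The helper below is a minimal sketch; boost is a hypothetical function name, and the only rule it enforces is the one stated above, that the factor must be positive.

```python
def boost(clause: str, factor: float) -> str:
    """Append a ^boost to a term, quoted phrase, or parenthesised group."""
    if factor <= 0:
        raise ValueError("boost must be a positive number")
    # The "g" format drops a trailing ".0", so 2.0 is written as 2.
    return f"{clause}^{factor:g}"


print(boost("quick", 2) + " fox")
print(boost('"john smith"', 2) + " " + boost("(foo bar)", 4))
print(boost("fox", 0.5))
```

The last call shows a boost between 0 and 1, which lowers the relevance of that term instead of raising it.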

Boolean Operators:

Elasticsearch supports the AND, OR, and NOT Boolean operators.
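As a sketch of how these operators combine, the helpers below join clauses with AND, OR and NOT and add parentheses so the grouping is explicit. The function names (all_of, any_of, negate) are hypothetical and exist only for this example; the terms are made up.

```python
def all_of(*clauses: str) -> str:
    """Every clause must match."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses: str) -> str:
    """At least one clause must match."""
    return "(" + " OR ".join(clauses) + ")"


def negate(clause: str) -> str:
    """The clause must not match."""
    return f"NOT {clause}"


# Documents about a quick or brown fox, excluding anything mentioning news.
query = all_of(any_of("quick", "brown"), "fox", negate("news"))
print(query)
```

Writing the parentheses explicitly avoids relying on operator precedence when AND, OR and NOT are mixed in one query.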

For more details on ELK, please refer to the documentation below.

https://www.elastic.co/guide/index.html

 
