How To Query Elasticsearch Using Lucene Query
Hello Everyone
Welcome to CloudAffaire and this is Debjeet.
In this series, we will explore one of the most popular log management tools in DevOps, better known as the ELK (E=Elasticsearch, L=Logstash, K=Kibana) stack.
What Is Lucene Query?
Apache Lucene is a free and open-source search engine software library, originally written completely in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache Software License. While suitable for any application that requires full text indexing and searching capability, Lucene is recognized for its utility in the implementation of Internet search engines and local, single-site searching. Lucene includes a feature to perform a fuzzy search based on edit distance. Lucene has also been used to implement recommendation systems. Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.
Elasticsearch Lucene Query Syntax:
Field name:
You can specify fields to search in the query syntax.
###################################
## Lucene Query In Elasticsearch ##
###################################

## Prerequisite: elasticsearch cluster is configured with sample data
## https://cloudaffaire.com/how-to-create-elasticsearch-cluster-in-aws/

###############################
## Basic Lucene Query Syntax ##
###############################

## -----------
## Field names
## -----------

## You can specify fields to search in the query syntax

## Get all employees whose first name is 'ASHLI'
curl -X GET "$AWS_ES_ENDPOINT/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string" : {
      "query" : "(FirstName:ASHLI)"
    }
  }
}
'

## Get all employees who are Unmarried
curl -X GET "$AWS_ES_ENDPOINT/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string" : {
      "query" : "(MaritalStatus : Unmarried)"
    }
  }
}
'

## List all employees who have some extracurricular Interests (Interests field is non empty)
curl -X GET "$AWS_ES_ENDPOINT/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string" : {
      "query" : "(_exists_ : Interests)"
    }
  }
}
'
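The requests above differ only in the Lucene expression, so a small wrapper can cut the repetition. The following is a minimal sketch and not part of the original setup: build_query_body and run_lucene_query are hypothetical helper names, and the sketch assumes $AWS_ES_ENDPOINT is set as in the curl examples above.

```shell
# Hypothetical helper: wrap a Lucene expression in a query_string request body.
build_query_body() {
  # Escape backslashes first, then double quotes, so the expression is a valid JSON string.
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"query":{"query_string":{"query":"%s"}}}\n' "$escaped"
}

# Hypothetical helper: send the expression to the _search endpoint.
# Assumes AWS_ES_ENDPOINT is exported, as in the curl examples above.
run_lucene_query() {
  curl -s -X GET "$AWS_ES_ENDPOINT/_search?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(build_query_body "$1")"
}

# Print the request body only; no request is sent here.
build_query_body 'FirstName:ASHLI'
```

With a reachable cluster, a call such as run_lucene_query '_exists_:Interests' should send the same request as the third example above.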
Wildcards:
Wildcard searches can be run on individual terms, using ? to replace a single character, and * to replace zero or more characters.
## ---------
## Wildcards
## ---------

## Wildcard searches can be run on individual terms, using ?
## to replace a single character, and * to replace zero or more characters

## List all employees whose first name matches AS?LI (any single character in place of ?)
curl -X GET "$AWS_ES_ENDPOINT/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string" : {
      "query" : "(FirstName:AS?LI)"
    }
  }
}
'

## List all employees who are managers
curl -X GET "$AWS_ES_ENDPOINT/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string" : {
      "query" : "(Designation:*Manager*)"
    }
  }
}
'
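The ? and * wildcards behave much like shell patterns, so their semantics can be tried locally without a cluster. The sketch below is only an illustration: wildcard_match is a hypothetical helper built on the shell's own case patterns, which also give [ ] a special meaning that Lucene wildcards do not have, and it says nothing about how Elasticsearch analyzes or stores the terms.

```shell
# Hypothetical helper: succeed (exit 0) when a single term matches a wildcard pattern.
wildcard_match() {
  pattern=$1
  term=$2
  case "$term" in
    $pattern) return 0 ;;   # unquoted on purpose so ? and * act as wildcards
    *) return 1 ;;
  esac
}

wildcard_match 'AS?LI' 'ASHLI' && echo "AS?LI matches ASHLI"
wildcard_match 'AS?LI' 'ASLI' || echo "AS?LI does not match ASLI"
wildcard_match '*Manager*' 'Manager' && echo "*Manager* matches Manager"
```

The second call fails to match because ? must stand for exactly one character, while * in the third call is allowed to match nothing at all.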
Regular expressions:
Regular expression patterns can be embedded in the query string by wrapping them in forward-slashes (“/”). Elasticsearch uses Apache Lucene’s regular expression engine to parse these queries.
Reserved characters:
Lucene’s regular expression engine supports all Unicode characters. However, the following characters are reserved as operators:
. ? + * | { } [ ] ( ) " \
Depending on the optional operators enabled, the following characters may also be reserved:
# @ & < > ~
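To match one of these reserved characters literally, it has to be escaped with a backslash. Below is a minimal sketch of that step, assuming only the always-reserved characters listed above need escaping; escape_regex_literal is a hypothetical helper name, and the optional operator characters (# @ & < > ~) are left untouched.

```shell
# Hypothetical helper: backslash-escape the always-reserved regular expression characters.
escape_regex_literal() {
  # The bracket expression lists ] first so it is taken literally,
  # followed by . ? + * | { } ( ) " [ and the backslash itself.
  printf '%s\n' "$1" | sed 's/[].?+*|{}()"[\\]/\\&/g'
}

escape_regex_literal 'a.b'
escape_regex_literal 'C++ (dev)'
```

When the escaped text is placed inside a JSON request body, each backslash needs to be doubled once more so that the JSON string itself stays valid.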
Standard operators:
Lucene’s regular expression engine does not use the Perl Compatible Regular Expressions (PCRE) library, but it does support the following standard operators.
- . Matches any character. For example: ab. # matches ‘aba’, ‘abb’, ‘abz’, etc.
- ? Repeat the preceding character zero or one times. For example: abc? # matches ‘ab’ and ‘abc’
- + Repeat the preceding character one or more times. For example: ab+ # matches ‘ab’, ‘abb’, ‘abbb’, etc.
- * Repeat the preceding character zero or more times. For example: ab* # matches ‘a’, ‘ab’, ‘abb’, ‘abbb’, etc.
- {} Minimum and maximum number of times the preceding character can repeat. For example: a{2} # matches ‘aa’, a{2,4} # matches ‘aa’, ‘aaa’, and ‘aaaa’
- | OR operator. The match will succeed if the longest pattern on either the left side OR the right side matches. For example: abc|xyz # matches ‘abc’ and ‘xyz’
- ( … ) Forms a group. You can use a group to treat part of the expression as a single character. For example: abc(def)? # matches ‘abc’ and ‘abcdef’ but not ‘abcd’
- [ … ] Match one of the characters in the brackets. For example: [abc] # matches ‘a’, ‘b’, ‘c’, [a-c] # matches ‘a’, ‘b’, or ‘c’
## -------------------
## Regular expressions
## -------------------

## Regular expression patterns can be embedded in the query string by wrapping them in forward-slashes ("/")
## https://www.elastic.co/guide/en/elasticsearch/reference/7.5/regexp-syntax.html

## List all employees whose first name is AL followed by one letter from A to K
curl -X GET "$AWS_ES_ENDPOINT/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string" : {
      "query" : "(FirstName:/AL([A-K])/)"
    }
  }
}
'
Fuzziness:
We can search for terms that are similar to, but not exactly like our search terms, using the “fuzzy” operator (~<distance>). Fuzziness uses the Damerau-Levenshtein distance to find all terms with a maximum of two changes, where a change is the insertion, deletion or substitution of a single character, or transposition of two adjacent characters. The default edit distance is 2, but an edit distance of 1 should be sufficient to catch 80% of all human misspellings. It can be specified as: cload~1
## ---------
## Fuzziness
## ---------

## We can search for terms that are similar to, but not exactly like our search terms, using the "fuzzy" operator (~)

## List all employees whose first name is similar to ASHLI
curl -X GET "$AWS_ES_ENDPOINT/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string" : {
      "query" : "(FirstName:ASHLI~2)"
    }
  }
}
'
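To get a feel for what an edit distance of 1 or 2 means, the distance between two terms can be computed locally. The sketch below is an illustration only and is not how Lucene evaluates fuzzy queries internally: edit_distance is a hypothetical helper that uses awk to compute the optimal string alignment variant of the Damerau-Levenshtein distance, counting insertions, deletions, substitutions, and transpositions of adjacent characters.

```shell
# Hypothetical helper: print the edit distance between two terms
# (optimal string alignment variant of Damerau-Levenshtein).
edit_distance() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    la = length(a); lb = length(b)
    for (i = 0; i <= la; i++) d[i,0] = i
    for (j = 0; j <= lb; j++) d[0,j] = j
    for (i = 1; i <= la; i++) {
      for (j = 1; j <= lb; j++) {
        cost = (substr(a, i, 1) == substr(b, j, 1)) ? 0 : 1
        m = d[i-1,j] + 1                                        # deletion
        if (d[i,j-1] + 1 < m) m = d[i,j-1] + 1                  # insertion
        if (d[i-1,j-1] + cost < m) m = d[i-1,j-1] + cost        # substitution
        if (i > 1 && j > 1 &&
            substr(a, i, 1) == substr(b, j-1, 1) &&
            substr(a, i-1, 1) == substr(b, j, 1) &&
            d[i-2,j-2] + 1 < m) m = d[i-2,j-2] + 1              # transposition
        d[i,j] = m
      }
    }
    print d[la,lb]
  }'
}

edit_distance cload cloud     # one substitution
edit_distance ASHLI ASHIL     # one transposition
edit_distance ASHLI ASHLEY    # one substitution plus one insertion
```

The three calls print 1, 1, and 2, so all three candidate terms fall within the default fuzziness of 2, while only the first two would be within ~1.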
Proximity searches:
While a phrase query (e.g. "john smith") expects all of the terms in exactly the same order, a proximity query allows the specified words to be further apart or in a different order. In the same way that fuzzy queries can specify a maximum edit distance for characters in a word, a proximity search allows us to specify a maximum edit distance of words in a phrase:
"fox quick"~5
The closer the text in a field is to the original order specified in the query string, the more relevant that document is considered to be. When compared to the above example query, the phrase “quick fox” would be considered more relevant than “quick brown fox”.
## ------------------
## Proximity searches
## ------------------

## In the same way that fuzzy queries can specify a maximum edit distance for characters in a word,
## a proximity search allows us to specify a maximum edit distance of words in a phrase.
## The phrase must be wrapped in (escaped) double quotes for ~3 to act as a proximity operator.

## List all employees with designation close to senior engineer
curl -X GET "$AWS_ES_ENDPOINT/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string" : {
      "query" : "(Designation:\"Senior Engineer\"~3)"
    }
  }
}
'
Ranges:
Ranges can be specified for date, numeric or string fields. Inclusive ranges are specified with square brackets [min TO max] and exclusive ranges with curly brackets {min TO max}.
Below are some examples of range queries:
date:[2012-01-01 TO 2012-12-31]    All days in 2012
count:[1 TO 5]                     Numbers 1..5
tag:{alpha TO omega}               Tags between alpha and omega, excluding alpha and omega
count:[10 TO *]                    Numbers from 10 upwards
date:{* TO 2012-01-01}             Dates before 2012
Curly and square brackets can be combined:
count:[1 TO 5}                     Numbers from 1 up to but not including 5
Ranges with one side unbounded can use the following syntax:
age:>10
age:>=10
age:<10
age:<=10
## ------
## Ranges
## ------

## Ranges can be specified for date, numeric or string fields. Inclusive ranges are specified with
## square brackets [min TO max] and exclusive ranges with curly brackets {min TO max}.

## List all employees whose Salary is between 50000 and 59000 (inclusive)
curl -X GET "$AWS_ES_ENDPOINT/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string" : {
      "query" : "(Salary:[50000 TO 59000])"
    }
  }
}
'

## List all employees whose Salary is greater than or equal to 70000
curl -X GET "$AWS_ES_ENDPOINT/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string" : {
      "query" : "(Salary:>=70000)"
    }
  }
}
'
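When range bounds come from script variables, it can help to build the range expression in one place. This is a minimal sketch under that assumption: lucene_range is a hypothetical helper that only formats the expression and does not validate the bounds or query the cluster.

```shell
# Hypothetical helper: format a Lucene range expression.
# $1 = field, $2 = lower bound, $3 = upper bound,
# $4 = "inclusive" (default, square brackets) or "exclusive" (curly brackets).
lucene_range() {
  case "${4:-inclusive}" in
    inclusive) printf '%s:[%s TO %s]\n' "$1" "$2" "$3" ;;
    exclusive) printf '%s:{%s TO %s}\n' "$1" "$2" "$3" ;;
    *) echo "unknown mode: $4" >&2; return 1 ;;
  esac
}

lucene_range Salary 50000 59000            # Salary:[50000 TO 59000]
lucene_range tag alpha omega exclusive     # tag:{alpha TO omega}
lucene_range count 10 '*'                  # count:[10 TO *]
```

The * bound is quoted in the last call so the shell passes it through unchanged instead of expanding it to file names.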
Boosting:
Use the boost operator ^ to make one term more relevant than another. For instance, if we want to find all documents about foxes, but we are especially interested in quick foxes:
quick^2 fox
The default boost value is 1, but can be any positive floating point number. Boosts between 0 and 1 reduce relevance. Boosts can also be applied to phrases or to groups:
"john smith"^2 (foo bar)^4
## --------
## Boosting
## --------

## Use the boost operator ^ to make one term more relevant than another. For instance,
## if we want to find all documents about foxes, but we are especially interested in quick foxes

## List all employees whose designation has Manager with priority to Delivery Manager
## (both terms are grouped so that they are searched in the Designation field)
curl -X GET "$AWS_ES_ENDPOINT/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string" : {
      "query" : "(Designation:(Delivery^2 Manager))"
    }
  }
}
'
Boolean Operators:
Elasticsearch supports the AND, OR, and NOT Boolean operators.
## -------
## Boolean
## -------

## Elasticsearch supports the AND, OR, and NOT Boolean operators.

## List all employees whose first name is ASHLI and last name is CUJAS
curl -X GET "$AWS_ES_ENDPOINT/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string" : {
      "query" : "(FirstName:ASHLI AND LastName : CUJAS)"
    }
  }
}
'
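Longer Boolean expressions are easier to read when each clause is parenthesized explicitly. The sketch below shows one way to assemble such an expression in the shell; join_clauses is a hypothetical helper that only builds the string, and the field names are the ones used in the examples above.

```shell
# Hypothetical helper: join clauses with a Boolean operator, wrapping each in parentheses.
# $1 = operator (AND or OR), remaining arguments = clauses.
join_clauses() {
  op=$1
  shift
  out=""
  for clause in "$@"; do
    if [ -z "$out" ]; then
      out="($clause)"
    else
      out="$out $op ($clause)"
    fi
  done
  printf '%s\n' "$out"
}

join_clauses AND 'FirstName:ASHLI' 'LastName:CUJAS'
join_clauses OR 'Designation:*Manager*' 'Salary:>=70000'
```

The first call prints (FirstName:ASHLI) AND (LastName:CUJAS), which can be used as the "query" value of a query_string request like the one above.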
For more details on ELK, please refer to the documentation below.
https://www.elastic.co/guide/index.html