What Is Elasticsearch?
Hello Everyone
Welcome to CloudAffaire and this is Debjeet.
In this series, we will explore one of the most popular log management tools in DevOps, the ELK stack (E = Elasticsearch, L = Logstash, K = Kibana).
What Is ELK Stack In DevOps?
The ELK Stack is a collection of three open-source products — Elasticsearch, Logstash, and Kibana — all developed, managed and maintained by Elastic. Elasticsearch is an open-source, full-text search and analysis engine, based on the Apache Lucene search engine. Logstash is a log aggregator that collects data from various input sources, executes different transformations and enhancements and then ships the data to various supported output destinations. Kibana is a visualization layer that works on top of Elasticsearch, providing users with the ability to analyze and visualize the data. Together, these different components are most commonly used for monitoring, troubleshooting and securing IT environments, business intelligence, and web analytics.
What Is Elasticsearch?
Elasticsearch is a distributed, open-source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now known as Elastic). Known for its simple REST APIs, distributed nature, speed, and scalability, Elasticsearch is the central component of the Elastic Stack, a set of open source tools for data ingestion, enrichment, storage, analysis, and visualization. Commonly referred to as the ELK Stack (after Elasticsearch, Logstash, and Kibana), the Elastic Stack now includes a rich collection of lightweight shipping agents known as Beats for sending data to Elasticsearch.
How does Elasticsearch work?
Raw data flows into Elasticsearch from a variety of sources, including logs, system metrics, and web applications. Data ingestion is the process by which this raw data is parsed, normalized, and enriched before it is indexed in Elasticsearch. Once indexed in Elasticsearch, users can run complex queries against their data and use aggregations to retrieve complex summaries of their data. From Kibana, users can create powerful visualizations of their data, share dashboards, and manage the Elastic Stack.
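To make the "complex queries and aggregations" step more concrete, here is a minimal sketch of an aggregation request against the sample index used later in this post. The field name Designation.keyword is an assumption (it presumes a Designation field that was dynamically mapped with a keyword sub-field), so adjust it to whatever your mapping actually contains.

```shell
# Aggregation request body: count employees per designation.
# "Designation.keyword" is a hypothetical field name; adjust it to your mapping.
AGG_BODY='{
  "size": 0,
  "aggs": {
    "employees_per_designation": {
      "terms": { "field": "Designation.keyword" }
    }
  }
}'

# Send the aggregation to the sample index used in this post.
# Nothing is sent until the function is called with AWS_ES_ENDPOINT set.
run_aggregation() {
  curl -s -X GET "$AWS_ES_ENDPOINT/cloudaffairempldb/_search?pretty" \
    -H 'Content-Type: application/json' -d "$AGG_BODY"
}

# Usage: run_aggregation
```

Setting "size" to 0 asks Elasticsearch to return only the aggregation buckets and skip the matching documents themselves.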
Elasticsearch components:
Fields:
Fields are the smallest individual unit of data in Elasticsearch. These are customizable and could include, for example: firstname, lastname, address, designation, salary, gender, etc. Each field has a defined datatype and contains a single piece of data. Those datatypes include the core datatypes (strings, numbers, dates, booleans), complex datatypes (object and nested), geo datatypes (geo_point and geo_shape), and specialized datatypes (token count, join, rank feature, dense vector, flattened, etc.).
There are different kinds of fields and ways to manage them. Fields are one of several mechanisms for Elasticsearch mapping.
- Multi-fields: These fields can be indexed in more than one way to produce more search results.
- Meta-fields: Meta-fields deal with a document’s metadata.
    ##############################
    ## Elasticsearch Components ##
    ##############################

    ## Prerequisite:
    ## elasticsearch cluster is configured with sample data
    ## https://cloudaffaire.com/how-to-create-elasticsearch-cluster-in-aws/
    ## AWS CLI installed and configured with proper access
    ## https://cloudaffaire.com/category/aws/aws-cli/
    ## Get field details
    curl -X GET "$AWS_ES_ENDPOINT/cloudaffairempldb/_mapping/field/DateOfJoining?pretty"
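As an illustration of multi-fields, the sketch below defines a LastName field that is indexed twice: once as analyzed text and once as an exact keyword sub-field. The index name employees_multifield_demo and the sub-field name raw are made up for this example, and the typeless "mappings" body assumes Elasticsearch 7.x or later (6.x expects a type name inside "mappings").

```shell
# Hypothetical index name, used only for this illustration.
DEMO_INDEX="employees_multifield_demo"

# "LastName" is indexed as full text, plus an exact-value sub-field
# (LastName.raw) that can be used for sorting and aggregations.
MULTIFIELD_MAPPING='{
  "mappings": {
    "properties": {
      "LastName": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" }
        }
      }
    }
  }
}'

# Create the demo index with the multi-field mapping.
# Nothing is sent until the function is called with AWS_ES_ENDPOINT set.
create_multifield_index() {
  curl -s -X PUT "$AWS_ES_ENDPOINT/$DEMO_INDEX?pretty" \
    -H 'Content-Type: application/json' -d "$MULTIFIELD_MAPPING"
}

# Usage: create_multifield_index
```

A search on LastName then goes through the analyzer, while LastName.raw matches only the exact stored value.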
Documents:
Documents are JSON objects that are stored within an Elasticsearch index and are considered the base unit of storage. In the world of relational databases, a document can be compared to a row in a table. For example, in an employee dataset, each employee's details may be considered a document. Data in documents is defined with fields made up of keys and values. A key is the name of the field, and a value can be an item of many different types such as a string, a number, a boolean, another object, or an array of values. Documents also contain reserved fields that constitute the document metadata, such as:
- _index – the index where the document resides
- _type – the type that the document represents
- _id – the unique identifier for the document
    ## Get a single document
    curl -X GET "$AWS_ES_ENDPOINT/_search?pretty" -H 'Content-Type: application/json' -d'
    {
      "query": {
        "query_string" : {
          "query" : "(FirstName:ASHLI AND LastName:CUJAS)"
        }
      }
    }
    '
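When the _id of a document is already known, it can also be fetched directly instead of being searched for. The sketch below assumes Elasticsearch 7.x or later, where the document endpoint is /<index>/_doc/<id>; the helper names doc_url and get_document, and the ID used in the usage line, are made up for this example.

```shell
# Build the URL of a single document.
# $1 = endpoint, $2 = index name, $3 = document _id
doc_url() {
  printf '%s/%s/_doc/%s?pretty\n' "$1" "$2" "$3"
}

# Fetch one document by its _id from the cluster behind AWS_ES_ENDPOINT.
# $1 = index name, $2 = document _id
get_document() {
  curl -s -X GET "$(doc_url "$AWS_ES_ENDPOINT" "$1" "$2")"
}

# Usage (the ID "1" is hypothetical): get_document cloudaffairempldb 1
```

The response wraps the original JSON in a _source field, next to the _index and _id metadata described above.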
Type:
Type is a logical grouping of the documents within the index. For example, in a product index, we can further group documents into types like electronics, fashion, furniture, etc. Types are defined based on documents having similar properties. It can be difficult to decide when to use a type rather than a separate index. Indices carry more overhead, so it is sometimes better to use different types in the same index, which can yield better performance. There are a couple of restrictions on using types as well. For example, two fields with the same name in different types of the same index must have the same datatype (string, date, etc.).
Mapping:
Mapping is the process of defining how a document, and the fields it contains, are stored and indexed. For instance, use mappings to define:
- which string fields should be treated as full text fields.
- which fields contain numbers, dates, or geolocations.
- the format of date values.
- custom rules to control the mapping for dynamically added fields.
Mapping Type: Each index has one mapping type, which determines how the document will be indexed.
- Dynamic mapping: Fields and mapping types do not need to be defined before being used.
- Explicit mappings: Fields and mapping types need to be explicitly defined.
Note: mapping types (not mappings themselves) are deprecated from Elasticsearch version 6 onward; an index created in 6.x can contain only a single mapping type.
    ## Get mapping details
    curl -X GET "$AWS_ES_ENDPOINT/cloudaffairempldb/_mapping?pretty"
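For comparison with dynamic mapping, an explicit mapping might look like the sketch below. It covers the points listed above: a full-text field, a numeric field, a date with a fixed format, and a rule for fields that were not declared. The index name employees_explicit_demo and the chosen field types are assumptions for illustration, and the typeless body assumes Elasticsearch 7.x or later.

```shell
# Hypothetical index name, used only for this illustration.
EXPLICIT_INDEX="employees_explicit_demo"

# "dynamic": "strict" rejects documents that contain undeclared fields.
# The field types below are assumptions, not the sample dataset's real mapping.
EXPLICIT_MAPPING='{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "FirstName":     { "type": "text" },
      "Salary":        { "type": "integer" },
      "DateOfJoining": { "type": "date", "format": "yyyy-MM-dd" }
    }
  }
}'

# Create the demo index with the explicit mapping.
# Nothing is sent until the function is called with AWS_ES_ENDPOINT set.
create_explicit_index() {
  curl -s -X PUT "$AWS_ES_ENDPOINT/$EXPLICIT_INDEX?pretty" \
    -H 'Content-Type: application/json' -d "$EXPLICIT_MAPPING"
}

# Usage: create_explicit_index
```

With "dynamic" left at its default, Elasticsearch would instead guess a datatype for every new field it sees.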
Index:
An index is like a ‘database’ in a relational database. It has a mapping which defines multiple types. An index is a logical namespace which maps to one or more primary shards and can have zero or more replica shards. Below is a comparison of Elasticsearch with an RDBMS:
- RDBMS => Databases => Tables => Columns/Rows
- Elasticsearch => Indices => Types => Documents with Properties
    ## Get index details
    curl -X GET "$AWS_ES_ENDPOINT/_cat/indices"
Shards:
Under the covers, an Elasticsearch index is really just a logical grouping of one or more physical shards, where each shard is actually a self-contained index. By distributing the documents in an index across multiple shards, and distributing those shards across multiple nodes, Elasticsearch can ensure redundancy, which both protects against hardware failures and increases query capacity as nodes are added to a cluster.
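How a document ends up on a particular shard can be sketched with a little shell arithmetic. In its simplest documented form, Elasticsearch picks the shard as hash(routing value) modulo the number of primary shards, where the routing value defaults to the document _id. The sketch below uses cksum purely as a stand-in hash (Elasticsearch uses Murmur3 internally), so the shard numbers it prints will not match a real cluster. The function name shard_for and the document IDs are made up for this example.

```shell
# Pick a shard for a routing value (by default, the document _id).
# $1 = routing value, $2 = number of primary shards
shard_for() {
  # cksum is only a stand-in hash; Elasticsearch uses Murmur3 internally.
  sum=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  echo $(( sum % $2 ))
}

# Spread a few hypothetical document IDs over 5 primary shards.
for id in employee-1 employee-2 employee-3; do
  echo "$id -> shard $(shard_for "$id" 5)"
done
```

Because the result depends on the number of primary shards, the same _id would land on a different shard if that number changed, which is why the primary shard count is chosen when the index is created.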
Replica:
As the cluster grows (or shrinks), Elasticsearch automatically migrates shards to rebalance the cluster. There are two types of shards: primaries and replicas. Each document in an index belongs to one primary shard. A replica shard is a copy of a primary shard. Replicas provide redundant copies of your data to protect against hardware failure and increase capacity to serve read requests like searching or retrieving a document. The number of primary shards in an index is fixed at the time that an index is created, but the number of replica shards can be changed at any time, without interrupting indexing or query operations.
    ## Get shard details (?v adds column headers to the _cat output)
    curl -X GET "$AWS_ES_ENDPOINT/_cat/shards?v"
    curl -X GET "$AWS_ES_ENDPOINT/_cat/shards/cloudaffairempldb?v"
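Since the replica count can be changed on a live index, here is a minimal sketch of doing so through the index settings API. The helper names replica_settings_body and set_replicas are made up for this example, and the replica count in the usage line is arbitrary.

```shell
# Build the settings body for a given replica count.
# $1 = desired number of replicas per primary shard
replica_settings_body() {
  printf '{"index": {"number_of_replicas": %d}}\n' "$1"
}

# Apply the new replica count to an existing index.
# $1 = index name, $2 = replica count
set_replicas() {
  curl -s -X PUT "$AWS_ES_ENDPOINT/$1/_settings?pretty" \
    -H 'Content-Type: application/json' -d "$(replica_settings_body "$2")"
}

# Usage: set_replicas cloudaffairempldb 2
```

On a single-node cluster the extra replicas cannot be allocated anywhere, so they stay unassigned until more nodes join.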
Instances and Nodes:
The heart of any ELK setup is the Elasticsearch instance, which has the crucial task of storing and indexing data. In a cluster, different responsibilities are assigned to the various node types:
- Data nodes: store data and execute data-related operations such as search and aggregation
- Master nodes: in charge of cluster-wide management and configuration actions such as adding and removing nodes
- Client nodes: forward cluster requests to the master node and data-related requests to data nodes
- Tribe nodes: act as client nodes that can connect to multiple clusters and perform read and write operations across all of them
- Ingest nodes (new in Elasticsearch 5.0): pre-process documents before indexing
- Machine Learning nodes (Basic License): These are nodes available under Elastic’s Basic License that enable machine learning tasks. Machine learning nodes have xpack.ml.enabled and node.ml set to true.
    ## Get node details
    curl -X GET "$AWS_ES_ENDPOINT/_nodes?pretty"
Cluster:
One or more nodes (servers) collectively form a cluster, which holds all of your data and provides indexing and search capabilities. A cluster can be as small as a single node or can scale to hundreds or thousands of nodes. Each cluster is identified by a unique name.
    ## Get cluster details
    curl -X GET "$AWS_ES_ENDPOINT/_cat/health?v&pretty"
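The health of a cluster is reported as green, yellow, or red. As a rough sketch, the status can be pulled out of the JSON returned by the _cluster/health API and turned into a short explanation. The helper names below are made up for this example, and the sed expression is a simple pattern match, not a full JSON parser.

```shell
# Pull the "status" value out of a _cluster/health JSON response on stdin.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Explain what each health status means.
describe_status() {
  case "$1" in
    green)  echo "all primary and replica shards are allocated" ;;
    yellow) echo "all primaries are allocated, but some replicas are not" ;;
    red)    echo "at least one primary shard is not allocated" ;;
    *)      echo "unknown status: $1" ;;
  esac
}

# Query the cluster behind AWS_ES_ENDPOINT and print its status.
check_cluster() {
  status=$(curl -s -X GET "$AWS_ES_ENDPOINT/_cluster/health" | extract_status)
  echo "$status: $(describe_status "$status")"
}

# Usage: check_cluster
```

A single-node cluster holding an index with one or more replicas will typically report yellow, because a replica is never placed on the same node as its primary.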
To get more details on ELK, please refer to the documentation below.
https://www.elastic.co/guide/index.html