How To Install And Configure Elasticsearch Cluster In Linux
Hello Everyone
Welcome to CloudAffaire and this is Debjeet.
In this series, we will explore one of the most popular log management tools in DevOps, better known as the ELK (E=Elasticsearch, L=Logstash, K=Kibana) stack.
What Is Elasticsearch?
Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now known as Elastic). Known for its simple REST APIs, distributed nature, speed, and scalability, Elasticsearch is the central component of the Elastic Stack, a set of open source tools for data ingestion, enrichment, storage, analysis, and visualization. Commonly referred to as the ELK Stack (after Elasticsearch, Logstash, and Kibana), the Elastic Stack now includes a rich collection of lightweight shipping agents known as Beats for sending data to Elasticsearch.
Installing Elasticsearch Cluster:
Step 1: Configure yum repository for elasticsearch.
#########################################################
## How To Install And Configure Elasticsearch In Linux ##
#########################################################

## Prerequisites: One Linux system with internet access
## Linux OS: CentOS 7
## IP: 192.168.0.10

## ------------------------
## Configure yum repository
## ------------------------

## Download and install the public signing key
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

## Create the repository file
sudo vi /etc/yum.repos.d/elasticsearch.repo
---------------------
[elasticsearch]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
---------------------
:wq
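Before installing, it may help to confirm that yum actually sees the new repository. This is an optional check, not part of the original steps; the repository id elasticsearch matches the file created above.

## Optional: confirm the elasticsearch repository is visible to yum
sudo yum repolist | grep -i elasticsearch

## Optional: list the elasticsearch package versions available from the repository
sudo yum list available elasticsearch --showduplicates | tail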
Step 2: Install elasticsearch cluster.
## ---------------------
## Install elasticsearch
## ---------------------

## Install elasticsearch
sudo yum install elasticsearch

## Enable and start elasticsearch
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch
sudo systemctl status elasticsearch

## Check if elasticsearch installed successfully
curl -X GET "localhost:9200/?pretty"
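If the service fails to start or the curl check does not respond, the systemd journal and the Elasticsearch log directory are the usual places to look. This is only a troubleshooting sketch; the log file is named after the cluster, which is elasticsearch by default.

## Optional: troubleshoot a failed startup
sudo journalctl --unit elasticsearch --no-pager | tail -n 50
sudo tail -n 50 /var/log/elasticsearch/elasticsearch.log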
Configuring Elasticsearch Cluster:
Elasticsearch ships with good defaults and requires very little configuration. Most settings can be changed on a running cluster using the Cluster update settings API.
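As a quick illustration of the Cluster update settings API mentioned above, the sketch below applies a transient setting and reads the current settings back. The setting used here (cluster.routing.allocation.enable) is only an example and is not required for this tutorial.

## Example: change a setting on a running cluster (transient settings are lost on a full restart)
curl -XPUT 'localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}
'

## Read back the current cluster settings
curl -XGET 'localhost:9200/_cluster/settings?pretty'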
Config files location:
- elasticsearch.yml for configuring Elasticsearch
- jvm.options for configuring Elasticsearch JVM settings
- log4j2.properties for configuring Elasticsearch logging
Note: These files are located in the config directory, whose default location depends on whether the installation is from an archive distribution (tar.gz or zip) or a package distribution (Debian or RPM packages).
For an RPM-based installation, Elasticsearch defaults to /etc/elasticsearch for runtime configuration and loads its settings from /etc/elasticsearch/elasticsearch.yml. The RPM also ships a system configuration file located at /etc/sysconfig/elasticsearch.
Elasticsearch Configuration Options:
Using /etc/elasticsearch/elasticsearch.yml (a sample configuration combining these options follows the list):
- path.data: Path to directory where to store the data (separate multiple locations by comma), default value: /var/lib/elasticsearch
- path.logs: Path to the directory where log files are stored, default value: /var/log/elasticsearch
- cluster.name: Name of your elasticsearch cluster. The built-in default is elasticsearch; the sample elasticsearch.yml ships with my-application as a commented-out example value.
- node.name: Elasticsearch uses node.name as a human-readable identifier for a particular instance of Elasticsearch, so it is included in the response of many APIs. It defaults to the hostname of the machine when Elasticsearch starts, but can be configured explicitly in elasticsearch.yml using node.name.
- network.host: By default, Elasticsearch binds to loopback addresses only, e.g. 127.0.0.1 and [::1]. This is sufficient to run a single development node on a server; to form a cluster with nodes on other servers, your node will need to bind to a non-loopback address using network.host: <HOST_IP_ADDRESS>
- http.port: Elasticsearch serves HTTP traffic on port 9200 by default; you can change the default port using http.port: <CUSTOM_PORT>
- discovery.seed_hosts: Pass an initial list of hosts to perform discovery when this node is started. Default value: 127.0.0.1, [::1].
- cluster.initial_master_nodes: Bootstrap the cluster using an initial set of master-eligible nodes.
- bootstrap.memory_lock: Lock the process memory on startup so the JVM heap is never swapped out
- gateway.recover_after_nodes: Block initial recovery after a full cluster restart until N nodes are started
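Putting these options together, a minimal elasticsearch.yml for the first node of a small cluster might look like the sketch below. The IP 192.168.0.10 comes from the prerequisites above; 192.168.0.11, 192.168.0.12, mynode2 and mynode3 are hypothetical additional nodes, so adjust names and addresses to your environment.

## /etc/elasticsearch/elasticsearch.yml (sketch for node 1 of a hypothetical 3-node cluster)
cluster.name: mycluster
node.name: mynode1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 192.168.0.10
http.port: 9200
discovery.seed_hosts: ["192.168.0.10", "192.168.0.11", "192.168.0.12"]
cluster.initial_master_nodes: ["mynode1", "mynode2", "mynode3"]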
Using /etc/sysconfig/elasticsearch:
- JAVA_HOME: Set a custom Java path to be used.
- MAX_OPEN_FILES: Maximum number of open files, defaults to 65535.
- MAX_LOCKED_MEMORY: Maximum locked memory size. Set to unlimited if you use the bootstrap.memory_lock option in elasticsearch.yml.
- MAX_MAP_COUNT: Maximum number of memory map areas a process may have. If you use mmapfs as index store type, make sure this is set to a high value. For more information, check the linux kernel documentation about max_map_count. This is set via sysctl before starting Elasticsearch. Defaults to 262144.
- ES_PATH_CONF: Configuration file directory (which needs to include elasticsearch.yml, jvm.options, and log4j2.properties files); defaults to /etc/elasticsearch.
- ES_JAVA_OPTS: Any additional JVM system properties you may want to apply.
- ES_HOME: Elasticsearch home directory, default value: /usr/share/elasticsearch
- PID_DIR: Elasticsearch PID directory, default value: /var/run/elasticsearch
- RESTART_ON_UPGRADE: Configure restart on package upgrade, default value: false.
Note: Distributions that use systemd require that system resource limits be configured via systemd rather than via the /etc/sysconfig/elasticsearch file.
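For example, if you enable bootstrap.memory_lock in elasticsearch.yml on a systemd-based RPM install, the memory lock limit is raised through a systemd override rather than through MAX_LOCKED_MEMORY in /etc/sysconfig/elasticsearch. A sketch of that override:

## Allow the elasticsearch service to lock unlimited memory (systemd distributions)
sudo mkdir -p /etc/systemd/system/elasticsearch.service.d
sudo vi /etc/systemd/system/elasticsearch.service.d/override.conf
--------------------------
[Service]
LimitMEMLOCK=infinity
--------------------------
:wq
sudo systemctl daemon-reload
sudo systemctl restart elasticsearch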
Elasticsearch Directory Layout (RPM based; a quick verification sketch follows the list):
- home: Elasticsearch home directory or $ES_HOME, default location: /usr/share/elasticsearch
- bin: Binary scripts including elasticsearch to start a node and elasticsearch-plugin to install plugins, default location: /usr/share/elasticsearch/bin
- conf: Configuration files including elasticsearch.yml, default location: /etc/elasticsearch
- conf: Environment variables including heap size, file descriptors, default location: /etc/sysconfig/elasticsearch
- data: The location of the data files of each index / shard allocated on the node. Can hold multiple locations, default location: /var/lib/elasticsearch
- jdk: The bundled Java Development Kit used to run Elasticsearch. Can be overridden by setting the JAVA_HOME environment variable in /etc/sysconfig/elasticsearch. Default location: /usr/share/elasticsearch/jdk
- logs: Log files location, default location: /var/log/elasticsearch
- plugins: Plugin files location. Each plugin will be contained in a subdirectory, default location: /usr/share/elasticsearch/plugins
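If you want to confirm this layout on your own installation, the RPM database can list the files the elasticsearch package owns. This is only a quick optional check.

## Optional: confirm where the RPM placed its files
rpm -ql elasticsearch | head -n 20
sudo ls -l /usr/share/elasticsearch /etc/elasticsearch /var/lib/elasticsearch /var/log/elasticsearch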
Step 3: View and change elasticsearch cluster configuration.
## -------------------------------------------
## View And Change Elasticsearch Configuration
## -------------------------------------------

## View elasticsearch configuration
sudo cat /etc/elasticsearch/elasticsearch.yml
sudo cat /etc/sysconfig/elasticsearch

## Change default cluster and node name
sudo vi /etc/elasticsearch/elasticsearch.yml
-----------------------
cluster.name: mycluster
node.name: mynode1
-----------------------
:wq

## Restart elasticsearch cluster
sudo systemctl restart elasticsearch
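After the restart, you can confirm that the new names were picked up; mycluster and mynode1 are the values set in the step above, and the _cat columns below are chosen only for readability.

## Verify the new cluster and node name
curl -XGET 'localhost:9200/?pretty'
curl -XGET 'localhost:9200/_cat/nodes?v&h=name,ip,node.role,master'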
Step 4: Get elasticsearch cluster details.
## -------------------------
## Get Elasticsearch details
## -------------------------

## Get elasticsearch cluster details
curl -XGET 'localhost:9200/_cluster/stats?human&pretty'

## Get elasticsearch cluster health
curl -XGET 'localhost:9200/_cluster/health?pretty'

## Get elasticsearch node details
curl -XGET 'localhost:9200/_cat/nodes?pretty'
curl -XGET 'localhost:9200/_nodes/stats?human&pretty'

## Get a specific node details
curl -XGET 'localhost:9200/_nodes/mynode1/stats?pretty'

## Get master node details
curl -XGET 'localhost:9200/_cat/master?v&pretty'

## Get all running tasks
curl -XGET 'localhost:9200/_cat/tasks?v&pretty'
curl -XGET 'localhost:9200/_tasks?pretty'

## Get cluster pending tasks
curl -XGET 'localhost:9200/_cluster/pending_tasks?pretty'

## Get all elasticsearch index names
curl -XGET 'localhost:9200/_cat/indices'

## Get shard details
curl -XGET 'localhost:9200/_cat/shards?pretty'
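Most of the _cat endpoints above also accept a v parameter for column headers and an h parameter to select columns; the sketch below is just an example of narrowing the output, and /_cat on its own lists every available cat endpoint.

## List all available _cat endpoints
curl -XGET 'localhost:9200/_cat'

## Example: show only selected columns, with headers
curl -XGET 'localhost:9200/_cat/nodes?v&h=name,ip,heap.percent,ram.percent,cpu'
curl -XGET 'localhost:9200/_cat/indices?v&h=index,health,docs.count,store.size'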
Step 5: Get and Put data into your elasticsearch cluster.
## -------------------------------------------
## Insert Data Into Your Elasticsearch Cluster
## -------------------------------------------

## Download some sample data and extract the files
## sudo yum install git -y
git clone https://github.com/CloudAffaire/sample_data.git && cp sample_data/Employees* .

## Bulk insert data into your elasticsearch cluster
curl -XPUT 'localhost:9200/cloudaffairempldb/_bulk' \
-H 'Content-Type: application/json' \
--data-binary @Employees25K.json

## Get mapping details for cloudaffairempldb index
curl -XGET 'localhost:9200/cloudaffairempldb/_mapping?pretty'

## Get a single/multiple documents
curl -XGET 'localhost:9200/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string" : {
      "query" : "(FirstName:ASHLI AND LastName:CUJAS)"
    }
  }
}
'

curl -XGET 'localhost:9200/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string" : {
      "query" : "(Designation:Senior Software Engineer)"
    }
  }
}
'
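Besides bulk loading, you can index and fetch individual documents. The sketch below uses a hypothetical document id 1 in the same cloudaffairempldb index, with field names taken from the sample queries above; the final DELETE is commented out because it removes the whole index.

## Index a single document with id 1 (example document, not part of the sample data set)
curl -XPUT 'localhost:9200/cloudaffairempldb/_doc/1?pretty' -H 'Content-Type: application/json' -d'
{
  "FirstName": "JOHN",
  "LastName": "DOE",
  "Designation": "Software Engineer"
}
'

## Fetch the document back by id
curl -XGET 'localhost:9200/cloudaffairempldb/_doc/1?pretty'

## Clean up: delete the whole index when you are done (destructive)
## curl -XDELETE 'localhost:9200/cloudaffairempldb?pretty'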
To get more details on ELK, please refer to the documentation below.
https://www.elastic.co/guide/index.html