How To Create A Pipeline In Logstash


Hello Everyone

Welcome to CloudAffaire and this is Debjeet.

In this series, we will explore one of the most popular log management tools in DevOps, better known as the ELK (E=Elasticsearch, L=Logstash, K=Kibana) stack.

What Is A Logstash Pipeline:

The Logstash event processing pipeline has three stages: inputs ==> filters ==> outputs. Inputs generate events, filters modify them, and outputs ship them elsewhere. Inputs and outputs support codecs that enable you to encode or decode the data as it enters or exits the pipeline without having to use a separate filter. In layman's terms, you can compare Logstash to an ETL tool in modern RDBMS systems.

Logstash Pipeline Stages:

Inputs:

Inputs are used to get data into Logstash. Logstash supports many kinds of input as your data source: plain files, syslog, Beats, CloudWatch, Kinesis, S3, etc.
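As a minimal sketch of what an input section can look like, the example below reads from a plain file and also listens for Beats agents. The log path is a hypothetical placeholder, and port 5044 is simply the port conventionally used for Beats; adjust both to your environment.

```conf
input {
  # Tail a plain log file (hypothetical path, used only for illustration).
  file {
    path => "/var/log/myapp/app.log"
    start_position => "beginning"
  }

  # Listen for events shipped by Beats agents such as Filebeat.
  beats {
    port => 5044
  }
}
```

Both plugins feed the same pipeline, so events from either source pass through the same filters and outputs.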

Filters:

Filters are intermediary processing devices in the Logstash pipeline. You can combine filters with conditionals to perform an action on an event if it meets certain criteria. Logstash supports different types of filters for data processing like grok, mutate, aggregate, csv, json, etc.
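The sketch below shows one way to combine filters with conditionals. The field names type and loglevel, and the value "apache_access", are assumptions made for this example; they are not set by Logstash automatically.

```conf
filter {
  # Only parse events that an input has marked as Apache access logs
  # (the "type" value here is an assumed, user-defined label).
  if [type] == "apache_access" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }

  # Discard events whose (assumed) loglevel field is "debug".
  if [loglevel] == "debug" {
    drop { }
  }
}
```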

Outputs:

Outputs are the final phase of the Logstash pipeline. An event can pass through multiple outputs, but once all output processing is complete, the event has finished its execution. Logstash supports different types of outputs to store or send the final processed data like elasticsearch, cloudwatch, csv, file, mongodb, s3, sns, etc.
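To illustrate an event passing through multiple outputs, here is a small sketch that sends every event both to Elasticsearch and to the console. The host URL and index name are assumptions for a local test setup.

```conf
output {
  # Store the event in Elasticsearch (assumed local node and index name).
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "weblogs-%{+YYYY.MM.dd}"
  }

  # Also print the same event to the console for debugging.
  stdout {
    codec => rubydebug
  }
}
```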

Codecs:

Codecs are basically stream filters that can operate as part of an input or output. Codecs enable you to easily separate the transport of your messages from the serialization process. Logstash supports different types of codecs like json, msgpack, and plain (text), etc.
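A short sketch of codecs attached to an input and an output follows. The port number and output path are hypothetical; the point is that the tcp plugin handles transport while the codec handles serialization.

```conf
input {
  # The tcp plugin handles the transport; the json_lines codec decodes
  # each newline-delimited JSON document into an event.
  tcp {
    port => 5000
    codec => json_lines
  }
}

output {
  # The line codec encodes each event as a single text line using the
  # given format string (hypothetical output path).
  file {
    path => "/tmp/logstash-messages.log"
    codec => line { format => "%{message}" }
  }
}
```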

Example Pipeline Config File Format:

Structure Of A Pipeline Config File:

A Logstash config file has a separate section for each type of plugin you want to add to the event processing pipeline. Each section contains the configuration options for one or more plugins. If you specify multiple filters, they are applied in the order of their appearance in the configuration file.
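The structure described above can be sketched as the following skeleton. The grok pattern and the field names client_ip, method, and request are assumptions chosen for illustration; the example expects lines such as "55.3.244.1 get /index.html" typed on standard input.

```conf
# One section per plugin type: input, filter, output.
input {
  stdin { }
}

filter {
  # Filters run in the order they appear: grok first extracts the fields...
  grok {
    match => { "message" => "%{IP:client_ip} %{WORD:method} %{URIPATHPARAM:request}" }
  }
  # ...then mutate works on a field that grok has just created.
  mutate {
    uppercase => ["method"]
  }
}

output {
  stdout { codec => rubydebug }
}
```

If the two filters were swapped, mutate would run before the method field exists and would have nothing to change.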

Plugin Configuration:

The configuration of a plugin consists of the plugin name followed by a block of settings for that plugin. The settings you can configure vary according to the plugin type. For information about each plugin, see Input Plugins, Output Plugins, Filter Plugins, and Codec Plugins.
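As a small illustration of this "plugin name followed by a block of settings" shape, the sketch below configures two input plugins in one section. The host, port, and type values are arbitrary example values.

```conf
input {
  http {                    # plugin name
    host => "127.0.0.1"     # setting => value
    port => 8080
  }

  stdin {                   # a second plugin in the same section
    type => "console"       # arbitrary label added to each event
  }
}
```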

Value Type:

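A plugin can require that the value for a setting be a certain type, such as boolean, list, or hash. The following value types are supported. The first two entries (comments and escape sequences) are general syntax notes rather than value types.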

  • Comments: Comments are the same as in Perl, Ruby, and Python. A comment starts with a # character, and does not need to be at the beginning of a line.
  • Escape Sequences: By default, escape sequences are not enabled. If you wish to use escape sequences in quoted strings, you will need to set config.support_escapes: true in your logstash.yml.
  • String: A string must be a single character sequence. Note that string values are enclosed in quotes, either double or single.
  • Path: A path is a string that represents a valid operating system path.
  • URI: A URI can be anything from a full URL like http://elastic.co/ to a simple identifier like foobar. If the URI contains a password such as http://user:pass@example.net the password portion of the URI will not be logged or printed.
  • Password: A password is a string with a single value that is not logged or printed.
  • Number: Numbers must be valid numeric values (floating point or integer).
  • Hash: A hash is a collection of key value pairs specified in the format "field1" => "value1". Note that multiple key value entries are separated by spaces rather than commas.
  • Codec: A codec is the name of the Logstash codec used to represent the data. Codecs can be used in both inputs and outputs.
  • Bytes: A bytes field is a string field that represents a valid unit of bytes. It is a convenient way to declare specific sizes in your plugin options.
  • Boolean: A boolean must be either true or false. Note that the true and false keywords are not enclosed in quotes.
  • List: Not a type in and of itself, but a property types can have. This makes it possible to type check multiple values.
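The sketch below shows several of these value types side by side in one config. All paths, hosts, field names, and credentials are placeholder assumptions; the settings themselves belong to the standard file, tcp, mutate, json, and elasticsearch plugins.

```conf
input {
  file {
    path => ["/var/log/messages", "/var/log/*.log"]   # list of paths
    start_position => "beginning"                     # string, quoted
    codec => "plain"                                  # codec
  }
  tcp {
    port => 5000                                      # number
  }
}

filter {
  mutate {
    # Hash: entries are separated by whitespace, not commas.
    add_field => {
      "environment" => "staging"
      "team" => "platform"
    }
  }
  json {
    source => "message"
    skip_on_invalid_json => true                      # boolean, unquoted
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]                # URI (in a list)
    user => "logstash_writer"                         # placeholder user
    password => "changeme"                            # password, not logged
  }
}
```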

Input Plugins:

  • azure_event_hubs: Receives events from Azure Event Hubs
  • beats: Receives events from the Elastic Beats framework
  • cloudwatch: Pulls events from the Amazon Web Services CloudWatch API
  • couchdb_changes: Streams events from CouchDB’s _changes URI
  • dead_letter_queue: Reads events from Logstash’s dead letter queue
  • elasticsearch: Reads query results from an Elasticsearch cluster
  • exec: Captures the output of a shell command as an event
  • file: Streams events from files
  • ganglia: Reads Ganglia packets over UDP
  • gelf: Reads GELF-format messages from Graylog2 as events
  • generator: Generates random log events for test purposes
  • github: Reads events from a GitHub webhook
  • google_cloud_storage: Extract events from files in a Google Cloud Storage bucket
  • google_pubsub: Consume events from a Google Cloud PubSub service
  • graphite: Reads metrics from the graphite tool
  • heartbeat: Generates heartbeat events for testing
  • http: Receives events over HTTP or HTTPS
  • http_poller: Decodes the output of an HTTP API into events
  • imap: Reads mail from an IMAP server
  • irc: Reads events from an IRC server
  • java_generator: Generates synthetic log events
  • java_stdin: Reads events from standard input
  • jdbc: Creates events from JDBC data
  • jms: Reads events from a JMS broker
  • jmx: Retrieves metrics from remote Java applications over JMX
  • kafka: Reads events from a Kafka topic
  • kinesis: Receives events through an AWS Kinesis stream
  • log4j: Reads events over a TCP socket from a Log4j SocketAppender object
  • lumberjack: Receives events using the Lumberjack protocol
  • meetup: Periodically polls the Meetup.com API and creates events from the results
  • pipe: Streams events from a long-running command pipe
  • puppet_facter: Receives facts from a Puppet server
  • rabbitmq: Pulls events from a RabbitMQ exchange
  • redis: Reads events from a Redis instance
  • relp: Receives RELP events over a TCP socket
  • rss: Polls an RSS or Atom feed and creates events from its entries
  • s3: Streams events from files in an S3 bucket
  • s3_sns_sqs: Reads logs from AWS S3 buckets using SQS
  • salesforce: Creates events based on a Salesforce SOQL query
  • snmp: Polls network devices using Simple Network Management Protocol (SNMP)
  • snmptrap: Creates events based on SNMP trap messages
  • sqlite: Creates events based on rows in an SQLite database
  • sqs: Pulls events from an Amazon Web Services Simple Queue Service queue
  • stdin: Reads events from standard input
  • stomp: Creates events received with the STOMP protocol
  • syslog: Reads syslog messages as events
  • tcp: Reads events from a TCP socket
  • twitter: Reads events from the Twitter Streaming API
  • udp: Reads events over UDP
  • unix: Reads events over a UNIX socket
  • varnishlog: Reads from the varnish cache shared memory log
  • websocket: Reads events from a websocket
  • wmi: Creates events based on the results of a WMI query
  • xmpp: Receives events over the XMPP/Jabber protocol

Filter Plugins:

  • aggregate: Aggregates information from several events originating with a single task
  • alter: Performs general alterations to fields that the mutate filter does not handle
  • bytes: Parses string representations of computer storage sizes, such as “123 MB” or “5.6gb”, into their numeric value in bytes
  • cidr: Checks IP addresses against a list of network blocks
  • cipher: Applies or removes a cipher to an event
  • clone: Duplicates events
  • csv: Parses comma-separated value data into individual fields
  • date: Parses dates from fields to use as the Logstash timestamp for an event
  • de_dot: Computationally expensive filter that removes dots from a field name
  • dissect: Extracts unstructured event data into fields using delimiters
  • dns: Performs a standard or reverse DNS lookup
  • drop: Drops all events
  • elapsed: Calculates the elapsed time between a pair of events
  • elasticsearch: Copies fields from previous log events in Elasticsearch to current events
  • environment: Stores environment variables as metadata sub-fields
  • extractnumbers: Extracts numbers from a string
  • fingerprint: Fingerprints fields by replacing values with a consistent hash
  • geoip: Adds geographical information about an IP address
  • grok: Parses unstructured event data into fields
  • http: Provides integration with external web services/REST APIs
  • i18n: Removes special characters from a field
  • java_uuid: Generates a UUID and adds it to each processed event
  • jdbc_static: Enriches events with data pre-loaded from a remote database
  • jdbc_streaming: Enrich events with your database data
  • json: Parses JSON events
  • json_encode: Serializes a field to JSON
  • kv: Parses key-value pairs
  • memcached: Provides integration with external data in Memcached
  • metricize: Takes complex events containing a number of metrics and splits these up into multiple events, each holding a single metric
  • metrics: Aggregates metrics
  • mutate: Performs mutations on fields
  • prune: Prunes event data based on a list of fields to blacklist or whitelist
  • range: Checks that specified fields stay within given size or length limits
  • ruby: Executes arbitrary Ruby code
  • sleep: Sleeps for a specified time span
  • split: Splits multi-line messages into distinct events
  • syslog_pri: Parses the PRI (priority) field of a syslog message
  • threats_classifier: Enriches security logs with information about the attacker’s intent
  • throttle: Throttles the number of events
  • tld: Parses a domain name from a field and extracts its top-level domain (TLD) components
  • translate: Replaces field contents based on a hash or YAML file
  • truncate: Truncates fields longer than a given length
  • urldecode: Decodes URL-encoded fields
  • useragent: Parses user agent strings into fields
  • uuid: Adds a UUID to events
  • xml: Parses XML into fields

Output Plugins:

  • boundary: Sends annotations to Boundary based on Logstash events
  • circonus: Sends annotations to Circonus based on Logstash events
  • cloudwatch: Aggregates and sends metric data to AWS CloudWatch
  • csv: Writes events to disk in a delimited format
  • datadog: Sends events to DataDogHQ based on Logstash events
  • datadog_metrics: Sends metrics to DataDogHQ based on Logstash events
  • elastic_app_search: Sends events to the Elastic App Search solution
  • elasticsearch: Stores logs in Elasticsearch
  • email: Sends email to a specified address when output is received
  • exec: Runs a command for a matching event
  • file: Writes events to files on disk
  • ganglia: Writes metrics to Ganglia’s gmond
  • gelf: Generates GELF formatted output for Graylog2
  • google_bigquery: Writes events to Google BigQuery
  • google_cloud_storage: Uploads log events to Google Cloud Storage
  • google_pubsub: Uploads log events to Google Cloud Pubsub
  • graphite: Writes metrics to Graphite
  • graphtastic: Sends metric data on Windows
  • http: Sends events to a generic HTTP or HTTPS endpoint
  • influxdb: Writes metrics to InfluxDB
  • irc: Writes events to IRC
  • java_sink: Discards any events received
  • java_stdout: Prints events to the STDOUT of the shell
  • juggernaut: Pushes messages to the Juggernaut websockets server
  • kafka: Writes events to a Kafka topic
  • librato: Sends metrics, annotations, and alerts to Librato based on Logstash events
  • loggly: Ships logs to Loggly
  • lumberjack: Sends events using the lumberjack protocol
  • metriccatcher: Writes metrics to MetricCatcher
  • mongodb: Writes events to MongoDB
  • nagios: Sends passive check results to Nagios
  • nagios_nsca: Sends passive check results to Nagios using the NSCA protocol
  • opentsdb: Writes metrics to OpenTSDB
  • pagerduty: Sends notifications based on preconfigured services and escalation policies
  • pipe: Pipes events to another program’s standard input
  • rabbitmq: Pushes events to a RabbitMQ exchange
  • redis: Sends events to a Redis queue using the RPUSH command
  • redmine: Creates tickets using the Redmine API
  • riak: Writes events to the Riak distributed key/value store
  • riemann: Sends metrics to Riemann
  • s3: Sends Logstash events to the Amazon Simple Storage Service
  • sns: Sends events to Amazon’s Simple Notification Service
  • solr_http: Stores and indexes logs in Solr
  • sqs: Pushes events to an Amazon Web Services Simple Queue Service queue
  • statsd: Sends metrics using the statsd network daemon
  • stdout: Prints events to the standard output
  • stomp: Writes events using the STOMP protocol
  • syslog: Sends events to a syslog server
  • tcp: Writes events over a TCP socket
  • timber: Sends events to the Timber.io logging service
  • udp: Sends events over UDP
  • webhdfs: Sends Logstash events to HDFS using the webhdfs REST API
  • websocket: Publishes messages to a websocket
  • xmpp: Posts events over XMPP
  • zabbix: Sends events to a Zabbix server

Codec Plugins:

  • avro: Reads serialized Avro records as Logstash events
  • cef: Reads the ArcSight Common Event Format (CEF)
  • cloudfront: Reads AWS CloudFront reports
  • cloudtrail: Reads AWS CloudTrail log files
  • collectd: Reads events from the collectd binary protocol using UDP
  • dots: Sends 1 dot per event to stdout for performance tracking
  • edn: Reads EDN format data
  • edn_lines: Reads newline-delimited EDN format data
  • es_bulk: Reads the Elasticsearch bulk format into separate events, along with metadata
  • fluent: Reads the fluentd msgpack schema
  • graphite: Reads graphite formatted lines
  • gzip_lines: Reads gzip encoded content
  • jdots: Renders each processed event as a dot
  • java_line: Encodes and decodes line-oriented text data
  • java_plain: Processes text data with no delimiters between events
  • json: Reads JSON formatted content, creating one event per element in a JSON array
  • json_lines: Reads newline-delimited JSON
  • line: Reads line-oriented text data
  • msgpack: Reads MessagePack encoded content
  • multiline: Merges multiline messages into a single event
  • netflow: Reads Netflow v5 and Netflow v9 data
  • nmap: Reads Nmap data in XML format
  • plain: Reads plaintext with no delimiting between events
  • protobuf: Reads protobuf messages and converts to Logstash Events
  • rubydebug: Applies the Ruby Awesome Print library to Logstash events

How To Create A Pipeline In Logstash:

Step 1: Install and configure the Apache web server. The access log of this web server will serve as the input to our Logstash pipeline.

Step 2: Create the pipeline configuration file.
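A minimal sketch of such a pipeline configuration file is shown below. It is an illustration under stated assumptions, not the exact file used in this series: the access log path, the Elasticsearch URL, and the index name are all assumed and will differ per system, and the date filter assumes that grok stored the request time in a field named timestamp.

```conf
input {
  file {
    # Assumed location of the Apache access log; this varies by distribution.
    path => "/var/log/httpd/access_log"
    start_position => "beginning"
  }
}

filter {
  # Parse the combined Apache log line into named fields.
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  # Use the request time from the log line as the event timestamp
  # (assumes grok stored it in a field named "timestamp").
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]          # assumed local node
    index => "apache-access-%{+YYYY.MM.dd}"     # assumed index name
  }
  # Echo each event to the console while testing.
  stdout { codec => rubydebug }
}
```

If you save this as, say, apache.conf (a hypothetical file name), you can check the syntax with bin/logstash -f apache.conf --config.test_and_exit and then start the pipeline with bin/logstash -f apache.conf.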

Step 3: Stash the Apache access logs to Elasticsearch using the Logstash pipeline.

To get more details on ELK, please refer to the documentation below.

https://www.elastic.co/guide/index.html

 
