This content has been automatically translated from Ukrainian.
Elasticsearch is a search and analytics engine built on Lucene. It is often called "Google inside your infrastructure" because, like a powerful search engine, it can instantly find information in huge data sets: logs, texts, metrics, documents, products, and more.
How Elasticsearch works under the hood
Indexing instead of normal storage
Data is not simply put "into a table" as in SQL; it is indexed: text is broken into tokens, normalized, and converted into an inverted index. Thanks to this, searches complete in milliseconds, even across millions of documents.
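The idea above can be shown with a toy sketch. This is not Elasticsearch's actual implementation: a real analyzer also strips punctuation, removes stop words, and applies stemming, while this minimal version only lowercases and splits on whitespace.

```python
from collections import defaultdict

def tokenize(text):
    # Stand-in for a real analyzer: lowercase and split on whitespace.
    return text.lower().split()

def build_inverted_index(docs):
    # Map each token to the set of document IDs that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

docs = {
    1: "Elasticsearch indexes logs",
    2: "Lucene powers Elasticsearch",
    3: "logs and metrics",
}
index = build_inverted_index(docs)
print(sorted(index["logs"]))           # documents containing "logs"
print(sorted(index["elasticsearch"]))  # documents containing "elasticsearch"
```

Answering a term query is now a dictionary lookup rather than a scan over every document, which is why the search stays fast as the data grows.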
Scaling
Elasticsearch is designed as a distributed system. Its main feature is easy horizontal scaling.
Shards - pieces of index
Each index is divided into parts (shards), and each shard is a separate Lucene search engine. They can live on different servers.
Replicas - copies for speed and reliability
Each shard can have copies (replicas). This means:
- the system does not go down if one server fails
- reads become faster, because replicas also serve requests
Horizontal scaling
To increase performance or storage capacity, it is enough to add a new node. Elasticsearch will redistribute the shards and rebalance the cluster on its own.
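How does a document end up on a particular shard? Elasticsearch computes a hash of the routing value (the document ID by default) modulo the number of primary shards. Here is a sketch of that idea; the real implementation uses murmur3, so md5 here is just a stable stand-in for illustration.

```python
import hashlib

NUM_PRIMARY_SHARDS = 3  # fixed at index creation time in Elasticsearch

def route_to_shard(doc_id: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    # Hash the routing value and take it modulo the shard count.
    # md5 is used only because it is deterministic across runs.
    digest = hashlib.md5(doc_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_shards

for doc_id in ["user-1", "user-2", "user-3", "user-4"]:
    print(doc_id, "-> shard", route_to_shard(doc_id))
```

Because the shard count participates in the formula, the number of primary shards cannot be changed after index creation without re-indexing; this is why Elasticsearch scales by moving whole shards to new nodes rather than re-splitting them.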
Distributed search
When a request arrives, the coordinating node sends it to all relevant shards, collects the results, and returns the answer. The search runs in parallel across shards - that is why ES is so fast.
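The scatter-gather step can be sketched as follows: each shard returns its local top hits, and the coordinating node merges them into a global top-k. The shard results below are made-up data for illustration.

```python
import heapq

# Each shard returns its local top hits as (score, doc_id) pairs.
shard_results = [
    [(9.1, "doc-7"), (3.2, "doc-2")],   # shard 0
    [(8.5, "doc-4"), (7.7, "doc-9")],   # shard 1
    [(5.0, "doc-1")],                   # shard 2
]

def gather_top_k(shard_results, k):
    # The coordinating node merges per-shard results and keeps
    # the k hits with the highest scores overall.
    all_hits = [hit for shard in shard_results for hit in shard]
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[0])

print(gather_top_k(shard_results, 3))
```

Each shard only has to rank its own slice of the data, so the expensive scoring work happens in parallel and the final merge is cheap.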
Elasticsearch in Logging: What is ELK Stack
One of the most popular applications of ES is logging. There is a stack for this:
E - Elasticsearch
Stores and indexes all logs.
L - Logstash
Processes logs: reads, filters, parses, structures them, and sends them to Elasticsearch.
K - Kibana
Visualization: log search, charts, filters, dashboards. This makes log analysis convenient and visual.
ELK is used for:
- searching for errors
- analyzing system behavior
- understanding load peaks
- cybersecurity
- auditing
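Regardless of whether logs arrive via Logstash or directly, they reach Elasticsearch as JSON documents, typically through the _bulk API, which expects newline-delimited JSON: an action line followed by the document itself. The sketch below builds such a request body in plain Python; the index name "app-logs" and the log fields are illustrative assumptions.

```python
import json

def build_bulk_body(index_name, log_events):
    # The _bulk endpoint takes NDJSON: for each document, one action
    # line ({"index": ...}) followed by one source line.
    lines = []
    for event in log_events:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"  # the body must end with a newline

events = [
    {"level": "ERROR", "message": "timeout calling payments"},
    {"level": "INFO", "message": "request served in 42 ms"},
]
body = build_bulk_body("app-logs", events)
print(body)
```

Batching many log events into one bulk request is what keeps high-volume log ingestion cheap: one HTTP round trip instead of one per event.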
Analytics and monitoring: Kibana + Beats
In addition to logs, Elasticsearch is increasingly used for metrics and monitoring.
Beats are lightweight agents that ship data:
- Filebeat - log files
- Metricbeat - CPU, RAM, Network
- Packetbeat - traffic
- Heartbeat - availability of services
- Auditbeat - security events
They send data to Elasticsearch, and Kibana allows you to build dashboards, see system load, analyze API requests, and find slow services.
Elasticsearch is not just search. It is a distributed indexing engine that:
- searches text quickly
- scales easily
- provides fault tolerance
- supports analytics and aggregations
- underpins popular stacks for logs and monitoring (ELK, Beats)
In the world of big data, Elasticsearch is one of the most convenient tools to quickly find, filter and analyze any information.
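The analytics side mentioned above is driven by aggregations. A typical request body asks for buckets instead of individual hits; the sketch below builds one in Python for counting log documents per level. The index field name "level.keyword" and the aggregation name "by_level" are illustrative assumptions.

```python
import json

# Request body for a terms aggregation: count documents per log level.
query = {
    "size": 0,  # return only aggregation buckets, no individual hits
    "aggs": {
        "by_level": {
            "terms": {"field": "level.keyword"}
        }
    },
}
print(json.dumps(query, indent=2))
```

Setting "size" to 0 skips fetching documents entirely, so the cluster only computes the bucket counts - this is what makes dashboards over millions of log lines responsive.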
Can Elasticsearch be used as a database
Elasticsearch is itself a database. It does not use MySQL, PostgreSQL, or any other external DBMS. All data is stored as JSON documents, and physical storage and indexing are provided by Lucene.
It can indeed be used as a primary database, but only where search and analytics speed is central: a product catalog, a logging system, storage for analytical events, or time-series data. However, Elasticsearch is not a full-fledged replacement for traditional relational databases. It lacks classic ACID transactions, relationships between entities, strict consistency control, and immediate visibility of newly written records in search results. Document updates are implemented as a delete followed by re-indexing, and deleted documents are not physically removed immediately, but later during internal segment merges.
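The delayed search visibility mentioned above is Elasticsearch's "near real time" behavior: indexed documents sit in an in-memory buffer and become searchable only after a refresh (which by default happens about once per second). The toy model below illustrates that sequencing; it is a conceptual sketch, not Elasticsearch's actual storage code.

```python
class NearRealTimeIndex:
    # Toy model: documents are buffered on write and become
    # searchable only after an explicit refresh.
    def __init__(self):
        self._buffer = []      # newly indexed, not yet searchable
        self._searchable = []  # visible to queries

    def index(self, doc):
        self._buffer.append(doc)

    def refresh(self):
        # Move buffered documents into the searchable set.
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, word):
        return [d for d in self._searchable if word in d]

idx = NearRealTimeIndex()
idx.index("payment failed")
print(idx.search("payment"))  # empty: the document is not yet visible
idx.refresh()
print(idx.search("payment"))  # now the document is found
```

This is exactly the property that rules Elasticsearch out for read-your-own-writes workloads such as financial records, and why the next paragraph recommends a traditional DBMS for them.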
Therefore, it is advisable to use Elasticsearch as a fast search and analytics layer, a dedicated time-series store, or the main repository for logs. For financial operations, transactions, complex relationships, and data that must be strictly consistent, traditional DBMSs such as PostgreSQL and MySQL are a better fit.