What is a time-series database?

22 Nov 12:42


Notes on Ruby and RoR

@kovbaska



This content has been automatically translated from Ukrainian.


A time-series database is a type of storage optimized for time-bound data. Every record in such a system carries a timestamp, and time is the main dimension by which the data is stored, read, and analyzed. These databases are designed for large streams of events that arrive continuously and are mostly not modified after being written.
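
To make the data model concrete, here is a minimal Python sketch of a single series: timestamped points that are only appended and then read back by time range. The `Series` class, its field names, and the `query_range` method are hypothetical and chosen for illustration; real engines use far more elaborate on-disk layouts.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field


@dataclass
class Series:
    """One metric stored as parallel lists kept in timestamp order (hypothetical layout)."""
    name: str
    timestamps: list = field(default_factory=list)  # Unix seconds
    values: list = field(default_factory=list)

    def append(self, ts: int, value: float) -> None:
        # Points arrive in time order and are not modified later,
        # so a write is a plain append to the end of the series.
        if self.timestamps and ts < self.timestamps[-1]:
            raise ValueError("out-of-order point")
        self.timestamps.append(ts)
        self.values.append(value)

    def query_range(self, start: int, end: int) -> list:
        # Binary search finds the slice boundaries for start <= ts <= end.
        lo = bisect_left(self.timestamps, start)
        hi = bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))


cpu = Series("cpu_load")
for ts, v in [(1000, 0.42), (1010, 0.55), (1020, 0.61), (1030, 0.48)]:
    cpu.append(ts, v)

print(cpu.query_range(1010, 1020))  # [(1010, 0.55), (1020, 0.61)]
```

Because the points are already sorted by time, a range read is two binary searches plus one contiguous slice, which is the access pattern time-series storage is built around.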

These databases are used in server monitoring, financial systems, web traffic analytics, IoT devices, and any other process where it is important to see how metrics change over time. Examples include CPU load over the past few hours, a sensor temperature sampled every second, or fluctuations of stock indices throughout the day.

Time-series databases provide fast reads over large time ranges, efficient aggregations, compression, and the ability to quickly delete old data without hurting performance.

Popular implementations include InfluxDB, TimescaleDB, Prometheus, and VictoriaMetrics. Elasticsearch is also often used for time-series analytics because it can quickly index events and run aggregation queries. This makes such systems indispensable for monitoring, analytics, and working with high-frequency data, where time is a key factor.

MySQL and PostgreSQL can store time-series data, but out of the box they scale poorly under such loads. The reason is not that they are "bad," but that their architecture is not designed for very frequent inserts and huge volumes of data organized primarily by time.

Why MySQL and PostgreSQL are not suitable for time-series

In classical relational databases, records are stored in tables and indexed with B-trees. When data is added continuously (every second or even every millisecond), the indexes grow quickly, index and data pages become fragmented, and frequent INSERTs put pressure on page locks and the transaction log. As a result, the database slows down, especially once the data is measured not in millions but in hundreds of millions or billions of rows.

The second drawback is aggregation over large time ranges. For example, computing the "average temperature over three months" in MySQL or PostgreSQL means scanning a huge number of rows, which is slow. Time-series databases handle such operations much faster because they are optimized for range queries from the start and store data in a format that is easy to aggregate.
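
One way to picture an "easily aggregatable format" is a rollup: a running count and sum kept per time bucket, so a long-range average combines a handful of buckets instead of every raw point. The sketch below is a simplified illustration, not how any particular engine implements it; `HourlyRollup` and `BUCKET_SECONDS` are hypothetical names.

```python
from collections import defaultdict
from typing import Optional

BUCKET_SECONDS = 3600  # hypothetical rollup interval: one hour


class HourlyRollup:
    """Keeps a running [count, sum] per hour so averages need not touch raw points."""

    def __init__(self) -> None:
        self.buckets = defaultdict(lambda: [0, 0.0])

    def add(self, ts: int, value: float) -> None:
        bucket = ts - ts % BUCKET_SECONDS  # start of the hour containing ts
        entry = self.buckets[bucket]
        entry[0] += 1
        entry[1] += value

    def average(self, start: int, end: int) -> Optional[float]:
        # Combine only the buckets whose start falls inside [start, end).
        count, total = 0, 0.0
        for bucket, (n, s) in self.buckets.items():
            if start <= bucket < end:
                count += n
                total += s
        return total / count if count else None


rollup = HourlyRollup()
for ts, temp in [(0, 20.0), (1800, 22.0), (3600, 24.0), (7200, 30.0)]:
    rollup.add(ts, temp)

print(rollup.average(0, 7200))  # 22.0
```

With hourly buckets, three months of per-second readings shrink to roughly 2,200 entries per series, which is why such a query can be answered quickly.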

Another issue is deleting old data. In PostgreSQL, mass deletes leave dead rows behind, which causes table bloat and requires VACUUM to reclaim the space, adding load to the system. In MySQL the situation is no better: when a large number of old records are deleted, tables and indexes become fragmented and performance drops. Time-series databases solve this architecturally: data is stored in separate time-based "chunks," and expired chunks are simply dropped as whole blocks with almost no load.
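
The chunk idea can be sketched in a few lines of Python. This is an in-memory toy under stated assumptions: `ChunkedStore`, `CHUNK_SECONDS`, and the one-day chunk size are hypothetical, and real systems keep chunks as files or partitions on disk.

```python
CHUNK_SECONDS = 86_400  # hypothetical chunk size: one day


class ChunkedStore:
    def __init__(self) -> None:
        # chunk start time -> list of (timestamp, value) points
        self.chunks = {}

    def insert(self, ts: int, value: float) -> None:
        chunk_start = ts - ts % CHUNK_SECONDS
        self.chunks.setdefault(chunk_start, []).append((ts, value))

    def drop_older_than(self, cutoff: int) -> int:
        # A chunk is expired when its whole time span ends at or before the cutoff.
        expired = [c for c in self.chunks if c + CHUNK_SECONDS <= cutoff]
        for c in expired:
            del self.chunks[c]  # the whole chunk goes at once, no row-by-row delete
        return len(expired)


store = ChunkedStore()
for day in range(5):
    store.insert(day * CHUNK_SECONDS + 60, float(day))

dropped = store.drop_older_than(3 * CHUNK_SECONDS)
print(dropped, sorted(store.chunks))  # 3 [259200, 345600]
```

The cost of retention here depends on the number of expired chunks, not on the number of rows inside them, and the remaining chunks are never touched.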

Also, regular SQL databases lack efficient compression designed specifically for sequences of values over time. In time-series stores (InfluxDB, VictoriaMetrics, Prometheus), compression can make the same data 5–20 times more compact.
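
Delta-of-delta encoding of timestamps is one well-known technique that shows why time-ordered data compresses so well. The sketch below illustrates the principle only; the exact schemes used by the engines named above differ, and the `encode` and `decode` helpers are hypothetical.

```python
def encode(timestamps: list) -> list:
    """Delta-of-delta encode: first value, first delta, then changes in delta."""
    if not timestamps:
        return []
    out = [timestamps[0]]
    prev_delta = 0
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        out.append(delta - prev_delta)  # 0 whenever the interval is unchanged
        prev_delta = delta
    return out


def decode(encoded: list) -> list:
    """Reverse of encode: rebuild deltas, then rebuild timestamps."""
    if not encoded:
        return []
    out = [encoded[0]]
    delta = 0
    for dod in encoded[1:]:
        delta += dod
        out.append(out[-1] + delta)
    return out


ts = [1700000000, 1700000010, 1700000020, 1700000030, 1700000041]
packed = encode(ts)
print(packed)  # [1700000000, 10, 0, 0, 1]
```

Metrics sampled at a fixed interval turn into long runs of zeros and other tiny numbers, which a real encoder can then store in a few bits each instead of a full 64-bit timestamp.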

In summary, MySQL and PostgreSQL work great with classical transactional data, but with large streams of telemetry, logs, or sensor readings their performance drops quickly. Time-series databases are designed specifically for such loads: they optimize writing, storing, reading, and deleting time-bound data.