All original content is created in Ukrainian. Not all content has been translated yet. Some posts may only be available in Ukrainian.

What is Exponential Backoff and Random Jitter?

This content has been automatically translated from Ukrainian.
In distributed systems, errors are not the exception but the norm. The network may hang, a service may go down temporarily, the database may refuse connections for a second. At that moment a simple yet dangerous question arises: how exactly should the request be retried? Retry naively, and the system can easily take itself down.
This is where exponential backoff and random jitter come into play.

Exponential Backoff

Exponential backoff is a retry strategy in which each subsequent retry waits longer than the previous one. The idea is simple: if something is broken, there’s no point in immediately knocking on the same door again. The first retry can be almost instantaneous, the second comes after one second, the third after two, then four, eight, and so on. The delay grows exponentially, usually doubling with every attempt, and in practice it is often capped at some maximum so it cannot grow without bound.
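A minimal Python sketch of how such a delay is commonly computed. The function name `backoff_delay` and the parameters `base` and `cap` are illustrative assumptions, not taken from any particular library. The schedule in the text (almost immediate, then 1, 2, 4, 8 seconds) is one variant; this sketch simply starts at `base` and doubles from there until it hits the cap.

```python
def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Delay in seconds before retry number `attempt` (0-based).

    The delay doubles on every attempt (base * 2**attempt) and is clamped
    to `cap` so it cannot grow without bound.
    """
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    return min(cap, base * (2 ** attempt))


if __name__ == "__main__":
    # Delays for the first six retries with base=1s and a 10s cap.
    print([backoff_delay(n, base=1.0, cap=10.0) for n in range(6)])
    # prints [1.0, 2.0, 4.0, 8.0, 10.0, 10.0]
```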
This gives the system time to "catch its breath". If a service is temporarily overloaded or has crashed under peak load, exponential backoff reduces the pressure on it instead of adding to it. Without this strategy, clients all retry at once, and even a healthy service may not withstand such a barrage.
But there is a catch. Imagine a thousand clients that receive an error at the same moment and use the same backoff formula. They will all wait one second, then two, then four, and hit the server together every time. The result is a synchronized crowd arriving in waves. This is a well-known problem: the "thundering herd" effect.
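To make the herd visible, here is a small simulation sketch with hypothetical numbers: 1000 clients, a fixed schedule of 1, 2 and 4 seconds, and no jitter. The helper `retry_times` is an illustrative name. Because every client computes identical retry moments, the server receives the whole crowd at once, three times in a row.

```python
from collections import Counter


def retry_times(delays: list[float]) -> list[float]:
    """Absolute moments (seconds after the failure) at which retries fire."""
    times, now = [], 0.0
    for delay in delays:
        now += delay
        times.append(now)
    return times


# Every client uses the same deterministic schedule: wait 1s, then 2s, then 4s.
SCHEDULE = [1.0, 2.0, 4.0]
CLIENTS = 1000

# Count how many requests hit the server at each moment.
load = Counter()
for _ in range(CLIENTS):
    load.update(retry_times(SCHEDULE))

print(sorted(load.items()))
# prints [(1.0, 1000), (3.0, 1000), (7.0, 1000)]
```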
That’s why random jitter is almost always added to exponential backoff. Jitter is a small random offset applied to the delay. Instead of waiting exactly 4 seconds, one client might wait 3.2, another 4.7, and a third 2.9. The delays still grow exponentially on average, but the requests no longer arrive simultaneously.
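A sketch of two ways to add jitter to an already computed delay. `jittered_delay` shifts the delay by a random factor around its value (the ±30% `spread` is an assumed default, chosen so that 4 seconds becomes something between 2.8 and 5.2, similar to the numbers above). `full_jitter` is the variant described as "full jitter" in the AWS Architecture Blog post on backoff and jitter: it draws uniformly between 0 and the backoff delay. Both names are illustrative.

```python
import random
from typing import Optional


def jittered_delay(delay: float, spread: float = 0.3,
                   rng: Optional[random.Random] = None) -> float:
    """Shift `delay` by a random factor in [1 - spread, 1 + spread].

    With delay=4 and spread=0.3 the result lies between 2.8 and 5.2 seconds.
    """
    if not 0.0 <= spread <= 1.0:
        raise ValueError("spread must be between 0 and 1")
    if rng is None:
        rng = random.Random()
    return delay * rng.uniform(1.0 - spread, 1.0 + spread)


def full_jitter(delay: float, rng: Optional[random.Random] = None) -> float:
    """'Full jitter' variant: pick uniformly between 0 and the backoff delay."""
    if rng is None:
        rng = random.Random()
    return rng.uniform(0.0, delay)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only to make the demo repeatable
    print([round(jittered_delay(4.0, rng=rng), 2) for _ in range(3)])
    print([round(full_jitter(4.0, rng=rng), 2) for _ in range(3)])
```

Passing an explicit `random.Random` instance keeps the functions testable; in production each client would use its own unseeded generator so that the offsets differ between clients.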
With jitter, the system behaves more smoothly and stably. The load is spread out over time, the service can recover more easily, and the likelihood of repeated crashes caused by mass retries drops sharply. This is especially important for APIs, queues, job workers, and any integration with external services.
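A rough simulation sketch of that spreading effect, using hypothetical numbers: 1000 clients whose computed delay is 4 seconds, with a ±30% jitter. It counts how many retries fall into each 100 ms window and reports the busiest window, with and without jitter.

```python
import random
from collections import Counter

CLIENTS = 1000
DELAY = 4.0    # every client's backoff delay for this retry (hypothetical)
SPREAD = 0.3   # jitter of ±30% around the delay (hypothetical)
BUCKET = 0.1   # measure load in 100 ms windows

rng = random.Random(7)  # fixed seed so the run is repeatable

# Without jitter, all retries land in the same 100 ms window.
plain = Counter(int(DELAY / BUCKET) for _ in range(CLIENTS))

# With jitter, each client scales its delay by a random factor.
jittered = Counter(
    int(DELAY * rng.uniform(1 - SPREAD, 1 + SPREAD) / BUCKET)
    for _ in range(CLIENTS)
)

print("peak without jitter:", max(plain.values()))
print("peak with jitter:   ", max(jittered.values()))
```

The same 1000 requests still arrive, but instead of one spike they are spread over roughly 2.4 seconds of 100 ms windows.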
In summary, exponential backoff answers the question "when should I retry?", while jitter answers "how do I avoid retrying in sync with everyone else?". Together they are a foundational building block of reliable systems. Retries without backoff are a red flag. Backoff without jitter is a yellow flag. When both are present, the system has a much better chance of surviving real-world failures, not just laboratory ones.
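Putting both ideas together, a minimal retry helper might look like the sketch below. It is an illustration under stated assumptions, not a drop-in library: `retry_with_backoff` and `TransientError` are hypothetical names, only `TransientError` is retried, and the delay uses the "full jitter" variant (a uniform draw between 0 and the capped exponential delay). The `sleep` parameter is injectable so the helper can be tested without actually waiting.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call `operation`, retrying transient failures with capped
    exponential backoff plus full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    if rng is None:
        rng = random.Random()
    attempt = 0
    while True:
        try:
            return operation()
        except TransientError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            backoff = min(cap, base * 2 ** (attempt - 1))  # 0.5, 1, 2, 4, ...
            sleep(rng.uniform(0.0, backoff))  # full jitter


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> str:
        # Fails twice, then succeeds (simulated temporary outage).
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientError("temporarily unavailable")
        return "ok"

    waits: list = []
    print(retry_with_backoff(flaky, sleep=waits.append, rng=random.Random(0)))
    print(len(waits), "waits:", [round(w, 2) for w in waits])
```

In a real system you would typically retry only errors known to be transient (timeouts, connection resets, "service unavailable" responses) and fail fast on everything else.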
