Recovered node’s translog usage after replica is promoted to primary due to node loss

hamakim · June 6, 2025, 8:43am

Hi, I have a question regarding shard recovery and translog handling in Elasticsearch.

Let’s say a primary shard goes offline due to a sudden node failure. One of the replica shards is promoted to be the new primary. Later, the previously failed node (which had the original primary shard) comes back and rejoins the cluster.

In this case, does Elasticsearch use the translog from the recovered node during shard recovery? What if that translog contains writes that were not replicated to the promoted replica before the failure? Wouldn't there be a risk of missing or conflicting writes between the new primary and the recovered node's translog?

How does Elasticsearch handle such potential inconsistencies and ensure data safety in this kind of scenario?

Thanks in advance for your help!

DavidTurner · June 6, 2025, 10:33am

The recovering shard will roll back to the last "safe" commit point (i.e. a commit which contains only fully-replicated writes) and then replay only "safe" operations from its local translog (i.e. operations it knows to be fully-replicated).
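To make "safe" concrete: Elasticsearch tracks writes with sequence numbers, and the global checkpoint is the highest sequence number that every in-sync copy is known to have processed. Below is a minimal Python sketch of choosing the safe commit and filtering the local translog. It assumes a simplified model in which a commit holds every operation up to its `max_seq_no` with no gaps. `Commit`, `Op`, `safe_commit` and `ops_to_replay` are hypothetical names for illustration, not Elasticsearch APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    # Hypothetical commit point: assumed to contain every write
    # with seq_no <= max_seq_no (simplification, no gaps).
    max_seq_no: int


@dataclass(frozen=True)
class Op:
    # Hypothetical translog operation.
    seq_no: int
    doc_id: str


def safe_commit(commits: list[Commit], global_checkpoint: int) -> Commit | None:
    """Newest commit holding only fully-replicated writes, i.e. whose
    max_seq_no does not exceed the global checkpoint."""
    candidates = [c for c in commits if c.max_seq_no <= global_checkpoint]
    return max(candidates, key=lambda c: c.max_seq_no, default=None)


def ops_to_replay(translog: list[Op], commit: Commit | None,
                  global_checkpoint: int) -> list[Op]:
    """Translog ops newer than the commit but not above the global
    checkpoint; anything above the checkpoint may be unreplicated."""
    start = commit.max_seq_no if commit is not None else -1
    return sorted(
        (op for op in translog if start < op.seq_no <= global_checkpoint),
        key=lambda op: op.seq_no,
    )


if __name__ == "__main__":
    commits = [Commit(max_seq_no=3), Commit(max_seq_no=7)]
    translog = [Op(seq_no=n, doc_id=f"doc-{n}") for n in range(10)]
    gcp = 5  # ops 0..5 are known to be on every in-sync copy

    commit = safe_commit(commits, gcp)  # the commit with max_seq_no=3
    print([op.seq_no for op in ops_to_replay(translog, commit, gcp)])  # [4, 5]
```

In this toy run the commit at 7 is skipped because it may contain unreplicated writes, and ops 6 to 9 are not replayed. The recovering copy would then obtain anything above the checkpoint from the current primary.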

hamakim · June 7, 2025, 6:21am

If a node that previously held the primary shard rejoins the cluster after its replica has already been promoted to primary, does Elasticsearch completely discard any unreplicated translog operations that exist only on the recovered node?

Thanks again for your help!

DavidTurner · June 7, 2025, 8:42am

Yes. Any such unreplicated translog entries correspond to write operations that were never acknowledged to the client, so they are safe to discard.
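The argument can be checked on a toy timeline. The Python sketch below assumes a simplified model in which a write is acknowledged only once every in-sync copy has applied it. It also assumes that the rejoining copy keeps only operations at or below its known global checkpoint and then copies everything above that checkpoint from the new primary, which stands in for peer recovery. `acknowledged` and `recover_rejoining_copy` are hypothetical helpers, not Elasticsearch APIs.

```python
from __future__ import annotations

# Hypothetical model of a shard copy: seq_no -> write payload.
Copy = dict[int, str]


def acknowledged(seq_no: int, payload: str, in_sync_copies: list[Copy]) -> bool:
    """A write is acknowledged only if every in-sync copy applied it."""
    return all(copy.get(seq_no) == payload for copy in in_sync_copies)


def recover_rejoining_copy(rejoining: Copy, global_checkpoint: int,
                           new_primary: Copy) -> Copy:
    """Discard local ops above the global checkpoint, then take the
    rest of the history from the new primary."""
    kept = {s: w for s, w in rejoining.items() if s <= global_checkpoint}
    kept.update({s: w for s, w in new_primary.items() if s > global_checkpoint})
    return kept


if __name__ == "__main__":
    old_primary: Copy = {0: "a", 1: "b", 2: "c", 3: "d", 4: "e"}  # "e" never replicated
    replica: Copy = {0: "a", 1: "b", 2: "c", 3: "d"}

    print(acknowledged(3, "d", [old_primary, replica]))  # True
    print(acknowledged(4, "e", [old_primary, replica]))  # False

    # The old primary's node fails; the replica is promoted and takes a new write.
    new_primary = dict(replica)
    new_primary[4] = "x"

    # The old primary rejoins knowing only a lagging global checkpoint of 2.
    recovered = recover_rejoining_copy(old_primary, 2, new_primary)
    print(recovered == new_primary)  # True: "e" discarded, "d" restored
```

The unacknowledged write "e" disappears. The acknowledged write "d", which sat above the stale checkpoint, is not lost because the promoted copy already had it.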
