Recovered node’s translog usage after replica is promoted to primary due to node loss

Hi, I have a question regarding shard recovery and translog handling in Elasticsearch.

Let’s say a primary shard goes offline due to a sudden node failure. One of the replica shards is promoted to be the new primary. Later, the previously failed node (which had the original primary shard) comes back and rejoins the cluster.

In this case, does Elasticsearch use the translog from the recovered node during shard recovery? What if the translog on the recovered node contains writes that were not replicated to the promoted replica before the failure? Wouldn’t there be a risk of missing or conflicting writes between the promoted primary and the recovered node’s translog?

How does Elasticsearch handle such potential inconsistencies and ensure data safety in this kind of scenario?

Thanks in advance for your help!

The recovering shard will roll back to the last "safe" commit point (i.e. a commit which contains only fully-replicated writes) and then replay only "safe" operations from its local translog (i.e. operations it knows to be fully-replicated).
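
To make that concrete, here is a rough toy model in Python. It is not Elasticsearch code: `Op`, `recover_locally`, and the plain-integer `global_checkpoint` are hypothetical names for illustration, and it assumes that "fully-replicated" can be modeled as "sequence number at or below a global checkpoint". Each commit is represented only by the highest sequence number it contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    seq_no: int   # position of the write in the shard's history (assumed model)
    doc: str      # payload, irrelevant to the recovery logic


def recover_locally(commits, translog, global_checkpoint):
    """Toy model of local recovery on a rejoining shard copy.

    commits: highest seq_no contained in each on-disk commit.
    translog: list of Op held in the local translog.
    global_checkpoint: highest seq_no assumed to be fully replicated.
    Returns (safe_commit, replayed_ops, discarded_ops).
    """
    # A commit is "safe" if it contains nothing beyond the global checkpoint.
    safe = [c for c in commits if c <= global_checkpoint]
    safe_commit = max(safe) if safe else -1  # -1 means "no safe commit found"

    # Replay only operations newer than the safe commit and still known safe.
    replayed = [op for op in translog
                if safe_commit < op.seq_no <= global_checkpoint]

    # Anything beyond the global checkpoint may be unreplicated, so it is dropped.
    discarded = [op for op in translog if op.seq_no > global_checkpoint]
    return safe_commit, replayed, discarded


# Example: two commits (up to seq_no 2 and 5), but only seq_no <= 4 is safe.
ops = [Op(i, f"doc-{i}") for i in range(7)]
safe_commit, replayed, discarded = recover_locally([2, 5], ops, 4)
# safe_commit is 2; seq_nos 3 and 4 are replayed; seq_nos 5 and 6 are discarded.
```

In this sketch the commit containing operation 5 is skipped because it includes a write that may not have reached the other copies. The model stops at the local step; copying over whatever the new primary has that this copy lacks is not shown.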


If a node that previously held the primary shard rejoins the cluster after its replica has already been promoted to primary, does Elasticsearch completely discard any unreplicated translog operations that exist only on the recovered node?

Thanks again for your help!

Yes. Any such unreplicated translog entries correspond to write operations that were never acknowledged to the client, so they are safe to discard.
