Allocation awareness vs data tiering problems

Minimal setup is like this:

Node  Data tier              Rack
1     data_hot,data_content  A
2     data_hot,data_content  B
3     data_hot,data_content  B
4     data_cold              A
5     data_cold              B
  • Node 1 failed.
  • I expected Elasticsearch to create a new replica on the other rack B node.
  • Awareness decider refused: there are [2] copies of this shard and [2] values for attribute [rack] ([A, B] from nodes in the cluster and no forced awareness) so there may be at most [1] copies of this shard allocated to nodes with each value, but (including this copy) there would be [2] copies allocated to nodes with [node.attr.rack: B]

Is my configuration wrong, or does allocation awareness work badly with data tiering?

What does your configuration look like? You didn't share anything from your elasticsearch.yml.

You are using forced awareness on the rack attribute, right?

If so, this is working as designed.

No forced awareness, just like the awareness decider message says.
Could the awareness decider be counting rack A as still alive because of the cold node n4?

node.name  node.roles                    node.attr.rack
n1         data_hot,data_content,master  A
n2         data_hot,data_content,master  B
n3         data_hot,data_content,master  B
n4         data_cold                     A
n5         data_cold                     B

cluster.routing.allocation.awareness.attributes: rack
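
In elasticsearch.yml each node is set up roughly like this (a sketch for n1; the roles and rack value differ per node as in the table above):

node.name: n1
node.roles: [ data_hot, data_content, master ]
node.attr.rack: A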

You need to provide more context.

How many primaries and replicas does this index have? Can you share the entire allocation explain? What is the tier preference for this index? Does the index have any unassigned shards?

OK. And btw thank you for trying to help!

Index is 1 shard, 1 replica, data_hot tier.

These are the create index settings:
  "index.routing.allocation.include._tier_preference": "data_hot",
  "index.number_of_shards": "1",
  "index.number_of_replicas": "1"

Before the n1 failure there was one copy on n1 and the other copy on n2.
After the n1 failure I have one copy on n2 and the other copy is not allocated.

Full output for n3
    {
      "node_id" : "*****",
      "node_name" : "n3",
      "transport_address" : "*****",
      "node_attributes" : {
        "rack" : "B",
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are [2] copies of this shard and [2] values for attribute [rack] ([A, B] from nodes in the cluster and no forced awareness) so there may be at most [1] copies of this shard allocated to nodes with each value, but (including this copy) there would be [2] copies allocated to nodes with [node.attr.rack: B]"
        }
      ]
    },

Please share the full responses, not just parts of them; it is really complicated to troubleshoot without the full context.

Also share the result of running GET _cat/shards/name-of-the-index?v on Kibana Dev Tools.

From what you shared it is not clear whether there is an issue or whether it is working as designed.

Also, which version are you running? You didn't say.

OK, this is dumb. The allocation explain API explains a different shard depending on whether I specify the shard in the request or not.

Allocation explain without specifying the shard
{
  "index" : "newdatab_80",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2025-06-02T15:39:21.158Z",
    "details" : "node_left [*****]",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "*****",
      "node_name" : "n4",
      "transport_address" : "192.168.0.4:9300",
      "node_attributes" : {
        "rack" : "A",
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "data_tier",
          "decision" : "NO",
          "explanation" : "index has a preference for tiers [data_hot] and node does not meet the required [data_hot] tier"
        }
      ]
    },
    {
      "node_id" : "*****",
      "node_name" : "n3",
      "transport_address" : "192.168.0.3:9300",
      "node_attributes" : {
        "rack" : "B",
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are [2] copies of this shard and [2] values for attribute [rack] ([A, B] from nodes in the cluster and no forced awareness) so there may be at most [1] copies of this shard allocated to nodes with each value, but (including this copy) there would be [2] copies allocated to nodes with [node.attr.rack: B]"
        }
      ]
    },
    {
      "node_id" : "*****",
      "node_name" : "n2",
      "transport_address" : "192.168.0.2:9300",
      "node_attributes" : {
        "rack" : "B",
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[newdatab_80][0], node[*****], [P], s[STARTED], a[id=yHKkZBPcm7zHY5HZdnBCTh]]"
        },
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are [2] copies of this shard and [2] values for attribute [rack] ([A, B] from nodes in the cluster and no forced awareness) so there may be at most [1] copies of this shard allocated to nodes with each value, but (including this copy) there would be [2] copies allocated to nodes with [node.attr.rack: B]"
        }
      ]
    },
    {
      "node_id" : "*****",
      "node_name" : "n5",
      "transport_address" : "192.168.0.5:9300",
      "node_attributes" : {
        "rack" : "B",
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "data_tier",
          "decision" : "NO",
          "explanation" : "index has a preference for tiers [data_hot] and node does not meet the required [data_hot] tier"
        }
      ]
    }
  ]
}
Allocation explain with the shard specified
{
  "index" : "newdatab_80",
  "shard" : 0,
  "primary" : true,
  "current_state" : "started",
  "current_node" : {
    "id" : "*****",
    "name" : "n2",
    "transport_address" : "192.168.0.2:9300",
    "attributes" : {
      "rack" : "B",
      "xpack.installed" : "true",
      "transform.node" : "false"
    },
    "weight_ranking" : 4	
  },
  "can_remain_on_current_node" : "yes",
  "can_rebalance_cluster" : "no",
  "can_rebalance_cluster_decisions" : [
    {
      "decider" : "rebalance_only_when_active",
      "decision" : "NO",
      "explanation" : "rebalancing is not allowed until all replicas in the cluster are active"
    },
    {
      "decider" : "cluster_rebalance",
      "decision" : "NO",
      "explanation" : "the cluster has unassigned shards and cluster setting [cluster.routing.allocation.allow_rebalance] is set to [indices_all_active]"
    }
  ],
  "can_rebalance_to_other_node" : "no",
  "rebalance_explanation" : "rebalancing is not allowed, even though there is at least one node on which the shard can be allocated",
  "node_allocation_decisions" : [
    {
      "node_id" : "*****",
      "node_name" : "n3",
      "transport_address" : "192.168.0.3:9300",
      "node_attributes" : {
        "rack" : "B",
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "node_decision" : "yes",
      "weight_ranking" : 3
    },
    {
      "node_id" : "*****",
      "node_name" : "n4",
      "transport_address" : "192.168.0.4:9300",
      "node_attributes" : {
        "rack" : "A",
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "data_tier",
          "decision" : "NO",
          "explanation" : "index has a preference for tiers [data_hot] and node does not meet the required [data_hot] tier"
        }
      ]
    },
    {
      "node_id" : "*****",
      "node_name" : "n5",
      "transport_address" : "192.168.0.5:9300",
      "node_attributes" : {
        "rack" : "B",
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "node_decision" : "no",
      "weight_ranking" : 2,
      "deciders" : [
        {
          "decider" : "data_tier",
          "decision" : "NO",
          "explanation" : "index has a preference for tiers [data_hot] and node does not meet the required [data_hot] tier"
        }
      ]
    }
  ]
}
_cat/shards output
index       shard prirep state      docs   store ip          node
newdatab_80 0     p      STARTED    1283 139.1kb 192.168.0.2 n2
newdatab_80 0     r      UNASSIGNED

Version 7.17.28, because one of my apps still refuses ES 8 :frowning:

That is because, when no shard is specified, the explain API gets a random shard and explains the allocation for it. Normally it picks a random unassigned shard, which is the case here, so the first explain is for the replica shard.

The second one is for the primary shard.
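
If you want to explain the unassigned replica explicitly, you can pass the shard in the request body, something like:

GET _cluster/allocation/explain
{
  "index": "newdatab_80",
  "shard": 0,
  "primary": false
}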

I do not use awareness myself, and I'm not sure what the issue is here. From the documentation it seems that it should allocate the shard on the other node in rack B; this is mentioned in the forced awareness part:

With this example configuration, if you have two nodes with node.attr.zone set to zone1 and an index with number_of_replicas set to 1 , Elasticsearch allocates all the primary shards but none of the replicas. It will assign the replica shards once nodes with a different value for node.attr.zone join the cluster. In contrast, if you do not configure forced awareness, Elasticsearch will allocate all primaries and replicas to the two nodes even though they are in the same zone.
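
(For context, the "example configuration" that quote refers to is the forced awareness one from the docs, along these lines:)

cluster.routing.allocation.awareness.attributes: zone
cluster.routing.allocation.awareness.force.zone.values: zone1,zone2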

So, the primary is on node n2, and with just awareness configured, without forced awareness, I would expect the replica to be allocated to node n3. I'm not sure why it is not doing that.

You will need to see if someone from Elastic can provide more context, or maybe open an issue on GitHub.

From the documentation I would expect the replica to be allocated to node 3, but maybe the documentation is wrong.


Yeah. Thank you for confirming that it should be allocated on n3 like that.

Off topic

A Microsoft account is unfortunately a big no for me, so I have to hope somebody from Elastic comes along?

Completely confused: what's Microsoft got to do with this?

Update: @nisow95612 sent me a PM and refuses to use GitHub because GitHub is now owned by Microsoft. So I guess he's simply not prepared to create any bug reports. Maybe I was dense not to appreciate this level of … inflexibility.

Off topic

Update: Here is the real message I sent: Microsoft owns the website (GitHub) that leandrojmp recommended for contacting support. Microsoft requires their account to log in.

Meaning: it is not because they own GitHub. It is because they require Microsoft accounts, which are a pain.

Not getting into an argument here, but in future please answer on the forum, not via PM.

Your position is clear. You might have hit a bug, but you're not prepared to report it on GitHub for the reasons given. That is certainly your right.

I've helped resolve a bunch of problems on the forum. Others have done far more than me. It's rare that the people who raise queries are this inflexible.

You could also open a support ticket if you are a paying customer. But, wild guess, you're not?

I hope someone does answer your query, cos it's an interesting possible bug.

Off topic

OK, will remember for next time. Did not want off-topic spam in the conversation.
Sorry for being unable to report it on GitHub. But a lot of the inflexibility is with Microsoft, which makes their accounts a pain. I guess you are a big tech customer, so they go easier on you?

Not sure if I understand, but you can create a GitHub account using any email address you have; you do not need a Microsoft account to have an account on GitHub.
