Matching Windows cluster node ID to physical server

So you’ve built your Windows 2008 Cluster, and you did it sensibly with good logical names like

  • cluster-node-01
  • cluster-node-02
  • cluster-node-03

It’s been running quite happily, and you’ve extended it and upgraded it. But now you’ve got a problem: a resource won’t start up when it’s moved to a new node. So you’ve followed the instructions to get the last 5 minutes of Cluster logs by running:

C:\>cluster log /gen /span:5

And you’ve found that the error is something to do with:

00000790.0000545c::2012/01/09-11:16:15.778 ERR   [RCM]s_RcmRpcGetResourceState: ERROR_CLUSTER_GROUP_MOVING(5908)' because of ''Cluster Disk 3' is owned by node 4, not 6.'

But wait a second, what’s this node 4 and 6 business? And which node is the one causing the problem?

The simplest way to find out which node is which is to drop to a cmd prompt on one of the nodes and run:

C:\>cluster node /status
Listing status for all available nodes:

Node           Node ID Status
-------------- ------- ---------------------
cluster-node-04      1 Up
cluster-node-02      4 Up
cluster-node-01      5 Up
cluster-node-03      6 Up

C:\>

And there you have a nice little lookup table. In this case, node 4 is cluster-node-02 and node 6 is cluster-node-03.
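If you would rather not read the table by eye, the lookup can be sketched in a few lines of Python. This is only a minimal sketch: it assumes the output of `cluster node /status` has been captured as text in the column layout shown above, and the names `parse_node_status`, `ROW` and `STATUS_OUTPUT` are made up for illustration.

```python
import re

# Sample text copied from the `cluster node /status` listing above.
STATUS_OUTPUT = """\
Listing status for all available nodes:

Node           Node ID Status
-------------- ------- ---------------------
cluster-node-04      1 Up
cluster-node-02      4 Up
cluster-node-01      5 Up
cluster-node-03      6 Up
"""

# Assumed row layout: node name, numeric node ID, then the status text.
ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\S.*)$")


def parse_node_status(text):
    """Return {node_id: node_name} built from the data rows of the listing."""
    nodes = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:  # header, separator and blank lines do not match
            name, node_id, _status = match.groups()
            nodes[int(node_id)] = name
    return nodes


nodes = parse_node_status(STATUS_OUTPUT)
print(nodes[4])  # cluster-node-02
print(nodes[6])  # cluster-node-03
```

Keying the table on the Node ID rather than the name matches the way the cluster log refers to nodes, so the two numbers from the error line can be looked up directly.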

But why aren’t they in the nice logical order you installed them in? Well, remember that time you had to evict cluster-node-01 because of the main board fault and rebuilt it on new hardware? Or when you upgraded the hardware by adding in a new ‘better’ node and then removed/replaced the nodes one by one?

Every time the Cluster sees a ‘new’ node it assigns it a shiny new Node number. So the names and numbers only lined up nicely when you first built the cluster. But the good news is that, as the numbering is going to change fairly infrequently, it’s pretty easy to keep track of it.
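One way to keep track is to record the lookup table and compare it against a fresh one after any eviction or hardware change. The following is a rough sketch only: the `recorded` mapping is a hypothetical record of the original build (the post does not show one), `current` is taken from the listing above, and `diff_node_maps` is an illustrative name.

```python
def diff_node_maps(recorded, current):
    """Return {node_id: (recorded_name, current_name)} for every ID that differs."""
    changes = {}
    for node_id in sorted(set(recorded) | set(current)):
        old = recorded.get(node_id)  # None if the ID was not in the record
        new = current.get(node_id)   # None if the ID is no longer in use
        if old != new:
            changes[node_id] = (old, new)
    return changes


# Hypothetical record made when the cluster was first built.
recorded = {1: "cluster-node-01", 2: "cluster-node-02", 3: "cluster-node-03"}
# Mapping read from the `cluster node /status` listing above.
current = {
    1: "cluster-node-04",
    4: "cluster-node-02",
    5: "cluster-node-01",
    6: "cluster-node-03",
}

for node_id, (old, new) in diff_node_maps(recorded, current).items():
    print(f"Node ID {node_id}: recorded={old}, current={new}")
```

Any line this prints is a Node ID whose owner has changed since the record was made, which is the cue to update the documentation.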

 

Stuart Moore

Nottingham based IT professional with over 15 years in the industry. Specialising in SQL Server, Infrastructure design, Disaster Recovery and Service Continuity management. Now happily probing the options the cloud provides

When not in front of a computer, is most likely to be found on the saddle of his bike

4 Responses to Matching windows cluster node ID to physical server

  1. Yusuf October 20, 2013 at 12:54 #

    When nodes are added to a cluster their NodeID is allocated as an increment to the last one. But in case we are starting to build a cluster, say with 5 or 6 nodes, can we expect/guess the node ID that will be allocated to each of them? This number doesn

    • Stuart Moore October 22, 2013 at 08:00 #

      Hi Yusuf,
      Yes, you can make a good guess as you build your cluster that the first node will be 1, the second 2, and so on.

      The problem arises if you have to evict and rebuild a node. And the longer a cluster survives the more likely it is that the Node IDs won’t be sequential or completely related to the machine name.

      This is also important when I arrive at a client’s site. I won’t have had any prior knowledge of the history of their cluster, but need to make sure I’ve got the Node IDs matched exactly to Physical machines so I can start to solve their problems.

      Stuart

      • Yusuf October 22, 2013 at 08:20 #

        When building with more than a single node. At the time of stating, I’ve not seen a reliable way to predict which node will get which ID. There is some order but not known to me.
        It’s certainly not like first node getting 1 and then the second gets a 2.

        • Stuart Moore October 23, 2013 at 08:38 #

          Note, I said “Good guess” :). I always tend to install nodes sequentially and individually to ensure everything is good before I move on to the next one. After each node’s installed I’ll verify its settings and values, recording them in a spreadsheet before moving on to the next one. I may be a bit less rigorous when building test environments that aren’t going to last very long, but Production clusters should be built very carefully.

          The key word is ‘verify’. There are a lot of assumptions that can be made during a Cluster build, but they may not actually be correct every time. A lot of problems I come across with clusters spring from people having made assumptions while building them, or resolution is delayed because they’ve been basing their strategy on assumptions which were wrong. I spend a lot of time on every cluster build documenting everything, so I can point to a document rather than assuming anything. And when I begin to fix an issue on a cluster without documentation then the first thing I’ll do is verify everything and create that missing documentation, which often leads to discovering the underlying cause and a speedy resolution.
