SZ Cluster node rename

I'm planning some overdue architectural updates to our SZ124 deployment

1. Migrating our current (and only) SZ124 from 1G to 10G
  • Requires a new management IP
  • Would like to update the hostname (sz01.locationA.mycompany.com)
2. Deploying a new SZ124 with a 10G uplink for failover
  • Will be added to the same cluster as the current SZ
  • Will be hosted at a second location for geo-redundancy (sz01.locationB.mycompany.com)

My concern, not having much experience with clustering, is the hostnames and the possibility of a conflict. As far as I can tell, only the hostname (the first label before the dot) is used as the node name, not the full domain.

If I went with our current deployment standards, I'd end up with two sz01 hostnames.  Should I be concerned about this?  Or should I deploy unique hostnames like sz01a.locationA and sz01b.locationB...

I'm also wondering if there's a way to rename a cluster? It was deployed before my time using Cisco "wlc" as the naming convention, which kinda irks me ;)

80211WiGuy
Posted 7 months ago

Sanjay Kumar, Employee

Hi,

The hostname and the cluster name do not need to match each other; you can set them independently. For example, I have a lab setup with the hostname "Ruckus-node" and the cluster name "Cluster".
Only the control plane will follow the hostname, showing up as "Ruckus-node-C".

Follow the steps below to change these settings from your vSCG (or SCG/SZ100) CLI.

1. To change the hostname / node name:
https://support.ruckuswireless.com/articles/000004587

    a.  Go to Enable mode:  enable
    b.  Go to Config mode:  config
    c.  Configure the new hostname:  hostname <new-host-name>
        Example: sz100-89(config)# hostname Ruckus-node
        Warning: this will reset some system services, ok to continue (yes/no)?
    d.  Enter 'yes'

You will see the hostname updated.
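
Putting steps a-d together, a complete session looks roughly like this (the prompt and hostname are illustrative, reusing the example above; the enable step may also prompt for the enable password):

    sz100-89> enable
    sz100-89# config
    sz100-89(config)# hostname Ruckus-node
    Warning: this will reset some system services, ok to continue (yes/no)? yes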

2. To change the Cluster name:

    a.  Go to Enable mode:  enable
    b.  Go to Config mode:  config
    c.  Configure the new Cluster name:  cluster-name <new-name>
        Example: Ruckus-node(config)# cluster-name Cluster
        Warning: this will reset some system services, ok to continue (yes/no)?
    d.  Enter 'yes'

Note: We do not recommend changing the Cluster name, since it is important database information for keeping the cluster online. Changing it may disconnect the cluster nodes.

If you have only one node in the cluster, then you can change the Cluster name.

If you have 2 nodes and one of them disconnects from the cluster after the Cluster name is changed, you might have to factory reset the disconnected node to rejoin it to the cluster.
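
If you do have a single-node cluster and decide to rename it, the session mirrors the hostname change above (illustrative prompts, reusing the example name from step 2c):

    Ruckus-node> enable
    Ruckus-node# config
    Ruckus-node(config)# cluster-name Cluster
    Warning: this will reset some system services, ok to continue (yes/no)? yes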

80211WiGuy

Thank you for responding so quickly, Sanjay. I think I will keep the cluster name the same, but when I rename the node with a new hostname, will it take the full domain or just the first part?

For example

hostname sz01.locationA.company.com
Will this give me an error and force me to use just the short name instead?
hostname sz01

I wish I had a lab to test this on, but unfortunately I don't.

80211WiGuy

I was concerned about having two sz01 nodes showing up in the cluster, so based on your feedback I think we'll name them:

sz01-Primary.locationA.company.com
and
sz01-Backup.locationB.company.com

We only intend to deploy 2 controllers, so I think 1+1 backup will be the default cluster function for these.

Thanks!

Albert Pierson, Employee

Hi 80211WiGuy,

Just for your better understanding: when you cluster 2 SZ nodes together, they are both active and both will be managing Access Points. The Access Points are configured with a list of available controllers in a pseudo-random order, so they should be spread out across all nodes.

There is no concept of a primary node in this configuration; both nodes are considered peers. One node will be elected leader, primarily to act as a single point of time synchronization and to prevent database conflicts. The leader node may change if the present one stops responding, so labeling one node "primary" and the other "backup" could be misleading.

To be precise, this is not 1+1 redundancy (where one device backs up another); it is better described as Active/Active or peer clustering.

While supported, you need to be careful with geographically separated nodes in a single cluster. The database is shared between the two nodes, and communication between them needs to be reliable with minimal latency to prevent introducing errors into the database. Typically the latency should be <50 ms on newer versions.

It is certainly up to you to name your nodes any way that makes sense to you, but I wanted to describe the operation for your information.

I hope this information is helpful.

Cheers,

Albert

80211WiGuy

Thanks Albert, I appreciate your detailed response.
Our SE informed us that the SZ cluster functions in an N+1 manner.

The way I understood it is that in a 2-node cluster, one node takes on the full AP load with the other in sync but not controlling any APs unless the first node fails.  In a 3-node cluster, there is still one node in sync but acting as a standby unit while the other two spread the AP load.  Am I mistaken?

Regarding latency, we have dedicated fiber between the two sites with very low latency, <1 ms.

Thanks,
Greg

Albert Pierson, Employee

Hi Greg,

Yes, I believe SEs misuse N+1 to describe the SZ cluster feature.  I have been supporting SZ since it was first envisioned, and it has always operated as Active/Active peers, where the APs are spread out across the nodes automatically.

The SZ will send a list of available controllers to the APs, and the APs try to connect to the first node in the list.  If that SZ does not respond, the APs will (within about 60 seconds) automatically try to connect to the next node in the list.  The list is sent in a pseudo-random order to try to spread out the load.  When a node is added to or deleted from the cluster, the APs are reconfigured with a new C-list (controller list) of available nodes.
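
As an aside, and assuming your AP firmware supports it (an assumption, not something verified in this thread), the AP-side counterpart of the set scg commands can show what a given AP has actually been provisioned with, including its server list:

    rkscli: get scg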

In a 3-node cluster, all 3 nodes are active and the APs should be spread out across them.  Any of the nodes can be the leader, which has no special operational activities beyond providing synchronization (it acts as an NTP client to the network and as an NTP server for the other nodes and the APs, so all devices run the same time).

The SZ cluster system is designed to survive a single node failure.  There are two copies of all data, held on two nodes, so if one node fails the data is still available.  In a 3- or 4-node cluster it is important to delete a failed node as soon as it is detected, so the database can be reconfigured with two copies of each piece of data in case another node fails.  If two nodes fail at the same time, there will be serious database issues which may require restoring a backup.

In vSZ-H or SZ-300 there is an AP rebalance feature that will re-balance APs across all nodes if, over time or due to an outage, they have become unbalanced.  There is also a node affinity feature (I think also only in vSZ-H or SZ-300) that allows you to tie AP Zones to a particular node.  This is useful if your nodes are geographically separated.

I hope this information clears up your doubts,

Cheers

Albert

80211WiGuy

Thanks Albert,
This has some serious implications for our design, and I'm glad you brought them up before we had a chance to deploy.  I'll have to go back and spend some time thinking about this.

Is there a way to design an SZ implementation with geo-redundant failover? 95% of our deployments are sporadic hotspots which use the GRE data plane feature.  We don't mind if the clients undergo re-association + DHCP on a new subnet in the case of a major failure, which is how we were originally going to do this.

Albert Pierson, Employee

Hi Greg,

Ruckus has had a Geo-Redundancy feature since version 3.6, but as far as I can tell it still only supports vSZ-H and SZ-300, not SZ-100 or vSZ-E.  The idea is to have a separate cluster that can act as a standby cluster in case of a catastrophic failure of a NOC, where an entire cluster becomes unavailable.  The configuration is automatically synced from Active to Standby.  As of SZ 5.1 this also supports Active/Active redundancy and many-to-one standby modes.

I have been searching Ruckus internal documents and as far as I can tell this feature has not yet been implemented in SZ-100 or vSZ-E controllers.

If I find differently I will update this forum.

Thanks

Albert

80211WiGuy

Thanks Albert, your feedback has been instrumental!

80211WiGuy

Hi Albert,
I was wondering if there's any way to pin WLAN/SSID traffic to one SZ GRE-based data plane vs. another?  My design isn't really impacted by spreading the control plane among different SZs in the same cluster, but if there's a way to control which SZ the tunneled WLANs terminate on, that would be good enough for us, even if it requires manual intervention to flip GRE traffic over to the second SZ in a failure scenario.

Albert Pierson, Employee

Hi Greg,
There is another feature that unfortunately is not available on SZ100 or vSZ-E: Node Affinity.
This feature configures APs to prefer one node (data and/or control) per AP Zone.

This feature was never available per WLAN/SSID, but tunneling (on all SZ platforms) can be enabled per WLAN/SSID.

For fail-safe operation, all APs are configured with a list of all available data plane IPs.  As far as I know, with SZ100 or vSZ-E there is no way to configure the APs to prefer one data plane over the other.  If you are using the two-interface/IP model, where the data plane is on its own interface, you could use external firewall rules to control which APs can reach which data planes.
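
Purely as an illustration of that last idea (the subnet and data plane IPs below are placeholders, not values from this thread), on a Linux-based firewall in the path such rules might look like:

    # allow the AP subnet to reach only its intended data plane
    iptables -A FORWARD -s 10.10.20.0/24 -d 192.0.2.20 -j ACCEPT
    # block that same AP subnet from reaching the other data plane
    iptables -A FORWARD -s 10.10.20.0/24 -d 192.0.2.10 -j DROP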

I hope this information is helpful

Thanks
Albert


80211WiGuy

Thanks for the info Albert!

80211WiGuy

Albert,
We need to move our only SZ to a new management IP.  So far I've created an AP script with the following command:
"set scg ip ruckuscontroller.mycompany.com"

My question is: when we change the management IP address, will all APs fall back to looking up this DNS entry and redirect to the new management IP, or will they still try to reach the old IP?

I read something that mentioned we might have to re-initialize something on the AP side, but I'm not sure if that applies only when moving from one active SZ to another.

Albert Pierson, Employee

Hi,

The set scg ip <control_ip_or_url> command is only for initial AP discovery.
Once an AP has discovered a controller and been provisioned by it, the full list of available nodes is stored in a different file in the AP operating system, so running set scg ip alone is not enough.

You can create an AP CLI script to run from the existing SZ with these 2 commands:

set scg init  (this will clear the previous SZ list and reboot the AP)
set scg ip <ip_or_url_of_new_controller>

If the two commands are sent together in one script, the second command should "take" before the AP reboots.
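
For example, reusing the DNS name from your earlier script, the script body would simply be:

    set scg init
    set scg ip ruckuscontroller.mycompany.com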

If the second command is not issued, the AP will use DHCP option 43 and then a DNS lookup for ruckuscontroller.<local_domain_via_dhcp> to try to discover a controller.
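
In that fallback case, the record you would keep pointed at the controller's management IP is just a DNS A record along these lines (domain and IP are placeholders):

    ruckuscontroller.mycompany.com.    IN    A    <new-management-IP>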

I hope this helps.

80211WiGuy

Thanks for the response!

We plan to update the "ruckuscontroller" DNS entry for our local domain as part of this, but I'm concerned about what will happen when we change the existing management IP of our only SZ.

Will the APs time out trying to reach the old SZ IP and then query ruckuscontroller.localdomain to find the new IP?