When roaming from one AP to another, dropping telnet sessions

  • Question
  • Updated 1 week ago
We have a 5-week-old Ruckus install. All firmware is current. Over the last 8 days, we have started to see telnet sessions drop and reset for our people in a specific application when they hand off from one AP to another (which causes the app to hang, and Administrator intervention is needed to resolve it every time). Prior to 8 days ago, this was not an issue. Yes, we've looked at Windows updates (we even rolled back test machines to July of this year). Still the same issue. Is there a chance that a long-term setting (like ChannelFly or something) is causing this to start happening because it took time for said setting to do its job? We're past the basic troubleshooting; we're having to get into the "odd, nobody thought of that" category. Yes, we have a ticket open with support, but we haven't moved forward much over the last 4 days on that front. Anyone?

David Howard

  • 3 Posts
  • 0 Reply Likes

Posted 1 week ago


Andrea Coppini

  • 66 Posts
  • 29 Reply Likes
Are you tunnelling?

Do you have proper coverage overlap between your APs? The client might be disconnecting and reconnecting instead of roaming, which will cause the Telnet session to time out.

Keep in mind that roaming is a 100% client-side operation; the infrastructure can only assist in the roam, not control it.

alexf

  • 28 Posts
  • 4 Reply Likes
Hi David, 

Welcome to the forum and thanks for sharing your situation. 

There was a similar discussion in the past, maybe it is related:

wireless handheld devices losing telnet connection

You should check whether something changed in your environment in the last 8 days (new rogue APs, or new sources of RF interference, which you would need a spectrum analyzer to find).
Try to see if the issue happens with ChannelFly disabled. 

Regards, 
Alex

David Howard

  • 3 Posts
  • 0 Reply Likes
Thank you both for taking the time to reply. To Alex: Yes we have seen that discussion and it does not seem to fit/apply to what we're experiencing. 

To Andrea: We have great coverage and overlap, and do have the auto tuning function enabled to try to help us keep that in good shape. In a couple of instances, we have manually set power settings on APs. We're aware that the decision to change APs lies within the client itself. However, as mentioned, this hasn't been a problem until these last 8 days. To Alex's point about new things, we've gone through those gymnastics because the first thing we said was "what has changed". Clearly something is different, but we cannot find it. The obvious stuff (new interference, new APs, etc.) has not proven to be a path that has answers. Windows updates were being looked at. We're even looking through the logs of the controller to see if it auto-updated something that we didn't mean for it to. So far, the level 1/level 2 kind of thought process hasn't been working. Hence, asking for the oddball thinking, because so far, no answers on our end either.

Please continue to ask us questions; one might make us say "oh crap", but so far these are questions we've already asked and answered, at least to the level of "doesn't seem to make a difference". Thanks again!

Alex Shalima

  • 13 Posts
  • 1 Reply Like
Hi David,

It sounds like your network stack is experiencing a change whenever "roaming" occurs. A Telnet session is nothing but a TCP socket. Something is causing this socket to reset or time out.

To reset a socket you would have to have a topology change on Layer 2 or Layer 3 of your network stack (VLAN change or IP change).

To time out a socket, you simply take too long to reply within an active TCP session. Many factors can contribute to this, including but not limited to interference or latency on the network. You can test this simply by pinging your server while you walk around the building and checking your reply times and packet drop rate. With Ruckus roaming you should not lose more than a packet or two in an ideal situation.
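A rough way to put numbers on that walk-around test is sketched below. It is only an illustration, not a Ruckus tool: it times a TCP connect to the telnet port instead of sending ICMP pings (raw ICMP usually needs elevated privileges), and the host name, port 23, probe count and interval are all assumptions to adjust for your site. Each probe opens and closes a connection to the server, so keep the run short.

```python
import socket
import time
from typing import List, Optional


def probe_once(host: str, port: int = 23, timeout: float = 1.0) -> Optional[float]:
    """Return the TCP connect time in milliseconds, or None if the attempt failed."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:  # covers timeouts, refusals and unreachable networks
        return None


def summarize(samples: List[Optional[float]]) -> dict:
    """Summarize a run of probes: loss rate, worst latency, longest outage."""
    lost = sum(1 for s in samples if s is None)
    answered = [s for s in samples if s is not None]
    longest_gap = current_gap = 0
    for s in samples:
        # Count consecutive failed probes; reset on every success.
        current_gap = current_gap + 1 if s is None else 0
        longest_gap = max(longest_gap, current_gap)
    return {
        "sent": len(samples),
        "lost": lost,
        "loss_pct": 100.0 * lost / len(samples) if samples else 0.0,
        "max_ms": max(answered) if answered else None,
        "longest_gap": longest_gap,
    }


def walk_test(host: str, count: int = 60, interval: float = 0.5) -> dict:
    """Probe the server repeatedly while you walk between APs."""
    samples = []
    for _ in range(count):
        samples.append(probe_once(host))
        time.sleep(interval)
    return summarize(samples)
```

Running something like `print(walk_test("telnet-server.example"))` (a placeholder host name) while roaming gives a quick summary. A `longest_gap` well above one or two probes would suggest the handoff is costing more than the "packet or two" mentioned above.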

From the data above:

1. Are you using VLAN Pooling? (Are you changing VLANs?)
2. Are you using Dynamic VLANs by chance? (Again, are you changing VLANs?)
3. Are you using AAA RADIUS Authentication for your clients? (Does your TCP socket time out before you get re-authenticated?)


Best,
Alex

David Howard

  • 3 Posts
  • 0 Reply Likes
Hello Alex. Thank you for the questions. Although my answers to 1, 2 and 3 are "no", they did get me thinking along a follow-up line. If we're in a DHCP environment and the handoff occurs with a 2 second delay, could the client be trying to "get a DHCP address" even though it already has one (thus getting the same one back), causing the drop? Hmmm, it's a silly thought even as I see it in print, but I'm trying to think of something crazy that might actually lead us onto the right path. It could just be that a 1-2 second handoff delay is simply dropping Telnet connectivity, but again, WHY it has only started doing that recently is still the biggest aggravation, lol.

Alex Shalima

  • 10 Posts
  • 1 Reply Like
Hi David,

My answer would be - something changed and you are not aware of it.

You might be onto something here. Unless there is a change on Layer 1 (physical connection) and/or Layer 2 (VLAN change) during roaming, the only other reason for a client to get a new DHCP lease is that one of the DHCP packets never made it there. In that case the client will ask for a new IP, causing the Layer 3 stack to reset.

Do a packet capture on your gateway to make sure you're not dropping, filtering or misrouting any of your DHCP packets.

Another idea would be to set a static IP on a client device and try roaming again. That would take the DHCP server out of the equation completely.


Best,
Alex

Jakob Peterhänsel

  • 57 Posts
  • 19 Reply Likes
If you can tail your DHCP log, that should be easy to see.
DHCP clients normally ask for a confirmation of the IP after any network changes, so if the IP stack thinks the network 'changed' during the roaming between APs, you will see a DHCP Req, Response & Ack in the log. If you're on a Unix platform: tail -f /var/log/dhcpd | grep -i <MAC>
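If grepping by eye gets tedious, the same check can be scripted. The sketch below is only an illustration and assumes ISC dhcpd-style log lines (for example "DHCPREQUEST for 10.0.0.5 from aa:bb:cc:dd:ee:ff via eth0"); other DHCP servers log differently, so the patterns would need adjusting. The function names are made up for this example.

```python
import re
from typing import Iterable, List, Tuple

# Assumed ISC dhcpd-style messages, e.g.
#   "DHCPREQUEST for 10.0.0.5 from aa:bb:cc:dd:ee:ff via eth0"
#   "DHCPACK on 10.0.0.5 to aa:bb:cc:dd:ee:ff via eth0"
EVENT_RE = re.compile(r"\b(DHCP(?:DISCOVER|OFFER|REQUEST|ACK|NAK))\b")
IP_RE = re.compile(r"\b(?:for|on)\s+(\d{1,3}(?:\.\d{1,3}){3})\b")


def events_for_mac(lines: Iterable[str], mac: str) -> List[Tuple[str, str]]:
    """Return (event, ip) pairs for log lines that mention the given MAC."""
    mac = mac.lower()
    found = []
    for line in lines:
        if mac not in line.lower():  # case-insensitive, like grep -i
            continue
        event = EVENT_RE.search(line)
        if not event:
            continue
        ip = IP_RE.search(line)
        found.append((event.group(1), ip.group(1) if ip else ""))
    return found


def lease_restarted(events: List[Tuple[str, str]]) -> bool:
    """True if the client sent a DISCOVER or was NAKed, i.e. started over."""
    return any(name in ("DHCPDISCOVER", "DHCPNAK") for name, _ in events)


def addresses_seen(events: List[Tuple[str, str]]) -> set:
    """Collect every IP that appeared in the client's DHCP exchanges."""
    return {ip for _, ip in events if ip}
```

A REQUEST/ACK pair for the same address around the time of a roam is the normal "confirm my IP" exchange described above. A DISCOVER or NAK, or more than one address in `addresses_seen`, would point at the client starting its lease from scratch.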

One possible issue:
You have one or more APs with a broken VLAN on the uplink that still broadcast the SSID for it (the AP does not test whether the SSID's VLAN works). But you should see that on the client, as the DHCP address changes from your leased IP to a self-assigned one.
If you're on a new macOS device, the AirPort icon in the menu will show an exclamation mark when that happens.

Check & re-check VLAN tagging/trunks everywhere, or trace your device's MAC on the system when the connection messes up.