D

14 Messages

 • 

310 Points

Thu, Sep 30, 2021 9:28 PM

No connectivity for a few seconds after connecting to WiFi

I have 3 Ruckus APs (720, 710, 510) connected to a switch, which is connected to a router/DHCP server/DNS server.

When connecting a client device to the WiFi network, there is a short period of time (1 to 2 seconds) immediately after the WiFi connects where the client is unable to reach the network. 

The issue only happens immediately after connecting - after those 1-2 seconds, the WiFi on all my devices works great, is rock solid, and blazing fast. Also, it only happens on 5 GHz, or at least the delay is much longer on 5 GHz than on 2.4 GHz. It happens on all 3 APs.

Why does a delay of a few measly seconds matter? It matters because Android uses neighbor unreachability detection (NUD) to determine whether it's connected to a usable network. It does so by sending ARP requests to the default gateway. If NUD concludes the gateway is unreachable (i.e. no responses to these ARP requests arrive), Android terminates the WiFi connection. Sometimes, NUD triggers before this weird problem fixes itself.
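To illustrate why the timing matters, here is a minimal sketch of the race between the post-connect blackout and a NUD-style probe sequence. The function name, the probe interval, and the probe count are placeholders I picked for illustration; they are not Android's actual values, which I have not verified.

```python
def nud_survives(blackout_s, probe_interval_s=1.0, max_probes=3):
    """Return True if at least one probe is sent after the blackout ends.

    Simplified model (an assumption, not Android's real algorithm):
    probes go out at t = 0, interval, 2 * interval, ... for max_probes
    attempts, and any probe sent while t < blackout_s gets no reply.
    """
    probe_times = [i * probe_interval_s for i in range(max_probes)]
    return any(t >= blackout_s for t in probe_times)


if __name__ == "__main__":
    # Blackout lengths in seconds; the thread reports roughly 1 to 2 s.
    for blackout in (0.5, 1.5, 2.5):
        verdict = "stays connected" if nud_survives(blackout) else "gets dropped"
        print(f"blackout of {blackout:.1f}s: client {verdict}")
```

Under this model, whether the client survives depends entirely on whether the last probe goes out after the blackout ends, which would explain why the connection only fails some of the time.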

As a result, my Android phone sometimes takes 2, 3, or 4 attempts until it successfully connects to my WiFi. This is particularly annoying during roaming, as the WiFi drops when walking around the house.

Is anyone else observing this behavior? (The issue might also be on the client side, or with the switch, or with the router - but the fact that it is limited to 5 GHz makes me very, very suspicious of the AP. EDIT: Yep, the culprit is the AP/Unleashed. I'm really peeved to find bugs like that in a networking device with a $1200 MSRP.)

Thanks,

- Dave.

Official Rep

 • 

1.3K Messages

 • 

17.7K Points

2 m ago

I have not seen this behavior, however, it is worth checking why it is happening.

Could you run the client troubleshooting utility and track a client connection while reproducing the problem? Check exactly where this delay is happening.

Also, taking a packet capture will help here.


Thanks Syamantak!

I've noticed 2 odd things in the client debugging tool:

1. The AP sends bogus "inactivity timeout" disconnects. 

In the screenshot below, I connected to AP A (on 2.4 GHz, at 11:26:06), then roamed to AP B (on 5 GHz, at 11:27:50). 16 seconds after roaming to AP B, and merely 2 minutes after initially connecting to the network, the device gets booted due to inactivity. My inactivity timeout is 500 minutes, and "force DHCP" is disabled, so this most certainly looks like a bug in Unleashed.

2. There are multiple DHCP requests from the client on 5 GHz

After initially connecting to the WiFi, the client sends 3-4 DHCP requests. Per the screenshot below, the AP sees DHCP ACKs, but the ACKs get dropped either on the client, or between AP and client. This only happens on 5 GHz; on 2.4 GHz, the DHCP request isn't repeated. So this is consistent with what I initially observed with ARP requests.

Official Rep


@david_fuchs If we don't see any delay in the WiFi connection (EAPOL), then DHCP could be the culprit.

Please take a capture on the AP's eth interface and on the DHCP server.

Compare the send and receive delay between the DHCP server and the AP.
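As a rough sketch of that comparison, assuming you have already extracted per-transaction timestamps from both captures: the function name, the `{xid: timestamp}` input shape, and the sample values below are all made up for illustration. It also assumes the clocks on the two capture hosts are synchronized.

```python
def ack_delays(server_sent, ap_received):
    """Map each DHCP transaction ID (xid) to the server-to-AP delay in seconds.

    Both arguments are dicts of {xid: timestamp_in_seconds}. A transaction
    that never shows up in the AP-side capture maps to None.
    """
    return {
        xid: (ap_received[xid] - sent if xid in ap_received else None)
        for xid, sent in server_sent.items()
    }


if __name__ == "__main__":
    # Hypothetical timestamps, not taken from the captures in this thread.
    server_sent = {0xA1: 10.0, 0xA2: 11.0, 0xA3: 13.0}
    ap_received = {0xA1: 10.25, 0xA2: 11.5}
    for xid, delay in ack_delays(server_sent, ap_received).items():
        status = "missing at AP" if delay is None else f"{delay:.3f}s"
        print(f"xid {xid:#x}: {status}")
```

A consistently small delay here would point away from the DHCP server and the wired path.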

As for the inactivity timeout, it can also trigger if the client is not responding to Block Ack frames.

Regards,

Syamantak Omer

Official Rep | Staff TSE | CWNA | CCNA | RASZA | RICXI



The DHCP server is not the culprit.

It's a bit hard to see as the forum mangles and downscales my screenshots a bit aggressively, but here's the relevant exchange, as seen by the AP:

Note that the client sends 3 separate requests - at 11:19:15, 11:19:16, and 11:19:18 - and the DHCP server immediately responds to all 3. The AP sees these DHCP ACKs - the client does not. This indicates that they get lost, either on the client or between AP and client. And it's not just DHCP - ARP behaves the same way.

So, the bug clearly lies with either the client or the AP. To narrow it down, I tested with a different client (a Lenovo X1 with an Intel WiFi chipset, running Linux). And guess what? I'm seeing the exact same behavior. So unless there is a client bug that affects both Intel and Qualcomm, across different operating systems, and only on 5 GHz... this is a bug in Ruckus access points.

So I guess the next question is - is there some particular setting that triggers this bug? Any settings worth trying to work around it?

Official Rep


@david_fuchs If you look at the Access Point column, the DHCP ACK crosses it, which clearly indicates that the traffic was passed on to the client. In this case, packet captures need to be taken at 3 points:

1. AP eth and WiFi interfaces (OTA capture).

2. DHCP server.

3. Client.

If we run captures on all of these devices, we will know where it is failing.
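Once the captures are in hand, the comparison can be sketched roughly as follows. This assumes you have reduced each capture to the set of DHCP transaction IDs (xids) whose ACK appears in it; the capture-point names and the `last_seen` helper are hypothetical labels, not part of any Ruckus tool.

```python
# Capture points in the order a DHCP ACK travels from server to client.
CAPTURE_POINTS = ["dhcp_server", "ap_eth", "ap_wifi", "client"]


def last_seen(xid, captures):
    """Return the last capture point on the path that saw the ACK, or None.

    `captures` maps a capture-point name to the set of xids whose ACK
    appears in that capture. The walk stops at the first point that did
    not see the ACK, so the result marks where the frame was last observed.
    """
    seen = None
    for point in CAPTURE_POINTS:
        if xid in captures.get(point, set()):
            seen = point
        else:
            break
    return seen


if __name__ == "__main__":
    # Hypothetical data: the ACK for xid 1 never reaches the client.
    captures = {
        "dhcp_server": {1, 2},
        "ap_eth": {1, 2},
        "ap_wifi": {1, 2},
        "client": {2},
    }
    for xid in (1, 2):
        print(f"xid {xid}: last seen at {last_seen(xid, captures)}")
```

An ACK last seen at the AP's wired side would implicate the AP itself, while one last seen over the air but not on the client would point to the client or the RF path.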


Regards,

Syamantak Omer



There ya go:

The packet capture shows the DHCP requests highlighted (probably hard to see, but the timestamps match exactly). The Ruckus debug tool claims both of those requests received responses; neither response ever shows up in the capture.

So, yeah... a $1200 "enterprise grade" networking device is dropping frames. Great.


Official Rep


@david_fuchs Please open a support case if you have a support contract, so that support can look into the issue.

Regards,

Syamantak Omer



Thanks again Syamantak!

I do not have a support contract with Ruckus.

Since this is pretty clearly a bug in Unleashed that causes unnecessary packet loss and disconnects, it's something Ruckus should fix, support contract or not. I'd be happy to share capture files or any other relevant information if contacted by someone on Ruckus' engineering team.

Best,

- Dave.
