Does anyone have any idea why this issue occurs and how to solve it?
However, these flags are generally related to overhead. If the environment is dense, you need to change the channel plan: set static channels on 2.4GHz and limit 5GHz to 40MHz.
Change the RF configuration to use beamforming, and change the study period for frequencies / neighbors / interference to 1h (3600 s). Also check the load / user balancing and, if you prefer, reduce the power of the AP to 1/3.
So Ruckus is mis-reporting, or mis-categorizing something. The definition of "Authentication" from the Dashboard is: Authentication failure is a measurement of client connection attempts that failed at the 802.11 open authentication stage. This is the first stage in any modern Wi-Fi connection.
My guess is that our clients are roaming and start a connection with an AP, but move on before that AP can complete it. But that still seems like it can't explain all of this, because wifi should connect fairly quickly, and our people walk (slowly, especially when I'm behind them in the hallway) around, and really shouldn't be passing too many APs without connecting to them before moving to another AP.
Then two weeks ago I upgraded the vSZ to 18.104.22.168.222 and only upgraded ONE zone, and it immediately started flagging almost all APs with a "connection failure health because it crossed the threshold" event... so the system is saying that my APs are having *100%* client connection failure, when everything is obviously working fine (people would be screaming and I'd have tons of monitored iPads and printers and such offline if that were actually happening.)
The other three zones that are still on 3.5 continue to operate normally (no false flagging.) I haven't had the guts to upgrade to 5.x yet to see if it's fixed there, as that seems like such a different beast... but it's REALLY frustrating to see all these false flags on my zone because it's hiding any REAL issues I'd be interested in.
Please acknowledge and fix this, Ruckus!
I hope you are doing well.
I created the setup below on your controller to check the connection failure issue on lab APs. I then mapped the wrong VLAN under the WLAN settings so that clients would fail to get an IP address from DHCP. The APs should therefore show a connection failure rate in the UI.
1) Zone: Ruckus Test 3.6.2 and WLAN: 00Test-3.6.2. Client devices failed to connect to the SSID (00Test-3.6.2) due to the DHCP issue. I could see a 100% connection failure rate and a historical report of client connections under the Health tab of the AP. The AP displayed accurate results in the UI.
2) Zone: Ruckus Test 3.5.1 and WLAN: 00Test-3.5.1. I connected the same devices to the SSID (00Test-3.5.1) to check the failure rate and noticed that no connection failures were reported on the AP for the clients that failed to connect to the SSID.
I have analyzed the flagged APs (Default Zone) in your controller and could see that the connection failures were reported due to DHCP failures. The clients may have had difficulty getting an IP address from the DHCP server at that moment. Please see the attached screenshot of the historical report for your reference. The above test shows that the connection failure algorithm is not working properly in the 3.5.1 software version and that there is no miscalculation of connection failures in the 22.214.171.124.222 version.
As I mentioned earlier, the connection failure rate shown in the UI is a cumulative value for the AP. The AP takes into account Auth, Assoc, EAP, RADIUS and DHCP when reporting connection failures. Connection failures are calculated at 90-second intervals. If during that interval there are only failed attempts (even a single one) and no successful ones, the system will display 100% connection failure.
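To make the arithmetic in that explanation concrete, here is a minimal Python sketch of a per-window failure rate. It is only an illustration of the rule as support described it, not Ruckus's actual implementation; the function name, the `(timestamp, succeeded)` tuple format and the sample data are all assumptions made up for this example.

```python
from collections import defaultdict

WINDOW_SECONDS = 90  # interval quoted in the support reply


def failure_rate_by_window(attempts, window=WINDOW_SECONDS):
    """Hypothetical helper: attempts is an iterable of (timestamp_seconds, succeeded).

    Returns {window_start: failure_percent} for every window that saw
    at least one connection attempt.
    """
    totals = defaultdict(lambda: [0, 0])  # window_start -> [failed, total]
    for ts, succeeded in attempts:
        start = int(ts // window) * window
        totals[start][1] += 1
        if not succeeded:
            totals[start][0] += 1
    return {
        start: 100.0 * failed / total
        for start, (failed, total) in sorted(totals.items())
    }


# Made-up sample data, purely for illustration.
attempts = [
    (5, True), (20, True), (40, False),  # window 0: 1 of 3 attempts failed
    (100, False),                        # window 90: a single attempt, and it failed
    (200, True),                         # window 180: a single attempt, and it succeeded
]
rates = failure_rate_by_window(attempts)
for start, pct in rates.items():
    print(f"window starting at {start}s: {pct:.1f}% failed")
```

With so few attempts per window, one roaming client that gives up on an AP is enough to push that window to 100%, which is consistent with what the flagged zone shows.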
Soooo... my interpretation from all that is 1. Ruckus changed their algorithm in 3.6 to make it super-sensitive to DHCP failures and they are claiming that *3.5* is actually the broken version when it comes to connection failure flagging. 2. They seem to think that a SINGLE FAILURE within a 90 sec window should mean "100% connection failure rate"!? I honestly think there is still a bug and they just can't see it. We are seeing NO connectivity issues (that users can actually notice, anyway) and the ONLY difference is 3.5 (works fine) vs. 3.6 (reports crazy 100% connection failures on just about every AP).
I'm done dealing with Ruckus on this, and hope that they somehow get enough complaints to actually look into it at an engineering level someday. For now, it looks like we're all stuck with permanently (and falsely) flagged APs. Sigh.
So new firmware, old firmware, same old story.
"As [redacted] mentioned, in 3.6 we introduced a new feature where the AP calculates all types of client connection failures and reports them to SZ. That's where you see a huge difference between 3.5 and 3.6. The AP takes into account Auth, Assoc, EAP, RADIUS and DHCP when reporting connection failures. In 3.5 the AP didn't have even 50% of this ability.
In 3.5, SZ maintains the connection failure history for up to 1 week, and the AP reports failures to the controller in multiple counters. In 3.6.2 this was changed to use a different counter that is a delta.
Please refer to the 3.6.2 Release Notes:
ER-6198: Resolved an issue where the controller kept track of old client connection data, which resulted in incorrect display of client connection failures.
To understand these failures, you can select an AP where you are seeing high failures and go to the Health tab for that AP. Under the Health tab you have two sections, one for performance and the other for client connection failures. If you go to client connection failures, you will see complete plots showing which of the above categories the failures fall under. If necessary, you can choose failure types as shown below and also click on an individual failure type to see which clients are failing. This feature was introduced to help customers debug their network issues."
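The switch to a delta counter mentioned in that reply can be illustrated with a small Python sketch. This is a generic picture of cumulative versus delta counters, not the controller's real logic; both function names and the sample readings are hypothetical.

```python
def rates_from_deltas(samples):
    """samples: list of (cumulative_failed, cumulative_total), one reading per interval.

    Computes each interval's failure percent from the change since the
    previous reading. Returns None for an interval with no attempts.
    """
    rates = []
    prev_failed, prev_total = 0, 0
    for failed, total in samples:
        d_failed, d_total = failed - prev_failed, total - prev_total
        rates.append(100.0 * d_failed / d_total if d_total else None)
        prev_failed, prev_total = failed, total
    return rates


def rates_from_running_totals(samples):
    """Naive version: treats the running totals as if they were per-interval."""
    return [100.0 * f / t if t else None for f, t in samples]


# Made-up readings: a burst of 4 failures, then two clean intervals.
samples = [(4, 4), (4, 8), (4, 12)]
print(rates_from_deltas(samples))          # [100.0, 0.0, 0.0]
print(rates_from_running_totals(samples))  # old failures keep inflating later intervals
```

In the naive version the early burst never ages out, which is one way stale data could produce a misleading display. Whether that matches what ER-6198 actually fixed is a guess on my part.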