3 Messages

112 Points

Wed, Jan 2, 2019 9:22 AM

Answered

AP FLAGGED STATUS

Most of our APs are showing Flagged status. It happens randomly, on random APs. The flag type is the AP health high connection failure flag, code 330.
Does anyone have any idea why this issue occurs and how to solve it?

Responses

52 Messages

760 Points

2 years ago

It would be helpful if you shared a screenshot of the AP / WLC GUI status.

However, these flags are generally related to overhead. If the environment is dense, it is necessary to change the channel plan: set static channels on 2.4GHz and limit 5GHz to 40MHz.
Change the RF configuration to beamforming, and change the scan period for frequencies / neighbors / interference to 1h (3600 s). Also check the load / user balance and, if you prefer, reduce the power of the AP to 1/3.

3 Messages

112 Points

Should try this. Thanks for the suggestions:)

18 Messages

450 Points

2 years ago

We found the same issue and reported it to Ruckus (along with a LOT of other issues with vSZ).  They have so far not resolved it, but said it is probably related to them over-reporting connection failures.  For us, if you go to the Dashboard and Connection Failures, we see 70-85% failure rate overall, with the same for the first category of "Authentication".  The other categories: Association, EAP, Radius, DHCP, are all below 1% almost all the time.  We have 2500 clients, and they all are connecting fine now.  If 80% of them weren't connecting, I think someone might tell me.

So Ruckus is mis-reporting, or mis-categorizing something.  The definition of "Authentication" from the Dashboard is: Authentication failure is a measurement of client connection attempts that failed at the 802.11 open authentication stage. This is the first stage in any modern Wi-Fi connection.

My guess is that our clients are roaming and start a connection with an AP, but move on before that AP can complete it.  That still doesn't seem to explain all of this, though: wifi should connect fairly quickly, and our people walk around (slowly, especially when I'm behind them in the hallway), so they really shouldn't be passing too many APs without connecting to one before moving to the next.
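To put rough numbers on that guess, here is a minimal Python sketch. It assumes a purely hypothetical model, not anything Ruckus has documented: every client starts 802.11 open authentication with some number of APs and completes the connection with exactly one of them. The function name and parameters are made up for illustration.

```python
from fractions import Fraction

def attempt_vs_client_failure(aps_tried_per_client: int, clients: int):
    """Hypothetical model: each client starts open authentication with
    `aps_tried_per_client` APs and completes with exactly one of them."""
    attempts = clients * aps_tried_per_client
    completed = clients                  # one successful attempt per client
    abandoned = attempts - completed     # attempts the client walked away from
    attempt_failure_rate = Fraction(abandoned, attempts)
    client_failure_rate = Fraction(0, clients)  # every client ends up connected
    return attempt_failure_rate, client_failure_rate

for n in (1, 2, 4, 5, 6):
    per_attempt, per_client = attempt_vs_client_failure(n, clients=2500)
    print(f"{n} APs tried: {float(per_attempt):.0%} of attempts fail, "
          f"{float(per_client):.0%} of clients fail")
```

Under this model a dashboard that counts attempts would land in the 75% to 83% range only if each client started four to six attempts per completed connection, while a count per client would still show no failures at all. That is a lot of abandoned attempts for people walking down a hallway, which is why roaming alone seems unlikely to explain the numbers.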

48 Messages

884 Points

2 years ago

Just chiming in that we are seeing the EXACT same behavior as David describes. I have a vSZ with four zones, and the entire system was running fine on 3.5 (no authentication failure flags; APs were only flagged when something was actually flag-worthy, like really high client counts, etc).

Then two weeks ago I upgraded the vSZ to 3.6.2.0.222 and only upgraded ONE zone, and it immediately started flagging almost all APs with a "connection failure health [100] because it crossed the threshold [30]" event...so the system is saying that my APs are having *100%* client connection failure, when everything is obviously working fine (people would be screaming and I'd have tons of monitored ipads and printers and such offline if that were actually happening.)
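For anyone who exports these events and wants to see how many of them report exactly 100, a small Python sketch follows. The pattern is based only on the event fragment quoted above; the full event text on a real controller may be worded differently, and `parse_flag_event` is a hypothetical helper, not part of any Ruckus tool.

```python
import re

# Assumption: the event text contains the fragment quoted in this post.
EVENT_RE = re.compile(
    r"connection failure health \[(?P<health>\d+)\] because it crossed "
    r"the threshold \[(?P<threshold>\d+)\]"
)

def parse_flag_event(text: str):
    """Return (health, threshold) as integers, or None if the text does not match."""
    m = EVENT_RE.search(text)
    if m is None:
        return None
    return int(m["health"]), int(m["threshold"])

sample = "connection failure health [100] because it crossed the threshold [30]"
print(parse_flag_event(sample))
```

A pile of events that all report exactly 100 would look more like a counting artifact than a real outage, since genuine failures rarely hit a round 100% on every AP.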

The other three zones that are still on 3.5 continue to operate normally (no false flagging.) I haven't had the guts to upgrade to 5.x yet to see if it's fixed there, as that seems like such a different beast... but it's REALLY frustrating to see all these false flags on my zone because it's hiding any REAL issues I'd be interested in.

Please acknowledge and fix this, Ruckus!
Brand User

Former Employee

2.6K Messages

44.8K Points

It might be helpful if you open a ticket and provide your logs to the engineer.

48 Messages

884 Points

Will do!

2 Messages

72 Points

Did you ever hear anything back from support? I have the same issue as well.
Brand User

Former Employee

2.6K Messages

44.8K Points

And only after upgrading a zone to 3.6.2, Frank?
And I'm curious too, Jim, whether you opened a ticket.

2 Messages

72 Points

Yep, same thing post-upgrade. Upgraded 3.6.1.0.227->3.6.2.0.222

48 Messages

884 Points

a year ago

Ok, I opened a ticket on this a month ago and here's the result. First, I actually let them connect to my NATed vSZ over the internet where they installed two APs, each into different zones with 3.5/3.6 firmware respectively. After a couple of weeks of them running the APs on my vSZ this is what I got back from support yesterday:

I hope you are doing good.

 I have created the below setup on your controller to check the connection failure issue on lab APs. Thereafter, I have mapped the wrong vlan under the Wlan settings to fail the clients to get an IP from the DHCP. So, the APs should show the connection failure rate in the UI.

 1) Zone: Ruckus Test 3.6.2 and Wlan: 00Test-3.6.2 Client's devices were failed to connect to the SSID(00Test-3.6.2) due to the DHCP issue. I could see 100% connection failure rate and historical report of client's connection under the health Tab of the AP. The AP was displayed the accurate results in the UI.

 2) Zone: Ruckus Test 3.5.1 and Wlan: 00Test-3.5.1 I have connected the same devices to the SSID(00Test-3.5.1) to check the failure rate and noticed that there were no connection failures reported on the AP for the clients who were failed to connect to the SSID.

 I have analyzed the flagged APs(Default Zone) in your controller and could see the connection failures reported due to the DHCP failure. Client might have faced the difficulty to get an IP from the DHCP server at that moment. Please see the attached screenshot of historical report for your reference. The above test concludes that the connection failure algorithm is not working properly in the 3.5.1 software version and there is no miscalculation of connection failures in the 3.6.2.0.222 version.

 As I informed earlier, the connection failure rate that is shown in the UI is a cumulative value of the AP. The AP would take into account Auth, Assoc, EAP, RADIUS and DHCP for reporting connection failures.

Connection failures are calculated at 90 seconds interval. If during that time there are only failed attempts (even a single one) and no successful ones, the system will display 100% connection failure.

Soooo... my interpretation from all that is 1. Ruckus changed their algorithm in 3.6 to make it super-sensitive to DHCP failures and they are claiming that *3.5* is actually the broken version when it comes to connection failure flagging. 2. They seem to think that a SINGLE FAILURE within a 90 sec window should mean "100% connection failure rate"!? I honestly think there is still a bug and they just can't see it. We are seeing NO connectivity issues (that users can actually notice, anyway) and the ONLY difference is 3.5 (works fine) vs. 3.6 (reports crazy 100% connection failures on just about every AP). 
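To show why that rule flags quiet APs, here is a minimal Python sketch of the arithmetic as support described it: attempts are grouped into 90-second windows and the failure percentage is computed per window. The threshold of 30 comes from the event text quoted earlier in this thread. Whether the controller really flags on a single window or on some aggregate is an assumption here, and the function name is made up.

```python
from collections import defaultdict

WINDOW_SECONDS = 90   # interval quoted by support
FLAG_THRESHOLD = 30   # threshold from the event text (assumed to be a percentage)

def failure_rate_by_window(attempts):
    """attempts: iterable of (timestamp_seconds, succeeded) tuples.
    Returns {window_index: failure percentage} for windows that saw attempts."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for ts, ok in attempts:
        window = int(ts // WINDOW_SECONDS)
        totals[window] += 1
        if not ok:
            failures[window] += 1
    return {w: 100.0 * failures[w] / totals[w] for w in totals}

# Busy window: 49 successes and 1 failure. Quiet window: a single failed attempt.
attempts = [(t, True) for t in range(49)] + [(50, False), (100, False)]
for window, rate in sorted(failure_rate_by_window(attempts).items()):
    flagged = rate > FLAG_THRESHOLD
    print(f"window {window}: {rate:.0f}% failures, flagged={flagged}")
```

The busy window comes out at 2% and stays under the threshold, while the quiet window with one failed attempt and nothing else comes out at 100% and gets flagged. On an AP with few clients, a single DHCP timeout or abandoned attempt would be enough.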

I'm done dealing with Ruckus on this, and hope that they somehow get enough complaints to actually look into it at an engineering level someday. For now, it looks like we're all stuck with permanently (and falsely) flagged APs. Sigh.
Brand User

Former Employee

2.6K Messages

44.8K Points

Hi Jim,

What was your ticket number please?  Are you using the same AP models in both zones?

I want to be sure there's a bug filed on the over sensitive SZ release, thanks.

48 Messages

884 Points

Ticket #00927433.  They used R700s in both tests/zones, which matches some of our APs (our R600/610/710 also show the same flagging behavior).

18 Messages

450 Points

a year ago

FYI, I was told the same, that the new firmware will fix the problem. I upgraded my vSZ and APs to 5.1.1.0.624.  There was no change.  I still have over half my APs in flagged status, despite having a very low number of devices compared to normal.

So new firmware, old firmware, same old story.

48 Messages

884 Points

a year ago

Just closing the loop on this one... got a final explanation for this on my case. Looks like Ruckus considers this new behavior a feature :-/

"As [redacted] mentioned, In 3.6 we introduced a new feature where the AP would calculate all types of client connection failures and report it to SZ. That's where you see a huge difference between 3.5 and 3.6. The AP would take into account Auth, Assoc, EAP, RADIUS and DHCP into account for reporting connection failures. In 3.5 AP didn't have even 50% of this ability.

In 3.5, SZ maintains the connection failure history for up to 1 week, AP reports failures in multiple counters to the controller. In 3.6.2 it is changed to use a different counter that was delta.

Please refer 3.6.2 Release Notes:

ER-6198: Resolved an issue where the controller kept a track of old client connection data, which resulted in incorrect display of client connection failures.

https://support.ruckuswireless.com/documents/2435-smartzone-3-6-2-0-78-mr2-release-notes

To understand these failures, you can select an AP where you are seeing high failures and go to health tab for that AP. Under the health tab you have two sections, one for performance and the other for client connection failures. If you go to client connection failures you will have complete plots showing under which above category are the failures happening. Now if necessary you can choose failure types as show below and also click on individual failure type to see which clients are failing. This feature was introduced to help customer debug his network issues."