AP FLAGGED STATUS

Most of our APs are showing a Flagged status. It happens at random times and on random APs. The flag type is "AP health high connection failure" (code 330).
Does anyone have any idea why this issue occurs and how to solve it?

Nadia

  • 3 Posts
  • 0 Reply Likes

Posted 5 months ago


Jardel Almeida

  • 42 Posts
  • 3 Reply Likes
It would help if you shared a screenshot of the AP / WLC GUI status.

However, these flags are generally related to overhead. If the environment is dense, change the channel plan: set static channels on 2.4GHz and limit 5GHz to 40MHz.
Change the RF configuration to beamform, and change the period for studying frequencies / neighbors / interference to 1h (3600 s). Also check the load / user balancing and, if you prefer, reduce the AP power to 1/3.

Nadia

  • 3 Posts
  • 0 Reply Likes
I'll try this. Thanks for the suggestions :)

David Buhl

  • 10 Posts
  • 11 Reply Likes
We found the same issue and reported it to Ruckus (along with a LOT of other issues with vSZ). They have not resolved it so far, but said it is probably related to them over-reporting connection failures. On our Dashboard, under Connection Failures, we see a 70-85% failure rate overall, and the same for the first category, "Authentication". The other categories (Association, EAP, RADIUS, DHCP) are all below 1% almost all the time. We have 2500 clients, and they are all connecting fine now. If 80% of them weren't connecting, I think someone might tell me.

So Ruckus is mis-reporting, or mis-categorizing something.  The definition of "Authentication" from the Dashboard is: Authentication failure is a measurement of client connection attempts that failed at the 802.11 open authentication stage. This is the first stage in any modern Wi-Fi connection.

My guess is that our clients are roaming: they start a connection with an AP but move on before it can complete. But that still doesn't seem to explain all of this, because Wi-Fi should connect fairly quickly, and our people walk around (slowly, especially when I'm behind them in the hallway) and really shouldn't be passing too many APs without connecting to them before moving to another AP.

Jim Michael

  • 47 Posts
  • 12 Reply Likes
Just chiming in that we are seeing the EXACT same behavior David describes. I have a vSZ with four zones, and the entire system was running fine on 3.5 (no authentication failure flags; APs were only flagged when something was flag-able, like really high client counts, etc.).

Then two weeks ago I upgraded the vSZ to 3.6.2.0.222 and upgraded only ONE zone, and it immediately started flagging almost all APs with a "connection failure health [100] because it crossed the threshold [30]" event. So the system is saying that my APs are having *100%* client connection failure, when everything is obviously working fine (people would be screaming, and I'd have tons of monitored iPads, printers and such offline, if that were actually happening).
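
For anyone who wants to tally these events from an exported log, here is a minimal sketch that pulls the reported value and the threshold out of the message fragment quoted above. It assumes the event text contains exactly that wording; the real event format on a given vSZ release may differ, and `parse_flag_event` is a hypothetical helper, not part of any Ruckus tooling.

```python
import re

# Pattern built from the fragment quoted in this post; the full event
# text on a real controller is an assumption and may look different.
EVENT_RE = re.compile(
    r"connection failure health \[(?P<value>\d+)\] "
    r"because it crossed the threshold \[(?P<threshold>\d+)\]"
)


def parse_flag_event(message):
    """Return (value, threshold) as ints, or None if the text does not match."""
    match = EVENT_RE.search(message)
    if match is None:
        return None
    return int(match.group("value")), int(match.group("threshold"))


sample = "connection failure health [100] because it crossed the threshold [30]"
print(parse_flag_event(sample))  # (100, 30)
```

Counting how many parsed events report exactly 100 would show how often an AP is flagged at the maximum value rather than somewhere just above the threshold.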

The other three zones that are still on 3.5 continue to operate normally (no false flagging.) I haven't had the guts to upgrade to 5.x yet to see if it's fixed there, as that seems like such a different beast... but it's REALLY frustrating to see all these false flags on my zone because it's hiding any REAL issues I'd be interested in.

Please acknowledge and fix this, Ruckus!

Michael Brado, Official Rep

  • 2839 Posts
  • 397 Reply Likes
It might be helpful if you open a ticket and provide your logs to the engineer.

Jim Michael

  • 47 Posts
  • 12 Reply Likes
Will do!

Frank

  • 2 Posts
  • 0 Reply Likes
Did you ever hear anything back from support? I have the same issue as well.

Michael Brado, Official Rep

  • 2839 Posts
  • 397 Reply Likes
And only after upgrading a zone to 3.6.2, Frank?
And I'm curious too, Jim, whether you opened a ticket.

Frank

  • 2 Posts
  • 0 Reply Likes
Yep, same thing post-upgrade. Upgraded 3.6.1.0.227->3.6.2.0.222

Jim Michael

  • 47 Posts
  • 12 Reply Likes
Sorry, I dropped the ball and never opened a ticket. I'll see if I can get to it this week. (It's soooo painful to deal with support in my experience...)

Michael Brado, Official Rep

  • 2839 Posts
  • 397 Reply Likes
Let me help, the key is knowing what they will ask you for!

Product :  What kind of SZ - SZ300 / SZ100 / vSZ
Serial Num:  They will ask...
SW Version: Important, since you didn't see the problem till upgrade
Logs:  Can you get logs from this Zone in advance?

If you have this ready, they may just need to open a ticket for you, send you a greeting msg with the case number, and you can reply back with your logs for them to go thru, search for bugs, etc.
(Edited)

Jim Michael

  • 47 Posts
  • 12 Reply Likes
Thanks. I opened a ticket. We'll see where it leads!

David Buhl

  • 10 Posts
  • 11 Reply Likes
Good luck.  I've had a ticket opened since 11/8/18.  It's been "sent to engineering" several times, only for me to get contacted again to provide more information "so we can send it to engineering".  I just got another email a couple of days ago with the same update that was sent two months ago.  I'm hoping this was an error and they really don't need further information from me/you/us.

They said they confirmed the issue, but it seems to be going nowhere.  Since this is something that "engineering" broke with an update, I'd think it would be fairly simple to unbreak it.

Michael Brado, Official Rep

  • 2839 Posts
  • 397 Reply Likes
What is the ticket number, David? I'd like to look into it, see what bug they think it matched, etc...

David Buhl

  • 10 Posts
  • 11 Reply Likes
00867333

I have not been given a bug (AR?) number on this one, which makes me a little uneasy.

Jim Michael

  • 47 Posts
  • 12 Reply Likes
Ok, I opened a ticket on this a month ago and here's the result. First, I actually let them connect to my NATed vSZ over the internet where they installed two APs, each into different zones with 3.5/3.6 firmware respectively. After a couple of weeks of them running the APs on my vSZ this is what I got back from support yesterday:

I hope you are doing good.

 I have created the below setup on your controller to check the connection failure issue on lab APs. Thereafter, I have mapped the wrong vlan under the Wlan settings to fail the clients to get an IP from the DHCP. So, the APs should show the connection failure rate in the UI.

 1) Zone: Ruckus Test 3.6.2 and Wlan: 00Test-3.6.2 Client's devices were failed to connect to the SSID(00Test-3.6.2) due to the DHCP issue. I could see 100% connection failure rate and historical report of client's connection under the health Tab of the AP. The AP was displayed the accurate results in the UI.

 2) Zone: Ruckus Test 3.5.1 and Wlan: 00Test-3.5.1 I have connected the same devices to the SSID(00Test-3.5.1) to check the failure rate and noticed that there were no connection failures reported on the AP for the clients who were failed to connect to the SSID.

 I have analyzed the flagged APs(Default Zone) in your controller and could see the connection failures reported due to the DHCP failure. Client might have faced the difficulty to get an IP from the DHCP server at that moment. Please see the attached screenshot of historical report for your reference. The above test concludes that the connection failure algorithm is not working properly in the 3.5.1 software version and there is no miscalculation of connection failures in the 3.6.2.0.222 version.

 As I informed earlier, the connection failure rate that is shown in the UI is a cumulative value of the AP. The AP would take into account Auth, Assoc, EAP, RADIUS and DHCP for reporting connection failures.

Connection failures are calculated at 90 seconds interval. If during that time there are only failed attempts (even a single one) and no successful ones, the system will display 100% connection failure.

Soooo... my interpretation from all that is 1. Ruckus changed their algorithm in 3.6 to make it super-sensitive to DHCP failures and they are claiming that *3.5* is actually the broken version when it comes to connection failure flagging. 2. They seem to think that a SINGLE FAILURE within a 90 sec window should mean "100% connection failure rate"!? I honestly think there is still a bug and they just can't see it. We are seeing NO connectivity issues (that users can actually notice, anyway) and the ONLY difference is 3.5 (works fine) vs. 3.6 (reports crazy 100% connection failures on just about every AP). 
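
To make the rule in support's reply concrete, here is a minimal sketch of a per-window failure percentage, assuming fixed 90-second buckets and a rate of failed attempts over all attempts in each bucket. It only illustrates the behaviour as described in the email and is not Ruckus's actual algorithm; the function names, the sample data, and the choice to flag when the rate is strictly above 30 are all assumptions.

```python
from collections import defaultdict

WINDOW_SECONDS = 90   # interval quoted by support
FLAG_THRESHOLD = 30   # percent, taken from the event text earlier in the thread


def failure_rate_by_window(attempts, window=WINDOW_SECONDS):
    """attempts: iterable of (timestamp_seconds, succeeded) tuples.

    Returns {window_index: failure_percent} for windows that saw attempts.
    """
    counts = defaultdict(lambda: [0, 0])  # window index -> [failed, total]
    for timestamp, succeeded in attempts:
        bucket = counts[int(timestamp // window)]
        bucket[1] += 1
        if not succeeded:
            bucket[0] += 1
    return {w: 100.0 * failed / total for w, (failed, total) in counts.items()}


def flagged_windows(rates, threshold=FLAG_THRESHOLD):
    """Windows whose failure percentage is above the threshold (assumed rule)."""
    return sorted(w for w, rate in rates.items() if rate > threshold)


# Hypothetical data: window 0 holds a single aborted attempt (say, a roaming
# client); window 1 holds nine successes and one failure.
attempts = [(10, False)] + [(90 + i, True) for i in range(9)] + [(100, False)]
rates = failure_rate_by_window(attempts)
print(rates)                   # {0: 100.0, 1: 10.0}
print(flagged_windows(rates))  # [0]
```

Under these assumptions, a quiet window with one aborted attempt reports 100% and crosses the threshold, while a busier window with one failure in ten attempts stays at 10%.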

I'm done dealing with Ruckus on this, and hope that they somehow get enough complaints to actually look into it at an engineering level someday. For now, it looks like we're all stuck with permanently (and falsely) flagged APs. Sigh.

Michael Brado, Official Rep

  • 2839 Posts
  • 397 Reply Likes
Hi Jim,

What was your ticket number please?  Are you using the same AP models in both zones?

I want to be sure there's a bug filed on the oversensitive SZ release, thanks.
(Edited)

Jim Michael

  • 47 Posts
  • 12 Reply Likes
Ticket #00927433. They used R700s in both tests/zones, which matches some of our APs (our R600/610/710 APs also show the same flagging behavior).