Spanning Tree Problems with Access Points connected to Procurve Switches

  • Question
  • Updated 1 year ago
  • Answered
Has anyone ever found switchports flapping on and off when Spanning Tree is active and an access point (7055, 7363, 7372, 7962) is connected? We have found this at several properties now with very similar configurations but have not been able to determine a cause. All of our hotels use Procurve switches at the edge (2910al-PoE or 2520-8G-PoE), and all participate in spanning tree, but we only observe the flapping on about 1% of switchports. Of course, 1% for us means that 10-50 guests are getting knocked off the network regularly, so it is a bit of an issue (some ports have 200-300 state transitions *per day*).

We have verified that the configurations on the ports that have the problem match the ports that do not with regard to BPDU filtering, root-guard, etc. We do not believe that it is a deliberate attack because it is the same access points that have chronic issues, whereas the other 99% never do.
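One way to confirm which ports are the chronic offenders is to tally state-change events per port from the switch log. The following is only a minimal sketch: the log line shape, the regular expression, and the function names `count_transitions` and `flapping_ports` are assumptions for illustration and would need adjusting to the real log output.

```python
import re
from collections import Counter

# Assumed log line shape (hypothetical; adjust the pattern to the real log):
#   "I 03/14/14 02:11:09 ports: port 17 is now off-line"
PORT_EVENT = re.compile(r"port (\S+) is (now on-line|now off-line|Blocked by STP)")

def count_transitions(lines):
    """Return a Counter mapping port -> number of state-change events seen."""
    counts = Counter()
    for line in lines:
        match = PORT_EVENT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def flapping_ports(lines, threshold=100):
    """Return (port, count) pairs at or above threshold, busiest first."""
    counts = count_transitions(lines)
    return [(port, n) for port, n in counts.most_common() if n >= threshold]
```

Feeding a day's worth of log lines to `flapping_ports` with a threshold near the 200-300 transitions mentioned above should single out the same handful of ports each day if the problem really is chronic.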

Any ideas?

Simon Eng

  • 8 Posts
  • 1 Reply Like
  • frustrated.

Posted 4 years ago


Keith - Pack Leader

  • 860 Posts
  • 50 Reply Likes
Hi Simon,

If you aren't doing any mesh, you can safely disable spanning tree on those (leaf-node) ports to see if that's related to the issue.

My colleagues suspect it's related to POE (overload, faulty wiring, etc). Are you getting any other error messages on the switch?

Simon Eng

  • 8 Posts
  • 1 Reply Like
Hi Keith,
Thanks for the suggestion. I should have mentioned this in the original problem description, but we have BPDU protection, Loop Detection, and all the ports forced to Edge except for inter-switch links.

I suspect you're right that it has something to do with the wiring, a faulty switchport, or the AP(s) themselves being faulty in some way. Unfortunately, we aren't getting much in the way of error messages from any of the components apart from "Help, I'm dead".

We have in the past tried switching out the patch cables and the cable runs themselves have tested out good. Moving the AP moves the problem *sometimes* but not always, and the AP diagnostics don't yield much.

Perhaps, also, the title of the issue is misleading: I included Spanning Tree in the mix because the only thing that the switch complains about is having to put the port in block mode (but not as a function of the initial 3 second listening period for BPDU because the port is being forced to be an edge).

Keith - Pack Leader

  • 860 Posts
  • 50 Reply Likes
Is it usually a particular AP model that is implicated? For example, our indoor AP failure rates are extremely low (typically much less than 1%), so based on the description it's unlikely that you're seeing actual hardware failures.

mmulcahy

  • 3 Posts
  • 3 Reply Likes
Simon,
I've experienced this exact same problem. It took down my entire network one night. No idea what caused it. The port the AP was connected to started flapping, then got blocked, but the AP would mesh to other APs and continue to flood the network. Eventually all of the uplink ports ended up getting blocked, and the only way around it was to turn off STP and disable mesh on the APs; then we were able to turn STP back on. Definitely not a fun night. We have 7363, 7982, 7762, and 7962s.

Keith - Pack Leader

  • 860 Posts
  • 50 Reply Likes
@mmulcahy - that's great insight. Did you individually remove mesh from each AP? Or a factory reset and reconfigure? (which is the "official" method)

mmulcahy

  • 3 Posts
  • 3 Reply Likes
I did it individually. I didn't want to risk anything going wrong using my backup configs. Not the most fun thing to do, but if I ever do need to mesh something, I would be able to do it on an individual AP basis.

Simon Eng

  • 8 Posts
  • 1 Reply Like
Mike, That's a good idea for us to run down since we do seem to experience the issue only at sites that have mesh enabled. Oddly though, if memory serves, none of the affected APs were (or should have been) actively participating in the mesh, but we will certainly give that more attention now.

Steve Byrum, Employee

  • 7 Posts
  • 1 Reply Like
It's good practice not to have mesh enabled on APs that are connected via the wire. I would suggest you set any AP that is not intended to be a mesh AP to ROOT-only mode, or, if you are running newer versions of code, you can disable mesh on a per-AP basis.

When a wired AP goes into mesh mode, the uplink switch may suspect a loop. Ruckus has a mechanism to keep the switches from detecting a loop, but it is not 100% guaranteed.

The problem is that if the wired connection is flaky, the AP will thrash between wired and mesh, and that thrashing can lead to a loop being detected, or to a real loop. Even more so when spanning tree kicks in: there is now a third factor in the flapping, as spanning tree can cause the wired port to become blocked, sending the AP back to mesh status.

My recommendations are to disable mesh or set root mode on wired APs that do not require mesh.
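The wired/mesh thrash described above can be illustrated with a toy model. This is a sketch under stated assumptions, not how the AP firmware actually decides: the function names and the simple "fall back to mesh whenever the wire is down" rule are hypothetical, and STP blocking is folded into "wire down".

```python
def uplink_history(link_samples, mesh_enabled):
    """Return the AP's uplink role at each sample: 'wired', 'mesh', or 'down'."""
    history = []
    for link_up in link_samples:
        if link_up:
            history.append("wired")
        elif mesh_enabled:
            history.append("mesh")   # assumed: falls back to a wireless uplink
        else:
            history.append("down")   # assumed: root-only AP never forms a mesh uplink
    return history

def wired_mesh_switches(history):
    """Count changes between 'wired' and 'mesh' in consecutive samples."""
    return sum(
        1
        for prev, cur in zip(history, history[1:])
        if {prev, cur} == {"wired", "mesh"}
    )

# A flaky wire (or a port that STP keeps blocking) sampled five times.
flaky = [True, False, True, False, True]
print(wired_mesh_switches(uplink_history(flaky, mesh_enabled=True)))
print(wired_mesh_switches(uplink_history(flaky, mesh_enabled=False)))
```

In this model a flaky wire with mesh enabled flips the uplink role on every sample, while the same wire with mesh disabled produces no wired/mesh switches at all; the AP is simply offline while the wire is down.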

Koos Boersma

  • 2 Posts
  • 0 Reply Likes
Gents,

Question: what happens when an AP connected to a wire loses its wired connection? We set up the network with the mesh function in mind as a fallback scenario. But if we disable mesh for these wired APs, we lose the fallback.

Suggestions?

Keith - Pack Leader

  • 860 Posts
  • 50 Reply Likes
This is a great conversation that's separate from the main topic, so I created a new topic to continue the discussion. Please reference the new topic here: What happens when a AP connected to a wire is losing its wired connection

Brandon Chap

  • 1 Post
  • 0 Reply Likes
I am not sure if you have figured out your problem yet, but we had this exact same issue. With the bpdu-protection setting enabled on ports going to APs, we were seeing the AP drops. Our APs were in a university dorm type setting. Since we were using 802.1x in the dorms and the protocol is not supported by gaming consoles, some of the students were bridging their laptop's wireless connection to its LAN port. When this happens, the laptop attempts to participate in STP and sends a BPDU packet to the AP. Once it gets to the switch, bpdu-protection disables the port. We turned off bpdu-protection and kept bpdu-filtering set for the AP ports on the switch. This way the BPDU packet is just dropped and traffic continues to flow as normal. No more random AP drops. Hope this helps!
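The difference between the two settings, as described above, can be sketched in a few lines. This is only an illustration of the behaviour reported in this post, not switch code: the `BpduMode` enum and `on_bpdu` function are hypothetical names, and the sketch assumes exactly one of the two settings is in effect on the port.

```python
from enum import Enum

class BpduMode(Enum):
    PROTECTION = "bpdu-protection"
    FILTER = "bpdu-filter"

def on_bpdu(mode):
    """Return (port_stays_up, action) for a BPDU arriving on an AP-facing port."""
    if mode is BpduMode.PROTECTION:
        # As described above: the switch disables the port, dropping the AP.
        return (False, "port disabled")
    # As described above: the BPDU is discarded and traffic keeps flowing.
    return (True, "BPDU dropped")

for mode in BpduMode:
    stays_up, action = on_bpdu(mode)
    print(f"{mode.value}: {action}, AP stays connected: {stays_up}")
```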

Simon Eng

  • 8 Posts
  • 1 Reply Like
Hi Brandon,

Thanks, we will try that as well. Thanks!

Mitchell Axtell

  • 50 Posts
  • 9 Reply Likes
Another killer: Wake-on-WLAN uses BPDUs, so any newer Intel chip would happily send BPDUs, get the AP's port shut off, roam to the next AP, and repeat until all APs in range were down.

We have adjusted to remove BPDU protection from the AP ports.

Side note: I can recommend removing loop protect as well. I have seen it cause some issues when clients roam.

Simon Eng

  • 8 Posts
  • 1 Reply Like
Well, that's handy. Thanks for the heads-up. We're going heads down for the next 2 days installing 70 switches and ~200 access points. This should be fun.

Bill Burns, AlphaDog

  • 203 Posts
  • 38 Reply Likes
Don't know why I never saw this thread before.

I had a similar issue between Ruckus APs (on a controller) and Cisco switches.
There was spanning-tree instability in the network due to a firmware issue in a non-Cisco switch.

Ruckus APs meshed together, and passed BPDUs over the meshed connections, magnifying the original problem, causing wired ports to become spanning-tree blocked and generally screwing up the wired network.

After disabling mesh on all non-mesh APs and hard-coding mesh-root APs to the "root" role, the problem did not recur.
Apparently, the problem would have had to be triggered by some non-Ruckus device, but until mesh was disabled (except in areas where it was required and could not cause a loop), the mesh feature represented a serious vulnerability to the network.