Deployment sizes?

  • Question
  • Updated 1 year ago
Does anyone here have a deployment of over 100 APs or around 2000 clients?

I have a ticket that's been open for almost 6 months now about a memory leak in the ZoneDirector code, and it is moving along very slowly. After all this time Ruckus support have only worked out which component has the memory leak. It seems to be related to client connections: the more there are, the faster the leak becomes obvious. I actually suspect every single install sees this leak, but until you get to a decent number of connections it's so small as to be non-obvious.

It first became apparent to us when we had about 100 - 125 R600s deployed and somewhere over 2000 clients. We've continued to grow the deployment and now have 200 APs and about 4000 concurrent clients all day, and we can exhaust the memory of our ZD3000 in about a week.

Our install doesn't appear to be anything special. We publish a couple of EAP-protected SSIDs and a single PSK one. The only reason I can think of that this isn't affecting all large customers is that there really aren't that many with that much load.

I'd love to hear from anyone who has a deployment of this size (or larger).

Dave Watkins

  • 61 Posts
  • 12 Reply Likes

Posted 1 year ago


Michael Brado, Official Rep

  • 1968 Posts
  • 275 Reply Likes
What version of firmware are you running on your ZD today?  Was Support able to provide a bug ID number?
An average of 20 clients per AP should not exhaust resources with only a few SSIDs in service.

Dave Watkins

  • 61 Posts
  • 12 Reply Likes
Hi Michael

We're currently running a dev version (9.12.2.0 build 14546650), but this originally started with 9.8.

Our average depends on the site involved. For example, one of our sites averages 30-40 connections per AP, with some individual APs spiking over 100 at times. Other sites average 10 connections per AP. Case ID is 00329641, ER-3275.

My biggest issue is that support seems to want to wait until memory consumption is over 80% before looking at the system. In the past, when the deployment was smaller, that took weeks to occur. Now, on the day it climbs over 80%, I have to reboot the ZD that night to get through the next day without an unexpected reboot.

Basically, it's been almost 6 months, I now have to reboot the ZD weekly to maintain smooth operation, and I'm not convinced we're actually much closer to having this solved. I also have more sites to upgrade. I'm at a point now where we'll be holding off upgrading those sites until this is resolved, and if it goes on too much longer we'll have to start looking at other wireless systems, which disappoints me because the actual functionality has been great.

Sean

  • 342 Posts
  • 87 Reply Likes
Hi Dave,

What sort of setup do you have, e.g. tunnelled client traffic, application visibility etc?

Previously I used ZD3000s, but found them to be a little lacking on the performance side of things and had to upgrade - I was also seeing high CPU on a ZD3000 (flatlined at 100%), but I RMA'd the device and shortly after upgraded to the ZD5000, as I did not want to take any more risks using a mid-range product for a high-end network.

Since the upgrade the network has been solid as a rock!

Note: The network I was running had over 100 APs and around 8,000 to 10,000 concurrent sessions, so the ZD3000 was borderline in terms of suitability for my network in the first place.

Dave Watkins

  • 61 Posts
  • 12 Reply Likes
We don't have any CPU issues on the ZD, just ever-increasing RAM usage. We have basically no tunneled traffic. That's not entirely true, as we have one tiny site that we tunnel from, but it would be less than 1% of the total traffic. Application Visibility is off (one of the things tried to control the RAM issue).

Our SSIDs are configured for high density, as one of our sites is pretty busy, so basically OFDM only and a couple of other sensible settings, nothing out of the ordinary and all according to the various whitepapers. The R600s are also configured to allow 200 connections, and so are the SSIDs (per AP).

Our APs are almost exclusively R600s, with 5 R710s for locations that needed a little more range and about 20 ZF7372s. Our EAP SSIDs are the ones that get used heavily, and our PSK SSID is hidden and only used for devices that don't support EAP (Apple TVs, projectors etc.).

Bill Burns, AlphaDog

  • 203 Posts
  • 38 Reply Likes
I have over 100 APs on a ZD3000.
(probably less than 125)
Total employee count here is 1100 plus there are sometimes 500 visitors.

I don't ever look at my ZD CPU or memory utilization but I have no reason to suspect they're high.
The ZD here is basically just funneling log messages to a syslog server.
I'm not sure what would tax the CPU or memory.

Dave Watkins

  • 61 Posts
  • 12 Reply Likes
Just for the record, the problem is 802.11r FT Roaming.

With it enabled you lose a small amount of memory with every connection (or maybe every roam). With it disabled, memory consumption stays static.

Sean

  • 342 Posts
  • 87 Reply Likes
I suggest you raise a ticket and send Ruckus the debug logs from the ZD.

Q: Can you cross-reference the peaks in memory utilisation with the peaks in client count on the real-time monitoring page in the UI?
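For anyone who has both series exported from their monitoring rather than reading them off the UI, the cross-reference can be roughed out as a Pearson correlation. This is only a sketch: the function name and the sample values are made up for illustration, and a low correlation would not rule out a leak, since a leak that grows on first association and never shrinks will diverge from the client count once people leave.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples across one day (NOT real measurements).
clients = [200, 1500, 3900, 4000, 3800, 900]
memory_pct = [41.0, 44.5, 50.2, 50.6, 50.7, 50.7]
print(round(pearson(clients, memory_pct), 2))
```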

Bill Burns, AlphaDog

  • 203 Posts
  • 38 Reply Likes
Based on "happy it's been identified", I'd assume that Dave already opened a ticket.

Dave: Did you discover the association of the memory leak w/ 802.11r on your own or through support?

Sean

  • 342 Posts
  • 87 Reply Likes
Assume nothing! ;)

Bill Burns, AlphaDog

  • 203 Posts
  • 38 Reply Likes
You're right.
How 'bout this:

Based on: "a ticket that's been open for almost 6 months now"
I realize that I should have been better at reading (or remembering) the OP.

But I still wonder if Dave was the one who discovered that the memory leak had to do w/ 802.11r "Fast Transition" roaming. (and what the impact is/was)

Sean

  • 342 Posts
  • 87 Reply Likes
You're right, my bad :)

HIH

Bill Burns, AlphaDog

  • 203 Posts
  • 38 Reply Likes
I'm curious:
How did you initially realize there was a memory leak issue?
What was the impact on your network/operations?

Did the APs stop working altogether?
Did the APs start refusing to accept new associations?
Did it happen to all APs at once, or one at a time?

Dave Watkins

  • 61 Posts
  • 12 Reply Likes
So, yes, a ticket has been open a long time :).

I'm running Icinga and pulling ZD CPU and memory, as well as AP CPU and AP client counts, via SNMP, although initially I wasn't.

Initially I discovered this with an unexpected ZD reboot in the middle of the day, which happily kicked a couple of thousand clients until Smart Redundancy picked them up (on that note, actual seamless failover would be a godsend).
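As a rough illustration of the kind of check this involves, here is a minimal sketch of the threshold logic for an Icinga-style plugin. It is not the actual check in use here: the function name, the 70% warning level and the perfdata label are assumptions, the 80% critical level is borrowed from the figure support mentioned, and the SNMP retrieval is left out because the relevant OIDs are not given in this thread.

```python
# Exit codes follow the Nagios/Icinga plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_memory(used_pct, warn=70.0, crit=80.0):
    """Return (exit_code, status_line) for a memory utilisation reading in percent."""
    if used_pct is None or not 0.0 <= used_pct <= 100.0:
        return UNKNOWN, "MEMORY UNKNOWN - no valid reading"
    if used_pct >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif used_pct >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Perfdata after the pipe lets the monitoring system graph the value.
    perfdata = f"mem_used_pct={used_pct:.1f}%;{warn:g};{crit:g};0;100"
    return code, f"MEMORY {label} - {used_pct:.1f}% used | {perfdata}"


# Hypothetical reading; a real check would obtain this value over SNMP.
code, line = check_memory(83.4)
print(code, line)
```

A wrapper script would print the status line and exit with the returned code so the monitoring system can pick up the state.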

Once that happened I put monitoring and graphing in place to see what was going on.

To be clear, this isn't about the number of clients actively connected per se. From my graphing I can tell you the memory gets used as clients associate for the first time that day, so we see a spike of memory consumption in the morning and not much movement for the rest of the day. It never drops once everyone leaves, so it may just as easily be a failure to clean up after connections leave. It only happens with FT enabled. Before disabling FT I was losing between 8% and 12% of total memory every day (with about 200 APs and 4000 active clients at any point during the day).

I disabled FT at the request of support after the accumulation of a _lot_ of logs. Debug logs didn't tell them much, and remote sessions only confirmed that the station manager was what was chewing up the memory. Some custom logging they set up to TFTP with a custom firmware build initially didn't seem to show much either, but after that came the suggestion to disable FT, and now we're getting somewhere.
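To put those daily figures in terms of reboot intervals, here is a back-of-the-envelope sketch. The function names are illustrative, and the 20% post-reboot baseline is purely an assumption (the thread gives no baseline); the 80% threshold and the 8% to 12% daily loss are the numbers quoted above.

```python
def average_daily_growth(end_of_day_pct):
    """Mean day-over-day growth, in percentage points, from end-of-day readings."""
    if len(end_of_day_pct) < 2:
        raise ValueError("need at least two daily readings")
    return (end_of_day_pct[-1] - end_of_day_pct[0]) / (len(end_of_day_pct) - 1)


def days_until(threshold_pct, current_pct, daily_growth_pct):
    """Days until memory use reaches threshold_pct, assuming linear growth.

    Returns None when memory is not growing.
    """
    if daily_growth_pct <= 0:
        return None
    remaining = max(threshold_pct - current_pct, 0.0)
    return remaining / daily_growth_pct


# Assumed 20% baseline after a reboot (hypothetical), 80% threshold from the thread.
print(days_until(80, 20, 12))  # 5.0 days at the fast end of the quoted range
print(days_until(80, 20, 8))   # 7.5 days at the slow end
```

Under that assumed baseline the result lines up with roughly weekly reboots.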

So, progress :)