OSX Clients Connected to AP, but not passing traffic

  • Question
  • Updated 3 weeks ago
Anyone else experiencing issues with OSX clients connecting to APs, then randomly not being able to pass traffic/browse the internet?

We're seeing this primarily with OSX 10.11 clients. We have ZD1200s and R710s running 9.13.1

Ruckus support's latest fix was enabling Proxy ARP or upgrading to Sierra. But we just experienced the same issue with a Sierra device.

JasonD

  • 40 Posts
  • 3 Reply Likes
  • frustrated

Posted 10 months ago


Com1 NL - Bas Sanders

  • 32 Posts
  • 9 Reply Likes
Hi Jason,
We have seen issues with Apple devices on multiple releases, but mostly when the latency between AP and ZD is high (500+ ms). And yes, we know that is beyond limits, but the odd thing is that it only happens with Apple devices. Any other device works fine.

Our setup travels with F1 and our ZD is located in NL. Any race in Asia/AUS or the Americas causes latencies up to 750 ms.

So far Ruckus has never been able to pinpoint the problem in our case.

JasonD

  • 40 Posts
  • 3 Reply Likes
Interesting...I'm concerned that Ruckus is not able to resolve this issue given how long it has been going on, and that we'll be forced to switch to a new vendor.

Joseph LoBianco

  • 5 Posts
  • 0 Reply Likes
Hi Jason,

We are having a similar issue.  Running ZD1200 and 3 R710s in a full El Capitan (10.11)/Yosemite environment (some Linux clients).

Symptom: A wifi-connected client will lose connectivity to the default gateway (and of course the public internet as well) for roughly 10 seconds; all applications/connections automatically reconnect where applicable after this time.  This does not involve a drop in the wifi connection itself.  At first, Ruckus tried to state that this was a GTK/PTK corruption issue involved with WPA2, which can only be rectified by upgrading to Sierra (as you see, it has not solved your issue).  We confirmed the loss to the default gateway by seeing ICMP drop.

After pushing back, we ended up putting a packet capture on the ports that each AP is connected to. What I found was that at some point all return traffic stops: the client keeps retransmitting un-ack'd packets, but the responses never even reach the AP, which at first glance puts the issue on my network.

What is interesting is that this only occurs for wifi clients, not ethernet.  Furthermore, our network is ridiculously simple in terms of design/configuration, so I truly believe there may still be an issue on Ruckus's end.

My current plan of action is to continue to isolate where the return traffic is not returning, and then work with my network vendor.

Please let me know if our symptom is similar to yours.

Edit:  Also, what is the frequency of your issue?  For us, it ranges from multiple clients affected multiple times a day to no issues (or at least, no clients reporting issues to me) for a week or more.

Edit: We do not experience any major latency issues (500+ ms) prior to the drop 
(Edited)
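
For anyone who wants to measure these drops rather than rely on user reports, here is a minimal Python sketch. It assumes you already log one probe result per second against the default gateway (for example from a ping loop) as `(timestamp, reply_received)` pairs. That log format and the names `outage_windows` and `min_gap` are illustrative assumptions, not something from Ruckus or from the capture described above.

```python
from typing import Iterable, List, Tuple


def outage_windows(
    samples: Iterable[Tuple[float, bool]], min_gap: float = 5.0
) -> List[Tuple[float, float]]:
    """Return (start, end) windows where probes to the gateway went unanswered.

    samples: (timestamp_seconds, reply_received) pairs in time order.
    min_gap: ignore outages shorter than this many seconds (assumed threshold).
    """
    windows = []
    start = None      # timestamp of the first failed probe in the current run
    last_fail = None  # timestamp of the most recent failed probe
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            last_fail = ts
        elif start is not None:
            # The outage ends at the first probe that gets a reply again.
            if ts - start >= min_gap:
                windows.append((start, ts))
            start = None
    # The log may end while the gateway is still unreachable.
    if start is not None and last_fail - start >= min_gap:
        windows.append((start, last_fail))
    return windows
```

Windows of roughly 10 seconds while the client stays associated to the AP would match the symptom described in this post.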

JasonD

  • 40 Posts
  • 3 Reply Likes
That's very similar behavior to what we're seeing. Although we have to turn off/on the wireless adapter to get traffic going again (but we have 802.1x authentication). In terms of frequency, it's hit or miss for us...a couple times a day maybe. Though our users probably work around it, rather than send in tickets. We also do not have major latency issues.

Edit: Ruckus support also called out a GTK corruption issue and stated Proxy ARP/OSX Sierra would resolve the GTK issue
(Edited)

Joseph LoBianco

  • 5 Posts
  • 0 Reply Likes
Are you using Juniper by any chance? Ruckus also suggested Proxy ARP/Sierra to us. I can confirm that what we are experiencing is NOT related to any WPA2 corruption.

I highly recommend running a packet capture at the port(s) on your network device that connect to your APs and seeing if you experience the same traffic patterns as what I described (no return traffic coming back to the APs).

We do not use 802.1x. Since you have to reconnect your wifi connection manually, there is a possibility you are in fact seeing a GTK/PTK issue (you should be able to easily verify this yourself with a packet capture).

JasonD

  • 40 Posts
  • 3 Reply Likes
No to Juniper. I'll try the packet capture. Thanks

Monnat Systems, AlphaDog

  • 708 Posts
  • 150 Reply Likes
Hi,

I have one customer with a ZD1200 on 9.12, 2 x R700 and an R300. They have MBPs with Sierra and a similar issue to the one you guys mentioned.
I have asked them to try the suggestions in this post, as it seems to solve the following issues:

  • The Mac disconnects from wi-fi when wakes from sleep
  • macOS Sierra drops wi-fi connections or disconnects from wireless at random
  • Wi-Fi connections are unusually slow or have a higher ping than usual after updating to macOS Sierra

http://osxdaily.com/2016/09/22/fix-wi-fi-problems-macos-sierra/

Maybe this helps.
(Edited)

Toomas Kadarpik

  • 4 Posts
  • 1 Reply Like
I have written a sleep/wakeup script that switches Wi-Fi back on after wakeup and turns the Wi-Fi card off before sleep. My MacBook Retina with Sierra now works perfectly. I dug deep into the logs, and it is entirely a Mac problem where the firmware fails to initialize. Mine even failed to shut down correctly before sleep.
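
A sleep/wake toggle along these lines could look roughly like the sketch below. This is not the script from the post above, just an illustration. It assumes the Wi-Fi interface is `en0` (check with `networksetup -listallhardwareports`) and that you call `set_wifi_power` from whatever sleep and wake hooks you use. The function names are made up for the example.

```python
import subprocess
from typing import List


def wifi_power_command(device: str, on: bool) -> List[str]:
    """Build the macOS networksetup command that powers Wi-Fi on or off."""
    return ["networksetup", "-setairportpower", device, "on" if on else "off"]


def set_wifi_power(device: str, on: bool) -> None:
    """Run the command; raises CalledProcessError if networksetup fails."""
    subprocess.run(wifi_power_command(device, on), check=True)


# Intended use (assumed hook points, not part of macOS itself):
#   before sleep:  set_wifi_power("en0", False)
#   after wake:    set_wifi_power("en0", True)
```

As noted further down the thread, this is fairly invasive to roll out to a whole fleet, so treat it as a per-machine workaround.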

JasonD

  • 40 Posts
  • 3 Reply Likes
Interesting fixes, but those are a bit more invasive than I would like to do for all of our clients. Ideally, we'll see a true fix from Apple or Ruckus soon.

Joseph LoBianco

  • 5 Posts
  • 0 Reply Likes
Some new information:

Packet capture with Ruckus showed the issue lies out at my firewall (acting as the default gateway for my wireless subnet).  After lots of troubleshooting, we now see this to be related to an ARP issue that is only experienced by wireless clients.  It appears that the Ruckus equipment will RANDOMLY interfere with the ARP request/reply between the client and default gateway, which causes roughly 8-50 seconds of loss of [default gateway]->[client] traffic.  Once an ARP entry is put into the default gateway's ARP table, traffic resumes immediately.  I emphasize randomly, because sometimes the ARP resolution is done properly and no issues are seen.

If you are also affected by this issue, then you can relieve the problem by increasing the ARP entry expiration on your router.  I set mine to 24 hours.  Of course this is a workaround and not a fix...

The challenge now is for me to continue troubleshooting with Juniper support in order to procure evidence showing this is a problem with Ruckus' equipment.
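
If you want to check your own capture for the same pattern, the sketch below flags ARP requests from the gateway that never got a reply from the client. It assumes you have already exported the ARP frames from your capture into simple records; the `ArpEvent` layout, the function name, and the 1-second `reply_timeout` are all assumptions made for illustration, not output from any Ruckus or Juniper tool.

```python
from typing import List, NamedTuple


class ArpEvent(NamedTuple):
    ts: float        # capture timestamp in seconds
    op: str          # "request" or "reply"
    sender_ip: str   # IP of the host that sent the frame
    target_ip: str   # IP being asked about (request) or answered to (reply)


def unanswered_gateway_requests(
    events: List[ArpEvent], gateway_ip: str, reply_timeout: float = 1.0
) -> List[ArpEvent]:
    """Return gateway ARP requests with no matching client reply in time."""
    replies = [e for e in events if e.op == "reply"]
    missing = []
    for req in events:
        if req.op != "request" or req.sender_ip != gateway_ip:
            continue
        # A matching reply comes from the requested IP, back to the gateway,
        # within reply_timeout seconds of the request.
        answered = any(
            r.sender_ip == req.target_ip
            and r.target_ip == gateway_ip
            and req.ts <= r.ts <= req.ts + reply_timeout
            for r in replies
        )
        if not answered:
            missing.append(req)
    return missing
```

A run of unanswered requests for one client that lines up with that client's outage would be consistent with the behaviour described above. It would not by itself show which device dropped the frame.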

JasonD

  • 40 Posts
  • 3 Reply Likes
Very interesting! Our ARP entry timeout is 30 minutes and is not adjustable :(

Toomas Kadarpik

  • 4 Posts
  • 1 Reply Like
Ruckus is interfering if the proxy ARP option is enabled in the particular WLAN's setup.

JasonD

  • 40 Posts
  • 3 Reply Likes
It was recommended by Ruckus support to enable proxy arp for 10.11 clients. Once all clients are 10.12 it was recommended to disable it.

Joseph LoBianco

  • 5 Posts
  • 0 Reply Likes
I'd take Ruckus' suggestion on proxy ARP with a grain of salt unless it is escalated to a (very) senior engineer.  Ruckus has repeatedly told me to try certain fixes that very clearly have no effect or make no technical sense.  We use 10.11 here in my office without proxy ARP enabled, and have no issues except for the one that I am troubleshooting.  It is possible that we would need to enable proxy ARP as they suggested, but from my understanding this would not apply to us since we are seeing this issue within a single broadcast network (i.e. proxy ARP on/off wouldn't help/hurt here).

If I can now reproduce this issue by significantly lowering my arp expiration timeout setting on my router, then I should be able to test out the effect of enabling proxy arp on the WLAN.

ThX

  • 128 Posts
  • 2 Reply Likes
Have you made any additional determinations about enabling proxy arp?
(Edited)

Joseph LoBianco

  • 5 Posts
  • 0 Reply Likes
Unfortunately I have not yet.  I will report when I get a chance to.

Bill Fehring

  • 2 Posts
  • 0 Reply Likes
I'm experiencing this too. Proxy ARP seems to help, but fundamentally I have a packet capture showing quite clearly that my Juniper gateway (an L3 switch in this case) sent an ARP request to the client, it arrived at the AP (which I'm capturing via a SPAN port), and the AP refused to deliver it to the client. It wasn't until the client did another ARP request to the gateway that things got better. By the same token, I've noticed that Windows clients seem to send gratuitous ARP requests periodically and I suspect this is actually what's helping them avoid this. ARP is pretty fundamental; the AP is deliberately interfering with it in a bad way.
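
For readers unfamiliar with the term: a gratuitous ARP request is an ordinary ARP request in which the sender asks about its own IP address, so every listener (including the gateway) can refresh its ARP table. The sketch below only builds the 28-byte ARP payload to show the field layout; it does not send anything, and the function name and example addresses are made up. On the wire the payload would sit in an Ethernet frame addressed to the broadcast MAC with EtherType 0x0806.

```python
import socket
import struct


def gratuitous_arp_payload(mac: str, ip: str) -> bytes:
    """Build the ARP payload of a gratuitous ARP request for (mac, ip)."""
    hw = bytes.fromhex(mac.replace(":", ""))  # 6-byte sender hardware address
    addr = socket.inet_aton(ip)               # 4-byte IPv4 address
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,             # hardware type: Ethernet
        0x0800,        # protocol type: IPv4
        6,             # hardware address length
        4,             # protocol address length
        1,             # operation: request
        hw,            # sender hardware address
        addr,          # sender protocol address
        b"\x00" * 6,   # target hardware address: unknown in a request
        addr,          # target protocol address equals the sender's own IP
    )
```

Whether periodic gratuitous ARPs are really what keeps Windows clients working here is the poster's suspicion, not something this sketch can confirm.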

James Buckey

  • 7 Posts
  • 0 Reply Likes
ZD3000, 130 APs (88 R710 and 42 R700). Brocade ICX 7250 switching.  I have experienced this same issue, but only recently, after switching from various EOL HP switches to Brocade.  My issue is with all Apple devices, iOS and OSX.

ThX

  • 128 Posts
  • 2 Reply Likes
Since you are using ALL infrastructure Brocade products you may hear something from tech support.   I have an open ruckus case for this problem that has now been set to a status of "DEFECT."  Our techie said next support level was "blaming" Apple.
(Edited)

JasonD

  • 40 Posts
  • 3 Reply Likes
Our ticket was also marked as a defect and Ruckus mentioned it's an issue with Apple's code. I can't confirm how true that is though. I'll mention we're not using Brocade.

James Buckey

  • 7 Posts
  • 0 Reply Likes
I can believe that a portion of the blame is with Apple, "Think different, do different". However, this was not an issue before my network overhaul.

Previously, I had a badly designed network. A 10.32.0.1/16 network, VLAN 1, no other config. It was a large subnet under no management, but it worked. Now I have 60 VLANs, each building in a separate OSPF area, each SSID on its own VLAN within each building. I have quadruple-checked all switch configs against DHCP scopes. I fixed a few fat-finger mistakes, but they were mistyped router IPs per VLAN which weren't allowing DHCP requests to reach the server. Unrelated.

Since only Apple devices are affected, the problem is obviously Apple in a Ruckus environment, undeniably. That said, I am not necessarily convinced the root cause is a non-standard way of Apple devices communicating vs an oddity in Ruckus code that clashes with something proper in Apple's.

ThX

  • 128 Posts
  • 2 Reply Likes
We had one Surface Pro 4 user report similar wifi problems yesterday.  I watched the techie place device in airplane mode and then disable airplane mode and the Surface was then able to access the Internet via the Ruckus APs.

James Buckey

  • 7 Posts
  • 0 Reply Likes
That blows my theory away. I already have bss-minrate 12, and added ofdm-only, even though I have no B clients. Maybe there is a rogue client floating around. I will wait until tomorrow and see what happens before making another change. Too often I make multiple changes and then have no idea what caused a problem or which change fixed it.

ThX

  • 128 Posts
  • 2 Reply Likes

Computer: Lenovo Yoga

OS—Windows 10

similar problems

ThX

  • 128 Posts
  • 2 Reply Likes
Told by a Ruckus tech yesterday that an unnamed customer switched to SmartZone from ZoneDirector and all these problems (as stated in the thread above) were solved.

James Buckey

  • 7 Posts
  • 0 Reply Likes
If so, that would be an acceptable fix; however, I am not willing to re-purchase 200 licenses and support for something that should work as is. I was told by two vendors that Ruckus doesn't convert or recycle ZD licenses into SZ licenses.  Unacceptable: they should be fully interchangeable from a user standpoint, and I don't really care how it's done on the Ruckus side. I otherwise love the Ruckus gear, but having paid $2000+ for a ZD which is only a low-end Core 2 Duo 1U server ($500 maybe) and licenses to boot...no longer.

The SZ may have different firmware, configs, etc, but there should be the same basic functionality on ZD for devices to work. 

Com1 NL - Bas Sanders

  • 32 Posts
  • 9 Reply Likes
As a reply to ThX..

We are doing an event in Dubai at this very moment with about 1700 users at the Dubai Autodrome. We run vSZ to control about 30 APs.

The only problem we faced during this week: A group of Apple devices are not able to connect.. Android devices: No issues!

So a vSZ is NOT a fix by default.
(Edited)

James Buckey

  • 7 Posts
  • 0 Reply Likes
Good to know. I would have accepted the fix but not the reason. An SZ or ZD should have the same standard configs and abilities under 802.11.

I have had much better luck after enabling ofdm-only. I am in K12 education, and if I have any B clients I'm not doing my job anyway.

Ryan C

  • 2 Posts
  • 0 Reply Likes
We're experiencing this issue on OSX 10.10-10.11.  Have yet to confirm if it's occurring on 10.12 since we have yet to upgrade our Macs to Sierra.  

We have a ZD1200 and (10) R710 APs.  I was just told by a tech support rep to upgrade the controller to 9.13.3.  We're on 9.12.2, so it's probably time to upgrade, but I'm curious whether anyone has seen it help?

JasonD

  • 40 Posts
  • 3 Reply Likes
I don't believe they've fixed the bugs in 9.13.3...at least not from the release notes. My understanding is they are still working with Apple to resolve. Proxy ARP will probably be your best bet until they issue a fix. We were also told to configure background scanning at 20 seconds to address the bar kick out bug in OSX.

In terms of wireless reliability, 10.12.2 has been much more stable.

Bill Fehring

  • 2 Posts
  • 0 Reply Likes
+1.. We are running 9.13.3 and can confirm that the problem isn't fixed there, but proxy arp definitely helped a lot.

Ryan C

  • 2 Posts
  • 0 Reply Likes
Thanks Bill.  So if I understand this right, arp requests from my gateway to client are being disrupted somehow and taking longer than normal and sometimes timing out.  The workaround is to either enable proxy arp or increase the arp cache time to 24 hours (according to Joseph LoBianco).  

ThX

  • 128 Posts
  • 2 Reply Likes
We were told by a Ruckus tech to set background scanning to 1800 secs.  For us, arp cache set to 24 hours did not help.

James Buckey

  • 7 Posts
  • 0 Reply Likes
3600 from a Ruckus tech here.

JasonD

  • 40 Posts
  • 3 Reply Likes
All over the map...another Ruckus engineer recommended 600-900 seconds. Come on Ruckus...

Steven Veron

  • 7 Posts
  • 0 Reply Likes
Verify your channel widths. iOS has issues with 80 MHz channels.

James Buckey

  • 7 Posts
  • 0 Reply Likes
Been using 40 MHz for a while, as 80 doesn't leave enough free channels with our number of APs.

John D, AlphaDog

  • 497 Posts
  • 136 Reply Likes
Please elaborate on said issues :)