Number of Probe Response Retries

  • Idea
  • Updated 3 days ago
  • Under Consideration
There is a problem with the response to broadcast probe requests that contain a wildcard SSID. 

Some client implementations wait only a few milliseconds on a channel after sending a probe request while scanning, so the AP may not be able to get its response out before the client leaves the channel again. If the client leaves before all probe responses have been ACKed, the probe responses can be retried many times, consuming even more airtime.

So the question is: is it possible to set a flag so that no more than one retry is sent for probe responses to broadcast probe requests that contain a wildcard SSID?

Pavel Semenischev

Posted 1 year ago

Michael Brado, Official Rep

No, there is no such flag.

Pavel Semenischev

Hi Michael,
I know there is no such feature; technical support told me so.
I posted this as an idea because it is worth considering as a future feature request.
I captured the over-the-air traffic from our street access points.
63.6% of all traffic is probe response frames, and 54.9% of all traffic is probe response retries to broadcast probe requests that contain a wildcard SSID.
It's horrible, and by and large these frames are meaningless.
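The post does not say whether these percentages are by frame count, bytes or airtime. As a minimal sketch of the frame-count variant, assuming the 2-byte Frame Control field of each captured frame has already been extracted from a monitor-mode capture, the following classifies frames by the standard 802.11 type/subtype bits and the Retry flag. The function names are hypothetical, and the sketch only counts retry-flagged probe responses; attributing each one to a wildcard request would additionally require matching it to the request it answers.

```python
from collections import Counter

# Frame Control, octet 0: bits 0-1 protocol version, bits 2-3 type, bits 4-7 subtype.
# Frame Control, octet 1: flags; the Retry bit is 0x08.
TYPE_MGMT = 0
SUBTYPE_PROBE_REQ = 4
SUBTYPE_PROBE_RESP = 5
FLAG_RETRY = 0x08


def classify(fc: bytes) -> str:
    """Coarse label for one frame, taken from its 2-byte Frame Control field."""
    ftype = (fc[0] >> 2) & 0x03
    subtype = (fc[0] >> 4) & 0x0F
    if ftype == TYPE_MGMT and subtype == SUBTYPE_PROBE_RESP:
        return "probe_resp_retry" if fc[1] & FLAG_RETRY else "probe_resp"
    if ftype == TYPE_MGMT and subtype == SUBTYPE_PROBE_REQ:
        return "probe_req"
    return "other"


def frame_shares(frame_controls: list[bytes]) -> dict[str, float]:
    """Percentage of frames (by count, not airtime) in each class."""
    counts = Counter(classify(fc) for fc in frame_controls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Illustrative input, not real capture data:
    # 0x50 = probe response, 0x40 = probe request, 0x80 = beacon.
    sample = [b"\x50\x00", b"\x50\x08", b"\x50\x08", b"\x40\x00", b"\x80\x00"]
    print(frame_shares(sample))
```

In Wireshark, a similar split can be made with display filters on the `wlan.fc.type_subtype` and `wlan.fc.retry` fields.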

Novell Red

I am capturing the same thing. When the client sends out a wildcard probe request, my Wireshark capture file shows dozens and dozens of probe response retries.

Josef Felber

Hi,
We are observing the same issue, and I think it is a severe performance degradation. In medium-to-dense access point environments we get a rising number of complaints about VoWLAN clients roaming with hidden SSIDs and multi-SSID setups. The issue mostly affects smartphone clients, which seem to be programmed to send broadcast-SSID probe requests when looking for hidden SSIDs. Ruckus access points, which are obviously programmed for beam steering even at this early stage when responding to probe requests (which are MAC broadcasts), then trigger an avalanche of probe responses and retransmissions. Most of these are never received by the probing client, because the avalanche causes heavy interference in an (unavoidable) hidden-station scenario. Many probe responses also come from access points too far away to reach the client with sufficient SNR (phones may be smart, but their antennas are weak). Last but not least, the access point seems to retransmit the response for the first configured SSID many times, then the second, and so on. In the end, not a single probe response for the relevant VoIP SSID may have been transmitted before the overall response timeout expires. So it seems a good idea to make the VoIP SSID the first in the access point's list, but that is not easy to achieve.
We have reports from customers who used the same smartphones for VoIP with Ubiquiti hardware without roaming issues and then ran into trouble after upgrading to Ruckus.
I strongly believe there is a rather simple way to improve the probing behaviour of Ruckus access points dramatically:
1. Reduce the number of retransmissions to only a few (2-3) for probe responses (and other responses to broadcast requests).
2. In multi-SSID setups, send transmissions and retransmissions round-robin over the SSID list.
3. Use a starting CWmin value that depends on the request RSSI, so that responses from distant access points get lower priority than those from close ones.
Kind Regards, Josef
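Proposals 1 and 2 can be illustrated with a small scheduling sketch. It assumes the worst case described above, where no ACK ever arrives, and compares the sequential order that appears in the traces with the proposed round-robin order under a retry cap. `slots` stands in for the response timeout; all names, SSIDs and numbers are hypothetical and do not describe Ruckus internals.

```python
def sequential_order(ssids: list[str], max_retries: int) -> list[tuple[str, int]]:
    """All attempts for one SSID before moving to the next (the order suspected above)."""
    return [(ssid, attempt) for ssid in ssids for attempt in range(1 + max_retries)]


def round_robin_order(ssids: list[str], max_retries: int) -> list[tuple[str, int]]:
    """One attempt per SSID per round, so every SSID gets a first transmission early."""
    return [(ssid, attempt) for attempt in range(1 + max_retries) for ssid in ssids]


def ssids_answered_within(order: list[tuple[str, int]], slots: int) -> set[str]:
    """SSIDs that got at least one transmission in the first `slots` attempts."""
    return {ssid for ssid, _ in order[:slots]}


if __name__ == "__main__":
    ssids = ["corp", "guest", "voip"]  # hypothetical SSID list, VoIP last
    seq = sequential_order(ssids, max_retries=7)
    rr = round_robin_order(ssids, max_retries=2)
    print(len(seq), ssids_answered_within(seq, 8))  # 24 attempts, only "corp" in the first 8
    print(len(rr), ssids_answered_within(rr, 3))    # 9 attempts, all SSIDs in the first 3
```

With the sequential order, the VoIP SSID is not reached until every attempt for the earlier SSIDs has been spent; round-robin with a small cap reaches every SSID in the first round.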

Jeronimo

I would like a function that makes the AP respond only to requests above a certain signal level (an RSSI threshold), even if the other proposals are not implemented.

For example, the Local Probe Request Threshold (dB) on Aruba.
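A minimal sketch of such a threshold check. The threshold is treated here as a generic signal value in dB, and `None` means the check is disabled; whether a given vendor compares against SNR or RSSI is not stated in this thread, and the function and parameter names are hypothetical, not Aruba or Ruckus configuration. It also assumes, as described above, that each answered wildcard request triggers one response per configured SSID.

```python
from typing import Optional


def should_respond(request_signal_db: float, threshold_db: Optional[float]) -> bool:
    """Answer a probe request only if it was received at or above the threshold.

    threshold_db=None disables the check, i.e. every request is answered.
    """
    if threshold_db is None:
        return True
    return request_signal_db >= threshold_db


def responses_to_send(request_signals_db: list[float], threshold_db: Optional[float],
                      ssid_count: int) -> int:
    """Probe responses triggered by a batch of wildcard requests (one per SSID each)."""
    answered = sum(1 for s in request_signals_db if should_respond(s, threshold_db))
    return answered * ssid_count


if __name__ == "__main__":
    heard = [30.0, 22.0, 12.0, 8.0]  # illustrative request signal levels in dB
    print(responses_to_send(heard, threshold_db=None, ssid_count=4))  # no threshold
    print(responses_to_send(heard, threshold_db=15.0, ssid_count=4))  # weak requests ignored
```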

Neil Mac, Employee

I'd be interested to hear more about what you are seeing here. Normally a probe request to a wildcard SSID is a broadcast, and the probe response does not get an ACK - so there should be no retries.

Beamflex only applies to data frames.

A directed probe request sent to an AP will receive an ACK and then a directed probe response, which will also receive an ACK. But you're saying that the clients do not send the ACK because they're off doing something else, so the AP has to retry the probe response?

I'd like to see evidence of what you're stating above - do you have a frame capture to show this?
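When reading captures, it may help to recall how 802.11 decides whether a frame is acknowledged: it depends on the frame's receiver address (Address 1), not on how the triggering request was addressed. Group-addressed frames (the I/G bit, the least significant bit of the first octet, is set) are not ACKed; individually addressed frames are. A probe response is addressed to the station that sent the probe request, so it is individually addressed even when the request went to the broadcast address. The helpers below are a minimal sketch of that rule; their names and the example MAC address are hypothetical.

```python
def is_group_address(mac: str) -> bool:
    """True for broadcast/multicast: the I/G bit (LSB of the first octet) is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x01)


def expects_ack(receiver_address: str) -> bool:
    """Individually addressed 802.11 frames are acknowledged; group-addressed ones are not."""
    return not is_group_address(receiver_address)


if __name__ == "__main__":
    client = "02:1a:2b:3c:4d:5e"  # hypothetical randomized (locally administered) client MAC
    print(expects_ack("ff:ff:ff:ff:ff:ff"))  # probe request to broadcast: False
    print(expects_ack(client))               # probe response back to the client: True
```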

Josef Felber

Hi Neil,
thanks for looking into this.
To be clear: it is not the clients' requests that are retransmitted (at least not at L2), only the access points' responses to these requests.
If you have access to support cases, you can find relevant packet traces in case ID 00527734; see e.g. "probe responses after ...pcap". There is also quite a lot of discussion there. (The case also covered a second issue, 5.5 Mbit/s beacons on an OFDM-only SSID, but that has been split off.) The case occurred with older ZD firmware, but more recent traces with the newest firmware show the same behavior.
>>> Beamflex only applies to data frames.
I can only assume that the unusually high number of retransmissions is due to Beamflex. Even if that is not the case, the retransmit limit seems to be the same as, or similar to, the one for data frames.
>>> but you're saying that the clients do not send the ACK
I can only trace packets at the access point, not (in parallel) at the client. So I have to deduce that the client does not receive the probe responses correctly because of heavy interference from hidden stations (whose probe response frames do not show up in the trace for that very reason).

There is also some indication that the APs use a rather small contention window (CW) for probe responses and do not enlarge it when retransmitting. If this is actually true (the timing in the traces is not very precise), I assume it is caused by the high priority of management frames. But since probe responses answer broadcast requests, this small CW may degrade the effectiveness of the probe mechanism.

The order of probe responses for different SSIDs in a multi-SSID setup also plays a role, and it does not seem to be optimized. Ideally, all SSIDs should get a probe response one after another (or perhaps high-priority SSIDs first), and only then should retransmissions start on all SSIDs. Actually, it seems that the response for the first SSID in the list is retransmitted many times before the next SSID follows, although there seems to be no clear rule. Looking at the traces, there seems to be a timeout on probe responses to a given request, which makes sense. However, because of the obvious interference and the order in which SSIDs are answered, some SSIDs may never get a response within the timeout period. Judging by the traces, the probe response timeout is around 50 ms. That means active probing is not much faster than simply listening to beacons (which of course does not work with hidden SSIDs), and also that the traced situation causes 50 ms of full contention on one of the scarce 2.4 GHz channels across a rather large area (the high RX sensitivity of Ruckus APs can be a drawback here...).

So for me it seems to make sense to:
- reduce the number of retransmissions of probe responses
- make the contention window (CW) more flexible to resolve contention situations
- start CWmin for probe responses at a higher value for requests with low RSSI
- optimize the order of probe responses in multi-SSID setups
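The second and third points (a CW that grows on retransmission, and an RSSI-dependent CWmin) can be sketched as follows. The doubling rule is the usual 802.11 binary exponential backoff (CW goes 15, 31, 63, ... up to CWmax); the RSSI breakpoints and CWmin values are purely hypothetical, chosen only to show the idea that weaker requests back off longer.

```python
import random


def initial_cw(request_rssi_dbm: float) -> int:
    """Hypothetical mapping: weaker probe requests start from a larger CWmin."""
    if request_rssi_dbm >= -65:
        return 15
    if request_rssi_dbm >= -75:
        return 31
    return 63


def cw_per_attempt(cw_min: int, cw_max: int, attempts: int) -> list[int]:
    """Contention window for each attempt under binary exponential backoff."""
    windows = []
    cw = cw_min
    for _ in range(attempts):
        windows.append(cw)
        cw = min(2 * cw + 1, cw_max)  # 2^n - 1 sequence: 15 -> 31 -> 63 ...
    return windows


def backoff_slots(cw: int, rng: random.Random) -> int:
    """Random backoff, drawn uniformly from 0..CW slots."""
    return rng.randint(0, cw)


if __name__ == "__main__":
    rng = random.Random(1)
    for rssi in (-60, -72, -85):
        windows = cw_per_attempt(initial_cw(rssi), cw_max=1023, attempts=4)
        print(rssi, windows, [backoff_slots(cw, rng) for cw in windows])
```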

This could considerably improve roaming behavior, especially for VoIP clients on smartphones. We have indications that most smartphones use broadcast probe requests (blank SSID) to search for hidden SSIDs, even though the devices are configured for these SSIDs (name and WPA). So a probe request from such a device triggers responses on all configured SSIDs. Dedicated VoIP phones typically probe only the configured SSID, so their behavior is much less problematic.
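The scale of the problem can be estimated with simple arithmetic: in the worst case, where the client never ACKs, a single wildcard probe triggers APs × SSIDs × (1 + retry limit) frames. The sketch below assumes probe responses at 1 Mbit/s DSSS with the long preamble (192 µs) and a hypothetical 300-byte frame, and ignores inter-frame spaces, backoff and collisions, so real airtime would be higher. The AP, SSID and retry counts are illustrative, not measured values.

```python
LONG_PREAMBLE_US = 192  # DSSS long PLCP preamble + header; the only option at 1 Mbit/s


def frame_airtime_us(frame_bytes: int, rate_mbps: float = 1.0) -> float:
    """Time on air for one DSSS frame, excluding SIFS/DIFS, backoff and the ACK."""
    return LONG_PREAMBLE_US + frame_bytes * 8 / rate_mbps


def worst_case_frames(aps: int, ssids: int, retry_limit: int) -> int:
    """Every AP answers for every SSID and no ACK ever arrives."""
    return aps * ssids * (1 + retry_limit)


if __name__ == "__main__":
    per_frame = frame_airtime_us(300)  # 2592 us for a hypothetical 300-byte response
    for retry_limit in (7, 1):
        frames = worst_case_frames(aps=3, ssids=4, retry_limit=retry_limit)
        print(retry_limit, frames, frames * per_frame / 1000, "ms")
```

Under these assumptions, 96 frames (retry limit 7) take roughly 249 ms of airtime, several times the ~50 ms response timeout estimated above, while a retry limit of 1 brings this down to about 62 ms.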

One should also not forget that active probing combined with MAC spoofing is an entry point for DoS attacks. With the AP behavior described above, that door becomes a gateway.

Kind Regards, Josef

Neil Mac, Employee

Hi Josef - that's a lot for me to digest!

Let me have a think and get back to you. 

Josef Felber

... thanks again - and sorry for the typos.

Neil Mac, Employee

Josef - I'm part of the training team; I only come to the forums to offer informal advice. I've read the support case, and I can see that the support team looked into this quite deeply. As a Ruckus employee, I can't really add anything more: the support team has access to the engineering teams, and I know they would have looked into this fully for you.

Prior to joining Ruckus, I did professional services and taught AirMagnet and CWAP classes. If I take my Ruckus hat off and look at this from a purely technical perspective, I would want to look at the issue you're actually reporting before coming to any conclusions. That would involve capturing with a dedicated capture tool rather than from the access point, including next to the client stations, and gathering proper metrics on retry rates and data rates. Knowing 802.11 as I do, I do not believe that altering CW values will have an effect, though we would need to get into a long technical debate to see why. When delivering Ruckus training, I usually advise leaving settings at their defaults unless you a) fully understand the implications of the change you make and b) are able to thoroughly test the results of any changes.

I don't feel I can add anything here, as support has gone through this with you, and since I'm not part of the support team, anything I add would be guesswork without the opportunity to test properly, which unfortunately I'm not in a position to offer.

I'm sorry I can't be more useful in this case.

Josef Felber

>>> I'm sorry I can't be more useful in this case.

But then who can?

The outcome of the support case I have referenced more or less was:
"...We transmit management frames with highest priority as they are essential for maintaining/connecting clients. On the other hand, we may reduce the management frames priority after certain retransmissions (which will increase the complexity of our design) or reduce the retransmissions of management frames and we have had this discussion internally as well. This will be a major design change and so is on Engineering’s to-do list. But I don’t have a timeline on it and am not sure when we would start working on it as we will have to first spend some resources on the gain of this change overall." (Ruckus Support Engineer)

This was about one year ago. Our customers still report issues, we still see strange traces, and obviously other groups observe similar or even more dramatic problems.

To bring it to a point: what should I respond next time a customer complains about issues using certain VoIP clients with Ruckus equipment that ran fine with (much cheaper) access points from a different brand?

Josef

Michael Brado, Official Rep

Thanks for sharing your detailed analysis, Josef. Most of the 802.11 standards are set, and I understand the support engineer, who was repeating what product marketing must have told them.

What I find most interesting in your observations is the SSID order: the first SSID gets its probe response sent multiple times, while the deeper WLAN(s) do not. If you can show this to our engineers with a trace, I'd ask your support engineer to file a bug. That will have a more immediate effect than a feature request.

Meanwhile, it might be helpful if your VoIP WLAN was the first SSID.

Josef Felber

Hi Michael,

thanks for commenting.
>>> If you can show this to our engineers with a trace
Can you provide an upload link or an email address?

Josef

Michael Brado, Official Rep

Please open a ticket if you don't have one open already:
https://support.ruckuswireless.com/contact-us

Josef Felber

The old case had been closed, so I have opened a new one with new and very recent traces. You are welcome to keep an eye on the progress of this new case (00789474).
By the way, how can I change the order of SSIDs on the access points? Please also see the case: there is something even stranger about the order in which SSIDs appear in the probe responses. There must be some 'hidden' parameter, other than the plain list order, that determines it.
Josef