AP Randomly Restart

  • Question
  • Updated 4 years ago
Hi all Wireless Gurus,

I'd like to share a problem with our WLAN and ask for suggestions that could help us solve it. We are using a ZD3000 with about 60 APs. APs keep restarting at random, seconds apart. Most of our APs are the 7363 model, and the ZD firmware is 9.8.2.0 build 15. It is giving us a headache, because users are raising tickets about slow or intermittent connections. I hope someone can help us.

Regards,

4Jonjon20

Posted 4 years ago


Andrea Coppini, Employee

Check the ZD logs, they will give you an idea of why the AP is rebooting. Share it here in the forum

4Jonjon20

Hi Andrea,

Here is some logs of our ZD:



Jan 11 15:38:35 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-17th [email protected]:01:7c:37:0b:d0] heartbeats lost
Jan 11 15:41:14 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-17th [email protected]:01:7c:37:0b:d0] joins with uptime [63] s and last disconnected reason [AP Restart : application reboot]
Jan 11 15:41:19 AEV-NAC-Ruckus-ZD stamgr: stamgr_update_reboot_info():AP[c4:01:7c:37:0b:d0] reboot detail:The apmgr timer thread stock,and let rsmd reboot the AP   
Jan 11 16:07:15 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-16th [email protected]:0c:90:00:1f:50] heartbeats lost 
Jan 11 16:07:27 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-17th [email protected]:01:7c:37:0d:10] heartbeats lost 
Jan 11 16:07:29 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-12th [email protected]:0c:90:04:73:70] heartbeats lost 
Jan 11 16:09:37 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-12th [email protected]:0c:90:04:73:70] joins with uptime [283369] s and last disconnected reason [Heartbeat Loss] 
Jan 11 16:10:06 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-17th [email protected]:01:7c:37:0d:10] joins with uptime [59] s and last disconnected reason [AP Restart : application reboot] 
Jan 11 16:10:07 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-16th [email protected]:0c:90:00:1f:50] joins with uptime [62] s and last disconnected reason [AP Restart : application reboot] 
Jan 11 16:10:11 AEV-NAC-Ruckus-ZD stamgr: stamgr_update_reboot_info():AP[c4:01:7c:37:0d:10] reboot detail:The apmgr timer thread stock,and let rsmd reboot the AP   
Jan 11 16:10:11 AEV-NAC-Ruckus-ZD stamgr: stamgr_update_reboot_info():AP[8c:0c:90:00:1f:50] reboot detail:The apmgr timer thread stock,and let rsmd reboot the AP   
Jan 12 08:25:45 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[Ruckus 7363 [email protected]:3d:37:3c:89:70] detects excessive probe requests on radio [11b/g]. 
Jan 12 08:27:48 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[Ruckus 7363 [email protected]:3d:37:3c:89:70] detects excessive probe requests on radio [11b/g]. 
Jan 12 08:38:01 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-21th [email protected]:0c:90:04:75:20] heartbeats lost 
Jan 12 08:38:10 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-17th [email protected]:01:7c:37:0b:d0] heartbeats lost 
Jan 12 08:39:58 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-17th [email protected]:01:7c:37:0b:d0] joins with uptime [61189] s and last disconnected reason [Heartbeat Loss] 
Jan 12 08:40:27 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-21th [email protected]:0c:90:04:75:20] joins with uptime [59] s and last disconnected reason [AP Restart : application reboot] 
Jan 12 08:40:32 AEV-NAC-Ruckus-ZD stamgr: stamgr_update_reboot_info():AP[8c:0c:90:04:75:20] reboot detail:The apmgr timer thread stock,and let rsmd reboot the AP   
Jan 12 09:26:28 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-20th [email protected]:93:96:05:32:60] heartbeats lost 
Jan 12 09:27:27 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-16th [email protected]:0c:90:04:03:d0] heartbeats lost 
Jan 12 09:27:46 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-21th [email protected]:0c:90:04:75:20] heartbeats lost 
Jan 12 09:28:47 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[Ruckus 7363 [email protected]:3d:37:3c:89:70] detects excessive probe requests on radio [11b/g]. 
Jan 12 09:30:24 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-16th [email protected]:0c:90:04:03:d0] joins with uptime [62] s and last disconnected reason [AP Restart : watchdog timeout ] 
Jan 12 09:30:35 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-21th [email protected]:0c:90:04:75:20] joins with uptime [60] s and last disconnected reason [AP Restart : watchdog timeout ] 
Jan 12 09:30:57 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[Ruckus 7363 [email protected]:3d:37:3c:89:70] detects excessive probe requests on radio [11b/g]. 
Jan 12 09:46:29 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():Lost contact with AP[NAC-Wifi-20th [email protected]:93:96:05:32:60] 
Jan 12 09:51:47 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-16th [email protected]:0c:90:00:1f:50] heartbeats lost 
Jan 12 09:52:21 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-17th [email protected]:01:7c:37:0d:10] heartbeats lost 
Jan 12 09:52:28 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-20th [email protected]:93:96:1b:8e:00] heartbeats lost 
Jan 12 09:54:04 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-20th [email protected]:93:96:1b:8e:00] joins with uptime [73742] s and last disconnected reason [Change State Response Loss] 
Jan 12 09:54:22 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-17th [email protected]:01:7c:37:0d:10] joins with uptime [52] s and last disconnected reason [AP Restart : application reboot] 
Jan 12 09:54:27 AEV-NAC-Ruckus-ZD stamgr: stamgr_update_reboot_info():AP[c4:01:7c:37:0d:10] reboot detail:The apmgr timer thread stock,and let rsmd reboot the AP   
Jan 12 09:57:01 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-20th [email protected]:93:96:05:32:60] joins with uptime [64] s and last disconnected reason [AP Restart : watchdog timeout ] 
Jan 12 09:57:47 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-16th [email protected]:0c:90:00:1f:50] joins with uptime [64124] s and last disconnected reason [Join Fail] 
Jan 12 10:18:48 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-21th [email protected]:0c:90:04:75:20] heartbeats lost 
Jan 12 10:21:10 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-21th [email protected]:0c:90:04:75:20] joins with uptime [59] s and last disconnected reason [AP Restart : application reboot] 
Jan 12 10:21:15 AEV-NAC-Ruckus-ZD stamgr: stamgr_update_reboot_info():AP[8c:0c:90:04:75:20] reboot detail:The apmgr timer thread stock,and let rsmd reboot the AP   
Jan 12 10:58:54 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-19th [email protected]:01:7c:35:fb:60] heartbeats lost 
Jan 12 11:01:52 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-19th [email protected]:01:7c:35:fb:60] joins with uptime [60] s and last disconnected reason [AP Restart : application reboot] 
Jan 12 11:01:56 AEV-NAC-Ruckus-ZD stamgr: stamgr_update_reboot_info():AP[c4:01:7c:35:fb:60] reboot detail:The apmgr timer thread stock,and let rsmd reboot the AP   
Jan 12 11:14:23 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-12th [email protected]:0c:90:04:73:70] heartbeats lost 
Jan 12 11:14:23 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-21th [email protected]:0c:90:04:75:20] heartbeats lost 
Jan 12 11:15:40 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-21th [email protected]:0c:90:04:75:20] joins with uptime [3328] s and last disconnected reason [Heartbeat Loss] 
Jan 12 11:15:41 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-12th [email protected]:0c:90:04:73:70] joins with uptime [352135] s and last disconnected reason [Heartbeat Loss] 
Jan 12 12:25:29 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-21th [email protected]:0c:90:04:75:20] joins with uptime [59] s and last disconnected reason [AP Restart : application reboot] 
Jan 12 12:25:33 AEV-NAC-Ruckus-ZD stamgr: stamgr_update_reboot_info():AP[8c:0c:90:04:75:20] reboot detail:apmgr, Receives reset command from ZD in Run state   
Jan 12 13:14:22 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-9th [email protected]:0c:90:03:ab:50] heartbeats lost 
Jan 12 13:14:42 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-21th [email protected]:0c:90:04:75:20] heartbeats lost 
Jan 12 13:16:33 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-21th [email protected]:0c:90:04:75:20] joins with uptime [3123] s and last disconnected reason [Heartbeat Loss] 
Jan 12 13:16:42 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-9th [email protected]:0c:90:03:ab:50] joins with uptime [99425] s and last disconnected reason [Heartbeat Loss] 
Jan 12 13:52:10 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-21th [email protected]:0c:90:04:75:20] heartbeats lost 
Jan 12 14:01:05 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-21th [email protected]:0c:90:04:75:20] joins with uptime [239] s and last disconnected reason [AP Restart : power cycle] 
Jan 12 14:12:18 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-16th [email protected]:0c:90:00:1f:50] heartbeats lost 
Jan 12 14:14:34 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-16th [email protected]:0c:90:00:1f:50] joins with uptime [54] s and last disconnected reason [AP Restart : application reboot] 
Jan 12 14:14:39 AEV-NAC-Ruckus-ZD stamgr: stamgr_update_reboot_info():AP[8c:0c:90:00:1f:50] reboot detail:The apmgr timer thread stock,and let rsmd reboot the AP   
Jan 12 14:38:39 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-9th [email protected]:0c:90:03:ab:50] heartbeats lost 
Jan 12 14:38:55 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-21th [email protected]:0c:90:04:75:20] heartbeats lost 
Jan 12 14:39:23 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-17th [email protected]:01:7c:37:0d:10] heartbeats lost 
Jan 12 14:39:28 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-15th [email protected]:0c:90:04:72:a0] heartbeats lost 
Jan 12 14:39:49 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-14th [email protected]:01:7c:37:0d:00] heartbeats lost 
Jan 12 14:41:14 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-17th [email protected]:01:7c:37:0d:10] joins with uptime [17261] s and last disconnected reason [Heartbeat Loss] 
Jan 12 14:41:14 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-15th [email protected]:0c:90:04:72:a0] joins with uptime [4068024] s and last disconnected reason [Heartbeat Loss] 
Jan 12 14:41:41 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-9th [email protected]:0c:90:03:ab:50] joins with uptime [57] s and last disconnected reason [AP Restart : application reboot] 
Jan 12 14:41:44 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-21th [email protected]:0c:90:04:75:20] joins with uptime [59] s and last disconnected reason [AP Restart : application reboot] 
Jan 12 14:41:44 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-14th [email protected]:01:7c:37:0d:00] joins with uptime [17169643] s and last disconnected reason [Heartbeat Loss] 
Jan 12 14:41:45 AEV-NAC-Ruckus-ZD stamgr: stamgr_update_reboot_info():AP[8c:0c:90:03:ab:50] reboot detail:The apmgr timer thread stock,and let rsmd reboot the AP   
Jan 12 14:41:48 AEV-NAC-Ruckus-ZD stamgr: stamgr_update_reboot_info():AP[8c:0c:90:04:75:20] reboot detail:The apmgr timer thread stock,and let rsmd reboot the AP   
Jan 12 15:05:37 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-17th [email protected]:01:7c:37:0c:70] heartbeats lost 
Jan 12 15:05:38 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-16th [email protected]:0c:90:00:1f:50] heartbeats lost 
Jan 12 15:05:38 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-20th [email protected]:93:96:1b:8e:00] heartbeats lost 
Jan 12 15:05:55 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-21th [email protected]:93:96:05:2c:e0] heartbeats lost 
Jan 12 15:07:58 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-21th [email protected]:93:96:05:2c:e0] joins with uptime [87976] s and last disconnected reason [Heartbeat Loss] 
Jan 12 15:08:24 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-16th [email protected]:0c:90:00:1f:50] joins with uptime [51] s and last disconnected reason [AP Restart : application reboot] 
Jan 12 15:08:28 AEV-NAC-Ruckus-ZD stamgr: stamgr_update_reboot_info():AP[8c:0c:90:00:1f:50] reboot detail:The apmgr timer thread stock,and let rsmd reboot the AP   
Jan 12 15:08:30 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-17th [email protected]:01:7c:37:0c:70] joins with uptime [51] s and last disconnected reason [AP Restart : application reboot] 
Jan 12 15:08:31 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():AP[NAC-Wifi-20th [email protected]:93:96:1b:8e:00] joins with uptime [60] s and last disconnected reason [AP Restart : application reboot] 
Jan 12 15:08:34 AEV-NAC-Ruckus-ZD stamgr: stamgr_update_reboot_info():AP[c4:01:7c:37:0c:70] reboot detail:The apmgr timer thread stock,and let rsmd reboot the AP   
Jan 12 15:08:35 AEV-NAC-Ruckus-ZD stamgr: stamgr_update_reboot_info():AP[58:93:96:1b:8e:00] reboot detail:The apmgr timer thread stock,and let rsmd reboot the AP   

Andrea Coppini, Employee

APs are rebooting due to a 'heartbeat loss', which means they lost contact with the ZD. Check your cabling. I see most of the affected APs are on upper floors; could your cables be too long (over 80m)? What about your uplinks? Do you have any kind of broadcast storm control enabled on your switches? Finally, replace the patch lead going into the ZD.

I also see some alerts related to 'excessive probe requests'. This would cause the AP to temporarily blacklist that client as a form of protection. If the issue persists, either seek-and-destroy the client or turn off protection from the WIPS section of the ZD.

4Jonjon20

We have a switch on every second floor, and the APs connect to those. We have not configured storm control on our switches. I also tried connecting an AP to another switch, assuming PoE was the problem, but it was not. At the moment I am still seeing APs restart.

Andrea Coppini, Employee

Check the ZD side, check the switch logs, look at the interface error counters for hints. Try replacing the ZD's patch lead

Sean

Increase the Auto recovery to, say, 30 minutes or so.

That should solve your issue, as the heartbeat packet is not prioritized, and in a loaded network the lower value can have an effect on APs rebooting.

This topic has been asked about before, albeit based on another AP model:

https://forums.ruckuswireless.com/ruckuswireless/topics/occasional-heartbeat-losses-on-r710

Good Luck

Andrea Coppini, Employee

Try to identify the cause of the heartbeat losses first. It is usually cabling related and rarely due to load.

Changing the reboot timeout is only covering the symptoms, not fixing the root cause. The APs might not reboot, but the clients will have trouble connecting since every connection and roam needs to go through the ZD. To avoid relying on the ZD you would need to run the SSID in Autonomous mode, but again this would simply be brushing the symptoms under the carpet.

Sean

I have seen this loads of times, and setting the value higher than the default sorts the issue.

Additionally, when the network is quiet I never see APs rebooting; it's always under load.

Andrea Coppini, Employee

Then you must have network saturation somewhere between your AP and ZD. There are loads of deployments with hundreds of APs and no issues with heartbeat loss or AP reboots. The most common network issue I see is duplex mismatch in the uplinks causing a lot of collisions.

Sean

Re your comment:
"Then you must have network saturation somewhere between your AP and ZD"
Correct, this is what I am saying.

When it's busy the heartbeat frame gets dropped, so it's best to set it to another value.

I have over 20,000 Ruckus APs in my network, varying from ZD to SCG deployments, and I see it all the time.

4Jonjon20

Yeah, you're right. When only a small number of users are connected to the WLAN, the APs do not experience this problem, but during regular working hours the problem appears. If network saturation is the root cause of this problem, what solutions do you recommend? Thank you all for trying to help me.
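
To check whether the heartbeat losses really cluster in working hours, the "heartbeats lost" events from the ZD syslog can be bucketed by hour. This is a rough Python sketch, assuming the timestamp format of the log posted earlier in this thread. The function name `heartbeat_loss_by_hour` is a made-up helper for illustration.

```python
import re
from collections import Counter

# Leading syslog timestamp, e.g. "Jan 12 09:26:28", on a "heartbeats lost" event.
LOSS_RE = re.compile(
    r"^(?P<mon>\w{3}) +(?P<day>\d{1,2}) (?P<hour>\d{2}):\d{2}:\d{2} .*heartbeats lost"
)


def heartbeat_loss_by_hour(lines):
    """Count 'heartbeats lost' events per (month, day, hour) bucket."""
    counts = Counter()
    for line in lines:
        m = LOSS_RE.match(line)
        if m:
            key = (m.group("mon"), int(m.group("day")), int(m.group("hour")))
            counts[key] += 1
    return counts


# Four lines copied from the log posted earlier in this thread.
SAMPLE = [
    "Jan 12 09:26:28 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():"
    "AP[NAC-Wifi-20th [email protected]:93:96:05:32:60] heartbeats lost ",
    "Jan 12 09:27:27 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():"
    "AP[NAC-Wifi-16th [email protected]:0c:90:04:03:d0] heartbeats lost ",
    "Jan 12 09:27:46 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():"
    "AP[NAC-Wifi-21th [email protected]:0c:90:04:75:20] heartbeats lost ",
    "Jan 12 10:18:48 AEV-NAC-Ruckus-ZD syslog: eventd_to_syslog():"
    "AP[NAC-Wifi-21th [email protected]:0c:90:04:75:20] heartbeats lost ",
]

if __name__ == "__main__":
    for (mon, day, hour), n in sorted(heartbeat_loss_by_hour(SAMPLE).items()):
        print(f"{mon} {day:02d} {hour:02d}:00  {n} heartbeat loss event(s)")
```

Run over a full day of syslog, hours with many events would be the ones worth comparing against switch uplink utilisation.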

4Jonjon20

I also forgot to mention that the reboot timeout on our ZD is set to 60 minutes. Thanks.

Andrea Coppini, Employee

An occasional 'heartbeat loss' error is normal, although not desirable, if the AP is very busy. If the AP reboots, it means that it couldn't reach the ZD/SZ for an extended period, which isn't normal.

Start by running Speedflex between the ZD/SZ and the AP. Simply click on the speedo icon next to the AP in the AP list. This will do a throughput test, which should show 0% packet loss and good wired throughput. Then look at your switches: are your switchports all running at 1000/FDX, or are some of them running Half Duplex? Check the error counters on the switches. In a good wired network you should see zero errors.