ICX 7250 stack switches rebooting

  • Updated 5 months ago
I updated my ICX 7250s to 8090 UFI. In most of the stacks, unit 1 stays up, but units 2 and 3 show that they have restarted sometime recently. I found this because my APs reported disconnects while the monitor still thinks the stack is up.

Just a few examples:
STACKID 1  system uptime is 48 day(s) 22 minute(s) 54 second(s)
STACKID 2  system uptime is 2 day(s) 5 hour(s) 22 minute(s) 36 second(s)
The system started at 19:24:38 Pacific Mon Jul 01 2019

STACKID 1  system uptime is 37 day(s) 8 hour(s) 1 minute(s) 28 second(s)
STACKID 2  system uptime is 12 hour(s) 36 minute(s) 56 second(s)
The system started at 11:22:29 Pacific Fri Jul 12 2019

STACKID 1  system uptime is 37 day(s) 4 hour(s) 51 minute(s) 34 second(s)
STACKID 2  system uptime is 6 hour(s) 16 minute(s) 52 second(s)
STACKID 3  system uptime is 6 hour(s) 16 minute(s) 51 second(s)

STACKID 1  system uptime is 37 day(s) 5 hour(s) 5 minute(s) 7 second(s)
STACKID 2  system uptime is 11 day(s) 15 hour(s) 42 minute(s) 16 second(s)
The system started at 13:51:51 Pacific Fri Jul 12 2019
 

Network Administrator

  • 10 Posts
  • 0 Reply Likes

Posted 5 months ago


NETWizz

  • 184 Posts
  • 59 Reply Likes
Good morning. Does the output give the reason for the restart? Also, is there anything in the log (i.e. show logging)?

Something like this?

STACKID 1  system uptime is 52 day(s) 7 hour(s) 37 minute(s) 11 second(s)
STACKID 2  system uptime is 52 day(s) 7 hour(s) 37 minute(s) 22 second(s)
The system started at 23:48:16 Eastern Fri Jun 28 2019

The system : started=warm start   reloaded=by "reload"
My stack unit ID = 1, bootup role = active



Network Administrator

STACKID 1  system uptime is 48 day(s) 22 minute(s) 54 second(s)
STACKID 2  system uptime is 2 day(s) 5 hour(s) 22 minute(s) 36 second(s)
The system started at 19:24:38 Pacific Mon Jul 01 2019

The system : started=warm start   reloaded=by "reload"
My stack unit ID = 1, bootup role = active

STACKID 1  system uptime is 37 day(s) 8 hour(s) 1 minute(s) 28 second(s)
STACKID 2  system uptime is 12 hour(s) 36 minute(s) 56 second(s)
The system started at 11:22:29 Pacific Fri Jul 12 2019

The system : started=cold start
My stack unit ID = 1, bootup role = active

STACKID 1  system uptime is 37 day(s) 4 hour(s) 51 minute(s) 34 second(s)
STACKID 2  system uptime is 6 hour(s) 16 minute(s) 52 second(s)
STACKID 3  system uptime is 6 hour(s) 16 minute(s) 51 second(s)
The system started at 13:58:33 Pacific Fri Jul 12 2019

The system : started=cold start
My stack unit ID = 1, bootup role = active
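
For checking many stacks at once, a comparison like the one above can be scripted. This is only a rough sketch: it assumes the uptime lines look exactly like the ones pasted in this thread, and the names parse_uptimes, restarted_units and the 300 second tolerance are my own choices, not anything from Ruckus.

```python
import re

# Matches lines such as:
#   STACKID 2  system uptime is 2 day(s) 5 hour(s) 22 minute(s) 36 second(s)
UPTIME_LINE = re.compile(r"STACKID\s+(\d+)\s+system uptime is\s+(.*)")
UNIT_FIELD = re.compile(r"(\d+)\s+(day|hour|minute|second)\(s\)")
SECONDS_PER = {"day": 86400, "hour": 3600, "minute": 60, "second": 1}


def parse_uptimes(text):
    """Return {stack_id: uptime_in_seconds} for every STACKID line in text."""
    uptimes = {}
    for line in text.splitlines():
        match = UPTIME_LINE.search(line)
        if not match:
            continue
        stack_id = int(match.group(1))
        seconds = sum(int(count) * SECONDS_PER[unit]
                      for count, unit in UNIT_FIELD.findall(match.group(2)))
        uptimes[stack_id] = seconds
    return uptimes


def restarted_units(uptimes, tolerance=300):
    """List units whose uptime trails the longest-running unit by more than
    `tolerance` seconds (the tolerance is an arbitrary assumption)."""
    if not uptimes:
        return []
    longest = max(uptimes.values())
    return sorted(uid for uid, secs in uptimes.items()
                  if longest - secs > tolerance)


sample = """\
STACKID 1  system uptime is 37 day(s) 4 hour(s) 51 minute(s) 34 second(s)
STACKID 2  system uptime is 6 hour(s) 16 minute(s) 52 second(s)
STACKID 3  system uptime is 6 hour(s) 16 minute(s) 51 second(s)
"""
print(restarted_units(parse_uptimes(sample)))  # prints [2, 3]
```

A stack whose members booted together, like the 52 day example earlier in the thread, comes back as an empty list.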



NETWizz

Any stack that says "cold start" most likely lost power on one of its units.

Also, do you have hitless failover enabled? If not, more stack members are likely to reset when there is an issue.

Did you run "show logging" and "show chassis" to see whether anything is logged or wrong with the power supplies, etc.?

Network Administrator

Hitless failover is enabled.
Nothing is showing up in the logs either.
I am waiting for another failure so I can diagnose the issue.

Hashim Bharoocha, Employee

  • 64 Posts
  • 37 Reply Likes

Hey Network Administrator


Please check outputs of:
dm save 1
dm save 2
dm save 3
dm save 4

If the commands above return no output, please monitor the memory.
Use show memory and check whether the free memory value goes up and down. If free memory only ever decreases, it might be a memory leak.
Also keep a console connection logged to a file so you can see what happens around the time of the crash.
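
To make the "only decreasing" check concrete, here is a minimal sketch. It assumes you note down the free memory value from show memory by hand at regular intervals; the function name looks_like_leak, the minimum sample count and the sample numbers are all made up for illustration.

```python
def looks_like_leak(free_samples, min_samples=5):
    """Return True when free memory never rises across the samples and
    ends lower than it started. Too few samples returns False."""
    if len(free_samples) < min_samples:
        return False
    never_rises = all(later <= earlier
                      for earlier, later in zip(free_samples, free_samples[1:]))
    return never_rises and free_samples[-1] < free_samples[0]


# Hypothetical readings, oldest first (units are whatever show memory reports).
steady_decline = [900, 880, 870, 850, 820]
normal_churn = [900, 880, 910, 850, 820]

print(looks_like_leak(steady_decline))  # prints True
print(looks_like_leak(normal_churn))    # prints False
```

A True result is only a hint that a leak is worth investigating, not proof of one.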

The best option is to open a case with TAC to troubleshoot.

Hope this helps.
Thanks
Hashim



Network Administrator

I will try the console connection idea and see what happens.
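
For the console capture, timestamps on every line make it much easier to match the output against the restart time. A small sketch, assuming the console text can be read line by line from some stream (for example piped in from a terminal program); stamp and log_stream are hypothetical helper names, and nothing here talks to the serial port itself.

```python
import io
from datetime import datetime, timezone


def stamp(line, now=None):
    """Prefix one console line with an ISO-8601 UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    return f"{now.isoformat(timespec='seconds')} {line.rstrip()}"


def log_stream(source, sink):
    """Copy every line from source to sink with a timestamp, flushing after
    each line so the last output before a crash is not lost in a buffer."""
    for line in source:
        sink.write(stamp(line) + "\n")
        sink.flush()


# Demonstration with in-memory streams standing in for the console and the file.
demo_out = io.StringIO()
log_stream(io.StringIO("line one\nline two\n"), demo_out)
print(demo_out.getvalue(), end="")
```

In real use the source could be sys.stdin and the sink a file opened in append mode.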

I have exceeded my 3-year support, so I am down to hardware replacement only :-(


Hashim Bharoocha, Employee

How about going to the latest 8090 code? Right now it is 8090b.

NETWizz

I would also set up a syslog server so the logs don't age out. Let's say your syslog server is 10.1.2.3:

logging host 10.1.2.3
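
If there is no syslog server on hand yet, something as small as the following can catch the messages in the meantime. It is a bare-bones sketch, not a replacement for a real syslog daemon: it assumes the switch sends plain UDP syslog to port 514, and the function names, log file name and buffer size are my own picks. Binding to port 514 normally needs administrator rights.

```python
import socket
from datetime import datetime, timezone


def format_record(sender_ip, payload, now=None):
    """Build one log line: receive time, sender address, raw syslog text."""
    now = now or datetime.now(timezone.utc)
    text = payload.decode("utf-8", errors="replace").strip()
    return f"{now.isoformat(timespec='seconds')} {sender_ip} {text}"


def serve(log_path="switch-syslog.log", bind_ip="0.0.0.0", port=514):
    """Receive UDP syslog datagrams forever and append them to log_path."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            open(log_path, "a", encoding="utf-8") as log_file:
        sock.bind((bind_ip, port))
        while True:
            payload, (sender_ip, _sender_port) = sock.recvfrom(8192)
            log_file.write(format_record(sender_ip, payload) + "\n")
            log_file.flush()  # keep the file current if the host is interrupted
```

Calling serve() on the 10.1.2.3 host starts the listener; it runs until interrupted.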


Cold start usually indicates a power problem. Warm start usually indicates that it was reloaded. I know that is not hugely helpful. I would recommend their Watchdog End-User support contract. Ruckus's Technical Assistance Center (TAC) is very good.

The upgrade command I use on newer platforms like this is:

copy tftp system-manifest 10.1.2.3 FI08090b_Manifest.txt primary switch-image
copy tftp system-manifest 10.1.2.3 FI08090b_Manifest.txt secondary router-image

It figures out which files to copy including the matching bootrom, etc.
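
With a lot of stacks to upgrade, the command lines can be generated rather than typed. A tiny sketch that only rebuilds the two forms shown above; manifest_copy_command is a made-up helper name, and it deliberately accepts only the partition and image keywords that appear in this thread.

```python
def manifest_copy_command(tftp_server, manifest, partition, image):
    """Build a manifest copy command in the same form as the examples above."""
    if partition not in ("primary", "secondary"):
        raise ValueError(f"unexpected partition: {partition}")
    if image not in ("switch-image", "router-image"):
        raise ValueError(f"unexpected image type: {image}")
    return f"copy tftp system-manifest {tftp_server} {manifest} {partition} {image}"


print(manifest_copy_command("10.1.2.3", "FI08090b_Manifest.txt",
                            "primary", "switch-image"))
```

The printed line matches the first command above and can be pasted into each stack's CLI session.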

Network Administrator

It is already on 8090b, and I am testing a few on 8091.