SSH Stops after 2 weeks

  • Question
  • Updated 5 days ago
  • Acknowledged
We have multiple ICX 7750 and 7450 switches in our DCs. At random, SSH stops working after approximately 2 to 4 weeks. To fix it we have to go to the DC, console onto the device, and run the crypto key zeroize and then crypto key generate CLI commands. This is disruptive and time consuming. Any ideas why it is doing this? The code version is 8.0.40, and ip ssh idle-time has also been set to 10 minutes, but the issue still occurs.

Many thanks Ben

Ben Middleton


Posted 6 days ago


William Hadley, Employee

Do you see the keys in the output of "show ip ssh config"? It would be the line labeled "host key:".

Ben Middleton

We have fixed the current switch failure, so we cannot check at the moment. Obviously, with SSH working now, we see the keys. If/when it recurs I will check this before fixing it and also grab a show tech.

Appreciate the help so far, I'll keep you posted

Many thanks
Ben

NETWizz

It is probably a bug. We have NOT encountered this, though years ago there was a bug where we could not connect to a switch that hadn't been connected to in a long time. When you opened your SSH session it would hang for a while and then fail, but a subsequent attempt worked fine because the process had started up on the switch.

I do not remember what version that was; presumably some ancient 08.0.30 code from years ago. Honestly, everything has been rock solid for us on all our 64xx, 6610, 7150s and 7450s.

We are running 08.0.80ca on everything that supports it, and it has been bug-free for us, as has 08.0.30sa on everything that doesn't support 08.0.80ca.

Here is how we do our SSH configuration. I am not saying you are doing yours wrong; collectively, these are the configuration options we use that have something to do with SSH:


crypto key zeroize rsa
crypto key zeroize dsa
crypto key generate rsa mod 2048


Then we make an access list of where device management can come from. Edit to suit your taste:

ip access-list standard 99
 permit host 10.1.2.3
 permit 10.1.0.0 0.0.255.255
!
exit

Without RADIUS we use this block:

aaa authentication web-server default local
aaa authentication enable default local
aaa authentication login default local
aaa authentication login privilege-mode


enable aaa console

console timeout 30


no telnet server

no web-management http
web-management https
!
ssh access-group 99
web access-group 99
!

ip ssh  authentication-retries 2
ip ssh  timeout 30
ip ssh  idle-time 30
ip ssh  scp disable
ip ssh  encryption disable-aes-cbc


I would probably try a code update and consider adding or changing some of your configuration options to match some of the above. There is certainly quite a bit more than SSH going on here, but I included the related material.

Thank you


Ben Middleton

Thanks for your prompt reply.

We have most of this configured. For a bit more detail: SSH works fine for a period of time, then just stops, so we cannot use SSH to connect. It is time consuming to attend a remote DC and connect on the console to fix it. We suspected that a security monitoring system we have, which runs multiple connections to each switch, might be causing the problem, so we dropped the ip ssh idle-time to 10 minutes in the hope that we would not exhaust the number of SSH sessions on any one device.
The really confusing part is that another network we have, with the same setup, the same security monitoring tool, ICXs used for its OoB network, and the same version of code, never has the issue.

William Hadley, Employee

A few things: When did this start happening? How long have the switches been up? Does the monitoring tool close the SSH sessions properly? There is a limit of 5 active sessions unless you modify system parameters. Does it still happen after a reload of the switch?
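
On the question of whether the monitoring tool closes its SSH sessions properly, here is a minimal Python sketch of the client-side pattern. It assumes a hypothetical `FakeSession` class standing in for whatever SSH library the tool really uses; `switch_session` and `MAX_SESSIONS` are illustrative names too. The cap of 5 is the limit mentioned in this reply. This is not how any particular monitoring product works, only a sketch of "never hold more than the limit, and always close".

```python
import threading
from contextlib import contextmanager

MAX_SESSIONS = 5  # limit quoted in the thread; an assumption if system parameters were changed


class FakeSession:
    """Hypothetical stand-in for a real SSH client session."""

    open_count = 0  # sessions currently open against the "switch"
    _lock = threading.Lock()

    def __init__(self):
        with FakeSession._lock:
            FakeSession.open_count += 1
        self.closed = False

    def run(self, command):
        if self.closed:
            raise RuntimeError("session is closed")
        return f"ran: {command}"

    def close(self):
        # Closing twice is harmless; the counter is only decremented once.
        if not self.closed:
            self.closed = True
            with FakeSession._lock:
                FakeSession.open_count -= 1


_slots = threading.BoundedSemaphore(MAX_SESSIONS)


@contextmanager
def switch_session():
    """Open a session only if a slot is free, and always close it afterwards."""
    if not _slots.acquire(timeout=1):
        raise RuntimeError("all session slots are in use")
    session = None
    try:
        session = FakeSession()
        yield session
    finally:
        if session is not None:
            session.close()  # runs even if the command raised
        _slots.release()


if __name__ == "__main__":
    with switch_session() as s:
        print(s.run("show who"))
```

The point of the `finally` block is that a failed command or a crash in the polling code still releases the session, rather than leaving it for the switch's idle timer to clean up.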

NETWizz

Okay, so the strange thing is that we actually have SolarWinds Orion sending SSH requests often, doing things like backing up configurations, yet we have not experienced this problem.

I suspect that setting both the ssh idle-time and timeout should fix this, though almost certainly there is already a default value, and you have probably already set both of these, given that you have been working on it. Most likely this is a bug in your specific code version.

Do all of your switches run the same code version? Do all of your switches have the same problem? Either way, is there a correlation between which ones exhibit the problem and their version of the FastIron/IronWare codebase?
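
One rough way to reason about idle-time against the session limit: if a tool opens a new session at a fixed interval and never closes it, each session lingers until the idle timer expires. The Python sketch below is back-of-the-envelope arithmetic only. It assumes leaked sessions are reclaimed solely by the idle timer, and the polling intervals are hypothetical, since the thread does not say how often the monitoring tool connects. The limit of 5 is the figure quoted earlier in the thread.

```python
import math


def lingering_sessions(poll_interval_min, idle_time_min):
    """Estimate how many leaked sessions exist at once when each one
    is only cleaned up after idle_time_min minutes."""
    return math.ceil(idle_time_min / poll_interval_min)


def exceeds_limit(poll_interval_min, idle_time_min, max_sessions=5):
    """True if the estimated number of lingering sessions is above the limit."""
    return lingering_sessions(poll_interval_min, idle_time_min) > max_sessions


if __name__ == "__main__":
    # Hypothetical polling intervals, in minutes, against the idle-time values from the thread.
    for poll, idle in [(1, 10), (5, 10), (5, 30)]:
        print(poll, idle, lingering_sessions(poll, idle), exceeds_limit(poll, idle))
```

Under these assumptions, a leaky client polling every minute would still exceed 5 sessions with a 10-minute idle-time, while one polling every 5 minutes would not. That only matters if the tool really does leave sessions open, which is what 'show who' on an affected switch would confirm.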

Jijo Panangat, Employee

You can run 'show who' to see the number of active SSH sessions on the switch while it is in the problem state, and then use 'kill ssh <ssh session number>' to forcefully end an existing SSH session and make room for a new connection.