Users unable to get an IP address after several months of running normally!

My ZoneDirector (ZD) is on 9.9.0.0 build 216 and I can't make any changes right now.
I have several old APs that don't allow me to upgrade.

I've been running this very same setup for the last 3 years with no problems at all.

Yesterday, some users complained that they were unable to get IP addresses.

My DHCP server is Windows Server 2012.
My Ruckus environment has 12 APs, several SSIDs and several VLANs.
My network uses HP/3Com switches (models 2600, 5550, etc.).


After investigating:

The problem only affects a single SSID and is related to a single VLAN.
In the DHCP server logs, I can see that SOME clients are still able to get IPs, but some users (randomly) can't. After some network tracing, I can see DHCP requests arriving at the DHCP server, and every request that reaches it is served with no problem.

But for the clients with problems, the DHCP request never arrives at the server's interface: for some reason, the traffic is being blocked at some point and the DHCP packets don't reach the DHCP server. Again, this only happens to some clients; others can connect with no problem.

Yesterday, after several months of uptime, I rebooted my ZD and everything went back to normal. OK, maybe it was a "glitch".

Today, the problem is back! After some tests I did the following: I changed the VLAN ID of the SSID and clients were able to get IP addresses again (each VLAN has its own IP range), so in the new range things were normal. And after reconfiguring it back as before, the problem was gone. First a reboot, now a small configuration change, solved the problem. Weird!

A few minutes ago the problem came back again! I haven't rebooted yet, but now, no matter what change I make, the problem persists: some clients can lease IPs and some can't (randomly). If the packets arrive at the DHCP server they're leased with no problem, but some packets simply don't arrive at the DHCP server interface to be processed.

So, what's next? What should I do?
flavio borup
Posted 8 months ago

Nick Zourdos
Are your DHCP scopes filling up? How many clients are connected to the network, and what is the subnet mask of each of these VLANs?

Another possibility is that you have a rogue DHCP server running on your network. Try running a tool like this one to see how many DHCP servers are being detected, or alternatively you could run a packet capture using Wireshark to see how many servers are responding to your client's DHCP request. 
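Counting how many servers answer can also be scripted. Below is a minimal sketch, assuming you already have the UDP payloads of the captured DHCP packets (for example, exported from Wireshark). It reads option 53 (message type) and option 54 (server identifier) as laid out in RFC 2131/RFC 2132. The function names are hypothetical.

```python
from __future__ import annotations

import ipaddress

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"
BOOTP_FIXED_LEN = 236  # fixed BOOTP header (op .. file) before the magic cookie


def parse_options(payload: bytes) -> dict[int, bytes]:
    """Return {option_code: value} from a DHCP message (UDP payload)."""
    if payload[BOOTP_FIXED_LEN:BOOTP_FIXED_LEN + 4] != DHCP_MAGIC_COOKIE:
        raise ValueError("not a DHCP message (magic cookie missing)")
    options: dict[int, bytes] = {}
    i = BOOTP_FIXED_LEN + 4
    while i < len(payload):
        code = payload[i]
        if code == 0:  # Pad: a single byte with no length field
            i += 1
            continue
        if code == 255:  # End
            break
        length = payload[i + 1]
        options[code] = payload[i + 2:i + 2 + length]
        i += 2 + length
    return options


def offering_servers(payloads) -> set[str]:
    """Collect the Server Identifier (option 54) of every DHCPOFFER (type 2)."""
    servers = set()
    for payload in payloads:
        opts = parse_options(payload)
        if opts.get(53) == b"\x02" and 54 in opts:
            servers.add(str(ipaddress.IPv4Address(opts[54])))
    return servers
```

If the returned set contains more than one address, a second server is answering on that VLAN.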
flavio borup
No, there are plenty of addresses: more than 80% of a full Class C range is available.
I'm sure there are no rogue DHCP servers on the network, but it's a good idea to re-check.
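Whether a scope is filling up is a matter of simple arithmetic. A /24 ("Class C") has 256 addresses, and 254 of them are usable by hosts. The small sketch below uses a hypothetical subnet and lease count. It ignores any exclusion ranges or reservations configured in the Windows scope, and those would shrink the pool further.

```python
import ipaddress


def scope_headroom(network: str, leased: int) -> float:
    """Fraction of usable host addresses in `network` that are still free.

    Assumes the whole subnet is in the scope (no exclusions/reservations).
    """
    net = ipaddress.ip_network(network)
    usable = net.num_addresses - 2  # minus network and broadcast addresses
    if not 0 <= leased <= usable:
        raise ValueError("leased count outside the usable range")
    return (usable - leased) / usable


# Hypothetical example: 50 active leases in a /24 leaves 204 of 254 free.
print(f"{scope_headroom('192.168.10.0/24', 50):.1%}")
```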


Andrew Giancola
I'd run some Wireshark and look for the replies. It could be your IP helper statement or something else.
`set capture eth0 stream`
where eth0 is the active Ethernet interface on the AP (0 or 1).
Then go to Wireshark (Windows version only) and set up a remote interface by entering the IP of your AP and port 2002. Many interfaces will start to appear; select only eth0. Once there, use the following filter:
 `bootp`
Hope that helps.
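Wireshark's remote capture (rpcap) connects over TCP, and 2002 is the default rpcapd port. Before fighting with the Wireshark dialogs, it can help to confirm that the AP is listening at all. The helper name below is hypothetical. The check only proves that something accepts TCP connections on that port. It does not prove that a capture on a given interface will succeed.

```python
import socket


def rpcap_reachable(host: str, port: int = 2002, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


# Usage (replace with your AP's management IP):
# print(rpcap_reachable("<AP IP address>"))
```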
flavio borup
Great, I like it, but as a first-time user of the AP's command line, see my commands below:

rkscli: get eth

Port  Interface  802.1X  Logical Link  Physical Link      Label
-----------------------------------------------------------------
2     eth2       None    Up            Up 100Mbps full    10/100/1000 PoE
1     eth1       None    Down          Down               10/100
0     eth0       None    Down          Down               10/100
OK

rkscli: set capture eth2 stream
eth2: Invalid wlan interface name


rkscli:  get ipmode wan
IP Mode: dual
OK

rkscli:  get ipmode lan
IP Mode: ipv4
OK

rkscli:  get ipaddr  wan
IP Address: (static, vlan 1), IP: 10.121.60.59  Netmask 255.255.255.0  Gateway 10.121.60.1
OK
Andrew Giancola
set capture eth2 stream
flavio borup
rkscli: set capture eth2 stream
eth2: Invalid wlan interface name
flavio borup
I found the `get wlanlist` command, and it shows a mapping between wlanX interfaces and SSIDs, but...

rkscli: set capture wlan0 (or any wlan in the list) stream
parameter error

Andrew Giancola
In Wireshark (WS):
Capture > Options
In the Capture Interfaces box, hit the "Manage Interfaces" button
In the Manage Interfaces box, select the "Remote Interfaces" tab
On the Remote Interfaces tab, hit the + at the lower left
Enter your AP's IP (10.121.60.59) and use port 2002

Andrew Giancola
Ugh, I mistyped the command.

rkscli: set capture eth2 stream


Andrew Giancola
Try typing `set cap` and then use Tab to complete it.

flavio borup
OK, now I've got it: wlan100 is the 2.4 GHz radio. I'm testing the WS remote capture, and I can see a lot of "sub-interfaces" available.
flavio borup
Now, after selecting wlan40 (in my case), I get:

The capture session could not be initiated on interface 'rpcap://[10.121.60.59]:2002/wlan40' (Unknown error (pcap bug; actual error cause not reported)).

Please check that "rpcap://[10.121.60.59]:2002/wlan40" is the proper interface.

Help can be found at:
       https://wiki.wireshark.org/WinPcap
       https://wiki.wireshark.org/CaptureSetup

flavio borup
The capture session could not be initiated on interface 'rpcap://[10.121.60.59]:2002/eth2' (Unknown error (pcap bug; actual error cause not reported)).

Please check that "rpcap://[10.121.60.59]:2002/eth2" is the proper interface.

Help can be found at:
       https://wiki.wireshark.org/WinPcap
       https://wiki.wireshark.org/CaptureSetup

Andrew Giancola
Can you attach a screenshot of your remote session failing?

Andrew Giancola
Also, perhaps we'll need to use br0 or the wlan your client is failing on.

Lacking the pcap from the AP, you can run Wireshark directly on the server. Are you forwarding DHCP requests via an IP helper, or is your DHCP server on the same subnet?
flavio borup
After trying the "any" interface, I'm able to see packets. Now I'm filtering by "bootp", and I think that, for now, I'm able to start diagnosing something.
Andrew Giancola
Excellent! Just remember to make sure the eth interfaces on the computer you're using aren't selected.

Best of luck, Flavio!
flavio borup
Now, back to the main/original subject:

I've applied a filter based on the DHCP Transaction ID: `(bootp.id == 0xd585f0f8) && ((bootp.option.dhcp == 1) || (bootp.option.dhcp == 5))`

I can see 5 DHCP Discovers, clearly showing the Wi-Fi client trying to get an IP address, but for now I have no idea what the hell is going on.

Where are these packets going? If there were a rogue DHCP server, shouldn't I see the (rogue) DHCP server's responses? So I'm 99% sure that there is no rogue DHCP server operating in the segment.

It's consistent: the client can't get an IP, and the packets are not arriving at the DHCP server (I can see that by capturing packets at the real DHCP server).

Packets are getting lost on their way...
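The transaction-ID filter above checks a single transaction. To check a whole capture, you can group (transaction ID, message type) pairs and list the transactions where the client asked but nothing answered at the capture point. One way to get the pairs is `tshark -T fields -e bootp.id -e bootp.option.dhcp` on the saved capture. The sketch below assumes that the input has already been parsed into integers, and the function name is hypothetical. Message type numbers follow RFC 2132.

```python
from collections import defaultdict

# DHCP message types (option 53), per RFC 2132
DISCOVER, OFFER, REQUEST, ACK, NAK = 1, 2, 3, 5, 6


def unanswered_transactions(events):
    """events: iterable of (xid, message_type) pairs from one capture.

    Returns the sorted xids that sent a DISCOVER or REQUEST but never saw
    an OFFER, ACK or NAK at this capture point.
    """
    seen = defaultdict(set)
    for xid, msg_type in events:
        seen[xid].add(msg_type)
    asked = {DISCOVER, REQUEST}
    answered = {OFFER, ACK, NAK}
    return sorted(
        xid for xid, types in seen.items()
        if types & asked and not types & answered
    )
```

Running the same check on a capture from the AP and on one from the DHCP server shows on which side of the path the requests disappear.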
Andrew Giancola
OK, so you're looking for the server to ACK or NAK these requests. Do you see that?
flavio borup

Full DHCP Discover captured by WS, at the remote interface connected to the AP:


Frame 45478: 360 bytes on wire (2880 bits), 360 bytes captured (2880 bits) on interface 0
    Interface id: 0 (rpcap://[10.121.60.59]:2002/any)
    Encapsulation type: Linux cooked-mode capture (25)
    Arrival Time: Apr 16, 2019 16:47:29.913063000 E. South America Standard Time
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1555444049.913063000 seconds
    [Time delta from previous captured frame: 0.194345000 seconds]
    [Time delta from previous displayed frame: 14.868523000 seconds]
    [Time since reference or first frame: 41.932491000 seconds]
    Frame Number: 45478
    Frame Length: 360 bytes (2880 bits)
    Capture Length: 360 bytes (2880 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: sll:ethertype:ip:udp:bootp]
    [Coloring Rule Name: UDP]
    [Coloring Rule String: udp]
Linux cooked capture
    Packet type: Broadcast (1)
    Link-layer address type: 1
    Link-layer address length: 6
    Source: SamsungE_45:8a:17 (d0:59:e4:45:8a:17)
    Protocol: IPv4 (0x0800)
Internet Protocol Version 4, Src: 0.0.0.0, Dst: 255.255.255.255
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x10 (DSCP: Unknown, ECN: Not-ECT)
        0001 00.. = Differentiated Services Codepoint: Unknown (4)
        .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 344
    Identification: 0x0000 (0)
    Flags: 0x02 (Don't Fragment)
        0... .... = Reserved bit: Not set
        .1.. .... = Don't fragment: Set
        ..0. .... = More fragments: Not set
    Fragment offset: 0
    Time to live: 64
    Protocol: UDP (17)
    Header checksum: 0x3986 [validation disabled]
    [Header checksum status: Unverified]
    Source: 0.0.0.0
    Destination: 255.255.255.255
    [Source GeoIP: Unknown]
    [Destination GeoIP: Unknown]
User Datagram Protocol, Src Port: 68, Dst Port: 67
    Source Port: 68
    Destination Port: 67
    Length: 324
    Checksum: 0x4162 [unverified]
    [Checksum Status: Unverified]
    [Stream index: 28]
Bootstrap Protocol (Discover)
    Message type: Boot Request (1)
    Hardware type: Ethernet (0x01)
    Hardware address length: 6
    Hops: 0
    Transaction ID: 0xd585f0f8
    Seconds elapsed: 32
    Bootp flags: 0x0000 (Unicast)
        0... .... .... .... = Broadcast flag: Unicast
        .000 0000 0000 0000 = Reserved flags: 0x0000
    Client IP address: 0.0.0.0
    Your (client) IP address: 0.0.0.0
    Next server IP address: 0.0.0.0
    Relay agent IP address: 0.0.0.0
    Client MAC address: SamsungE_45:8a:17 (d0:59:e4:45:8a:17)
    Client hardware address padding: 00000000000000000000
    Server host name not given
    Boot file name not given
    Magic cookie: DHCP
    Option: (53) DHCP Message Type (Discover)
        Length: 1
        DHCP: Discover (1)
    Option: (61) Client identifier
        Length: 7
        Hardware type: Ethernet (0x01)
        Client MAC address: SamsungE_45:8a:17 (d0:59:e4:45:8a:17)
    Option: (57) Maximum DHCP Message Size
        Length: 2
        Maximum DHCP Message Size: 1500
    Option: (60) Vendor class identifier
        Length: 18
        Vendor class identifier: android-dhcp-7.1.2
    Option: (12) Host Name
        Length: 24
        Host Name: android-4e8617727f4e6848
    Option: (55) Parameter Request List
        Length: 10
        Parameter Request List Item: (1) Subnet Mask
        Parameter Request List Item: (3) Router
        Parameter Request List Item: (6) Domain Name Server
        Parameter Request List Item: (15) Domain Name
        Parameter Request List Item: (26) Interface MTU
        Parameter Request List Item: (28) Broadcast Address
        Parameter Request List Item: (51) IP Address Lease Time
        Parameter Request List Item: (58) Renewal Time Value
        Parameter Request List Item: (59) Rebinding Time Value
        Parameter Request List Item: (43) Vendor-Specific Information
    Option: (255) End
        Option End: 255
    Padding: 00
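Some fixed header fields in that Discover deserve a closer look. Seconds elapsed is 32, so the client has been retrying for a while. The broadcast flag is clear. Relay agent IP (giaddr) is 0.0.0.0, which means no relay (IP helper) had handled the packet at the AP; under RFC 2131, a relay agent fills in giaddr when it forwards the request. The minimal sketch below decodes those fields from the raw UDP payload using the RFC 2131 layout. The function name is hypothetical.

```python
import ipaddress
import struct

# op, htype, hlen, hops, xid, secs, flags, ciaddr, yiaddr, siaddr, giaddr,
# chaddr, sname, file: 236 bytes in network byte order (RFC 2131)
BOOTP_HEADER = struct.Struct("!4BIHH4s4s4s4s16s64s128s")


def decode_header(payload: bytes) -> dict:
    (op, _htype, hlen, _hops, xid, secs, flags,
     _ciaddr, _yiaddr, _siaddr, giaddr, chaddr,
     _sname, _file) = BOOTP_HEADER.unpack_from(payload)
    return {
        "op": op,                              # 1 = BOOTREQUEST, 2 = BOOTREPLY
        "xid": xid,
        "secs": secs,
        "broadcast": bool(flags & 0x8000),     # top bit of the flags field
        "giaddr": str(ipaddress.IPv4Address(giaddr)),
        "client_mac": chaddr[:hlen].hex(":"),  # only hlen bytes are meaningful
    }
```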



flavio borup
At the DHCP server, using a network traffic capture, I can see all the ACKs and NAKs (just a few NAKs, filtered out by my policies), and I can see some clients from that particular VLAN working with no problem. But for some clients (randomly), the packets don't arrive at the DHCP server interface at all. I'm pretty confident that the DHCP server is running and doing its job; I've never had a single problem with it. If the packet arrives at the DHCP server, the IP is leased; for the clients with problems, the packets are not reaching the DHCP server.
Andrew Giancola
It's possible you may need to reboot that AP. The next thing I'd check is the ARP entries for the interface, mapped to the proper VLANs on your switched network.

Andrew Giancola
Flavio, yesterday we had an incident similar to yours, though the problem presented with different equipment (it was happening to all wireless clients on VLANs 10 and 3 and to all of our hardwired credit card terminals on VLAN 7). We ended up needing to reboot a couple of switches. As best we can tell, we had some MAC flapping that ended up with Spanning Tree pruning a few VLANs. We're running a mix of Cisco and Arris, and rebooting the Cisco resolved the issue immediately.
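You can check the MAC-flapping theory against data: if a MAC address keeps reappearing on different switch ports, that points to a loop or an unstable link. The sketch below assumes you can collect (timestamp, MAC, port) samples, for example from periodic snapshots of your switches' MAC address tables. The input format, the threshold and the function name are all hypothetical.

```python
from collections import defaultdict


def flapping_macs(observations, min_moves: int = 3):
    """observations: (timestamp, mac, port) tuples.

    Returns {mac: moves} for every MAC whose port changed at least
    `min_moves` times, processing samples in timestamp order.
    """
    last_port = {}
    moves = defaultdict(int)
    for _ts, mac, port in sorted(observations):
        prev = last_port.get(mac)
        if prev is not None and prev != port:
            moves[mac] += 1
        last_port[mac] = port
    return {mac: n for mac, n in moves.items() if n >= min_moves}
```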

flavio borup
Thanks for the input. Thinking about it, I'm not confident that this is my case: it's very unlikely, but not impossible. In my case we have 2 core L3 switches and a dozen L2 edge switches. We're mapping the affected clients to check whether they could be related to a particular series of switches or whether they're spread around the company.

Until now, no rogue DHCP servers have been found (although we did find some very weird Chinese MAC addresses).

The WS network traffic captures are still running. We're a little limited in when we can make changes and run tests because we sometimes get errors, so we're still looking for DHCP packets on eth2 of the AP.
Andrew Giancola
Best of luck. Don't forget to look into implementing DHCP snooping (or your switches' equivalent); this will limit a rogue DHCP server's impact on your network. Also, you might consider moving those Chinese MACs to a null VLAN and/or checking for duplicate IPs on the network. The arping tool is your friend when it comes to finding duplicate IPs on the same network.
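arping sends ARP requests for an address and reports which MAC addresses answer. If more than one MAC answers for the same IP, that IP is a duplicate. The sketch below covers only that final comparison. It assumes the (ip, mac) pairs have already been collected from arping output or from switch/router ARP tables. The addresses in the test data are hypothetical.

```python
from collections import defaultdict


def duplicate_ips(arp_entries):
    """arp_entries: iterable of (ip, mac) pairs.

    Returns {ip: sorted MACs} for every IP claimed by more than one MAC.
    MACs are lower-cased so that differently formatted entries compare equal.
    """
    owners = defaultdict(set)
    for ip, mac in arp_entries:
        owners[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in owners.items() if len(macs) > 1}
```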