Darn! We’ve lost connectivity to the Cisco HyperFlex Controller AGAIN. What could possibly be wrong?
Well, the problem relates to how Cisco HyperFlex uses a floating IP address across multiple Storage Controller VM MAC addresses, and how ACI maintains the IP to MAC address table.
Let’s start with the physical picture.
The focus is on the three Storage Controller VMs (SCVMs) in the above picture. The heavy purple lines show the resolved path to the default gateway. Each SCVM has its own MAC address, and ONE of the SCVMs shares that MAC address with the HyperFlex Management IP address (172.16.19.30).
So the ACI fabric sees the IP to MAC resolution like this:
apic1# show endpoints | grep "172\.16\.19\.3[0-3]"
00:0C:29:82:4F:B2  172.16.19.33  learned  101  eth1/32  vlan-119  not-applicable
00:0C:29:90:F4:70  172.16.19.31  learned  101  eth1/32  vlan-119  not-applicable
00:0C:29:A9:B7:0D  172.16.19.32  learned  101  eth1/32  vlan-119  not-applicable
00:0C:29:A9:B7:0D  172.16.19.30  learned  101  eth1/32  vlan-119  not-applicable
Note how as far as ACI is concerned, the MAC address 00:0C:29:A9:B7:0D is shared by both 172.16.19.32 and 172.16.19.30.
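If you have a longer endpoint listing to check, a few lines of Python can pick out any MAC address that ACI has associated with more than one IP. This is only a rough sketch: it assumes the MAC and IP are the first two whitespace-separated columns, as in the output above, and the function names `ips_by_mac` and `shared_macs` are made up for this illustration.

```python
from collections import defaultdict

# Sample rows copied from the `show endpoints` output above.
SAMPLE = """\
00:0C:29:82:4F:B2 172.16.19.33 learned 101 eth1/32 vlan-119 not-applicable
00:0C:29:90:F4:70 172.16.19.31 learned 101 eth1/32 vlan-119 not-applicable
00:0C:29:A9:B7:0D 172.16.19.32 learned 101 eth1/32 vlan-119 not-applicable
00:0C:29:A9:B7:0D 172.16.19.30 learned 101 eth1/32 vlan-119 not-applicable
"""


def ips_by_mac(text):
    """Group IPs by MAC, assuming MAC and IP are the first two columns."""
    table = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        mac, ip = fields[0].upper(), fields[1]
        table[mac].append(ip)
    return dict(table)


def shared_macs(text):
    """Return only the MACs that carry more than one IP address."""
    return {mac: ips for mac, ips in ips_by_mac(text).items() if len(ips) > 1}


if __name__ == "__main__":
    for mac, ips in shared_macs(SAMPLE).items():
        print(mac, "->", ", ".join(ips))
```

Any MAC that shows up here with a floating IP attached is a candidate for the stale-mapping problem described next.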
The problem we are having has been caused by the fact that the floating HyperFlex Management IP address has actually “floated” to another node (172.16.19.31). Any traffic that needs to go to 172.16.19.30 now needs to go to MAC 00:0C:29:90:F4:70. But ACI hasn’t learned this, and never will unless it sees a packet FROM 172.16.19.30 sourced with MAC 00:0C:29:90:F4:70.
Packets from my management PC (172.16.5.102) addressed to 172.16.19.30 reach ACI, and ACI routes them to the correct subnet, but sends them to MAC 00:0C:29:A9:B7:0D. Here are a few ICMP packets I captured on the host that owns 00:0C:29:A9:B7:0D that prove this.
root@hxscvm2:~# tcpdump -i eth0 -n icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:48:16.187163 IP 172.16.5.102 > 172.16.19.30: ICMP echo request, id 1, seq 26, length 40
11:48:20.959581 IP 172.16.5.102 > 172.16.19.30: ICMP echo request, id 1, seq 27, length 40
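To make the failure mode concrete, here is a toy model of a table that only updates an IP to MAC mapping when it sees a packet sourced from that IP. It is a deliberately simplified sketch and not how ACI actually implements endpoint learning; the `EndpointTable` class and its methods are invented for the illustration.

```python
class EndpointTable:
    """Toy IP-to-MAC table that learns only from the source of a packet."""

    def __init__(self):
        self.ip_to_mac = {}

    def learn(self, src_ip, src_mac):
        # A packet FROM src_ip with src_mac creates or refreshes the mapping.
        self.ip_to_mac[src_ip] = src_mac

    def forward(self, dst_ip):
        # Traffic TO dst_ip is sent to whatever MAC was last learned.
        return self.ip_to_mac.get(dst_ip)


table = EndpointTable()
table.learn("172.16.19.32", "00:0C:29:A9:B7:0D")
table.learn("172.16.19.30", "00:0C:29:A9:B7:0D")  # floating IP, original owner
table.learn("172.16.19.31", "00:0C:29:90:F4:70")

# The floating IP moves to the SCVM with MAC 00:0C:29:90:F4:70, but nothing
# sourced from 172.16.19.30 has been seen yet, so the old mapping is used.
stale = table.forward("172.16.19.30")

# Only a packet FROM 172.16.19.30 with the new MAC corrects the entry.
table.learn("172.16.19.30", "00:0C:29:90:F4:70")
fresh = table.forward("172.16.19.30")

print("before:", stale)
print("after: ", fresh)
```

Until that second `learn` call happens, the pings land on the wrong SCVM, which is exactly what the capture shows.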
Now let’s be clear – this problem is caused more by HyperFlex using multiple MAC addresses for a single IP address than by ACI refusing to forget the old entry. The same problem could occur with a normal router in a normal topology like this one.
The difference here is that the router would have an ARP cache holding the mapping of IP 172.16.19.30 to MAC 00:0C:29:A9:B7:0D, and that entry would time out after some time – typically 4 hours. But ACI doesn’t do anything like this by default. So long as the MAC stays alive, and it will so long as 172.16.19.32 keeps sending packets, ACI keeps the entry. By default. But there is a fix. Kind of.
In ACI, there is an option for IP Aging created specifically for this kind of scenario. To configure IP Aging (and I consider this would be best practice to ALWAYS enable IP Aging) you need to navigate to the System > System Settings >> Endpoint Controls >| [IP Aging] tab.
Once IP Aging has been enabled, as explained in the ACI Fabric Endpoint Learning White Paper,
“IP aging policy tracks and ages unused IP addresses on an endpoint. Tracking is performed by using the endpoint retention policy, which is configured for the bridge domain to send ARP requests (for IPv4) and neighbor solicitations (for IPv6) at 75 percent of the local endpoint aging interval. When no response is received from an IP address that IP address is aged out.”
In our case, the default endpoint retention policy was in use, so the aging time was at 15 minutes. And sure enough, 12 minutes (≅ 75% of 15 mins) after enabling the IP Aging option, the SCVM currently hosting the floating IP received an ARP request from the default gateway IP:
12:09:45.833118 ARP, Request who-has 172.16.19.30 (ff:ff:ff:ff:ff:ff) tell 172.16.19.1, length 46
12:09:45.833134 ARP, Reply 172.16.19.30 is-at 00:0c:29:90:f4:70, length 28
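For anyone using a different endpoint retention policy, the arithmetic behind that wait is easy to sketch. The snippet below just applies the 75 percent figure quoted above to a 15 minute aging interval; the function name and its `fraction` parameter are my own, and I am assuming the probe timer simply runs from the start of the interval.

```python
def ip_aging_probe_seconds(retention_seconds, fraction=0.75):
    """Seconds into the aging interval at which the probe would be sent."""
    return retention_seconds * fraction


retention = 15 * 60  # 15 minute aging interval, in seconds
probe = ip_aging_probe_seconds(retention)
minutes, seconds = divmod(int(probe), 60)
print(f"probe expected after about {minutes} min {seconds} s")
```

That works out to 675 seconds, or 11 minutes 15 seconds, which fits with the roughly 12 minutes I waited.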
And so Happy HyperFlex days were here again from this point onwards. I was able to access the HX Management IP address (172.16.19.30) from my management PC.
BTW – if you are experiencing this problem and you don’t want to wait the 12 minutes for the IP to be re-mapped by ACI, you can issue the following command at the APIC CLI to clear the IP immediately:
apic1# fabric leaf_id clear system internal epm endpoint key vrf vrf:name ip 172.16.19.30
I have done this a few times in the past because in our lab environment where we do unusual things all the time, this is a common occurrence. Today I decided to work out exactly what was going on.
Floating IP addresses are used in a number of load-balancing situations. In some cases, like VRRP, a special virtual MAC address is assigned to the IP and the MAC floats along with the IP.
What I haven’t explored yet is what exactly goes on when a new SCVM takes on the floating IP address. If best practices are followed, the new SCVM SHOULD send a gratuitous ARP request using its new MAC address – in which case both the traditional router scenario AND the ACI topology should respond by updating their mappings. If this did indeed happen, then clearly (in our ACI setup anyway) ACI is not updating its mapping as it should.
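For reference, this is roughly what a gratuitous ARP request looks like on the wire. The sketch below only builds the bytes of a standard ARP request in which the sender and target IP are the same; it does not send anything, and it is not taken from HyperFlex, whose actual behaviour I have not yet checked. The helper names are invented for the example.

```python
import socket
import struct


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_gratuitous_arp(mac, ip):
    """Build an Ethernet frame carrying a gratuitous ARP request for ip."""
    sha = mac_to_bytes(mac)      # sender hardware address
    spa = socket.inet_aton(ip)   # sender protocol address
    # Ethernet header: broadcast destination, our source MAC, EtherType ARP.
    eth = b"\xff" * 6 + sha + struct.pack("!H", 0x0806)
    # ARP header: Ethernet (1), IPv4 (0x0800), lengths 6 and 4, opcode 1.
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Target MAC is zeroed; target IP equals sender IP, which is what
    # makes the request "gratuitous".
    arp += sha + spa + b"\x00" * 6 + spa
    return eth + arp


frame = build_gratuitous_arp("00:0C:29:90:F4:70", "172.16.19.30")
print(len(frame), frame.hex())
```

A frame like this, sourced from the new MAC, is the kind of packet that would let both a router and ACI relearn where 172.16.19.30 lives.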
I’ll explore this further in my next post!