ACI and the HyperFlex Hiccup

Darn! We’ve lost connectivity to the Cisco HyperFlex Controller AGAIN. What could possibly be wrong?

Well, the problem relates to how Cisco HyperFlex uses a floating IP address across multiple Storage Controller VM MAC addresses, and how ACI maintains the IP to MAC address table.

Let’s start with the physical picture.

The focus is on the three Storage Controller VMs (SCVMs) in the above picture. The heavy purple lines show the resolved path to the default gateway. Each SCVM has its own MAC address, and ONE of the SCVMs also uses its MAC address for the HyperFlex Management IP address (172.16.19.30).

So the ACI fabric sees the IP to MAC resolution like this:

apic1# show endpoints | grep "172\.16\.19\.3[0-3]"
 00:0C:29:82:4F:B2  172.16.19.33    learned       101    eth1/32   vlan-119   not-applicable
 00:0C:29:90:F4:70  172.16.19.31    learned       101    eth1/32   vlan-119   not-applicable
 00:0C:29:A9:B7:0D  172.16.19.32    learned       101    eth1/32   vlan-119   not-applicable
 00:0C:29:A9:B7:0D  172.16.19.30    learned       101    eth1/32   vlan-119   not-applicable

Note how as far as ACI is concerned, the MAC address 00:0C:29:A9:B7:0D is shared by both 172.16.19.32 and 172.16.19.30.
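In a larger fabric, a small script can group captured endpoint output by MAC and flag any MAC that ACI has tied to more than one IP. This is a minimal Python sketch. It assumes the column layout shown above (MAC first, IP second), the helper names are hypothetical, and it only parses text you have already captured. It does not talk to the APIC.

```python
from collections import defaultdict
from typing import Dict, List

# Output of "show endpoints | grep ..." as captured above.
SHOW_ENDPOINTS = """\
 00:0C:29:82:4F:B2  172.16.19.33    learned       101    eth1/32   vlan-119   not-applicable
 00:0C:29:90:F4:70  172.16.19.31    learned       101    eth1/32   vlan-119   not-applicable
 00:0C:29:A9:B7:0D  172.16.19.32    learned       101    eth1/32   vlan-119   not-applicable
 00:0C:29:A9:B7:0D  172.16.19.30    learned       101    eth1/32   vlan-119   not-applicable
"""


def ips_by_mac(text: str) -> Dict[str, List[str]]:
    """Group IPs by MAC, assuming column 1 is the MAC and column 2 the IP."""
    table: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            table[fields[0].lower()].append(fields[1])
    return dict(table)


def shared_macs(text: str) -> Dict[str, List[str]]:
    """Keep only the MACs that are associated with more than one IP."""
    return {mac: ips for mac, ips in ips_by_mac(text).items() if len(ips) > 1}


if __name__ == "__main__":
    for mac, ips in shared_macs(SHOW_ENDPOINTS).items():
        print(mac, ips)  # 00:0c:29:a9:b7:0d ['172.16.19.32', '172.16.19.30']
```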

Our problem arose because the floating HyperFlex Management IP address has actually “floated” to another node (the SCVM at 172.16.19.31). Any traffic destined for 172.16.19.30 now needs to go to MAC 00:0C:29:90:F4:70. But ACI hasn’t learned this, and never will unless it sees a packet FROM 172.16.19.30 sourced with MAC 00:0C:29:90:F4:70.
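The learning rule described here can be illustrated with a toy Python model: the IP-to-MAC binding only changes when a packet arrives from the IP itself, so traffic sent to the floating IP never corrects a stale entry. The class and method names are hypothetical, and this is a simplification of the behaviour described in this post, not ACI's actual endpoint manager.

```python
from typing import Dict, Optional


class SourceLearnedTable:
    """Toy IP-to-MAC table that changes only when a packet arrives FROM an IP.
    A simplification of the behaviour described in this post, not ACI internals."""

    def __init__(self) -> None:
        self.ip_to_mac: Dict[str, str] = {}

    def learn_from_source(self, src_ip: str, src_mac: str) -> None:
        # Source learning: bind the sender's IP to the sender's MAC.
        self.ip_to_mac[src_ip] = src_mac.lower()

    def forward_to(self, dst_ip: str) -> Optional[str]:
        # Forwarding only reads the table; traffic TO an IP never updates it.
        return self.ip_to_mac.get(dst_ip)


if __name__ == "__main__":
    table = SourceLearnedTable()
    table.learn_from_source("172.16.19.32", "00:0C:29:A9:B7:0D")
    table.learn_from_source("172.16.19.30", "00:0C:29:A9:B7:0D")
    # The floating IP moves to the SCVM with MAC 00:0C:29:90:F4:70, but nothing
    # sourced from 172.16.19.30 has been seen from that MAC yet.
    print(table.forward_to("172.16.19.30"))  # 00:0c:29:a9:b7:0d (stale)
    table.learn_from_source("172.16.19.30", "00:0C:29:90:F4:70")
    print(table.forward_to("172.16.19.30"))  # 00:0c:29:90:f4:70
```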

Packets from my management PC (172.16.5.102) addressed to 172.16.19.30 reach ACI, and ACI routes them to the correct subnet, but sends them to MAC 00:0C:29:A9:B7:0D. Here are a few ICMP packets I captured on the 00:0C:29:A9:B7:0D host that prove this:

root@hxscvm2:~# tcpdump -i eth0 -n icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:48:16.187163 IP 172.16.5.102 > 172.16.19.30: ICMP echo request, id 1, seq 26, length 40
11:48:20.959581 IP 172.16.5.102 > 172.16.19.30: ICMP echo request, id 1, seq 27, length 40

Now let’s be clear: this problem is caused more by HyperFlex using multiple MAC addresses for a single IP address than by ACI refusing to forget the old entry. The same problem could occur with a conventional router in a traditional topology like this one.

The difference is that a router would hold the mapping of IP 172.16.19.30 to MAC 00:0C:29:A9:B7:0D in its ARP cache, and that entry would eventually time out (typically after 4 hours). ACI doesn’t do anything like this by default. So long as the MAC stays alive, and it will so long as 172.16.19.32 keeps sending packets, ACI keeps the entry. By default. But there is a fix. Kind of.
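The contrast can be sketched as two toy Python tables. The first ages each IP on its own timer, as a router's ARP cache would (4 hours here). The second keys IPs to a MAC and refreshes all of them whenever that MAC is active, which is a simplification of the default ACI behaviour described above. The 900-second endpoint timeout is an assumption matching the default retention policy mentioned later in this post, and the class names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

FOUR_HOURS = 4 * 60 * 60  # typical router ARP timeout, in seconds


@dataclass
class RouterArpCache:
    """Router-style cache: every IP entry expires on its own timer."""
    timeout: float = FOUR_HOURS
    entries: Dict[str, Tuple[str, float]] = field(default_factory=dict)  # ip -> (mac, learned_at)

    def learn(self, ip: str, mac: str, now: float) -> None:
        self.entries[ip] = (mac.lower(), now)

    def lookup(self, ip: str, now: float) -> Optional[str]:
        entry = self.entries.get(ip)
        if entry is None or now - entry[1] > self.timeout:
            self.entries.pop(ip, None)  # expired: the router has to ARP again
            return None
        return entry[0]


@dataclass
class MacKeyedEndpoints:
    """Simplified default behaviour: IPs hang off a MAC, and activity from that
    MAC (from any of its IPs) keeps every IP attached to it alive."""
    timeout: float = 900.0  # assumed: default local endpoint aging interval
    macs: Dict[str, List] = field(default_factory=dict)  # mac -> [set of ips, last_seen]

    def learn(self, ip: str, mac: str, now: float) -> None:
        entry = self.macs.setdefault(mac.lower(), [set(), now])
        entry[0].add(ip)
        entry[1] = now  # refreshes the whole endpoint, not just this IP

    def lookup(self, ip: str, now: float) -> Optional[str]:
        for mac, (ips, last_seen) in self.macs.items():
            if ip in ips and now - last_seen <= self.timeout:
                return mac
        return None


if __name__ == "__main__":
    mac_old = "00:0C:29:A9:B7:0D"
    router, aci = RouterArpCache(), MacKeyedEndpoints()
    for table in (router, aci):
        table.learn("172.16.19.32", mac_old, now=0)
        table.learn("172.16.19.30", mac_old, now=0)
    # 172.16.19.32 keeps talking every minute for five hours; 172.16.19.30 stays silent.
    for t in range(60, 5 * 3600, 60):
        router.learn("172.16.19.32", mac_old, now=t)
        aci.learn("172.16.19.32", mac_old, now=t)
    print(router.lookup("172.16.19.30", now=5 * 3600))  # None: the stale entry has expired
    print(aci.lookup("172.16.19.30", now=5 * 3600))     # 00:0c:29:a9:b7:0d: still stale
```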

In ACI, there is an IP Aging option created specifically for this kind of scenario. To configure IP Aging (and I consider it best practice to ALWAYS enable IP Aging), navigate to the System > System Settings >> Endpoint Controls >| [IP Aging] tab.

Once IP Aging is enabled, the fabric behaves as explained in the ACI Fabric Endpoint Learning White Paper:

“IP aging policy tracks and ages unused IP addresses on an endpoint. Tracking is performed by using the endpoint retention policy, which is configured for the bridge domain to send ARP requests (for IPv4) and neighbor solicitations (for IPv6) at 75 percent of the local endpoint aging interval. When no response is received from an IP address that IP address is aged out.”
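The quoted behaviour can be sketched as a small Python tracker. It assumes a hypothetical probe callback that ARPs for an IP and returns the MAC that answered (or None). IPs idle for 75% of the aging interval are probed, silent ones are aged out, and an answer from a different MAC rebinds the IP, which matches what was observed in this case. With a 900-second interval the probe fires at 675 seconds (11.25 minutes). The class, the probe and the second host (172.16.19.99) are illustrative assumptions, not ACI internals.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class IpAgingTracker:
    """Sketch of IP aging: probe idle IPs at 75% of the aging interval,
    age out the silent ones, and rebind any IP that answers from a new MAC."""
    aging_interval: float                  # local endpoint aging interval, in seconds
    probe: Callable[[str], Optional[str]]  # hypothetical: ARP for ip, return responder MAC or None
    bindings: Dict[str, Tuple[str, float]] = field(default_factory=dict)  # ip -> (mac, last_seen)

    def learn(self, ip: str, mac: str, now: float) -> None:
        self.bindings[ip] = (mac.lower(), now)

    def tick(self, now: float) -> List[str]:
        """Probe every IP idle for at least 75% of the interval; return IPs aged out."""
        threshold = 0.75 * self.aging_interval
        aged_out: List[str] = []
        for ip, (_, last_seen) in list(self.bindings.items()):
            if now - last_seen < threshold:
                continue  # seen recently enough, no probe yet
            responder = self.probe(ip)
            if responder is None:
                del self.bindings[ip]           # no reply: the IP is aged out
                aged_out.append(ip)
            else:
                self.learn(ip, responder, now)  # reply: bind to whichever MAC answered
        return aged_out


def fake_probe(ip: str) -> Optional[str]:
    # Hypothetical stand-in: the floating IP now answers from the other SCVM.
    return "00:0C:29:90:F4:70" if ip == "172.16.19.30" else None


if __name__ == "__main__":
    tracker = IpAgingTracker(aging_interval=900, probe=fake_probe)
    tracker.learn("172.16.19.30", "00:0C:29:A9:B7:0D", now=0)
    tracker.learn("172.16.19.99", "00:0C:29:00:00:01", now=0)  # hypothetical silent host
    print(tracker.tick(now=600))  # [] : below the 675 s threshold, nothing probed
    print(tracker.tick(now=675))  # ['172.16.19.99'] : the silent IP is aged out
    print(tracker.bindings)       # {'172.16.19.30': ('00:0c:29:90:f4:70', 675)}
```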

In our case, the default endpoint retention policy was in use, so the local endpoint aging interval was 15 minutes. And sure enough, roughly 12 minutes after enabling the IP Aging option (75% of 15 minutes is 11.25 minutes), the SCVM currently hosting the floating IP received an ARP request from the default gateway IP:

12:09:45.833118 ARP, Request who-has 172.16.19.30 (ff:ff:ff:ff:ff:ff) tell 172.16.19.1, length 46
12:09:45.833134 ARP, Reply 172.16.19.30 is-at 00:0c:29:90:f4:70, length 28

And so happy HyperFlex days were here again from that point onwards: I was able to access the HX Management IP address (172.16.19.30) from my management PC.

BTW, if you are experiencing this problem and you don’t want to wait the 12 minutes for ACI to re-map the IP, you can issue the following command at the APIC CLI to clear the IP immediately (substitute your own values for leaf_id and vrf:name):

apic1# fabric leaf_id clear system internal epm endpoint key vrf vrf:name ip 172.16.19.30

I have done this a few times in the past, because this is a common occurrence in our lab environment, where we do unusual things all the time. Today I decided to work out exactly what was going on.

Floating IP addresses are used in a number of load-balancing situations. In some cases, like VRRP, a special virtual MAC address is assigned to the IP and the MAC floats along with the IP.
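For comparison, VRRP derives its virtual MAC from the virtual router ID (RFC 5798: 00-00-5E-00-01-{VRID} for IPv4, 00-00-5E-00-02-{VRID} for IPv6), so the IP-to-MAC binding never changes during a failover; only the port on which that MAC is seen changes. A minimal Python helper (the function name is just illustrative):

```python
def vrrp_virtual_mac(vrid: int, ipv6: bool = False) -> str:
    """Return the VRRP virtual router MAC for a VRID, per RFC 5798."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be between 1 and 255")
    family = 2 if ipv6 else 1  # 01 = IPv4 virtual router, 02 = IPv6 virtual router
    return f"00:00:5e:00:{family:02x}:{vrid:02x}"


if __name__ == "__main__":
    print(vrrp_virtual_mac(10))             # 00:00:5e:00:01:0a
    print(vrrp_virtual_mac(1, ipv6=True))   # 00:00:5e:00:02:01
```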

What I haven’t explored yet is exactly what goes on when a new SCVM takes on the floating IP address. If best practices are followed, the new SCVM SHOULD send a gratuitous ARP request using its new MAC address, in which case both the traditional router AND ACI should respond by updating their mappings. If this did indeed happen, then clearly (in our ACI setup anyway) ACI is not updating its mapping as it should.
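For reference, a gratuitous ARP is an ordinary ARP request in which the sender IP and the target IP are both the address being announced, broadcast so every neighbour can refresh its mapping. This Python sketch only builds the 42-byte Ethernet frame (RFC 826 field layout, before padding to the Ethernet minimum) for the floating IP and the new SCVM's MAC. It does not transmit it, which would need a raw socket and root privileges. An all-zeros target MAC is one common convention, not necessarily what HyperFlex actually sends.

```python
import ipaddress
import struct


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def gratuitous_arp(ip: str, mac: str) -> bytes:
    """Build (but do not send) an Ethernet frame carrying a gratuitous ARP request."""
    src_mac = mac_bytes(mac)
    ip_raw = ipaddress.IPv4Address(ip).packed
    ethernet = b"\xff" * 6 + src_mac + struct.pack("!H", 0x0806)  # broadcast dst, src, EtherType ARP
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6, 4,         # hardware / protocol address lengths
        1,            # operation: request
        src_mac,      # sender MAC: the new owner of the IP
        ip_raw,       # sender IP: the floating IP
        b"\x00" * 6,  # target MAC: unknown (all zeros)
        ip_raw,       # target IP: the same floating IP, which makes it "gratuitous"
    )
    return ethernet + arp


if __name__ == "__main__":
    frame = gratuitous_arp("172.16.19.30", "00:0c:29:90:f4:70")
    print(len(frame))  # 42
    print(frame.hex())
```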

I’ll explore this further in my next post!

RedNectar

3 Responses to ACI and the HyperFlex Hiccup

Pingback: ACI and the HyperFlex Hiccup Cure | RedNectar's Blog
Avery Abbott says:

2021/04/30 at 02:41

Why in the world wouldn’t HX send a GARP when the master changes?

Hit something similar with F5 failover – we had a dedicated VIP subnet shared between two F5 LTM appliances. When the active member changed, we would lose traffic. Apparently they don’t send a GARP unless they have a “self IP” in that subnet. Fixed it by adding a non-routable self IP to both appliances, problem solved.

Both instances (HX and F5) seem to be the result of poor decision making on the vendor’s behalf or sloppy coding, not sure which.

- RedNectar Chris Welsh says:
  
  2021/04/30 at 08:42
  
  I’ve done a bit more research. Turns out HX DOES send a Gratuitous ARP when the master changes, but ACI ignores it – possibly because it already has an IP associated with the source MAC in the Gratuitous ARP. But I’m guessing here.
  Thanks for the comment!