Hyperflex Post Install script fixer

I was shocked the other day to learn that the hx_post_install script that is used during the Cisco HyperFlex install process does NOT work the way it should.

In fact, the validation option is a complete waste of time (if working with M5 servers, which is probably 90% of installations), as I reported here.

To fix it, I could create a new copy of the script and give that to you, and you could copy that to your HyperFlex Storage Controller VM, but that’s a pain. Instead, I’ve decided to give you a few commands that you can cut and paste into a command shell to fix the problem – or at least work around the problem until Cisco fixes it. To ease the pain, all you need do is cut-and-paste the following into your ssh session on the storage controller – IF you have blind faith in my skills. Otherwise, you might want to go through it step-by-step, so you understand it.

No-nonsense cut-and-paste answer

Cut-and-paste the following into your ssh session on the storage controller.

cp $(which hx_post_install) .
sed -i 's/vmnic1/rednectar4/g' hx_post_install
sed -i 's/vmnic2/rednectar1/g' hx_post_install
sed -i 's/vmnic3/rednectar5/g' hx_post_install
sed -i 's/vmnic4/rednectar2/g' hx_post_install
sed -i 's/vmnic5/rednectar6/g' hx_post_install
sed -i 's/vmnic6/rednectar3/g' hx_post_install
sed -i 's/rednectar/vmnic/g' hx_post_install
sed -i 's/and args.validate//' hx_post_install
sed -i "s/Select post_install/***RedNectar's Updated hx_post_install script M5 modifications have been applied.***\\\nSelect post_install/" hx_post_install
sed -i 's/SCRIPT_VERSION = "4.0"/SCRIPT_VERSION = "4.1 RedNectar"/' hx_post_install

Full-blown answer

The first step after establishing an ssh session to a storage controller VM is to locate the hx_post_install script:

admin@hxscvm1:~$ which hx_post_install

Using the result of the output above, copy the script to your admin home directory (where you land when you start your ssh session) and check that it exists.

admin@hxscvm1:~$ cp /bin/hx_post_install .
admin@hxscvm1:~$ ls -lh
total 92K
-rwxr-xr-x 1 admin springpath 92K Sep 17 10:42 hx_post_install


  • Don’t miss the period at the end of the first line.
  • If you wanted to be fancy, you could combine steps 1 & 2 with:
    cp $(which hx_post_install) .

Now come the bits where you manipulate the copy of the file using sed.  Basically, you have to swap the vmnic names from the order used in the old M4 servers to the new order used by the M5 servers according to the table below:

vSwitch                   M4 vmnics used    M5 vmnics used
vswitch-hx-inband-mgmt    vmnic0 vmnic1     vmnic0 vmnic4
vswitch-hx-storage-data   vmnic2 vmnic3     vmnic1 vmnic5
vswitch-hx-vm-network     vmnic4 vmnic5     vmnic2 vmnic6
vmotion                   vmnic6 vmnic7     vmnic3 vmnic7

The problem is of course that if you replace say vmnic1 with vmnic4, when you later replace vmnic4 with vmnic2, you’ll be replacing the things you just replaced, so you need a double pass over the file.  Since I’m pretty sure the word rednectar does not occur in Cisco’s script, I’ll use that character pattern as a temporary placeholder for the word vmnic and then replace all occurrences of rednectar with vmnic at the end.

admin@hxscvm1:~$ sed -i 's/vmnic1/rednectar4/g' hx_post_install
admin@hxscvm1:~$ sed -i 's/vmnic2/rednectar1/g' hx_post_install
admin@hxscvm1:~$ sed -i 's/vmnic3/rednectar5/g' hx_post_install
admin@hxscvm1:~$ sed -i 's/vmnic4/rednectar2/g' hx_post_install
admin@hxscvm1:~$ sed -i 's/vmnic5/rednectar6/g' hx_post_install
admin@hxscvm1:~$ sed -i 's/vmnic6/rednectar3/g' hx_post_install
admin@hxscvm1:~$ sed -i 's/rednectar/vmnic/g' hx_post_install
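If you’d like to see the placeholder trick working before touching the real script, it can be tried on a couple of sample lines. This is only a sketch: the function name remap_vmnics and the sample text are my own inventions and are not part of Cisco’s script. It applies the same seven substitutions, in the same order, to whatever arrives on standard input.

```shell
#!/bin/sh
# Hypothetical helper (not part of hx_post_install): applies the same
# M4 -> M5 renumbering to whatever arrives on standard input.
remap_vmnics() {
    sed -e 's/vmnic1/rednectar4/g' \
        -e 's/vmnic2/rednectar1/g' \
        -e 's/vmnic3/rednectar5/g' \
        -e 's/vmnic4/rednectar2/g' \
        -e 's/vmnic5/rednectar6/g' \
        -e 's/vmnic6/rednectar3/g' \
        -e 's/rednectar/vmnic/g'
}

# Sample input only: two of the M4 pairs from the table above.
printf '%s\n' 'mgmt: vmnic0 vmnic1' 'storage: vmnic2 vmnic3' | remap_vmnics
```

The two sample lines come back as "mgmt: vmnic0 vmnic4" and "storage: vmnic1 vmnic5", which matches the M5 column of the table. Because every first-pass replacement lands on the placeholder word rather than on vmnic, none of the later substitutions can touch a name that has already been changed.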

Now that should take care of the bug – but there is one more annoying flaw with the script that I’d like to clean up too.  And that is the fact that if you run the script without using the --validate option, it still asks you if you want to run a health check – BUT THEN DOESN’T DO THE MTU check.

So, to make the script ship-shape, add one more change to remove the logic that skips the test if the --validate argument was not specified:

admin@hxscvm1:~$ sed -i 's/and args.validate//' hx_post_install

Great, but you’ll also want to know you are running a version of the script that has been updated, so finish with:

admin@hxscvm1:~$ sed -i "s/Select post_install/***RedNectar's Updated hx_post_install script M5 modifications have been applied.***\\\nSelect post_install/" hx_post_install
admin@hxscvm1:~$ sed -i 's/SCRIPT_VERSION = "4.0"/SCRIPT_VERSION = "4.1 RedNectar"/' hx_post_install
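Before running anything, you may want to confirm that the copy in your current directory really did pick up the changes. The check below is a sketch of my own, not a Cisco tool, and the function name is_patched is hypothetical. It assumes the version line was changed by the last sed command above, and simply looks for the new version string in the file.

```shell
#!/bin/sh
# Hypothetical check (my own helper, not a Cisco tool): succeeds only if the
# given file carries the version string written by the last sed command.
is_patched() {
    grep -q 'SCRIPT_VERSION = "4.1 RedNectar"' "$1" 2>/dev/null
}

if is_patched ./hx_post_install; then
    echo "./hx_post_install has been modified"
else
    echo "./hx_post_install is missing or unmodified (was the version line 4.0?)"
fi
```

If the second message appears, the most likely causes are being in the wrong directory or having a script whose version line did not read exactly SCRIPT_VERSION = "4.0" to begin with.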

And you are ready to run, BUT you’ll need to be careful that you run the copy that you’ve just edited, so in the same directory, instead of issuing the command hx_post_install, you’ll need to put the location path (i.e. ./) as part of the command – so enter:

admin@hxscvm1:~$ ./hx_post_install
***RedNectar's Updated hx_post_install script M5 modifications have been applied.***
Select post_install workflow-
 1. New/Existing Cluster
 2. Expanded Cluster (for non-edge clusters)
 3. Generate Certificate
 Note: Workflow No.3 is mandatory to have unique SSL certificate in the cluster. By Generating this certificate, it will replace your current certificate. If you're performing cluster expansion, then this option is not required.

And of course, from now on you can just use the modified script by typing ./hx_post_install at the admin@hxscvm1:~$ prompt.

WARNING: If you started your session to the cluster IP address, then you need to remember which controller VM actually serviced your session, and make sure you have a session with the same controller VM before you try the ./hx_post_install version of the command.

Happy HX Installing




Webex multi-screen support – where is it Cisco?

This is a reprint (with pictures) of an idea I submitted to Cisco – please support and vote for it after clicking this link.

Many Webex users have multiple screens, yet Webex fails to make use of this beyond the ability to share one of those screens – at least in Webex (Teams) and Webex Meetings – last time I checked in the obsolete Webex Training not even that was available.

The takeaway

I’d like Cisco to move to a default two window model when screen sharing is active. For the presenter, one “window” would be the screen being shared. And ALL the pesky panels in a SINGLE window that can be managed as a single unit and remember where it lives when screen sharing stops. For the viewer, one window for the screen being shared and one for the collection of other panels.

In this discussion, I am writing from the Webex Meetings experience, but the ideas are probably applicable to other variations. I’m also writing from the point of view of a macOS user – there may be some variations to Webex behaviour in other versions. Now there are MANY ways and instances where this could be implemented, but I wish to first make the distinction between a Presenter who is sharing a screen, and a Participant, who is juggling trying to view that screen while keeping track of chats, Q&A etc.

The Presenter – the person SHARING a screen.

For the Presenter, when I share my screen, I need the option to create a panel window – or the option to NOT use a panel window and put up with what we have now – floating windows covering your shared screen until you move them. (I have 3 screens, some colleagues have more.)

This is my shared screen. I want to move ALL those overlay panels into a single window

The panel window should show ALL the other panels: the participants’ video feeds, the chat, the Q&A etc. ALL in a single window that can be maximised (or not) and NOT appear over the top of every window in every space. Currently, if I open say the chat window and move it to my second screen, it sits in front of all other content on that screen, and EVEN WHEN I SWAP TO ANOTHER SPACE it STILL sits on top of the windows on THAT other space (Windows users may not understand spaces, but macOS users will).

My second screen cluttered with multiple pesky panels

If I wanted that window to be in all spaces, I’d CHOOSE to make that window available on all desktops!

So please don’t force your screen onto every desktop unless I choose!

But I digress – back to the proposed panel window

This Panel Window should remember its settings, so if the presenter STOPs sharing a screen, but later resumes sharing, the Panel Window should remember how it was set up last time. I envisage that the panel window would have many options for showing, hiding, focusing on speakers etc.

So to recap – I’d like all my floating panels in ONE SINGLE window, and only appear on one desktop (unless I choose to show on all desktops). Something like this:

This is my mock-up of how a second window might look.

This is a mock-up of how I MIGHT arrange the panels on my second screen. I’d envisage that the second screen would be something like the current Participant’s screen but without the shared screen.

AND I’d like Webex (Meetings) to remember this layout should I stop sharing and then start sharing again.

If another person is talking, I’d particularly like that person’s image (or video if they are using it) to dominate (and show their name) something like above (where I’ve had to ADD the name under the picture – I want the name there even if the video is on)

The Participant – the person viewing the shared screen

Now Cisco has made some great improvements with the experience for the viewer in terms of the options for layouts. But still no support for a second screen.

Why can’t a participant move the presenter’s shared screen to another monitor?

Why can’t a participant move the bit that is being shared to another screen, and have Webex support two windows, like what I’ve described above for the presenter?

Cisco – please improve your support for multi-screen layouts. We have moved to a world of working from home where MANY people have to put up with this day in and day out. It is so frustrating being forced to use Webex when there are so many limitations.



Cisco has re-vamped their ACI Docs pages. Here’s what I think.

If you have upgraded your ACI to version 5.1 or the recently released 5.2, you’ll notice a big change if you should ever venture to that rather obscure menu item Settings > Documentation > API Documentation.

Settings > Documentation > API Documentation

What’s the difference?

What you used to get was a basic but wholesome view of the Cisco APIC Management Information Reference Model, with a list of all the Classes, Types, Events, Faults etc listed on the left side, with clickable links associated with each Class etc.

Clicking on one of the links opened a Pandora’s Box of information in the viewing pane. One of my favourites is fv:Tenant

Old APIC Management Information Model Reference

I’ve taken a bit of an extreme here – fv:Tenant is probably the largest class by far of the whole ACI Object model. For fun, I copied the information (text only) of the information pane above and pasted it in MS Word. I now have a 4140 page (556818 words) document that I can browse!

But if you knew how to navigate the page (there are a few handy shortcuts at the top) and use your browser’s find function, you could generally find what you wanted. Although, I must admit, the shortcuts at the top of the page may not work until the thousands of lines of content have loaded – which may be many seconds.

The new opening screen now presents a very fine search function – I had only to type the letters fvte before the search had located the fvTenant object. Happy to see the search is NOT case sensitive.

New APIC Object Model Documentation

But there are a couple of other subtle improvements too. There is a toggle on the right-hand side that (by default) restricts your search to configurable objects. And you can only search all Objects or Faults via tabs at the top left-hand side. I think everyone always used the All group in the old system, so I’m happy with this improvement.

Having found my object, clicking on it opens a sub-window on the right-hand side, which has a second link that I must click to actually see the information I need.

I have to click a second time to get any useful information

At first, I was annoyed at having to click twice, [edit:2021.06.19 Turns out you can just double-click the name] but in other contexts where you have a list of objects, you’ll find the information window stays open as you click on each object, making it quite a useful feature. However, it does reveal the absurdity of some of the object descriptions which were probably cooked up in a hurry for release 1.0. For instance, the description for a Tenant object includes this:

For example, you can create a tenant with contexts and bridge domains shared by other tenants.

Oh really? Good luck trying to do that!!!!!

Tip for Cisco: Time to review the object descriptions in the Object Model.

Anyway, back at the fvTenant object, the old Pandora’s Box of information is still mostly there (I sadly miss the old Diagram section), some of it less clear than before, some of it more clear.

For instance, the old system had a great section on Naming Rules – nicely formatted with links to other name formats (you can see them underlined in the picture below). As you can see on the right, the new format is a) not formatted at all, and b) is missing the links.

Old and new Naming Rules styles

Tip for Cisco: Keep the old style – coders like to see indents and good mono-spaced fonts, and coloured text always helps. Take a look at any coder’s editor for goodness sake! And keep the links.

One of the great features of the Old Style listing of the Naming Rules is that I could click on the word name above, and I’d be sent to the part of the page that shows the rules for tenant name, again well formatted and very clear to read.

Name object in the old MIM

From here I could easily see that a Tenant name has a maximum length of 63 characters and consists of only upper and lowercase letters, digits and the characters underscore, period and dash.
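That rule is compact enough to express as a pattern, which is handy if you are generating tenant names in a script. The sketch below is based only on the rule as I’ve described it above, not on anything published by Cisco: the function name is_valid_tenant_name is hypothetical, and the minimum length of one character is my own assumption.

```shell
#!/bin/sh
# Hypothetical validator based only on the rule as described in the text:
# up to 63 characters drawn from letters, digits, underscore, period and dash.
# The lower bound of 1 character is an assumption.
is_valid_tenant_name() {
    printf '%s' "$1" | grep -Eq '^[A-Za-z0-9_.-]{1,63}$'
}

for name in 'Prod_Tenant-01' 'bad name' ''; do
    if is_valid_tenant_name "$name"; then
        echo "ok:       '$name'"
    else
        echo "rejected: '$name'"
    fi
done
```

The first name is accepted and the other two are rejected, one because it contains a space and the other because it is empty. The APIC itself remains the authority on what it will accept, so treat this as a pre-check rather than a guarantee.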

The good news is that the same information is not too hard to find in the new system either. With the fv:Tenant object still opened, I have several tabs I can navigate. The first one past the default Overview tab is the Properties tab and clicking on name gives me the same information. Not as succinctly, or as neatly as above, but there all the same:

Easy to find the Validator information in the Properties tab

And I really do have to call Cisco out on the choice of font again here. On the screen above I see the word:

WTF is lId?

Now – if any human can tell if the last 3 letters are double-l-d or double-I-d or Ild or lId then good for you – sure, the context probably reveals it, but this is a reference document.

Tip for Cisco: Stick with non-ambiguous fonts designed for coders when specifying names. It really does make a difference.

I do have one gripe with the Properties table in the new system. The old system also gave me a list of Constants that will be used – and I can’t find this list in the new system.

And this is important – without this information it would not have been possible to work out why a filter for TCP port 22 suddenly started allowing ALL traffic through! You can read about that disaster here.

Moving across the tabs, the Relationships tab has some key information right there, and this time with clickable links to the related object.

Relationships Tab

This is much more consumable than the older system, which did have the same information right under the diagram, but with the Relations separated from their corresponding object as shown above.

Relationships and MO Containers in the old view

In the new system, Managed Object (MO) containments get their own tab – and again, much more consumable, and with the list of Managed Objects shown in a more manageable (still almost never-ending) vertical list, but really, why someone decided to change MO (Managed Object) to Mo (a state in the USA) I can’t understand!

Containment Tab

The remaining tabs (Faults, Events and Stats) are also presented slightly more nicely than the older version. In Events, for instance, the event Code is shown, whereas on the older version, you had to click the hyperlink on the event to discover the Event ID.

So what else is missing?

The most obvious omission in the new system is the massive diagram that accompanied the Object definition in the old system. For the sake of brevity, I’ve chosen one of the more manageable objects. Note that for each box in the diagram there is a clickable link under the diagram. The new system has the same MO information, but NOT the visual representation.

Old MIM Diagram for infra:AccPortGrp

The other sections that are missing (apart from the Constants mentioned above) are probably not as important. Those that need to delve deeper into the programming side of ACI may disagree, but the old system also had sections for Containers Hierarchies, Contained Hierarchy, and Inheritance. In some cases (such as fvTenant above), the Contained Hierarchy list was thousands of entries.

My verdict?

I could find more missing pieces if I dug deeper, but I think I’ve covered the major items. But at the end of the day what Cisco has done is given us not just a prettier version, but in many cases more useable too. There are some important pieces missing, but I hope they will be added back in a future update.

Key advantages of the new UI

  • Ability to filter on Configurable Only
  • The search function is fast. Schmick!
  • Tabbed interface is much neater and manageable than the old huge-html-page approach
  • The pop-up window that appears when an object is clicked makes it easy to quickly browse through many objects/attributes and see the contained information.

Key disadvantages of the new UI

  • Lack of attention to detail when it comes to presenting programming information. There are many typefaces/fonts designed specifically for programming, Cisco should use one of them.
    • Another example of the lack of attention to detail is the sudden translation of MO to Mo – it does make a difference. There could well be other examples too.
    • The CONSTANTS section needs to be shown for each relevant attribute
    • I’d like to see the Diagram section return, but I must admit I rarely used it.



ACI and the HyperFlex Hiccup Cure

In my previous post, I explained how to regain access to a HyperFlex controller when ACI fails to update the IP to MAC mappings in the endpoint table by enabling the IP Aging option.

In this post I’ll show you how I reduced that failover time to about one minute.

To see if I could reduce the failover time, I turned to one of the best documents Cisco has ever produced for ACI – the ACI Fabric Endpoint Learning White Paper

And sure enough, I found that:

First-generation leaf switches cannot reflect IP address movement between two MAC addresses on the same interface with the same VLAN to the endpoint database. This sort of IP address movement may occur in a high-availability failover scenario in which GARP typically is used to update IP to MAC relation on upstream network devices. This behavior is resolved by enabling the GARP-based EP Move Detection option

And since my HyperFlex nodes are indeed connected to 1st generation ACI N9K-C9336PQ switches, this is exactly what I tried next:

GARP EP Move detection

The curious thing about this option is that it appears under the L3 Configurations tab but ONLY if ARP Flooding is enabled under the General tab.

Time to set up a test to see how much faster the failover is with the GARP Based Detection option enabled for the Bridge Domain

Test Plan

For the record, my test platform is running HyperFlex Data Platform v4.0(2d) (the current recommended latest version) and connected to ACI N9K-C9336PQ switches running v14.2(4i) . The APIC is running v4.2(5n).

Recall, my physical setup is like this:


As I write this, the SCVM that has taken on the management IP address is the one with MAC address 00:0C:29:90:F4:70.

apic1# show endpoints | grep "172\.16\.19\.3[0-3]"
 00:0C:29:82:4F:B2    learned       101    eth1/32   vlan-119   not-applicable
 00:0C:29:90:F4:70    learned       101    eth1/32   vlan-119   not-applicable
 00:0C:29:90:F4:70    learned       101    eth1/32   vlan-119   not-applicable
 00:0C:29:A9:B7:0D    learned       101    eth1/32   vlan-119   not-applicable

Armed with the information that the MAC address bound to the Mgmt IP address is shared (the SCVM IP on ESXi host #1), my plan is to put ESXi host #1 into HX Maintenance Mode to force the election of another Mgmt SCVM and measure how long my Mgmt PC loses connectivity to the Mgmt IP address.

To do this I have set up:

  • A continuous ping from my mgmt PC to – I’m using PowerPing to do this so I get timestamps
  • tcpdump sessions on the SCVMs capturing only ARP packets so I can see the Gratuitous ARP requests and replies.
  • an endless loop issuing the command vsh_lc -c "show system internal epmc endpoint ip" on the ACI APIC
    • The purpose of this command was to see when ACI’s COOP database was updated to show a different second IP address on the same host as
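For the looping command, something along the following lines does the job. It is a sketch rather than the exact loop I ran: the function name poll_with_timestamps is hypothetical, the loop is bounded by a count instead of running forever so that it can be tried safely, and an echo stands in for the real command so that it runs anywhere.

```shell
#!/bin/sh
# Hypothetical wrapper (names are my own): run a command repeatedly and
# prefix each run with a timestamp so it can be lined up with the pings.
# Usage: poll_with_timestamps COUNT INTERVAL_SECONDS COMMAND [ARGS...]
poll_with_timestamps() {
    count=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "=== $(date '+%H:%M:%S') ==="
        "$@"
        i=$((i + 1))
        sleep "$interval"
    done
}

# Stand-in command so the sketch runs anywhere; in the test the command was
#   vsh_lc -c "show system internal epmc endpoint ip"
poll_with_timestamps 3 1 echo "endpoint table snapshot"
```

Printing the time before each run is the important part, because it is what lets the endpoint output be matched against the ping and tcpdump timestamps afterwards.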

What I expected to happen is that once the two remaining SCVMs discover that has failed, they will elect another SCVM to host the address, and that VM will send gratuitous ARP requests to ensure ACI updates its endpoint table and my management PC will be able to gain access to the management IP again.

Test Results

Here’s the timeline of what happened. It wasn’t quite what I expected.

Time Action
14:18:40 Initiate HyperFlex Maintenance Mode for ESXi Host#1
14:19:32 SCVM#1 answers ARP request for from so is still online
14:19:33 SCVM#1 answers ARP request for from so is still online
14:19:44 Last ping reply received from on the Mgmt station, indicating HX Mgmt IP is offline from this point
14:19:56 SCVM #3 starts sending continuous ARPs for to FF:FF:FF:FF:FF:FF
14:20:17 SCVM #2 also starts sending continuous ARPs for to FF:FF:FF:FF:FF:FF
14:20:39 SCVM #2 starts replying to ARPs for , first to a specific MAC address, then…
14:20:40 SCVM #2 starts sending continuous ARP replies for to FF:FF:FF:FF:FF:FF
14:20:40 COOP database starts showing endpoint is now shared with indicating that the leaf switch has updated the COOP database on receipt of the ARP reply to FF:FF:FF:FF:FF:FF
14:20:41 Mgmt Station gets replies from
14:20:59 SCVM #2 starts sending GARP requests to/from to destination MAC FF:FF:FF:FF:FF:FF

What I expected would have happened is that the GARP requests would have been sent at about 14:20:40 – rather than a string of ARP replies. However, it seems the ARP replies had the same effect.

Total failover time based on last ping reply received from SCVM#1 to first reply from SCVM#2: 14:20:41-14:19:44=00:57 – just under one minute, which is far better than the 12 minutes I achieved last time.
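The subtraction is easy to check, and the same helper is useful for measuring other gaps in the timeline. This is a small sketch of my own: the function name elapsed_seconds is hypothetical, and it assumes both timestamps fall on the same day.

```shell
#!/bin/sh
# Hypothetical helper: seconds between two HH:MM:SS timestamps on the same day.
elapsed_seconds() {
    echo "$1 $2" | awk -F'[: ]' '{ print ($4*3600 + $5*60 + $6) - ($1*3600 + $2*60 + $3) }'
}

elapsed_seconds 14:19:44 14:20:41   # last ping reply -> first reply after failover
elapsed_seconds 14:18:40 14:20:41   # maintenance mode initiated -> first reply
```

The first call prints 57, the outage as seen from the management station. The second prints 121, the time from initiating HX Maintenance Mode to connectivity being restored.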


  • ACI treats Gratuitous ARP replies just as you would expect GARP requests to be treated – in other words, ACI learns L2/L3 info from ARP replies sent to MAC FF:FF:FF:FF:FF:FF.
  • In ACI, by enabling
    • IP Aging in System Settings > Endpoint Controls, and…
    • …in the ACI BD where 1st generation switches are used
      • ARP Broadcasting, and
      • GARP based detection for EP Move Detection Mode
  • HyperFlex management IP address failover when used in conjunction with ACI can be reduced to approximately one minute.



While preparing to write this, I recorded my steps – it’s on YouTube but the transition to YouTube quality makes it almost impossible to see clearly. But if you have 7 mins to spare (Tip: play it back at double speed and on a 34″ monitor if you have one) the link is here: https://youtu.be/OxCEOAyKcSw


ACI and the HyperFlex Hiccup

Darn! We’ve lost connectivity to the Cisco HyperFlex Controller AGAIN. What could possibly be wrong?

Well, the problem relates to how Cisco HyperFlex uses a floating IP address across multiple Storage Controller VM MAC addresses, and how ACI maintains the IP to MAC address table.

Let’s start with the physical picture.


The focus is on the three Storage Controller VMs (SCVMs) in the above picture.  The heavy purple lines show the resolved path to the default gateway. Each SCVM has its own MAC address and ONE of the SCVMs shares that MAC address with the HyperFlex Management IP address (

So the ACI fabric sees the IP to MAC resolution like this:

apic1# show endpoints | grep "172\.16\.19\.3[0-3]"
 00:0C:29:82:4F:B2    learned       101    eth1/32   vlan-119   not-applicable
 00:0C:29:90:F4:70    learned       101    eth1/32   vlan-119   not-applicable
 00:0C:29:A9:B7:0D    learned       101    eth1/32   vlan-119   not-applicable
 00:0C:29:A9:B7:0D    learned       101    eth1/32   vlan-119   not-applicable

Note how as far as ACI is concerned, the MAC address 00:0C:29:A9:B7:0D is shared by both and

The problem we are having has been caused by the fact that the floating HyperFlex Management IP address has actually “floated” to another node (  Any traffic that needs to go to now needs to go to MAC 00:0C:29:90:F4:70.  But ACI hasn’t learned this, and never will unless it sees a packet FROM sourced with MAC 00:0C:29:90:F4:70.

Packets from my management PC ( addressed to reach ACI, and ACI routes them to the correct subnet, but sends them to MAC 00:0C:29:A9:B7:0D. Here’s a few ICMP packets I captured on the 00:0C:29:A9:B7:0D host that prove this.

root@hxscvm2:~# tcpdump -i eth0 -n icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:48:16.187163 IP > ICMP echo request, id 1, seq 26, length 40
11:48:20.959581 IP > ICMP echo request, id 1, seq 27, length 40

Now let’s be clear – this is a problem caused by HyperFlex using multiple MAC addresses for a single IP address more than it is that ACI won’t forget the old entry.  The same problem could occur if a normal router were used in a conventional topology like this.


The difference here is that on the router, there would be an ARP cache that would hold the mapping of IP to MAC 00:0C:29:A9:B7:0D that would time out after some time – typically 4 hours.  But ACI doesn’t do anything like this by default.  So long as the MAC stays alive, and it will so long as keeps sending packets, ACI keeps the entry. By default.  But there is a fix. Kind of.

In ACI, there is an option for IP Aging created specifically for this kind of scenario.  To configure IP Aging (and I consider it best practice to ALWAYS enable IP Aging) you need to navigate to the System > System Settings > Endpoint Controls > IP Aging tab.

ACI Config IP_Aging

Once IP Aging has been enabled, as explained in the ACI Fabric Endpoint Learning White Paper,

“IP aging policy tracks and ages unused IP addresses on an endpoint. Tracking is performed by using the endpoint retention policy, which is configured for the bridge domain to send ARP requests (for IPv4) and neighbor solicitations (for IPv6) at 75 percent of the local endpoint aging interval. When no response is received from an IP address that IP address is aged out.”

In our case, the default endpoint retention policy was in use, so the aging time was at 15 minutes. And sure enough, 12 minutes (≅ 75% of 15 mins) after enabling the IP Aging option, the SCVM currently hosting the floating IP received an ARP request from the default gateway IP:

12:09:45.833118 ARP, Request who-has (ff:ff:ff:ff:ff:ff) tell, length 46
12:09:45.833134 ARP, Reply is-at 00:0c:29:90:f4:70, length 28
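The 75 percent figure in the white paper quote is worth working through, because it tells you how long to expect to wait after a failover. The sketch below uses only the 15-minute aging time mentioned above; the function name probe_delay_seconds is hypothetical, and I am assuming the probe timer runs from the point the entry was last refreshed.

```shell
#!/bin/sh
# Hypothetical helper: when the ARP probe is due, given the endpoint aging
# interval in seconds and the 75 percent figure from the white paper quote.
probe_delay_seconds() {
    echo $(( $1 * 75 / 100 ))
}

delay=$(probe_delay_seconds 900)          # 15 minutes = 900 seconds
echo "$delay seconds = $((delay / 60)) min $((delay % 60)) s"
```

This prints "675 seconds = 11 min 15 s", which is in line with the roughly 12 minutes I observed. A shorter endpoint retention interval on the bridge domain would shorten the wait in the same proportion.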

And so Happy HyperFlex days were here again from this point onwards.  I was able to access the HX Management IP address ( from my management PC.

BTW – if you are experiencing this problem and you don’t want to wait the 12 minutes for the IP to be re-mapped by ACI, you can issue the following command at the APIC CLI to clear the IP immediately:

apic1# fabric leaf_id clear system internal epm endpoint key vrf vrf:name ip

I have done this a few times in the past because in our lab environment where we do unusual things all the time, this is a common occurrence.  Today I decided to work out exactly what was going on.

Floating IP addresses are used in a number of load-balancing situations.  In some cases, like VRRP, a special virtual MAC address is assigned to the IP and the MAC floats along with the IP.

What I haven’t explored yet is what exactly goes on when a new SCVM takes on the floating IP address.  If best practices are followed, the new SCVM SHOULD send a gratuitous ARP request using its new MAC address – in which case both the traditional router scenario AND the ACI topology should respond by updating their mappings.  If this did indeed happen, then clearly (in our ACI setup anyway) ACI is not updating its mapping as it should.

I’ll explore this further in my next post!




Rednectar’s Rules for writing Lab Guides

I wrote these as a guide for lab writers whose work I get to review, in the context of writing lab guides using the frustrating word processor known as Microsoft Word. They are meant to be a set of instructions for writers to follow BEFORE passing their work on to me.

Before saving ready to be check-formatted, take these simple steps

Page breaks, paragraphs, tabs and spaces


    1. Remove all page breaks. Page breaks are determined by grouping paragraphs together that need to stick together by using the “Keep with next” paragraph attribute.  [A “keep with previous” would be SO much better… please upvote this https://word.uservoice.com/forums/304924-word-for-windows-desktop-application/suggestions/33552385-keep-with-previous]
      1. This will save me from having to do my first task in every review, which is to search and replace all instances of page breaks with nothing.
      2. And while on the topic – make sure you apply “Keep with Next” to every cell in a table EXCEPT the last row.  [A “keep with previous” would be SO much better…]
    2. Remove all empty paragraphs.  Spacing between paragraphs is determined by the style. If you don’t like the amount of space between paragraphs, let me know which style you’d like to change. Remember that this will change ALL paragraphs of that style, that’s why we use styles. I reserve the right NOT to agree.
      1. This will save me from having to do my second task in every review, which is to search and replace all instances of two CRs with one CR
    3. Remove all double spaces except after full stops. Use <tab>s to space items if necessary, or create a table.



    1. In general, every Graphic is to be either:
      • Placed inline, so that text flows around it, something like
        Press the gearicon icon; or
      • Given an entire paragraph to itself, like those above.  If you have two graphics that have to go one after the other or side by side, find a graphics program like Preview and combine the two graphics into one.  Don’t paste them as separate graphics and expect that they will stay side by side (they won’t) or on the same page (they might if you are lucky.)
      • In the paragraph where the graphic lives, don’t add any tabs or spaces.
    2. Do NOT use MS Word shapes, or if you do, they follow the same rules as graphics. One per paragraph.  If your graphic needs a circle or arrow super-imposed, use a graphics program to compose it, and paste the picture. Powerpoint is a convenient choice if you love the MS style shapes so much that you have to use them. Preview also does a good job.
      • If you DO use MS graphic shapes, there is no guarantee that they will appear on the page you meant them to be on. That’s just life with MS Word.

Other rules

  • We click buttons – we don’t press them or push them or “go to” them.
  • We don’t “go to” menus or tabs.  We navigate menus and click on tabs or select tabs. You can select menu items too. Using “Navigate to” combines a “Navigate” plus a “Select”
  • Every Step MUST require the user to take an action.  The following is NOT a step.

Step 1: The GET request failed because the API Key has not been added

  • The following IS a step

Step 1: Observe that the GET request failed because the API Key has not been added

  • We check boxes, we don’t tick them. Sometimes we clear or (ugh) uncheck them. We never untick them.  If you must use the word tick, make sure you are referring to a small insect. Oh, and when a box is checked or cleared, it is to be accompanied by a little symbol indicating this:

This checkbox is checked: 

This checkbox has been cleared: 

I’ll update this document if I think of any more!



Posted in Microsoft, Microsoft Word, MS Word Tips | Comments Off on Rednectar’s Rules for writing Lab Guides

How to schedule Hyperflex Scheduled Snapshots

If you have been using Cisco’s Hyperflex and have used the Flash/Flex vCenter Plugin to create a schedule to take snapshots, then when you upgrade your Hyperflex plugin, you’ll find that the Schedule Snapshots option has GONE.

This article shows you how you can still schedule snapshots even if you are using the HTML5 plugin.

The secret is that you don’t need the plugin. All you need to do is make sure you take ONE snapshot using Hyperflex Connect or the HTML5 plugin. This will ensure that the all-important SENTINEL snapshot is taken, which ensures that all future snapshots will be Hyperflex Native snapshots rather than the old VMware REDO type snapshots.

Step 1: Take a snapshot using the HTML5 plugin.

Right-click on your VM, select Cisco Hyperflex (way down the bottom) and choose Snapshot now.

Step 2: Validate the SENTINEL snapshot

If the snapshot worked, you will see a snapshot called SENTINEL when you manage snapshots in vCenter.

The existence of the SENTINEL snapshot validates that you have a Hyperflex Native snapshot, and this will prevent VMware from even being able to create a REDO snapshot.
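If you would rather check for the snapshot from a script than by eye, the snapshot tree can be walked recursively. The sketch below assumes objects shaped like the pyVmomi snapshot tree (vm.snapshot.rootSnapshotList, where each node has a name and a childSnapshotList). The function name has_snapshot and the demo data are my own, and the snapshot name is passed in so that you can match whatever spelling your vCenter shows.

```python
from types import SimpleNamespace

def has_snapshot(snapshot_trees, wanted):
    """Return True if any snapshot in the tree, at any depth, has the wanted name."""
    for node in snapshot_trees or []:
        if node.name == wanted:
            return True
        # Recurse into child snapshots; a missing or empty child list is fine.
        if has_snapshot(getattr(node, "childSnapshotList", None), wanted):
            return True
    return False

# Stand-in data for illustration only, shaped like vm.snapshot.rootSnapshotList.
demo_tree = [
    SimpleNamespace(
        name="SENTINEL",
        childSnapshotList=[SimpleNamespace(name="Nightly", childSnapshotList=[])],
    )
]
print(has_snapshot(demo_tree, "SENTINEL"))
```

Against a live vCenter you would pass vm.snapshot.rootSnapshotList instead of demo_tree, after checking that vm.snapshot is not None (it is None for a VM with no snapshots at all).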

Step 3: Create your Schedule

In vCenter, locate your VM in Hosts and Clusters, click on the Configure tab, then click Scheduled Tasks, click NEW SCHEDULED TASK and finally Take Snapshot.

That’s pretty much it – you’ll just need to give the task a name and set the schedule for when you want to run it!

Happy Scheduling


Posted in GNS3 WorkBench | Comments Off on How to schedule Hyperflex Scheduled Snapshots

New APIC ACI 5.1 firmware – Cisco have gone colloquial

Cisco released new firmware for ACI on 22 Oct 2020. I was in the middle of having a problem with upgrading a lab from 4.2 to 5.0 when I read that one of the enhancements in 5.1 was:

Enhancements to the upgrade process through the GUI when upgrading the APIC or switch software

so I thought I’d give 5.1 a shot.

Now the first worrying sign that I noticed with the UI for the upgrade process is that it looks much more like the super-unfriendly GUI of the ACI MSO (Multi Site Orchestrator)

but having said that, it turned out to be less confusing than MSO in this instance, and the upgrade process worked. But I really do have to wonder at the colloquial language used in the GUI – is this some kind of attempt to appeal to the masses by Cisco? If so, they are only making it harder for all those customers that don’t come from an English speaking background (or should I say North American speaking background) to understand the application. The dialogue that particularly annoyed me was this one:

Now not too many users will be jumping to get their watch out (many users don’t even use a watch), and the exclamation Watch Out is pretty universal, but why change from the traditional Caution text that is usually associated with a dialogue of this type?

So I can accept the Watch Out colloquialism, but Continue Anyways is NOT acceptable. This is an absolute affirmation that Cisco only cares about North American users and not the rest of the world. It is even worse than referring to events occurring in the summer or some other season with absolute arrogance – as if the southern hemisphere didn’t exist. Although that instance is probably ignorance of the fact that there IS a southern hemisphere.

And I suspect that in this case it is another case of ignorance and the usual lack of quality control in the ACI user interface. (Don’t let me start on the number of inconsistent namings used in the ACI GUI…)


Posted in ACI, Cisco, rant | 1 Comment

Making the most of ACI when routing between tenants via a Firewall

An approach often used when migrating traditional IP Subnet based networks to ACI is to isolate security zones into different tenants (or VRFs), and re-deploy existing Firewalls between the tenants (or VRFs).

In this post, I’ll show you how you can enhance your ACI migration by using some ACI features that are practically impossible to implement on L3 Firewalls.

The approach is identical irrespective of whether you isolate between ACI tenants or VRFs because the isolation is at the VRF level.  But for my example I’ll use two tenants.  To allow communications between the tenants using an external Firewall or Router to apply policy between the tenants, I have a couple of different options.

  1. Use Policy Based Redirect to send traffic to a Firewall
  2. Use a L3Out to connect to the Firewall

And there would be several variations of the above, even a combined approach.  But for the purpose of this post, I’m going to use a L3Out to connect each interface of an external router to the tenant’s VRF.


First of all, the ground rules.

I have two tenants. TenantA and TenantB. Each has one VRF called A1_VRF and B1_VRF respectively.  Each Tenant has one Bridge Domain (A1_BD and B1_BD) each with one subnet. Each tenant has one Application Profile (A1_AP and B1_AP) and each Application Profile has two EPGs, TenantA has A1_EPG and A2_EPG, and (you guessed it) TenantB has B1_EPG and B2_EPG.


To connect to the firewall/router, each tenant has an L3Out configured and linked to the VRF. Each L3Out has an External EPG to allow traffic based on the IP addressing of the other tenant. So let’s add the L3Outs to the picture, and a little of the physical picture too.


Policy Scope

You might think that since you have a firewall in the picture, there is no need for policy. But that is not true in this scenario. You will still need ACI contracts in place to allow the endpoints in each tenant to communicate to the local firewall interface.  Of course the no-brainer approach would be to allow all traffic to and from each VRF to go to the firewall. You could use the vzAny construct in the VRF and the default contract in the common tenant to do this.

No-brainer vzAny approach


But this approach also means that you have opened up communication between every EPG in the tenant – so A1_EPG can communicate with A2_EPG. Ditto for B1_EPG and B2_EPG. And if this is what you want to do forever, then go ahead and take this approach.

But what if you now want to fine-tune your policy so that TenantB can access only the servers in A1_EPG?  Or perhaps allow only B2_EPG servers access to the services in TenantA?

Because your firewall is IP based, and TenantA servers are all in the same subnet – and the same for TenantB servers – your IP based Firewall is useless to you.

This is where you can leverage the ACI EPG construct to implement policy that is impossible to implement on L3 Firewalls.

To finalise the scope of my example, I’ll scope the policy to say that all TenantB servers can access only the servers in A1_EPG. TenantA servers do not need to initiate any TCP connections to TenantB servers.

Since A1_EPG and A2_EPG are sharing a subnet, this restriction is too hard for a Firewall, so I’ll use ACI features to implement this.


I’m going to use L3Outs in each Tenant. The L3Out will be configured to use OSPF, so the assumption is that the External Firewall is also going to dynamically learn the routes from each tenant and advertise them to the other tenant.  I could have used static routes or a different routing protocol, but my lab is already set up with OSPF, so OSPF it is.

Since the firewall is still applying policy, I won’t complicate things by filtering traffic to and from the firewall, but I will create a new contract in each tenant to be used specifically for traffic to and from the firewall. This avoids any confusion later if another EPG is configured to use the common tenant’s default contract.


Note the following:

In TenantA, only A1_EPG is providing the contract, but since the contract uses the default filter (from the common tenant – which allows all traffic) the concepts of “Provide” and “Consume” have little meaning.

In TenantB, since both EPGs are able to consume any services that get past the firewall, the contract is consumed by the vzAny construct of the VRF. If the policy should change in the future, say to restrict consumption to just one of the EPGs, the contract could easily be changed to allow this.

Another consequence of having the vzAny construct of the VRF only consuming the contract (and NOT providing it) is that B1_EPG and B2_EPG are not able to communicate without a separate contract.  Should you wish these EPGs to be able to communicate freely, you could configure the vzAny construct of the VRF to provide the contract as well, or use a new contract.

In TenantA, the External EPG (BSnet_L3EPG) is based on the IP addressing of TenantB, and vice versa for the External EPG in TenantB (ASnet_L3EPG).

Although the contract in both Tenants is exactly the same, and yes, I could have configured it in the common tenant or even used the common tenant’s default contract, I wanted to emphasise in the naming and in the implementation that the contract was designed to allow the consumption of services from A1_EPG in TenantA – hence the contract is named A1EPG.Services_Ct in both tenants.


To complete the story, I’ll add the steps required to do the L3Out configuration. I’ll assume you already have your tenants, VRFs, BDs, application profiles and EPGs configured. Similarly, I’ll assume you have created the Access Policies required to connect your external router/firewall.

My lab has the following subnets configured, and these will be used in the example:

TenantA Router ID: 10.211.0.201
TenantA L3Out SVI: 10.211.1.201
TenantB Router ID: 10.212.0.201
TenantB L3Out SVI: 10.212.1.201

Where the configuration is the same for both TenantA and TenantB, I’ll refer to the tenant as TenantA|B. Similarly for other names like A|B1_BD for A1_BD and B1_BD.  In the following, the >+ sequence when following the menu path means right-click.  >> is used to indicate the end of navigating the horizontal sub-menu and begin navigating the vertical tree menu.

Task 1: Create the contract

This will be exactly the same process on each tenant – you could indeed do this once in the common tenant and use it in both TenantA and TenantB

Tenants > TenantA|B >> Contracts > Standard >+ Create Contract

  • Name: A1EPG.Services_Ct
  • Scope: VRF
  • [+] Subject
    • Name: A1EPG.Services_Subj
    • [+] Filters
      • Name: common/default
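If you prefer the REST API to the GUI, the same contract can be described as a JSON payload. The sketch below only builds and prints the payload. The class and attribute names (vzBrCP, vzSubj, vzRsSubjFiltAtt, scope "context" for VRF) are as I understand the APIC object model, so check them with the API Inspector on your version before posting anything. The function name contract_payload is my own.

```python
import json

def contract_payload(name="A1EPG.Services_Ct",
                     subject="A1EPG.Services_Subj",
                     filter_name="default"):
    """Build the contract, one subject and one filter reference as nested dicts."""
    return {
        "vzBrCP": {
            # "context" is the object-model value for the GUI's Scope: VRF.
            "attributes": {"name": name, "scope": "context"},
            "children": [{
                "vzSubj": {
                    "attributes": {"name": subject},
                    "children": [{
                        # Refers to the filter by name; "default" is expected to
                        # resolve to the common tenant's default filter.
                        "vzRsSubjFiltAtt": {
                            "attributes": {"tnVzFilterName": filter_name}
                        }
                    }],
                }
            }],
        }
    }

payload = contract_payload()
print(json.dumps(payload, indent=2))
```

The payload would be posted under the tenant (for example to /api/mo/uni/tn-TenantA.json on the APIC), once for TenantA and once for TenantB.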

Task 2: Create the L3Out

This step is the core of the config. It is a long one, and I’ll break it into sections that follow the wizard steps. By following the wizard, the Node Profiles and Interface Profiles will be named automatically, and be slightly different to my diagram above. My lab has the router connected to Leaf2201 interface 1/10 and is configured for OSPF – area 0.0.0.11 for TenantA and area 0.0.0.12 for TenantB.

Tenants > TenantA|B >> Networking > L3Outs >+ Create L3Out

Wizard Step 1. Identity
  • Name: A|B1.OSPF_L3Out
  • VRF: A|B1_VRF
  • L3 Domain: A|B_L3Dom [Recall I’ve assumed you have the access policies configured]
  • [x] OSPF
    • OSPF Area ID: 0.0.0.x [in my config, x=11 for TenantA and 12 for TenantB]
    • Regular Area
Wizard Step 2. Nodes and Interfaces
  • Interface Types
    • Layer 3: SVI
    • Layer 2: Port
  • Nodes
    • Node ID: Leaf2201
    • Router ID: 10.21x.0.201 [in my config, x=1 for TenantA and 2 for TenantB]
    • Loopback Address: 10.21x.0.201
    • Interface: 1/10 [in my lab]
    • IP Address: 10.21x.1.201 [in my config, x=1 for TenantA and 2 for TenantB]
    • MTU: 1500
    • Encap: VLAN
    • VLAN ID: 24x1 [in my config, x=1 for TenantA and 2 for TenantB]
Wizard Step 3. Protocol Associations
  • [x] Hide Policy
Wizard Step 4. External EPG
  • Name: BSnet_L3EPG [for TenantA] ASnet_L3EPG [for TenantB]
  • Provided Contract: <blank> [for TenantA] A1EPG.Services_Ct [for TenantB]
  • Consumed Contract: A1EPG.Services_Ct [for TenantA] <blank> [for TenantB]
  • Default EPG for all External Networks [ ] UNCHECKED
  • [+] Subnets:
    • IP Address: [for TenantA] [for TenantB]

Note that the subnets specify the range of EXTERNAL addresses – so TenantA specifies that TenantB’s subnets are permitted, and vice versa for TenantB

Task 3: Configure your BDs

You will need to modify your existing BDs in two ways to ensure each tenant’s subnet is advertised to the external router/firewall:

  1. Configure the subnet for each BD and check the Advertised Externally option.

Tenants > TenantA|B >> Networking > Bridge Domains > A|B1_BD > Subnets > 10.21x.11.1/24   [in my config, x=1 for TenantA and 2 for TenantB]

  • [x] Advertised Externally

  2. Link the BD to the L3Out

Tenants > TenantA|B >> Networking > Bridge Domains > A|B1_BD >| [L3 Configurations]

  • [+] Associated L3 Outs
    • A|B1.OSPF_L3Out
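The two BD changes in Task 3 can also be expressed as one REST payload per tenant. As a sketch only: fvSubnet with scope "public" is, as far as I know, the object-model equivalent of ticking Advertised Externally, and fvRsBDToOut is the link to the L3Out. The function name bd_payload is my own, and the values are the TenantA ones from my lab.

```python
import json

def bd_payload(bd, gateway, l3out):
    """Build a bridge domain payload that advertises its subnet and links to an L3Out."""
    return {
        "fvBD": {
            "attributes": {"name": bd},
            "children": [
                # scope "public" corresponds to [x] Advertised Externally.
                {"fvSubnet": {"attributes": {"ip": gateway, "scope": "public"}}},
                # Associates the BD with the L3Out by name.
                {"fvRsBDToOut": {"attributes": {"tnL3extOutName": l3out}}},
            ],
        }
    }

tenant_a_bd = bd_payload("A1_BD", "10.211.11.1/24", "A1.OSPF_L3Out")
print(json.dumps(tenant_a_bd, indent=2))
```

For TenantB the same function would be called with B1_BD, 10.212.11.1/24 and B1.OSPF_L3Out.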

Task 4: Apply the contracts

For my example, I’m allowing all of TenantB to use the services from TenantA but only A1_EPG is providing the service.

So, for TenantA

Tenants > TenantA >> Application Profiles > A1_AP > Application EPGs > A1_EPG > Contracts >+ Add Provided Contract

  • A1EPG.Services_Ct

But, for TenantB

Tenants > TenantB >> Networking > VRFs > B1_VRF > EPG Collection for VRF

  • [+] Consumed Contracts
    • A1EPG.Services_Ct
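The asymmetry in Task 4 (one EPG provides in TenantA, vzAny consumes in TenantB) is easy to see when the two attachments are written as REST payloads side by side. Again this is a sketch: the class names fvRsProv, vzAny and vzRsAnyToCons are as I understand the APIC object model, and the function names are my own.

```python
import json

CONTRACT = "A1EPG.Services_Ct"

def provided_by_epg(app_profile, epg, contract=CONTRACT):
    """TenantA side: a single EPG provides the contract."""
    return {
        "fvAp": {
            "attributes": {"name": app_profile},
            "children": [{
                "fvAEPg": {
                    "attributes": {"name": epg},
                    "children": [
                        {"fvRsProv": {"attributes": {"tnVzBrCPName": contract}}}
                    ],
                }
            }],
        }
    }

def consumed_by_vzany(vrf, contract=CONTRACT):
    """TenantB side: every EPG in the VRF consumes the contract via vzAny."""
    return {
        "fvCtx": {
            "attributes": {"name": vrf},
            "children": [{
                "vzAny": {
                    "attributes": {},
                    "children": [
                        {"vzRsAnyToCons": {"attributes": {"tnVzBrCPName": contract}}}
                    ],
                }
            }],
        }
    }

tenant_a = provided_by_epg("A1_AP", "A1_EPG")
tenant_b = consumed_by_vzany("B1_VRF")
print(json.dumps(tenant_a, indent=2))
print(json.dumps(tenant_b, indent=2))
```

Note that A2_EPG appears nowhere in either payload, which is exactly why TenantB cannot reach it.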

That concludes the required configuration. You are ready to test.

You should find that:

  • All servers in both EPGs for TenantB can access the servers in TenantA’s A1_EPG, but cannot reach anything in A2_EPG even though A1_EPG and A2_EPG servers are on the same subnet.
    • This is the key finding – you have used ACI features to implement additional control above and beyond what can be achieved by using a firewall alone.
  • TenantA’s EPGs can’t communicate with each other until you configure another contract to allow them to communicate.
  • TenantB’s EPGs can’t communicate with each other until you configure another contract to allow them to communicate.


The whole point of this post is to show that you can easily use ACI features to implement additional control above and beyond what can be achieved by using a firewall alone.

Don’t just blindly force all traffic through a firewall without thinking about what traffic actually needs to be firewalled – you’ll reduce the load on the firewall and give yourself access to easier fine tuning in the future.



Posted in ACI, ACI Tutorial, Cisco, configuration tutorial, Data Center, Data Centre | Comments Off on Making the most of ACI when routing between tenants via a Firewall

ACI Version mismatch Alert. Don’t use v5 on APIC and v14 on Leaves

No Problem

First of all – if you follow best practices, THERE IS NO PROBLEM

This problem I am about to describe is NOT a deficiency in the Cisco software, just an incompatibility between versions that you might not notice.

The Problem

If you are stuck with some first-generation switches in your ACI fabric, you might be tempted to upgrade your APIC to version 5.x – maybe even attempt to upgrade your leaf switches to the companion v15.x.

But of course, the first-generation switches (that DON’T have a -EX or -FX or -FX2  at the end of the model number) don’t support version 15.x firmware. But you knew that already from reading the release notes, right?

Now if you DO decide to ignore my advice, then most things may well continue as normal. But I accidentally discovered a corner case that turns a filter based on port 22 into a filter based on unspecified. (=all traffic)

So, any contract that has a filter based on port 22, when pushed to the switches is transformed into a filter on unspecified. I.E. ALL TRAFFIC.

Now let me clarify “when pushed to the switches”

Any EXISTING contracts and filters (for port 22) for existing stable EPGs will continue to work.

But if you create a new filter for port 22 and use it, or provide/consume a contract on an EPG using a filter on port 22, or create a new attachment on a 1st gen switch that causes policy for the filter to be pushed, this is what will happen!

Let’s say you create a filter called MgmtServices_Fltr and add two entries. One for port 22 and one for port 23 (Destination ports of course)

Note that the GUI shows ssh rather than port 22 which you entered when you created the filter.  This fact is indeed the crux of the problem.

Now say you create a contract called MgmtServices_Ct, and allocate the MgmtServices_Fltr to the contract.

Have the contract Provided/Consumed by two EPGs that have endpoints on one of your 1st gen switches.

Check out the MgmtServices_Fltr in the object browser to learn the fwdId value (you’ll need this later)

Now check the entries of the filter with the ID you just determined on the Gen1 switch.

apic1# fabric 2201 show zoning-filter filter 161
 Node 2201 (Leaf2201)
| FilterId |  Name | EtherT |    ArpOpc   | Prot | ApplyToFrag | Stateful |  SFromPort  |   SToPort   |  DFromPort  |   DToPort   |  Prio |   Icmpv4T   |   Icmpv6T   | TcpRules |
|   161    | 161_1 |   ip   | unspecified | tcp  |      no     |   yes    | unspecified | unspecified | unspecified | unspecified | proto | unspecified | unspecified |          |
|   161    | 161_0 |   ip   | unspecified | tcp  |      no     |   yes    | unspecified | unspecified |      23     |      23     | dport | unspecified | unspecified |          |

WOW – your port 22 filter has been magically transformed to allow all traffic!
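If you have a lot of filters to check, eyeballing that table gets old quickly. Here is a small sketch that parses the pipe-delimited output shown above and flags any TCP entry whose destination port has ended up as unspecified. The function names are my own, and I am assuming the output format is exactly as pasted above.

```python
SAMPLE = """\
| FilterId |  Name | EtherT |    ArpOpc   | Prot | ApplyToFrag | Stateful |  SFromPort  |   SToPort   |  DFromPort  |   DToPort   |  Prio |   Icmpv4T   |   Icmpv6T   | TcpRules |
|   161    | 161_1 |   ip   | unspecified | tcp  |      no     |   yes    | unspecified | unspecified | unspecified | unspecified | proto | unspecified | unspecified |          |
|   161    | 161_0 |   ip   | unspecified | tcp  |      no     |   yes    | unspecified | unspecified |      23     |      23     | dport | unspecified | unspecified |          |
"""

def parse_zoning_filter(output):
    """Turn the pipe-delimited table into a list of dicts keyed by column name."""
    header, rows = None, []
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the "Node ..." banner and any blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows

def wide_open_tcp(rows):
    """Names of TCP entries that match every destination port."""
    return [r["Name"] for r in rows
            if r["Prot"] == "tcp" and r["DFromPort"] == "unspecified"]

rows = parse_zoning_filter(SAMPLE)
suspect = wide_open_tcp(rows)
print(suspect)  # ['161_1']
```

An entry showing up here is not automatically wrong (you may really have meant all TCP ports), but it is the one to compare against what you configured on the APIC.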

So what’s going on?

To understand what the problem is, you’ll need to look at one of the changes made to the APIC GUI between v4 and v5.  It’s not listed in the Release Notes (although given the consequences, it should be.)

Start with a visit to https://developer.cisco.com/site/apic-mim-ref-api/ and check out the details for the object vz:Entry for APIC version 4.2. Or just trust that I have it right below.

Then check out the same thing for v5.x (Note: At the time of writing, the https://developer.cisco.com/site/apic-mim-ref-api/ v5.0(1) Model did NOT reflect what I found on a real APIC, as shown below from v5.0(2h) – so the change may have come between v5.0(1) and v5.0(2))

I think you can spot the difference. I’ve made it pretty obvious.

What you may not have realised is that when the filter information gets pushed to the leaves, it is the textual Constant value (i.e., the ssh) that gets pushed in the filter, rather than the numeric value (stupid idea in my opinion, but I didn’t write the code so my opinion doesn’t count)

When the switches still running v14 (the switch equivalent of APIC v4) code see the textual ssh, they look up the list of constants from the first list above and don’t find it, so use the default instead.


This is a bad thing. This will happen again if there is ever another port added to the list of constants. Cisco should do something about it.

What should Cisco do?

The way I see it, Cisco should do both of these things to avoid further problems in the future.

  1. Have the APIC always send filters as port numbers. Why it is any different I’ll never understand.
  2. Not have the default as unspecified(0) – instead make it 65535 – at least that would change the filter to allow only one port through.
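The lookup behaviour described above, and the effect of the second suggestion, can be modelled in a few lines. This is purely an illustration of the logic as I understand it, NOT Cisco’s code. The two constant tables are my assumption of what the v4 and v5 lists contain (the only difference that matters is that ssh exists in one and not the other), and resolve_port is a made-up name.

```python
# Assumed named-port constants known to switches running v14 code.
V14_CONSTANTS = {
    "unspecified": 0, "ftpData": 20, "smtp": 25, "dns": 53,
    "http": 80, "pop3": 110, "https": 443, "rtsp": 554,
}
# Assumed constants known to v5/v15 code: the same list plus ssh.
V15_CONSTANTS = {**V14_CONSTANTS, "ssh": 22}

def resolve_port(value, constants, default=0):
    """Resolve a port pushed as text: numbers pass through, names are looked up.

    An unknown name silently falls back to the default, and a default of 0
    means unspecified, i.e. every port.
    """
    if value.isdigit():
        return int(value)
    return constants.get(value, default)

print(resolve_port("ssh", V15_CONSTANTS))                # 22
print(resolve_port("ssh", V14_CONSTANTS))                # 0
print(resolve_port("22", V14_CONSTANTS))                 # 22
print(resolve_port("ssh", V14_CONSTANTS, default=65535)) # 65535
```

The third call is suggestion 1 (send the number and the problem never arises), and the fourth is suggestion 2 (an unknown name still produces the wrong filter, but one that opens a single port rather than all of them).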

Side Issue

I first discussed this in a Facebook post where Daniel Pita picked up an error in the GUI related to this change (and had it filed as bug CSCvv49124 – visible only to internals).  If you try to edit the filter later in the filter view, you see red boxes around the letters SSH, and if you try to edit it and select SSH from the drop down, it won’t let you!

So, I hope I save someone from grief with this post, and maybe even spur Cisco on to improving their code.


And thanks to Daniel for his help. You should check out his blog.

Posted in ACI, Cisco, Data Center, Data Centre | Comments Off on ACI Version mismatch Alert. Don’t use v5 on APIC and v14 on Leaves