I’m finally getting the chance to deploy OTV and LISP in a live environment and wanted to share one of the issues I’ve run into.
As I mentioned in my post about OTV Traffic Flow Considerations, using HSRP (or VRRP/GLBP) at each site has the potential to cause traffic to “trombone” through the network along a sub-optimal path. Because of this behavior, FHRP filtering should be configured on your OTV routers to ensure that the HSRP device on each side of the overlay becomes an active gateway for the network. The ASR1001 is supposed to have this built in.
Here’s the topology:
The Problem
After I set up OTV and LISP, I noticed that I had spotty connectivity to my host inside the overlay. A continuous ping revealed that I was missing a ping or two almost every 60 seconds. When I looked at the route for that host, the age was always less than 1 minute. Since these routes are redistributed into OSPF, I went back to the OTV/LISP routers to see what was happening.
On the OTV/LISP routers, I could see that the local LISP routes were also being inserted and withdrawn regularly, which meant that LISP thought the EID was moving to the other router. Since the LISP mapping system is in charge of communicating EID-to-RLOC mapping changes, I ran debug lisp control-plane map-server and observed the following output (abbreviated):
Oct 2 11:09:41.623 EDT: LISP: Processing received Map-Notify message from 10.5.7.82 to 10.5.8.82
...
Oct 2 11:09:41.623 EDT: LISP-0: Local dynEID MOBILE-VMS IID 0 prefix 10.78.1.245/32, Received map notify (rlocs: 1/1).
Oct 2 11:09:41.623 EDT: LISP-0: Local dynEID MOBILE-VMS IID 0 prefix 10.78.1.245/32, Map-Notify contains new locator 10.5.7.82, dyn-EID moved (rlocs: 1/1).
Since I hadn’t moved the VM across the overlay, it surprised me to see that LISP thought the VM was moving. After banging my head against the wall on that issue, I started looking lower in the stack, at OTV.
During normal operation, the OTV routing table on the local OTV router (router closest to the host) should look like this:
SAV-OTVRTR2#sh otv route
...
OTV Unicast MAC Routing Table for Overlay1
Inst VLAN BD MAC Address AD Owner Next Hops(s)
----------------------------------------------------------
0 800 800 0000.0c07.ac4e 40 BD Eng Gi0/0/0:SI800
...
Note the route for 0000.0c07.ac4e, which is the virtual MAC for HSRP group 78 (0x4e). This is an FHRP address, so should it even be showing up? Since it was there, I assumed that the FHRP filtering must only prevent the route from being advertised to OTV neighbors.
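If you want to double-check which HSRP group a virtual MAC belongs to, here is a minimal Python sketch. It relies on the HSRPv1 convention that the virtual MAC is 0000.0c07.acXX, where XX is the group number in hex. The function names `parse_mac` and `hsrp_v1_group` are my own, used only for illustration.

```python
def parse_mac(mac: str) -> int:
    """Convert a Cisco dotted MAC (e.g. '0000.0c07.ac4e') to an integer."""
    return int(mac.replace(".", ""), 16)


# HSRPv1 virtual MACs are 0000.0c07.acXX, with XX = group number in hex.
HSRP_V1_BASE = parse_mac("0000.0c07.ac00")


def hsrp_v1_group(mac: str):
    """Return the HSRPv1 group number, or None if mac is not an HSRPv1 vMAC."""
    value = parse_mac(mac)
    # Clear the low 8 bits and compare the rest against the HSRPv1 prefix.
    if (value & ~0xFF) == HSRP_V1_BASE:
        return value & 0xFF
    return None


print(hsrp_v1_group("0000.0c07.ac4e"))  # 78
```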
But during one of the blips, I noticed this:
Inst VLAN BD MAC Address AD Owner Next Hops(s)
----------------------------------------------------------
0 800 800 0000.0c07.ac4e 30 ISIS RAD-OTVRTR2
So not only was the HSRP MAC showing up with FHRP Filtering enabled, but it was also still being advertised across the network. This shouldn’t be.
The Solution – for now
I opened a TAC case and consulted with Cisco about the issue. They agreed that it was “odd” that the HSRP information was leaking across the overlay and recommended I put in an ACL to block FHRP information:
mac access-list extended otv_filter_fhrp
 deny 0000.0c07.ac00 0000.0000.00ff host 0000.0000.0000
 deny 0000.0c9f.f000 0000.0000.0fff host 0000.0000.0000
 deny 0007.b400.0000 0000.00ff.ffff host 0000.0000.0000
 deny 0000.5e00.0100 0000.0000.00ff host 0000.0000.0000
 permit host 0000.0000.0000 host 0000.0000.0000
…and apply the ACL to the OTV Inside interface.
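To make the wildcard masks in that ACL a little less opaque, here is a rough Python sketch of how the source-address matching works: a 1 bit in the wildcard means "don't care". This is only an illustration, not how the router implements it. The names `parse_mac`, `DENY_ENTRIES` and `source_mac_permitted` are mine, the protocol labels in the comments are my reading of the well-known virtual MAC ranges, and I'm assuming the destination fields and the final permit line are meant to match anything.

```python
def parse_mac(mac: str) -> int:
    """Convert a Cisco dotted MAC (e.g. '0000.0c07.ac4e') to an integer."""
    return int(mac.replace(".", ""), 16)


MAC_BITS = 0xFFFFFFFFFFFF  # 48-bit mask

# (label, source address, source wildcard) taken from the deny lines of the ACL.
DENY_ENTRIES = [
    ("HSRPv1", "0000.0c07.ac00", "0000.0000.00ff"),
    ("HSRPv2", "0000.0c9f.f000", "0000.0000.0fff"),
    ("GLBP", "0007.b400.0000", "0000.00ff.ffff"),
    ("VRRP", "0000.5e00.0100", "0000.0000.00ff"),
]


def source_mac_permitted(mac: str) -> bool:
    """Return False if the source MAC matches any deny entry, else True.

    Assumption: destination is treated as 'any' and the last ACL line is
    treated as a catch-all permit.
    """
    value = parse_mac(mac)
    for _label, address, wildcard in DENY_ENTRIES:
        care = ~parse_mac(wildcard) & MAC_BITS  # bits that must match exactly
        if (value & care) == (parse_mac(address) & care):
            return False
    return True


for mac in ("0000.0c07.ac4e", "0000.5e00.0101", "0050.56aa.bbcc"):
    print(mac, "permit" if source_mac_permitted(mac) else "deny")
```

The first two addresses fall inside the HSRPv1 and VRRP ranges and are denied; the third is an ordinary host MAC and is permitted.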
You might notice that OTV automatically adds another ACL:
Extended IP access list otv_fhrp_filter_acl
 10 deny udp any any eq 1985 3222 (57416 matches)
 20 deny 112 any any
 30 permit ip any any (51921 matches)
This ACL blocks the UDP ports used for HSRP and GLBP, as well as IP Protocol 112, VRRP. This must be the portion that is added by default, but it doesn’t seem to be sufficient.
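Here is a small Python sketch of the decision that IP ACL makes, just to show what it does and doesn't look at. The function name `otv_fhrp_filter_decision` is hypothetical, and the logic is my simplified reading of the three entries: it keys on IP protocol and UDP destination port only, and never inspects a MAC address.

```python
UDP = 17
VRRP = 112
FHRP_UDP_PORTS = (1985, 3222)  # HSRP and GLBP hellos


def otv_fhrp_filter_decision(ip_protocol: int, dst_port=None) -> str:
    """Mimic the three entries of otv_fhrp_filter_acl for one IP packet."""
    # 10 deny udp any any eq 1985 3222
    if ip_protocol == UDP and dst_port in FHRP_UDP_PORTS:
        return "deny"
    # 20 deny 112 any any
    if ip_protocol == VRRP:
        return "deny"
    # 30 permit ip any any
    return "permit"


print(otv_fhrp_filter_decision(UDP, 1985))  # HSRP hello
print(otv_fhrp_filter_decision(VRRP))       # VRRP advertisement
print(otv_fhrp_filter_decision(1))          # ICMP, any source MAC
```

Because the source MAC never enters the decision, any non-FHRP IP packet is permitted by this ACL regardless of which MAC address it came from.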
Conclusion
I asked Cisco why the extra ACL was necessary when the documentation indicates that FHRP filtering is built in and enabled by default. As soon as I hear something, I’ll provide an update. As far as I know, the Nexus 7K still requires you to manually configure these ACLs, but it seems that, for now, so do the ASRs.
Update: I heard back from Cisco TAC about my issue and they think my problem stems from the fact that I’m trying to use the same physical hardware for both the L2 bridging and the L3 gateway:
Due to the ASR1k architecture, it is recommended that you move FHRP off the ASR. It is unlike N7k architecture where we can keep FHRP on the same device and use a mix of MACLs, VACLs, etc to filter out the virtual MAC from going across the overlay. The only way to really prevent the virtual MAC from being learned across the overlay is to prevent the ASR from ever learning it in the first place.
With regard to the default OTV FHRP filtering, TAC confirmed that the otv_fhrp_filter_acl is added when OTV is configured. It doesn’t attempt to prevent L2 information from being learned, however; it only attempts to block actual HSRP communication across the overlay.