The beauty of OTV is that you are no longer limited to segregating your L2 VLANs by site or by location within your network. When used in conjunction with virtual machines, this means you can migrate machines between locations without having to modify IP addressing, giving you the ability to move entire server farms with only a few clicks.
Beauty has an ugly side, however. One of the significant challenges with OTV is knowing how best to reach endpoints within the overlay network. Improper planning in this area can result in inefficient traffic flows through your network, and could even block end-to-end traffic altogether. Consider the following network:
Let’s say your host is in DC1, but your gateway is in DC2. How will traffic move through the network? Often called ‘traffic tromboning,’ this is where traffic enters through one side of your network and uses the overlay to trombone across to the opposite side before returning through the original datacenter.
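To make the trombone concrete, here is a minimal sketch that counts how many times a flow changes sites on its way out. It is a toy model, not anything OTV itself provides: the function names and the three-hop view of the path (host, gateway, exit point) are my own simplifications for illustration.

```python
# Toy model of an outbound flow. All names here are hypothetical and exist
# only to illustrate the tromboning pattern described above.

def outbound_path(host_site, gateway_site, exit_site):
    """Return the sites a packet visits: host, then its gateway, then the exit."""
    return [host_site, gateway_site, exit_site]


def site_crossings(path):
    """Count the hops where the packet moves from one site to another."""
    return sum(1 for here, there in zip(path, path[1:]) if here != there)


# Host in DC1, gateway in DC2, traffic leaves through DC1: the trombone.
remote_gateway = outbound_path("DC1", "DC2", "DC1")

# Same host, but with a gateway available in its own site.
local_gateway = outbound_path("DC1", "DC1", "DC1")

print(site_crossings(remote_gateway))  # 2
print(site_crossings(local_gateway))   # 0
```

Every crossing in the first case is traffic that did not need to leave DC1 at all, which is the argument for putting a gateway in each site.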
It’s ugly, but you can fix that by using an FHRP (such as HSRP) to have a gateway in each site. For this to work, the FHRP hellos have to be filtered at the overlay so that each site elects its own active gateway. As we know, the ASRs have FHRP filtering configured and enabled by default, and there is documentation on how to configure filtering for the N7K. After adding gateways to both sites, you end up with this:
Well, that might be OK if you turn a blind eye to the traffic flowing across your core multiple times, but it’s certainly not the most efficient. To add insult to injury, what if your two sites each have their own path out to the internet? How will your edge firewall respond when it receives traffic for a connection it doesn’t know about?
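The firewall question is easier to see with a small sketch. This is a deliberately simplified model of a stateful firewall, assuming it only permits inbound packets that match a connection it saw leave; the class, its methods, and the addresses (taken from the documentation ranges) are hypothetical and not any vendor's behavior or API.

```python
# Simplified stateful firewall: inbound traffic is allowed only when it is
# the return half of a connection this same firewall saw going out.
# Class and method names are hypothetical, for illustration only.

class StatefulFirewall:
    def __init__(self, name):
        self.name = name
        self.connections = set()  # (inside_ip, outside_ip) pairs seen outbound

    def outbound(self, inside_ip, outside_ip):
        """Record state for a connection leaving through this firewall."""
        self.connections.add((inside_ip, outside_ip))

    def inbound_allowed(self, outside_ip, inside_ip):
        """Permit return traffic only if matching state exists here."""
        return (inside_ip, outside_ip) in self.connections


fw_dc1 = StatefulFirewall("DC1 edge")
fw_dc2 = StatefulFirewall("DC2 edge")

# The host's connection leaves through DC1...
fw_dc1.outbound("192.0.2.10", "198.51.100.7")

# ...but the reply is routed back in through DC2, which has no state for it.
print(fw_dc2.inbound_allowed("198.51.100.7", "192.0.2.10"))  # False
print(fw_dc1.inbound_allowed("198.51.100.7", "192.0.2.10"))  # True
```

In this model the DC2 firewall simply drops the reply. Real firewalls vary in how they handle asymmetric flows, but the underlying problem is the same: state lives in one site while the packets arrive at the other.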
Conclusion
These are just some of the issues that need to be considered when evaluating an OTV solution. Multiple entry/exit points, firewall placement, flow lifetime, load balancers, and so on combine to make the overall design complicated very quickly. Add in endpoint mobility (the whole point, right?), and you have to ensure that new flows know how to reach the correct endpoint, and that old flows either persist or can be reestablished quickly. In my next post, I’ll discuss one of the solutions I’m exploring to solve these issues.