A glimpse into LISP Control-Plane traffic

In our lab we were able to configure LISP and verify connectivity between our two hosts. One thing I noticed was the loss of the first two ICMP packets. Let’s walk through how LISP functions and examine what was happening behind the scenes.

Upon receipt of the first packet, the SITE1 router (acting as an ITR) checked the LISP map cache to see if it already had an RLOC mapping for the destination EID (172.16.30.101):

SITE1#sh ip lisp map-cache
LISP IPv4 Mapping Cache for EID-table default (IID 0), 2 entries

0.0.0.0/0, uptime: 00:10:16, expires: never, via static send map-request
  Negative cache entry, action: send-map-request

This negative cache entry tells the router that it needs to send a Map-Request to see if there’s an RLOC mapping available. The router will send a Map-Request for EID 172.16.30.101/32 to the Map Resolver at 192.168.255.3, and drop the initial data packet (ping #1) from our host:
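To make the behavior concrete, here is a toy sketch (Python, not anything running on the router) of the longest-prefix-match lookup an ITR performs against its map-cache, with the static 0.0.0.0/0 negative entry standing in for an empty cache:

```python
import ipaddress

# Toy model of the ITR's map-cache lookup; illustrative only, not
# Cisco's implementation. Entries map an EID prefix to either an
# RLOC or an action such as 'send-map-request'.
map_cache = {
    ipaddress.ip_network("0.0.0.0/0"): {"action": "send-map-request"},
}

def lookup(eid):
    """Longest-prefix match against the map-cache, like a route lookup."""
    addr = ipaddress.ip_address(eid)
    matches = [net for net in map_cache if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return map_cache[best]

# First packet: only the static negative entry matches, so the ITR drops
# the packet and sends a Map-Request instead.
print(lookup("172.16.30.101"))  # {'action': 'send-map-request'}

# After the Map-Reply arrives, the cache holds a complete entry and the
# next lookup returns the RLOC to encapsulate toward.
map_cache[ipaddress.ip_network("172.16.30.0/24")] = {"rloc": "192.168.100.2"}
print(lookup("172.16.30.101"))  # {'rloc': '192.168.100.2'}
```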

LISP-Map-Reply

The Map Resolver/Map Server checks the sites that have registered, and looks for the RLOC address of the ETR that is authoritative for the EID prefix:

MR_MS#sh lisp site name SITE2
Site name: SITE2
Allowed configured locators: any
Allowed EID-prefixes:
  EID-prefix: 172.16.30.0/24
    First registered:     00:31:21
    Routing table tag:    0
    Origin:               Configuration
    Merge active:         No
    Proxy reply:          No
    TTL:                  1d00h
    State:                complete
    Registration errors:
    Authentication failures:   0
    Allowed locators mismatch: 0
    ETR 192.168.100.2, last registered 00:00:50, no proxy-reply, map-notify
                        TTL 1d00h, no merge, hash-function sha1, nonce 0x0524E21F-0x489614BB
                        state complete, no security-capability
                        xTR-ID 0xDC4A8044-0x87093251-0x602669CB-0x5B720F12
                        site-ID unspecified
    Locator        Local  State      Pri/Wgt
    192.168.100.2  yes    up          10/50 

Once the Map Server determines the RLOC for the authoritative ETR, it forwards the Map-Request message.  The ETR receives the forwarded Map-Request, and responds with a Map-Reply:

LISP-Map-Reply

Once the SITE1 router receives the reply, it updates the local cache:

SITE1#sh ip lisp map-cache
LISP IPv4 Mapping Cache for EID-table default (IID 0), 3 entries

0.0.0.0/0, uptime: 00:10:35, expires: never, via static send map-request
  Negative cache entry, action: send-map-request
0.0.0.0/1, uptime: 00:09:25, expires: 00:05:34, via map-reply, forward-native
  Negative cache entry, action: forward-native
172.16.30.0/24, uptime: 00:09:22, expires: 23:50:38, via map-reply, complete
  Locator        Uptime    State      Pri/Wgt
  192.168.100.2  00:09:22  up          10/50

Now that the router has a complete LISP cache, it can encapsulate packets in LISP headers and send them on their way.

LISP-Data-Packet-Header

LISP Data Packet Payload
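As a rough sketch of what that encapsulation looks like on the wire, the snippet below (Python; a simplified framing per my reading of RFC 6830, with only the nonce-present flag set) builds the outer UDP header destined for port 4341 plus the 8-byte LISP shim in front of the inner packet:

```python
import struct

def lisp_encap(inner: bytes, src_port: int, nonce: int) -> bytes:
    """Prepend a simplified outer UDP header and LISP shim to an inner packet.

    Simplified sketch: real encapsulation also adds an outer IP header,
    and the LISP header carries more flag bits (L/E/V/I) than shown here.
    """
    # Outer UDP: variable source port, fixed destination port 4341,
    # length = UDP header + LISP header + inner packet, zero checksum.
    udp = struct.pack("!HHHH", src_port, 4341, 8 + 8 + len(inner), 0)
    # LISP shim: N bit (nonce present) plus a 24-bit nonce, then the
    # instance-ID / locator-status-bits word (zeroed here).
    lisp = struct.pack("!II", 0x80000000 | (nonce & 0xFFFFFF), 0)
    return udp + lisp + inner

# A dummy 20-byte "inner IP packet" just to show the framing.
pkt = lisp_encap(b"\x45" + bytes(19), 52311, 0x123456)
print(len(pkt))  # 8 (UDP) + 8 (LISP) + 20 (inner) = 36 bytes
```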

In this setup, it’s interesting to note that we lost the first two ICMP packets to the control-plane process. The first packet was dropped by the SITE1 router as it went through the Map Request/Map Reply process to build the local cache. The second packet actually made it through to the other host, but the response was dropped by the SITE2 router as it also had to build the local cache. You can see some of that below:

Ping response sequence

Once the caches have been built, subsequent attempts are 100% successful:

[root@SITE1 ~]# ping -c 5 172.16.30.101
PING 172.16.30.101 (172.16.30.101) 56(84) bytes of data.
64 bytes from 172.16.30.101: icmp_seq=1 ttl=255 time=1.62 ms
64 bytes from 172.16.30.101: icmp_seq=2 ttl=255 time=1.41 ms
64 bytes from 172.16.30.101: icmp_seq=3 ttl=255 time=1.36 ms
64 bytes from 172.16.30.101: icmp_seq=4 ttl=255 time=1.48 ms
64 bytes from 172.16.30.101: icmp_seq=5 ttl=255 time=1.33 ms

--- 172.16.30.101 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4000ms
rtt min/avg/max/mdev = 1.338/1.442/1.626/0.114 ms

Conclusion

I think there are a few important items to remember about the LISP forwarding process. These probably seem obvious, but I still want to point them out:

  1. The mapping system is really more of a ‘director’, in that it doesn’t actually know the answers to queries, but it knows who to ask to find out.
  2. The LISP control plane always uses the same source and destination port: UDP/4342.
  3. LISP data-plane packets are always destined for UDP/4341, but the source port will vary.
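Point 3 is worth a closer look: varying the data-plane source port lets core routers ECMP-hash LISP traffic per inner flow. The sketch below illustrates the idea; the CRC32 hash and port range are my own assumptions for illustration, not the actual algorithm any platform uses:

```python
import zlib

LISP_CONTROL_PORT = 4342  # Map-Request/Map-Reply: both source and destination
LISP_DATA_PORT = 4341     # encapsulated data packets: destination only

def data_plane_source_port(src_ip, dst_ip, proto, sport, dport):
    """Derive an outer UDP source port from a hash of the inner 5-tuple.

    Hypothetical hash (CRC32) chosen for illustration: the point is that
    the port is stable within a flow but differs between flows, so core
    routers can load-balance LISP-encapsulated traffic.
    """
    key = f"{src_ip},{dst_ip},{proto},{sport},{dport}".encode()
    return 49152 + zlib.crc32(key) % 16384  # keep it in the ephemeral range

icmp = data_plane_source_port("192.168.5.101", "172.16.30.101", 1, 0, 0)
https = data_plane_source_port("192.168.5.101", "172.16.30.101", 6, 51000, 443)
print(icmp, https)  # two inner flows, hashed to their own outer ports
```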

Cisco has a website dedicated to LISP and you can find a great deal of information there, including the devices and software versions that support LISP.  I’d highly recommend checking it out:  Cisco LISP

Simple LISP Lab

We’re going to start with a very simple topology:

LISP-topology

Instead of using one of our standard routing protocols to advertise the host networks between the SITE1 and SITE2 routers, we’ll rely on LISP. OSPF will be used to advertise the loopback on the MS/MR router within the Core network. We’ll start configuring LISP assuming the basic network is ready to go.

First, let’s set up LISP on the SITE1 router:

router lisp
 locator-set SITE1
  192.168.100.1 priority 10 weight 50
  exit
 !
 database-mapping 192.168.5.0/24 locator-set SITE1
 !
 ipv4 itr map-resolver 192.168.255.3
 ipv4 itr
 ipv4 etr map-server 192.168.255.3 key secretkey
 ipv4 etr
 !

Now we’ll configure LISP on the SITE2 router:

router lisp
 locator-set SITE2
  192.168.100.2 priority 10 weight 50
  exit
 !
 database-mapping 172.16.30.0/24 locator-set SITE2
 !
 ipv4 itr map-resolver 192.168.255.3
 ipv4 itr
 ipv4 etr map-server 192.168.255.3 key secretkey
 ipv4 etr
 !

Finally, we’ll configure the Map Server and Map Resolver functionality on the MS/MR router:

router lisp
 site SITE1
  authentication-key secretkey
  eid-prefix 192.168.5.0/24
  exit
 !
 site SITE2
  authentication-key secretkey
  eid-prefix 172.16.30.0/24
  exit
 !
 ipv4 map-server
 ipv4 map-resolver
 exit

Once this is configured, our LISP infrastructure should be complete. Let’s check:

SITE1#sh ip lisp
  Instance ID:                      0
  Router-lisp ID:                   0
  Locator table:                    default
  EID table:                        default
  Ingress Tunnel Router (ITR):      enabled
  Egress Tunnel Router (ETR):       enabled
  Proxy-ITR Router (PITR):          disabled
  Proxy-ETR Router (PETR):          disabled
  Map Server (MS):                  disabled
  Map Resolver (MR):                disabled
  Delegated Database Tree (DDT):    disabled
  Map-Request source:               192.168.5.1
  ITR Map-Resolver(s):              192.168.255.3
  ETR Map-Server(s):                192.168.255.3 (00:00:55)
  xTR-ID:                           0x3D4C0900-0x95932BE0-0x2F5AF1F6-0x4C919E94
  ...

And over on SITE2:

SITE2#sh ip lisp
  Instance ID:                      0
  Router-lisp ID:                   0
  Locator table:                    default
  EID table:                        default
  Ingress Tunnel Router (ITR):      enabled
  Egress Tunnel Router (ETR):       enabled
  Proxy-ITR Router (PITR):          disabled
  Proxy-ETR Router (PETR):          disabled
  Map Server (MS):                  disabled
  Map Resolver (MR):                disabled
  Delegated Database Tree (DDT):    disabled
  Map-Request source:               172.16.30.1
  ITR Map-Resolver(s):              192.168.255.3
  ETR Map-Server(s):                192.168.255.3 (00:00:16)
  xTR-ID:                           0xDC4A8044-0x87093251-0x602669CB-0x5B720F12
  ...  

On the MS/MR router, we can see that the ETRs have registered with the LISP mapping system. Without this, LISP wouldn’t know the EID-to-RLOC mapping for each EID.

OTV-RTR3#sh lisp site
LISP Site Registration Information

Site Name      Last      Up   Who Last             Inst     EID Prefix
               Register       Registered           ID
SITE1          00:00:14  yes  192.168.100.1                 192.168.5.0/24
SITE2          00:00:17  yes  192.168.100.2                 172.16.30.0/24

Now, to show that LISP is working correctly, let’s first confirm that we don’t have a route to the opposite site:

SITE1#sh ip route 172.16.30.101
% Network not in table
SITE1#ping 172.16.30.101
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.30.101, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

SITE2# sh ip route 192.168.5.101
% Network not in table
SITE2#ping 192.168.5.101
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.5.101, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

And since we didn’t configure a default route, there isn’t anything to fall back on.

Let’s test by pinging the SITE2 host:

[root@SITE1 ~]# ping 172.16.30.101
PING 172.16.30.101 (172.16.30.101) 56(84) bytes of data.
64 bytes from 172.16.30.101: icmp_seq=3 ttl=255 time=1.26 ms
64 bytes from 172.16.30.101: icmp_seq=4 ttl=255 time=4.57 ms
64 bytes from 172.16.30.101: icmp_seq=5 ttl=255 time=1.31 ms
^C
--- 172.16.30.101 ping statistics ---
5 packets transmitted, 3 received, 40% packet loss, time 4421ms
rtt min/avg/max/mdev = 1.268/1.374/1.466/0.086 ms

Looks good except for the two we lost at the beginning. Let’s try the reverse ping now:

[root@SITE2 ~]# ping 192.168.5.101
PING 192.168.5.101 (192.168.5.101) 56(84) bytes of data.
64 bytes from 192.168.5.101: icmp_seq=1 ttl=255 time=1.49 ms
64 bytes from 192.168.5.101: icmp_seq=2 ttl=255 time=1.31 ms
64 bytes from 192.168.5.101: icmp_seq=3 ttl=255 time=1.26 ms
64 bytes from 192.168.5.101: icmp_seq=4 ttl=255 time=4.57 ms
64 bytes from 192.168.5.101: icmp_seq=5 ttl=255 time=1.31 ms
^C
--- 192.168.5.101 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4001ms
rtt min/avg/max/mdev = 1.178/1.278/1.398/0.090 ms

So we have two way communication across our network, using LISP to locate and reach the opposite site. Next time we’ll dig a little deeper on what’s happening behind the scenes.

Cisco LISP

With technologies like OTV, we need a method to optimize traffic destined for our mobile virtualized hosts by tracking their location and updating the underlying routing system. One possible solution is known as LISP.

LISP stands for Locator/ID Separation Protocol, and its function is to allow you to separate the location component of an IP address from the identity component. Or, as the Cisco LISP configuration guide states, LISP “implements the use of two namespaces instead of a single IP address.” What does this mean?

With LISP you are introduced to two new concepts:

  • Endpoint Identifiers (EID)
  • Routing Locators (RLOC)

The endpoint identifier (EID) is the address used to identify a specific host — this is the same as the IP addresses you use today, and it is said to be in the LISP namespace. The Routing Locator (RLOC) is the address of a router that is part of the normal routing domain but is connected to the LISP namespace and the non-LISP namespace. The RLOC is said to be part of the non-LISP namespace.

One significant difference with LISP is that you no longer have to advertise the EID address space into the normal routing domain. Instead, you rely on LISP to provide mappings between EIDs and RLOCs, and you route based on the RLOC address.

I like to think of LISP as a DNS-like system for routing, because an important part of LISP is a mapping system that maintains EID-to-RLOC mappings. Just like you use DNS to query for name-to-IP mappings, a LISP router performs a query against a LISP mapping server to find out the RLOC that should be used to reach the desired EID.
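Carrying the DNS analogy into code, here is a minimal sketch (the addresses come from the lab in this series; the real mapping system is a distributed protocol, not a dict):

```python
# The mapping system as a DNS-like database: EID prefixes play the role
# of names, and the registered ETR RLOCs play the role of addresses.
registrations = {
    "192.168.5.0/24": "192.168.100.1",   # SITE1's ETR registered this prefix
    "172.16.30.0/24": "192.168.100.2",   # SITE2's ETR registered this one
}

def resolve(eid_prefix):
    """Analogous to a DNS query: returns the RLOC to tunnel toward,
    or None when the prefix is not a registered LISP EID."""
    return registrations.get(eid_prefix)

print(resolve("172.16.30.0/24"))  # 192.168.100.2
print(resolve("203.0.113.0/24"))  # None -> not a LISP EID, forward natively
```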

Components of a LISP system

A complete LISP infrastructure consists of several components:

  • Ingress Tunnel Routers (ITR)
  • Egress Tunnel Routers (ETR)
  • XTR
  • Map Server (MS)
  • Map Resolver (MR)
  • Proxy Ingress Tunnel Routers (PITR)
  • Proxy Egress Tunnel Routers (PETR)
  • PXTR

ITR

The ingress tunnel router receives unencapsulated IP packets from the EID namespace, and is responsible for performing lookups to identify EID-to-RLOC mappings for destination addresses. If the packet is destined for an EID in another LISP namespace, the ITR will encapsulate each packet with a LISP header and route the packet towards the identified RLOC. If the packet is destined for a non-LISP address, the packet is routed without any LISP modifications.

ETR

The egress tunnel router receives LISP encapsulated packets from the non-LISP portion of the network, removes the LISP header, and delivers the unencapsulated packets to the EID. The ETR is also responsible for keeping the Mapping system up to date with EID mappings and responding to Mapping system requests.

LISP XTR

An XTR is a router that performs both ETR and ITR functions.

Map Server

The map server receives EID registrations from ETRs, and responds to map request messages that are forwarded from map resolvers.

Map Resolver

The map resolver receives encapsulated map request messages from ITRs and forwards them to Map Servers that are authoritative for the EID namespace being queried.

PITR/PETR/PXTR

The Proxy ITR/Proxy ETR allows non-LISP sites to communicate with LISP sites, and vice versa, by performing ITR and ETR functionality on their behalf. They can be deployed separately or together; if deployed together, the device is referred to as a PXTR.

Conclusion

It took me a little while to wrap my head around the LISP concept, and one big help was working with it in a lab environment.  In the next post, I’ll walk through a lab scenario to demonstrate LISP in action.

OTV Traffic Flow Considerations

The beauty of OTV is that you are no longer limited to segregating your L2 VLANs by site or location within your network.  When used in conjunction with virtual machines, this means you can migrate machines between locations without having to modify IP addressing, giving you the ability to move entire server farms with only a few clicks.

Beauty has an ugly side, however. One of the not insignificant challenges with OTV is knowing how to best reach endpoints within the overlay network. Improper planning in this area can result in inefficient traffic flows through your network, and could possibly block end-end traffic altogether. Consider the following network:

lisp-network

Let’s say your host is in DC1, but your gateway is in DC2. How will traffic move through the network? This situation is often called ‘traffic tromboning’: traffic enters through one side of your network and uses the overlay to trombone across to the opposite side before returning through the original datacenter.

lisp-network-trombone

It’s ugly, but you can fix it by using an FHRP to place a gateway in each site. As we know, the ASRs have FHRP filtering configured and enabled by default, and there is documentation on how to configure filtering for the N7K. After adding gateways to both sites, you end up with this:

lisp-network-trombone2

Well, that might be ok — if you turn a blind eye to the traffic flowing across  your core multiple times, but it’s certainly not the most efficient.  But to add insult to injury, what if your two sites have their own path out to the internet?  How will your edge firewall respond when it receives traffic for a connection it doesn’t know about?

Conclusion

These are just some of the issues that need to be considered when evaluating an OTV solution.  Multiple entry/exit points, firewall placement, flow lifetime, load-balancers, etc. combine to make the overall design complicated very quickly.   Add  in endpoint mobility (the whole point, right?), and you have to ensure that new flows will know how to reach the correct endpoint, and old flows either persist or can be reestablished quickly.  In my next post, I’ll discuss one of the solutions I’m exploring to solve these issues.

Cisco OTV – Overlay Transport Virtualization – Part 2

In part 1 we configured OTV using multicast as the control plane transport method. But if you don’t want to use multicast, you can use unicast instead. The main difference is that you’ll also need to designate at least one OTV Edge device to act as an Adjacency Server.

Adjacency Server

With OTV in multicast mode, the underlying multicast infrastructure handled packet replication to each of the remote OTV devices, and also allowed the dynamic discovery of OTV peers. With unicast mode, each edge device must send a unicast copy of the packet to each remote device. The question is – how do they know which devices should receive each packet?

The adjacency server function allows a router to learn about and distribute lists of edge devices, so all members of the OTV domain are aware of each other. The adjacency server builds a Unicast Replication List (URL) and distributes it to each edge device, updating as necessary when devices join or leave.

Once you have your adjacency servers configured, adding a new site is as easy as configuring the new device with the addresses of the two adjacency servers, which will then distribute information about the new device to the rest of the overlay.
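The bookkeeping the adjacency server does can be sketched as follows (a conceptual model only; the addresses are the join addresses from this lab):

```python
# Conceptual model of the adjacency server's Unicast Replication List.
# Edge devices register their join addresses; each device's URL is then
# simply "everyone except me", refreshed as members join or leave.
class AdjacencyServer:
    def __init__(self):
        self.edges = set()

    def register(self, join_addr):
        self.edges.add(join_addr)

    def unregister(self, join_addr):
        self.edges.discard(join_addr)

    def replication_list(self, requester):
        return sorted(self.edges - {requester})

adj = AdjacencyServer()
adj.register("10.80.0.2")  # OTV-RTR1 (also acting as the adjacency server)
adj.register("10.70.0.2")  # OTV-RTR2
print(adj.replication_list("10.80.0.2"))  # ['10.70.0.2']

# A third site only needs to register; everyone else's URL grows for free.
adj.register("10.60.0.2")
print(adj.replication_list("10.70.0.2"))  # ['10.60.0.2', '10.80.0.2']
```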

Configuration

Since OTV can use only one mode at a time (unicast OR multicast), you must completely remove any multicast-specific OTV commands before you can add the adjacency server config.

So we’ll start by removing our multicast control-group information from both routers:

interface Overlay1
no otv control-group 239.1.1.1
no otv data-group 232.1.1.0/28

Since our topology only has two routers, we’ll only configure OTV-RTR1 to act as the adjacency server, and then point it to itself:

interface Overlay1
otv adjacency-server unicast-only
otv use-adjacency-server 10.80.0.2 unicast-only

On OTV-RTR2, we’ll only specify the adjacency server address:

interface Overlay1
otv use-adjacency-server 10.80.0.2 unicast-only

Now let’s look at the status of the overlay:

OTV-RTR1#sh otv overlay1
Overlay Interface Overlay1
 VPN name                 : None
 VPN ID                   : 1
 State                    : UP
 AED Capable              : Yes
 Join interface(s)        : GigabitEthernet0/0/1
 Join IPv4 address        : 10.80.0.2
 Tunnel interface(s)      : Tunnel1
 Encapsulation format     : GRE/IPv4
 Site Bridge-Domain       : 100
 Capability               : Unicast-only
 Is Adjacency Server      : Yes
 Adj Server Configured    : Yes
 Prim/Sec Adj Svr(s)      : 10.80.0.2

We can see that this router is an adjacency server, and that it has an adjacency server configured. On OTV-RTR2:

OTV-RTR2#sh otv overlay1
Overlay Interface Overlay1
 VPN name                 : None
 VPN ID                   : 1
 State                    : UP
 AED Capable              : Yes
 Join interface(s)        : GigabitEthernet0/0/1
 Join IPv4 address        : 10.70.0.2
 Tunnel interface(s)      : Tunnel1
 Encapsulation format     : GRE/IPv4
 Site Bridge-Domain       : 100
 Capability               : Unicast-only
 Is Adjacency Server      : No
 Adj Server Configured    : Yes
 Prim/Sec Adj Svr(s)      : 10.80.0.2

We can also see that the OTV adjacency is up:

OTV-RTR1#sh otv adj
Overlay 1 Adjacency Database
Hostname                       System-ID      Dest Addr       Up Time   State
OTV-RTR2                       c08c.6008.0f00 10.70.0.2       00:47:50  UP

We can also see that the URL on each router contains the opposite edge device:

OTV-RTR1#sh otv adjacency-server replication-list
Overlay 1 Unicast Replication List Database
Total num: 1

Dest Addr       Capability
10.70.0.2       Unicast

OTV-RTR2#sh otv adjacency-server replication-list
Overlay 1 Unicast Replication List Database
Total num: 1

Dest Addr       Capability
10.80.0.2       Unicast

AED

I mentioned a term in the previous post that I would like to revisit: the AED, or Authoritative Edge Device. For some background, let’s reexamine our topology:

Sample OTV topology

In the first topology there is only one entry and exit point for overlay traffic. But in a production environment, you will likely have redundant edge devices. What happens then, when we add another OTV edge device to a site?

otv-topology-2

We’ll go ahead and add another router on the left side of the diagram and call it OTV-RTR1A.  Now we have two interfaces on the same VLAN that don’t participate in STP, which means it’s possible for a loop to form. To prevent this from happening, edge devices at the same site elect an AED, which is then the only device allowed to forward traffic for the overlay. This applies to traffic in both directions: the AED is the only device allowed to encapsulate frames into IP packets towards the overlay, and to decapsulate OTV packets and forward frames to the local LAN.

It’s also important to note that an AED is elected per VLAN, which provides automatic load-balancing: one edge device will be the AED for the even-numbered VLANs, and the other for the odd-numbered VLANs. As far as I know, this is not configurable.
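A sketch of that per-VLAN split, using the System IDs from the outputs later in this post; the sort-then-parity rule is my approximation of the observed behavior, not the documented election algorithm:

```python
def elect_aed(edge_system_ids, vlan):
    """Pick the AED for a VLAN: order the site-local edge devices by
    System ID (their 'ordinal'), then split VLANs by parity.
    Approximation for illustration only."""
    ordered = sorted(edge_system_ids)
    return ordered[vlan % len(ordered)]

edges = ["001E.4962.5400", "001E.F6B5.2600"]  # OTV-RTR1, OTV-RTR1A
print(elect_aed(edges, 250))  # even VLAN -> ordinal 0 (OTV-RTR1)
print(elect_aed(edges, 251))  # odd VLAN  -> ordinal 1 (OTV-RTR1A)
```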

AED Election

Remember that site ID and site VLAN we configured in part 1? The Site ID is used to identify Edge devices in the same site, and the Site VLAN is used for communication between the edge devices. Another value, the OTV System ID, is used to elect the AED.

The System ID is a combination of the IS-IS system ID (viewed with the show otv isis protocol command) and the site identifier. You can view the OTV system ID with the show otv site command.

Using our new topology, let’s go back and add the site VLAN to the inside interface on OTV-RTR1 and OTV-RTR1A (RTR1A has already been configured for OTV, using RTR1 as the adjacency server):

OTV-RTR1(config)#interface Gig0/0/0
OTV-RTR1(config-if)#service instance 100 ethernet
OTV-RTR1(config-if-srv)#encapsulation dot1q 100
OTV-RTR1(config-if-srv)#bridge-domain 100
OTV-RTR1(config-if-srv)#^Z
OTV-RTR1#

OTV-RTR1A(config)#interface Gig0/0/0
OTV-RTR1A(config-if)#service instance 100 ethernet
OTV-RTR1A(config-if-srv)#encapsulation dot1q 100
OTV-RTR1A(config-if-srv)#bridge-domain 100
OTV-RTR1A(config-if-srv)#^Z
OTV-RTR1A#

Now let’s verify that the two edge devices recognize each other, and that one has been elected AED.

OTV-RTR1#sh otv site
Site Adjacency Information (Site Bridge-Domain: 100)

Overlay1 Site-Local Adjacencies (Count: 1)

Hostname       System ID      Last Change Ordinal    AED Enabled Status
*OTV-RTR1      001E.4962.5400 00:02:20    0          site       overlay
 OTV-RTR1A     001E.F6B5.2600 00:02:20    1          site       overlay

OTV-RTR1A#sh otv site
Site Adjacency Information (Site Bridge-Domain: 100)

Overlay1 Site-Local Adjacencies (Count: 1)

Hostname       System ID      Last Change Ordinal    AED Enabled Status
OTV-RTR1       001E.4962.5400 00:02:39    0          site       overlay
*OTV-RTR1A     001E.F6B5.2600 00:02:39    1          site       overlay

The two devices have formed a ‘site-local adjacency’. Who’s the AED?

OTV-RTR1#sh otv vlan authoritative
Key:  SI - Service Instance

Overlay 1 VLAN Configuration Information
 Inst VLAN  Bridge-Domain  Auth  Site Interface(s)
 0    250   250            yes   Gi0/0/0:SI250
 Total VLAN(s): 1
 Total Authoritative VLAN(s): 1

OTV-RTR1A#sh otv vlan authoritative
Key:  SI - Service Instance

Overlay 1 VLAN Configuration Information
 Inst VLAN  Bridge-Domain  Auth  Site Interface(s)
 Total VLAN(s): 1
 Total Authoritative VLAN(s): 0

We can see that OTV-RTR1 is the AED at the site.

Failover

To test a failover scenario, we’ll simply shutdown the inside interface on OTV-RTR1. Once the interface is down, you’ll notice that OTV-RTR1A now considers itself authoritative for VLAN 250:

OTV-RTR1A#sh otv vlan authoritative
Key:  SI - Service Instance

Overlay 1 VLAN Configuration Information
 Inst VLAN  Bridge-Domain  Auth  Site Interface(s)
 0    250   250            yes   Gi0/0/0:SI250
 Total VLAN(s): 1
 Total Authoritative VLAN(s): 1

And when we examine the OTV event log, we see the following entry:

[09/16/13 19:44:24.724 23C9 490] OTV-APP-ISIS: AED set to UP for overlay 1 bd 250

Keep in mind that OTV is using IS-IS under the hood, so the failover processes are dependent on IS-IS timers.

Also, it’s important to note that the failover process does interrupt traffic. There is a short window of about 8-9 seconds where the remote OTV device doesn’t have an OTV route from the other site. It’s hard to grab the output exactly on time, though, so you’ll have to test in your environment to verify. In the outputs below, notice how the route for Sw-1 (0009.b709.4b80) is withdrawn and then is inserted again with OTV-RTR1A as the next hop:

OTV-RTR2#sh clock
14:05:05.745 MST Mon Sep 16 2013
OTV-RTR2#sh otv route

Codes: BD - Bridge-Domain, AD - Admin-Distance,
   SI - Service Instance, * - Backup Route

OTV Unicast MAC Routing Table for Overlay1

Inst VLAN BD     MAC Address    AD    Owner  Next Hops(s)
----------------------------------------------------------
0    250 250   0009.b709.4b80 50    ISIS   OTV-RTR1
0    250 250   0009.b717.7880 40    BD Eng Gi0/0/1:SI250

OTV-RTR2#show clock
14:05:16.767 MST Mon Sep 16 2013
OTV-RTR2#sh otv route

Codes: BD - Bridge-Domain, AD - Admin-Distance,
    SI - Service Instance, * - Backup Route

OTV Unicast MAC Routing Table for Overlay1

Inst VLAN BD     MAC Address    AD    Owner  Next Hops(s)
----------------------------------------------------------
0    250 250   0009.b717.7880 40    BD Eng Gi0/0/1:SI250

OTV-RTR2#show clock
14:05:25.800 MST Mon Sep 16 2013
OTV-RTR2#sh otv route

Codes: BD - Bridge-Domain, AD - Admin-Distance,
    SI - Service Instance, * - Backup Route

OTV Unicast MAC Routing Table for Overlay1

Inst VLAN BD     MAC Address    AD    Owner  Next Hops(s)
----------------------------------------------------------
0    250 250   0009.b709.4b80 50    ISIS   OTV-RTR1A
0    250 250   0009.b717.7880 40    BD Eng Gi0/0/1:SI250

Conclusion

As you can see, OTV is still fairly simple to configure, but there’s a lot of stuff going on behind the scenes to make it work. Before you implement anything in production, be sure to have a good understanding of what’s actually happening.

Flexible Netflow on the 4500X

I’m a big fan of Solarwinds and their suite of network management products. If you’ve never seen or tried their products, head on over to their demo site and check it out. I recently added their Netflow product Network Traffic Analyzer and wanted to add netflow collection to my new 4500X switches.

The 4500X only supports Flexible Netflow, aka Version 9, and doesn’t include any prebuilt flow record templates, so there are basically four steps to the configuration:

  • Create a flow record
  • Create a flow exporter
  • Create a flow monitor
  • Apply the monitor to an interface

Let’s look at each and go through a basic configuration.

Flow Record

The flow record defines the fields used to group traffic into unique flows. Key fields are the ones that distinguish one flow from another: if a packet’s key fields don’t match any existing flow, a new flow is created. In the configuration, key fields are specified with the match keyword.

Non-key fields aren’t used to distinguish flows from each other; they’re the data you want to glean from each flow. Non-key fields are specified with the collect keyword.

For my flow record I used the following configuration:

flow record IPV4-FLOW-RECORD
    match ipv4 tos
    match ipv4 protocol
    match ipv4 source address
    match ipv4 destination address
    match transport source-port
    match transport destination-port
    collect interface input
    collect interface output
    collect counter bytes long
    collect counter packets long

So in my flow record, each flow will be distinguished by ToS, Protocol, Src/Dst address, src/dst port. I’m also interested in collecting the input and output interfaces, as well as the number of bytes and packets in each flow.

Flow Exporter

A flow exporter is basically a place to send the flow data you collect. By default, Cisco will send data to UDP/9995 but Orion expects it to arrive on UDP/2055. I also specified the source interface for the data so it will match the address that Orion uses to manage this node.

flow exporter Orion
    destination 192.168.0.245
    source Loopback0
    transport udp 2055

Flow Monitor

The flow monitor is where you link records and exporters together.

flow monitor IPV4-FLOW
    description Used for Monitoring IPv4 Traffic
    record IPV4-FLOW-RECORD
    exporter Orion

Once you’ve defined all the elements, it’s time to apply to an interface.

Applying the configuration

This particular 4500X install doesn’t have any routed interfaces, so my intention was to apply the flow monitor to an SVI. This resulted in the following error:

4500X-1(config-if)#ip flow monitor IPV4-FLOW input
% Flow Monitor: Flow Monitor 'IPV4-FLOW' : Configuring Flow Monitor on SVI interfaces is not allowed.
Instead configure Flow Monitor in vlan configuration mode via the command `vlan config <vlan number>'

Ok, we’ll try again:

4500X-1(config)#vlan config 2
4500X-1(config-vlan-config)#ip flow monitor IPV4-FLOW input

No problems there!

When I attempted to configure the flow monitor in the output direction, I received this error:

4500X-1(config-if)#ip flow monitor IPV4-FLOW output
% Flow Monitor: 'IPV4-FLOW' could not be added to interface due to invalid sub-traffic type: 0

I reread the Flexible Netflow section in the configuration guide, and sure enough, this is the very first limitation listed for 4500s in a VSS configuration:

  1. The Catalyst 4500 series switch supports ingress flow statistics collection for switched and routed packets; it does not support Flexible Netflow on egress traffic.

Looks like I won’t be able to configure collection for output statistics, at least not at this point in the network.

Conclusion

Overall, I thought the configuration was fairly straightforward. I ended up using the same configuration on the other routers in my network, and this was the only instance where I was unable to collect output traffic statistics.

Cisco OTV – Overlay Transport Virtualization

First, let’s talk about what supports OTV — not much:

  • Nexus 7K
  • ASR 1K
  • CSR 1000V (For those of you not familiar with the Cloud Services Router, I’d recommend reading this)

What is OTV?

OTV is an encapsulation protocol that wraps L2 frames in IP packets in order to transport them between L2 domains. Typically this would be between remote datacenters, but it could also be within a datacenter if you needed an easy (expensive) way to extend a VLAN.

You will also see OTV referred to as ‘MAC Routing’, since the OTV devices are essentially performing routing decisions based on the destination MAC address in the L2 frame.

You might be thinking “Hey, I’ve already got this with EoMPLS and/or VPLS.” And you’d be right — you have the essence of what OTV accomplishes. What OTV adds, however, is simplicity and fault isolation.

When you configure OTV, you are defining 3 elements:

  • Join interface
    This is the interface that faces the IP core that will transport OTV encapsulated packets between sites.
  • Overlay interface
    This is the virtual interface that handles the encapsulation and decapsulation of OTV packets sent between OTV edge devices.
  • Inside interface
    This is the interface that receives the traffic that will be sent across OTV.

What do I need before I can configure OTV?

Before you can setup OTV in your environment there are a few important details to know:

  • OTV adds 42 bytes of overhead into the packet header. This has implications if your MTU size is 1500 bytes (the default in most cases). You’ll need to either enable Jumbo frames across your core, or reduce the MTU size on your servers inside the OTV domain. UPDATE: You can enable OTV fragmentation by using the global command otv fragmentation join-interface.  I don’t know if this has any performance implications, but at least it’s an option for you if changing the MTU throughout your network is difficult.
  • With the latest code releases, I believe all platforms support either Unicast or Multicast for the OTV control-plane. If you have a multicast enabled core, use multicast — it’s really not too bad.
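The MTU arithmetic from the first bullet is worth spelling out:

```python
# OTV encapsulation overhead, per the point above.
OTV_OVERHEAD = 42  # bytes added to each encapsulated frame

def required_core_mtu(host_mtu):
    """Core MTU needed to carry encapsulated frames without fragmentation."""
    return host_mtu + OTV_OVERHEAD

def max_host_mtu(core_mtu):
    """Largest host MTU if the core MTU cannot be raised."""
    return core_mtu - OTV_OVERHEAD

print(required_core_mtu(1500))  # 1542: the core must carry at least this
print(max_host_mtu(1500))       # 1458: or shrink the server MTUs instead
```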

Topology and Configuration

For my topology I’m going to use two ASR 1Ks, a 4900M with two VRFs, and two 3550 switches. I know I could’ve left out the VRFs, but I wanted to make my topology as close as possible to real life. So we end up with this:

Sample OTV topology

So let’s move on to the OTV configuration.

OTV Site information

Part of any OTV config will be defining the site identifiers and the Site Bridge-Domain. The site identifier is how an OTV device determines whether or not it is at the same location as another OTV device.

OTV-RTR1:

otv site-identifier 0001.0001.0001

OTV-RTR2:

otv site-identifier 0002.0002.0002

The site bridge-domain is the VLAN that OTV edge devices at the same site will use for AED (Authoritative Edge Device) election. Since this VLAN will not be part of the overlay, we can use the same command on both routers.

otv site bridge-domain 100
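Once the site parameters are in place, a quick sanity check is worthwhile (commands only — I’ve omitted the output here since it varies slightly by release):

OTV-RTR1#show otv
OTV-RTR1#show otv site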

The Join interface

The join interface will be the source for all OTV packets sent to remote OTV routers, and it will be the destination for OTV packets that need to come to the site. For multicast control-plane implementations you’ll need to enable Passive PIM and IGMPv3.

OTV-RTR1:

interface Gig0/0/1
mtu 8192
ip address 10.80.0.2 255.255.255.0
ip pim passive
ip igmp version 3

Also note that the MTU has been adjusted to accommodate the increased size of the OTV packet. This will be the same on the second OTV-RTR except for the IP address.
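For completeness, the mirrored config on OTV-RTR2 would look like this (the 10.70.0.2 address is consistent with what shows up in the adjacency table later; substitute your own addressing):

interface Gig0/0/1
mtu 8192
ip address 10.70.0.2 255.255.255.0
ip pim passive
ip igmp version 3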

Overlay Interface

In the overlay interface configuration we have to specify the multicast group used for control messaging, as well as the range of multicast groups that will be used for passing multicast data within the VLAN. We will also specify which interface will be used as the join interface. This will be the same on both routers:

interface Overlay1
otv control-group 239.1.1.1
otv data-group 232.1.1.0/28
otv join-interface GigabitEthernet0/0/1
no shutdown

Once you turn up the Overlay interface on both sides, you should see your OTV adjacency form:

OTV-RTR1#show otv adjacency
Overlay 1 Adjacency Database
Hostname                       System-ID      Dest Addr       Up Time   State
OTV-RTR2                       c08c.6008.0f00 10.70.0.2       00:00:36  UP

At this point, since there isn’t a VLAN bridged to the Overlay, there will be no OTV routing information:

OTV-RTR1#show otv route

Codes: BD - Bridge-Domain, AD - Admin-Distance,
       SI - Service Instance, * - Backup Route

OTV Unicast MAC Routing Table for Overlay1

 Inst VLAN BD     MAC Address    AD    Owner  Next Hops(s)
----------------------------------------------------------

0 unicast routes displayed in Overlay1

----------------------------------------------------------
0 Total Unicast Routes Displayed

Adding Vlans to the Overlay

The last step will be to add the appropriate VLANs to the overlay. This config assumes that the router will receive the traffic from the switch with an 802.1Q tag:

interface GigabitEthernet0/0/0
service instance 250 ethernet
encapsulation dot1q 250
bridge-domain 250
!
interface Overlay1
service instance 250 ethernet
encapsulation dot1q 250
bridge-domain 250
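After adding the service instances, you can confirm that VLAN 250 is now being extended by the overlay with show otv vlan, which lists each extended VLAN and whether this device is the AED for it (output omitted here, as it varies by platform and release):

OTV-RTR1#show otv vlan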

Verifying

I created a VLAN interface on each switch to use as my ‘hosts’ for the ping tests.

Sw-1 VL250 = 0009.b709.4b80

Sw-2 VL250 = 0009.b716.7880

Pinging between devices is successful. Let’s look at the switches to see how it looks:

SW-1:

Vlan    Mac Address       Type        Ports
----    -----------       --------    -----
 250    0009.b716.7880    DYNAMIC     Gi0/1

OTV-RTR1:

OTV-RTR1#sh otv route

OTV Unicast MAC Routing Table for Overlay1

Inst VLAN BD     MAC Address    AD    Owner  Next Hops(s)
----------------------------------------------------------
 0    250  250    0009.b709.4b80 40    BD Eng Gi0/0/0:SI250
 0    250  250    0009.b716.7880 50    ISIS   OTV-RTR2

So we can see that Sw-1 knows to reach Sw-2 out interface Gi0/1, which connects to OTV-RTR1. OTV-RTR1 shows that it has learned the MAC for Sw-2 via OTV (IS-IS) from OTV-RTR2, so any time it receives frames for this MAC, it knows to forward them across the overlay.

OTV-RTR2:

OTV-RTR2#sh otv route

OTV Unicast MAC Routing Table for Overlay1

Inst VLAN BD     MAC Address    AD    Owner  Next Hops(s)
----------------------------------------------------------
0    250  250    0009.b709.4b80 50    ISIS   OTV-RTR1
0    250  250    0009.b716.7880 40    BD Eng Gi0/0/0:SI250

OTV-RTR2 shows that Sw-2 is reachable out the local service instance. Any packets that come across the overlay will be decapsulated and forwarded out the local interface.

Wrap Up

Getting a basic OTV config up and running is not that difficult. Next time I’ll talk about using unicast instead of multicast, and also about AED.

Multicast Performance on ESX

Multicast data has always been somewhat of a mystery to network engineers unless they have a very specific reason for using it. Since the financial industry is a heavy user of multicast, I have been fortunate to get my hands very dirty in it throughout my career.

One item that has always vexed our group is how we can consolidate our multicast workloads, and extend the efficiency gains of virtualization to this segment of our environment. These boxes represent a significant cost, and they often go underutilized in terms of CPU/memory. But because of the nature of the data, it’s difficult to try anything that could degrade performance.

In ESX 5.0, VMware introduced a new technology that is supposed to help alleviate the performance bottlenecks:

  • splitRxMode

I’ll summarize the feature here as described in the Technical Whitepaper:

splitRxMode

In previous versions of ESX, all network receive processing for a queue was performed inside a single context within the VMkernel. splitRxMode allows you to direct each vNIC (configured individually) to use a separate context for receive packet processing.

They make a special note to indicate that even though it improves receive processing for multicast, it does incur a CPU penalty due to the extra overhead per packet, so don’t enable it on every machine.
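For what it’s worth, the whitepaper describes splitRxMode as a per-vNIC advanced setting in the VM’s .vmx file, along these lines (ethernet0 being whichever vNIC you want split out — verify the exact parameter name against your ESX build before relying on it):

ethernet0.emuRxMode = "1"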

Performance

In their testing, VMware’s labs reported 10–25% packet loss on a 16 Kpps multicast stream once the number of subscriber VMs went past 24. After they enabled splitRxMode, the packet loss was < 0.01% all the way up to 32 VMs on the host.

My Take

Even though VMware seems confident that the recent I/O improvements with splitRxMode will increase multicast performance, there are some key considerations here:

  1. 0.01% is still a lot of packet loss — at 16 Kpps, that’s still over 1 pps (roughly 1.6 pps)
  2. The scenario they tested is for a one-to-many situation (one stream to multiple receivers). What if the packet rate is higher or the number of streams is higher, but the receiver count is low?

Obviously this requires a lot more testing on our part before we’d ever even consider rolling anything to production. If you have any experience in this regard, please feel free to comment and offer any insights/suggestions you might have.

NOTE: This entry is my first foray into technical blogging. I’ve learned a lot from the blogs I’ve read over the years, and I’ve also found that these types of blogs are the absolute best resource for solving real problems. I hope I can contribute something meaningful and perhaps repay some of what I’ve been given.