Introduction to Cisco Firepower Threat Defense (FTD) on ASA 5500-X

This week I’m testing the new Firepower Threat Defense (FTD) 6.1 image for the ASA 5500-X, and hopefully getting familiar with how things work in the new setup. One of the things I’m most excited about is the onboard management interface: an HTML-based interface that no longer requires ASDM, which is a huge step in the right direction, in my opinion.

I’m going to cover the reimage process and see what a box looks like from a fresh start, as well as give an overview of the management interface and the CLI. I’ll try not to dig too deep in this introduction, but I’m hoping to provide a lot of screenshots of various screens and things I notice during the setup.

For your reference, you can find the 6.1.0 release notes here, and the Firepower Threat Defense 6.1.0 Configuration Guide here.

Reimaging

I’ll cover the gist of the reimage process now, but you can find the full instructions here.

The big thing to note about the reimage is that it will wipe out everything you have on your device: configuration, ASA/ASDM images, AnyConnect packages, everything. So be sure to back up anything you want to keep.

To complete the reimage you’ll need console access to your ASA, a TFTP server, and the FTD cdisk file for your platform.

Let’s get started.

  1. Verify that your hardware is running a supported ROMMON version. FTD requires a minimum version of 1.1.8. You can verify this using the show module command. Look at the Fw Version value.
    5515-X# sh module
    
    Mod  Card Type                                    Model              Serial No.
    ---- -------------------------------------------- ------------------ -----------
    0 ASA 5515-X with SW, 6 GE Data, 1 GE Mgmt, AC ASA5515            XXX
    ...
    
    Mod  MAC Address Range                 Hw Version   Fw Version   Sw Version
    ---- --------------------------------- ------------ ------------ ---------------
    0 84b8.022a.133f to 84b8.022a.1346  1.0          2.1(9)8      9.4(2)6
    
  2. Reload your ASA
  3. Interrupt the boot process by pressing ESC when prompted.
  4. In ROMMON, configure your network settings:
    rommon #0> interface gigabitethernet0/1
    rommon #1> address 10.2.3.11
    rommon #2> server 10.2.3.135
    rommon #3> gateway 10.2.3.135
    rommon #4> file ftd-boot-9.6.2.0.cdisk
    rommon #5> set
    
  5. Confirm your settings and commit the changes using the sync command:
    rommon #6> sync
    
  6. Initiate the image download using the tftpdnld command:
    rommon #7> tftpdnld
    
  7. After the download completes, the device will load the image and you’ll be at an FTD Boot console. From here, use the setup command to configure the basic parameters for your box (hostname, address, gateway, DNS, NTP).
  8. The last step is to download and install the actual FTD install package.
    > system install noconfirm http://10.2.3.135:8080/ftd-6.1.0-330.pkg
    

    The documentation says this step could take up to 30 minutes, but mine finished in less time.
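
If you have a stack of boxes to check before reimaging, the ROMMON check in step 1 is easy to script. Here is a minimal Python sketch, assuming you have already captured the show module output as text. The function name and the idea of comparing versions as tuples of integers are my own assumptions for illustration, not anything Cisco documents.

```python
import re

MIN_ROMMON = (1, 1, 8)  # minimum ROMMON version quoted in step 1

# Trimmed copy of the "show module" output from step 1.
SAMPLE = """\
Mod  Card Type                                    Model              Serial No.
---- -------------------------------------------- ------------------ -----------
0 ASA 5515-X with SW, 6 GE Data, 1 GE Mgmt, AC ASA5515            XXX

Mod  MAC Address Range                 Hw Version   Fw Version   Sw Version
---- --------------------------------- ------------ ------------ ---------------
0 84b8.022a.133f to 84b8.022a.1346  1.0          2.1(9)8      9.4(2)6
"""


def fw_version(show_module_output, module="0"):
    """Return the Fw Version of one module as a tuple of integers."""
    in_version_table = False
    for line in show_module_output.splitlines():
        fields = line.split()
        if "Fw" in fields and "Version" in fields:
            in_version_table = True  # header row of the MAC/Hw/Fw/Sw table
            continue
        if in_version_table and fields and fields[0] == module:
            # Row layout: Mod, MAC start, "to", MAC end, Hw, Fw, Sw
            return tuple(int(n) for n in re.findall(r"\d+", fields[5]))
    raise ValueError(f"module {module} not found in version table")


version = fw_version(SAMPLE)
rommon_ok = version >= MIN_ROMMON
print(version, "OK" if rommon_ok else "upgrade ROMMON first")
```

Treating 2.1(9)8 as the sequence 2, 1, 9, 8 is a simplification, but it is enough to tell whether a box is above or below 1.1.8.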

Configuring the ASA using Firepower Device Manager

Once the box is back online, we’re now ready to test out the new onboard management interface, Firepower Device Manager. Browsing to the management address, we’re presented with a screen that almost brings a tear to my eyes:

asa-ftd-screenshot-1

Finally! After so many years of fighting with ASDM and trying to find the right Java version, we’re finally able to use a built in web interface. Ignore any limitations with the available functionality in FDM for now — just savor the moment.

The default login is admin and Admin123

After login we have to go through an initial setup wizard.

asa-ftd-screenshot-2

That’s fine, I guess, so let’s move through it. I’ll select Gig0/5 as my outside interface since I don’t have it hooked up to anything but the LAN right now.

asa-ftd-screenshot-3

Next we’ll set up the outside interface addresses for IPv4 and IPv6:

asa-ftd-screenshot-4

And the Management interface DNS. I chose DHCP during the CLI setup, so this info was already populated for me. All I did was change the hostname.

asa-ftd-screenshot-5

Click Next and wait patiently….

asa-ftd-screenshot-6

Uh oh, bold red text is bad, right?

asa-ftd-screenshot-7

Ok, fine. So it really wants to be able to talk to the outside world during the setup. My test box is remote, and since I only had 2 interfaces connected, I figured I would configure the box without an internet connection for now. Guess FDM had other ideas. So I’ll go back and change the outside interface to be Gig0/0, and we’ll leave the inside disconnected.

I went back through the last couple of screens after changing the outside interface, and was then asked to configure NTP:

asa-ftd-screenshot-8-configure-ntp

Now we get to the licensing page. It looks like FTD will only use Smart Licenses, so I’ll be sure to familiarize myself with that in the very near future. For now I’ll use the 90-day eval license.

asa-ftd-screenshot-9-use-smart-licensing-or-else

And bingo, we’re ready to rock and roll!

asa-ftd-screenshot-10-ready-to-go-whats-next

The Dashboard

After the confirmation window, we land at the device dashboard:

asa-ftd-screenshot-11-device-dashboard

There’s not much to say about this. It’s got a nice clean look, and it gives you quick access to most of the basic settings on your box. I would compare it to the Device Setup and Device Management sections from ASDM.

Monitoring Menu

asa-ftd-screenshot-15-monitoring

The System Dashboard within the monitoring menu is really similar to the ASDM landing page — you have some graphs of throughput, CPU, and Memory, as well as event counts and Disk usage. My test box doesn’t have anything connected to it, unfortunately, so I have to apologize since my screenshots won’t be showing much more than the layout.

If you look down the menu on the left side of the screen, you’ll see the Firepower categories. This is the same info you would see in the Firepower Management Center (FMC) console, or your Firepower Dashboards within ASDM if you’re running it directly on the box.

Policies Menu

When you click on the Policies menu item, you land on the Access Control page. This is where you will build your policies for allowing/denying traffic, and it is analogous to the ASDM Access Rules page. There is a default rule already installed (part of the initial setup process) allowing all traffic from inside to outside.

asa-ftd-screenshot-14-access-control

You’ll quickly notice that much like other sections in the management interface, the access rule page feels a bit like the FMC, or even like the Palo Alto firewall interface, for those of you who are familiar with PA. The similarities are even more apparent when you add an access rule:

asa-ftd-screenshot-17-access-rule

Clicking on the NAT item, you’ll see the default NAT rule that was also added during the initial setup:

asa-ftd-screenshot-18-default-nat-rule

Adding a new NAT rule is just as easy as it was in ASDM, although now you are required to create objects for everything:

asa-ftd-screenshot-19-nat-rule-example

asa-ftd-screenshot-20-nat-rule-options-example

Something to note here that differs from ASDM — hovering over objects does not reveal the IP address of the object, only the object name again.

The last page under the Policies menu is Identity and this is where you configure policies on obtaining user identity information.

The first thing to do here is define where you will pull identity information. You can choose between Active Directory or … Active Directory. AD is currently the only supported server type, and you’re only allowed to configure one server here.

asa-ftd-screenshot-21-identity-server-configuration

Once you have a Directory Server configured you can add identity policy rules. The two types of Authentication available are Active and No Auth. Active Authentication is only used on HTTP traffic, per this note in the help documentation:

Keep in mind that regardless of your rule configuration, active authentication is performed on HTTP traffic only. Thus, you do not need to create rules to exclude non-HTTP traffic from active authentication. You can simply apply an active authentication rule to all sources and destinations if you want to get user identity information for all HTTP traffic.

Another thing to note from the help documentation is that Identity policies don’t actually block traffic — they’re used for gathering information only.

I won’t dig too deep into this right now, but there are two types of Active Authentication, HTTP and HTTP response, and one transparent method that uses Integrated Windows Authentication.

Objects Menu

Moving over to the objects menu, you’ll see that this is a very familiar space where we can define our host/network objects, port/port group objects, and security zones. The only thing to notice here is that we can configure application filter, URL, and geolocation objects for use in access control rules.

asa-ftd-screenshot-22-objects-menu

One final note about the Policies and Objects menus is that, much like ASDM, changes are queued for delivery to the device. As you begin making changes, you’ll notice an icon on the top bar with an orange dot:

asa-ftd-screenshot-23-ready-to-deploy

Seems simple enough — queue changes to deliver them in bulk – got it. One thing I couldn’t find, however, was a cancel or reset changes button. So at this point in time it appears that changes made through the FTD interface are a one way street — better make sure you backed up your config before you started messing around with things.

On the plus side though, after the deployment is completed you can see a record of the changes that were made:

asa-ftd-screenshot-24-deployment-summary

The CLI

The beloved ASA CLI has also changed with the FTD image. After you first log in, you can see that we are no longer in Kansas, er, in ASA land anymore. Instead, we’re running the Cisco Fire Linux OS:

    Copyright 2004-2016, Cisco and/or its affiliates. All rights reserved.
    Cisco is a registered trademark of Cisco Systems, Inc.
    All other trademarks are property of their respective owners.

    Cisco Fire Linux OS v6.1.0 (build 37)
    Cisco ASA5515-X Threat Defense v6.1.0 (build 330)

    >

At first glance things appear pretty similar — you can still run most of your show commands, including:

  • Viewing translations
    > show xlate
    1 in use, 1 most used
    Flags: D - DNS, e - extended, I - identity, i - dynamic, r - portmap,
       s - static, T - twice, N - net-to-net
    NAT from outside:0.0.0.0/0 to any:0.0.0.0/0
        flags sIT idle 51:12:07 timeout 0:00:00
    
  • connections
    > show conn all
    8 in use, 14 most used
    
    UDP outside  0.0.0.0:68 NP Identity Ifc  255.255.255.255:67, idle 0:01:23, bytes 300, flags -
    UDP outside  10.2.3.102:68 NP Identity Ifc  255.255.255.255:67, idle 0:01:07, bytes 600, flags -
    UDP outside  10.2.3.97:68 NP Identity Ifc  255.255.255.255:67, idle 0:00:40, bytes 900, flags -
    UDP outside  10.2.3.47:68 NP Identity Ifc  255.255.255.255:67, idle 0:00:37, bytes 900, flags -
    UDP outside  10.2.3.147:68 NP Identity Ifc  255.255.255.255:67, idle 0:01:19, bytes 900, flags -
    
  • routes
    > show route
    
    Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
        D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
        N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
        E1 - OSPF external type 1, E2 - OSPF external type 2, V - VPN
        i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
        ia - IS-IS inter area, * - candidate default, U - per-user static route
        o - ODR, P - periodic downloaded static route, + - replicated route
    Gateway of last resort is 10.2.0.254 to network 0.0.0.0
    
    S*       0.0.0.0 0.0.0.0 [1/0] via 10.2.0.254, outside
    C        10.2.0.0 255.255.248.0 is directly connected, outside
    

and much, much more!

Suffice it to say that a lot of what you’re used to seeing from the CLI is still available as it relates to viewing your setup and troubleshooting. The big gotcha, however, is that it appears you can’t easily make changes from the CLI. There is no configure terminal any more, and the configuration commands left available to you are minimal:

> configure
  disable-https-access   Disable https access
  disable-ssh-access     Disable ssh access
  firewall               Change to Firewall Configuration Mode
  high-availability      Change to Configure High-Availability Mode
  https-access-list      Configure the https access list
  log-events-to-ramdisk  Configure Logging of Events to disk
  manager                Change to Manager Configuration Mode
  network                Change to Network Configuration Mode
  password               Change password
  ssh-access-list        Configure the ssh access list
  ssl-protocol           Configure SSL protocols for https web access.
  user                   Change to User Configuration Mode

Some of the available configure commands are a bit misleading as well. For example, configure firewall does not allow you to actually change anything about the firewall other than routed or transparent mode. Clearly the goal here is to get you out of the CLI and back into the Web interface. In fact, I wasn’t able to find anything really useful to configure from the CLI, just basic items that you would use to set up the box in the first place. If I find any more useful details I’ll update this post, but for now I’ll just assume that the CLI is for troubleshooting only, and all configuration should be done from the GUI.

My Observations

First of all, I love the direction this is going and have wondered for years why Cisco stayed with ASDM given that the competitors are using built-in interfaces. That being said, I also realize and acknowledge that it takes a lot of effort to move away from a management tool like ASDM. I was at Cisco Live this year in Las Vegas, and the ASDM angst was palpable. In fact, when FTD was mentioned in one of my sessions, the crowd went wild when the presenter made the comment that there was no more ASDM in FTD. Many of us have years of experience with ASAs (or even PIX), so ASDM is very comfortable to us, but it’s hard to deny the anguish it has caused over the years.

As for Firepower Threat Defense itself, it’s a great start and I can’t wait to see what the next releases bring. I’m calmly reminding myself that this is the dot zero first 6.1 release of FTD. Things take time, and the best things take more time.

NX-OS Port Profiles

As I become more familiar with NX-OS, I frequently find features that are meant to make life easier for us network admins and engineers. I’ve been informed that the days of CLI jockeys are rapidly coming to an end, and rightly so, but even with my best DevOps attempts I still find myself having to manually edit configs frequently. One of my least favorite tasks is adding a new VLAN to our ESX cluster, because there are just so many interfaces to touch. There must be a better way! Turns out, there is at least one better way (of many, I’m sure): port profiles.

Port Profile Overview

Port profiles are interface configuration templates that can be assigned to ports that have the same configuration requirements. If you’ve ever found yourself copying and pasting interface configurations on a box, then port-profiles can help you.

The limit on the number of ports that can inherit a profile is platform dependent: my Nexus 7700s show a limit of 16384, while my Nexus 9300s show 512.

Creating a port profile

Let’s walk through a really simple example of a port-profile:

  1. First, we create the profile, and in so doing, define the type of interface to which the profile will be applied

    NX9K(config)# port-profile type ?
      ethernet          Ethernet type
      interface-vlan    Interface-Vlan type
      port-channel      Port-channel type
    

    For this example we’ll create an ethernet type. Note that on the Nexus 7Ks we can also use the loopback and tunnel types.

  2. Next we define the commands that will be applied to every interface

    NX9K(config)# port-profile type ethernet MY-TEST-PROFILE
    NX9K(config-port-prof)# switchport
    NX9K(config-port-prof)# switchport mode trunk
    NX9K(config-port-prof)# switchport trunk allowed vlan 10,20,30,40,50,100
    NX9K(config-port-prof)# spanning-tree port type edge trunk
    NX9K(config-port-prof)# no shutdown
    
  3. Lastly we change the state of the profile to enabled.

    NX9K(config-port-prof)# state enabled
    

That’s all you need to do to create a profile. We can review the configuration on our profile by using the show port-profile command:

NX9K# show port-profile

SHOW PORT_PROFILE

port-profile MY-TEST-PROFILE
 type: Ethernet
 description:
 status: enabled
 max-ports: 512
 inherit:
 config attributes:
  switchport
  switchport mode trunk
  switchport trunk allowed vlan 10,20,30,40,50,100
  spanning-tree port type edge trunk
  no shutdown
evaluated config attributes:
 switchport
 switchport mode trunk
 switchport trunk allowed vlan 10,20,30,40,50,100
 spanning-tree port type edge trunk
 no shutdown
assigned interfaces:

This output gives us nearly all the info we need — the type of profile we created, the commands that it contains, commands that are actually being applied (evaluated), and any interfaces that are assigned to use this profile. At this point we haven’t assigned an interface so let’s do that now.

Assigning profiles to interfaces

To assign our newly created profile, we use the inherit port-profile interface sub-command:

interface eth101/1/1
  inherit port-profile MY-TEST-PROFILE

And that’s it! Very easy stuff here.

Now the best part comes days or months later when you need to modify the ports. You simply add the new command(s) to the profile, and all assigned interfaces automatically get the updated config.

Viewing interface and port profile configurations

The only thing to remember down the road is that your interfaces will no longer show the actual configuration. A standard show run interface only shows the inherit command:

interface Ethernet101/1/16
    inherit port-profile MY-TEST-PROFILE

There are two ways you can see the commands as applied to each interface. First, you can display the full interface config using the command show port-profile expand-interface name PROFILE_NAME

NX9K# sh port-profile expand-interface name MY-TEST-PROFILE

port-profile MY-TEST-PROFILE
 Ethernet101/1/16
  switchport mode trunk
  switchport trunk allowed vlan 10,20,30,40,50,100
  spanning-tree port type edge
  no shutdown

Or, you can use the command show run interface INTERFACE expand-port-profile

NX9K# sh run int eth101/1/16 expand-port-profile

interface Ethernet101/1/16
 switchport mode trunk
 switchport trunk allowed vlan 10,20,30,40,50,100
 spanning-tree port type edge
 no shutdown

The difference here is that the show port-profile expand-interface command will show you all interfaces with that profile assigned, where the show run interface is only displaying the single interface.

Inheritance

Another great feature of port-profiles is that they are inheritable. This allows you to modularize your configurations and reference them by profile name within other profiles. I came across a good example of this in a presentation from Cisco about using profile inheritance on the Nexus 1000V. In their example, they were applying the same switchport mode and vlan access settings but wanted to apply varying QoS policies. So in their example they had the following profiles:

port-profile WEB
 switchport mode access
 switchport access vlan 100
 no shut

port-profile WEB-GOLD
 inherit port-profile WEB
 service-policy output GOLD

port-profile WEB-SILVER
 inherit port-profile WEB
 service-policy output SILVER

interface Eth1/1
 inherit port-profile WEB-GOLD

interface Eth1/2
 inherit port-profile WEB-SILVER

The end result is that all assigned interfaces are configured as access ports in vlan 100, but the QoS policy differed. Only 4 levels of inheritance are supported, so don’t go too crazy here.
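
To make the inheritance idea concrete, here is a small Python sketch that flattens the WEB, WEB-GOLD, and WEB-SILVER profiles into the command list an interface would end up with. It is only a model of the concept. The expand function is hypothetical, and the way it counts the 4-level limit (one level per inherit hop) is my assumption rather than something taken from the NX-OS documentation.

```python
MAX_INHERIT_DEPTH = 4  # limit quoted in the text above

# Profiles from the example, stored as lists of command strings.
profiles = {
    "WEB": [
        "switchport mode access",
        "switchport access vlan 100",
        "no shutdown",
    ],
    "WEB-GOLD": ["inherit port-profile WEB", "service-policy output GOLD"],
    "WEB-SILVER": ["inherit port-profile WEB", "service-policy output SILVER"],
}


def expand(name, depth=0):
    """Return the flattened command list for a profile, following inherits."""
    if depth > MAX_INHERIT_DEPTH:
        raise ValueError(f"more than {MAX_INHERIT_DEPTH} levels of inheritance")
    commands = []
    for line in profiles[name]:
        if line.startswith("inherit port-profile "):
            parent = line.split()[-1]
            commands.extend(expand(parent, depth + 1))  # parent commands first
        else:
            commands.append(line)
    return commands


gold = expand("WEB-GOLD")
silver = expand("WEB-SILVER")
for command in gold:
    print(command)
```

Both expansions share the three commands from WEB and differ only in the final service-policy line, which is exactly the modularity the Cisco example was going for.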

Things to remember

As you begin to work with profiles, there are some important things to remember as it relates to order of precedence in the commands that will take effect on the interface. Taken straight from the documentation:

The system applies the commands inherited by the interface or range of interfaces according to the following guidelines:

  • Commands that you enter under the interface mode take precedence over the port profile’s commands if there is a conflict. However, the port profile retains that command in the port profile.

  • The port profile’s commands take precedence over the default commands on the interface, unless the port-profile command is explicitly overridden by the default command.

  • When a range of interfaces inherits a second port profile, the commands of the initial port profile override the commands of the second port profile if there is a conflict.

  • After you inherit a port profile onto an interface or range of interfaces, you can override individual configuration values by entering the new value at the interface configuration level. If you remove the individual configuration values at the interface configuration level, the interface uses the values in the port profile again.

  • There are no default configurations associated with a port profile.
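
As a rough mental model of those precedence rules, you can think of the effective interface config as a layered merge: interface defaults at the bottom, then inherited profiles, then anything typed directly on the interface. The Python sketch below is my own simplification with made-up setting names and values. It says nothing about how NX-OS implements this internally.

```python
def effective_config(defaults, inherited_profiles, interface_overrides):
    """Merge settings so that higher-precedence sources win.

    inherited_profiles is in the order the profiles were inherited; the
    first profile wins over later ones when they conflict.
    """
    config = dict(defaults)
    # Apply later profiles first so earlier profiles overwrite them.
    for profile in reversed(inherited_profiles):
        config.update(profile)
    # Commands entered in interface mode beat everything else.
    config.update(interface_overrides)
    return config


# Hypothetical values, purely for illustration.
defaults = {"switchport mode": "access", "mtu": "1500"}
first_profile = {"switchport mode": "trunk", "mtu": "9216"}
second_profile = {"switchport mode": "access", "description": "from second profile"}

with_override = effective_config(
    defaults, [first_profile, second_profile], {"mtu": "9000"}
)
without_override = effective_config(defaults, [first_profile, second_profile], {})
print(with_override)
print(without_override)
```

With the override in place the interface uses mtu 9000; remove it and the value falls back to the 9216 from the first profile, which mirrors the fourth bullet above.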

One other important detail from the documentation: checkpoints are created any time you enable, modify, or inherit a profile, so the system can roll back to a good configuration in case of any errors. A profile will never be partially applied; if there are errors, the config is backed out.

So go out and make your life easier — try out port profiles today!

SNMP polling interval granularity

I recently had a need to increase the granularity of SNMP monitoring on some critical network interfaces, and I thought I’d share what I’ve learned so far.

The change was prompted by an issue where we briefly maxed out the bandwidth on one of our WAN circuits. For most companies this would be no big deal, especially since the burst in traffic was less than a minute in duration. However, because our customers are very sensitive to latency, this caused some noticeable delays.

The problem with interval monitoring is that you just don’t know what happens between the polls. The whole idea of interval monitoring is to get an approximation of what’s happening, so by design you’re going to miss some of the details. But how do you know what you miss?

Take a look at this picture to get an idea of what I’m talking about:

SNMP Granularity Example

As you can see, the longer the time between polls, the more detail is missed. This might not be a problem for you — I would say that the goal of most monitoring solutions is to look for traffic *trends*, and not necessarily sub minute bursts. But if you need the extra detail, then it becomes quite a challenge, and there are limited solutions depending on the size of your budget.
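
To put some numbers on this, here is a small Python sketch with made-up traffic: a hypothetical 100 Mb/s circuit that idles at 10% for a minute, except for a 5-second burst at line rate. It computes the utilization you would report when polling once a minute versus once a second.

```python
LINK_BPS = 100_000_000  # hypothetical 100 Mb/s circuit

# Bits sent in each second: a quiet minute with a 5-second burst at line rate.
per_second_bits = [10_000_000] * 60
per_second_bits[20:25] = [LINK_BPS] * 5


def utilization(samples, interval):
    """Average utilization (percent of LINK_BPS) for each polling interval."""
    result = []
    for start in range(0, len(samples), interval):
        window = samples[start:start + interval]
        result.append(100 * sum(window) / (len(window) * LINK_BPS))
    return result


one_minute = utilization(per_second_bits, 60)
one_second = utilization(per_second_bits, 1)
print("1-minute poll peak:", max(one_minute))  # 17.5
print("1-second poll peak:", max(one_second))  # 100.0
```

The one-minute poll reports a sleepy 17.5% while the link was actually pegged for five seconds, which is the kind of event that hurt us.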

If money is no object, then you can buy one of the fancy network traffic monitors by Corvil or Netscout. These are awesome boxes that record every drop of data you feed through them, so that you can replay the exact issue over and over. You pay handsomely for that functionality.

cPacket networks and Riverbed also have some cool products, but price is definitely a factor as well.

The only way to get this kind of granularity on the cheap is to leverage SNMP and pump up the granularity.

IOS SNMP Configuration

Disclaimer: Do not apply these commands to your production environment until you have tested and fully understand the potential impact

By my rough estimation and testing, most Cisco devices (IOS, IOS XE, and NX-OS) will, by default, update their internal SNMP statistics roughly every 10 seconds. But there’s a hidden command in IOS that will let you adjust the interval at which IOS updates the internal statistics database.

snmp-server hc poll <interval>

The interval for this command is in hundredths of seconds. So to update at 1 second intervals, you need:

snmp-server hc poll 100
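
Since the unit is easy to get wrong, here is a tiny Python helper that converts seconds into the hundredths-of-a-second value and builds the command line. The helper name is hypothetical, and I don’t know the valid range the command accepts on any given platform, so the only sanity check here is that the value is at least 1.

```python
def hc_poll_command(seconds):
    """Build the snmp-server hc poll line for an interval given in seconds."""
    hundredths = round(seconds * 100)  # the command takes hundredths of a second
    if hundredths < 1:
        raise ValueError("interval must be at least 0.01 seconds")
    return f"snmp-server hc poll {hundredths}"


print(hc_poll_command(1))   # snmp-server hc poll 100
print(hc_poll_command(10))  # snmp-server hc poll 1000
```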

I have used this command successfully on Cisco ISR G2 routers, 7200 routers, and 4900M switches.

This does not work on IOS XE or NX-OS devices, including ASR1000s and Nexus 7Ks or 3Ks.

Network Management Systems

You might recall that I’m a big fan of Solarwinds for network monitoring. Unfortunately, the lowest interval you can configure for statistics collection in Solarwinds NPM is 1 minute. So I had to look elsewhere for a product that could poll more frequently.

Enter SevOne. Their platform allows you to configure ‘high-speed pollers’ which can be tuned down to 1 second intervals. The folks at SevOne are great to work with, and I received a lot of good guidance as I was configuring my instance.

This isn’t meant to be a review or comparison of the two products — SevOne solved a very specific problem for me and at a price point that was 1/10th the cost of the next closest solution, so props to them.

The other possible solutions are Cacti or other RRDtool based statistics/graphing systems. Since these are so flexible, I would expect that you could grab stats down to one second with these as well, although I haven’t verified this.

Request for Comments

There’s some discussion out there about the problems you can run into if you poll too frequently, one of which is polling more frequently than the agent updates. I definitely observed this problem as I was working on this, and saw the wild numbers you can get. If you have any thoughts on why polling more frequently would be bad, or might lead you to the wrong conclusions, please feel free to comment.
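
Here is a simplified Python simulation of the "polling faster than the agent updates" problem, to show where the wild numbers come from. It assumes an agent that only refreshes its counter every 10 seconds (my rough estimate from earlier) and a hypothetical steady 8 Mb/s load, and it ignores counter wrap entirely.

```python
AGENT_UPDATE_S = 10        # counter refresh period, per my rough estimate
TRUE_RATE_BPS = 8_000_000  # hypothetical steady load on the interface


def agent_counter(t):
    """Octet counter as the agent would report it at time t (in seconds)."""
    last_refresh = (t // AGENT_UPDATE_S) * AGENT_UPDATE_S
    return last_refresh * TRUE_RATE_BPS // 8


def polled_rates(poll_interval, duration):
    """Rates in bits/s that an NMS would compute from successive polls."""
    rates = []
    previous = agent_counter(0)
    for t in range(poll_interval, duration + 1, poll_interval):
        current = agent_counter(t)
        rates.append((current - previous) * 8 // poll_interval)
        previous = current
    return rates


fast = polled_rates(1, 20)   # polling every second
slow = polled_rates(10, 20)  # polling in step with the agent
print(fast)
print(slow)
```

Polling every second against a 10-second refresh gives nine readings of zero followed by one reading of 80 Mb/s, ten times the real rate. Polling in step with the agent reports the correct 8 Mb/s every time.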

Cisco EZVPN with IOS Router and ASA

I had an interesting request come across my desk, where I needed to configure a site-to-site VPN for some internet connected devices, but the devices were not allowed to connect internally to our network. So basically, I needed to tunnel the internet traffic back to our headend without allowing access to the internal network. The remote location also wouldn’t have a static IP. Having used EZVPN in the past, I figured this would be another great use case. Unfortunately I spent way too many hours trying to find a good example of how to get this setup working, so I figured I’d share my config for anyone else who may be struggling with a similar setup.

Diagram

EZVPN with IOS and ASA

IOS Router Config (EZVPN Client)

crypto ipsec client ezvpn ez
 connect auto
 group MyTunnelGroup key MySecretKey
 mode client
 peer 10.10.10.1
 username MyVPNUser password MyPassword
 xauth userid mode local
!
interface Fa0/0
 description WAN Interface
 ip address dhcp
 crypto ipsec client ezvpn ez
!
interface Fa0/1
 description LAN Interface
 ip address 192.168.0.1 255.255.255.0
 crypto ipsec client ezvpn ez inside
!

The first section defines the properties for the EZVPN connection, and there are 3 items that need special attention:

  1. The group and key you configure here will match the TunnelGroup name and IKEv1 key you configure on the ASA
  2. The username and password are also defined on the ASA. This is the actual user that is being authenticated.
  3. The xauth mode needs to be configured as local so the router doesn’t have to prompt for credentials.

Other items to note:

  1. There are three modes for EZVPN: Client, Network Extension, and Network Extension Plus. If this were a true L2L VPN, I’d use Network Extension or Network Extension Plus so that there was direct IP-IP connectivity between hosts on either side of the VPN. Since I don’t need that, I’m configuring Client mode, which is similar to a PAT for all client traffic.
  2. The peer IP will be the outside address of your EZVPN server.

ASA Configuration (EZVPN Server)

access-list EZVPN-ACL standard deny 10.0.0.0 255.0.0.0
access-list EZVPN-ACL standard permit any4
!
group-policy MyGroupPolicy internal
group-policy MyGroupPolicy attributes
 dns-server value 8.8.8.8
 vpn-access-hours none
 vpn-simultaneous-logins 3
 vpn-idle-timeout 30
 vpn-session-timeout none
 vpn-filter value EZVPN-ACL
 vpn-tunnel-protocol ikev1
 group-lock none
 split-tunnel-policy tunnelall
 split-tunnel-all-dns enable
 vlan none
 nac-settings none
!
username MyVPNUser password MyPassword
username MyVPNUser attributes
 vpn-group-policy MyGroupPolicy
!
tunnel-group MyTunnelGroup type remote-access
tunnel-group MyTunnelGroup general-attributes
 default-group-policy MyGroupPolicy
tunnel-group MyTunnelGroup ipsec-attributes
 ikev1 pre-shared-key MySecretKey

The Tunnel Group defines the preshared key for the connection that was referenced in the group MyTunnelGroup key MySecretKey command on the client. The Tunnel Group config also points to a Group Policy that will control the policy for the tunnel. I created a new policy, but you could also use the default DfltGrpPolicy if it fit your needs.
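
The EZVPN-ACL referenced by vpn-filter is what keeps the remote devices off the internal network. If you want to sanity check which destinations it allows, the first-match logic can be sketched in a few lines of Python. This only models the address matching of the two ACL lines above. It is not a model of how the ASA applies vpn-filter, and the test addresses are arbitrary examples.

```python
import ipaddress

# (action, network) pairs mirroring EZVPN-ACL; "any4" becomes 0.0.0.0/0.
EZVPN_ACL = [
    ("deny", ipaddress.ip_network("10.0.0.0/8")),
    ("permit", ipaddress.ip_network("0.0.0.0/0")),
]


def acl_action(address, acl=EZVPN_ACL):
    """Return the action of the first ACL entry that matches the address."""
    ip = ipaddress.ip_address(address)
    for action, network in acl:
        if ip in network:
            return action
    return "deny"  # implicit deny at the end of an ACL


for address in ("10.1.2.3", "8.8.8.8"):
    print(address, acl_action(address))
```

Anything in 10.0.0.0/8 hits the deny line first, and everything else falls through to the permit.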

Conclusion

The beautiful thing about EZVPN is that all of the policy aspects are controlled at the Server side. So while the current requirement is to block access to internal resources, I could easily change that on the server side without worrying about messing up the config on the client and bringing the tunnel down.

IT Crisis Management: Responding to and dealing with network outages

I’ve worked in the financial industry for almost a decade now, and in that time I’ve seen my fair share of network/system outages. When I started, I heard plenty of stories about “RGE’s” or “Resume Generating Events” as they were called around the office, and was jokingly advised to always keep my resume up to date. That’s not bad advice by itself, but I wanted to share my opinion on dealing with outages and the aftermath.

When I started, I lived in fear of what would happen if something went down during my watch. My heart skipped a beat each time an alert popped up on our monitoring system. And the first time an outage occurred, I was sure it was going to cost me my job.

Thankfully, that didn’t happen.

Over the years, not only have I grown less fearful of outages, I’ve learned that there are two phases to an outage: The outage itself (including the resolution), and the post-mortem.

The Outage

Obviously, if something is going wrong, the primary objective should be restoring service as soon as possible. Key to a rapid resolution is having clear channels of communication with other teams and knowing the most important people to bring into the loop during an issue. This is called being transparent about issues. Don’t wait to open channels of communication to partner teams or managers. In the financial industry information is key. Clients might be unhappy that you’re having problems, but they will be downright livid if you had an issue and you didn’t warn them. This can also have liability issues attached, so be sure to think out these things with your management.

Also vitally important is a strong understanding of the network, how things connect to each other, and having a good monitoring system in place. It almost goes without saying, but if you don’t know how anything relates to each other, and you have no way to monitor your systems, you’ve got some other issues to deal with.

The Post Mortem

You’ve solved the issue and things are working smoothly again. Disaster mitigated! What happens after the outage is, in my experience, just as important as resolving the issue itself. I believe this is where you can really shine and set yourself apart.

A mentor of mine once told me that what people really want after an outage is the answers to three questions:

  1. What happened?
  2. What did you do to fix it?
  3. When did you know about it?

And he followed up by saying that out of the three questions, the last was really the most important. His opinion, and one I share now, is that accidents happen, equipment breaks, circuits get the backhoe treatment, etc. and you generally can’t totally avoid outages. What you can do (besides practicing good operational principles) is respond promptly to your alerting system or reports from users, and show initiative by investigating on the first call and not on the 10th. Act quickly and openly.

Depending on the nature of your team, it’s likely that only a few people will understand the true technical cause of an issue. As you communicate up the chain, the gruesome details will be lost. It’s like the telephone game:

Engineer to manager: "We started taking excessive CRC errors on an interface without the line actually going down."

Manager to CTO: "We had a line problem."

CTO to CEO: "It broke."

Ok, so maybe that’s a little too simplified, but you get the idea. You don’t speak the same language as the upper echelon of management, and conversely, they don’t speak your low-level technical language. All they care about is when did we know about it, and how soon did we fix it.

Everything Else

This is not meant to be an exhaustive description of everything that should happen before/during/after an outage. I also don’t want to get into the details of what constitutes good operational principles, but I’m thinking of things like exercising caution when implementing changes, not being cavalier about production environments, and only making changes after following proper change management procedures, and then only during approved change windows.

I’m also aware that there are different cultures around the human side of outages. Some companies or managers will roast you and leave you for dead no matter how small the misstep, while others will have a more forgiving approach. That’s another topic as well.

Just remember that no matter how deep your bureaucracy, or how carefully you plan, there will always be unexpected issues. Just be sure that you do everything you can to be responsive and transparent about the issues you face.

Have thoughts on this subject? I’d love to hear your comments!

FHRP Filtering on Cisco ASR1001 with OTV

I’m finally getting the chance to deploy OTV and LISP in a live environment and wanted to share one of the issues I’ve run into.

As I mentioned in my post about OTV Traffic Flow Considerations, using HSRP (or VRRP/GLBP) at each site has the potential to cause traffic to “trombone” through the network in a sub-optimal path. Because of this behavior, FHRP filtering should be configured on your OTV routers to ensure that the HSRP device on each side of the overlay becomes an active gateway for the network. The ASR1001 is supposed to have this built-in.

Here’s the topology:

Production OTV Diagram

The Problem

After I set up OTV and LISP, I noticed that I had spotty connectivity to my host inside the overlay. A continuous ping revealed that I was missing a ping or two almost every 60 seconds. When I looked at the route for that host, the age was always less than 1 minute. Since these routes are redistributed into OSPF, I went back to the OTV/LISP routers and tried to see what was happening.

On the OTV/LISP routers, I could see that the local LISP routes were also being inserted and withdrawn regularly, which meant that LISP thought the EID was moving to the other router. Since the LISP mapping system is in charge of communicating EID-to-RLOC mapping changes, I ran debug lisp control-plane map-server and observed the following output (abbreviated):

Oct  2 11:09:41.623 EDT: LISP: Processing received Map-Notify message from 10.5.7.82 to 10.5.8.82
...
Oct  2 11:09:41.623 EDT: LISP-0: Local dynEID MOBILE-VMS IID 0 prefix 10.78.1.245/32, Received map notify (rlocs: 1/1).
Oct  2 11:09:41.623 EDT: LISP-0: Local dynEID MOBILE-VMS IID 0 prefix 10.78.1.245/32, Map-Notify contains new locator 10.5.7.82, dyn-EID moved (rlocs: 1/1).

Since I hadn’t moved the VM across the overlay, it surprised me to see that LISP thought the VM was moving. After banging my head on the wall with that issue, I started looking lower in the stack at OTV.

During normal operation, the OTV routing table on the local OTV router (router closest to the host) should look like this:

SAV-OTVRTR2#sh otv route
...
OTV Unicast MAC Routing Table for Overlay1

Inst VLAN BD     MAC Address    AD    Owner  Next Hops(s)
----------------------------------------------------------
0    800  800    0000.0c07.ac4e 40    BD Eng Gi0/0/0:SI800
...

Note the route for 0000.0c07.ac4e, which is the MAC for HSRP group 78. This is an FHRP address, so should it even be showing up? Since it was there, I assumed that the FHRP filtering must only prevent the route from being advertised to OTV neighbors.
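For anyone wondering how that MAC maps to group 78: HSRP version 1 uses a virtual MAC of the form 0000.0c07.acXX, where XX is the group number in hex. Here is a minimal Python sketch of that mapping. The function name hsrp_v1_mac is my own, and it assumes HSRP version 1 (groups 0 to 255).

```python
def hsrp_v1_mac(group: int) -> str:
    """Return the HSRP version 1 virtual MAC for a group, in Cisco dotted format."""
    if not 0 <= group <= 255:
        raise ValueError("HSRP version 1 groups range from 0 to 255")
    # The last byte of the virtual MAC is the group number in hex.
    return f"0000.0c07.ac{group:02x}"


if __name__ == "__main__":
    print(hsrp_v1_mac(78))  # group 78 is 0x4e
```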

But during one of the blips, I noticed this:

Inst VLAN BD     MAC Address    AD    Owner  Next Hops(s)
----------------------------------------------------------
0    800  800    0000.0c07.ac4e 30    ISIS   RAD-OTVRTR2

So not only was the HSRP MAC showing up with FHRP Filtering enabled, but it was also still being advertised across the network. This shouldn’t be.

The Solution – for now

I opened a TAC case and consulted with Cisco about the issue. They agreed that it was “odd” that the HSRP information was leaking across the overlay and recommended I put in an ACL to block FHRP information:

mac access-list extended otv_filter_fhrp
 deny   0000.0c07.ac00 0000.0000.00ff host 0000.0000.0000
 deny   0000.0c9f.f000 0000.0000.0fff host 0000.0000.0000
 deny   0007.b400.0000 0000.00ff.ffff host 0000.0000.0000
 deny   0000.5e00.0100 0000.0000.00ff host 0000.0000.0000
 permit host 0000.0000.0000 host 0000.0000.0000

…and apply the ACL to the OTV Inside interface.
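To sanity-check which source MACs those deny lines would catch, here is a rough Python sketch of the wildcard comparison. It assumes the usual Cisco wildcard convention, where a 1 bit in the mask means "don't care". The protocol labels are my own annotation based on the well-known virtual MAC ranges and are not part of the ACL, and the function names are made up for illustration.

```python
# (label, base MAC, wildcard mask) copied from the deny lines of the ACL.
# Labels are my own annotation, not something the router shows.
FHRP_ENTRIES = [
    ("HSRPv1", "0000.0c07.ac00", "0000.0000.00ff"),
    ("HSRPv2", "0000.0c9f.f000", "0000.0000.0fff"),
    ("GLBP", "0007.b400.0000", "0000.00ff.ffff"),
    ("VRRP", "0000.5e00.0100", "0000.0000.00ff"),
]

ALL_BITS = 0xFFFFFFFFFFFF  # 48-bit MAC address


def mac_to_int(mac: str) -> int:
    """Convert a Cisco dotted MAC (xxxx.xxxx.xxxx) to an integer."""
    return int(mac.replace(".", ""), 16)


def matching_entry(mac: str):
    """Return the label of the first deny entry matching mac, or None."""
    value = mac_to_int(mac)
    for label, base, wildcard in FHRP_ENTRIES:
        # Wildcard bits set to 1 are ignored, so compare only the other bits.
        care = ~mac_to_int(wildcard) & ALL_BITS
        if (value & care) == (mac_to_int(base) & care):
            return label
    return None


if __name__ == "__main__":
    print(matching_entry("0000.0c07.ac4e"))  # the HSRP MAC from the OTV route table
```

Run against the MAC from the OTV route table above, this matches the first deny entry, which is what you would hope for.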

You might notice that OTV automatically adds another ACL:

Extended IP access list otv_fhrp_filter_acl
    10 deny udp any any eq 1985 3222 (57416 matches)
    20 deny 112 any any
    30 permit ip any any (51921 matches)

This ACL blocks the UDP ports used for HSRP and GLBP, as well as IP Protocol 112, VRRP. This must be the portion that is added by default, but it doesn’t seem to be sufficient.

Conclusion

I asked Cisco why the extra ACL was necessary when the documentation indicates that FHRP filtering is built in and enabled by default. As soon as I hear something I’ll provide an update. As far as I know the Nexus 7K still requires you to manually configure these ACLs, but it seems that, for now, so do the ASRs.

 

Update: I heard back from Cisco TAC about my issue and they think my problem stems from the fact that I’m trying to use the same physical hardware for both the L2 bridging and the L3 gateway:

Due to the ASR1k architecture, it is recommended that you move FHRP off the ASR. It is unlike N7k architecture where we can keep FHRP on the same device and use a mix of MACLs, VACLs, etc to filter out the virtual MAC from going across the overlay. The only way to really prevent the virtual MAC from being learned across the overlay is to prevent the ASR from ever learning it in the first place.

With regard to the default OTV FHRP filtering, TAC confirmed that the otv_fhrp_filter_acl is added when OTV is configured. However, it doesn’t attempt to prevent L2 information from being learned; it only attempts to block actual HSRP communication across the overlay.

Multicast data check with Perl IO::Socket::Multicast

I spend a lot of time dealing with multicast, and one of the more tedious tasks is verifying that I’m receiving data across all of the groups I need. Whether it’s setting a baseline prior to a network change, or verifying everything works after the change, it’s an important task that gets more and more tedious as the number of groups increases.

I’m not a programmer, but I play one on TV.

I decided to try to write something in Perl, which is no small task considering I’ve never excelled when it comes to coding/scripting. Fortunately I know enough to be dangerous, and by combining some elements I’ve seen across various forums with what I read in the Perl documentation, I was able to cobble together something that accomplishes the task of automating my tests.

In my first version I was able to successfully read in group information from a file, and wait for data on each line. The problem was that the program executed serially, so if I had a list of 50 groups, and I needed to wait a full 60 seconds for each line, I could potentially wait 50 minutes for the test to complete.

The second version forks those tests out to individual child processes, dramatically decreasing the time it takes to verify a large list of groups.

I’m posting the code here in case someone out there is struggling with a similar issue, and is also not a programmer by nature. I’m sure there are lots of better ways to accomplish this in code. As I said, I’m not a programmer. But this seems to work. If you have any comments on ways to make it more efficient or handle something better, I’d love to hear about it in the comments.

Code


#!/usr/bin/perl

use strict;
use warnings;
use IO::Socket::Multicast;


my $grouplist = $ARGV[0];

my $GROUP = '';
my $PORT = '';
my @childproc;
my $data;
print "Parsing File...\n";

##Attempt to open the file provided on the CLI
open(FILE, $grouplist) || die "can't open file: $!";

my @list = <FILE>;

print "Contents: \n", @list;

print "Processed Data:\n";
foreach (@list) {
        chomp($_);
        ($GROUP, $PORT) = split (':',$_);
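        my $pid = fork();
        if (!defined $pid) {
                ## fork() returns undef on failure, so check that first
                die "Couldn't Fork: $!\n";
        } elsif ($pid) {
                ## Parent: remember the child so we can wait on it later
                push(@childproc, $pid);
        } else {
                ## Child: run the test for this group, then exit
                datacheck ($GROUP, $PORT);
                exit 0;
        }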
}

close(FILE);

foreach (@childproc) {
        my $tmp = waitpid($_, 0);
}

print "End of Testing\n";

sub datacheck {
        my $GROUPSUB = $_[0];
        my $PORTSUB = $_[1];
        my $msock = IO::Socket::Multicast->new(Proto=>'udp',LocalAddr=>$GROUPSUB,LocalPort=>$PORTSUB,ReuseAddr=>1)
                || die "Couldn't create socket for $GROUPSUB:$PORTSUB: $!\n";
        $msock->mcast_add($GROUPSUB) || die "Couldn't set group: $!\n";

    eval {
        $SIG{ALRM} = sub { die "timeout" };
        ## The Alarm value decides how long to wait before giving up on each test
        alarm(10);
        $msock->recv($data,64);
        alarm(0);
        };

        if ($@) {
                print "$GROUPSUB:$PORTSUB -- No data received\n";
        } else {
                print "$GROUPSUB:$PORTSUB  -- Data received\n";
        }

}

I’ve tested this on CentOS with Perl 5.8.8. This script requires the IO::Socket::Multicast Perl module, so be sure to install that before you attempt to run it.

The script expects you to provide the name of a text file that contains a group:port mapping per line.
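Since a typo in the group list only shows up as a failed test, it can help to check the file before running anything. Below is a small Python sketch (not part of the Perl script) that validates one group:port line at a time. The function name parse_group_line and the sample lines are hypothetical, and it assumes IPv4 or IPv6 multicast addresses as understood by Python's standard ipaddress module.

```python
import ipaddress


def parse_group_line(line: str):
    """Parse one 'group:port' line and return (group, port), or raise ValueError."""
    group, sep, port = line.strip().rpartition(":")
    if not sep:
        raise ValueError(f"expected group:port, got {line!r}")
    address = ipaddress.ip_address(group)  # raises ValueError if not an IP address
    if not address.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    port_number = int(port)  # raises ValueError if not numeric
    if not 1 <= port_number <= 65535:
        raise ValueError(f"port {port_number} is out of range")
    return str(address), port_number


if __name__ == "__main__":
    # Hypothetical sample lines, in the same format the Perl script reads.
    for sample in ["239.1.1.1:5000", "10.0.0.1:5000"]:
        try:
            print(parse_group_line(sample))
        except ValueError as err:
            print(f"bad line {sample!r}: {err}")
```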

Update: I discovered that I could run into problems if my list ever contained more than one of the same group, or more than one of the same port. After consulting the module documentation, I realized that I needed to add the LocalAddr and the ReuseAddr options to the constructor code.