I recently had a need to increase the granularity of SNMP monitoring on some critical network interfaces, and I thought I’d share what I’ve learned so far.
The change was prompted by an incident in which we briefly maxed out the bandwidth on one of our WAN circuits. For most companies this would be no big deal, especially since the burst in traffic lasted less than a minute. However, because our customers are very sensitive to latency, it caused some noticeable delays.
The problem with interval monitoring is that you just don’t know what happens between polls. The whole idea of interval monitoring is to get an approximation of what’s happening, so by design you’re going to miss some of the details. But how do you know what you’re missing?
Take a look at this picture to get an idea of what I’m talking about:
As you can see, the longer the time between polls, the more detail is missed. This might not be a problem for you. I would say that the goal of most monitoring solutions is to look for traffic *trends*, not necessarily sub-minute bursts. But if you need the extra detail, it becomes quite a challenge, and the options available to you depend heavily on the size of your budget.
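To put rough numbers on the picture, here is a minimal Python sketch. It assumes a hypothetical 100 Mbps link that is saturated for 30 seconds inside an otherwise idle 5-minute window (illustrative values, not measurements from our circuit) and computes the peak utilization a poller would report at different intervals. The function names are made up for this example.

```python
# Minimal sketch, assuming a hypothetical 100 Mbps link that is saturated for
# 30 seconds inside an otherwise idle 5-minute window (illustrative values).

LINK_BPS = 100_000_000  # assumed link speed: 100 Mbps


def per_second_traffic(window_s: int, burst_start: int, burst_len: int,
                       burst_bps: int) -> list[int]:
    """Bits transferred during each second of the window."""
    return [burst_bps if burst_start <= t < burst_start + burst_len else 0
            for t in range(window_s)]


def utilization_per_poll(bits_per_second: list[int], poll_s: int,
                         link_bps: int) -> list[float]:
    """Average utilization (%) a poller with the given interval would report."""
    results = []
    for start in range(0, len(bits_per_second), poll_s):
        chunk = bits_per_second[start:start + poll_s]
        results.append(100.0 * sum(chunk) / (len(chunk) * link_bps))
    return results


traffic = per_second_traffic(window_s=300, burst_start=120, burst_len=30,
                             burst_bps=LINK_BPS)
for interval in (300, 60, 10, 1):
    peak = max(utilization_per_poll(traffic, interval, LINK_BPS))
    print(f"{interval:>3}s polling -> peak utilization {peak:.1f}%")
```

With these assumed numbers, a 5-minute poll reports a peak of 10%, a 1-minute poll reports 50%, and only the 10-second and 1-second polls show the link at 100%. A burst that straddled two 10-second polls would be diluted as well.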
If money is no object, then you can buy one of the fancy network traffic monitors by Corvil or Netscout. These are awesome boxes that record every drop of data you feed through them, so that you can replay the exact issue over and over. You pay handsomely for that functionality.
cPacket Networks and Riverbed also have some cool products, but price is definitely a factor there as well.
The only way to get this kind of granularity on the cheap is to leverage SNMP and increase the polling frequency.
IOS SNMP Configuration
Disclaimer: Do not apply these commands to your production environment until you have tested them and fully understand the potential impact.
By my rough estimation and testing, most Cisco devices (IOS, IOS XE, and NX-OS) update their internal SNMP statistics roughly every 10 seconds by default. However, there’s a hidden command in IOS that lets you adjust the interval at which IOS updates its internal statistics database:
snmp-server hc poll <interval>
The interval for this command is in hundredths of a second, so to update at 1-second intervals, you need:
snmp-server hc poll 100
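Because the unit is hundredths of a second, converting a desired update interval is simple multiplication. The hypothetical helper below only illustrates that arithmetic; it does not check which values a given IOS release actually accepts.

```python
def hc_poll_command(interval_seconds: float) -> str:
    """Build an 'snmp-server hc poll' line for a desired update interval.

    The command takes hundredths of a second, so 1 second becomes 100.
    Hypothetical helper: it does not validate the range a given IOS accepts.
    """
    hundredths = round(interval_seconds * 100)
    if hundredths <= 0:
        raise ValueError("interval must be at least 0.01 seconds")
    return f"snmp-server hc poll {hundredths}"


print(hc_poll_command(1))    # snmp-server hc poll 100
print(hc_poll_command(2.5))  # snmp-server hc poll 250
```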
I have used this command successfully on Cisco ISR G2 routers, 7200 routers, and 4900M switches.
This command does not work on IOS XE or NX-OS devices, including ASR 1000s and Nexus 7Ks and 3Ks.
Network Management Systems
You might recall that I’m a big fan of SolarWinds for network monitoring. Unfortunately, the shortest interval you can configure for statistics collection in SolarWinds NPM is 1 minute, so I had to look elsewhere for a product that could poll more frequently.
Enter SevOne. Their platform allows you to configure ‘high-speed pollers’ which can be tuned down to 1 second intervals. The folks at SevOne are great to work with, and I received a lot of good guidance as I was configuring my instance.
This isn’t meant to be a review or comparison of the two products. SevOne solved a very specific problem for me at a price point that was one-tenth the cost of the next closest solution, so props to them.
Other possible solutions include Cacti and other RRDtool-based statistics and graphing systems. Since these are so flexible, I would expect that you could collect stats at one-second intervals with them as well, although I haven’t verified this.
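Whichever poller you use, bandwidth is derived from the difference between consecutive samples of an interface’s octet counters, such as ifHCInOctets and ifHCOutOctets from IF-MIB, which are 64-bit Counter64 values. A minimal sketch of that calculation follows; the function names are hypothetical, and the link speed is assumed to be supplied in bits per second.

```python
COUNTER64_MODULUS = 2**64  # ifHCInOctets / ifHCOutOctets are Counter64 values


def counter_delta(previous: int, current: int,
                  modulus: int = COUNTER64_MODULUS) -> int:
    """Octets counted between two samples, tolerating a single wraparound."""
    return (current - previous) % modulus


def utilization_percent(prev_octets: int, curr_octets: int,
                        elapsed_s: float, link_bps: int) -> float:
    """Average utilization (%) between two octet-counter samples."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (elapsed_s * link_bps)


# Two samples taken 1 second apart on an assumed 100 Mbps interface.
print(utilization_percent(1_000_000, 13_500_000, 1.0, 100_000_000))  # 100.0
# A sample pair that straddles a counter wrap still yields the right delta.
print(counter_delta(2**64 - 1000, 250))  # 1250
```

The modulo handles a single counter wrap. The 64-bit counters practically never wrap, but the legacy 32-bit ifInOctets counter wraps in about 34 seconds at 1 Gbps, which is one more reason to use the HC counters. A device reboot also resets the counters, and this sketch would misread that as a wrap.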
Request for Comments
There’s some discussion out there about the problems you can run into if you poll too frequently, one of which is polling more frequently than the agent updates its statistics. I definitely observed this problem while working on this, and saw the wild numbers it can produce. If you have any thoughts on why polling more frequently would be bad, or might lead you to the wrong conclusions, please feel free to comment.
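Here is a simplified model of the agent-update problem. It assumes, per my rough estimate above, that the agent refreshes its counters every 10 seconds while the poller asks every second, on a link carrying a steady 10 Mbps. The constants and function names are illustrative only.

```python
# Simplified model (assumptions): the agent refreshes its counters every
# 10 seconds, the poller asks every second, and the link carries a steady
# 10 Mbps. Real agents are not this regular; this only shows the effect.

AGENT_UPDATE_S = 10
POLL_S = 1
TRUE_RATE_OCTETS_PER_S = 1_250_000  # 10 Mbps expressed in octets per second


def agent_counter(t: int) -> int:
    """Octet counter the agent reports at time t (its last refreshed value)."""
    last_refresh = (t // AGENT_UPDATE_S) * AGENT_UPDATE_S
    return last_refresh * TRUE_RATE_OCTETS_PER_S


def observed_rates_mbps(duration_s: int) -> list[float]:
    """Rate the poller computes from each pair of consecutive samples."""
    rates = []
    previous = agent_counter(0)
    for t in range(POLL_S, duration_s + 1, POLL_S):
        current = agent_counter(t)
        rates.append((current - previous) * 8 / POLL_S / 1_000_000)
        previous = current
    return rates


rates = observed_rates_mbps(20)
print(rates)                    # nine 0.0 values, then 100.0, repeated twice
print(sum(rates) / len(rates))  # 10.0
```

In this model, the poller reports 0 Mbps for nine polls and then 100 Mbps for one, over and over. The average across the whole window is still the true 10 Mbps, but any single sample is badly wrong, which is the kind of wild number that shows up when you poll faster than the agent updates.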