Some time ago I posted a Nagios plugin I had knocked together for monitoring network interfaces using SNMP. Much to my surprise, a number of people have found the plugin useful and suggested some enhancements. I've finally taken some time to implement those changes, and here are the results…
# ./check_snmp_ifstatus.pl --help
Usage: check_snmp_ifstatus.pl -H hostaddress -i "if_description" [-b if_max_bps] [-C snmp_community] [-v 1|2] [-p snmp_port] [-w warn] [-c crit]
Options:
-H --host STRING or IPADDRESS
IP Address of host to check. Required.
-i --interface STRING
Full name or numeric index number of the interface to check. Required.
Examples would be "eth0", "FastEthernet0/1", 1 or 65539
-C --community STRING
SNMP Community string. Optional. Defaults to 'public'
-v --version INTEGER
SNMP Version ( 1 or 2 ). Optional. Defaults to 1
-p --port INTEGER
SNMP port. Optional. Defaults to 161
-b --bandwidth INTEGER
Interface maximum speed in bits per second. Optional.
Use this to override the value returned by SNMP if it lies about
the max speed of the interface.
-6 --64bit
Use 64bit counters for bandwidth usage. Not available on all devices.
-w --warning INTEGER
% of bandwidth usage necessary to result in warning status. Optional.
Defaults to 85%
-c --critical INTEGER
% of bandwidth usage necessary to result in critical status. Optional.
Defaults to 98%
The major difference in this new version is that it now lets you specify the ifIndex number of the interface you wish to test directly, rather than relying on the device supporting the ifDescr OID. This means you can now use it to test interfaces on Microsoft Windows computers and on devices that don't have a unique description for each interface. For example, if an snmpwalk of the device showed IF-MIB::ifIndex.1 = INTEGER: 1 and IF-MIB::ifDescr.1 = STRING: eth0, you could refer to this interface as either -i eth0 or -i 1. On an MS Windows computer, you would typically use -i 65539 for the first LAN interface.
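To illustrate the name-or-index idea, here is a minimal sketch of how an -i argument could be resolved against the results of an ifDescr walk. It is not the plugin's actual (Perl) lookup code: the function name resolve_if_index and the dict-of-walk-results input are hypothetical. It treats a purely numeric argument as an ifIndex and refuses a description that matches more than one interface, which is exactly the case where using the index is the only reliable option.

```python
def resolve_if_index(arg, if_descr):
    """Map an -i style argument to an ifIndex, or return None.

    if_descr is a hypothetical dict of ifIndex -> ifDescr, for example
    parsed from an snmpwalk of IF-MIB::ifDescr (.1.3.6.1.2.1.2.2.1.2).
    """
    # A purely numeric argument is treated as an ifIndex, if the device has it.
    if arg.isdigit():
        index = int(arg)
        return index if index in if_descr else None
    # Otherwise require exactly one interface with that description;
    # duplicate descriptions are ambiguous, so the caller should use the index.
    matches = [idx for idx, descr in if_descr.items() if descr == arg]
    return matches[0] if len(matches) == 1 else None


walk = {1: "eth0", 2: "eth1", 65539: "Intel(R) PRO/1000 MT Network Connection"}
print(resolve_if_index("eth0", walk))   # 1
print(resolve_if_index("1", walk))      # 1
print(resolve_if_index("65539", walk))  # 65539
```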
You may also notice that I have added some preliminary support for reading the 64-bit counters on devices that support them. Please note that this is entirely untested, as I do not have any devices to test it on. If you do try it out and it works for you, please let me know.
Installing the plugin is very straightforward. First, download the check_snmp_ifstatus_v2.tar file and extract check_snmp_ifstatus.pl from it.
# tar -xf check_snmp_ifstatus_v2.tar
Copy the check_snmp_ifstatus.pl file to your /usr/lib/nagios/plugins/contrib folder.
Add a command to reference the plugin in your /etc/nagios/objects/commands.cfg file, like so
define command {
command_name check_snmp_ifstatus
command_line /usr/bin/perl $USER1$/contrib/check_snmp_ifstatus.pl -H $HOSTADDRESS$ -i $ARG1$ -w $ARG2$ -c $ARG3$ $ARG4$
}
Add a service entry for each host you want to check in your /etc/nagios/objects/services.cfg file, like so
define service{
use generic-service
check_command check_snmp_ifstatus!"Ethernet0/0"!85!98
service_description Network Status: e0/0
normal_check_interval 5
retry_check_interval 1
host_name cisco1
}
define service{
use generic-service
check_command check_snmp_ifstatus!"eth0"!50!90!-b 1000000000
service_description Network Status: eth0
normal_check_interval 1
retry_check_interval 1
host_name linux1,linux2
}
define service{
use generic-service
check_command check_snmp_ifstatus!65539!25!80
service_description Network Status: LAN
normal_check_interval 1
retry_check_interval 1
host_name winserver1
}
Now verify that you haven't made any mistakes in the Nagios configuration files:
# nagios -v nagios.cfg
And restart the nagios service if all looks well
# /sbin/service nagios restart
You should now see new service entries for your hosts listing the current interface status. Enjoy.
Hello again,
Does this mean that an overflow has occurred? RX=4.903e+08Gbps (4902789121.64%) Obviously the % graph gets skewed way above 100% when this happens….
It sounds like I need to set
Comment got cut off… It sounds like I need to set ~30 second check interval to avoid the overflow but I didn’t think nagios could go below 1 minute. Any ideas how to fix?
So I figured I could set nagios interval to .5 (duh) but that didn’t stop the % spikes. Currently my average RX is 715637281.73%, a bit higher than 100%. 🙂 Thanks in advance for any assistance….
By default, Nagios uses an interval_length of 60 seconds, which is the length of one time unit for settings such as check_interval. To change it, edit your nagios.cfg file and look for the ‘interval_length=’ setting. Changing it from 60 to 30 halves the unit, so a service with a check interval of 1 will be checked every 30 seconds instead of every 60. Note that you'll probably also want to adjust your other service checks to account for this: if you had a service check scheduled every 5 minutes before (check_interval 5), you'd now need twice as many intervals (check_interval 10) to keep the same five-minute timing.
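The arithmetic is simply seconds = check_interval × interval_length. A small sketch of the conversion (the helper name check_interval_units is hypothetical, and it deliberately rejects periods that are not a whole number of units rather than rounding):

```python
def check_interval_units(target_seconds, interval_length=60):
    """Convert a desired check period in seconds into check_interval units.

    Nagios multiplies check_interval by interval_length (from nagios.cfg)
    to get the real period in seconds.
    """
    if target_seconds % interval_length:
        raise ValueError("period is not a whole number of interval_length units")
    return target_seconds // interval_length


# A five-minute check before and after changing interval_length from 60 to 30.
print(check_interval_units(300, 60))  # 5
print(check_interval_units(300, 30))  # 10
print(check_interval_units(30, 30))   # 1
```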
If you are not polling the interface fast enough, you should receive a ‘WARNING: Possible counter overflow’ message back from the plugin. If you have everything set up correctly to poll every 30 seconds, the performance data will end with ‘elapsed=30s;34;;;’.
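Why 30 seconds? The standard IF-MIB octet counters (ifInOctets/ifOutOctets) are 32-bit and count bytes, so a saturated link can wrap one in 2^32 / (speed / 8) seconds. For a 1 Gbps interface that works out to about 34.4 seconds, which appears to be where the 34 in the perfdata comes from; that is an inference from the numbers, not something confirmed from the plugin source. The 64-bit ifHCInOctets/ifHCOutOctets counters (SNMPv2c and later) effectively never wrap. A rough sketch, with a hypothetical helper name:

```python
def seconds_until_wrap(if_speed_bps, counter_bits=32):
    """Shortest time in which an octet counter can wrap on a saturated link.

    Octet counters count bytes, so at if_speed_bps the counter advances by
    if_speed_bps / 8 per second and rolls over after 2**counter_bits bytes.
    """
    bytes_per_second = if_speed_bps / 8
    return (2 ** counter_bits) / bytes_per_second


for speed in (10_000_000, 100_000_000, 1_000_000_000):
    print(f"{speed / 1e6:>6.0f} Mbps: {seconds_until_wrap(speed):8.1f} s (32-bit)")
```

Polling any slower than this figure means a busy interface can wrap more than once between checks, which no amount of wrap correction can undo.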
Hi Shawn, I’m banging my head on something really odd here. I have 4 interfaces on a router: eth1, 2 and 3 all return traffic via your script OK, but eth0 complains that ‘return code of 13 is out of bounds’. If I run the script on the command line, it works fine. It only gives this error when running under Nagios.
Extract from the config:
define service {
use remote-service
host_name bdr-rt
service_description Mainline Traffic
check_command check_snmp_ifstatus!"eth0"!30!50!100000000
}
define service {
use remote-service
host_name bdr-rt
service_description W4 Platform Traffic
check_command check_snmp_ifstatus!"eth1"!30!50!100000000
}
The first entry fails, the second works.
Changing the command to access the interface by index rather than name produces the same result. I can see eth0 in a snmpwalk fine.
Any clues?
Hi Andy. Two thoughts come to mind. First, do an snmpwalk of OID .1.3.6.1.2.1.2.2.1.2 (ifDescr) on your router and verify that all your interfaces are displayed (particularly eth0). Second, check your /tmp folder. If you were testing the plugin outside of Nagios, it might have created a /tmp/iftraffic_* file owned by your logged-in user, and Nagios cannot write to that file due to permissions. If that’s the case, just delete the file or chown it to the correct Nagios user.
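If you want to check for this quickly, the sketch below lists state files whose owner is not the account Nagios runs as. The function name foreign_state_files is hypothetical, and the iftraffic_* pattern is taken from the file names mentioned above; adjust it if your copy of the plugin names its state files differently.

```python
import glob
import os


def foreign_state_files(directory="/tmp", expected_uid=None, pattern="iftraffic_*"):
    """Return state files in directory that are not owned by expected_uid.

    expected_uid defaults to the current user; pass the Nagios user's uid
    to find files left behind by manual test runs.
    """
    if expected_uid is None:
        expected_uid = os.getuid()
    paths = glob.glob(os.path.join(directory, pattern))
    return sorted(p for p in paths if os.stat(p).st_uid != expected_uid)
```

On a typical install you might call it as `foreign_state_files(expected_uid=pwd.getpwnam("nagios").pw_uid)`, assuming the service account is literally named nagios.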
Many thanks Shawn – it was the permissions. I’d seen an earlier answer regarding permissions, but hadn’t actually looked at it – I figured permissions on the directory were what mattered, not that there were state files hiding in there.
Cheers!
Hello,
Your script doesn't work correctly with 64-bit (SNMP v2) counters.
The counters are read correctly, but you write them to disk as if they were 32-bit values (maximum 4294967296), so the traffic statistics read beyond 100% and are useless.
Your code:
printf FILE ( "%s:%.0lu:%.0lu\n", $update_time, $in_bytes, $out_bytes );
But it should be:
printf FILE ( "%s:%.3f:%.3f\n", $update_time, $in_bytes, $out_bytes );
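The underlying requirement is that the stored counter keeps its full width, and that the wrap correction uses the width of the counter actually read (2^32 or 2^64). A minimal sketch of both pieces, with hypothetical helper names and Python's arbitrary-precision integers standing in for the Perl formatting:

```python
def counter_delta(previous, current, counter_bits=32):
    """Bytes transferred between two samples, allowing for a single wrap."""
    if current >= previous:
        return current - previous
    # The counter passed its maximum (2**bits - 1) and restarted at zero.
    return current + (2 ** counter_bits) - previous


def state_line(update_time, in_bytes, out_bytes):
    """Format a 'time:in:out' state line without truncating large counters."""
    return f"{update_time}:{in_bytes:d}:{out_bytes:d}\n"
```

Note that a device reboot also makes the counter go backwards, and this sketch would misread that as a wrap; a real check would need some sanity limit on the result.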
Cheers!
Shawn,
sorry, I meant to post this here –
I took your wonderful perl script check_snmp_ifstatus and modified it to do some extra tricks with my minimal programming skills.
1. It now can check the inverse of a port. This is handy on a switch that is not fully utilized. It will notify the admin if a port comes up and how much bandwidth is being used for it.
2. It now checks whether the host is available. If not, it returns a critical code and says the host is down rather than returning a null or unknown status.
3. I also cleaned up the handling of the 64-bit code. I'm not sure if I wrote the line that writes to the file properly, since the way you had it was not working with 64-bit counters.
I would like to send you the code so you can look at it and possibly post it, since I believe in the spirit of the GPL.
Thank you and cheers.
Hi,
This is a great plugin, just about everything I was looking for. I found it really useful. Very good job. I've tested it on equipment that returns max_if_speed=0 when the interface status is down, which causes “illegal division by zero” errors along the way.
I’ve placed the instruction:
$max_if_speed = 1 if ( $max_if_speed == 0 );
just above the instruction:
if ( $if_speed == 0 ) {
$if_speed = $max_if_speed;
}
to prevent that from happening.
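An alternative to forcing the speed to 1 is to treat an unknown speed as "no percentage available" and let the caller report that. A hedged sketch of the utilization calculation with that guard; the function name and the override_bps parameter (mirroring the plugin's -b option) are hypothetical:

```python
def utilization_percent(bytes_delta, elapsed_seconds, if_speed_bps, override_bps=0):
    """Percentage of link capacity used, or None if it cannot be computed."""
    speed = override_bps or if_speed_bps  # an explicit override wins when set
    if speed <= 0 or elapsed_seconds <= 0:
        return None  # unknown speed (e.g. interface down) or no elapsed time
    bits_per_second = bytes_delta * 8 / elapsed_seconds
    return 100.0 * bits_per_second / speed
```

Returning None keeps a bogus percentage out of the performance data, whereas a speed of 1 bps would turn even a few stray bytes into an enormous figure.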
Thank you and cheers!
Hi! The template doesn't work with the latest pnp4nagios. It says:
Please check the documentation for information about the following error.
Undefined offset: 51
file [line]:
templates.dist/check_snmp_ifstatus.php [46]: