Wednesday, August 29, 2007

Nagios and things

I've done a couple of installations of Nagios now, and I quite like it as a monitoring tool. Today I was working on a custom plug-in so we could keep an eye on a particular KPI. This involved running a fairly complex query in Oracle and then sending the results to Nagios.
Writing the shell script wasn't difficult. The Nagios plug-in developer documentation was clear enough for me to follow. It took me a few minutes to figure out how to return the codes that Nagios is looking for (exit 0 for OK, exit 1 for WARNING, exit 2 for CRITICAL, and so on). However, getting Nagios to run SQL*Plus automatically is a total bear. I'm going to sleep on it tonight. I've tried changing permissions on all the files, and even tried changing the ownership to oracle and setting the SUID bit. No dice. I tried running Nagios as root (not recommended, but I had to try), and I set all the Oracle environment variables in the .bash_profile of every user I could think of that might possibly be running that code. I tried doing a su -c 'call my script here' oracle, and that didn't work either.
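For reference, here is a minimal sketch of what a plug-in like this might look like. It is not my actual script: the Oracle paths, the DB_USER and DB_PASS variables, the thresholds, and the my_kpi_table query are all made-up placeholders. It sets the Oracle environment inside the script itself, on the assumption that a script launched by a daemon never reads .bash_profile, since that file is only sourced by login shells. Whether that is what's biting me here, I can't say yet.

```shell
#!/bin/sh
# Hypothetical sketch of a Nagios plug-in that checks a KPI through SQL*Plus.
# Paths, credentials, thresholds, and the query are placeholders, not real values.

# Export the Oracle environment here instead of relying on a login profile.
ORACLE_HOME="${ORACLE_HOME:-/path/to/oracle/home}"
ORACLE_SID="${ORACLE_SID:-ORCL}"
PATH="$ORACLE_HOME/bin:$PATH"
export ORACLE_HOME ORACLE_SID PATH

WARN=80   # assumed warning threshold
CRIT=95   # assumed critical threshold

# Map a KPI value to a Nagios status line and return code
# (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
kpi_status() {
    value="$1"
    case "$value" in
        ''|*[!0-9]*)
            echo "UNKNOWN - no numeric KPI value returned"
            return 3
            ;;
    esac
    if [ "$value" -ge "$CRIT" ]; then
        echo "CRITICAL - KPI is $value"
        return 2
    elif [ "$value" -ge "$WARN" ]; then
        echo "WARNING - KPI is $value"
        return 1
    fi
    echo "OK - KPI is $value"
    return 0
}

# Run the query in silent mode and strip whitespace from the result.
# DB_USER and DB_PASS are assumed to be set elsewhere.
fetch_kpi() {
    sqlplus -S "$DB_USER/$DB_PASS" <<'EOF' | tr -d '[:space:]'
SET HEADING OFF FEEDBACK OFF PAGESIZE 0
SELECT COUNT(*) FROM my_kpi_table;
EXIT
EOF
}

# Entry point: a real plug-in would call main on its last line.
main() {
    kpi_status "$(fetch_kpi)"
    exit $?
}
```

Keeping the threshold logic in kpi_status, separate from the database call, means it can be checked by hand (kpi_status 85 should print a WARNING line) without involving Oracle at all. If SQL*Plus fails to start and prints an error instead of a number, the sketch reports UNKNOWN rather than a misleading OK.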

Posts online about Oracle problems like SQL*Plus 43 initiation errors are pretty sparse. In my opinion, it's just more fodder for people who advocate using third-party freeware. At least if that's broken, somebody has likely run into the same issue and posted about it online. With Oracle and Tibco, it seems like they guard their support forums with their lives (maybe Oracle not so much).
