Friday, August 31, 2007

Assumptions and Critical Thinking

A couple of weeks ago I wrote a DB load process that runs from a cron job. It FTPs data files from a public box to a local one, then loads the data using .dat files extracted from the download and .CTL files already on hand. I assumed the process worked correctly because it worked every time I ran it manually, and when I started it from cron it 'looked' like it worked. It turned out that when it ran from cron, it didn't do everything I wanted it to do. I set about debugging this today, and my problem turned out to be very similar to my PLSQL problem from earlier this week.

Part of the load process uses a Java program to create the .dat files. The Java program worked fine when I ran it manually because the user I was running it as had JAVA_HOME set in its profile. Cron runs differently. Even though my cron job su's to the proper user, doing the same thing manually and then checking with a 'who am i' command (which, by the way, gets different results than a 'whoami' command) returns a surprise: it's not my su'ed user. The su'ed user's profile evidently wasn't being read under cron either, since JAVA_HOME was missing. Adding JAVA_HOME to the script that gets run solved the problem.
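For future reference, here is a minimal sketch of that kind of fix. The JDK path and the function name are placeholders I made up for illustration, not what's in the real script. The idea is simply that the script sets what it needs itself instead of hoping a profile got loaded.

```shell
#!/bin/bash
# Sketch: give a cron-launched script the environment it needs, instead of
# relying on a login profile that may never be read.

# Hypothetical JDK location; substitute the real path on your box.
DEFAULT_JAVA_HOME="/usr/java/default"

setup_java_env() {
    # Keep JAVA_HOME if the caller already set it, otherwise use the default.
    JAVA_HOME="${JAVA_HOME:-$DEFAULT_JAVA_HOME}"
    export JAVA_HOME
    # Prepend the JDK's bin directory unless it is already on PATH.
    case ":$PATH:" in
        *":$JAVA_HOME/bin:"*) ;;
        *) PATH="$JAVA_HOME/bin:$PATH" ;;
    esac
    export PATH
}

setup_java_env
```

Calling setup_java_env before the Java step means the script behaves the same whether I run it by hand or cron does.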

Been looking at graphing solutions for a little thing I'm working on. GDchart seems usable. I might do some more work with it on Tuesday next week. FusionCharts seems very interesting as well. I've worked with Cewolf and JFreeChart in the past, but I need something quicker right now.

Picked up a book called Th!nk off the discount shelf at Cole's today. I'm liking it so far. It's something of a rebuttal to another book called Blink!.

Thursday, August 30, 2007

Nagios and things part 2

So I got to work this morning (with a fresh mind) and after a bit of googling figured out my problem in 30 minutes. Basically it boiled down to setting ORACLE_HOME in the bash script itself and calling SQL*Plus by its full path. After that, things ran fine, even after changing the permissions of the scripts and related files back to normal and running Nagios as the Nagios user.
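A rough sketch of what the top of such a script can look like. The Oracle home path below is a made-up example and require_executable is a helper name I invented. The return code 3 is the value Nagios treats as UNKNOWN.

```shell
#!/bin/bash
# Sketch: a plug-in wrapper that does not depend on the calling user's profile.

# Hypothetical Oracle home; use the value from your own install.
ORACLE_HOME="${ORACLE_HOME:-/u01/app/oracle/product/10.2.0/db_1}"
export ORACLE_HOME

# Full path to the binary, so nothing depends on PATH.
SQLPLUS="$ORACLE_HOME/bin/sqlplus"

# Print a one-line status and return 0 if the file at $1 is executable,
# or 3 (Nagios UNKNOWN) if it is not.
require_executable() {
    if [ -x "$1" ]; then
        echo "found $1"
        return 0
    fi
    echo "UNKNOWN - cannot execute $1"
    return 3
}

# Typical use near the top of the plug-in:
#   require_executable "$SQLPLUS" || exit 3
```

Failing early with UNKNOWN makes a missing binary show up in Nagios as a configuration problem rather than as a bad KPI reading.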

One of the 'nice to haves' for this little monitoring thing I'm working on in Nagios is graphing capability - so we can see trends in the performance data at a glance. Just before I left today, I discovered that Nagios might install with this capability commented out. I'll let you know for sure in my next post.

While I was working on solving the graphing problem a different way, I got focused on some graphing add-ins for Nagios that required Perl modules I didn't have. I learned that CPAN is basically like 'yum' for Perl. Didn't know what it was before today. In fact, I hadn't had any exposure to Perl before this project. It's all good - learning new stuff every day.

I also learned a big way to correct my golf swing today. I need to grip hard with my left hand and the right hand is just for support. I was driving much better and more consistently after I made that adjustment.

Wednesday, August 29, 2007

Nagios and things

I've done a couple of installations of Nagios now, and I quite like it as a monitoring tool. Today I was working on coding a custom plug-in so we could keep an eye on a particular KPI. This involved running a fairly complex query in Oracle and then sending the results to Nagios.
Writing the shell script wasn't difficult. The Nagios plug-in developer documentation was clear enough for me to follow. It took me a few minutes to figure out how to return the codes that Nagios is looking for - 'exit 1, or exit 2, etc'. However, trying to get Nagios to automatically run SQL*Plus is a total bear. I'm going to sleep on it tonight. I've tried changing perms on all the files, even tried changing the ownership to oracle and setting the SUID - no dice. Tried running Nagios as root (not recommended, but I had to try), and set all the Oracle environment variables in the .bash_profile for all the users I could think of that might possibly be running that code. I tried doing a su -c 'call my script here' oracle, and that didn't work either.
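To pin down the exit-code part, here is a small sketch of the convention: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. The function name, the 'kpi' label and the thresholds are made up for illustration, and it assumes the KPI is a non-negative integer where higher is worse. The part after the '|' is Nagios performance data in label=value;warn;crit form.

```shell
#!/bin/bash
# Sketch: map a KPI value to a Nagios status line and return code.
# $1 = value, $2 = warning threshold, $3 = critical threshold.
# All three are assumed to be non-negative integers, higher meaning worse.
kpi_status() {
    value=$1
    warn=$2
    crit=$3

    # Anything that is not a plain integer is reported as UNKNOWN (3).
    case "$value" in
        ''|*[!0-9]*)
            echo "UNKNOWN - could not read KPI value"
            return 3
            ;;
    esac

    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - kpi=$value | kpi=$value;$warn;$crit"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - kpi=$value | kpi=$value;$warn;$crit"
        return 1
    fi
    echo "OK - kpi=$value | kpi=$value;$warn;$crit"
    return 0
}

# In a real plug-in the last two lines would be something like:
#   kpi_status "$value_from_query" 80 95
#   exit $?
```

Keeping the decision in a function makes it easy to try values by hand before Nagios ever runs it.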

Online posts about Oracle's SQL*Plus 43 initiation errors etc. are pretty sparse. In my opinion, it's just more fodder for people who advocate using third-party freeware. At least if it's broken, somebody has likely run into the same issue and posted about it online. With Oracle and Tibco, it seems like they guard their support forums with their lives (maybe Oracle not so much).

Monday, August 27, 2007

Oracle, Tibco, and tcpdump

I've started a new project recently that uses Tibco with an Oracle backend. It still amazes me how the little nuances of domain knowledge are so key to being a productive developer on some of these 'platforms'.
My issue today was a query I was trying to run using the JDBC 'component' in Tibco. It's pretty finicky about SQL syntax. I needed aliases for column names, and it turned out the only way to get those to work is to use double quotes. Single quotes work if you want to hardcode what you're returning in your select statement. Double quotes (I was told) are effectively ignored by Oracle unless you're trying to alias a column. I'm starting this blog mainly so I can keep these little nuances documented somewhere - I hope it'll help me remember in the future.

I ran into a different configuration problem with Tibco last week, which I had to use tcpdump to help me solve. I used a command something like this: tcpdump -nnXi eth0 port 8080. This helped me realize that Tibco was responding to HTTP GET and not HTTP POST requests.
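If I need this again, a wrapper along these lines might save some typing. This is only a sketch: the function names are made up, and the interface and port are just the ones from my case.

```shell
#!/bin/bash
# Sketch: watch HTTP traffic on one port and read the request lines.

# Build the capture filter for a TCP port (helper name is made up).
http_filter() {
    echo "tcp port $1"
}

# Capture on interface $1, port $2. Needs root, so it is only defined here,
# not run. -nn skips host and port name lookups, -A prints payloads as ASCII
# so request lines such as "GET /path" or "POST /path" are readable, and
# -s 0 captures whole packets instead of a truncated snapshot.
capture_http() {
    tcpdump -nn -A -s 0 -i "$1" "$(http_filter "$2")"
}

# Example (as root): capture_http eth0 8080
```

With the payload printed as ASCII, the first line of each request shows straight away whether it was a GET or a POST.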