I started re-reading Release It! by Michael Nygard this morning on the commute into work. In his first chapter he talks about a (very small) issue that turns into a colossus and takes down an airline's check-in system. The system had a monitor configured and running checks against it, but the monitor turned out to be checking the wrong thing: it was looking at the HTTP port on the transactional servers when it should have been looking at the RMI port.
It totally reminded me of something that happened about a month ago. We have a bunch of web applications that run on our production server. After fine-tuning our monitoring to look at pages that the application has to apply logic to in order to serve up (rather than a static home page), we found that our monitoring corresponded much more closely to complaints from users.
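To make that concrete, here is a minimal sketch of that kind of check in Python. It is not our actual monitoring setup: the function name `check_page`, the example URL, and the marker text are all made up for illustration. The idea is that a 200 response is not enough; the check also looks for a piece of text that only shows up when the application logic has actually run.

```python
import urllib.error
import urllib.request


def check_page(url, expected_marker, timeout=10.0):
    """Return (ok, detail) for a single HTTP check.

    The check passes only if the server answers 200 AND the body contains
    expected_marker, i.e. text that only appears when the application
    logic behind the page has run. (Hypothetical helper, for illustration.)
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses are raised as HTTPError by urlopen.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS failures, refused connections, timeouts and similar.
        return False, f"unreachable: {exc}"

    if status != 200:
        return False, f"HTTP {status}"
    if expected_marker not in body:
        # The server is up, but the page did not render what we expect.
        return False, "marker missing from response body"
    return True, "ok"


if __name__ == "__main__":
    # Hypothetical URL and marker: point this at a page that needs the
    # application (and whatever sits behind it) to do real work.
    ok, detail = check_page("https://example.com/orders/search?q=test", "results found")
    print("OK" if ok else "FAIL", detail)
```

A static home page would pass the status check even when everything behind the web server is down, which is why the marker test is the part that matters here.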
Think twice about what you want to monitor and where to point it.