Wednesday, May 16, 2007

RRDtool

Understanding what is really going on inside a complex system is a tough job. Fortunately, there is a relatively easy-to-use tool that can help you paint a visual picture.

That tool is RRDtool, written by Tobias Oetiker. An RRD is a "Round Robin Database". The principle is that you collect data over time: the most recent data is the most interesting, but you still want to know about historic data in general. In practice this means you can define a data set to have, say, a granularity of 15 minutes for the last week, while data from a year ago is kept at only a 12-hour granularity.
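To put numbers on that example: with a 15-minute (900-second) step, one week of data is 7 × 24 × 4 = 672 rows, and a year at 12-hour resolution means consolidating 48 steps into each row, for 365 × 2 = 730 rows. The Python sketch below turns those numbers into an argument list for `rrdtool create`. The file name `example.rrd`, the data source name `disk_io`, the `GAUGE` type and the 1800-second heartbeat are assumptions chosen for illustration, so check the `rrdcreate` manual page for your RRDtool version before relying on it.

```python
STEP_SECONDS = 15 * 60  # base resolution: one primary data point every 15 minutes


def rra_spec(resolution_seconds, retention_seconds, step=STEP_SECONDS, cf="AVERAGE", xff=0.5):
    """Return an archive definition of the form RRA:CF:xff:steps:rows."""
    if resolution_seconds % step != 0:
        raise ValueError("resolution must be a multiple of the base step")
    steps_per_row = resolution_seconds // step       # primary points averaged into one row
    rows = retention_seconds // resolution_seconds   # rows needed to cover the retention period
    return f"RRA:{cf}:{xff}:{steps_per_row}:{rows}"


def create_args(filename="example.rrd"):
    """Build a hypothetical `rrdtool create` command for the week/year example."""
    day = 24 * 60 * 60
    return [
        "rrdtool", "create", filename,
        "--step", str(STEP_SECONDS),
        # Hypothetical gauge named disk_io: unknown if no update arrives within
        # two steps, no minimum or maximum limit (U:U).
        f"DS:disk_io:GAUGE:{2 * STEP_SECONDS}:U:U",
        rra_spec(15 * 60, 7 * day),         # 15-minute rows for one week
        rra_spec(12 * 60 * 60, 365 * day),  # 12-hour rows for one year
    ]


if __name__ == "__main__":
    print(" ".join(create_args()))
```

On a machine with RRDtool installed, the list could be handed to `subprocess.run(create_args(), check=True)`; the two archives come out as `RRA:AVERAGE:0.5:1:672` and `RRA:AVERAGE:0.5:48:730`.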

The beauty of this tool is that once you get it set up and start shoving data into it, it handles all the magic of consolidating your data over time for you, and the database file never grows beyond the size it was created with. Just keep pushing things in and it'll take care of the rest.
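The "keep pushing things in" behavior can be illustrated with a toy model. This is a simplified sketch and not RRDtool's actual file format or consolidation logic (the real thing also deals with time alignment, heartbeats and unknown values): each archive is a fixed-size ring buffer, and every `steps` incoming values are averaged into one row of a coarser archive. The class and parameter names are hypothetical.

```python
from collections import deque


class ToyRoundRobin:
    """Toy round-robin store: one fine archive plus one coarser, averaged archive."""

    def __init__(self, fine_rows, coarse_rows, steps_per_coarse_row):
        # deque(maxlen=...) silently drops the oldest entry once it is full,
        # so memory use stays fixed no matter how many values are pushed in.
        self.fine = deque(maxlen=fine_rows)
        self.coarse = deque(maxlen=coarse_rows)
        self.steps = steps_per_coarse_row
        self._pending = []  # values not yet consolidated into a coarse row

    def update(self, value):
        self.fine.append(value)
        self._pending.append(value)
        if len(self._pending) == self.steps:
            # Consolidate: average the last `steps` values into one coarse row.
            self.coarse.append(sum(self._pending) / self.steps)
            self._pending.clear()


if __name__ == "__main__":
    # 672 fine rows (one week of 15-minute values); 48 values per coarse row (12 hours).
    rr = ToyRoundRobin(fine_rows=672, coarse_rows=730, steps_per_coarse_row=48)
    for i in range(10_000):
        rr.update(float(i))
    print(len(rr.fine), len(rr.coarse))  # prints: 672 208
```

After 10,000 updates the fine archive still holds exactly 672 values, while the coarse archive has grown by one row per 48 updates and would stop growing at 730 rows.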

If this were all it did, it wouldn't be terribly useful; its real power comes from its ability to generate graphs. Instead of staring at a giant pile of numbers in your reports, you can actually see how your operation behaves over time. This can help you zero in on problem areas, particularly the ones that happen at 2 am when no one is watching all that closely.
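As a hedged sketch of the graphing side, the Python function below assembles an `rrdtool graph` command that plots the last week of one data source as a line. It assumes an RRD file named `example.rrd` containing a data source called `disk_io`, and writes `disk_io.png`; all three names, the title and the axis label are hypothetical. The `DEF`, `LINE`, `--start`, `--title` and `--vertical-label` syntax follows the `rrdgraph` documentation as I understand it, so check the manual for your RRDtool version.

```python
def graph_args(rrd_file="example.rrd", ds_name="disk_io", png="disk_io.png"):
    """Build an argument list for `rrdtool graph` covering the last week."""
    return [
        "rrdtool", "graph", png,
        "--start", "end-1w",              # one week before the end time (default end: now)
        "--title", "Disk I/O, last 7 days",
        "--vertical-label", "ops/s",      # hypothetical unit for the y axis
        # DEF:vname=rrdfile:ds-name:CF reads the averaged values from the RRD.
        f"DEF:io={rrd_file}:{ds_name}:AVERAGE",
        # LINE2:vname#color:legend draws a 2-pixel red line with a legend entry.
        "LINE2:io#FF0000:Disk I/O",
    ]
```

With RRDtool installed, `subprocess.run(graph_args(), check=True)` would render the PNG; scheduling that from cron next to your data collection keeps the graphs current.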

Another huge benefit of constantly shoveling data into these RRD graphs is that many problems don't spring up overnight. So when a service starts flaking out and you don't know why, you can look at your graphs and see "oh wow, disk I/O has been increasing every day for the last week", which lets you zero in on the action needed to fix the situation much more rapidly. If you weren't collecting that data, all you would know is that you had run out of disk I/O, and you might assume your only option is to add disks, because you can't see that the problem really started a week ago when you added a poorly written application to the server that is slowly gobbling up all your I/O.

What I'm trying to point out with this series of blog posts is that there are some excellent tools out there, and you can build really amazing things if you know how to find and then use them. I started this series by talking about how I'm a Perl junkie; the fact is that Perl is exceptional at stringing all kinds of totally unrelated applications together to make something new and uniquely useful.

On a closing note relating to RRD, I would encourage you to also take a look at Cacti. The Cacti developers have put together a pretty slick tool with tons of templates, so if you have a relatively standard environment (routers, switches, servers, etc.), odds are good you can drop Cacti in and hit the ground running with great graphs of the most important things you need to keep an eye on. You should spend your time solving new and interesting problems, not re-solving old ones.

1 comment:

Unknown said...

Also, Munin is a nice tool if you need to easily add new, simple graphs for your own custom services.