Saturday, October 27, 2007

Splunk is for log files



Log files and I have a sort of love-hate relationship. I know that if I look hard enough I'll find what I need, but sometimes I'm really not in the mood to have my eyes glaze over scrolling through all of the junk to find what I want. With a little doing, Rails logs can be made pretty damned informative. Still, I'm not constantly talking to all of my users, and most of the time a user who hits an error won't take the steps to tell you.

Of course, error reporting is only one use for log files, and if you've got a good tool to aggregate and scrape them for you, you'll begin to realize just how useful they are. A friend recently introduced me to a really neat one called Splunk. I'm not going to go over how great it is here, but I will say that you can grab the free version for yourself and give it a try. It's almost as functional as any of the paid versions and will eat 500 MB of data a day for you.

So I've been playing around with this thing for about a month now and I've gotten it to do a few neat things on its own (track memory usage of programs I suspected were acting somewhat naughty, track my battery usage and efficiency, etc.), but this post is about integrating it directly into Rails using Splunk's REST API. There's already plenty of information on the net about what REST is and why it's as cool as it is, so I'm not going to get into that here. Splunk on Rails basically just mirrors Splunk functionality into any Rails application you choose.






Get Splunk


Get Splunk on Rails

Before we get into the meat of things, installation on both ends is pretty simple. If you're running Debian like me, there's a convenient deb package, although installing Splunk from a tarball is just as easy. Just extract it under /opt and invoke it using:

> sudo /opt/splunk/bin/splunk start

Then all you need to do is drop my plugin into your Rails app's lib folder and load it with: require 'splunkbase'

*If you're running Splunk remotely or on a non-standard port, don't forget to change the SERVER variable in the plugin file.

Usage case 1:

You want your application to run bug-free, but when a bug does pop up, you need to know.

To begin with, create a new controller into which we're going to put some Splunk goodies. I named mine SplunkController, but you can be more creative.


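class SplunkController < ApplicationController
  require 'splunkbase'
  @@foo = SplunkBase.new # one shared client for every request

  def index
  end

  # Runs the submitted query and hands Splunk's response to the view.
  def reports
    @document = @@foo.splunkSearch('q' => params[:query])
  end
end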

This is really nothing more than making the response from Splunk available to your view in the form of a variable. Defining a page for it to be displayed in is no more difficult than:

<pre><%= @document %></pre>
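One thing to watch for: if the response is XML, the browser will try to interpret its tags rather than show them. Here is a minimal sketch of escaping the response first, using ERB::Util.html_escape (the function behind the h helper in Rails views). The sample string is made up for illustration and is not real Splunk output.

```ruby
require 'erb'

# Hypothetical stand-in for what splunkSearch might hand back once
# converted to a string; the element names are invented for this example.
raw = "<results><r>error in UsersController</r></results>"

# Escape the markup so the browser displays the tags as text.
escaped = ERB::Util.html_escape(raw)
puts escaped
```

In the view itself this would amount to wrapping the variable, as in h(@document.to_s).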

Now we build the index page we defined so we can pass it the query:
<html>
<head>
<%= javascript_include_tag "prototype" %>
</head>
<body >
<%= form_remote_tag(:update => "graphDiv",
:url => {:action => :reports }) %>
<%= text_field_tag :query, nil, {:size => "100"} %>

<%= submit_tag "Get a report on your query" %>
<%= end_form_tag %>
<div id="graphDiv">
</div>
</body>
</html>

And believe it or not we're ready to start asking Splunk some questions. Try giving it something like:

[search sourcetype::what_you_named_your_source error starthoursago=24] | outputxml

As you've probably gathered, that'll give you a formatted list of all of the errors that have occurred in the last day.






Usage case 2:

You've got a Rails application running internally, and you either don't have the option of outsourcing analysis to something like Google Analytics or don't feel comfortable doing so.

Back to our cute little controller, we add in a new definition for the graphing page.


class SplunkController < ApplicationController
  require 'splunkbase' # those two magical words
  @@foo = SplunkBase.new

  def index
  end

  def reports
    @document = @@foo.splunkSearch('q' => params[:query])
  end

  # For graphing: this fixes things up so we can display the data.
  def graph
    @datahash = {}
    @queryDoc = @@foo.splunkSearch('q' => params[:query]) # here is the meat
    @queryDoc.each_element("//r") do |ele| # here we're sorting out what is useful
      @datahash[ele.elements["m[@col='1']"].text] = ele.elements["m[@col='2']"].text.to_i
    end
    # Name/value pairs, largest value first, so the view can iterate over |name, height|.
    @sorted = @datahash.sort_by { |name, hits| -hits }
    @chartheight = @datahash.values.max + 50 # to make it look pretty and consistent
  end
end

This example is fairly simple and assumes you're just looking for basic metrics on your site's usage. You could build it out to accept whatever you want Splunk to throw back at you. This one expects to see something like "Content Name" => "value".
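To see the extraction step on its own, here is a rough sketch that runs the same REXML calls against a hand-written sample. The XML shape is an assumption inferred from the XPath expressions in the graph action (one r element per row, with m elements carrying a col attribute); actual Splunk output may differ. The extract_counts method is a hypothetical name.

```ruby
require 'rexml/document'

# Assumed response shape, inferred from the XPath in the graph action.
# This is illustrative data, not real Splunk output.
sample_xml = <<~XML
  <results>
    <r><m col='1'>articles</m><m col='2'>42</m></r>
    <r><m col='1'>users</m><m col='2'>17</m></r>
    <r><m col='1'>comments</m><m col='2'>29</m></r>
  </results>
XML

# Hypothetical helper: builds a { name => count } hash from the rows.
def extract_counts(doc)
  counts = {}
  doc.each_element("//r") do |row|
    name  = row.elements["m[@col='1']"]
    value = row.elements["m[@col='2']"]
    next if name.nil? || value.nil? # skip rows missing either column
    counts[name.text] = value.text.to_i
  end
  counts
end

doc = REXML::Document.new(sample_xml)
counts = extract_counts(doc)

# Name/count pairs, largest count first, ready for a bar chart.
sorted = counts.sort_by { |_name, hits| -hits }
sorted.each { |name, hits| puts "#{name}: #{hits}" }
```

Skipping rows that lack either column keeps one odd row from raising a NoMethodError on nil in the middle of a request.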

Now let's take a stab at setting up the graph:

<div id="xmldata">
<%# gives us a raw return of the data we pulled from Splunk %>
<samp><%= @queryDoc.to_s %></samp>
</div>
<%# Iterate through each of the data pairs and grab the height. %>
<% @sorted.each do |name, height| %>
<div class="columnSpacer">
<div style="margin-top: <%= 100 - ((height * 100) / @chartheight.to_f) %>%" class="graphTitle">
<%= name %>
<br>
<%= height %> hits
</div>
<div class="graphColumn" style="height: <%= (height * 100) / @chartheight.to_f %>%">

</div>
</div>
<% end %>

In the interest of keeping things from getting too esoteric, I've committed a no-no and left some programming in the view. All in all, it's pretty light math to get things displaying properly. As you should be able to glean from the code presented, we're just iterating through each of the name/value pairs we extracted from the XML Splunk returned and turning them into pretty little bars on a chart. Now all we need to do is put together the index page for accessing the graph function.

<html>
<head>
<%= javascript_include_tag "prototype" %>
</head>
<body >
<%= form_remote_tag(:update => "graphDiv",
:url => {:action => :graph }) %>
<%= text_field_tag :query, nil, {:size => "100"} %>

<%= submit_tag "Get a report on your query" %>
<%= end_form_tag %>
<div id="graphDiv">
</div>
</body>
</html>
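As an aside on the math that was left in the graph view: if you'd rather keep it out of the template, something like the following could work. This is only a sketch; bar_geometry is a hypothetical helper name, and it assumes the data arrives as name/hits pairs with the same 50-unit padding the graph action uses.

```ruby
# Hypothetical helper: turns [name, hits] pairs into the percentages the
# view needs (bar height, plus the top margin that pushes the label down).
def bar_geometry(pairs, chart_height)
  pairs.map do |name, hits|
    height = (hits * 100) / chart_height.to_f
    { :name => name, :hits => hits, :height => height, :offset => 100 - height }
  end
end

pairs = [["articles", 150], ["users", 50]]

# Same padding as the graph action: tallest bar plus 50.
chart_height = pairs.map { |_name, hits| hits }.max + 50

bars = bar_geometry(pairs, chart_height)
bars.each { |bar| puts "#{bar[:name]}: #{bar[:height]}% tall, #{bar[:offset]}% from the top" }
```

The view would then only have to print bar[:height] and bar[:offset] instead of repeating the division in two places.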

There's pretty minimal monkey business here, so let's go on to the fun part:


Queries


How about we take a look at what controllers are getting the most face-time with our users? This will give you an idea of which content sections are perceived as the most useful. This example comes from a site I did recently for a client and happens to be the handiest Rails logfile I have within reach.
Pop in the query:
[search sourcetype="the_name_you_gave_your_source" | top 5 controller ] | outputxml



There are a myriad of options available to you through Splunk's search interface, and learning to romance the queries into giving you what you want would be a section all its own. This one, however, consists of limiting the scope of the search ( sourcetype= ) and giving it a context to put it in ( top 5 controller ) -- in this case, the top five controllers.
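If you end up issuing this kind of query from code rather than typing it into the form, a tiny builder keeps the pieces straight. This is a sketch only: top_query is a hypothetical helper, and it does nothing more than assemble the same query text shown above from a source type, a field, and a limit.

```ruby
# Hypothetical helper: assembles the "top N" query string shown above.
def top_query(sourcetype, field, limit = 5)
  # Refuse embedded double quotes so the quoted sourcetype stays intact.
  raise ArgumentError, "sourcetype must not contain double quotes" if sourcetype.include?('"')
  "[search sourcetype=\"#{sourcetype}\" | top #{limit} #{field} ] | outputxml"
end

puts top_query("the_name_you_gave_your_source", "controller")
```

The resulting string could then be passed as the 'q' parameter, the same way the controller passes params[:query].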



Next post I will cover the possibilities afforded with the use of bundles in Splunk in conjunction with your Rails application. In the meantime I highly suggest you peruse the REST API documentation supplied in your Splunk install and the admin/developer documentation on Splunk.com to get a more in-depth understanding of what you can do.