Referer Parser

I spent most of yesterday relearning how to parse Apache logs, automating the process, and putting it behind a working online user interface, which was rather fun. I haven’t had much time for web development since I started working, which is why I decided to start from an existing framework and slowly build on it.

What I ended up with is a simple page that parses my logs directory. It’s a bit unpolished, but it’ll do for now.

* links do nothing in this preview

When you click on a link, it calls the tool I wrote and passes in the log file I want (e.g. parse.php?log=access.log.2006-03-10.gz). The tool can distinguish between compressed files (.gz) and raw files (no .gz extension). If it detects a gzip file, it decompresses it and then reruns itself with the location of the extracted file.
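To illustrate the compressed-versus-raw check, here is a minimal sketch in Python rather than the PHP the tool is actually written in. The function names open_log and read_lines are made up for this example, and unlike my tool it reads the gzip stream directly instead of extracting to a file and rerunning.

```python
import gzip
from pathlib import Path


def open_log(path):
    """Open a log file as text, decompressing on the fly if it ends in .gz.

    Hypothetical helper: the real tool extracts the file and reruns itself.
    """
    path = Path(path)
    if path.suffix == ".gz":
        # "rt" mode yields decoded text lines straight from the gzip stream
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def read_lines(path):
    """Return all lines of a raw or gzip-compressed log, without newlines."""
    with open_log(path) as handle:
        return [line.rstrip("\n") for line in handle]
```

Calling read_lines("access.log.2006-03-10.gz") and read_lines("access.log.2006-03-10") should give the same lines if the two files hold the same log.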

The tool generates its results in XML format, similar to the following:

<domainlist>
<domain name="search.yahoo.com">
<referer from="http://search.yahoo.com/" to="GET /blog/ HTTP/1.0" date="2006-03-11 04:01:56" ip="125.247.105.242" />
</domain>
</domainlist>

I group the entries by domain. There’s no sorting yet, but I didn’t find that necessary in my case, since all I really wanted was an easy way to see who’s linked to my site and to be able to collapse all the Google ones if they’re not interesting.
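The grouping step could look roughly like the following Python sketch (again, not my actual PHP code). It assumes the logs are in Apache’s combined log format, which I haven’t stated above, and the names LINE_RE, group_by_domain and to_xml are invented for the example. Lines without a referer, or that don’t match the pattern, are skipped.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime
from urllib.parse import urlparse

# Assumed layout: Apache "combined" format, i.e.
# ip ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "[^"]*"'
)


def group_by_domain(lines):
    """Group referer entries by the host name of the referring URL."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if not match or match["referer"] in ("", "-"):
            continue  # unparseable line, or a request with no referer
        domain = urlparse(match["referer"]).hostname
        if not domain:
            continue
        when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        groups[domain].append({
            "from": match["referer"],
            "to": match["request"],
            "date": when.strftime("%Y-%m-%d %H:%M:%S"),
            "ip": match["ip"],
        })
    return groups


def to_xml(groups):
    """Serialize the groups as <domainlist><domain><referer .../></domain>...</domainlist>."""
    root = ET.Element("domainlist")
    for domain, entries in groups.items():
        node = ET.SubElement(root, "domain", name=domain)
        for entry in entries:
            ET.SubElement(node, "referer", entry)
    return ET.tostring(root, encoding="unicode")
```

Domains come out in the order they are first seen, which matches the “no sorting yet” behaviour described above.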

After generating this file, the tool automatically zips it (which shrinks my ~2MB XML files to ~100KB) and then emails it to me. I’m still trying to figure out whether I’m allowed to schedule this to run once a day instead of having to run it on demand.
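As a rough illustration of the compress-and-email step, here is a Python sketch using only the standard library (my tool uses PHPMailer instead). It assumes gzip as the compression format. The function names, the subject line and the referers.xml.gz filename are all made up, and the message is only built, not sent.

```python
import gzip
from email.message import EmailMessage


def compress_report(xml_text):
    """Return the XML report gzip-compressed as bytes."""
    return gzip.compress(xml_text.encode("utf-8"))


def build_report_email(xml_text, sender, recipient, filename="referers.xml.gz"):
    """Build (but do not send) an email carrying the compressed report."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "Referer report"  # hypothetical subject line
    message.set_content("Compressed referer report attached.")
    message.add_attachment(
        compress_report(xml_text),
        maintype="application",
        subtype="gzip",
        filename=filename,
    )
    return message
```

Actually sending the message would still need an SMTP server (for example via smtplib), which I’ve left out here.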

The next thing I needed was an XML viewer. At first I considered using IE or Firefox, but decided against both: IE was finicky about expanding and collapsing items, and Firefox just hung when trying to load a 2MB XML file. So I searched online, and the first couple of links in my search results pointed to either MindFusion XML Viewer or IBM XML Viewer. The IBM one is no longer available, so I went ahead and got the MindFusion one. It was free, after all, and the screenshot looked decent. At first it didn’t show me all the information I wanted as quickly as I’d like, but after enabling Name and Value in tree and Expand On Load, it was a lot better.

The final thing I wanted to do was require authentication in order to load this page. It turns out there’s an easy way: create a users file (htpasswd -c ~/users username) and then add a .htaccess file pointing to it. If you’re running Apache, you can find more information on how to do this by clicking on the link in my reference list below.
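For reference, a minimal .htaccess for Basic authentication might look something like the fragment below. The realm name and the /home/username/users path are placeholders, not my actual setup. AuthUserFile wants a full path, so ~/users has to be spelled out.

```apache
# Minimal sketch; the realm name and file path are placeholders
AuthType Basic
AuthName "Referer logs"
AuthUserFile /home/username/users
Require valid-user
```

This only takes effect if the server’s configuration allows .htaccess files to override authentication settings for that directory.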

Here’s an example: access.log.2006-03-11.gz

Finally, to give credit for references I used:
Apache log file parser class
PHPMailer
Using User Authentication
