{"id":399,"date":"2006-03-13T16:36:20","date_gmt":"2006-03-14T00:36:20","guid":{"rendered":"http:\/\/www.krunk4ever.com\/blog\/?p=399"},"modified":"2006-03-13T16:57:57","modified_gmt":"2006-03-14T00:57:57","slug":"referer-parser","status":"publish","type":"post","link":"https:\/\/www.krunk4ever.com\/blog\/2006\/03\/13\/referer-parser\/","title":{"rendered":"Referer Parser"},"content":{"rendered":"<p>So I spent most of yesterday working on relearning how to parse Apache logs and how to get it automated and onto a working online user interface, which was rather fun. Haven&#8217;t really had much time to do web development after I started working, hence why I decided to use an existing framework and slowly build off of that.<\/p>\n<p>So what I ended up with is a simple parsing of my logs directory. It&#8217;s a bit unpolished, but for now, it&#8217;ll do.<\/p>\n<ul type=\"disc\">\n<li><a href=\"#\">access.log<\/a><\/li>\n<li><a href=\"#\">access.log.0<\/a><\/li>\n<li><a href=\"#\">access.log.2006-03-08.gz<\/a><\/li>\n<li><a href=\"#\">access.log.2006-03-09.gz<\/a><\/li>\n<li><a href=\"#\">access.log.2006-03-10.gz<\/a><\/li>\n<li><a href=\"#\">access.log.2006-03-11.gz<\/a><\/li>\n<li><a href=\"#\">access.log.2006-03-12<\/a><\/li>\n<\/ul>\n<p>* links do nothing in this preview<\/p>\n<p>When you click on a link, it calls the tool I wrote and passes in the log file I want (e.g. <strong>parse.php?log=access.log.2006-03-10.gz<\/strong>). The tool can distinguish between compressed files (.gz) and raw files (no extension). 
If it detects it&#8217;s a gzip file, it&#8217;ll decompress it and then rerun the tool with the location of the extracted file.<\/p>\n<p>What the tool does is generate the results in XML format, similar to the following:<\/p>\n<p><code>&lt;domainlist&gt;<br \/>\n&lt;domain name=\"search.yahoo.com\"&gt;<br \/>\n&lt;referer from=\"http:\/\/search.yahoo.com\/\" to=\"GET \/blog\/ HTTP\/1.0\" date=\"2006-03-11 04:01:56\" ip=\"125.247.105.242\" \/&gt;<br \/>\n&lt;\/domain&gt;<br \/>\n&lt;\/domainlist&gt;<\/code><\/p>\n<p>I actually group the entries by domain. There&#8217;s no sorting yet, but I didn&#8217;t find that necessary in my case, since all I really wanted was an easy way to see who&#8217;s linked to my site and be able to collapse all the Google ones if they&#8217;re not interesting.<\/p>\n<p>After generating this file, the tool automatically zips it (reducing my ~2MB XML files to ~100KB) and then emails it to me. I&#8217;m trying to figure out if I&#8217;m allowed to schedule this to run once a day instead of having to do it on demand.<\/p>\n<p>The next thing I needed was an XML viewer. At first I considered using IE or Firefox, but decided against both. IE was finicky about expanding and collapsing items. Firefox just hung when trying to load a 2MB XML file. So I searched online, and the first couple of links in my search results were <a href=\"http:\/\/www.mindfusion.org\/product1.html\">MindFusion XML Viewer<\/a> and <a href=\"http:\/\/www.alphaworks.ibm.com\/tech\/xmlviewer\">IBM XML Viewer<\/a>. The IBM one is no longer available, so I went ahead and got the MindFusion one. It was free after all and the screenshot looked decent. At first it didn&#8217;t show me all the information I wanted as fast as possible, but after enabling Name and Value in tree and Expand On Load, it was a lot better.<\/p>\n<p>The final thing I wanted to do was to require authentication in order to load this page. 
Turns out there was an easy way: create a users file (<code>htpasswd -c ~\/users username<\/code>) and then a .htaccess file pointing to it. If you&#8217;re running Apache, you can find more information on how to do this by clicking on the link in my reference list below.<\/p>\n<p>Here&#8217;s an example: <a id=\"p400\" href=\"access.log.2006_03_11.zip\" title=\"access.log.2006-03-11.gz\">access.log.2006-03-11.gz<\/a><\/p>\n<p>Finally, to give credit for references I used:<br \/>\n<a href=\"http:\/\/www.php-editors.com\/contest\/1\/78-read.html\">Apache log file parser class<\/a><br \/>\n<a href=\"http:\/\/phpmailer.sourceforge.net\/\">PHPMailer<\/a><br \/>\n<a href=\"http:\/\/www.apacheweek.com\/features\/userauth\">Using User Authentication<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>So I spent most of yesterday working on relearning how to parse Apache logs and how to get it automated and onto a working online user interface, which was rather fun. Haven&#8217;t really had much time to do web development after I started working, hence why I decided to use an existing framework and slowly &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.krunk4ever.com\/blog\/2006\/03\/13\/referer-parser\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Referer 
Parser&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[11],"tags":[],"_links":{"self":[{"href":"https:\/\/www.krunk4ever.com\/blog\/wp-json\/wp\/v2\/posts\/399"}],"collection":[{"href":"https:\/\/www.krunk4ever.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.krunk4ever.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.krunk4ever.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.krunk4ever.com\/blog\/wp-json\/wp\/v2\/comments?post=399"}],"version-history":[{"count":0,"href":"https:\/\/www.krunk4ever.com\/blog\/wp-json\/wp\/v2\/posts\/399\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.krunk4ever.com\/blog\/wp-json\/wp\/v2\/media?parent=399"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.krunk4ever.com\/blog\/wp-json\/wp\/v2\/categories?post=399"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.krunk4ever.com\/blog\/wp-json\/wp\/v2\/tags?post=399"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}