How To Parse Log Files Using cut Command


This post is about how to parse log files using the cut command. If you want to learn how to parse log files using the awk command, then visit this link.

Below is the format of the log file we are considering.

example.com:80 208.88.125.227 - - [10/Oct/2012:05:05:01 -0400] "GET /status.html HTTP/1.1" 200 404 "-" "curl/7.19.7 (i486-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15"


The first thing to know before parsing a log file is its structure. Here, when each line is split on spaces, the first column is the domain with port, the second column is the client IP address, and similarly the 10th column is the response code.
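One quick way to check the field numbers is to put each space-separated field on its own line and number the lines. This is only a sketch: the variable name `line` is illustrative, and the user agent at the end of the sample line has been shortened for brevity.

```shell
# One log line in the format shown above (user agent shortened for brevity).
line='example.com:80 208.88.125.227 - - [10/Oct/2012:05:05:01 -0400] "GET /status.html HTTP/1.1" 200 404 "-" "curl/7.19.7"'

# Turn every space into a newline so each field sits on its own line,
# then number the lines with nl to see which field is which.
printf '%s\n' "$line" | tr ' ' '\n' | nl
```

In the numbered list, line 8 is the request path (/status.html) and line 10 is the response code (200). Note that the timestamp is split into two fields because it contains a space.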

Once we know this, we can extract what we need by running a few commands.
Let’s now parse the logs.

A video tutorial explaining this post is also available on YouTube.

 

“cut” command

The Linux “cut” command is used for text processing. You can use it to extract portions of text from a file by selecting columns/fields.

Parsing the HTTP response code

First, let’s print all the response codes. As we know, the response code is in the 10th column, and in the cut command the column is selected with -f10, where 10 is the column/field number.


cat sample.log | cut -d " " -f10


The output would be


200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
404
404
200
200
502
502
200
502
200
200
304
304
404
200
302
200
200
200
200
200
200
200
200
200
200
200
200
404
200
200
200
200
404
200


In the above command, we displayed the 10th column, using a single space (" ") as the delimiter.
Note that this result is not sorted.
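To make the -d and -f options more concrete, here is a small sketch on a made-up record. The variable `record` and its contents are purely illustrative and are not part of the sample log.

```shell
# A made-up record, used only to illustrate the options.
record='alpha beta gamma delta'

# -d sets the delimiter, -f picks the field(s).
printf '%s\n' "$record" | cut -d ' ' -f2      # second field only
printf '%s\n' "$record" | cut -d ' ' -f1,3    # first and third fields
printf '%s\n' "$record" | cut -d ' ' -f2-     # second field to end of line

# cut treats every single space as a separator, so two spaces in a row
# produce an empty field and shift the numbering of later fields.
printf '%s\n' 'alpha  beta' | cut -d ' ' -f2  # prints an empty line
```

The last command is worth keeping in mind: cut works well on this log format because the fields are separated by exactly one space.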

Now let’s apply the sort filter to sort these response codes.

cat sample.log | cut -d " " -f10 | sort


The output would be sorted.


200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
302
304
304
404
404
404
404
404
502
502
502


Now we have the sorted list of response codes: all the 200 responses at the top, then 302, and so on.
The next step is to count how many times each response code occurs. Let’s count them by applying the uniq filter with the -c option.


cat sample.log | cut -d " " -f10 | sort | uniq -c


The output will have the count of all the unique items in the 10th column.


     39 200
      1 302
      2 304
      5 404
      3 502
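The sort step before uniq matters, because uniq only compares adjacent lines. A minimal sketch with made-up status codes (the variable `codes` is illustrative) shows the difference:

```shell
# Made-up status codes in arrival order, for illustration only.
codes=$(printf '%s\n' 200 404 200 200 404)

# Without sort, uniq -c counts runs of adjacent identical lines,
# so the same code can show up more than once in the result.
printf '%s\n' "$codes" | uniq -c

# With sort, identical codes become adjacent and are counted together.
printf '%s\n' "$codes" | sort | uniq -c
```

The first command reports four separate runs, while the second reports one total per code (3 for 200 and 2 for 404).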


This shows that there are 39 responses with a 200 status, 1 with a 302 status, and so on.
As you can see, the output is sorted by response code. What if we want to sort it by count instead?
Let’s do it by applying the sort filter again, this time with the -n option so that the counts are compared as numbers rather than as text.


cat sample.log | cut -d " " -f10 | sort | uniq -c | sort -n


This will sort the output by count.


      1 302
      2 304
      3 502
      5 404
     39 200


The output is sorted in increasing order of count.
Let’s sort it in decreasing order now. Just add the -r option to sort (the -n option makes sort compare the counts as numbers).


cat sample.log | cut -d " " -f10 | sort | uniq -c | sort -rn


Here is the sorted output


     39 200
      5 404
      3 502
      2 304
      1 302
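When counts have different numbers of digits, the way sort compares lines becomes important. The sketch below uses made-up "count code" pairs without any padding (the variable `counts` is illustrative) to contrast a text sort with a numeric one, and adds head to keep only the top entries.

```shell
# Made-up "count code" pairs without padding, for illustration only.
counts=$(printf '%s\n' '39 200' '5 404' '3 502' '12 304')

# Text comparison: 12 and 39 end up ahead of 5, because the lines are
# compared character by character and '1' and '3' come before '5'.
printf '%s\n' "$counts" | sort

# Numeric comparison on the leading count, largest first, top two only.
printf '%s\n' "$counts" | sort -rn | head -n 2
```

The second command prints the 39 and 12 lines, which is usually what a "most frequent first" report needs.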


So far we have the response code counts in every order we need.
Let’s get more detail, such as which requests produced which response. Looking at the fields in the log, the 8th field is the requested path. So let’s include both the 8th and 10th fields in our command and see the result.


cat sample.log | cut -d " " -f8,10 | sort


Here is the output

* 200
* 200
* 200
* 200
* 200
* 200
* 200
* 200
* 200
* 404
/call-for-participation-at-conf-kde-in-and-fossmeet-2011/ 200
/call-for-participation-at-conf-kde-in-and-fossmeet-2011/ 200
/category/FSC/feed/ 304
/category/bit-lug/ 200
/category/fedora/feed/ 304
/feed/ 200
/my-scripts/ 200
/status.html 200
/status.html 200
/status.html 200
/status.html 404
/wp-comments-post.php 302
/wp-content/themes/cordobo-green-park-2/img/logo-cgp2.png 200
/wp-content/uploads/2010/11/523433.png 200
/wp-content/uploads/2010/11/facebook.png 200
/wp-content/uploads/2010/11/facebook.png 200
/wp-content/uploads/2010/11/facebook.png 200
/wp-content/uploads/2010/11/facebook.png 404
/wp-content/uploads/2010/11/facebook.png 502
/wp-content/uploads/2010/11/flicker1.png 200
/wp-content/uploads/2010/11/linkedin.png 200
/wp-content/uploads/2010/11/linkedin.png 200
/wp-content/uploads/2010/11/linkedin.png 200
/wp-content/uploads/2010/11/linkedin.png 200
/wp-content/uploads/2010/11/linkedin.png 200
/wp-content/uploads/2010/11/linkedin.png 502
/wp-content/uploads/2010/11/rss.png 200
/wp-content/uploads/2010/11/rss.png 200
/wp-content/uploads/2010/11/rss.png 200
/wp-content/uploads/2010/11/rss.png 200
/wp-content/uploads/2010/11/rss.png 200
/wp-content/uploads/2010/11/rss.png 404
/wp-content/uploads/2010/11/twitter.png 200
/wp-content/uploads/2010/11/twitter.png 200
/wp-content/uploads/2010/11/twitter.png 200
/wp-content/uploads/2010/11/twitter.png 200
/wp-content/uploads/2010/11/twitter.png 404
/wp-content/uploads/2010/11/twitter.png 502
/wp-login.php 200
/wp-login.php 200


Now let’s count each request together with its response code. You already know the trick: just apply the uniq filter.

cat sample.log | cut -d " " -f8,10 | sort | uniq -c


The output will have the count of each request with its corresponding response code. The count is based on the combination of both request and response code.


      9 * 200
      1 * 404
      2 /call-for-participation-at-conf-kde-in-and-fossmeet-2011/ 200
      1 /category/FSC/feed/ 304
      1 /category/bit-lug/ 200
      1 /category/fedora/feed/ 304
      1 /feed/ 200
      1 /my-scripts/ 200
      3 /status.html 200
      1 /status.html 404
      1 /wp-comments-post.php 302
      1 /wp-content/themes/cordobo-green-park-2/img/logo-cgp2.png 200
      1 /wp-content/uploads/2010/11/523433.png 200
      3 /wp-content/uploads/2010/11/facebook.png 200
      1 /wp-content/uploads/2010/11/facebook.png 404
      1 /wp-content/uploads/2010/11/facebook.png 502
      1 /wp-content/uploads/2010/11/flicker1.png 200
      5 /wp-content/uploads/2010/11/linkedin.png 200
      1 /wp-content/uploads/2010/11/linkedin.png 502
      5 /wp-content/uploads/2010/11/rss.png 200
      1 /wp-content/uploads/2010/11/rss.png 404
      4 /wp-content/uploads/2010/11/twitter.png 200
      1 /wp-content/uploads/2010/11/twitter.png 404
      1 /wp-content/uploads/2010/11/twitter.png 502
      2 /wp-login.php 200
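To narrow such a report down to a single status code, the same pipeline can be combined with grep. The following is a minimal sketch, assuming a tiny sample file: the log entries (paths, sizes, timestamps) are made up for illustration, and the user agent is shortened.

```shell
# Build a tiny sample log; the entries are made up for illustration.
log=$(mktemp)
cat > "$log" <<'EOF'
example.com:80 208.88.125.227 - - [10/Oct/2012:05:05:01 -0400] "GET /status.html HTTP/1.1" 200 404 "-" "curl/7.19.7"
example.com:80 208.88.125.227 - - [10/Oct/2012:05:05:02 -0400] "GET /missing.png HTTP/1.1" 404 512 "-" "curl/7.19.7"
example.com:80 208.88.125.227 - - [10/Oct/2012:05:05:03 -0400] "GET /missing.png HTTP/1.1" 404 512 "-" "curl/7.19.7"
example.com:80 208.88.125.227 - - [10/Oct/2012:05:05:04 -0400] "GET /feed/ HTTP/1.1" 502 0 "-" "curl/7.19.7"
EOF

# Keep request path and status code, then only the lines whose status
# code is 404, and count them per request, most frequent first.
cut -d ' ' -f8,10 "$log" | grep ' 404$' | sort | uniq -c | sort -rn
```

Filtering after cut avoids a false match on the first sample line, where 404 is the value of the 11th field while the status code is 200.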


You have now learned enough to parse similar space-delimited log files using the cut command. If you think of any example or suggestion that should be included in this post, do let me know in the comments.
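As a recap, the steps from this post can be wrapped in a small shell function. This is only a sketch: `field_counts` is a hypothetical helper name, and the sample log entries are made up for illustration.

```shell
# Hypothetical helper: count the values of the given space-separated
# field(s), most frequent first. Name and arguments are illustrative.
field_counts() {
    # $1 = field list for cut (for example 10 or 8,10), $2 = log file
    cut -d ' ' -f"$1" "$2" | sort | uniq -c | sort -rn
}

# Tiny sample log with made-up entries in the format used in this post.
log=$(mktemp)
cat > "$log" <<'EOF'
example.com:80 208.88.125.227 - - [10/Oct/2012:05:05:01 -0400] "GET /status.html HTTP/1.1" 200 404 "-" "curl/7.19.7"
example.com:80 208.88.125.227 - - [10/Oct/2012:05:05:02 -0400] "GET /missing.png HTTP/1.1" 404 512 "-" "curl/7.19.7"
example.com:80 208.88.125.227 - - [10/Oct/2012:05:05:03 -0400] "GET /missing.png HTTP/1.1" 404 512 "-" "curl/7.19.7"
EOF

field_counts 10 "$log"     # counts per response code
field_counts 8,10 "$log"   # counts per request path and response code
```

Passing the file name to cut directly also removes the need for a separate cat process.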
