How to Parse a Log File Using the awk Command
This post shows how to parse log files. Below is a sample line from the log file we are considering.
example.com:80 208.88.125.227 - - [10/Oct/2012:05:05:01 -0400] "GET /status.html HTTP/1.1" 200 404 "-" "curl/7.19.7 (i486-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15"
Before parsing a log file, you first need to know its structure. Here, the first column is the domain with port, the second column is the client IP address, and the 10th column is the response code. awk splits each line on whitespace by default, so the timestamp and the quoted request each span several columns.
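A quick way to check the column positions is to let awk number the fields of a single line. The following is a minimal sketch; the variable names line and fields are only illustrative, and the user-agent string is shortened to keep the example compact.

```shell
# One line in the format shown above (user-agent string shortened for brevity).
line='example.com:80 208.88.125.227 - - [10/Oct/2012:05:05:01 -0400] "GET /status.html HTTP/1.1" 200 404 "-" "curl/7.19.7"'

# Print every whitespace-separated field together with its awk field number.
fields=$(printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i++) print i ": " $i }')
printf '%s\n' "$fields"
```

In the numbered list, the request path appears as field 8 and the response code as field 10, which are the two fields used in the rest of this post.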
Once we know the structure, we can extract what we need with a few commands.
Let’s parse the logs now.
“awk” command
awk '{print $10}' sample.log
The output would be
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
404
404
200
200
502
502
200
502
200
200
304
304
404
200
302
200
200
200
200
200
200
200
200
200
200
200
200
404
200
200
200
200
404
200
The command above prints the 10th column, the response code, for every line.
Note that this result is not sorted.
Now let’s pipe it through sort to order the response codes.
awk '{print $10}' sample.log | sort
The output would be sorted.
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
302
304
304
404
404
404
404
404
502
502
502
Now we have the sorted list of response codes: all the 200 responses at the top, then 302, and so on.
The next step is to count how many times each response code occurs. Let’s do that with the uniq filter and its -c option. uniq only merges adjacent duplicate lines, which is why the input is sorted first.
awk '{print $10}' sample.log | sort | uniq -c
The output will have the count of all the unique items in the 10th column.
39 200
1 302
2 304
5 404
3 502
This shows that there are 39 responses with 200 status, 1 with 302 status and so on.
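The same counts can also be produced inside awk with an associative array, without sort and uniq doing the counting. This is a sketch, not the method used in the rest of the post; the three log lines below are made-up sample data in the format shown earlier, and the names log, count and counts are illustrative.

```shell
# Small made-up sample in the same format as the log above (hypothetical data).
log=$(mktemp)
cat > "$log" <<'EOF'
example.com:80 208.88.125.227 - - [10/Oct/2012:05:05:01 -0400] "GET /status.html HTTP/1.1" 200 404 "-" "curl/7.19.7"
example.com:80 208.88.125.227 - - [10/Oct/2012:05:05:02 -0400] "GET /feed/ HTTP/1.1" 200 512 "-" "curl/7.19.7"
example.com:80 208.88.125.227 - - [10/Oct/2012:05:06:10 -0400] "GET /missing.png HTTP/1.1" 404 210 "-" "curl/7.19.7"
EOF

# count[] is keyed by the status code in field 10; END prints each total.
# awk does not guarantee any order for "for (code in count)", so the
# result is sorted numerically by the code (second column) afterwards.
counts=$(awk '{ count[$10]++ } END { for (code in count) print count[code], code }' "$log" | sort -k2,2n)
printf '%s\n' "$counts"
rm -f "$log"
```

On a real log, replace "$log" with sample.log. The output has the same "count code" layout as uniq -c, just without the leading padding.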
As you can see, the output is ordered by response code. What if we want to order it by count instead?
Let’s do it by applying sort again, this time with the -n option so that the leading counts are compared as numbers rather than as text.
awk '{print $10}' sample.log | sort | uniq -c | sort -n
This sorts the output by count.
1 302
2 304
3 502
5 404
39 200
The output is sorted in increasing order of count.
Let’s sort it in decreasing order now. Add the -r option to sort, together with -n so that the counts are compared as numbers.
awk '{print $10}' sample.log | sort | uniq -c | sort -rn
Here is the sorted output
39 200
5 404
3 502
2 304
1 302
So far we have the response code counts in every order we wanted.
Let’s go into more detail and see which requests produced which response. Looking at the fields in the log, the 8th field is the requested path. So let’s print the 8th and 10th fields together and see the result.
awk '{print $8 " " $10}' sample.log | sort
Here is the output
* 200
* 200
* 200
* 200
* 200
* 200
* 200
* 200
* 200
* 404
/call-for-participation-at-conf-kde-in-and-fossmeet-2011/ 200
/call-for-participation-at-conf-kde-in-and-fossmeet-2011/ 200
/category/FSC/feed/ 304
/category/bit-lug/ 200
/category/fedora/feed/ 304
/feed/ 200
/my-scripts/ 200
/status.html 200
/status.html 200
/status.html 200
/status.html 404
/wp-comments-post.php 302
/wp-content/themes/cordobo-green-park-2/img/logo-cgp2.png 200
/wp-content/uploads/2010/11/523433.png 200
/wp-content/uploads/2010/11/facebook.png 200
/wp-content/uploads/2010/11/facebook.png 200
/wp-content/uploads/2010/11/facebook.png 200
/wp-content/uploads/2010/11/facebook.png 404
/wp-content/uploads/2010/11/facebook.png 502
/wp-content/uploads/2010/11/flicker1.png 200
/wp-content/uploads/2010/11/linkedin.png 200
/wp-content/uploads/2010/11/linkedin.png 200
/wp-content/uploads/2010/11/linkedin.png 200
/wp-content/uploads/2010/11/linkedin.png 200
/wp-content/uploads/2010/11/linkedin.png 200
/wp-content/uploads/2010/11/linkedin.png 502
/wp-content/uploads/2010/11/rss.png 200
/wp-content/uploads/2010/11/rss.png 200
/wp-content/uploads/2010/11/rss.png 200
/wp-content/uploads/2010/11/rss.png 200
/wp-content/uploads/2010/11/rss.png 200
/wp-content/uploads/2010/11/rss.png 404
/wp-content/uploads/2010/11/twitter.png 200
/wp-content/uploads/2010/11/twitter.png 200
/wp-content/uploads/2010/11/twitter.png 200
/wp-content/uploads/2010/11/twitter.png 200
/wp-content/uploads/2010/11/twitter.png 404
/wp-content/uploads/2010/11/twitter.png 502
/wp-login.php 200
/wp-login.php 200
Now let’s count how many times each request occurs with each response code. You already know the trick: just apply the uniq filter with -c.
awk '{print $8 " " $10}' sample.log | sort | uniq -c
The output shows a count for each combination of request and response code.
9 * 200
1 * 404
2 /call-for-participation-at-conf-kde-in-and-fossmeet-2011/ 200
1 /category/FSC/feed/ 304
1 /category/bit-lug/ 200
1 /category/fedora/feed/ 304
1 /feed/ 200
1 /my-scripts/ 200
3 /status.html 200
1 /status.html 404
1 /wp-comments-post.php 302
1 /wp-content/themes/cordobo-green-park-2/img/logo-cgp2.png 200
1 /wp-content/uploads/2010/11/523433.png 200
3 /wp-content/uploads/2010/11/facebook.png 200
1 /wp-content/uploads/2010/11/facebook.png 404
1 /wp-content/uploads/2010/11/facebook.png 502
1 /wp-content/uploads/2010/11/flicker1.png 200
5 /wp-content/uploads/2010/11/linkedin.png 200
1 /wp-content/uploads/2010/11/linkedin.png 502
5 /wp-content/uploads/2010/11/rss.png 200
1 /wp-content/uploads/2010/11/rss.png 404
4 /wp-content/uploads/2010/11/twitter.png 200
1 /wp-content/uploads/2010/11/twitter.png 404
1 /wp-content/uploads/2010/11/twitter.png 502
2 /wp-login.php 200
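To focus only on the requests that failed, awk can filter on the response code before printing. The sketch below keeps lines whose 10th field is 400 or higher and then counts them as before; the four log lines are made-up sample data in the format shown earlier, and the names log and errors are illustrative.

```shell
# Small made-up sample in the same format as the log above (hypothetical data).
log=$(mktemp)
cat > "$log" <<'EOF'
example.com:80 208.88.125.227 - - [10/Oct/2012:05:05:01 -0400] "GET /status.html HTTP/1.1" 200 404 "-" "curl/7.19.7"
example.com:80 208.88.125.227 - - [10/Oct/2012:05:05:02 -0400] "GET /missing.png HTTP/1.1" 404 210 "-" "curl/7.19.7"
example.com:80 208.88.125.227 - - [10/Oct/2012:05:05:03 -0400] "GET /missing.png HTTP/1.1" 404 210 "-" "curl/7.19.7"
example.com:80 208.88.125.227 - - [10/Oct/2012:05:06:10 -0400] "GET /feed/ HTTP/1.1" 502 0 "-" "curl/7.19.7"
EOF

# The pattern $10 >= 400 selects only 4xx and 5xx responses.
# The pipeline then counts each path/code pair, most frequent first.
errors=$(awk '$10 >= 400 { print $8, $10 }' "$log" | sort | uniq -c | sort -rn)
printf '%s\n' "$errors"
rm -f "$log"
```

On a real log, replace "$log" with sample.log. The 200 line is dropped by the filter, so only the 404 and 502 requests are counted.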
You have now learned enough to parse this kind of log file using the “awk” command. If you think any other example or suggestion should be included in this post, then do let me know in the comments.
Can you use awk to find stats by minute showing response codes logged, total counts per code found in the log, and breakdown of number of requests in each response time range?
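A possible starting point for the per-minute part of this question, as a sketch only: the 5th field holds the timestamp with a leading bracket, so substr can cut it down to the minute before counting. The log format used in this post has no response-time field, so the response-time breakdown is not covered here. The three log lines are made-up sample data, and the names log and per_minute are illustrative.

```shell
# Small made-up sample in the same format as the log above (hypothetical data).
log=$(mktemp)
cat > "$log" <<'EOF'
example.com:80 208.88.125.227 - - [10/Oct/2012:05:05:01 -0400] "GET /status.html HTTP/1.1" 200 404 "-" "curl/7.19.7"
example.com:80 208.88.125.227 - - [10/Oct/2012:05:05:02 -0400] "GET /feed/ HTTP/1.1" 200 512 "-" "curl/7.19.7"
example.com:80 208.88.125.227 - - [10/Oct/2012:05:06:10 -0400] "GET /missing.png HTTP/1.1" 404 210 "-" "curl/7.19.7"
EOF

# Field 5 looks like "[10/Oct/2012:05:05:01". substr($5, 2, 17) skips the
# bracket and keeps "10/Oct/2012:05:05", i.e. the timestamp up to the minute.
# Counting minute/code pairs gives the number of each response code per minute.
per_minute=$(awk '{ print substr($5, 2, 17), $10 }' "$log" | sort | uniq -c)
printf '%s\n' "$per_minute"
rm -f "$log"
```

Note that sort orders the minutes as text, which is chronological within a single day but not across different months.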