Parse Gunicorn logs with a Python script

This post explains how to parse a Gunicorn log file with a Python script, so let's start:

Consider a typical example of a Gunicorn log file, saved as gunicorn.log in the same directory as the script:

[2016-06-13 16:29:15 +0000] [32172] [DEBUG] POST /procurement/web/v2/express/filter_all_view_v2/
[2016-06-13 16:29:15 +0000] [32183] [DEBUG] GET /procurement/web/v2/express/item/2523
[2016-06-13 16:29:15 +0000] [32183] [DEBUG] Closing connection.
[2016-06-13 16:29:15 +0000] [32181] [DEBUG] POST /procurement/web/v2/reserve/filter_all_view_v2/
[2016-06-13 16:29:16 +0000] [32172] [DEBUG] Closing connection.
[2016-06-13 16:29:16 +0000] [31483] [DEBUG] 12 workers
[2016-06-13 16:29:16 +0000] [32181] [DEBUG] Closing connection.
[2016-06-13 16:29:17 +0000] [31483] [DEBUG] 12 workers
[2016-06-13 16:29:18 +0000] [31483] [DEBUG] 12 workers
[2016-06-13 16:29:19 +0000] [31483] [DEBUG] 12 workers
[2016-06-13 16:29:20 +0000] [31483] [DEBUG] 12 workers
[2016-06-13 16:29:21 +0000] [31483] [DEBUG] 12 workers
[2016-06-13 16:29:22 +0000] [31483] [DEBUG] 12 workers
[2016-06-13 16:29:23 +0000] [32173] [DEBUG] GET /procurement/web/v2/express/filter_population/

In the above example, the first column shows the date and time, and the fourth column shows different kinds of information: API calls, the number of workers currently running, and connection information.
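To make the column layout concrete, here is a minimal sketch that splits one log line into its four bracketed fields. The pattern and the names LINE_RE and parse_line are assumptions made for illustration, based only on the sample lines shown above; a differently configured Gunicorn log format may need a different pattern.

```python
import re

# Assumed layout, taken from the sample above: [timestamp] [pid] [level] message
LINE_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] \[(?P<pid>\d+)\] \[(?P<level>\w+)\] (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict with the four fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


sample = "[2016-06-13 16:29:15 +0000] [32183] [DEBUG] GET /procurement/web/v2/express/item/2523"
fields = parse_line(sample)
print(fields["timestamp"])  # first column: date and time
print(fields["message"])    # fourth column: API call, worker count or connection info
```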

In order to find out how many times a particular API is called on a particular date, we need to extract only those lines that contain information about API calls.
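One possible way to recognise such lines is to check that the message part starts with an HTTP method. This is only a sketch and is not the approach the script in this post takes (the script excludes the "workers" and "connection." lines instead). The helper name is_api_call and the list of methods are assumptions.

```python
# Assumed set of HTTP methods; extend it if your log contains others.
HTTP_METHODS = ("GET", "POST", "PUT", "PATCH", "DELETE", "HEAD", "OPTIONS")


def is_api_call(line, date):
    """True if the line is from the given date and its message is '<METHOD> <path>'."""
    if not line.startswith("[" + date):
        return False
    # Split off the three bracketed columns; the rest is the message.
    parts = line.split("] ", 3)
    if len(parts) != 4:
        return False
    words = parts[3].split()
    return len(words) == 2 and words[0] in HTTP_METHODS


log_lines = [
    "[2016-06-13 16:29:15 +0000] [32183] [DEBUG] GET /procurement/web/v2/express/item/2523",
    "[2016-06-13 16:29:15 +0000] [32183] [DEBUG] Closing connection.",
    "[2016-06-13 16:29:16 +0000] [31483] [DEBUG] 12 workers",
]
api_lines = [line for line in log_lines if is_api_call(line, "2016-06-13")]
print(len(api_lines))
```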

This task can be achieved with the following Python script; you only need to enter a date.

The script does the following:

  1. Asks you to enter the date for which you need the API call counts.
  2. Opens the log file, gunicorn.log in this case.
  3. Splits the log file on the newline character and stores the lines in a list.
  4. Extracts the lines from the list that correspond to the given date.
  5. Filters the data so that the list only contains information about API calls.
  6. Counts how many times each API is called.
  7. Finally, sorts the data by count so that the most frequently called APIs appear first.
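As a small illustration of steps 6 and 7 in isolation, the sketch below uses collections.Counter from the Python standard library, which counts items and can return them ordered by count. This is an alternative to the manual counting in the script that follows, not the script's own method. The function name count_api_calls and the sample paths are made up for the example.

```python
from collections import Counter


def count_api_calls(api_paths):
    """Return (path, count) pairs, most frequently called first."""
    return Counter(api_paths).most_common()


# Hypothetical list of API paths, as produced by steps 3 to 5.
paths = [
    "/procurement/web/v2/express/filter_all_view_v2/",
    "/procurement/web/v2/express/item/2523",
    "/procurement/web/v2/express/filter_all_view_v2/",
]

for path, count in count_api_calls(paths):
    print(path, "=", count)
```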


#!/usr/bin/env python3
from collections import OrderedDict

print("\n======================================================")
print("Script to find API's calling count from Gunicorn log")
print("======================================================\n")

# Dictionary to store each API and its count
api_dict = {}
# List to store the lines containing API calls
api_call_list = []
# Stores the APIs called on the user-entered date
api_list = []
# List to store the log lines of the user-entered date
filtered_file = []

# Date for which the API counts are needed
date = input("Enter the date in format 'yyyy-mm-dd':")
print(date, '\n')

# Open the log file and split it into lines
with open('gunicorn.log') as log_file:
    lines = log_file.read().split('\n')

# Filter the log by date
for line in lines:
    if date in line:
        filtered_file.append(line)

# Keep only the API call lines and drop the others
for line in filtered_file:
    if "connection." not in line and 'workers' not in line:
        api_call_list.append(line)

# Keep just the API path (the last word of each line)
for call in api_call_list:
    api_list.append(call.split()[-1])

# Count the API calls
for api in api_list:
    if api not in api_dict:
        api_dict[api] = api_list.count(api)

# Sort the dictionary by count, highest first
sorted_api_dict = OrderedDict(sorted(api_dict.items(), key=lambda t: t[1], reverse=True))

for i in sorted_api_dict:
    print(i, '=', sorted_api_dict[i])

print("===========================================")
print("===========================================\n")

Run this script with the following command:

python3 <script name>

The output will look like:

======================================================
Script to find API's calling count from Gunicorn log
======================================================

Enter the date in format 'yyyy-mm-dd':2016-06-13
2016-06-13

/procurement/web/v2/reserve/filter_all_view_v2/ = 4
/procurement/web/v2/express/filter_all_view_v2/ = 3
/procurement/web/v2/reserve/filter_population/ = 2
/procurement/web/v2/reserve/item/21 = 1
/temp_o/web/v2/user_profile/ = 1
/procurement/web/v2/express/item/2523 = 1
/procurement/web/v2/express/filter_population/ = 1
=====================================================
=====================================================

And you are done.
