
This repository contains a task to parse, clean, and visualize Nginx log files, as well as an ETL process to store parsed logs into a MySQL database.


EhsanAhmadzadeh/nginx-log-analysis

Nginx Log Analysis


Project Context

This project was originally given to me as part of an interview task for a Data Engineer position at a tech company. I thought it was a valuable project to showcase my skills in data processing, cleaning, and visualization, so I decided to share it here.

Project Requirements

Before running the project, ensure you have the following installed:

  1. Python: Version 3.8 or higher
  2. MySQL Database: Make sure you have access to a MySQL database where you can create tables for storing the parsed log data.
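For the second requirement, the parsed records need a table to land in. The sketch below is a minimal illustration of that load step, not the schema used by scripts/ETL.py: the table name nginx_logs, its columns, and the load_records helper are hypothetical. So that it runs without a database server, it uses Python's built-in sqlite3 module as a stand-in for MySQL. Against MySQL (for example through mysql-connector-python) the placeholders would become %s and the DDL would need MySQL types, such as AUTO_INCREMENT instead of AUTOINCREMENT.

```python
import sqlite3

# Hypothetical table for parsed Nginx log records. Column names are
# illustrative assumptions, not the repository's actual schema.
# sqlite3 stands in for MySQL here; MySQL would use AUTO_INCREMENT.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS nginx_logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    remote_addr TEXT NOT NULL,
    time_local TEXT NOT NULL,
    method TEXT,
    path TEXT,
    status INTEGER,
    body_bytes_sent INTEGER,
    http_user_agent TEXT
)
"""

COLUMNS = ("remote_addr", "time_local", "method", "path",
           "status", "body_bytes_sent", "http_user_agent")

# sqlite3 uses "?" placeholders; MySQL drivers such as mysql-connector-python use "%s".
INSERT_SQL = "INSERT INTO nginx_logs ({}) VALUES ({})".format(
    ", ".join(COLUMNS), ", ".join("?" for _ in COLUMNS)
)


def load_records(conn, records):
    """Create the table if needed, bulk-insert parsed records (dicts), return the row count."""
    conn.execute(CREATE_TABLE_SQL)
    rows = [tuple(record[col] for col in COLUMNS) for record in records]
    conn.executemany(INSERT_SQL, rows)  # parameterized: the driver handles escaping
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    sample = [{
        "remote_addr": "203.0.113.7",  # documentation-range IP, not from nginx_logs.txt
        "time_local": "2019-01-22 03:56:14",
        "method": "GET",
        "path": "/index.html",
        "status": 200,
        "body_bytes_sent": 512,
        "http_user_agent": "Mozilla/5.0",
    }]
    print(load_records(conn, sample))  # 1
```

Using executemany with placeholders keeps the load to one call per batch and avoids building SQL strings from log content, which often contains quotes.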

Project Structure

nginx-log-analysis/
├── config/
│   └── config.ini
├── data/
│   ├── Data engineering internship.pdf  # Task description and expected output provided by the company
│   └── nginx_logs.txt  # Raw Nginx log file provided for the task
├── notebooks/
│   ├── demo.ipynb  # Step-by-step walkthrough of the parsing process
│   └── log_analysis.ipynb
├── reports/
│   └── task_summary.pdf  # Summary report of the task
├── scripts/
│   └── ETL.py
├── requirements.txt
├── README.md
└── .gitignore
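The raw file data/nginx_logs.txt is parsed line by line, and demo.ipynb walks through the repository's own approach. As a rough sketch of the idea, assuming the log uses Nginx's default combined format, a single regular expression can split each line into named fields. The parse_line helper, its field handling, and the sample line are illustrative assumptions, not code taken from the repository.

```python
import re
from datetime import datetime

# Assumes Nginx's default "combined" log format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+|-) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_line(line):
    """Parse one log line into a dict of typed fields, or return None if it is malformed."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None  # cleaning step: drop lines that do not fit the format
    record = match.groupdict()
    # "$request" normally looks like "GET /path HTTP/1.1" but may be empty or garbled.
    parts = record["request"].split()
    record["method"] = parts[0] if len(parts) >= 2 else None
    record["path"] = parts[1] if len(parts) >= 2 else None
    record["status"] = int(record["status"])
    # Accept "-" defensively in case a line carries no byte count.
    sent = record["body_bytes_sent"]
    record["body_bytes_sent"] = 0 if sent == "-" else int(sent)
    try:
        # e.g. "22/Jan/2019:03:56:14 +0330" -> timezone-aware datetime
        record["time_local"] = datetime.strptime(record["time_local"], "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        return None  # unparseable timestamp: treat the line as malformed
    return record


if __name__ == "__main__":
    # Illustrative line, not copied from data/nginx_logs.txt.
    line = ('203.0.113.7 - - [22/Jan/2019:03:56:14 +0330] '
            '"GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"')
    rec = parse_line(line)
    print(rec["method"], rec["path"], rec["status"])  # GET /index.html 200
```

Returning None for lines that do not match lets the caller count and discard malformed entries in one place instead of failing partway through the file.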

Setup Instructions

To run the project, follow these steps:

  1. Create a virtual environment:

    python -m venv env
  2. Activate the virtual environment:

    • On Windows:
      .\env\Scripts\activate
    • On macOS/Linux:
      source env/bin/activate
  3. Install the requirements:

    pip install -r requirements.txt
  4. Run the ETL script:

    python scripts/ETL.py
  5. Run the Jupyter notebook for analysis and visualization:

    jupyter notebook notebooks/log_analysis.ipynb
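The notebook covers the analysis and visualization side. A common first step is to aggregate the parsed records before plotting them, and the sketch below shows one such aggregation using only the standard library. The summarize function and the record keys it expects (status as an int, time_local as a datetime) are assumptions for illustration; log_analysis.ipynb may compute different metrics with different libraries.

```python
from collections import Counter
from datetime import datetime


def summarize(records):
    """Count requests per HTTP status class (2xx, 4xx, ...) and per hour of day."""
    status_classes = Counter(f"{r['status'] // 100}xx" for r in records)
    per_hour = Counter(r["time_local"].hour for r in records)
    return status_classes, per_hour


if __name__ == "__main__":
    # Hypothetical records shaped like the output of a log parser.
    sample = [
        {"status": 200, "time_local": datetime(2019, 1, 22, 3, 56, 14)},
        {"status": 404, "time_local": datetime(2019, 1, 22, 3, 58, 2)},
        {"status": 200, "time_local": datetime(2019, 1, 22, 4, 1, 9)},
    ]
    by_class, by_hour = summarize(sample)
    print(dict(by_class))  # {'2xx': 2, '4xx': 1}
    print(dict(by_hour))   # {3: 2, 4: 1}
```

Counters like these map directly onto simple bar charts, such as the share of error responses or traffic by hour.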
