This project simulates a real-world reporting task for a newspaper website. Youβve been brought onto the team to build an internal reporting tool that provides insights into user activity by analyzing the siteβs logs.
The site and its PostgreSQL database are already running. Your task is to write a Python program that connects to the database, executes SQL queries, and prints the answers to key business questionsβwithout requiring any user input.
- Analyze web server logs and article metadata.
- Use SQL to discover user behavior patterns.
- Build a command-line reporting tool using Python and PostgreSQL.
Your program will answer the following questions:
- What are the most popular three articles of all time?
- Who are the most popular article authors of all time?
- On which days did more than 1% of requests lead to errors?
- Practice interacting with a real SQL database (PostgreSQL) from both the command line and Python.
- Work with a large dataset (over a million rows).
- Gain experience writing and refining complex SQL queries.
- Understand how logs are used for reporting in real-world applications.
- See how different parts of a system (like a web server and a reporting tool) can communicate via a shared database.
- VirtualBox
- Vagrant
- PostgreSQL
- Python 3
psycopg2Python library
-
Download and Install Tools
-
Initialize and Configure VM
- Use
vagrant upto bring the VM online. - SSH into the VM using
vagrant ssh.
Modify
Vagrantfile:config.vm.box = "bento/ubuntu-16.04" config.vm.box_version = "201912.15.0"
- Use
-
Download and Load the Database
- Download data:
newsdata.zip - Unzip it inside the Vagrant folder.
- Load it with:
psql -d news -f newsdata.sql
- Download data:
-
Connect to the Database
psql -d news
-
Run the Python Script
python3 log_analysis.py
log-analysis/
β
βββ log_analysis.py # Python script to analyze and print report
βββ newsdata.sql # SQL dump of article and log data
βββ README.md # This file
- PostgreSQL Official Docs
- Connecting to PostgreSQL
- SQL Basics
- Advanced SQL Cheat Sheet
- Tutorial Video
- Make sure you're running Python 3.
- Use
psycopg2to interact with the database. - The script does not accept user inputβit just prints a report to the terminal.