This project is a modular system for scraping car data from a website, processing it through a server, and analyzing the results. The system consists of three main components: a web scraper client, a data processing server, and a data analyzer.
├── client.py # Web scraper client
├── server.py # Data processing server
└── fileAnalyzer.py # Data analysis module
client.py (web scraper client):
- Purpose: Scrapes car data from a target website
- Functionality:
  - Authenticates with the website
  - Scrapes car listings from multiple pages
  - Extracts car details (company, model, year, trim, kilometer, price)
  - Sends extracted data to the server
  - Receives processed data and saves it to file.txt
- Key Libraries: requests, BeautifulSoup, json, re, socket
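The extraction and send steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the listing text format, `LISTING_PATTERN`, `parse_listing`, and `encode_for_server` are all hypothetical, since the README does not show the target site's markup or the wire format.

```python
import json
import re

# Hypothetical one-line listing format; the real site's markup is not shown here.
LISTING_PATTERN = re.compile(
    r"(?P<company>\w+)\s+(?P<model>\w+)\s+(?P<year>\d{4})\s+"
    r"(?P<trim>\w+)\s+(?P<kilometer>[\d,]+)\s*km\s+(?P<price>[\d,]+)"
)


def parse_listing(text):
    """Return a dict of car fields, or None if the text does not match."""
    match = LISTING_PATTERN.search(text)
    if match is None:
        return None
    car = match.groupdict()
    # Numeric fields arrive as strings, possibly with thousands separators.
    for field in ("year", "kilometer", "price"):
        car[field] = int(car[field].replace(",", ""))
    return car


def encode_for_server(cars):
    """Serialize parsed cars as UTF-8 JSON bytes, ready for socket.sendall()."""
    return json.dumps(cars).encode("utf-8")


sample = "Peugeot 206 2015 SD 85,000 km 12,500"
car = parse_listing(sample)
payload = encode_for_server([car])
```

In practice the text passed to a function like `parse_listing` would come from elements selected with BeautifulSoup after the authenticated `requests` call.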
server.py (data processing server):
- Purpose: Processes scraped car data using a custom spreadsheet-like language
- Functionality:
  - Listens for connections from the client
  - Creates tables to store car data
  - Implements a custom language for data manipulation
  - Processes and analyzes car data
  - Sends processed data back to the client
- Key Features:
  - Custom spreadsheet language with cell references (e.g., A1, B2)
  - Arithmetic operations (+, -, *, /)
  - Variable assignment and context management
  - Hash-based cell addressing system
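To make the features above concrete, here is a minimal sketch of a spreadsheet-style evaluator. It is an assumption about how such a language could work, not the grammar implemented in server.py: the `Sheet` class, the `name = expression` statement syntax, and the use of Python's `ast` module for parsing are all illustrative choices. Cells live in a dict keyed by reference, which is one simple form of hash-based addressing.

```python
import ast
import operator
import re

CELL_REF = re.compile(r"^[A-Z]+[0-9]+$")  # e.g. A1, B2
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


class Sheet:
    """Hypothetical evaluator: cells and variables stored in hash maps."""

    def __init__(self):
        self.cells = {}      # "A1" -> number
        self.variables = {}  # "total" -> number

    def _eval(self, node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            # Names shaped like cell references read from the cell table.
            store = self.cells if CELL_REF.match(node.id) else self.variables
            return store[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](self._eval(node.left), self._eval(node.right))
        raise ValueError("unsupported expression")

    def run(self, statement):
        """Execute 'target = expression' and return the stored value."""
        target, expression = (part.strip() for part in statement.split("=", 1))
        value = self._eval(ast.parse(expression, mode="eval").body)
        store = self.cells if CELL_REF.match(target) else self.variables
        store[target] = value
        return value


sheet = Sheet()
sheet.run("A1 = 12500")
sheet.run("B1 = 9800")
sheet.run("total = A1 + B1")
sheet.run("C1 = total / 2")
```

Only the four arithmetic operators listed above are accepted; anything else raises `ValueError` rather than being evaluated.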
fileAnalyzer.py (data analysis module):
- Purpose: Analyzes processed car data
- Functionality:
  - Reads processed data from file.txt
  - Performs various statistical analyses:
    - Model comparison between companies
    - Production year analysis
    - Price analysis by company
    - Specific model analysis (e.g., Peugeot 206)
  - Uses pandas and numpy for data manipulation
- Key Libraries: json, numpy, pandas
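As an illustration of the price-by-company and specific-model analyses, the sketch below groups records with plain Python so it runs without third-party packages. The record layout and the sample values are made up for the example; the actual structure of file.txt may differ, and fileAnalyzer.py itself uses pandas, where the grouping step would typically be written as `df.groupby("company")["price"].mean()`.

```python
import json
from collections import defaultdict
from statistics import mean

# Made-up sample records standing in for the contents of file.txt.
raw = """[
  {"company": "Peugeot", "model": "206", "year": 2015, "price": 12000},
  {"company": "Peugeot", "model": "206", "year": 2017, "price": 14000},
  {"company": "Toyota", "model": "Corolla", "year": 2018, "price": 20000}
]"""
cars = json.loads(raw)


def average_price_by_company(records):
    """Group prices by company and return the mean of each group."""
    prices = defaultdict(list)
    for record in records:
        prices[record["company"]].append(record["price"])
    return {company: mean(values) for company, values in prices.items()}


def select_model(records, company, model):
    """Return only the records for one company/model pair."""
    return [r for r in records if r["company"] == company and r["model"] == model]


averages = average_price_by_company(cars)
peugeot_206_years = [r["year"] for r in select_model(cars, "Peugeot", "206")]
```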
- Install Dependencies:
  pip install requests beautifulsoup4 numpy pandas
- Start the Server:
  python server.py
  The server will start listening on port 9999.
- Run the Client:
  python client.py
  The client will scrape data, send it to the server, and save processed data to file.txt.
- Analyze the Data:
  python fileAnalyzer.py
  The analyzer will process the data and print statistical results.
- Scraping Phase:
  - client.py scrapes car data from the website
  - Data is sent to server.py via socket connection
- Processing Phase:
  - server.py processes data using custom spreadsheet language
  - Processed data is sent back to client.py
- Analysis Phase:
  - fileAnalyzer.py reads processed data from file.txt
  - Performs statistical analysis and prints results
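The client/server exchange in the first two phases can be sketched as a single JSON round trip over a TCP socket. Everything here is an assumption made for illustration: the helper names, the JSON-over-TCP framing, the use of `shutdown` to mark the end of the request, and the `processed` flag that stands in for the real processing step. The sketch binds to an OS-chosen port so it can run anywhere; the actual server listens on port 9999.

```python
import json
import socket
import threading


def recv_all(conn):
    """Read until the peer closes its sending side."""
    chunks = []
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def serve_once(listener):
    """Accept one client, 'process' its records, and send them back."""
    conn, _ = listener.accept()
    with conn:
        cars = json.loads(recv_all(conn).decode("utf-8"))
        for car in cars:
            car["processed"] = True  # placeholder for the real processing
        conn.sendall(json.dumps(cars).encode("utf-8"))


def round_trip(cars):
    """Run a throwaway server thread and send it one batch of records."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,))
    worker.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(json.dumps(cars).encode("utf-8"))
        client.shutdown(socket.SHUT_WR)  # tell the server the request is complete
        reply = json.loads(recv_all(client).decode("utf-8"))
    worker.join()
    listener.close()
    return reply


result = round_trip([{"company": "Peugeot", "model": "206", "price": 12500}])
```

Closing the write side is the simplest way to delimit a single message; a longer-lived connection would need explicit framing such as a length prefix or newline-delimited JSON.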
- Modular Design: Each component has a distinct responsibility
- Custom Spreadsheet Language: Implemented in server.py for data manipulation
- Real-time Processing: Data is processed as it's scraped
- Statistical Analysis: Comprehensive analysis of car market data
- The scraper uses hardcoded credentials for authentication
- The server implements a custom hash-based addressing system for cells
- Analysis results are printed to the console (can be modified to save to files)
- The system is designed to handle 500 pages of car listings
- file.txt: Contains processed car data in JSON format
- Console output: Statistical analysis results from fileAnalyzer.py
This modular system provides a complete solution for web scraping, data processing, and analysis of car market data. Each component can be developed and tested independently while maintaining interoperability through standardized data formats.