This project focuses on building a robust system to generate real-time insights into election voting statistics and leaderboards, fostering a transparent, efficient, and effective voting process. The architecture ensures seamless data streaming, processing, and visualization to deliver real-time updates to end-users.
Key objectives:
- Real-Time Insights: Continuously updated voting statistics and leaderboards for instant visibility into election progress.
- Transparency: Promotes fairness and openness by providing stakeholders with up-to-date and accurate data.
- Scalability and Efficiency: Designed to handle high volumes of concurrent data without compromising performance.
The project is built on the following stack:
- Kafka: For real-time data streaming and message brokering (a minimal produce sketch follows this list).
- PostgreSQL: For storing and querying election data.
- Python: The primary programming language used for data processing and analysis.
- Spark Streaming: For processing data in real-time and performing analytics.
- Streamlit: For creating the interactive web application.
- Docker: For containerization, ensuring the application deploys and runs consistently across environments.
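To make the streaming backbone concrete, here is a minimal sketch of publishing a vote to Kafka from Python. It assumes the `kafka-python` client, a broker on `localhost:9092`, and the `votes_topic` name from `config.json`; the repository's own producer code may differ.

```python
# Minimal sketch: serialize a vote as JSON and publish it to Kafka.
# Assumes the kafka-python package (pip install kafka-python); the field
# names here are illustrative, not the project's actual schema.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

vote = {"voter_id": "v-0001", "candidate_id": "c-07", "party": "INC"}
producer.send("votes_topic", value=vote)  # topic name from config.json
producer.flush()
```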
The Streamlit dashboard provides:
- Real-time Data Visualization: Live updates on voting statistics and metrics.
- Dynamic Charts: Visualize data using pie charts and bar charts for better insights.
- User-friendly Interface: Easy navigation through the dashboard for viewing election data.
- Custom Refresh Interval: Users can set a refresh interval for real-time data updates (see the dashboard sketch after this list).
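As a sketch of the dashboard pattern (not the app's actual code), the refresh interval can be exposed as a sidebar control, with `st.rerun()` re-executing the script after each sleep. The `fetch_vote_counts` helper below is hypothetical, standing in for the app's real query against Postgres:

```python
# Dashboard sketch: sidebar-controlled refresh interval plus bar and pie charts.
# Assumes streamlit, pandas, and plotly are installed; fetch_vote_counts() is a
# hypothetical stand-in for the app's actual data access layer.
import time

import pandas as pd
import plotly.express as px
import streamlit as st

def fetch_vote_counts() -> pd.DataFrame:
    # Placeholder data; the real app would fetch the latest aggregates here.
    return pd.DataFrame({"candidate": ["A", "B", "C"], "votes": [120, 95, 87]})

interval = st.sidebar.slider("Refresh interval (seconds)", 1, 60, 5)

df = fetch_vote_counts()
st.bar_chart(df.set_index("candidate")["votes"])
st.plotly_chart(px.pie(df, names="candidate", values="votes"))

time.sleep(interval)
st.rerun()  # on older Streamlit versions: st.experimental_rerun()
```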
Before you begin, make sure the following are installed on your machine:
- Python 3.9 or above
- Docker
- Docker Compose
To get the pipeline running:
- Clone this repository.
- Navigate to the project root containing the Docker Compose file.
- Run the following command to start the Zookeeper, Kafka, and Postgres containers in detached mode:

```bash
docker-compose up -d
```

- Set up a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

- Install the required packages:
```bash
pip install -r requirements.txt
```

- Update `config.json` as per your system. Note that JSON does not support comments, so the annotations below are for reference only and must not appear in the actual file (a sketch of consuming these settings follows the block):

```jsonc
{
  // Database credentials
  "database": {
    "host": "localhost",
    "username": "election_user",
    "password": "election_pass",
    "db_name": "voting",
    "port": 5433
  },
  "tables": ["candidates", "voters", "votes"],            // List of tables
  "randomuser_url": "https://randomuser.me/api/?nat=in",  // RandomUser API base URL
  "parties": ["BJP", "INC", "TDP", "BSP", "SP", "AAP"],   // List of political parties
  "total_candidates": 12,   // Total number of candidates
  "total_voters": 1000,     // Total number of voters
  "voting_interval": 0.5,   // Voting simulation interval
  "kafka_topics": {
    "votes_topic": "votes_topic"  // Kafka topic names
  },
  "base_dir": "/Users/naman/Desktop/DataEngineering/RealtimeElectionPipeline/"  // Base directory (root path)
}
```
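For reference, a minimal sketch of how an application might consume these settings, assuming `psycopg2` for the Postgres connection (the actual scripts may read the file differently):

```python
# Sketch: load config.json and open a Postgres connection from its values.
# Assumes psycopg2 (pip install psycopg2-binary) and a comment-free config file.
import json

import psycopg2

with open("config.json") as f:
    config = json.load(f)

db = config["database"]
conn = psycopg2.connect(
    host=db["host"],
    port=db["port"],
    user=db["username"],  # config key is "username"; psycopg2 expects user=
    password=db["password"],
    dbname=db["db_name"],
)
print("Connected to", db["db_name"])
```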
- Run `setup.py` to create the Postgres tables and generate the seed data:

```bash
python3 setup.py
```

The pipeline then runs in three terminals.

Terminal 1 -> consume the voter information from Postgres, generate voting data, and produce it to the Kafka topic (a rough sketch of this step follows the command):
```bash
python3 voting_app.py
```
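`voting_app.py`'s internals are not shown here; under the assumptions that voters and candidates live in the `voters` and `candidates` tables and that votes go to `votes_topic`, a simulator of this shape would do roughly the same job:

```python
# Rough sketch of a voting simulator: read voters and candidates from Postgres,
# pair each voter with a random candidate, and publish the vote to Kafka at the
# configured interval. Table and column names are assumptions.
import json
import random
import time

import psycopg2
from kafka import KafkaProducer

conn = psycopg2.connect(host="localhost", port=5433, user="election_user",
                        password="election_pass", dbname="voting")
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

with conn.cursor() as cur:
    cur.execute("SELECT voter_id FROM voters")
    voters = [row[0] for row in cur.fetchall()]
    cur.execute("SELECT candidate_id FROM candidates")
    candidates = [row[0] for row in cur.fetchall()]

for voter_id in voters:
    vote = {
        "voter_id": voter_id,
        "candidate_id": random.choice(candidates),
        "voted_at": time.strftime("%Y-%m-%d %H:%M:%S"),
    }
    producer.send("votes_topic", value=vote)
    time.sleep(0.5)  # voting_interval from config.json

producer.flush()
```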
Terminal 2 -> run the Spark streaming job, which consumes the voting data from the Kafka topic, enriches it, calculates aggregates, and produces the results to dedicated Kafka topics (sketched after the command):

```bash
python3 spark-streaming.py
```
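A condensed sketch of the kind of job `spark-streaming.py` might run: read the vote stream from Kafka, parse the JSON payload, count votes per candidate, and write the running counts back to Kafka. The message schema, connector package version, and output topic name are assumptions:

```python
# Structured Streaming sketch: Kafka source -> JSON parse -> aggregate -> Kafka sink.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = (
    SparkSession.builder.appName("VotesAggregator")
    # Kafka connector package; match the version to your Spark installation.
    .config("spark.jars.packages",
            "org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0")
    .getOrCreate()
)

schema = StructType([
    StructField("voter_id", StringType()),
    StructField("candidate_id", StringType()),
])

votes = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "votes_topic")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("v"))
    .select("v.*")
)

counts = votes.groupBy("candidate_id").count()

query = (
    counts.selectExpr("to_json(struct(*)) AS value")
    .writeStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("topic", "votes_per_candidate")  # assumed output topic
    .option("checkpointLocation", "/tmp/checkpoints/votes")
    .outputMode("update")
    .start()
)
query.awaitTermination()
```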
Terminal 3 -> run the Streamlit app:

```bash
streamlit run streamlit_app/app.py
```
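Once all three terminals are running, open the dashboard in your browser; Streamlit serves it at http://localhost:8501 by default.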