This project provides a Flask-based API to perform web scraping and text summarization. The API has two primary endpoints:
- /scrape: Triggers a web scraper to collect data and store it in a MySQL database.
- /summarize: Summarizes the scraped data for a specified date using OpenAI's GPT-4.
The project is containerized using Docker for easy deployment and integration with other web services.
- Web scraping: Automates data collection from external sources.
- Summarization: Uses OpenAI's GPT-4 to summarize collected data.
- API endpoints: Allows web applications to trigger scraping and summarization tasks via HTTP requests.
- Dockerized: Easy deployment and management using Docker.
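The two endpoints above can be sketched as a minimal Flask app. The helpers `run_scraper()` and `summarize_for_date()` are hypothetical stand-ins for the real scraping and GPT-4 summarization logic in app.py:

```python
# Minimal sketch of the API surface; run_scraper() and summarize_for_date()
# are placeholders for the real MySQL/OpenAI logic.
from flask import Flask, jsonify, request

app = Flask(__name__)

def run_scraper():
    # Placeholder: the real scraper collects data and writes rows to MySQL.
    return "Details of the scraping process"

def summarize_for_date(date):
    # Placeholder: the real helper queries MySQL and calls GPT-4.
    return f"Summarized content for {date}"

@app.route("/scrape", methods=["POST"])
def scrape():
    return jsonify(status="success",
                   message="Scraping completed successfully",
                   output=run_scraper())

@app.route("/summarize", methods=["POST"])
def summarize():
    # Accept an optional JSON body like {"date": "yesterday"}.
    date = (request.get_json(silent=True) or {}).get("date", "yesterday")
    return jsonify(status="success",
                   message="Summarization completed successfully",
                   output=summarize_for_date(date))
```

This mirrors the response shapes shown in the API examples below; the actual app.py may differ.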
- Docker
- Docker Compose (optional)
- Python 3.9 (if running locally without Docker)
- MySQL database
git clone https://github.com/yourusername/news-crawler-service.git
cd news-crawler-service
To securely store sensitive information such as database credentials and API keys, you will need to configure environment variables.
Create a .env file in the root directory:
touch .env
In the .env file, add the following:
# MySQL Database Connection
MYSQL_HOST=your_mysql_host
MYSQL_DATABASE=your_database_name
MYSQL_USER=your_database_user
MYSQL_PASSWORD=your_database_password
# OpenAI API Key
OPENAI_API_KEY=your_openai_api_key
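If you prefer not to depend on a library such as python-dotenv, a simple KEY=VALUE file like the one above can be loaded with the standard library alone. This is a sketch that assumes plain `KEY=VALUE` lines with no quoting or interpolation:

```python
# Minimal stdlib .env loader; assumes simple KEY=VALUE lines,
# skips blank lines and comments, and never overwrites existing vars.
import os

def load_env(path=".env"):
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```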
To build the Docker container, run the following command:
docker build -t flask-scraper-summarizer .
After building the Docker container, you can run it with:
docker run -d -p 5000:5000 --env-file .env flask-scraper-summarizer
This will start the Flask API, making it available at http://localhost:5000.
You can also use Docker Compose to manage the application more easily. Create a docker-compose.yml file if it doesn't exist, and include the following:
version: '3'
services:
  flask_app:
    build: .
    ports:
      - "5000:5000"
    environment:
      - MYSQL_HOST=${MYSQL_HOST}
      - MYSQL_DATABASE=${MYSQL_DATABASE}
      - MYSQL_USER=${MYSQL_USER}
      - MYSQL_PASSWORD=${MYSQL_PASSWORD}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    env_file:
      - .env
Then run the application with:
docker-compose up --build -d
If you want to run the Flask app locally without Docker, follow these steps:
- Install the required Python dependencies:
  pip install -r requirements.txt
- Set up your environment variables in a .env file.
- Start the Flask application:
  flask run --host=0.0.0.0 --port=5000
POST /scrape: Triggers the web scraper to collect data and store it in the MySQL database.
Request Example:
curl -X POST http://localhost:5000/scrape
Response Example:
{
  "status": "success",
  "message": "Scraping completed successfully",
  "output": "Details of the scraping process"
}
POST /summarize: Triggers the summarization process for the scraped data based on the specified date.
Request Example:
curl -X POST http://localhost:5000/summarize -H "Content-Type: application/json" -d '{"date": "yesterday"}'
Response Example:
{
  "status": "success",
  "message": "Summarization completed successfully",
  "output": "Summarized content"
}
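The curl requests above can also be issued from Python using only the standard library. This is a sketch that assumes the API is running at http://localhost:5000; `build_request` and `call` are hypothetical helper names, not part of the project:

```python
# Stdlib client sketch for the /scrape and /summarize endpoints;
# assumes the API is reachable at http://localhost:5000.
import json
import urllib.request

API_BASE = "http://localhost:5000"

def build_request(endpoint, payload=None):
    # Mirror the curl examples: POST, with a JSON body only for /summarize.
    data = json.dumps(payload).encode() if payload is not None else None
    headers = {"Content-Type": "application/json"} if payload is not None else {}
    return urllib.request.Request(f"{API_BASE}{endpoint}", data=data,
                                  headers=headers, method="POST")

def call(endpoint, payload=None):
    # Send the request and decode the JSON response.
    with urllib.request.urlopen(build_request(endpoint, payload)) as resp:
        return json.load(resp)
```

For example, `call("/summarize", {"date": "yesterday"})` is equivalent to the curl command shown above.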
These are the key environment variables required for running the app:
- MYSQL_HOST: The host of your MySQL database.
- MYSQL_DATABASE: The name of the MySQL database to use.
- MYSQL_USER: The MySQL database username.
- MYSQL_PASSWORD: The MySQL database password.
- OPENAI_API_KEY: Your OpenAI API key for GPT-4.
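A small startup check can fail fast when any of these variables is missing. This is a sketch assuming the variables are provided via the environment (e.g. with `--env-file .env`); `check_env` is a hypothetical helper, not part of the project:

```python
# Fail fast at startup if any required environment variable is unset.
import os

REQUIRED_VARS = ["MYSQL_HOST", "MYSQL_DATABASE", "MYSQL_USER",
                 "MYSQL_PASSWORD", "OPENAI_API_KEY"]

def check_env():
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing environment variables: {', '.join(missing)}")
```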
Ensure you have access to a working MySQL instance and that the credentials provided in the .env file are correct. Make sure the OpenAI API key is valid and has access to the GPT-4 model. You can modify the API routes in app.py to add more functionality or customize the response format as needed.