This project provides a command-line tool to process web content, extract readable text, and optionally send it via email. It handles various content types and includes features for URL skipping and logging.
To set up the project, follow these steps:
- Clone the repository:

      git clone <repository_url>
      cd web_content_processor
- Install dependencies:

      pip install -r requirements.txt
To run the web content processor, use the WebContentProcessor.py script with the desired URL and processing mode. You can also specify sender and recipient email addresses for notifications.

    python web_content_processor/WebContentProcessor.py <URL> [PROCESSING_MODE] [--from-email YOUR_EMAIL] [--to-email RECIPIENT_EMAIL]

Arguments:
- `<URL>`: The URL of the web content to process.
- `[PROCESSING_MODE]`: Optional. Specifies how the content should be processed. Possible values:
  - `-txt`: Process as plain text.
  - `-rea`: Process as readable text and mail.
  - `-reau`: Process as readable text and mail (unconditional).
- `--from-email YOUR_EMAIL`: Optional. The email address to use as the sender for notifications.
- `--to-email RECIPIENT_EMAIL`: Optional. The email address to send notifications to.
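
As a rough illustration of the argument layout described above, the following sketch parses the same command-line shape by hand. It is not taken from WebContentProcessor.py; the `Options` class and `parse_args` function are hypothetical names, and the real script may parse its arguments differently.

```python
from dataclasses import dataclass
from typing import List, Optional

# Mode flags as listed in the usage text; the parsing logic is an assumption.
MODES = {"-txt", "-rea", "-reau"}


@dataclass
class Options:
    """Hypothetical container for the parsed command line."""
    url: str
    mode: Optional[str] = None
    from_email: Optional[str] = None
    to_email: Optional[str] = None


def parse_args(argv: List[str]) -> Options:
    """Parse `<URL> [PROCESSING_MODE] [--from-email X] [--to-email Y]`."""
    if not argv:
        raise ValueError("missing <URL>")
    opts = Options(url=argv[0])
    rest = iter(argv[1:])
    for arg in rest:
        if arg in MODES:
            opts.mode = arg
        elif arg in ("--from-email", "--to-email"):
            # Each email option consumes the next token as its value.
            value = next(rest, None)
            if value is None:
                raise ValueError(f"{arg} requires a value")
            if arg == "--from-email":
                opts.from_email = value
            else:
                opts.to_email = value
        else:
            raise ValueError(f"unknown argument: {arg}")
    return opts
```

Calling `parse_args(sys.argv[1:])` would yield an `Options` value whose `mode` is `None` when no processing mode is given. The mode flags begin with a hyphen, which is why this sketch matches them explicitly rather than treating them as ordinary positional values.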
Examples:
- Process a URL and send as readable text to an email:

      python web_content_processor/WebContentProcessor.py https://github.com/fernand0/web_content_processor -rea --from-email sender@example.com --to-email receiver@example.com

- Process a URL as plain text (no email):

      python web_content_processor/WebContentProcessor.py https://www.example.com/another-article -txt
- Web Content Extraction: Extracts readable text from web pages.
- URL Skipping: Skips processing for predefined domains (e.g., YouTube, Vimeo).
- URL Logging: Logs processed URLs to prevent duplication.
- Email Notifications: Sends processed content via email based on processing mode.
- Error Handling: Automatically sends an email notification to the recipient email address if a download error occurs, including the problematic URL and an error message.
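
To make the URL skipping, URL logging, and error notification features above more concrete, here is a minimal sketch using only the standard library. Everything in it is an assumption for illustration: the `SKIP_DOMAINS` set, the one-URL-per-line log format, and the function names are hypothetical and may not match how the project implements these features.

```python
from email.message import EmailMessage
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical skip list; the feature list names YouTube and Vimeo as examples.
SKIP_DOMAINS = {"youtube.com", "vimeo.com"}


def should_skip(url: str) -> bool:
    """Return True if the host is a skipped domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in SKIP_DOMAINS)


def already_processed(url: str, log_path: Path) -> bool:
    """Check whether the URL already appears as a line in the log file."""
    if not log_path.exists():
        return False
    return url in log_path.read_text(encoding="utf-8").splitlines()


def log_url(url: str, log_path: Path) -> None:
    """Append the URL to the log file (assumed format: one URL per line)."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(url + "\n")


def build_error_email(url: str, error: str, from_email: str, to_email: str) -> EmailMessage:
    """Build (but do not send) a notification naming the URL and the error."""
    msg = EmailMessage()
    msg["Subject"] = f"Download error: {url}"
    msg["From"] = from_email
    msg["To"] = to_email
    msg.set_content(f"Could not download {url}\n\nError: {error}\n")
    return msg
```

A caller would typically check `should_skip` and `already_processed` before downloading, call `log_url` after a successful run, and hand the result of `build_error_email` to an SMTP client such as `smtplib.SMTP.send_message` when a download fails.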
Contributions are welcome! Please feel free to submit pull requests or open issues.
This project is licensed under the MIT License. Note that a LICENSE file is not currently included in this project structure; you may want to create one.