
Python Web Scraping with AWS Lambda


This Python application demonstrates web scraping techniques using AWS Lambda and the AWS Toolkit for VSCode. It provides a flexible framework for scraping web pages, parsing the data, and posting the results to Slack and LINE channels, leveraging AWS serverless functions for scalable web scraping.

Table of Contents

  1. Installation
  2. Prerequisites
  3. Usage
  4. Contributing
  5. Going Further: Scheduling with EventBridge
  6. License
  7. Reference

Installation

  1. Clone the repository:

    git clone https://github.com/workshop-msano/python-webscrayping-app.git
  2. Navigate to the project directory:

    cd python-webscrayping-app
  3. Install the required dependencies using pip (creating a virtual environment first is strongly recommended; see Prerequisites below):

    pip install -r requirements.txt

Prerequisites

Before proceeding, ensure that you have the following tools installed and configured:

Virtual Environment

It is strongly recommended to set up a virtual environment to isolate the project dependencies. Follow the appropriate instructions for your preferred virtual environment tool (e.g., virtualenv or conda) to create and activate a virtual environment for the project.
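
For example, a minimal setup using the standard library's venv module (virtualenv and conda are equally valid choices) looks like this:

    python3 -m venv .venv
    source .venv/bin/activate   # on Windows: .venv\Scripts\activate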

Usage

Configure Slack and LINE

  1. Create a .env file in the project directory.

  2. In the .env file, set your API tokens. For example:

    BOT_USER_OAUTH_TOKEN=<your_slack_api_token>
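
As an illustration of how the token can then be read at runtime (this sketch assumes the python-dotenv package; the project's actual loading code may differ):

    import os
    from dotenv import load_dotenv

    load_dotenv()  # read variables from the .env file into the environment
    slack_token = os.environ["BOT_USER_OAUTH_TOKEN"]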
    

Customize the Scraping Logic

Open the scraper.py file and modify the code to define the specific web scraping rules based on your requirements.
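
Whether the project fetches pages with requests and BeautifulSoup or with Selenium (see Reference), the logic follows a fetch-parse-extract pattern. As a rough sketch only, with a hypothetical URL, selector, and function name rather than the project's actual code:

    import requests
    from bs4 import BeautifulSoup

    def scrape(url: str) -> list[str]:
        """Fetch a page and extract headline texts (placeholder logic)."""
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        # Adjust the selector to match the structure of your target page
        return [el.get_text(strip=True) for el in soup.select("h2.title")]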

Set up AWS Lambda

Follow the AWS documentation to create an AWS Lambda function and configure the necessary permissions and triggers.
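
For orientation, the Lambda entry point for this kind of app usually has the following shape (the module, function, and URL below are placeholders; match them to the handler configured for your function):

    from scraper import scrape  # hypothetical import; match your module layout

    def lambda_handler(event, context):
        results = scrape("https://example.com")  # placeholder URL
        # ...post the results to Slack/LINE here...
        return {"statusCode": 200, "body": f"scraped {len(results)} items"}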

Configure AWS Credentials

Install the AWS Toolkit for VSCode and set up your AWS credentials using the AWS Command Line Interface (CLI) or VSCode's integrated AWS credentials management.
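
The quickest route is usually the CLI, which prompts for your access key ID, secret access key, default region, and output format:

    aws configure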

Build and Test

To build the application locally and test it:

sam build
sam local invoke
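
If your function expects an event payload, you can also pass a sample event file to the local invocation (the path here is only an example):

    sam local invoke -e events/event.json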

Deploy the Application to AWS Lambda

To deploy the application to AWS Lambda:

sam deploy --guided

The guided deploy also creates a samconfig.toml file containing the deployment configuration. For subsequent deployments, simply run sam deploy.

Monitor the Execution and Results

Check the AWS CloudWatch logs and the output generated by the Lambda function to view the scraped data.
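
One convenient way to follow the logs from your terminal is the SAM CLI (the function logical ID and stack name below are placeholders from your own template and deployment):

    sam logs -n ScraperFunction --stack-name <your-stack-name> --tail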

Contributing

Contributions are welcome! If you have any ideas, suggestions, or bug reports, please open an issue or submit a pull request. Your input is highly appreciated.

To contribute to the project, follow these steps:

  1. Fork the repository.

  2. Create a new branch:

    git checkout -b my-feature-branch
  3. Make your changes and commit them:

    git commit -m "Add new feature"
  4. Push your changes to the forked repository:

    git push origin my-feature-branch
  5. Open a pull request with a detailed description of your changes.

  6. Wait for the project maintainers to review and merge your pull request.

Going Further: Scheduling with EventBridge

It is possible to trigger a Lambda function at a specific time using Amazon EventBridge. By defining an EventBridge rule with a schedule expression, you can run your Lambda function on a recurring schedule and automate data retrieval without manual intervention. See the Amazon EventBridge documentation for details.
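
For example, with the AWS CLI (the rule name is arbitrary; this cron expression fires every day at 00:00 UTC):

    aws events put-rule --name daily-scraper-schedule --schedule-expression "cron(0 0 * * ? *)"

After creating the rule, register your Lambda function as its target with aws events put-targets and grant EventBridge permission to invoke the function with aws lambda add-permission.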

License

This project is licensed under the MIT License. See the LICENSE file for details.


Feel free to explore, use, and enhance this web scraping application. If you have any questions or need assistance, please don't hesitate to reach out. Happy web scraping!

Reference

Run Selenium in AWS Lambda for UI testing
