This is a Python application that demonstrates web scraping techniques using AWS Lambda and the AWS Toolkit for VSCode. It provides a flexible framework for scraping web pages, parsing data, and leveraging AWS serverless functions for scalable web scraping.
Before proceeding, ensure you have the necessary tools installed and configured. It is strongly recommended to set up a virtual environment to isolate the project dependencies; follow the instructions for your preferred tool (e.g., virtualenv or conda) to create and activate one.

- Clone the repository:

  ```
  git clone https://github.com/workshop-msano/python-webscrayping-app.git
  ```

- Navigate to the project directory:

  ```
  cd python-webscrayping-app
  ```

- Install the required dependencies using pip:

  ```
  pip install -r requirements.txt
  ```
- Create a `.env` file in the project directory.

- In the `.env` file, set your API token, for example:

  ```
  BOT_USER_OAUTH_TOKEN=<your_slack_api_token>
  ```
- Slack: Sending messages
- LINE: Building a bot
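Values in the `.env` file have to be loaded into the environment before the scraper can use them. Many projects do this with a library such as python-dotenv; the snippet below is a dependency-free sketch of the same idea (the file format assumed here is plain `KEY=VALUE` lines):

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: parse KEY=VALUE lines and export them.

    Sketch only -- the project may instead use a library such as
    python-dotenv, which handles quoting and interpolation as well.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and malformed lines.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Do not overwrite variables already set in the environment.
            os.environ.setdefault(key.strip(), value.strip())

# After loading, the token is available to the scraper:
# load_env()
# token = os.environ["BOT_USER_OAUTH_TOKEN"]
```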
Open the `scraper.py` file and modify the code to define the specific web scraping rules based on your requirements.
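A scraping rule is typically a small parser that pulls specific elements out of fetched HTML. The repository's actual logic in `scraper.py` is not shown here; the following dependency-free sketch (using only the standard library's `html.parser`, though a library like BeautifulSoup is more common) illustrates the shape of such a rule:

```python
from html.parser import HTMLParser

class LinkScraper(HTMLParser):
    """Example rule: collect the href of every anchor tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag; keep only <a href="..."> values.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Feed it a fetched page (a static snippet here, for illustration):
page = '<html><body><a href="https://example.com">Example</a></body></html>'
scraper = LinkScraper()
scraper.feed(page)
print(scraper.links)  # -> ['https://example.com']
```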
Follow the AWS documentation to create an AWS Lambda function and configure the necessary permissions and triggers.
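Lambda invokes a handler function with the triggering event and a runtime context object. The handler below is a hypothetical sketch of how the scraper might be wired in -- the event keys and result shape are illustrative, not taken from the repository:

```python
import json

def lambda_handler(event, context):
    """Illustrative Lambda entry point: scrape a URL, return results.

    `event` carries the invocation payload (here, an assumed "url" key);
    `context` provides runtime metadata and is unused in this sketch.
    """
    url = event.get("url", "https://example.com")
    # ... run the scraping logic against `url` here ...
    result = {"scraped_url": url, "items": []}
    # Return an API-Gateway-style response with a JSON body.
    return {
        "statusCode": 200,
        "body": json.dumps(result),
    }
```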
Install the AWS Toolkit for VSCode and set up your AWS credentials using the AWS Command Line Interface (CLI) or VSCode's integrated AWS credentials management.
To build the application locally and test it:

```
sam build
sam local invoke
```

To deploy the application to AWS Lambda:

```
sam deploy --guided
```

This will also create a `samconfig.toml` file that contains the deployment configurations. For subsequent deployments, simply run `sam deploy` to deploy the app.
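A generated `samconfig.toml` typically looks something like the following (the stack name and region below are illustrative values, not taken from this repository):

```toml
version = 0.1
[default.deploy.parameters]
stack_name = "python-webscraping-app"
resolve_s3 = true
region = "ap-northeast-1"
capabilities = "CAPABILITY_IAM"
```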
Check the AWS CloudWatch logs and the output generated by the Lambda function to view the scraped data.
Contributions are welcome! If you have any ideas, suggestions, or bug reports, please open an issue or submit a pull request. Your input is highly appreciated.
To contribute to the project, follow these steps:
- Fork the repository.

- Create a new branch:

  ```
  git checkout -b my-feature-branch
  ```

- Make your changes and commit them:

  ```
  git commit -m "Add new feature"
  ```

- Push your changes to the forked repository:

  ```
  git push origin my-feature-branch
  ```

- Open a pull request with a detailed description of your changes.

- Wait for the project maintainers to review and merge your pull request.
It's possible to trigger a Lambda function by a specific time using EventBridge. By utilizing EventBridge rules, you can schedule the execution of your Lambda function at a predetermined time. This allows you to automate the retrieval of data without manual intervention.
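With AWS SAM, such a schedule can be declared directly on the function as an EventBridge `Schedule` event. The excerpt below is a sketch; the function name, handler, runtime, and cron expression are illustrative:

```yaml
# template.yaml (excerpt) -- names and schedule are illustrative
Resources:
  ScraperFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: scraper.lambda_handler
      Runtime: python3.9
      Events:
        DailyScrape:
          Type: Schedule
          Properties:
            Schedule: cron(0 9 * * ? *)  # every day at 09:00 UTC
```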
You can find more details in the Amazon EventBridge documentation.
This project is licensed under the MIT License. See the LICENSE file for details.
Feel free to explore, use, and enhance this web scraping application. If you have any questions or need assistance, please don't hesitate to reach out. Happy web scraping!