gilbertekalea committed Apr 14, 2022
2 parents 63699af + c6734f2 commit 550cbc8
Showing 3 changed files with 108 additions and 52 deletions.
70 changes: 70 additions & 0 deletions .github/workflows/codeql-analysis.yml
@@ -0,0 +1,70 @@
# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
#
name: "CodeQL"

on:
  push:
    branches: [ main ]
  pull_request:
    # The branches below must be a subset of the branches above
    branches: [ main ]
  schedule:
    - cron: '39 6 * * 3'

jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
      security-events: write

    strategy:
      fail-fast: false
      matrix:
        language: [ 'cpp', 'javascript', 'python' ]
        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ]
        # Learn more about CodeQL language support at https://git.io/codeql-language-support

    steps:
    - name: Checkout repository
      uses: actions/checkout@v3

    # Initializes the CodeQL tools for scanning.
    - name: Initialize CodeQL
      uses: github/codeql-action/init@v2
      with:
        languages: ${{ matrix.language }}
        # If you wish to specify custom queries, you can do so here or in a config file.
        # By default, queries listed here will override any specified in a config file.
        # Prefix the list here with "+" to use these queries and those in the config file.
        # queries: ./path/to/local/query, your-org/your-repo/queries@main

    # Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
    # If this step fails, then you should remove it and run the build manually (see below)
    - name: Autobuild
      uses: github/codeql-action/autobuild@v2

    # ℹ️ Command-line programs to run using the OS shell.
    # 📚 https://git.io/JvXDl

    # ✏️ If the Autobuild fails above, remove it and uncomment the following three lines
    #    and modify them (or add more) to build your code if your project
    #    uses a compiled language

    #- run: |
    #   make bootstrap
    #   make release

    - name: Perform CodeQL Analysis
      uses: github/codeql-action/analyze@v2
69 changes: 17 additions & 52 deletions README.md
@@ -1,25 +1,25 @@
## booking.com_crawler
## Booking.com_crawler

An advanced crawler for extracting hotel data from *[Booking.com](https://www.booking.com/)*. The bot is powered by Selenium webdriver and written purely in Python.
The primary intended audience is anyone interested in data mining or web scraping who wants to scrape hotel data from the booking.com website. If that's you, go ahead and clone or fork this repo. For readers who are new to programming, I have worked hard to make sure that the crawler works for you. However, there are a few things you need to do on your own: you need to set up environment variables for the webdriver. The webdriver is like an engine that controls the crawler's behavior.
An advanced web scraper for extracting hotel data from *[Booking.com](https://www.booking.com/)*. No sign up or log in required.
The code is meant to be simple and easy to use and modify. However, a few configuration and setup steps are necessary for the program to work.

## Summary
Booking.com is an online travel agency for lodging reservations & other travel products. The booking.com_crawler is a web scraping bot that crawls the booking.com website to extract hotel data. The crawler is designed to generate date ranges automatically, so the end user must enter the relevant data in the csv file found at:
Please read the following sections carefully.

client_input/destination_param.csv
## Summary
Booking.com is an online travel agency for lodging reservations & other travel products. The booking.com_crawler is a web scraping bot that crawls the booking.com website to extract hotel data and stores the scraped data in a csv file.

## Bot Features
## Scraper Features

- Apply filters *can be customized*
- Browser window switch
- Switch browser tabs
- Generate date ranges for checkin and checkout
- Click and follow the link
- Perform pagination
- Web automation
- Data conversion - get in csv format or json format.
- Proxy - not yet implemented

## Data Features

- city_name
- property_name
- property_description
@@ -37,30 +37,24 @@ Booking.com is an online travel agency for lodging reservations & other travel p

## Getting Started

To get started using booking.com_crawler, follow the instructions below.


### Installation

There are two ways to install the project.
### Clone the repository

1. Clone repository.
To clone this repository using Git, use

git clone https://github.com/gilbertekalea/booking.com_crawler.git
git clone https://github.com/gilbertekalea/booking.com_crawler.git

2. Download the project files.
### Installing Dependencies
The official Python package manager for installing dependencies is **pip**.

Save the files on your computer.

Once you have it installed, open code editor/terminal/command line of your choice and navigate to the folder where you saved the project files.
If you're new to Python, please check out this article on [how to install pip](https://stackoverflow.com/questions/4750806/how-can-i-install-pip-on-windows)

### Activate Virtual Environment

To activate the virtual environment, run the following script on the command line. Please refer to [Python Virtual Environment](https://docs.python.org/3/tutorial/venv.html) for how to activate venv on your machine.

For Windows PowerShell:

my_bot\project_folder_dir> venv\Scripts\activate.ps1
my_bot\project_folder_dir> venv\Scripts\activate.ps1

Now install the dependencies using the requirements.txt file.

@@ -116,36 +110,7 @@ To run the bot you simply type
The bot will automatically open booking.com in a Chrome browser window.

## Event Loops

The bot remains live until all event loops are completed.

In the current version, there are three event loops:

- The first event loop collects the data from the csv file. The length of the resulting list determines how many times the loop runs.
- The second event loop searches for the hotels. Its length depends on the number of dates generated.
- The third event loop parses the data from each deal box and follows the next page link. Its range is the number of properties found divided by the number of properties per page, rounded up.
- Example:

*in runbot.py*

    with Booking() as bot:
        # Loop through each set of params given by the user in the csv file, then call the get_user_data_from_csv function.
        # A wrapper loop is used to call the get_user_data_from_csv function.
        # First event loop
        for _, data in enumerate(helpers.get_csv_data("./client_input/destination_param.csv")):
            GIVEN_DATE = ...  # do something
            # Second event loop
            for i, date in enumerate(GIVEN_DATE):
                ...  # do something
                # The third loop runs when the bot.report_results method is called.
                bot.report_results()

    # Inside bot.report_results:
    for i in range(math.ceil(count / 25)):
        ...  # do something
        next_page.go_next_page()
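
The loop structure described in this section can be sketched in plain Python without Selenium. This is only an illustration under stated assumptions, not the project's actual code: `page_count`, `generate_stays`, `run`, and the dictionary keys (`city`, `start`, `nights`, `stays`) are hypothetical names. The only detail taken from the README is the page size of 25 used in `math.ceil(count / 25)`.

```python
import math
from datetime import date, timedelta

# Page size taken from the README's math.ceil(count / 25).
PROPERTIES_PER_PAGE = 25


def page_count(property_count, per_page=PROPERTIES_PER_PAGE):
    """Number of result pages needed to cover property_count properties."""
    return math.ceil(property_count / per_page)


def generate_stays(start, nights, how_many):
    """Hypothetical date generator: back-to-back (checkin, checkout) pairs."""
    return [
        (start + timedelta(days=i * nights), start + timedelta(days=(i + 1) * nights))
        for i in range(how_many)
    ]


def run(destinations, found_counts):
    """Walk the three loops and record every (city, checkin, page) visited."""
    visits = []
    # First loop: one iteration per destination row (stands in for the csv rows).
    for row in destinations:
        stays = generate_stays(row["start"], row["nights"], row["stays"])
        # Second loop: one search per generated date range.
        for checkin, _checkout in stays:
            # Third loop: one iteration per result page.
            for page in range(page_count(found_counts[row["city"]])):
                visits.append((row["city"], checkin, page + 1))
    return visits
```

For example, a single destination with two generated stays and 60 properties found would be visited over 2 × 3 = 6 pages, because 60 properties at 25 per page round up to 3 pages.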

21 changes: 21 additions & 0 deletions SECURITY.md
@@ -0,0 +1,21 @@
# Security Policy

## Supported Versions

Use this section to tell people about which versions of your project are
currently being supported with security updates.

| Version | Supported |
| ------- | ------------------ |
| 5.1.x | :white_check_mark: |
| 5.0.x | :x: |
| 4.0.x | :white_check_mark: |
| < 4.0 | :x: |

## Reporting a Vulnerability

Use this section to tell people how to report a vulnerability.

Tell them where to go, how often they can expect to get an update on a
reported vulnerability, what to expect if the vulnerability is accepted or
declined, etc.
