From a36e439191a9644fd612b5f65a33d2e5a053b5e0 Mon Sep 17 00:00:00 2001
From: Ekaale <62475199+gilbertekalea@users.noreply.github.com>
Date: Tue, 12 Apr 2022 12:48:55 -0400
Subject: [PATCH 1/3] Create codeql-analysis.yml

---
 .github/workflows/codeql-analysis.yml | 70 +++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)
 create mode 100644 .github/workflows/codeql-analysis.yml

diff --git a/.github/workflows/codeql-analysis.yml b/.github/workflows/codeql-analysis.yml
new file mode 100644
index 0000000..f278395
--- /dev/null
+++ b/.github/workflows/codeql-analysis.yml
@@ -0,0 +1,70 @@
+# For most projects, this workflow file will not need changing; you simply need
+# to commit it to your repository.
+#
+# You may wish to alter this file to override the set of languages analyzed,
+# or to provide custom queries or build logic.
+#
+# ******** NOTE ********
+# We have attempted to detect the languages in your repository. Please check
+# the `language` matrix defined below to confirm you have the correct set of
+# supported CodeQL languages.
+#
+name: "CodeQL"
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    # The branches below must be a subset of the branches above
+    branches: [ main ]
+  schedule:
+    - cron: '39 6 * * 3'
+
+jobs:
+  analyze:
+    name: Analyze
+    runs-on: ubuntu-latest
+    permissions:
+      actions: read
+      contents: read
+      security-events: write
+
+    strategy:
+      fail-fast: false
+      matrix:
+        language: [ 'cpp', 'javascript', 'python' ]
+        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ]
+        # Learn more about CodeQL language support at https://git.io/codeql-language-support
+
+    steps:
+    - name: Checkout repository
+      uses: actions/checkout@v3
+
+    # Initializes the CodeQL tools for scanning.
+    - name: Initialize CodeQL
+      uses: github/codeql-action/init@v2
+      with:
+        languages: ${{ matrix.language }}
+        # If you wish to specify custom queries, you can do so here or in a config file.
+        # By default, queries listed here will override any specified in a config file.
+        # Prefix the list here with "+" to use these queries and those in the config file.
+        # queries: ./path/to/local/query, your-org/your-repo/queries@main
+
+    # Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
+    # If this step fails, then you should remove it and run the build manually (see below)
+    - name: Autobuild
+      uses: github/codeql-action/autobuild@v2
+
+    # ℹ️ Command-line programs to run using the OS shell.
+    # 📚 https://git.io/JvXDl
+
+    # ✏️ If the Autobuild fails above, remove it and uncomment the following three lines
+    #    and modify them (or add more) to build your code if your project
+    #    uses a compiled language
+
+    #- run: |
+    #   make bootstrap
+    #   make release
+
+    - name: Perform CodeQL Analysis
+      uses: github/codeql-action/analyze@v2

From e354ab860cdf08723d42920925060a6971441682 Mon Sep 17 00:00:00 2001
From: Ekaale <62475199+gilbertekalea@users.noreply.github.com>
Date: Wed, 13 Apr 2022 00:16:36 -0400
Subject: [PATCH 2/3] Create SECURITY.md

---
 SECURITY.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
 create mode 100644 SECURITY.md

diff --git a/SECURITY.md b/SECURITY.md
new file mode 100644
index 0000000..034e848
--- /dev/null
+++ b/SECURITY.md
@@ -0,0 +1,21 @@
+# Security Policy
+
+## Supported Versions
+
+Use this section to tell people about which versions of your project are
+currently being supported with security updates.
+
+| Version | Supported          |
+| ------- | ------------------ |
+| 5.1.x   | :white_check_mark: |
+| 5.0.x   | :x:                |
+| 4.0.x   | :white_check_mark: |
+| < 4.0   | :x:                |
+
+## Reporting a Vulnerability
+
+Use this section to tell people how to report a vulnerability.
+
+Tell them where to go, how often they can expect to get an update on a
+reported vulnerability, what to expect if the vulnerability is accepted or
+declined, etc.
From c6734f2ea7ef1927ec9862f34fa167d4822e0d98 Mon Sep 17 00:00:00 2001
From: Ekaale <62475199+gilbertekalea@users.noreply.github.com>
Date: Thu, 14 Apr 2022 00:21:15 -0400
Subject: [PATCH 3/3] Update README.md

---
 README.md | 69 ++++++++++++++-----------------------------------------
 1 file changed, 17 insertions(+), 52 deletions(-)

diff --git a/README.md b/README.md
index 62598f1..c99af5c 100644
--- a/README.md
+++ b/README.md
@@ -1,17 +1,18 @@
-## booking.com_crawler
+## Booking.com_crawler

-An advanced crawler for extracting hotels data from *[Booking.com](https://www.booking.com/)*. The bot is powered by Selenium webdriver and purely written in python.
-The primary intended audience is anyone interest in data mining or web scraping and would want to scrape hotel data from booking.com website. if that's you; go ahead clone this repo or fork it. For audience who are new to programming I have worked so hard to make sure that the crawler works for you. However, there are few things you need to do on your own, Your need to setting up environmental variables for the webdriver. The webdriver is like an engine that controls the crawler behaior.
+An advanced web scraper for extracting hotel data from *[Booking.com](https://www.booking.com/)*. No sign up or log in required.
+The code is meant to be simple, easy to use and modify. However,there are few configuration and setups that are necessary for the code program to work.

-## Summary
-Booking.com is an online travel agency for lodging reservations & other travel products. The booking.com_crawler is an web scraping bot that crawls the booking.com website to extract hotel data. The crawler is designed to automatically generate date range and therefore the end-user is required to entered relevant data in a csv file found in a folder named.
+Please read the following the following sections carefully.

- client_input/destination_param.csv
+## Summary
+Booking.com is an online travel agency for lodging reservations & other travel products. The booking.com_crawler is an web scraping bot that crawls the booking.com website to extract hotel data and stores the scrape data in csv file.

-## Bot Features
+## Scraper Features

  - Apply filters *can be customized*
- - Browser window switch
+ - Switch browsers tabs
+ - Generate date ranges for checkin and checkout
  - Click and follow the link
  - Perform Pagenation
  - Web automation
@@ -19,7 +20,6 @@ Booking.com is an online travel agency for lodging reservations & other travel p
  - Proxy - not yet implemented

 ## Data Features
-
  - city_name
  - property_name
  - property_description
@@ -37,22 +37,16 @@ Booking.com is an online travel agency for lodging reservations & other travel p

 ## Getting Started

-To get started using booking.com_crawler follow the following instructions.
-
-
-### Installation
-
-Two ways to intall the project.
+### Clone the repository

-1. Clone repository.
+To clone this repository using Git, use

- git clone https://github.com/gilbertekalea/booking.com_crawler.git
+ git clone https://github.com/gilbertekalea/booking.com_crawler.git

-2. Download the project files.
+### Installing Dependencies
+The official python package manager for installing dependecies is **pip**.

- Save the files on your computer.
-
-Once you have it installed, open code editor/terminal/command line of your choice and navigate to the folder where you saved the project files.
+If you're new to python please checkout this article on [how to install pip](https://stackoverflow.com/questions/4750806/how-can-i-install-pip-on-windows)

 ### Activate Virtual Environment

@@ -60,7 +54,7 @@ To activate virtual environment run the following script in command line.
 Please
 For windows powershell :

- my_bot\project_folder_dir> venv\Scripts\activate.ps1
+ my_bot\project_folder_dir> venv\Scripts\activate.ps1

 Now install the dependencies using the requirements.txt file.

@@ -116,36 +110,7 @@ To run the bot you simply type

 The bot you automatically open your boooking.com in chrome browser window.

- ## Event Loops
-
- The bot remains live until all event loops are completed.
-
- In the current version, there are three event loops:
-
-- The first event loop is for collecting the data from the csv file. The length of the list will be used to determine how many times the loop will run.
-- The second event loop is for searching for the hotels. The length depend on the number of dates generated.
-- The third event loop is for parsing the data from each deal box and following the next page link. The range is determined by calculating the number of properties found divide by number of properties per page.
-- Example:
-
- * in runbot.py*
--
- with Booking() as bot:
- # loop through each params given by user in csv file then call the get_user_data_from_csv function
- # a wrapper loop will be used to call the get_user_data_from_csv function
-
- First event loop
- for _, data in enumerate(helpers.get_csv_data("./client_input/destination_param.csv")):
- GIVEN_DATE = ....do something
-
- the second event loop
- for i, date in enumerate(GIVEN_DATE):
- ....do something
-
- The third loop is called when bot.report_results method is called.
- bot.report_results()
- for i in range(math.ceil(count / 25)):
- ...do something
- next_page.go_next_page()
+