PageLoader is a training project from the Python Development course on Hexlet.io.
Languages, frameworks and libraries the project depends on; without them the code will not work correctly:
- python = "^3.8"
- requests = "^2.28.1"
- beautifulsoup4 = "^4.11.1"
- progress = "^1.6"
PageLoader is a command line utility that downloads pages from the Internet and saves them to your computer. Together with the page, it downloads all the resources (pictures, styles and js) making it possible to open the page without the Internet.
Saving pages in a browser works on the same principle.
The utility downloads resources in multiple threads and shows the progress for each resource in the terminal.
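The multi-threaded download with per-resource progress described above can be sketched with the standard library alone. This is a minimal illustration, not the project's own code: `download_all`, `fetch` and `fake_fetch` are hypothetical names, and a real `fetch` would probably wrap an HTTP request made with requests.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(urls, fetch, max_workers=4):
    """Fetch every URL in a thread pool and report progress as each finishes.

    `fetch` is a hypothetical callable (url -> bytes) supplied by the caller.
    """
    results = {}
    total = len(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the URL it is downloading.
        futures = {pool.submit(fetch, url): url for url in urls}
        for done, future in enumerate(as_completed(futures), start=1):
            url = futures[future]
            results[url] = future.result()  # re-raises any error from fetch
            print(f"Resources Loading {done * 100 // total}% [{done}/{total}] {url}")
    return results


def fake_fetch(url):
    # Stand-in for a network request so the sketch runs offline.
    return f"content of {url}".encode()


pages = download_all(
    ["https://example.com/a.css", "https://example.com/b.js"], fake_fetch
)
```

Resources finish in whatever order the threads complete, so the progress lines may not follow the order of the input list.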
Before installing the package, you need to make sure that you have Python version 3.8 or higher installed:
# Windows, Ubuntu, MacOS:
>> python --version # or python -V
Python 3.8.0+
If python is not found or points to Python 2, check python3 --version instead.
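The same version check can be run from Python itself, which helps when several interpreters are installed. A minimal sketch follows; the names `MINIMUM_VERSION` and `is_supported` are illustrative and not part of the project.

```python
import sys

MINIMUM_VERSION = (3, 8)  # taken from the dependency list: python = "^3.8"


def is_supported(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True when the interpreter version meets the minimum."""
    # Tuples compare element by element: major first, then minor.
    return tuple(version_info[:2]) >= minimum


if not is_supported():
    print(
        f"Python {MINIMUM_VERSION[0]}.{MINIMUM_VERSION[1]}+ is required, "
        f"found {sys.version.split()[0]}"
    )
```

Run it with the interpreter you plan to use for the installation, for example python3.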
If you have an older version installed, update with the following commands:
# Windows (pip cannot upgrade the interpreter itself):
# download and run the latest installer from the official Python website
# Ubuntu:
>> sudo apt-get upgrade python3.X
# MacOS:
>> brew update && brew upgrade python
# * X - version number to be installed
If you don't have Python installed, you can download and install it from the official Python website. If you are an Ubuntu or MacOS user, then it is better to do this procedure through package managers. Open a terminal and run the command for your operating system:
# Ubuntu:
>> sudo apt update
>> sudo apt install python3.X
# MacOS:
# https://brew.sh/index_ru.html
>> brew install python@3.X
# * X - version number to be installed
The configuration of different operating system versions and builds can vary greatly, which makes it impossible to write one common instruction. If you're running an OS other than those above, or you get errors after the suggested commands, search Stack Overflow for answers; someone else may have run into them before you. Setting up the environment is not easy!
The project uses the Poetry manager. Poetry is a tool for dependency management and packaging in Python. It allows you to declare the libraries your project depends on and it will manage (install/update) them for you. You can read more about this tool on the official Poetry website.
Poetry provides a custom installer that will install poetry isolated from the rest of your system by vendorizing its dependencies. This is the recommended way of installing poetry.
# Windows (WSL), Linux, MacOS:
>> curl -sSL https://install.python-poetry.org | python3 -
# Windows (Powershell):
>> (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -
# If you have installed Python through the Microsoft Store, replace "py" with "python" in the command above.
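Note: on some systems, python may still refer to Python 2 instead of Python 3. The Poetry team suggests using the python3 binary to avoid ambiguity.
By default, Poetry is installed into a platform- and user-specific directory:
- ~/Library/Application Support/pypoetry on MacOS.
- ~/.local/share/pypoetry on Linux/Unix.
- %APPDATA%\pypoetry on Windows.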
If you wish to change this, you may define the $POETRY_HOME environment variable:
>> curl -sSL https://install.python-poetry.org | POETRY_HOME=/etc/poetry python3 -
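To see which directory applies on a given machine, the default locations and the $POETRY_HOME override can be expressed as a small lookup. This sketch only mirrors the list above and is not the installer's own logic; `poetry_home` and its parameters are hypothetical, and the Windows fallback used when %APPDATA% is unset is an assumption.

```python
import os
import sys
from pathlib import Path


def poetry_home(platform=sys.platform, env=None, home=None):
    """Return the directory Poetry is expected to be installed into."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    # An explicit POETRY_HOME always wins over the platform default.
    if env.get("POETRY_HOME"):
        return Path(env["POETRY_HOME"])
    if platform == "darwin":
        return home / "Library" / "Application Support" / "pypoetry"
    if platform.startswith("win"):
        # Assumption: fall back to the usual roaming profile if APPDATA is unset.
        return Path(env.get("APPDATA", home / "AppData" / "Roaming")) / "pypoetry"
    return home / ".local" / "share" / "pypoetry"


print(poetry_home())
```

The `platform`, `env` and `home` parameters exist only so the function can be tried with other systems' values without changing the real environment.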
Add Poetry to your PATH.
Once Poetry is installed and in your $PATH, you can execute the following:
>> poetry --version
To work with the package, you need to clone the repository to your computer. This is done using the git clone command. Clone the project on the command line:
# clone via HTTPS:
>> git clone https://github.com/IgorGakhov/python-project-51.git
# clone via SSH:
>> git clone git@github.com:IgorGakhov/python-project-51.git
All that remains is to change into the project directory and install the package:
>> cd python-project-51
>> poetry build
>> python3 -m pip install --user dist/*.whl
# If you have previously installed a package and want to update it, use the following command:
# >> python3 -m pip install --user --force-reinstall dist/*.whl
Finally, we can move on to using the project functionality!
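from page_loader import download
# url_address is the page URL; destination is an absolute path to an existing directory.
file_path = download(url_address, destination)  # file_path: where the page was saved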
If you are unsure how to use the utility, call its help command:
>> page-loader --help
usage: page-loader [-h] [--output DESTINATION] url_address
Downloads the page from the network and puts it in the specified existing directory (default: working directory).
positional arguments:
url_address page being downloaded
options:
-h, --help show this help message and exit
--output DESTINATION output directory (default: current dir)
Only absolute file paths are supported.
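An interface matching the help output above can be reproduced with argparse. This is a sketch under assumptions, not the project's cli.py: `build_parser` and `parse_cli` are illustrative names, and converting the destination to an absolute path is an added convenience that the utility itself may not perform.

```python
import argparse
import os


def build_parser():
    """Build a parser with the same arguments as the help text."""
    parser = argparse.ArgumentParser(
        prog="page-loader",
        description="Downloads the page from the network and puts it in "
                    "the specified existing directory "
                    "(default: working directory).",
    )
    parser.add_argument("url_address", help="page being downloaded")
    parser.add_argument(
        "--output",
        dest="destination",
        metavar="DESTINATION",
        default=os.getcwd(),  # the working directory is already absolute
        help="output directory (default: current dir)",
    )
    return parser


def parse_cli(argv=None):
    args = build_parser().parse_args(argv)
    # Hypothetical normalization: make a relative destination absolute.
    args.destination = os.path.abspath(args.destination)
    return args


args = parse_cli(
    ["--output", "/home/user/page_storage", "https://page-loader.hexlet.repl.co/"]
)
print(args.url_address, args.destination)
```

Passing `argv=None` makes argparse read the real command line, so the same function serves both the script entry point and experiments like the one above.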
The utility downloads resources and shows the progress of each resource in the terminal.
Example:
>> page-loader --output /home/user/page_storage https://page-loader.hexlet.repl.co/
12:41:24 INFO: Initiated download of page https://page-loader.hexlet.repl.co/ to local directory «/home/user/page_storage» ...
12:41:25 INFO: Response from page https://page-loader.hexlet.repl.co/ received.
Page available for download!
Resources Loading |████████                        | 25% [1/4]
12:41:26 INFO: [+] Resource https://page-loader.hexlet.repl.co/script.js saved successfully!
Resources Loading |████████████████                | 50% [2/4]
12:41:26 INFO: [+] Resource https://page-loader.hexlet.repl.co/assets/professions/nodejs.png saved successfully!
Resources Loading |████████████████████████        | 75% [3/4]
12:41:26 INFO: [+] Resource https://page-loader.hexlet.repl.co/assets/application.css saved successfully!
Resources Loading |████████████████████████████████| 100% [4/4]
12:41:26 INFO: [+] Resource https://page-loader.hexlet.repl.co/courses saved successfully!
12:41:26 INFO: FINISHED! Loading is complete successfully!
The downloaded page is located in the «/home/user/page_storage/page-loader-hexlet-repl-co.html» file.
/home/user/page_storage/page-loader-hexlet-repl-co.html
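The example shows the page https://page-loader.hexlet.repl.co/ being saved as page-loader-hexlet-repl-co.html. One way to derive such a name is sketched below. The function `url_to_file_name` is hypothetical, and the project's own name_converter.py may follow different rules, for instance in how it treats query strings or runs of several special characters.

```python
import re
from urllib.parse import urlparse


def url_to_file_name(url, extension=".html"):
    """Turn a page URL into a flat file name."""
    parsed = urlparse(url)
    # Drop the scheme, keep host and path, and trim surrounding slashes.
    raw = (parsed.netloc + parsed.path).strip("/")
    # Assumption: each run of characters other than letters and digits
    # becomes a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw)
    return slug + extension


print(url_to_file_name("https://page-loader.hexlet.repl.co/"))
```

For the URL from the example this prints page-loader-hexlet-repl-co.html, matching the file name in the log.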
List of dev-dependencies:
- flake8 = "^4.0.1"
- pytest = "^7.1.3"
- pytest-cov = "^3.0.0"
- requests-mock = "^1.10.0"
>> tree .
.
├── page_loader
│   ├── __init__.py
│   ├── load_processor
│   │   ├── __init__.py
│   │   ├── downloader.py
│   │   ├── file_system_guide.py
│   │   ├── html_parser.py
│   │   ├── name_converter.py
│   │   ├── data_loader.py
│   │   └── saver.py
│   ├── cli.py
│   ├── logger.py
│   ├── progress.py
│   └── scripts
│       ├── __init__.py
│       └── run.py
├── tests
│   ├── auxiliary.py
│   ├── fixtures
│   │   ├── downloaded_nodejs_course.html
│   │   └── mocks
│   │       ├── assets-application.css
│   │       ├── assets-professions-nodejs.png
│   │       ├── courses.html
│   │       ├── packs-js-runtime.js
│   │       └── source_nodejs_course.html
│   ├── test_cli.py
│   ├── test_downloader.py
│   ├── test_file_system_guide.py
│   └── test_html_parser.py
├── journal.log
├── Makefile
├── poetry.lock
├── pyproject.toml
├── README.md
└── setup.cfg
The commands most used in development are listed in the Makefile:
make package-install - Installs the package in the user environment.
make build - Builds the distribution of the Poetry package.
make package-force-reinstall - Reinstalls the package in the user environment.
make lint - Checks the code with the linter.
make test - Runs the tests.
make fast-check - Builds the distribution, reinstalls it in the user environment, then checks the code with the tests and the linter.
Thank you for your attention!
👨‍💻 Author: @IgorGakhov