Rasp is a python library of 'engines' for getting rendered web data.
It is an open source project maintained by Anidata, a non-profit corporation consisting of established and aspiring data scientists who have come together with the purpose of applying data science to make a difference in the community.
Rasp is freely licensed under the BSD-3-Clause license.
Currently this repo is still in development and is not on pypi or conda yet, so install with:
$ pip install git+https://github.com/anidata/rasp.git
There are 4 types of engine supported by rasp, differing by how they are accessing web pages. They all implement the same general functionality:
- get_page_source(url): takes in a url, and returns a Webpage object, containing all we can find out about that webpage. Notably its source.
The 4 implemented engines are:
- DefaultEngine: backed by requests
- SeleniumEngine: backed by selenium and the firefox driver
- TorEngine: which is backed by Tor
To run the tests install the dev dependencies listed in requirements-dev.txt
and run pytest