Note: This repository is under construction 🛠️
Hands-on workshop material on web scraping using Python.
1. `git clone https://github.com/MonashDataFluency/python-web-scraping.git`
2. `cd python-web-scraping`
3. `virtualenv -p python3 venv`
4. `source venv/bin/activate`
5. `pip install -r requirements.txt`
6. `mkdocs serve`
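After `pip install -r requirements.txt`, a quick sanity check is to try importing the key packages (a minimal sketch; `requests` and `wptools` are assumed to be among the pinned dependencies — adjust the tuple to match the actual `requirements.txt`):

```python
import importlib

def check_packages(pkgs=("requests", "wptools")):
    """Return an import status ("OK" or "missing") for each package."""
    status = {}
    for pkg in pkgs:
        try:
            importlib.import_module(pkg)
            status[pkg] = "OK"
        except ImportError:
            status[pkg] = "missing"
    return status

print(check_packages())
```

If anything reports "missing", re-run `pip install -r requirements.txt` inside the activated virtualenv.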
Note: `wptools` might throw an error during installation, in which case install its system dependencies first:
- `sudo apt install libcurl4-openssl-dev libssl-dev`
and then proceed to install `wptools` again (via step 5 above).
Note: run `jupyter nbconvert --output-dir='markdowns/' --to markdown notebooks/*.ipynb` from the root directory to generate the markdown files from the notebooks.
- Rename the files
- Add References
- Compile and build the website
- Add a Git hook for automatic compile and build
- Add as many images (with brief explanations within) as possible (LucidChart, Google Drawings)
- Add a Reference section
- Archive the website
- Backup code/cell for `requests`
- Complete the DataFrame and regex section (pythex website)
- Move the variable-argument section to advanced topics
- Add more text/explanations
- Add more about HTML in the text and add an image
- Give a fuller description of the image
- Fix the broken image
- Add a JSON section (done: added in section 0)
- DOM inspector
- Talk a bit about RESTful web services
- Long term: add images/flowcharts for better explanation
- Shorten/format the big HTML chunk
- Prettify the HTML output
- Put more detail on GET/PUT requests (possibly visually)
- Add more explanations
- `itertools` (show the vanilla-Python way to do it)
- Explain the regex in detail (step-by-step breakdown)
- Document the `matplotlib` functions used
- Add MCQs: scenario-based questions on legal/grey areas
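For the `itertools` item above, one way to present it is a side-by-side flattening example (a sketch; the choice of `chain.from_iterable` is illustrative, not taken from the workshop material):

```python
from itertools import chain

rows = [["a", "b"], ["c"], ["d", "e"]]

# itertools way: lazily chain the sublists together
flat = list(chain.from_iterable(rows))

# vanilla-Python way: the same flattening with explicit nested loops
flat_vanilla = []
for row in rows:
    for item in row:
        flat_vanilla.append(item)

assert flat == flat_vanilla == ["a", "b", "c", "d", "e"]
```

Showing both forms next to each other lets learners see what the `itertools` helper is doing under the hood.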