
python-web-scraping

Note: This repository is under construction 🛠️⚠️

Hands-on workshop material on web scraping using Python.

To build and run the site locally:

  1. git clone https://github.com/MonashDataFluency/python-web-scraping.git
  2. cd python-web-scraping
  3. virtualenv -p python3 venv
  4. source venv/bin/activate
  5. pip install -r requirements.txt
  6. mkdocs serve

Note: wptools might throw an error during installation, in which case install the following system dependencies first:

  • sudo apt install libcurl4-openssl-dev libssl-dev

and then install wptools by re-running step 5 above.

Note: To generate the markdown files from the notebooks, run jupyter nbconvert --output-dir='markdowns/' --to markdown notebooks/*.ipynb from the root directory.
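
The nbconvert command above can also be driven from a script, which may be handy for the planned auto-build hook. The following is a minimal sketch, not part of the repository: the function names build_nbconvert_command and convert_notebooks are hypothetical, and it assumes jupyter is on the PATH and that the script is run from the root directory.

```python
import subprocess
from pathlib import Path


def build_nbconvert_command(notebook_dir="notebooks", output_dir="markdowns/"):
    """Build the nbconvert argument list for every notebook in notebook_dir.

    Hypothetical helper: the glob is expanded here in Python because no
    shell is involved when the command is passed to subprocess as a list.
    """
    notebooks = sorted(str(path) for path in Path(notebook_dir).glob("*.ipynb"))
    if not notebooks:
        raise FileNotFoundError(f"no .ipynb files found in {notebook_dir}")
    return [
        "jupyter", "nbconvert",
        f"--output-dir={output_dir}",
        "--to", "markdown",
        *notebooks,
    ]


def convert_notebooks(notebook_dir="notebooks", output_dir="markdowns/"):
    """Run nbconvert; raises CalledProcessError if the command fails."""
    subprocess.run(build_nbconvert_command(notebook_dir, output_dir), check=True)
```

Calling convert_notebooks() from the root directory should then be equivalent to running the command by hand. Building the argument list separately keeps the command easy to inspect without running anything.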

TODO:

General:

  • Rename the files
  • Add References
  • Compile and build the website
  • Git hook to compile and build automatically
  • Add as many images (with brief explanations within) as possible (LucidChart, Google draw)
  • Add a Reference section
  • Archive the website
  • Backup code/cell for requests

Section 0

  • Complete the DF and regex section (pythex website)
  • Move the variable argument section to advanced topics
  • Add more text/explanations

Section 1

  • Add more about HTML in the text and add an image
  • Give more description in the image
  • Fix the broken image
  • Add JSON section -- added in section 0
  • DOM inspector
  • Talk a bit about RESTful web services
  • Long term: Add images/flowchart for better explanation

Section 2

  • Shorten/format the big HTML chunk
  • Prettify the HTML output
  • Add more detail on GET/PUT requests (possibly visually)

Section 3

  • Add more explanations

Section 4

  • itertools (show the vanilla Python way to do it)
  • Explain the regex in detail (breakdown)
  • matplotlib functions

Section 5

  • Add MCQs: scenario-based legal/grey-area questions
