
ttavni/PyWebScraper


SimpleWebScrapes

A set of functions and classes to help with web scraping and simple web audits.

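```python
from pyscraper.sitemapper import Sitemapper
from pyscraper.scrapper import BatchScrape

sitemap = 'https://www.wunderman.com/sitemap.xml'

# Collect the page URLs listed in the sitemap
page_urls = Sitemapper(sitemap)

# Scrape every page; the result unpacks into completed and broken URLs
completed_urls, broken_urls = BatchScrape(page_urls)
```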
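The internals of `Sitemapper` and `BatchScrape` are not shown here. As a rough illustration of what these two steps can involve, the following standard-library sketch reads the `<loc>` entries from a sitemaps.org `<urlset>` document and splits a URL list into completed and broken pages. The names `parse_sitemap_xml`, `fetch_status` and `batch_check` are hypothetical and not part of pyscraper; treating any status of 400 or above (or a network error) as "broken" is an assumption, and sitemap index files are not handled.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterable

# XML namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap_xml(xml_text: str) -> list[str]:
    """Hypothetical helper: return every <url>/<loc> URL in a <urlset> document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Hypothetical helper: return the HTTP status code for a URL.

    Uses a HEAD request to avoid downloading the body; some servers reject
    HEAD, in which case a GET fallback would be needed.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; report their status code.
        return err.code


def batch_check(
    urls: Iterable[str], fetch: Callable[[str], int] = fetch_status
) -> tuple[list[str], list[str]]:
    """Split URLs into (completed, broken).

    Assumption: a status >= 400 or a network error (OSError, which covers
    URLError and timeouts) marks the URL as broken.
    """
    completed: list[str] = []
    broken: list[str] = []
    for url in urls:
        try:
            status = fetch(url)
        except OSError:
            broken.append(url)
            continue
        (broken if status >= 400 else completed).append(url)
    return completed, broken


# Offline usage: a tiny sitemap and a fake fetcher instead of real requests.
xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""
urls = parse_sitemap_xml(xml)
fake_statuses = {"https://example.com/": 200, "https://example.com/about": 404}
completed, broken = batch_check(urls, fetch=fake_statuses.get)
```

Passing the fetcher in as a parameter keeps the splitting logic testable without network access; in real use the default `fetch_status` performs the HTTP requests.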

You can also visualise the hierarchical structure of the sitemap as a d3.js visualisation:

```python
# Visualise pages
from pyscraper.viz import VisualiseSitemap
VisualiseSitemap(page_urls)
```
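How `VisualiseSitemap` builds its hierarchy is not shown. D3's hierarchical layouts (through `d3.hierarchy`) commonly consume nested JSON of the form `{"name": ..., "children": [...]}`, so a minimal sketch of turning a flat URL list into that shape might look like this. `build_url_tree` is a hypothetical name, and nesting pages by URL path segment (ignoring host and query string) is an assumption about how the hierarchy is defined.

```python
import json
from urllib.parse import urlparse


def build_url_tree(urls: list[str], root_name: str = "/") -> dict:
    """Hypothetical helper: nest URLs by path segment into d3-style dicts."""
    # First build a dict-of-dicts keyed by path segment; dicts keep insertion order.
    nested: dict = {}
    for url in urls:
        node = nested
        for segment in (s for s in urlparse(url).path.split("/") if s):
            node = node.setdefault(segment, {})
    return _to_d3(root_name, nested)


def _to_d3(name: str, children: dict) -> dict:
    """Convert one level of the dict-of-dicts; leaves get no "children" key."""
    node: dict = {"name": name}
    if children:
        node["children"] = [_to_d3(key, value) for key, value in children.items()]
    return node


urls = [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/blog/post-1",
    "https://example.com/blog/post-2",
]
tree = build_url_tree(urls)
print(json.dumps(tree))
```

The printed JSON can be loaded in the browser and passed to `d3.hierarchy` to drive a tree or cluster layout.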

Visualisation
