
beatifulsoup

Here are 157 public repositories matching this topic...

A simple Python web crawler that processes URLs from web pages, handles redirects, and skips non-HTML content. It supports HTTP/HTTPS, calculates same-domain link ratios, avoids duplicate URLs, and saves results in a TSV file. Designed for easy scalability and future extensions.

  • Updated Oct 15, 2024
  • Python
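
The steps named in the description above (collecting links, skipping duplicates, computing a same-domain ratio, and writing a TSV row) could be sketched roughly as follows. This is not the repository's code: the names `LinkCollector`, `extract_links`, `same_domain_ratio`, and `to_tsv_row` are hypothetical, the three-column row layout is an assumption, and the standard library's `html.parser` stands in for Beautiful Soup so the sketch runs without extra dependencies. Fetching, redirect handling, and content-type checks are left out.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect raw href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return unique absolute HTTP/HTTPS links, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    seen = set()
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop "#fragment" so duplicates collapse.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


def same_domain_ratio(base_url, links):
    """Fraction of links whose host matches the host of base_url."""
    if not links:
        return 0.0
    host = urlparse(base_url).netloc.lower()
    same = sum(1 for link in links if urlparse(link).netloc.lower() == host)
    return same / len(links)


def to_tsv_row(base_url, links):
    """Format one result as a TSV line: URL, link count, ratio (assumed layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    ratio = same_domain_ratio(base_url, links)
    writer.writerow([base_url, len(links), f"{ratio:.2f}"])
    return buffer.getvalue()
```

For a page at `https://example.com/` that links to `/a`, `/a#top`, `https://other.org/x`, and a `mailto:` address, `extract_links` would keep two URLs and `same_domain_ratio` would return 0.5. Comparing hosts by exact `netloc` match treats subdomains as different domains; a real crawler may want a looser rule.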
