pr0x-tube - an automated scraping and processing tool for tube sites

pr0x-tube is an ongoing, long-term project that scrapes tube sites for their content, converts it into JSON files and reposts it on your own blog or CMS. This project is for educational purposes only and was 'forked' from a very old project of mine written in PHP and JavaScript. pr0x-tube will be written and developed entirely in Python, using Mezzanine as the CMS and a custom API to handle the JSON files.

How does it work?

pr0x-tube mainly uses BeautifulSoup and requests. It first loads a sitemap URL, then writes all existing blog post URLs into an Excel file. Once pr0x-tube has a stable set of URLs, it loads the URLs in that Excel file one by one to scrape the information needed for the JSON file: "title", "content", "categories", "media link" and "download link" (if available). This information is written to the JSON file, and the API then sends it via a POST request to the CMS/blog, which creates a blog post.
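The first step (sitemap to list of post URLs) can be sketched as follows, assuming the site publishes a standard sitemaps.org `<urlset>` file. This is not the project's code: to stay dependency-free it swaps in the standard library's `xml.etree.ElementTree` for BeautifulSoup and writes a CSV file instead of an Excel workbook, and the function names (`extract_post_urls`, `write_url_list`, `read_url_list`) are hypothetical. Downloading the sitemap itself (done with requests in pr0x-tube) is left out.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_post_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL in a sitemap <urlset>, in order, without duplicates."""
    root = ET.fromstring(sitemap_xml)
    seen = set()
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def write_url_list(urls: list[str], fh) -> None:
    """Write one URL per row under a 'url' header (CSV stand-in for the Excel file)."""
    writer = csv.writer(fh)
    writer.writerow(["url"])
    for url in urls:
        writer.writerow([url])


def read_url_list(fh) -> list[str]:
    """Read the stored URLs back so each post can be scraped one by one."""
    return [row["url"] for row in csv.DictReader(fh)]


if __name__ == "__main__":
    # Hypothetical sample sitemap; the duplicate entry is dropped.
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/post-1</loc></url>"
        "<url><loc>https://example.com/post-2</loc></url>"
        "<url><loc>https://example.com/post-1</loc></url>"
        "</urlset>"
    )
    buf = io.StringIO()
    write_url_list(extract_post_urls(sample), buf)
    buf.seek(0)
    for url in read_url_list(buf):
        print(url)
```

Keeping the URL list in a file between runs mirrors the README's idea of building a stable set of URLs first and scraping them step by step afterwards, so a failed scrape can resume without re-reading the sitemap.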

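The final step, turning the scraped fields into JSON and handing them to the CMS, could look roughly like the sketch below. It assumes the custom API accepts a JSON body via POST; the endpoint URL and the key names (`media_link`, `download_link`) are hypothetical, since the README does not document the actual schema. It only builds the request with the standard library's `urllib.request`; sending it would be `urllib.request.urlopen(req)` (the README says pr0x-tube itself uses requests).

```python
import json
import urllib.request

# Fields named in the README; the exact key spellings are assumptions.
REQUIRED_FIELDS = ("title", "content", "categories", "media_link")
OPTIONAL_FIELDS = ("download_link",)  # only included "if possible"


def build_post_payload(scraped: dict) -> dict:
    """Validate scraped fields and keep only those meant for the blog post."""
    missing = [f for f in REQUIRED_FIELDS if not scraped.get(f)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    payload = {f: scraped[f] for f in REQUIRED_FIELDS}
    for f in OPTIONAL_FIELDS:
        if scraped.get(f):
            payload[f] = scraped[f]
    return payload


def make_post_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the CMS API."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    scraped = {
        "title": "Example post",
        "content": "<p>Description</p>",
        "categories": ["example"],
        "media_link": "https://example.com/media/1",
        "download_link": "",  # not available, so it is omitted
    }
    # Hypothetical endpoint of the custom API.
    req = make_post_request("http://localhost:8000/api/posts/", build_post_payload(scraped))
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Validating before building the request means a post with a missing title or media link fails locally instead of creating an incomplete blog post on the CMS.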
N0W3N/pr0x-tube

Folders and files

Latest commit

History

Repository files navigation

pr0x-tube - an automated scraping and processing for tube sites

How does it work?

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages