Soldy/simple_scrapper

Usage

At the moment it works only on Linux, and I suggest trying it in a Linux container. For a Debian-based container, an install.sh is provided to prepare the dependencies.

sh install.sh
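
For example, one way to get such a container is with Docker (the image name and mount paths below are only an illustration, not something this repo ships):

docker run -it --rm -v "$PWD":/app -w /app debian:stable bash
sh install.sh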

Installation with pip may also work, but it has not been tested.

pip install selenium

In theory it works on Windows as well, but I am not sure about that. For a test run, just start start.sh.

sh start.sh

There are also tests; run them with pytest:

pytest test.py

Additional

Most of my web scraping is in Rust and based on DL. Doing this by hand is a little bit different. The best solution is probably a web driver; GreaseMonkey is one of the alternative solutions. Things like curl only work if the page has no JavaScript or WASM inside. If we stay with PHP, JavaScript, Python, and Java, my solution is Python. Even though Python is far from my strongest skill, it is an all-in-one tool, and the only third-party package we need after that is Selenium. Node.js has many more options, like Puppeteer and Cheerio, and my JavaScript skills are strong. The problem is that Node has too many choices: TypeScript, test tools (Jest, Mocha, Chai, Karma, Ava, or Jasmine), documenting...
PHP is a good language and one of my strongest skills, so it is not a bad choice, but it is not as good as Python for this. Java would probably be a better option than Python, but my Java skills are more limited, so it would take more time for me.
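
To make the difference from curl concrete, here is a minimal sketch of the web-driver approach with Selenium. It is only an illustration, not the actual script from this repo; headless Firefox and the example URL are assumptions.

# Minimal sketch: open a page in a real (headless) browser so that
# JavaScript/WASM runs, then read the rendered DOM. curl cannot do this.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument("--headless")        # no visible browser window
driver = webdriver.Firefox(options=options)

try:
    driver.get("https://example.com")     # placeholder URL
    # Everything below sees the DOM after JavaScript has executed.
    for link in driver.find_elements(By.CSS_SELECTOR, "a"):
        print(link.get_attribute("href"))
finally:
    driver.quit()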
