Code by 🧑‍💻 Trong-Dat Ngo.
🕳️CygnusX1 is a multithreaded tool 🛠️ for searching and downloading images from popular search engines 🔎. It is straightforward to set up and run!
- 🥰 No prior knowledge is required to set it up and run it.
- 🚀 Download images using a customizable number of threads.
- ⛏️ Crawl all possible images (search results and recommendations).
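The multithreaded download idea can be sketched with Python's standard `concurrent.futures` module. This is a minimal illustration, not CygnusX1's actual implementation: `download_all` and `fetch_url` are hypothetical helpers, and the default of 2 workers simply mirrors the documented `--workers` default.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Optional


def fetch_url(url: str, timeout: float = 10.0) -> bytes:
    """Download one URL and return its raw bytes (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_url,
    workers: int = 2,
) -> Dict[str, Optional[bytes]]:
    """Fetch every URL using a pool of at most `workers` threads.

    Returns a mapping url -> payload, with None for failed downloads so a
    single broken link does not abort the whole batch.
    """
    results: Dict[str, Optional[bytes]] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit everything up front; the pool runs at most `workers` jobs at once.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                results[url] = None
    return results
```

For example, `download_all(image_urls, workers=8)` (with `image_urls` being any list of image links) would fetch them on eight threads. Threads fit this job because downloads are I/O-bound, and Python releases the GIL while waiting on the network.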
This repository has been tested on Python 3.6+ and Selenium 3.141.0+, and it works on macOS, Windows, and Linux.
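A quick way to confirm that an environment meets these minimums is a small check script. This is a sketch under assumptions: it reads Selenium's `__version__` attribute, uses a simple numeric version comparison, and the helper names are hypothetical, not part of CygnusX1.

```python
import re
import sys
from typing import List, Optional, Tuple

MIN_PYTHON = (3, 6)
MIN_SELENIUM = (3, 141, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """'3.141.0' -> (3, 141, 0); suffixes such as 'rc1' are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def selenium_version() -> Optional[str]:
    """Installed Selenium version string, or None if Selenium is missing."""
    try:
        import selenium
    except ImportError:
        return None
    return getattr(selenium, "__version__", "unknown")


def check_environment() -> List[str]:
    """Return a list of problems; an empty list means the versions look fine."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python %d.%d+ is required" % MIN_PYTHON)
    found = selenium_version()
    if found is None:
        problems.append("selenium is not installed")
    elif parse_version(found) < MIN_SELENIUM:
        problems.append("selenium %s found, 3.141.0+ expected" % found)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```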
You should set up and run 🕳️CygnusX1 in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide.
First, create a virtual environment with the version of Python you're going to use and activate it. (You can skip this step if you want to install directly into the system environment.)
python -m venv venv
source venv/bin/activate
On Windows, activate it with venv\Scripts\activate instead.
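To confirm that the virtual environment is really active before installing anything, a check with the standard `sys` module is enough. This is a common heuristic rather than something CygnusX1 provides, and `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv() -> bool:
    """True when the current interpreter runs inside a virtual environment."""
    # Legacy virtualenv (< 20) sets sys.real_prefix; the stdlib venv module
    # instead makes sys.prefix differ from sys.base_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```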
Install 🕳️CygnusX1 with pip:
pip install CygnusX1
Or download 🕳️CygnusX1 from GitHub:
git clone https://github.com/dat821168/CygnusX1.git
cd CygnusX1
Then install the dependencies listed in requirements.txt:
pip install -r requirements.txt
Use the cygnusx1 command line (available after installing with pip):
cygnusx1 --keywords "keyword 1, keyword 2" --workers 8 --use_suggestions --headless
Or, from the cloned repository, use run.py to start the script:
python run.py --keywords "keyword 1, keyword 2" --workers 8 --use_suggestions --headless
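The commands above can also be launched from another Python program with `subprocess`. This sketch only uses the flags documented in this README; `build_command` and `run_cygnusx1` are hypothetical helpers, and passing all keywords as one comma-separated string assumes the CLI splits them as the argument details describe.

```python
import subprocess
import sys
from typing import List, Sequence


def build_command(
    keywords: Sequence[str],
    workers: int = 2,
    out_dir: str = "./IMAGES",
    use_suggestions: bool = False,
    headless: bool = False,
    use_script: bool = False,
) -> List[str]:
    """Assemble the argument list for the pip-installed CLI or for run.py."""
    # use_script=True assumes the current directory is the cloned repository.
    cmd = [sys.executable, "run.py"] if use_script else ["cygnusx1"]
    cmd += ["--keywords", ", ".join(keywords), "--out_dir", out_dir, "--workers", str(workers)]
    # Boolean flags take no value: present means True, absent means False.
    if use_suggestions:
        cmd.append("--use_suggestions")
    if headless:
        cmd.append("--headless")
    return cmd


def run_cygnusx1(keywords: Sequence[str], **options) -> int:
    """Run one crawl; raises subprocess.CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(keywords, **options), check=True).returncode
```

Passing a list instead of a shell string means the space inside "keyword 1, keyword 2" needs no quoting.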
Argument details:
--keywords: The keywords/keyphrases to search for. Separate multiple keywords with commas.
--out_dir: Path where results are saved. Default = './IMAGES'.
--workers: Maximum number of workers used to crawl images. Default = 2.
--use_suggestions: Also crawl search engine suggestions/recommendations. Default = False.
--headless: Hide the browser during scraping. Default = False.
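To show how these flags fit together, here is a minimal `argparse` sketch that mirrors the documented names and defaults. It is a hypothetical reconstruction, not the actual parser in run.py; making `--keywords` required and trimming whitespace around each comma-separated keyword are assumptions.

```python
import argparse
from typing import List


def split_keywords(raw: str) -> List[str]:
    """'keyword 1, keyword 2' -> ['keyword 1', 'keyword 2']; empty items dropped."""
    return [k.strip() for k in raw.split(",") if k.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Search and download images (sketch).")
    # Assumption: the real tool may not mark --keywords as required.
    parser.add_argument("--keywords", required=True, type=split_keywords,
                        help="Comma-separated keywords/keyphrases to search for.")
    parser.add_argument("--out_dir", default="./IMAGES",
                        help="Path where results are saved.")
    parser.add_argument("--workers", type=int, default=2,
                        help="Maximum number of workers used to crawl images.")
    # store_true flags default to False and become True when present.
    parser.add_argument("--use_suggestions", action="store_true",
                        help="Also crawl search engine suggestions/recommendations.")
    parser.add_argument("--headless", action="store_true",
                        help="Hide the browser during scraping.")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```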
- Support Google search engine.
- Support Bing search engine.
- Support Baidu search engine.
- Limit the number of images to download.
- Detect duplicated or very similar images.
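The duplicate-detection item could start from exact matching: files with identical bytes share the same SHA-256 digest. This is only a sketch with hypothetical helpers `find_exact_duplicates` and `scan_directory`; it will not catch resized or re-encoded copies, which "very similar" images would need something like a perceptual hash for.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def find_exact_duplicates(items: Iterable[Tuple[str, bytes]]) -> List[List[str]]:
    """Group names whose contents are byte-for-byte identical."""
    groups: Dict[str, List[str]] = {}
    for name, data in items:
        groups.setdefault(hashlib.sha256(data).hexdigest(), []).append(name)
    # Only digests shared by two or more files are duplicates.
    return [names for names in groups.values() if len(names) > 1]


def scan_directory(root: str = "./IMAGES") -> List[List[str]]:
    """Hash every file under `root` (the documented default output folder)."""
    files = sorted(p for p in Path(root).rglob("*") if p.is_file())
    # The generator reads one file at a time, so memory stays bounded.
    return find_exact_duplicates((str(p), p.read_bytes()) for p in files)


if __name__ == "__main__":
    for group in scan_directory():
        print("duplicates:", group)
```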