ProxyCrawl

An easy-to-use Python HTTP requests library for scraping and crawling websites through the ProxyCrawl API.

Currently supports requests, aiohttp, and scrapy.

What is ProxyCrawl API?

The ProxyCrawl API lets you scrape anonymously and bypass restrictions, blocks, and captchas.

For more information and registration, please go to proxycrawl.com.

Registration

To use ProxyCrawl, you need to register an account and obtain your token at proxycrawl.com. You can find your token under Dashboard / API Documentation / URL parameters.
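To illustrate how the token might be used, here is a minimal sketch that reads it from an environment variable and builds a request URL. The endpoint `https://api.proxycrawl.com/`, the `token` and `url` query parameter names, and the `PROXYCRAWL_TOKEN` variable are assumptions made for this example, not details confirmed by this README; `build_api_url` is a hypothetical helper and not part of proxycrawl. Check the URL parameters page in your dashboard for the actual values.

```python
import os
from urllib.parse import urlencode

# Assumed endpoint; confirm it on the "URL parameters" page of the dashboard.
API_ENDPOINT = "https://api.proxycrawl.com/"


def build_api_url(target_url, token=None):
    """Return an API URL that wraps target_url (hypothetical helper).

    The token is taken from the argument or, failing that, from the
    PROXYCRAWL_TOKEN environment variable (a name chosen for this sketch).
    """
    token = token or os.environ.get("PROXYCRAWL_TOKEN")
    if not token:
        raise ValueError("No token given and PROXYCRAWL_TOKEN is not set")
    # urlencode percent-encodes the target URL so it survives as one parameter.
    return API_ENDPOINT + "?" + urlencode({"token": token, "url": target_url})
```

Keeping the token in an environment variable rather than in source code avoids committing it to version control by accident.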

Installation

Installing from GitHub (bash):

pip3 install git+https://github.com/SYAN83/proxycrawl

Installing from PyPI is not available yet.

How to use

proxycrawl includes three classes: ProxySession for synchronous HTTP requests, AsyncProxySession for asynchronous HTTP requests, and ScrapyProxyRequest for scrapy.

ProxySession inherits from the requests.Session class:

from proxycrawl import ProxySession

session = ProxySession(token='****************')
response = session.get(url='https://github.com/')
print(response.text)

To verify that the IP address changes each time you make a request, you can run:

from proxycrawl import ProxySession

session = ProxySession(token='****************')
ip = session.test()
print(ip)
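A single call only shows one address, so checking that the address actually changes takes several calls. The sketch below is one possible way to do that; `distinct_results` is a hypothetical helper, not part of proxycrawl, and it assumes that whatever `test()` returns can be compared with `==`.

```python
def distinct_results(get_ip, attempts=5):
    """Call get_ip() `attempts` times and return the distinct values seen.

    get_ip is any zero-argument callable, for example session.test.
    Values are kept in first-seen order.
    """
    seen = []
    for _ in range(attempts):
        ip = get_ip()
        if ip not in seen:
            seen.append(ip)
    return seen
```

With a session created as above, `distinct_results(session.test)` should return more than one entry if the IP address rotates between requests.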

AsyncProxySession inherits from the aiohttp.ClientSession class:

import aiohttp
import asyncio
from proxycrawl import AsyncProxySession

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    async with AsyncProxySession(token='****************', 
                                 connector=aiohttp.TCPConnector(ssl=False)) as session:
        content = await fetch(session=session, url='https://github.com/')
        print(content)

# asyncio.run() creates the event loop, runs main(), and closes the loop
asyncio.run(main())
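The async session is most useful when several pages are fetched concurrently. The following is a minimal sketch of that pattern using only asyncio; `fetch_all` and its `limit` parameter are hypothetical names introduced here, not part of proxycrawl, and the concurrency limit is an assumption about what a given plan tolerates.

```python
import asyncio


async def fetch_all(fetch, urls, limit=5):
    """Run fetch(url) for every URL, with at most `limit` calls in flight.

    fetch is any coroutine function taking a single URL.
    Results are returned in the same order as urls.
    """
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        # The semaphore caps how many fetches run at the same time.
        async with semaphore:
            return await fetch(url)

    return await asyncio.gather(*(bounded(url) for url in urls))
```

Inside `main()` above, this could be called as `pages = await fetch_all(lambda url: fetch(session, url), urls)`, where `urls` is a list of target addresses.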

AsyncProxySession.test() allows you to verify that the IP address changes in async mode:

import aiohttp
import asyncio
from proxycrawl import AsyncProxySession

async def main():
    async with AsyncProxySession(token='****************', 
                                 connector=aiohttp.TCPConnector(ssl=False)) as session:
        ip = await session.test()
        print(ip)

# asyncio.run() creates the event loop, runs main(), and closes the loop
asyncio.run(main())

ScrapyProxyRequest inherits from the scrapy.http.Request class:

import scrapy
from proxycrawl import scrapyProxyRequest


# Bind the token once; the result is used below in place of scrapy.Request
ScrapyProxyRequest = scrapyProxyRequest(token='****************')


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield ScrapyProxyRequest(url=url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)
