Skip to content

feat: Add possibility to use None as no proxy in tiered proxies#760

Merged
Pijukatel merged 6 commits into
masterfrom
none-tiered-proxy
Dec 10, 2024
Merged

feat: Add possibility to use None as no proxy in tiered proxies#760
Pijukatel merged 6 commits into
masterfrom
none-tiered-proxy

Conversation

@Pijukatel

@Pijukatel Pijukatel commented Nov 28, 2024

Copy link
Copy Markdown
Collaborator

Using None in input lists for tiered proxy configuration will be interpreted as no proxy. This allows to have for example lowest tier without proxy.

Add tests for it.
Update docs.

Add tests for it.
Update docs.

Unrelated:
Also delete ignore that was no longer needed.
@Pijukatel Pijukatel added enhancement New feature or request. t-tooling Issues with this label are in the ownership of the tooling team. labels Nov 28, 2024
@github-actions github-actions Bot added this to the 103rd sprint - Tooling team milestone Nov 28, 2024
@github-actions github-actions Bot added the tested Temporary label used only programatically for some analytics. label Nov 28, 2024
"""
self._next_custom_url_index = 0
self._used_proxy_urls = dict[str, URL]()
self._used_proxy_urls = dict[str, Union[URL, None]]()

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mypy was very aggressive when I wanted to use | on this specific line. I got bullied into using Union

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, AFAIK in generics the pipe is still not possible

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Pull Request Tookit has failed!

Pull request is neither linked to an issue or epic nor labeled as adhoc!



async def test_rotates_proxies_uniformly_with_no_request_no_proxy() -> None:
"""Not sure how real is this use case, but it is supported."""

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I keep this test? I am not even sure if this usecase makes much sense.

Remove unrelated pyproject change as it was done in different PR.
@Pijukatel Pijukatel marked this pull request as ready for review November 29, 2024 10:42
@Pijukatel Pijukatel assigned vdusek and unassigned vdusek Nov 29, 2024
@Pijukatel Pijukatel requested review from barjin and vdusek November 29, 2024 10:43
@barjin

barjin commented Nov 29, 2024

Copy link
Copy Markdown
Member

Looking good to me 👏🏽 (I'm glad to see this really was as simple as in the JS version). Have you tested this locally with a crawler (i.e. won't other parts of Crawlee mind the None instead of a proxy URL?) Do we already support this case with new_url_function returning None?

@Pijukatel

Pijukatel commented Nov 29, 2024

Copy link
Copy Markdown
Collaborator Author

Looking good to me 👏🏽 (I'm glad to see this really was as simple as in the JS version). Have you tested this locally with a crawler (i.e. won't other parts of Crawlee mind the None instead of a proxy URL?) Do we already support this case with new_url_function returning None?

I tested running

...
crawler = BeautifulSoupCrawler(
            proxy_configuration=ProxyConfiguration(tiered_proxy_urls=[[None, None], [None]]),
        )
await crawler.run(["https://crawlee.dev"])
...

I am not sure what you mean by "Do we already support this case with new_url_function returning None?". Is there a usecase example?

@Pijukatel

Copy link
Copy Markdown
Collaborator Author

Looking good to me 👏🏽 (I'm glad to see this really was as simple as in the JS version). Have you tested this locally with a crawler (i.e. won't other parts of Crawlee mind the None instead of a proxy URL?) Do we already support this case with new_url_function returning None?

I tested running

...
crawler = BeautifulSoupCrawler(
            proxy_configuration=ProxyConfiguration(tiered_proxy_urls=[[None, None], [None]]),
        )
await crawler.run(["https://crawlee.dev"])
...

I am not sure what you mean by "Do we already support this case with new_url_function returning None?". Is there a usecase example?

Ok I think I get it. I looked only at tiered proxies. Currently this will still not work:

crawler = BeautifulSoupCrawler( proxy_configuration=ProxyConfiguration(proxy_urls=[None, None]),)
await crawler.run(["https://crawlee.dev"])

I will make it work consistently for all possible ways of setting the proxies to None

@Pijukatel Pijukatel marked this pull request as draft November 29, 2024 14:26
@barjin

barjin commented Nov 29, 2024

Copy link
Copy Markdown
Member

Right, sorry, I might have been confusing - in JS version, we support the newUrlFunction (alternative of new_url) returning null to turn the proxy off. I'm actually not sure if you can pass null in proxyUrls (although it would make sense).

obrazek

I will make it work consistently for all possible ways of setting the proxies to None

Sounds good! (and hopefully not too much work 🤞🏽 )

@Pijukatel Pijukatel marked this pull request as ready for review December 3, 2024 13:56
@vdusek vdusek changed the title feat: Add possibility to use None as no proxy in tiered proxies. feat: Add possibility to use None as no proxy in tiered proxies Dec 5, 2024

@vdusek vdusek left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one note about the test documentation; otherwise, it looks good. Thanks.

Comment thread tests/unit/proxy_configuration/test_tiers.py
Comment thread tests/unit/proxy_configuration/test_tiers.py Outdated
@Pijukatel Pijukatel merged commit 0fbd017 into master Dec 10, 2024
@Pijukatel Pijukatel deleted the none-tiered-proxy branch December 10, 2024 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request. t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tieredProxyUrls accept null for switching the proxy off

3 participants