Add Oxylabs Web Scraping tools #2905

Merged
merged 4 commits into crewAIInc:main on Jun 23, 2025

oxy-rostyslav
Contributor

@oxy-rostyslav commented May 26, 2025

Changes

This PR adds 4 Oxylabs Web Scraping tools:

  • OxylabsAmazonProductScraperTool
  • OxylabsAmazonSearchScraperTool
  • OxylabsGoogleSearchScraperTool
  • OxylabsUniversalScraperTool

Example

from crewai_tools import OxylabsUniversalScraperTool

tool = OxylabsUniversalScraperTool()

result = tool.run(url="https://ip.oxylabs.io")

print(result)

Link to the CrewAI-tools PR

@joaomdmoura
Collaborator

Disclaimer: This review was made by a crew of AI Agents.

Code Review Comment for PR #2905

Overview

This pull request introduces four new Oxylabs web scraping tools:

  • OxylabsAmazonProductScraperTool
  • OxylabsAmazonSearchScraperTool
  • OxylabsGoogleSearchScraperTool
  • OxylabsUniversalScraperTool

The changes also include comprehensive documentation to ease integration and usage.

Code Quality Findings

  1. Code Structure and Clarity:

    • The new tools are well-structured, and the naming conventions are consistent with previous tools in the project, enhancing readability.
    • The example code in the documentation is clear and straightforward. However, additional examples covering other scenarios and common edge cases would be beneficial.

    Suggestion: Add unit tests to ensure that each scraper handles input and errors correctly. For instance:

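    from crewai_tools import OxylabsAmazonProductScraperTool

    # Illustrative only: assumes run() returns the SDK response object
    # (with .results[0].content); adjust to the tool's actual return type.
    tool = OxylabsAmazonProductScraperTool(username="USERNAME", password="PASSWORD")
    # Test with a valid product URL
    assert tool.run(url="https://www.amazon.com/dp/B08X9FGGF2").results[0].content is not None
    # Test with an invalid URL
    assert tool.run(url="https://www.invalid-url.com").results[0].content is None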
  2. Documentation:

    • The documentation is thorough but could be improved by linking to relevant earlier PRs where similar functionality was discussed.
    • It might also be useful to include a separate section on “Common Issues” and best practices for each tool to assist new users.
  3. Error Handling:

    • Implementations should ensure robust error handling for network failures or invalid parameters.
    • Consider using exception handling to manage cases where the scraping fails.

    Example Suggestion:

    try:
        result = tool.run(url="https://example.com")
    except Exception as e:
        print(f"Error during scrape: {e}")
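One way to act on the error-handling suggestion, sketched under stated assumptions: the helpers `is_valid_url` and `safe_scrape` are hypothetical and not part of crewai_tools or the Oxylabs SDK. The helper rejects malformed URLs before any request is made and retries only transient network errors (`ConnectionError`, `TimeoutError`) with exponential backoff; any other exception propagates to the caller. A stand-in scrape function keeps the sketch runnable offline; with the real tool, `scrape` could be `lambda u: tool.run(url=u)`.

```python
import time
from typing import Callable, Optional
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Reject obviously invalid parameters before any network call."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def safe_scrape(
    scrape: Callable[[str], str],
    url: str,
    retries: int = 3,
    backoff: float = 0.5,
) -> Optional[str]:
    """Return scrape(url), or None if the URL is invalid or all attempts fail.

    Only transient network errors are retried; other exceptions propagate.
    """
    if not is_valid_url(url):
        print(f"Invalid URL, not scraping: {url!r}")
        return None
    for attempt in range(1, retries + 1):
        try:
            return scrape(url)
        except (ConnectionError, TimeoutError) as e:
            print(f"Attempt {attempt}/{retries} failed: {e}")
            if attempt < retries:
                # Exponential backoff: backoff, 2*backoff, 4*backoff, ...
                time.sleep(backoff * 2 ** (attempt - 1))
    return None


# Usage with a hypothetical stand-in that fails once, then succeeds
calls = {"n": 0}


def flaky_scrape(url: str) -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("temporary network failure")
    return f"<html>content of {url}</html>"


result = safe_scrape(flaky_scrape, "https://example.com", backoff=0.0)
print(result)
```

Retrying only network-level errors is a deliberate choice here: an invalid parameter will fail the same way on every attempt, so retrying it would only add latency.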

Historical Context

  • The discussions from PR #312 delve into previous iterations of scraping tools that were more limited in scope. Key user feedback emphasized expanding functionality across different platforms, which this PR successfully addresses.
  • Previous PRs highlighted the importance of thorough documentation and usability enhancements; it’s encouraging to see that feedback has been incorporated.

Implications for Related Files

  • Scalability: These tools extend the system's scraping capabilities to more platforms, though they could add performance overhead in the future.
  • Backward Compatibility: Care should be taken to ensure that the existing code continues to function as expected before merging, particularly for users relying solely on the previous scraping tools.

Conclusion

Overall, this PR is a significant enhancement. A few improvements related to error handling, additional examples, and improved documentation could further solidify its robustness and user-friendliness. Consider implementing these suggestions to optimize the integration of the new tools.

Great work on these additions!

@oxy-rostyslav
Contributor Author

Hi @joaomdmoura,

Regarding the changes described in the comment above:

  1. Following the examples of the other tools, I added basic unit tests that verify the tool classes can be instantiated. I'm not sure it makes sense to add other tests, since all the _run methods in the tools just call methods in the Oxylabs SDK.
  2. Examples of the tools' usage are included in the documentation. I believe they should be enough for users to understand how the tools can be used.
  3. Error handling is implemented on the SDK side, so the tool classes should not need any of their own.

Please tell me if anything else should be changed.
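If each `_run` only forwards to the Oxylabs SDK, as described above, a unit test can replace the SDK client with a mock and check delegation and error propagation without any network access. The sketch below assumes a hypothetical wrapper `ThinScraperToolSketch` whose client exposes a hypothetical `scrape(url)` method; neither is the real crewai_tools class or the Oxylabs SDK API.

```python
import unittest
from unittest.mock import Mock


class ThinScraperToolSketch:
    """Hypothetical thin wrapper: _run() only forwards to an SDK client."""

    def __init__(self, client):
        # Stand-in for an Oxylabs SDK client; `scrape` is a hypothetical method
        self.client = client

    def _run(self, url: str) -> str:
        # No logic of its own: delegate and return the content unchanged
        return self.client.scrape(url)


class TestThinScraperToolSketch(unittest.TestCase):
    def test_delegates_to_client(self):
        client = Mock()
        client.scrape.return_value = "<html>ok</html>"
        tool = ThinScraperToolSketch(client)

        self.assertEqual(tool._run("https://ip.oxylabs.io"), "<html>ok</html>")
        client.scrape.assert_called_once_with("https://ip.oxylabs.io")

    def test_propagates_sdk_errors(self):
        client = Mock()
        client.scrape.side_effect = ConnectionError("network down")
        tool = ThinScraperToolSketch(client)

        # The wrapper adds no handling, so SDK errors reach the caller unchanged
        with self.assertRaises(ConnectionError):
            tool._run("https://ip.oxylabs.io")
```

Run it with `python -m unittest`. Because the SDK is mocked, these tests only pin down the wrapper's contract; they say nothing about the SDK's own error handling.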

@oxy-rostyslav force-pushed the add-oxylabs-tools branch 2 times, most recently from cb6e0f2 to bd68b7a on June 17, 2025 07:33
Contributor

@tonykipkemboi left a comment

Could you please put all the tools in one file, since they are all from one provider? Also, reduce the links in each doc to one or two at most.

@oxy-rostyslav
Contributor Author

@tonykipkemboi done
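The request to keep one provider's tools in a single file can be sketched as a module where a shared base class holds the credentials and payload construction, and each tool subclass only sets its source and input parameter. All class names here are hypothetical, and the `source` strings are assumed Oxylabs source identifiers, not taken from the PR; the other two tools would follow the same pattern.

```python
class OxylabsToolBaseSketch:
    """Hypothetical shared base: credential handling lives in one place."""

    source: str = ""  # overridden by each subclass

    def __init__(self, username: str, password: str):
        self.username = username
        self.password = password

    def build_payload(self, **params: str) -> dict:
        # Every tool sends the same payload shape; only `source` and params differ
        return {"source": self.source, **params}


class UniversalScraperSketch(OxylabsToolBaseSketch):
    source = "universal"  # assumed source name

    def _run(self, url: str) -> dict:
        return self.build_payload(url=url)


class AmazonSearchScraperSketch(OxylabsToolBaseSketch):
    source = "amazon_search"  # assumed source name

    def _run(self, query: str) -> dict:
        return self.build_payload(query=query)


# Usage
tool = UniversalScraperSketch("USERNAME", "PASSWORD")
print(tool._run(url="https://ip.oxylabs.io"))
# {'source': 'universal', 'url': 'https://ip.oxylabs.io'}
```

Keeping credentials in the base class means a change to authentication touches one place rather than four.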

@tonykipkemboi merged commit c96d4a6 into crewAIInc:main Jun 23, 2025
7 of 11 checks passed
dhyeyinf pushed a commit to dhyeyinf/crewAI that referenced this pull request Jul 3, 2025
* Add Oxylabs tools

* Review updates

* Review updates

---------

Co-authored-by: Tony Kipkemboi <iamtonykipkemboi@gmail.com>