feat: Added proxy support for mediaWiki ETL! #19
Walkthrough
The changes add proxy support to the Mediawiki ETL process. A new environment variable, MEDIAWIKI_PROXY_URL, supplies the proxy URL, which is passed to WikiteamCrawler when it is instantiated.
Sequence Diagram(s)
sequenceDiagram
participant ETL as MediawikiETL
participant Env as Environment
participant Crawler as WikiteamCrawler
ETL->>Env: Read MEDIAWIKI_PROXY_URL
Env-->>ETL: Return proxy URL (or empty string)
ETL->>Crawler: Instantiate with proxy_url parameter
Crawler->>Crawler: Append '--proxy' option if proxy_url provided
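The flow in the diagram can be sketched in Python roughly as follows. This is a minimal illustration, not the project's code: `CrawlerSketch`, `build_params`, and `make_crawler` are hypothetical names, the base argument list is a placeholder, and it is an assumption that the `--proxy` flag is followed by the proxy URL as its value.

```python
import os


class CrawlerSketch:
    """Hypothetical stand-in for WikiteamCrawler; not the project's real class."""

    def __init__(self, wiki_url: str, proxy_url: str = "") -> None:
        self.wiki_url = wiki_url
        # An empty string means "no proxy configured".
        self.proxy_url = proxy_url

    def build_params(self) -> list[str]:
        # Placeholder base arguments; the real crawler's options are not shown here.
        params = [self.wiki_url]
        if self.proxy_url:
            # Assumption: the flag takes the proxy URL as its value.
            params.append("--proxy")
            params.append(self.proxy_url)
        return params


def make_crawler(wiki_url: str) -> CrawlerSketch:
    # An unset variable falls back to an empty string, which leaves the proxy off.
    proxy_url = os.getenv("MEDIAWIKI_PROXY_URL", "")
    return CrawlerSketch(wiki_url, proxy_url=proxy_url)
```

Defaulting to an empty string keeps the proxy optional: when the variable is unset, the parameter list is unchanged.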
Actionable comments posted: 1
🧹 Nitpick comments (2)
hivemind_etl/mediawiki/etl.py (1)
20-22: Proxy URL configuration from environment variables.
The code retrieves the proxy URL from environment variables and logs when it's being used. Good practice to inform users about the proxy configuration.
Remove the unnecessary f prefix from the string that doesn't contain any placeholders:
- logging.info(f"Proxy is set to be used!")
+ logging.info("Proxy is set to be used!")
🧰 Tools
🪛 Ruff (0.8.2)
22-22: f-string without any placeholders
Remove extraneous f prefix (F541)
hivemind_etl/mediawiki/wikiteam_crawler.py (1)
55-57: Added proxy URL to crawling parameters.
The code now correctly appends the proxy URL to the parameters list when provided, enabling proxy support during the crawling process.
Remove the unnecessary f prefix from the string that doesn't contain any placeholders:
- params.append(f"--proxy")
+ params.append("--proxy")
🧰 Tools
🪛 Ruff (0.8.2)
56-56: f-string without any placeholders
Remove extraneous f prefix (F541)
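As a small illustration of why both F541 fixes are safe, an f-string without placeholders evaluates to the same value as the plain literal. The `proxy_url` value and the `message` string below are hypothetical and only show when the prefix is actually needed.

```python
plain = "--proxy"
flagged = f"--proxy"  # Ruff F541: f-string without any placeholders

# Both literals evaluate to the same string, so dropping the prefix changes nothing.
assert plain == flagged

# The prefix is only needed when a value is interpolated.
proxy_url = "http://proxy.example:8080"  # hypothetical value for illustration
message = f"Proxy is set to {proxy_url}"
```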
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- .env.example (1 hunks)
- hivemind_etl/mediawiki/etl.py (2 hunks)
- hivemind_etl/mediawiki/wikiteam_crawler.py (3 hunks)
- requirements.txt (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
hivemind_etl/mediawiki/etl.py (1)
hivemind_etl/mediawiki/wikiteam_crawler.py (1)
WikiteamCrawler(7-91)
🪛 Ruff (0.8.2)
hivemind_etl/mediawiki/etl.py
22-22: f-string without any placeholders
Remove extraneous f prefix
(F541)
hivemind_etl/mediawiki/wikiteam_crawler.py
56-56: f-string without any placeholders
Remove extraneous f prefix
(F541)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: ci / lint / Lint
- GitHub Check: ci / test / Test
🔇 Additional comments (5)
.env.example (1)
31-31: Environment variable added for proxy support.
The addition of MEDIAWIKI_PROXY_URL allows configuration of proxy settings for the MediaWiki ETL process. Documenting the new configuration option in the example environment file is good practice.
hivemind_etl/mediawiki/etl.py (2)
2-2: Added os module import for environment variable access.
The os module is imported to access the MEDIAWIKI_PROXY_URL environment variable, which is necessary for the proxy functionality.
24-26: WikiteamCrawler now initialized with proxy support.
The WikiteamCrawler is now correctly initialized with the proxy URL parameter, enabling proxy support during the crawling process.
hivemind_etl/mediawiki/wikiteam_crawler.py (2)
15-15: Added proxy_url parameter to constructor.
The proxy_url parameter has been added to the WikiteamCrawler constructor with a sensible default of an empty string, making the proxy support optional.
24-24: Proxy URL stored as instance variable.
The proxy_url is stored as an instance variable to be used in the crawling process, maintaining consistent access to this parameter throughout the class.
requirements.txt
  motor>=3.6, <4.0.0
  tc-temporal-backend==1.0.0
- wikiteam3==4.4.1
+ wikiteam3-fork-proxy==1.0.0
💡 Verification agent
🧩 Analysis chain
Dependency updated to support proxy functionality.
The library has been changed from wikiteam3==4.4.1 to wikiteam3-fork-proxy==1.0.0, which appears to be a fork with added proxy support capabilities.
🏁 Script executed:
#!/bin/bash
# Verify the availability and metadata of the new package
pip show wikiteam3-fork-proxy 2>/dev/null || echo "Package not found in PyPI"
# Try to find information about the package repository
pip install yolk3k -q 2>/dev/null
yolk -M wikiteam3-fork-proxy 2>/dev/null || echo "No metadata available"
Length of output: 200
Dependency Update Verification: Potential Source Configuration Issue
The dependency in requirements.txt (line 11):
wikiteam3-fork-proxy==1.0.0
has been updated to support proxy functionality. However, our verification revealed that the package is not available on PyPI and no metadata could be retrieved. Please confirm if this forked dependency is intended to be installed from a non-PyPI source (such as a custom repository or a direct VCS reference). If so, consider updating the dependency specification (or the installation instructions) accordingly to ensure that users can correctly obtain the package.
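For a check from inside Python, a minimal sketch using the standard library is shown below. Like `pip show`, it only inspects distributions installed in the current environment and does not query PyPI, so a missing result does not by itself prove the package is unpublished. The helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None
```

Calling `installed_version("wikiteam3-fork-proxy")` after running `pip install -r requirements.txt` would be one way to confirm that the pinned dependency resolved.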