fix: Added incomplete file removal! #25
Conversation
So that we don't face errors when the crawling is interrupted!
Actionable comments posted: 0
🧹 Nitpick comments (1)
hivemind_etl/mediawiki/etl.py (1)
37-44: Good addition of error handling for incomplete file cleanup.

The added try-except block properly handles errors during the crawling process and cleans up any incomplete files that might be left behind. This is a good defensive programming practice.
A few suggestions to consider:
- You might want to catch more specific exceptions if you know what types of errors can occur during crawling, rather than catching all exceptions with `Exception`.
- Consider adding a debug log before deleting the directory that includes the path being removed, for easier troubleshooting.
```diff
         try:
             self.wikiteam_crawler.crawl(api_url, dump_dir)
-        except Exception as e:
+        except (IOError, ConnectionError, Exception) as e:  # Be more specific if possible
             logging.error(f"Error crawling {api_url}: {e}")
             logging.warning("Removing incomplete dumped data if available!")
             if os.path.exists(dump_dir):
+                logging.debug(f"Removing directory: {dump_dir}")
                 shutil.rmtree(dump_dir)
             raise e
```
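The cleanup-on-failure pattern suggested above can be sketched as a standalone example. The `crawl_with_cleanup` helper and the `failing_crawl` stub below are hypothetical illustrations, not code from this PR; the point is that a partial dump directory is removed before the error is re-raised:

```python
import logging
import os
import shutil
import tempfile

def crawl_with_cleanup(crawl, api_url: str, dump_dir: str) -> None:
    """Run `crawl`, removing any partially written dump_dir on failure."""
    try:
        crawl(api_url, dump_dir)
    except (OSError, ConnectionError) as e:
        # Catch the specific errors expected from I/O and network failures.
        logging.error(f"Error crawling {api_url}: {e}")
        logging.warning("Removing incomplete dumped data if available!")
        if os.path.exists(dump_dir):
            # Debug log of the path being removed, per the review suggestion.
            logging.debug(f"Removing directory: {dump_dir}")
            shutil.rmtree(dump_dir)
        raise

# Hypothetical crawler stub: writes a partial file, then fails mid-dump.
def failing_crawl(api_url: str, dump_dir: str) -> None:
    os.makedirs(dump_dir, exist_ok=True)
    with open(os.path.join(dump_dir, "partial.xml"), "w") as f:
        f.write("<incomplete>")
    raise ConnectionError("network dropped mid-dump")

dump_dir = os.path.join(tempfile.mkdtemp(), "dump")
try:
    crawl_with_cleanup(failing_crawl, "https://wiki.example.org/api.php", dump_dir)
except ConnectionError:
    pass
# The incomplete dump directory no longer exists after cleanup.
print(os.path.exists(dump_dir))
```

Re-raising with a bare `raise` (rather than `raise e`) also preserves the original traceback, which keeps the failure easy to diagnose upstream.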
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
hivemind_etl/mediawiki/etl.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
hivemind_etl/mediawiki/etl.py (1)
hivemind_etl/mediawiki/wikiteam_crawler.py (1)
crawl (26-72)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: ci / test / Test
- GitHub Check: ci / lint / Lint