Add metadata quality checker tool #5278
base: main
Conversation
👋 Hi @Preetam-77! This pull request needs a peer review before it can be merged. Please request a review from a team member who is not:

Once a valid peer review is submitted, this check will pass automatically. Thank you!
Walkthrough

Introduces a new Metadata Quality Checker Tool for OWASP projects. A Python CLI system analyzes project metadata files, validates them against defined rules, computes numeric quality scores, and reports findings. The tool comprises a main orchestrator, validation rules engine, scoring logic, sample data, and documentation.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant User as User/CLI
    participant Main as checker.main()
    participant Loader as load_metadata()
    participant Rules as check_rules()
    participant Score as calculate_score()<br/>& get_status()
    participant Output as Report Output
    User->>Main: Execute script
    Main->>Loader: Load metadata file
    Loader-->>Main: Parsed projects[]
    loop For each project
        Main->>Rules: Validate project metadata
        Rules-->>Main: Issues list
        Main->>Score: Calculate score from project
        Score-->>Main: Numeric score (0-100)
        Main->>Score: Get status for score
        Score-->>Main: Status (good/needs improvement/poor)
        Main->>Output: Print project report<br/>(name, score + status, issues)
    end
    Output-->>User: Complete quality report
```
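Read end to end, the diagram corresponds to a small orchestration loop. Below is a minimal self-contained sketch of that flow; the rule set, score weights, status thresholds, and the `pitch` field are illustrative assumptions, not the PR's actual code:

```python
# Hypothetical sketch of the flow in the diagram above. The rules, weights,
# and thresholds below are illustrative assumptions, not the PR's real logic.
import json
import sys
from pathlib import Path


def load_metadata(path: Path) -> list[dict]:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def check_rules(project: dict) -> list[str]:
    issues = []
    if not project.get("tags"):
        issues.append("Missing tags")
    if len(project.get("pitch", "")) < 20:  # "pitch" field assumed
        issues.append("Pitch too short")
    return issues


def calculate_score(project: dict) -> int:
    # One illustrative weighting: start at 100, subtract 25 per finding.
    return max(0, 100 - 25 * len(check_rules(project)))


def get_status(score: int) -> str:
    if score >= 80:
        return "good"
    if score >= 50:
        return "needs improvement"
    return "poor"


def main() -> None:
    path = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("sample_metadata.json")
    for project in load_metadata(path):
        score = calculate_score(project)
        print(f"{project.get('name', '<unnamed>')}: {score}/100 ({get_status(score)})")
        for issue in check_rules(project):
            print(f"  - {issue}")


if __name__ == "__main__":
    main()
```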
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
❌ Pre-commit checks failed

The pre-commit hooks found issues that need to be fixed. Please run the following commands locally to fix them:

```bash
# Install pre-commit if you haven't already
pip install pre-commit

# Run pre-commit on all files
pre-commit run --all-files

# Or run pre-commit on staged files only
pre-commit run
```

After running these commands, the pre-commit hooks will automatically fix most issues.

💡 Tip: You can set up pre-commit to run automatically on every commit by running:

```bash
pre-commit install
```

Pre-commit output

For more information, see the pre-commit documentation.
❌ Tests failed

The Django tests found issues that need to be fixed. Please review the test output below and fix the failing tests.

How to run tests locally:

```bash
# Install dependencies
poetry install --with dev

# Run all tests
poetry run python manage.py test

# Run tests with verbose output
poetry run python manage.py test -v 3

# Run a specific test
poetry run python manage.py test app.tests.TestClass.test_method
```

Test output (last 100 lines)

For more information, see the Django testing documentation.
Actionable comments posted: 3
🧹 Nitpick comments (1)
metadata_quality_checker/score.py (1)
7-8: Inconsistent dictionary access style.

Line 7 uses `project.get("tags")` for the type check but then switches to `project["tags"]` for length. While functionally safe due to short-circuit evaluation, consider using consistent `.get()` access for readability:

```diff
- if isinstance(project.get("tags"), list) and len(project["tags"]) >= 2:
+ if isinstance(project.get("tags"), list) and len(project.get("tags", [])) >= 2:
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (5)
- metadata_quality_checker/README.md (1 hunks)
- metadata_quality_checker/checker.py (1 hunks)
- metadata_quality_checker/rules.py (1 hunks)
- metadata_quality_checker/sample_metadata.json (1 hunks)
- metadata_quality_checker/score.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
metadata_quality_checker/checker.py (2)
- metadata_quality_checker/rules.py (1)
  - check_rules (4-38)
- metadata_quality_checker/score.py (2)
  - calculate_score (1-23)
  - get_status (26-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run Tests
- GitHub Check: docker-test
🔇 Additional comments (4)
metadata_quality_checker/sample_metadata.json (1)
1-20: Sample data effectively demonstrates validation.

The two projects provide good contrast: one with multiple quality issues (empty tags, short pitch, inactive) and one with complete metadata. This effectively showcases the tool's validation capabilities.
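The sample file itself isn't visible in this view; a hypothetical shape consistent with the described contrast might look like the following (all field names and values here are illustrative assumptions):

```python
# Hypothetical data mirroring the contrast described above; not the PR's
# actual sample_metadata.json.
sample_projects = [
    {   # flagged: empty tags, short pitch, stale last_commit
        "name": "Project Alpha",
        "tags": [],
        "pitch": "Too short",
        "last_commit": "2023-01-15T00:00:00",
    },
    {   # complete metadata, should score well
        "name": "Project Beta",
        "tags": ["security", "tooling"],
        "pitch": "A complete, descriptive pitch for an actively maintained project.",
        "last_commit": "2025-11-20T10:00:00",
    },
]
```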
metadata_quality_checker/score.py (1)
26-31: Status mapping logic is clear and correct.

The thresholds appropriately categorize metadata quality into three tiers.
metadata_quality_checker/checker.py (2)
24-43: Main orchestration logic is clear and correct.

The function properly coordinates the validation, scoring, and reporting workflow. The output formatting with emojis makes the results easy to read.
46-47: Correct entry point implementation.
```python
def load_metadata():
    # If user passes a file path: python checker.py file.json
    if len(sys.argv) > 1:
        metadata_path = Path(sys.argv[1])
    else:
        # Default to sample file
        metadata_path = Path(__file__).parent / "sample_metadata.json"

    if not metadata_path.exists():
        print(f"❌ Metadata file not found: {metadata_path}")
        sys.exit(1)

    with open(metadata_path, "r", encoding="utf-8") as f:
        return json.load(f)
```
Add JSON error handling and clarify CLI interface.

Issues identified:

- Discrepancy with PR objectives: The PR description states "Run: `python checker.py --file sample_metadata.json`", suggesting a `--file` flag, but the implementation uses a positional argument. Consider adding proper argument parsing with argparse:

```python
import argparse

def load_metadata():
    parser = argparse.ArgumentParser(description="OWASP Metadata Quality Checker")
    parser.add_argument("--file", type=str, help="Path to metadata JSON file")
    args = parser.parse_args()
    if args.file:
        metadata_path = Path(args.file)
    else:
        metadata_path = Path(__file__).parent / "sample_metadata.json"
    # ... rest of logic
```

- Missing JSON error handling: Malformed JSON will crash with an unhandled `json.JSONDecodeError`:

```diff
     with open(metadata_path, "r", encoding="utf-8") as f:
-        return json.load(f)
+        try:
+            data = json.load(f)
+            if not isinstance(data, list):
+                print(f"❌ Metadata file must contain a JSON array")
+                sys.exit(1)
+            return data
+        except json.JSONDecodeError as e:
+            print(f"❌ Invalid JSON in metadata file: {e}")
+            sys.exit(1)
```

🤖 Prompt for AI Agents
In metadata_quality_checker/checker.py around lines 8 to 21, replace the
positional-arg logic with argparse to accept a --file flag (use args.file if
provided, otherwise default to sample_metadata.json), keep the existence check,
and add JSON error handling: when opening/reading the file, catch
json.JSONDecodeError (and optionally OSError) to print a clear error message
including the file path and the JSON error, then sys.exit(1); ensure behavior
and exit codes remain consistent when the file is missing or invalid.
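Folding both suggestions together, a consolidated `load_metadata()` might read as follows; this is a sketch combining the review's two diffs, and the error messages are illustrative:

```python
import argparse
import json
import sys
from pathlib import Path


def load_metadata():
    # --file flag per the PR description, defaulting to the bundled sample
    parser = argparse.ArgumentParser(description="OWASP Metadata Quality Checker")
    parser.add_argument("--file", type=str, help="Path to metadata JSON file")
    args = parser.parse_args()

    metadata_path = (
        Path(args.file) if args.file else Path(__file__).parent / "sample_metadata.json"
    )

    if not metadata_path.exists():
        print(f"❌ Metadata file not found: {metadata_path}")
        sys.exit(1)

    try:
        with open(metadata_path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except json.JSONDecodeError as e:
        print(f"❌ Invalid JSON in metadata file: {e}")
        sys.exit(1)

    if not isinstance(data, list):
        print("❌ Metadata file must contain a JSON array")
        sys.exit(1)
    return data
```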
````markdown
## Usage

```bash
python checker.py
```
````
🛠️ Refactor suggestion | 🟠 Major
Usage documentation is incomplete.
The usage section doesn't document the CLI argument for specifying a custom metadata file. Based on the implementation in checker.py (lines 10-11), update the documentation to show:

````markdown
## Usage

```bash
# Use the default sample file
python checker.py

# Analyze a custom metadata file
python checker.py path/to/metadata.json
```
````

Note: The PR objectives mention a `--file` flag, but the implementation uses a positional argument. Consider whether the implementation should be updated to match the documented behavior.
🤖 Prompt for AI Agents
In metadata_quality_checker/README.md around lines 14 to 17, the Usage section
omits the CLI positional argument for specifying a custom metadata file and
mismatches the PR note about a --file flag; update the README to show both
examples: running with the default sample (python checker.py) and running with a
custom metadata file path (python checker.py path/to/metadata.json), and add a
short note that the script currently accepts a positional file argument (not a
--file flag) so maintainers can decide whether to change the implementation to
accept a --file/--path option instead of a positional argument.
```python
last_commit = project.get("last_commit")
if last_commit:
    try:
        commit_date = datetime.fromisoformat(last_commit)
        if commit_date < datetime.now() - timedelta(days=365):
            issues.append("Project inactive (no commits in last 12 months)")
    except ValueError:
        issues.append("Invalid last_commit date format")
else:
    issues.append("Missing activity data")
```
Timezone-aware datetime comparison may cause TypeError.
Line 31 compares commit_date with datetime.now(), which is timezone-naive. If last_commit contains timezone information (e.g., "2025-11-20T10:00:00+00:00"), datetime.fromisoformat() returns a timezone-aware datetime, causing a TypeError when compared with the naive datetime.now().
Apply this diff to handle both timezone-aware and naive datetimes:
```diff
     try:
         commit_date = datetime.fromisoformat(last_commit)
+        # Make comparison timezone-aware if needed
+        now = datetime.now(commit_date.tzinfo) if commit_date.tzinfo else datetime.now()
-        if commit_date < datetime.now() - timedelta(days=365):
+        if commit_date < now - timedelta(days=365):
             issues.append("Project inactive (no commits in last 12 months)")
```

Alternatively, standardize on UTC:

```diff
+from datetime import datetime, timedelta, timezone
 ...
     try:
         commit_date = datetime.fromisoformat(last_commit)
+        # Convert to UTC for comparison
+        if commit_date.tzinfo is None:
+            commit_date = commit_date.replace(tzinfo=timezone.utc)
-        if commit_date < datetime.now() - timedelta(days=365):
+        if commit_date < datetime.now(timezone.utc) - timedelta(days=365):
             issues.append("Project inactive (no commits in last 12 months)")
```
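The failure mode is easy to reproduce in isolation. A minimal repro of the TypeError and the UTC normalization fix (dates here are illustrative):

```python
from datetime import datetime, timedelta, timezone

aware = datetime.fromisoformat("2025-11-20T10:00:00+00:00")  # timezone-aware
naive_now = datetime.now()                                   # timezone-naive

try:
    aware < naive_now - timedelta(days=365)
except TypeError as exc:
    # "can't compare offset-naive and offset-aware datetimes"
    print(f"TypeError: {exc}")

# Fix: normalize both sides to UTC before comparing
commit_date = aware if aware.tzinfo else aware.replace(tzinfo=timezone.utc)
is_stale = commit_date < datetime.now(timezone.utc) - timedelta(days=365)
print(f"inactive: {is_stale}")
```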
please move this to the OWASP-metadata project
Summary
This PR introduces a Metadata Quality Checker Tool under the Tools/metadata_quality_checker/ folder.

The tool is designed to analyze and validate OWASP project metadata for completeness, consistency, and quality.
Features
Motivation
High-quality metadata is crucial for the OWASP Metadata Aggregation Project to provide accurate recommendations and analytics.
This tool helps contributors and maintainers ensure metadata is complete and standardized.
Files Added
- Tools/metadata_quality_checker/checker.py
- Tools/metadata_quality_checker/rules.py
- Tools/metadata_quality_checker/score.py
- Tools/metadata_quality_checker/sample_metadata.json
- Tools/metadata_quality_checker/README.md

How to Test
Tools/metadata_quality_checker/