Implement 10-Year GSOC PR Activity Report Aggregated Across Multiple OWASP Repositories #5230
base: main
Conversation
👋 Hi @Nachiket-Roy! This pull request needs a peer review before it can be merged. Please request a review from a team member who is not:

Once a valid peer review is submitted, this check will pass automatically. Thank you!
Walkthrough

Adds a GSOC PR reporting feature: a new URL and Django view that aggregates merged PRs by year/repo, a template embedding escaped JSON, client-side JS to parse/render/export charts and reports, navigation links, and DB indexes plus a migration to support PR queries.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant Browser
    participant DjangoView as Django View (gsoc_pr_report)
    participant Database
    participant TemplateEngine as Template Engine
    participant ClientJS as Client-side JS
    User->>Browser: Navigate to /gsoc/pr-report/
    Browser->>DjangoView: HTTP GET /gsoc/pr-report/
    DjangoView->>Database: Query merged PR records (GitHubIssue)
    Database-->>DjangoView: PR rows
    DjangoView->>DjangoView: Aggregate yearly & repo stats, build JSON payload
    DjangoView->>TemplateEngine: Render gsoc_pr_report.html (embed escaped JSON)
    TemplateEngine-->>Browser: HTML with embedded data + script tag
    Browser->>ClientJS: Load & execute gsoc_pr_report.js
    ClientJS->>ClientJS: Parse embedded JSON into window.gsocData (fallbacks if needed)
    ClientJS->>ClientJS: Normalize data and initialize ApexCharts (bar + donut)
    User-->>Browser: Trigger export/download/view raw actions
    Browser->>ClientJS: Export actions (JSON/PDF/HTML) using dynamic script loading
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 4
🧹 Nitpick comments (3)
website/views/project.py (3)
2234-2234: Move the import to the module level.

The `defaultdict` import should be at the top of the file with other imports rather than inside the function. Apply this diff.

At the top of the file (around lines 1-11), add:

```diff
+from collections import defaultdict
 import concurrent.futures
 import ipaddress
 import json
```

Then remove the import from inside the function:

```diff
-    from collections import defaultdict
     repo_totals = defaultdict(int)
```
2256-2269: Remove duplicate context data.

The context contains duplicate JSON-encoded data:

- `yearly_chart_data_json` and `yearly_contribution_data_json` both encode the same `yearly_chart_data` list (lines 2264-2265)
- Both `report_data` and `gsoc_data_json` contain the same information in different formats

This creates unnecessary data duplication and may cause confusion. Apply this diff to remove the duplicate:

```diff
 context = {
     "report_data": report_data,
     "start_year": start_year,
     "end_year": current_year,
     "total_years": total_years,
     "total_repos": total_repos,
     "total_prs": total_prs,
     "summary_data": summary_data,
     "yearly_chart_data_json": json.dumps(yearly_chart_data),
-    "yearly_contribution_data_json": json.dumps(yearly_chart_data),
     "gsoc_data_json": json.dumps(report_data),
     "avg_prs_per_year": avg_prs_per_year,
     "top_repos_chart_data_json": json.dumps(top_repos_chart_data),
 }
```

Then update the template to use `yearly_chart_data_json` instead of `yearly_contribution_data_json`.
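A minimal sketch of the deduplicated context construction. The variable names follow the snippet above, but the data values here are hypothetical stand-ins for the view's real aggregates:

```python
import json

# Hypothetical stand-ins for the view's real aggregated values.
yearly_chart_data = [{"year": 2023, "prs": 120}, {"year": 2024, "prs": 150}]
report_data = [{"year": 2023, "repos": []}, {"year": 2024, "repos": []}]

# Serialize each payload exactly once; the template and JS then share
# a single key per dataset instead of two keys holding identical JSON.
context = {
    "report_data": report_data,
    "yearly_chart_data_json": json.dumps(yearly_chart_data),
    "gsoc_data_json": json.dumps(report_data),
}
```

Each dataset round-trips through a single serialized key, so the template and the chart JS cannot drift apart.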
2197-2271: Add error handling for database queries and JSON serialization.

The function lacks error handling for database operations and JSON serialization. If queries fail or data cannot be serialized, users will see a generic 500 error instead of a helpful message. Consider wrapping the main logic in a try-except block:

```python
def gsoc_pr_report(request):
    try:
        current_year = datetime.now().year
        start_year = current_year - 10
        # ... rest of the logic ...
        return render(request, "projects/gsoc_pr_report.html", context)
    except GitHubIssue.DoesNotExist:
        messages.error(request, "No PR data available for GSOC analysis.")
        return redirect("project_list")
    except Exception as e:
        logger.error(f"Error generating GSOC PR report: {e}")
        messages.error(request, "An error occurred while generating the report. Please try again later.")
        return redirect("project_list")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (3)
- `blt/urls.py` (2 hunks)
- `website/templates/projects/gsoc_pr_report.html` (1 hunks)
- `website/views/project.py` (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
blt/urls.py (1)
website/views/project.py (1)
gsoc_pr_report(2197-2271)
website/views/project.py (1)
website/models.py (1)
GitHubIssue(2089-2315)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: auto-assign
- GitHub Check: add_changes_requested_label
- GitHub Check: check_peer_review
- GitHub Check: Run Tests
- GitHub Check: Run pre-commit
- GitHub Check: docker-test
🔇 Additional comments (5)
blt/urls.py (2)
307-307: LGTM! The import is correctly placed and the function exists in the target module.

1244-1244: Verify that public access to the GSOC PR report endpoint is intentional. This URL pattern does not include authentication decorators. If `gsoc_pr_report` lacks authentication at the view level and should be restricted to authenticated users, add the `@login_required` decorator or appropriate permission checks.

website/views/project.py (1)

47-47: LGTM! The `GitHubIssue` import is correctly placed and necessary for the new GSOC PR report functionality.
website/templates/projects/gsoc_pr_report.html (2)
338-353: LGTM! The data parsing logic includes proper error handling with try-catch, which prevents the page from breaking if JSON parsing fails.

384-417: LGTM! The chart initialization includes proper guards to only render when data exists, preventing JavaScript errors on empty datasets.
❌ Tests failed

The Django tests found issues that need to be fixed. Please review the test output below and fix the failing tests.

How to run tests locally:

```shell
# Install dependencies
poetry install --with dev

# Run all tests
poetry run python manage.py test

# Run tests with verbose output
poetry run python manage.py test -v 3

# Run a specific test
poetry run python manage.py test app.tests.TestClass.test_method
```

Test output (last 100 lines):

For more information, see the Django testing documentation.
Actionable comments posted: 4
🧹 Nitpick comments (2)
website/views/project.py (2)
2212-2219: Consider filtering out PRs without associated repositories.

The `GitHubIssue.repo` field is nullable. PRs without an associated repo will result in `repo__name` and `repo__repo_url` being `None`, which could cause issues in the template when rendering links or aggregating by repo name. Add a filter to exclude issues without repos:

```diff
 repos_with_prs = (
     GitHubIssue.objects.filter(
-        type="pull_request", is_merged=True, merged_at__gte=start_date, merged_at__lte=end_date
+        type="pull_request", is_merged=True, merged_at__gte=start_date, merged_at__lte=end_date,
+        repo__isnull=False
     )
     .values("repo__name", "repo__repo_url")
     .annotate(pr_count=Count("id"), unique_contributors=Count("contributor", distinct=True))
     .order_by("-pr_count")
 )
```
2208-2228: Consider single-query optimization if performance becomes a concern.

The current implementation executes one query per year (11 queries total). This is acceptable for a low-traffic report page, but if performance becomes an issue, consider fetching all data in a single query using `ExtractYear` and grouping in Python.
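A sketch of the grouping half of that single-query approach. The ORM annotation (`ExtractYear` from `django.db.models.functions`) is shown only in the comment; the dicts below are hypothetical stand-ins for the rows such a query would return, so the grouping logic itself is runnable:

```python
from collections import defaultdict

# Rows as they would come back from a single annotated query, e.g.:
#   GitHubIssue.objects.filter(...)
#       .annotate(year=ExtractYear("merged_at"))
#       .values("year", "repo__name")
#       .annotate(pr_count=Count("id"))
# (ORM call not executed here; these dicts stand in for its output.)
rows = [
    {"year": 2023, "repo__name": "BLT", "pr_count": 40},
    {"year": 2023, "repo__name": "BLT-Flutter", "pr_count": 10},
    {"year": 2024, "repo__name": "BLT", "pr_count": 55},
]

# Group in Python: one pass over the rows, no per-year queries.
by_year = defaultdict(list)
for row in rows:
    by_year[row["year"]].append(row)

yearly_totals = {year: sum(r["pr_count"] for r in items) for year, items in by_year.items()}
```

This replaces the 11 per-year queries with one database round trip plus an O(n) pass in Python.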
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (2)
- `website/templates/projects/gsoc_pr_report.html` (1 hunks)
- `website/views/project.py` (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
website/views/project.py (1)
website/models.py (3)
- `GitHubIssue` (2089-2315)
- `Repo` (1903-1980)
- `UserProfile` (896-1109)
🔇 Additional comments (5)
website/views/project.py (2)
4-4: LGTM! Import additions are appropriate for the new GSOC PR report functionality.
Also applies to: 9-9, 49-49, 57-57
2279-2282: LGTM! Error handling appropriately logs the error with context, provides a user-friendly message, and redirects to a safe fallback page.
website/templates/projects/gsoc_pr_report.html (3)
7-12: LGTM! Data container correctly uses the `escapejs` filter on pre-serialized JSON data from the view.

360-393: LGTM! Chart initialization correctly checks for data existence before rendering and uses appropriate ApexCharts configurations.

396-411: LGTM! Export function correctly creates a downloadable JSON file with a timestamp in the filename.
@coderabbitai Other than the current fixes, could you please suggest what else remains to be done in this PR (charts, filters, pagination, etc.)?
Does this PR need indexes for better performance, @coderabbitai?
Tip: For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed. Length of output: 28478

@Nachiket-Roy Yes, this PR definitely needs database indexes for optimal performance! 🎯

🔍 Current Performance Issues

The report view filters on fields that currently have no index:

```python
GitHubIssue.objects.filter(
    type="pull_request",        # ❌ No index
    is_merged=True,             # ❌ No index
    merged_at__gte=start_date,  # ❌ No index
    merged_at__lte=end_date,    # ❌ No index
)
```

Without indexes, Django will perform full table scans on every query, which becomes extremely slow as data grows.

🚀 Recommended Indexes

1. Composite Index (Highest Priority) ⭐

Create a composite index covering all three filter fields:

```python
# In website/models.py, add to GitHubIssue.Meta:
class GitHubIssue(models.Model):
    # ... existing fields ...

    class Meta:
        unique_together = ("issue_id", "repo")
        indexes = [
            # Composite index for GSOC PR report queries
            models.Index(
                fields=["type", "is_merged", "merged_at"],
                name="githubissue_pr_merged_idx",
            ),
            # Additional index for date range queries
            models.Index(
                fields=["merged_at"],
                name="githubissue_merged_date_idx",
            ),
        ]
```

Why this works: the equality-filtered columns (`type`, `is_merged`) lead the composite index and the range-filtered column (`merged_at`) comes last, matching how B-tree indexes are scanned, so the whole WHERE clause can be satisfied from the index.
2. Create Database Migration

Generate and apply the migration:

```shell
# Generate migration
python manage.py makemigrations website --name add_githubissue_indexes

# Review the migration file (should look like below)
cat website/migrations/XXXX_add_githubissue_indexes.py

# Apply migration
python manage.py migrate website
```

Expected migration content:

```python
from django.db import migrations, models


class Migration(migrations.Migration):
    dependencies = [
        ("website", "XXXX_previous_migration"),
    ]

    operations = [
        migrations.AddIndex(
            model_name="githubissue",
            index=models.Index(
                fields=["type", "is_merged", "merged_at"],
                name="githubissue_pr_merged_idx",
            ),
        ),
        migrations.AddIndex(
            model_name="githubissue",
            index=models.Index(
                fields=["merged_at"],
                name="githubissue_merged_date_idx",
            ),
        ),
    ]
```

📊 Performance Impact

Expected improvement: 10-100x faster depending on data volume! 🚀

🛠️ Additional Optimization

Add `select_related` to the query:

```python
# In website/views/project.py, update the query:
repos_qs = (
    GitHubIssue.objects
    .select_related('repo', 'contributor')  # ← Add this line
    .filter(
        type="pull_request",
        is_merged=True,
        merged_at__gte=start_date,
        merged_at__lte=end_date,
    )
    # ... rest of query
)
```

This reduces the number of queries from N+1 to 1 by fetching related objects in a single JOIN.

✅ Testing Strategy

1. Test without data:

```shell
python manage.py migrate
python manage.py test website.tests  # Ensure no regressions
```

2. Test with sample data (check query performance in the Django shell):

```python
# python manage.py shell
from website.models import GitHubIssue
from django.db import connection
from django.test.utils import CaptureQueriesContext

with CaptureQueriesContext(connection) as queries:
    # Run your query
    results = GitHubIssue.objects.filter(
        type="pull_request",
        is_merged=True,
        merged_at__gte="2024-05-01",
        merged_at__lte="2024-10-01",
    ).count()

# Check queries
for q in queries:
    print(q['sql'])
    print(f"Time: {q['time']}s\n")
```

3. Test in staging.

📋 Summary

🎯 Recommendation: include the indexes in this PR. The migration is safe and can be applied without downtime on most databases.

Great work on the new chart UI! 🎨 The indexes will ensure it stays fast even with years of data! 🚀
Actionable comments posted: 1
♻️ Duplicate comments (1)
website/views/project.py (1)
2207-2229: Fix the GSOC end boundary to a true half-open interval (`__lt`, not `__lte`).

Right now a merge exactly at Oct 1st 00:00:00 would be included.

```diff
  end_date = timezone.make_aware(datetime(year, 10, 1))
  ...
- merged_at__gte=start_date,
- merged_at__lte=end_date,
+ merged_at__gte=start_date,
+ merged_at__lt=end_date,
```
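The boundary semantics can be checked with plain datetimes. This is a toy illustration of the half-open `[start, end)` window, not the view's actual query:

```python
from datetime import datetime

start = datetime(2024, 5, 1)
end = datetime(2024, 10, 1)

def in_gsoc_window(merged_at):
    # Half-open interval [start, end): mirrors __gte / __lt semantics.
    return start <= merged_at < end

# A merge exactly at Oct 1st 00:00:00 is excluded...
boundary = in_gsoc_window(datetime(2024, 10, 1))
# ...while the last second of Sep 30 is still included.
last_sep = in_gsoc_window(datetime(2024, 9, 30, 23, 59, 59))
```

With `__lte` the `boundary` case would incorrectly return True, double-counting a PR that belongs to the next reporting window.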
🧹 Nitpick comments (2)
website/models.py (1)
2161-2172: Indexes align well with the GSOC report query shape (equality → range).

The composite `(type, is_merged, merged_at)` index plus the standalone `merged_at` index is a sensible, low-risk performance win for the new reporting view. If you're on Postgres, consider a conditional/partial index (e.g., only for `type='pull_request' AND is_merged=true`) to keep the index smaller and more selective; otherwise the current approach is fine.
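On Postgres, that partial index could be declared in the model's `Meta` via Django's `condition` parameter on `models.Index`. This is a non-runnable configuration fragment, not part of this PR; the index name is hypothetical:

```python
# Model-configuration fragment (sketch); Index and Q are from django.db.models.
from django.db.models import Index, Q

class Meta:
    indexes = [
        # Partial index: only the rows the report actually scans,
        # keeping the index small and highly selective.
        Index(
            fields=["merged_at"],
            condition=Q(type="pull_request", is_merged=True),
            name="githubissue_merged_pr_part_idx",
        ),
    ]
```

Django translates the `condition` into a `CREATE INDEX ... WHERE ...` statement on backends that support partial indexes.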
2212-2235: `select_related(...).values(...)` likely has no effect.

Since you call `.values(...)`, the join/related-object caching from `select_related` won't be used (and it can be removed for clarity).

```diff
 repos_qs = (
-    GitHubIssue.objects.select_related("repo", "contributor")
+    GitHubIssue.objects
     .filter(
         type="pull_request",
         is_merged=True,
         merged_at__gte=start_date,
-        merged_at__lte=end_date,
+        merged_at__lt=end_date,
     )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (4)
- `website/migrations/0262_rename_website_soc_user_id_provid_idx_website_soc_user_id_fb0840_idx_and_more.py` (1 hunks)
- `website/models.py` (1 hunks)
- `website/static/js/gsoc_pr_report.js` (1 hunks)
- `website/views/project.py` (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
website/views/project.py (2)
website/models.py (3)
GitHubIssue(2090-2328)Repo(1904-1981)UserProfile(896-1109)website/utils.py (1)
admin_required(247-248)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run Tests
- GitHub Check: docker-test
🔇 Additional comments (11)
website/migrations/0262_rename_website_soc_user_id_provid_idx_website_soc_user_id_fb0840_idx_and_more.py (2)
19-26: Migration indexes match the model declaration (good).

These `AddIndex` ops correctly reflect the new `GitHubIssue.Meta.indexes`.

14-18: `RenameIndex` can be fragile across environments—please verify it exists everywhere.

If any environment doesn't have `website_soc_user_id_provid_idx`, this migration will fail at deploy time. Please run migrations on a copy of the prod schema / CI, both from a fresh clone and from an existing DB snapshot, to confirm the rename succeeds.
website/static/js/gsoc_pr_report.js (9)
1-14: LGTM! Clean global initialization and error logging.

The global `window.gsocData` provides a clear API surface, and `safeLog` properly replaces console statements to satisfy CI requirements while preserving error tracking via Sentry when available.
16-73: LGTM! Robust data extraction with comprehensive fallbacks.

The function handles multiple data formats gracefully and safely coerces types. The 30-character truncation for repository names (lines 49, 56, 61) prevents chart layout issues while maintaining readability.
75-130: LGTM! Safe JSON parsing with proper error handling.

The escaped JSON parser correctly handles unicode sequences and common escapes. Error handling uses `safeLog` (lines 92, 107), which properly addresses the CI console statement requirement from previous reviews.
132-452: LGTM! Well-structured chart initialization with proper safeguards.

The charts are initialized with comprehensive validation (lines 136-146), dark mode support (lines 165-168, 327-329), and proper error handling (lines 313-316, 443-446). Storing chart instances globally (lines 312, 442) enables the PDF export feature.
454-513: LGTM! Export and fallback chart with proper safeguards.

The JSON export uses the Blob API correctly with cleanup (line 477). The fallback chart function includes the `safeMax` guard (line 486) that addresses the division-by-zero issue from previous reviews, and properly checks for empty arrays (line 482).
516-716: Excellent! All critical PDF generation issues resolved.

This comprehensive implementation addresses all previous review concerns:

- ✅ Null-check for button (lines 521-525)
- ✅ SRI-pinned CDN scripts (lines 531-536) with library validation (537-539)
- ✅ Explicit `String()` coercion for jsPDF (lines 569, 578, 587, 596, 668, 678)
- ✅ Safe filename with 'unknown' fallback (line 698)
- ✅ Error logging via `safeLog` (line 707)
- ✅ Proper button state restoration on error (lines 711-713)

The pagination logic (lines 661-683), chart export with fallbacks (lines 609-650), and multi-page footer handling (lines 687-695) are well-implemented.
718-751: LGTM! Robust script loader with security features.

The dynamic script loader implements best practices: SRI support (lines 729-732), duplicate prevention (lines 721-724), timeout protection (lines 735-737), and proper cleanup in both success and error paths.
754-900: Excellent! HTML report generation is secure and complete.

The `escapeHtml` implementation (lines 754-761) correctly handles all necessary entities. The function is consistently applied throughout the HTML report generation to all dynamic content (lines 797, 798, 803, 807, 811, 815, 838, 839, 859-861), preventing the script injection vulnerabilities identified in previous reviews. The filename construction (line 895) correctly uses raw year values without HTML escaping, as the `download` attribute expects plain text—properly addressing the feedback from prior reviews.
902-913: LGTM! Raw data viewer uses safe DOM manipulation.

The function correctly uses `textContent` (line 911) instead of `document.write`, addressing the injection risk flagged in previous reviews. Popup blocking detection (lines 905-908) provides good user feedback.
Actionable comments posted: 0
🧹 Nitpick comments (2)
website/views/project.py (2)
2200-2315: Good fixes for year span, TZ-awareness, Sep 30 inclusivity, and template-friendly data shape.

One robustness tweak: aggregating and keying by `repo__name` can merge distinct repos that share a name (different owners/orgs), skewing totals and "top repos". Consider keying by a stable unique identifier (e.g., `repo__repo_url` or `repo_id`) while keeping a display label. Example adjustment:

```diff
-    def _repo_key(repo_row):
-        return repo_row.get("repo__name") or "Unknown repo"
+    def _repo_key(repo_row):
+        # Prefer a unique key to avoid conflating repos with the same name across orgs.
+        return repo_row.get("repo__repo_url") or repo_row.get("repo__name") or "Unknown repo"
```
2213-2221: `exclude(merged_at__isnull=True)` is likely redundant.

`merged_at__gte` / `merged_at__lt` already drop NULLs in SQL; you can remove the extra exclude unless you're guarding against ORM/DB edge behavior.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
- `website/views/project.py` (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
website/views/project.py (1)
website/models.py (3)
- `GitHubIssue` (2090-2328)
- `Repo` (1904-1981)
- `UserProfile` (896-1109)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run Tests
- GitHub Check: docker-test
🔇 Additional comments (2)
website/views/project.py (2)
4-10: Module-level logger setup is fine. Also applies to: 57-57.
2212-2225: Bot filtering looks consistent; keep an eye on false positives from `name__icontains="bot"`.

If you see legitimate contributors excluded (e.g., a username contains "bot" but isn't a bot), consider tightening to `iendswith="[bot]"` plus `contributor_type` (and/or maintain an allowlist).
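A toy illustration of why the substring match over-matches and how the suffix match tightens it. These helpers are hypothetical pure-Python mirrors of the two ORM lookups, not code from the PR:

```python
def is_bot_loose(username):
    # Mirrors name__icontains="bot": substring match anywhere in the name.
    return "bot" in username.lower()

def is_bot_strict(username):
    # Mirrors name__iendswith="[bot]": GitHub's app-account naming convention.
    return username.lower().endswith("[bot]")

# "Abbott" is a plausible human contributor name...
loose = is_bot_loose("Abbott")          # flagged as a bot (false positive)
strict = is_bot_strict("Abbott")        # correctly kept as a human
app = is_bot_strict("dependabot[bot]")  # correctly excluded as a bot
```

The strict check keeps human contributors whose names merely contain "bot" while still excluding GitHub App accounts, which always carry the `[bot]` suffix.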
❌ Tests failed

The Django tests found issues that need to be fixed. Please review the test output and fix the failing tests. (Local test commands are the same as in the earlier failure notice.) For more information, see the Django testing documentation.
Tip: For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed. Length of output: 221846

@Pritz395 Great question! After investigating the test suite and codebase, here's what I found:

✅ Tests Are Passing Because the Refactor Was Completed

🔍 Test File Analysis

Test methods now include key indicators that the refactor is complete:

```python
# Tests mock the REST API endpoint (not the Search API)
@patch("website.management.commands.fetch_gsoc_prs.requests.get")
def test_fetch_gsoc_prs_rest_api_format(self, mock_get, mock_sleep):
    # Returns a LIST of PRs (REST API format), not {"items": [...]} (Search API)
    mock_response.json.return_value = [
        {
            "number": 123,
            "title": "Test PR",
            "merged_at": "2024-01-01T00:00:00Z",
            # REST API fields...
        }
    ]
```

The tests expect the REST API response format (an array of PRs), not the Search API format.

🔍 About the "Out of Scope" Reference

I searched the entire codebase for this reference:
Result: ❌ No matches found in the current codebase or this PR's comment chain. Possible Explanations:
📋 Summary
🎯 Current Status

This PR (#5230):
The refactor work (completed separately):
The strategy of splitting the refactor into a separate PR (as recommended) worked perfectly! Both pieces are now complete and tests are green. 🎉 Does this clear up the confusion, or were you thinking of a specific test/issue that you'd like me to investigate further?
Actionable comments posted: 1
🧹 Nitpick comments (1)
website/views/project.py (1)
2212-2243: Consider excluding PRs where repo is NULL.

The query aggregates by `repo__name` and `repo__repo_url`, but if the `repo` foreign key is NULL, both fields will be NULL and all such PRs will later be grouped under "Unknown repo" (line 2252). This could inflate counts or create misleading entries in the report. Add a filter to exclude PRs without a repo:

```diff
 repos_qs = (
     GitHubIssue.objects.filter(
         type="pull_request",
         is_merged=True,
         merged_at__gte=start_date,
         merged_at__lt=end_date,
     )
+    .exclude(repo__isnull=True)
     .exclude(merged_at__isnull=True)
```

Alternatively, if PRs without repos should be included, consider adding a separate count or handling them explicitly in the UI.
Note: The timezone-aware datetime handling, bot filtering, and year-range logic all look correct and address past review comments. ✓
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
website/views/project.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
website/views/project.py (1)
website/models.py (2)
- `GitHubIssue` (2090-2328)
- `Repo` (1904-1981)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run Tests
- GitHub Check: docker-test
🔇 Additional comments (3)
website/views/project.py (3)
4-4: LGTM – Imports and logger setup. The new imports (logging, defaultdict, GitHubIssue) and the logger instance are appropriate for the GSOC PR reporting feature.
Also applies to: 9-9, 49-49, 57-57
2287-2312: Data structure and context look good.

The code correctly addresses past review comments:

- ✓ Provides `gsoc_data` as a dict keyed by year (lines 2288-2297) for the template's `.items` usage
- ✓ Includes all required fields in `summary_data` (lines 2278-2285): `start_year`, `end_year`, `avg_prs_per_year`
- ✓ Provides both list and dict formats to satisfy different consumers

The context is comprehensive and includes both JSON-serialized data for JavaScript and structured data for the template.
2314-2319: Exception handling is appropriate.

The error handling provides proper logging, user-friendly feedback, and graceful degradation by redirecting to the project list. This prevents users from seeing raw error pages while ensuring exceptions are captured for debugging.
```python
def _repo_key(repo_row):
    return repo_row.get("repo__repo_url") or repo_row.get("repo__name") or "Unknown repo"


def _repo_url(repo_row):
    return repo_row.get("repo__repo_url") or ""


for year_block in report_data:
    total_prs += year_block["total_prs"]
    yearly_chart_data.append({"year": year_block["year"], "prs": year_block["total_prs"]})

    for repo in year_block["repos"]:
        all_repos.add(_repo_key(repo))

# Top repos across all years
repo_totals = defaultdict(int)
for year_block in report_data:
    for repo in year_block["repos"]:
        repo_totals[_repo_key(repo)] += repo["pr_count"]

top_repos_chart_data = [
    {"repo": repo, "prs": count}
    for repo, count in sorted(repo_totals.items(), key=lambda x: x[1], reverse=True)[:5]
]
```
🧩 Analysis chain

🏁 Script executed:

```shell
fd -type f -name "*gsoc_pr_report*" | head -20
```

Repository: OWASP-BLT/BLT. Length of output: 227

🏁 Script executed:

```shell
find . -name "*gsoc_pr_report*" -type f
```

Repository: OWASP-BLT/BLT. Length of output: 143

🏁 Script executed:

```shell
cat -n ./website/templates/projects/gsoc_pr_report.html
```

Repository: OWASP-BLT/BLT. Length of output: 24132

🏁 Script executed:

```shell
cat -n ./website/static/js/gsoc_pr_report.js
```

Repository: OWASP-BLT/BLT. Length of output: 42898

🏁 Script executed:

```shell
sed -n '2270,2275p' ./website/views/project.py
```

Repository: OWASP-BLT/BLT. Length of output: 275
Swap `_repo_key` priority to display user-friendly repository names in charts.

The `_repo_key` helper (line 2252) currently prioritizes `repo__repo_url` over `repo__name`, causing the Top Repositories chart to display URLs instead of friendly repo names. The table view shows `repo__name`, creating an inconsistency. Change the priority order:

```python
def _repo_key(repo_row):
    return repo_row.get("repo__name") or repo_row.get("repo__repo_url") or "Unknown repo"
```

This ensures repos are keyed by name in aggregations and charts, matching the table display and improving UX. The URL is still available for linking via `_repo_url()`.
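The `or`-chained fallback behaves like this on sample rows (a toy check of the suggested ordering; the row values are hypothetical):

```python
def _repo_key(repo_row):
    # Name first for display, URL as fallback, then a sentinel.
    return repo_row.get("repo__name") or repo_row.get("repo__repo_url") or "Unknown repo"

named = _repo_key({"repo__name": "BLT", "repo__repo_url": "https://github.com/OWASP-BLT/BLT"})
url_only = _repo_key({"repo__name": None, "repo__repo_url": "https://github.com/OWASP-BLT/BLT"})
missing = _repo_key({})
```

Because `or` skips falsy values, a row with `repo__name` set to `None` or `""` falls through to the URL, and a fully empty row lands on the "Unknown repo" sentinel.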
🤖 Prompt for AI Agents
In website/views/project.py around lines 2251 to 2273, _repo_key currently
prefers repo__repo_url over repo__name causing charts and aggregations to use
URLs instead of user-friendly names; swap the priority so _repo_key returns
repo__name first, then repo__repo_url, and fall back to "Unknown repo" so
top-repos aggregations and charts display friendly repository names while
_repo_url remains for links.
DonnieBLT left a comment
Can we filter by year? Also where is the data coming from any scripts to load it?

Closes: #3389
Summary
This PR introduces a dedicated GSOC Pull Request Report view, accessible from the projects page.




It provides a historical overview of all merged PRs during each year’s GSOC period (May–September) for the past 10 years, grouped by repository.
Note: New Chart UI:

Key Features
Technical Notes

- Route: `/gsoc/pr-report/`
- Template: `gsoc_pr_report.html`

This report allows the organization to:
Summary by CodeRabbit
New Features
Chores
Stability