Skip to content

Track Accurate PR cycle timing #661

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

Ayyanaruto
Copy link
Contributor

@Ayyanaruto Ayyanaruto commented May 29, 2025

Linked Issue(s) #634

Acceptance Criteria fulfillment

  • Used github timeline API to fetch all the timeline events which include ready_for_review
  • Removed unecessary github review API as timeline already includes it
  • Added TypeDict and new GithubPRTimeline Dataclass for Type Safety
  • Calculating Cycle time from ready for review in analytics

Proposed changes (including videos or screenshots)

Further comments

This pull request introduces significant enhancements to the handling and processing of GitHub pull request timeline events in the analytics server. The changes include the addition of new schemas and models for timeline events, the implementation of a method to fetch and process these events from the GitHub API, and updates to the logic for calculating pull request performance metrics. These updates aim to provide more granular insights into pull request activities and improve the accuracy of analytics.

These changes collectively enhance the system's ability to analyze pull request workflows, providing deeper insights into developer activities and pull request lifecycles.

Summary by CodeRabbit

  • New Features
    • Added support for retrieving and processing a wide range of GitHub pull request timeline events, enabling more detailed event tracking and analytics.
    • Expanded event type coverage to include actions such as assignment, labeling, merging, draft status changes, and more.
  • Improvements
    • Enhanced calculation of pull request metrics (cycle time, review times) to use "ready for review" events when available for more accurate reporting.
    • Improved robustness in parsing ISO 8601 datetime strings from GitHub events.
  • Bug Fixes
    • Metrics calculations now accurately filter and process only relevant review-related events.
  • Tests
    • Added comprehensive test coverage for new event types and improved metric calculations.
    • Introduced new test factories for timeline events to support expanded testing scenarios.

Copy link

coderabbitai bot commented May 29, 2025

"""

Walkthrough

This update introduces comprehensive support for GitHub pull request timeline events. It adds new schemas and models for representing timeline events, implements event retrieval and adaptation in the API service, and updates the ETL and analytics logic to process these events for metrics calculations. Enums and utility functions are extended, and corresponding tests are added and updated.

Changes

File(s) Change Summary
mhq/exapi/github.py, mhq/exapi/models/github_timeline.py, mhq/exapi/schemas/timeline.py Added timeline event schemas and a dataclass for GitHub PR timeline events; implemented API method for fetching and adapting timeline events.
mhq/service/code/sync/etl_github_handler.py Refactored event processing to use timeline events instead of only reviews; updated method signatures and event mapping logic accordingly.
mhq/service/code/sync/etl_code_analytics.py Modified analytics logic to filter and process only relevant review-related events; adjusted cycle time and response time calculations to use READY_FOR_REVIEW events when present.
mhq/store/models/code/enums.py Changed PullRequestEventState values to lowercase; expanded PullRequestEventType enum with many new event types.
mhq/utils/time.py Enhanced ISO 8601 datetime parsing to support multiple formats and improved timezone handling.
tests/factories/models/exapi/github.py Added factory for generating timeline event objects for testing.
tests/factories/models/code.py Updated PR event state default to use enum value instead of string.
tests/service/code/sync/test_etl_code_analytics.py Added tests for cycle/response time calculations with READY_FOR_REVIEW events; ensured correct fallback and earliest event usage.
tests/service/code/sync/test_etl_github_handler.py Updated tests to use new timeline event factory and attributes; replaced review-specific test data and assertions with timeline event equivalents.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant GithubApiService
    participant GitHub API
    participant GithubPullRequestTimelineEvents

    Client->>GithubApiService: get_pr_timeline_events(repo, pr_number)
    GithubApiService->>GitHub API: GET /repos/{repo}/pulls/{pr_number}/timeline (with pagination)
    GitHub API-->>GithubApiService: JSON timeline events
    GithubApiService->>GithubApiService: _adapt_github_timeline_events(raw_events)
    GithubApiService->>GithubPullRequestTimelineEvents: Instantiate for each valid event
    GithubApiService-->>Client: List[GithubPullRequestTimelineEvents]
Loading
sequenceDiagram
    participant ETLHandler
    participant GithubApiService
    participant AnalyticsService

    ETLHandler->>GithubApiService: get_pr_timeline_events()
    GithubApiService-->>ETLHandler: List[TimelineEvents]
    ETLHandler->>ETLHandler: _to_pr_events(timeline_events)
    ETLHandler->>AnalyticsService: get_pr_performance(pr, pr_events)
    AnalyticsService->>AnalyticsService: Filter, sort, and compute metrics using review/ready_for_review events
Loading

Poem

In the warren, code hops anew,
With timelines tracked and events in view.
From reviews to merges, each hop’s in line,
Cycle times measured, analytics fine.
Now rabbits can see, with data so bright—
Every PR’s journey, from draft to green light!
🐇✨
"""

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

- Clean up imports and remove unused code
- Enhance formatting and consistency in function definitions
- Update timeline event processing logic for better clarity
- Adjust test cases for cycle time and first response time calculations
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🔭 Outside diff range comments (1)
backend/analytics_server/mhq/exapi/schemas/timeline.py (1)

180-183: 🛠️ Refactor suggestion

Inconsistent naming for container TypedDict
The union type is GitHubPullTimelineEvent, but the container is named GitHubPrTimelineEventsDict (plural “Events” and abbreviated “Pr”). For clarity and consistency, rename it to match the singular “Pull” terminology:

-class GitHubPrTimelineEventsDict(TypedDict):
+class GitHubPullTimelineEventDict(TypedDict):
     event: str
     data: GitHubPullTimelineEvent
🧹 Nitpick comments (3)
backend/analytics_server/mhq/utils/time.py (1)

276-302: Enhanced datetime parsing looks good, but verify timezone handling logic.

The function correctly implements multiple format parsing with proper fallback mechanisms. However, there's a potential issue with timezone handling:

In the loop where "%z" in fmt, the variable dt_without_timezone is misleading since it may actually contain timezone info when parsing formats with %z.

Consider this minor improvement for clarity:

-            if "%z" in fmt:
-                dt_without_timezone = datetime.strptime(normalized_dt_str, fmt)
-            else:
-                dt_without_timezone = datetime.strptime(j_str_dt, fmt).replace(
-                    tzinfo=pytz.UTC
-                )
-
-            return dt_without_timezone.astimezone(pytz.UTC)
+            if "%z" in fmt:
+                parsed_dt = datetime.strptime(normalized_dt_str, fmt)
+            else:
+                parsed_dt = datetime.strptime(j_str_dt, fmt).replace(
+                    tzinfo=pytz.UTC
+                )
+
+            return parsed_dt.astimezone(pytz.UTC)
backend/analytics_server/mhq/exapi/schemas/timeline.py (2)

4-24: Refactor repetitive URL fields in UserDict
The UserDict TypedDict lists many URL attributes that can be grouped into a nested UserUrlsDict for better readability and reuse across schemas.

+class UserUrlsDict(TypedDict):
+    html_url: str
+    followers_url: str
+    following_url: str
+    gists_url: str
+    starred_url: str
+    subscriptions_url: str
+    organizations_url: str
+    repos_url: str
+    events_url: str
+    received_events_url: str

 class UserDict(TypedDict):
     login: str
     id: int
     node_id: str
     avatar_url: str
     gravatar_id: str
-    url: str
-    html_url: str
-    followers_url: str
-    following_url: str
-    gists_url: str
-    starred_url: str
-    subscriptions_url: str
-    organizations_url: str
-    repos_url: str
-    events_url: str
-    received_events_url: str
+    url: str
+    urls: UserUrlsDict
     type: str
     user_view_type: str
     site_admin: bool

96-102: Evaluate necessity of no-op subclasses
Classes like MergedEvent, ClosedEvent, and ReadyForReviewEvent inherit IssueEvent without adding new fields. Unless you expect to extend them, you can simplify the schema by using IssueEvent directly and dispatching based on its event value.

-GitHubPullTimelineEvent = Union[
-    CommitEvent,
-    CommentEvent,
-    IssueEvent,
-    ReviewRequestedEvent,
-    ReadyForReviewEvent,
-    MergedEvent,
-    ClosedEvent,
-    ReviewEvent,
-    CrossReferencedEvent,
-    ConvertToDraftEvent,
-    HeadRefDeletedEvent,
-    HeadRefForcePushedEvent,
-    LabeledEvent,
-    UnlabeledEvent,
-]
+GitHubPullTimelineEvent = Union[
+    CommitEvent,
+    CommentEvent,
+    IssueEvent,             # covers merged, closed, ready_for_review, etc.
+    ReviewRequestedEvent,
+    ReviewEvent,
+    CrossReferencedEvent,
+    ConvertToDraftEvent,
+    HeadRefDeletedEvent,
+    HeadRefForcePushedEvent,
+    LabeledEvent,
+    UnlabeledEvent,
+]

Also applies to: 124-126

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65b1873 and b5bc6d8.

📒 Files selected for processing (11)
  • backend/analytics_server/mhq/exapi/github.py (2 hunks)
  • backend/analytics_server/mhq/exapi/models/timeline.py (1 hunks)
  • backend/analytics_server/mhq/exapi/schemas/timeline.py (1 hunks)
  • backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (5 hunks)
  • backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (3 hunks)
  • backend/analytics_server/mhq/store/models/code/enums.py (1 hunks)
  • backend/analytics_server/mhq/utils/time.py (1 hunks)
  • backend/analytics_server/tests/factories/models/code.py (2 hunks)
  • backend/analytics_server/tests/factories/models/exapi/github.py (2 hunks)
  • backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py (2 hunks)
  • backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py (6 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
backend/analytics_server/tests/factories/models/code.py (1)
backend/analytics_server/mhq/store/models/code/enums.py (1)
  • PullRequestEventState (24-27)
backend/analytics_server/tests/factories/models/exapi/github.py (3)
backend/analytics_server/mhq/exapi/models/timeline.py (1)
  • GithubPRTimelineEvent (9-14)
backend/analytics_server/mhq/store/models/code/enums.py (1)
  • PullRequestEventType (30-51)
backend/analytics_server/mhq/utils/time.py (1)
  • time_now (10-11)
backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py (4)
backend/analytics_server/tests/factories/models/exapi/github.py (3)
  • get_github_pr_timeline_event (164-188)
  • raw_data (41-48)
  • raw_data (83-101)
backend/analytics_server/mhq/store/models/code/enums.py (1)
  • PullRequestEventType (30-51)
backend/analytics_server/mhq/exapi/github.py (3)
  • timestamp (429-437)
  • id (444-448)
  • raw_data (440-441)
backend/analytics_server/tests/factories/models/code.py (1)
  • get_pull_request_event (80-108)
backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py (4)
backend/analytics_server/mhq/store/models/code/enums.py (2)
  • PullRequestEventType (30-51)
  • PullRequestState (18-21)
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (2)
  • CodeETLAnalyticsService (17-219)
  • get_pr_performance (62-136)
backend/analytics_server/mhq/utils/time.py (1)
  • time_now (10-11)
backend/analytics_server/tests/factories/models/code.py (1)
  • get_pull_request_event (80-108)
🪛 Ruff (0.11.9)
backend/analytics_server/mhq/exapi/github.py

298-300: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


320-322: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


353-355: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: All file linting
🔇 Additional comments (25)
backend/analytics_server/mhq/store/models/code/enums.py (2)

31-51: LGTM! Comprehensive event type expansion.

The expanded enum properly covers the full range of GitHub pull request timeline events. The uppercase values are consistent and the additions are backward compatible.


25-27:

✅ Verification successful

Breaking change: Verify all references to event state values are updated.

The enum values changed from uppercase (e.g., "APPROVED") to lowercase with underscores (e.g., "approved"). This is a breaking change that could cause runtime errors if not all references are updated consistently.

Verify that all code referencing these enum values has been updated:


🏁 Script executed:

#!/bin/bash
# Description: Search for potential hardcoded references to old enum values
# Expected: No matches for the old uppercase values in non-test code

echo "Searching for hardcoded references to old enum values..."
rg -i "CHANGES_REQUESTED|APPROVED|COMMENTED" --type py | grep -v test | grep -v enum

Length of output: 3683


No outstanding breaking changes for PullRequestEventState usage

All non-test code comparisons use PullRequestEventState.<STATE>.value, which now resolves to the updated lowercase strings. A search for raw "CHANGES_REQUESTED", "APPROVED", and "COMMENTED" found only:

  • Enum definitions in backend/analytics_server/mhq/store/models/code/enums.py
  • Valid .value references in:
    • backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py
    • backend/analytics_server/mhq/service/code/sync/etl_gitlab_handler.py
  • Definitions for GitlabNoteType in backend/analytics_server/mhq/exapi/models/gitlab.py (unrelated to PullRequestEventState)

The expanded PullRequestEventType additions are purely additive and non-breaking. No further changes required.

backend/analytics_server/tests/factories/models/code.py (2)

16-16: LGTM! Added import for enum usage.

Good addition to support enum-based event state values.


98-98: LGTM! Proper use of enum value instead of hardcoded string.

This change aligns with the enum modifications and follows best practices by using enum values instead of hardcoded strings.

backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py (4)

16-16: LGTM! Updated import to use new timeline event factory.

Correctly replaced the old review-specific factory with the generic timeline event factory.


18-18: LGTM! Added necessary enum import.

Required for specifying event types in the new timeline event factory.


203-212: LGTM! Tests correctly updated for timeline event structure.

The test functions properly migrated from review-specific events to generic timeline events while maintaining the same test logic. The new factory usage with explicit parameters is clear and correct.

Also applies to: 244-253


222-222: LGTM! Correctly updated attribute reference.

Changed from submitted_at to timestamp to match the new timeline event structure.

Also applies to: 231-231, 260-260, 274-274

backend/analytics_server/mhq/exapi/models/timeline.py (1)

1-14: LGTM!

The GithubPRTimelineEvent dataclass is well-structured with appropriate type annotations and clear field names.

backend/analytics_server/tests/factories/models/exapi/github.py (1)

164-188: LGTM!

The factory function is well-implemented with sensible defaults and proper handling of optional parameters.

backend/analytics_server/mhq/exapi/github.py (1)

283-507: Well-structured implementation of timeline event handling!

The new timeline event fetching and processing functionality is comprehensive and handles various edge cases properly. The Event class provides a clean abstraction for extracting data from different event types.

🧰 Tools
🪛 Ruff (0.11.9)

298-300: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


320-322: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


353-355: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (3)

4-4: LGTM!

The import is correctly added to support the new event type filtering logic.


63-136: Well-implemented ready_for_review event handling!

The refactoring correctly separates review events from ready_for_review events and implements the new cycle time calculation logic. The improvements include:

  • Proper event type filtering for reviews and ready_for_review events
  • Safer dictionary access using .get() to prevent KeyError
  • Logical start time calculation that prefers ready_for_review timestamp
  • Correct calculation of first_review_time relative to the new start_time

This implementation aligns well with the PR objectives for better handling of draft PR workflows.


144-198: Correct filtering of review events for rework cycles.

The changes appropriately filter only review events for rework cycle calculations, which makes sense since ready_for_review events don't contain review state information. The safer dictionary access pattern using .get() is consistently applied.

backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py (7)

5-5: LGTM!

The import is correctly added to support the new test cases.


611-631: Well-structured test for ready_for_review cycle time calculation.

The test correctly verifies that cycle time is calculated from the ready_for_review event timestamp rather than PR creation time.


633-646: Correct test for fallback behavior.

The test appropriately verifies that cycle time falls back to using PR creation time when no ready_for_review events are present.


648-677: Good test coverage for multiple ready_for_review events.

The test correctly verifies that when multiple ready_for_review events exist, the earliest one is used for cycle time calculation.


679-702: Correct test for first response time with ready_for_review.

The test appropriately verifies that first response time is calculated from the ready_for_review event rather than PR creation time.


704-717: Good test for first response time fallback.

The test correctly verifies that first response time falls back to using PR creation time when no ready_for_review events exist.


719-747: Excellent test for draft PR workflow.

The test effectively models a real-world draft PR scenario where the PR remains in draft status for 2 days before being marked ready for review. The assertions correctly verify both cycle time and first response time calculations.

backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (3)

18-18: LGTM!

The import is correctly added to support the new timeline event handling.


172-184: Clean refactoring to timeline events API.

The change from get_pr_reviews to get_pr_timeline_events with filtering for REVIEW events is well-implemented. This sets up the codebase to handle additional event types in the future while maintaining backward compatibility.


292-314: Correct adaptation to GithubPRTimelineEvent structure.

The method properly handles the new timeline event objects, correctly accessing properties like event.type.value, event.timestamp, and event.user_login. The implementation maintains backward compatibility while supporting the new event model.

backend/analytics_server/mhq/exapi/schemas/timeline.py (1)

1-2:

❓ Verification inconclusive

Ensure Python version compatibility for TypedDict
TypedDict is available in the standard typing module starting from Python 3.8. If the analytics server needs to support older Python versions, you should import TypedDict, Optional, Union, and List from typing_extensions instead of typing.

Run this script to confirm the minimum Python version requirement declared in your project:


🏁 Script executed:

#!/bin/bash
# Check python_requires in setup or pyproject
fd pyproject.toml setup.cfg setup.py | xargs grep -E "python_requires"

Length of output: 222


I couldn’t find any pyproject.toml, setup.cfg, or setup.py at the repo root. Let’s locate any project metadata files and then search them (and the rest of the repo) for Python version constraints:

This will tell us where (if anywhere) the minimum Python version is declared.


🏁 Script executed:

#!/bin/bash
# 1. List any metadata files
fd -e toml -e cfg -e py | sed 's|^\./||'

# 2. Search entire repo for python_requires
grep -R "python_requires" -n .

# 3. If using Poetry, look for Python constraints in pyproject.toml
grep -R "python" -n pyproject.toml || echo "no pyproject.toml"

# 4. Check setup.cfg classifiers for Python versions
grep -R "Programming Language :: Python" -n setup.cfg || echo "no setup.cfg"

Length of output: 11324


Confirm Python version compatibility for TypedDict
TypedDict is only available in the standard typing module from Python 3.8 onward. If this analytics server must support earlier Python versions, switch to importing TypedDict, Optional, Union, and List from typing_extensions instead. I couldn’t locate any python_requires declaration in pyproject.toml, setup.cfg, or setup.py, so please verify your project’s minimum supported Python version and update as needed.

• File: backend/analytics_server/mhq/exapi/schemas/timeline.py (lines 1-2)

@Ayyanaruto Ayyanaruto changed the title Cycle timing Track Accurate PR cycle timing May 29, 2025
…t with GithubPullRequestTimelineEvents, update related imports, and improve event processing logic
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
backend/analytics_server/mhq/exapi/models/github_timeline.py (3)

73-76: Simplify the conditional logic.

The elif after return is unnecessary and can be simplified.

Apply this diff to remove the unnecessary elif:

-        if isinstance(user_data, dict) and "login" in user_data:
-            return user_data["login"]
-        elif hasattr(user_data, "login"):
-            return user_data.login
+        if isinstance(user_data, dict) and "login" in user_data:
+            return user_data["login"]
+        if hasattr(user_data, "login"):
+            return user_data.login
🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 73-76: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)


94-96: Consider type consistency.

The raw_data property casts GitHubPullTimelineEvent to Dict, which might indicate a type mismatch. Consider whether the data parameter should be typed as Dict[str, Any] from the start if it's primarily used as a dictionary.


107-129: Consider extracting the event type mapping.

The large event type mapping dictionary could be extracted as a class constant for better maintainability and testability.

Consider extracting the mapping as a class constant:

+    EVENT_TYPE_MAPPING = {
+        "assigned": PullRequestEventType.ASSIGNED,
+        "closed": PullRequestEventType.CLOSED,
+        # ... rest of the mapping
+    }
+
     @property
     def type(self) -> Optional[PullRequestEventType]:
-        event_type_mapping = {
-            "assigned": PullRequestEventType.ASSIGNED,
-            # ... rest of the mapping
-        }
-        return event_type_mapping.get(self.event_type, PullRequestEventType.UNKNOWN)
+        return self.EVENT_TYPE_MAPPING.get(self.event_type, PullRequestEventType.UNKNOWN)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 12f4426 and ba5db24.

📒 Files selected for processing (6)
  • backend/analytics_server/mhq/exapi/github.py (2 hunks)
  • backend/analytics_server/mhq/exapi/models/github_timeline.py (1 hunks)
  • backend/analytics_server/mhq/exapi/schemas/timeline.py (1 hunks)
  • backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (3 hunks)
  • backend/analytics_server/tests/factories/models/exapi/github.py (2 hunks)
  • backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py
  • backend/analytics_server/tests/factories/models/exapi/github.py
  • backend/analytics_server/mhq/exapi/github.py
  • backend/analytics_server/mhq/service/code/sync/etl_github_handler.py
  • backend/analytics_server/mhq/exapi/schemas/timeline.py
🧰 Additional context used
🪛 Pylint (3.3.7)
backend/analytics_server/mhq/exapi/models/github_timeline.py

[refactor] 73-76: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: All file linting
🔇 Additional comments (4)
backend/analytics_server/mhq/exapi/models/github_timeline.py (4)

16-42: Well-designed configuration-driven approach.

The EVENT_CONFIG dictionary provides a clean, maintainable way to handle different event types. The structure is consistent and allows for easy extension.


44-46: Simple and effective constructor.

The constructor is straightforward and follows dataclass patterns well.


48-57: Robust nested value extraction.

The _get_nested_value method implements good defensive programming with proper type checking and graceful handling of missing keys.


78-81: Good defensive logging.

The warning log when login field is missing provides valuable debugging information for data quality issues.

…ing into a class attribute and improve type hinting for raw_data property
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/analytics_server/mhq/exapi/models/github_timeline.py (2)

14-65: Well-designed configuration structure with good separation of concerns.

The class structure effectively separates event-specific configuration from the mapping logic. The EVENT_CONFIG and EVENT_TYPE_MAPPING provide clear documentation of how different GitHub event types are handled.

Consider making the configuration immutable for better safety:

+from typing import Final
+
 @dataclass
 class GithubPullRequestTimelineEvents:
-    EVENT_CONFIG = {
+    EVENT_CONFIG: Final = {
         # ... existing config
     }
-    EVENT_TYPE_MAPPING = {
+    EVENT_TYPE_MAPPING: Final = {
         # ... existing mapping
     }

81-103: Comprehensive user extraction logic with proper fallback handling.

The user property handles different event types appropriately, with special handling for committed events and fallback logic for various data structures.

Consider extracting the user extraction logic into a separate method for better testability:

+    def _extract_user_from_data(self, user_data: Any) -> Optional[str]:
+        """Extract user login from various data structures."""
+        if not user_data:
+            return None
+        if isinstance(user_data, dict) and "login" in user_data:
+            return user_data["login"]
+        if hasattr(user_data, "login"):
+            return user_data.login
+        return None
+
     @property
     def user(self) -> Optional[str]:
         config = self.EVENT_CONFIG.get(self.event_type, self.EVENT_CONFIG["default"])
         actor_path = config["actor_path"]
 
         if not actor_path:
             return None
 
         if self.event_type == "committed":
             return self._get_nested_value(actor_path)
 
-        user_data = self._get_nested_value(actor_path)
-        if not user_data:
-            return None
-        if isinstance(user_data, dict) and "login" in user_data:
-            return user_data["login"]
-        if hasattr(user_data, "login"):
-            return user_data.login
-
-        LOG.warning(
-            f"User data does not contain login field for event type: {self.event_type}"
-        )
-        return None
+        user_data = self._get_nested_value(actor_path)
+        extracted_user = self._extract_user_from_data(user_data)
+        
+        if extracted_user is None and user_data is not None:
+            LOG.warning(
+                f"User data does not contain login field for event type: {self.event_type}"
+            )
+        
+        return extracted_user
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba5db24 and 6b6eecb.

📒 Files selected for processing (1)
  • backend/analytics_server/mhq/exapi/models/github_timeline.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: All file linting
🔇 Additional comments (5)
backend/analytics_server/mhq/exapi/models/github_timeline.py (5)

70-79: Robust implementation of nested value extraction.

The _get_nested_value method provides safe access to nested dictionary values with appropriate error handling.


105-114: Clean timestamp extraction with appropriate error handling.

The timestamp property correctly uses the utility function for ISO time string parsing and handles missing values gracefully.


117-118: Appropriate type casting for raw data access.

The raw_data property provides safe access to the underlying data structure.


121-125: Safe ID extraction with string conversion.

The id property handles potential None values correctly and ensures string output.


127-132: Good enum mapping with appropriate fallback.

The type property provides a clean mapping to the enum with a sensible default for unknown event types.

reviews = [
review
for review in timeline_pr_events
if (review.type == PullRequestEventType.REVIEW)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this always an enum ? can review.type ever be a string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samad-yar-khan can you elaborate on this part

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants