Track Accurate PR cycle timing #661

Ayyanaruto · 2025-05-29T09:19:58Z

Linked Issue(s) #634

Acceptance Criteria fulfillment

Used github timeline API to fetch all the timeline events which include ready_for_review
Removed unecessary github review API as timeline already includes it
Added TypeDict and new GithubPRTimeline Dataclass for Type Safety
Calculating Cycle time from ready for review in analytics

Proposed changes (including videos or screenshots)

Further comments

This pull request introduces significant enhancements to the handling and processing of GitHub pull request timeline events in the analytics server. The changes include the addition of new schemas and models for timeline events, the implementation of a method to fetch and process these events from the GitHub API, and updates to the logic for calculating pull request performance metrics. These updates aim to provide more granular insights into pull request activities and improve the accuracy of analytics.

These changes collectively enhance the system's ability to analyze pull request workflows, providing deeper insights into developer activities and pull request lifecycles.

Summary by CodeRabbit

New Features
- Added support for retrieving and processing a wide range of GitHub pull request timeline events, enabling more detailed event tracking and analytics.
- Introduced a new timeline event data model for comprehensive representation of GitHub PR events.
- Expanded event type coverage to include actions such as assignment, labeling, merging, draft status changes, and more.
Improvements
- Enhanced calculation of pull request metrics (cycle time, review times) to use "ready for review" events when available for more accurate reporting.
- Generalized event handling to process timeline events beyond just reviews for richer data insights.
- Improved robustness in parsing ISO 8601 datetime strings from GitHub events.
Bug Fixes
- Metrics calculations now accurately filter and process only relevant review-related events.
Tests
- Added comprehensive test coverage for new event types and improved metric calculations.
- Introduced new test factories for timeline events to support expanded testing scenarios.

…t structure, and improve data processing logic, Add Ready_for_review in calculation

…ew events

coderabbitai · 2025-05-29T09:20:06Z

"""

Walkthrough

This update introduces comprehensive support for GitHub pull request timeline events. It adds new schemas and models for representing timeline events, implements event retrieval and adaptation in the API service, and updates the ETL and analytics logic to process these events for metrics calculations. Enums and utility functions are extended, and corresponding tests are added and updated.

Changes

File(s)	Change Summary
`mhq/exapi/github.py`, `mhq/exapi/models/github_timeline.py`, `mhq/exapi/schemas/timeline.py`	Added timeline event schemas and a dataclass for GitHub PR timeline events; implemented API method for fetching and adapting timeline events.
`mhq/service/code/sync/etl_github_handler.py`	Refactored event processing to use timeline events instead of only reviews; updated method signatures and event mapping logic accordingly.
`mhq/service/code/sync/etl_code_analytics.py`	Modified analytics logic to filter and process only relevant review-related events; adjusted cycle time and response time calculations to use READY_FOR_REVIEW events when present.
`mhq/store/models/code/enums.py`	Changed `PullRequestEventState` values to lowercase; expanded `PullRequestEventType` enum with many new event types.
`mhq/utils/time.py`	Enhanced ISO 8601 datetime parsing to support multiple formats and improved timezone handling.
`tests/factories/models/exapi/github.py`	Added factory for generating timeline event objects for testing.
`tests/factories/models/code.py`	Updated PR event state default to use enum value instead of string.
`tests/service/code/sync/test_etl_code_analytics.py`	Added tests for cycle/response time calculations with READY_FOR_REVIEW events; ensured correct fallback and earliest event usage.
`tests/service/code/sync/test_etl_github_handler.py`	Updated tests to use new timeline event factory and attributes; replaced review-specific test data and assertions with timeline event equivalents.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant GithubApiService
    participant GitHub API
    participant GithubPullRequestTimelineEvents

    Client->>GithubApiService: get_pr_timeline_events(repo, pr_number)
    GithubApiService->>GitHub API: GET /repos/{repo}/pulls/{pr_number}/timeline (with pagination)
    GitHub API-->>GithubApiService: JSON timeline events
    GithubApiService->>GithubApiService: _adapt_github_timeline_events(raw_events)
    GithubApiService->>GithubPullRequestTimelineEvents: Instantiate for each valid event
    GithubApiService-->>Client: List[GithubPullRequestTimelineEvents]

sequenceDiagram
    participant ETLHandler
    participant GithubApiService
    participant AnalyticsService

    ETLHandler->>GithubApiService: get_pr_timeline_events()
    GithubApiService-->>ETLHandler: List[TimelineEvents]
    ETLHandler->>ETLHandler: _to_pr_events(timeline_events)
    ETLHandler->>AnalyticsService: get_pr_performance(pr, pr_events)
    AnalyticsService->>AnalyticsService: Filter, sort, and compute metrics using review/ready_for_review events

Suggested reviewers

adnanhashmi09

Poem

In the warren, code hops anew,
With timelines tracked and events in view.
From reviews to merges, each hop’s in line,
Cycle times measured, analytics fine.
Now rabbits can see, with data so bright—
Every PR’s journey, from draft to green light!
🐇✨
"""

✨ Finishing Touches

📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

- Clean up imports and remove unused code - Enhance formatting and consistency in function definitions - Update timeline event processing logic for better clarity - Adjust test cases for cycle time and first response time calculations

coderabbitai

Actionable comments posted: 6

🔭 Outside diff range comments (1)

backend/analytics_server/mhq/exapi/schemas/timeline.py (1)
180-183: 🛠️ Refactor suggestion

Inconsistent naming for container TypedDict
The union type is GitHubPullTimelineEvent, but the container is named GitHubPrTimelineEventsDict (plural “Events” and abbreviated “Pr”). For clarity and consistency, rename it to match the singular “Pull” terminology:
-class GitHubPrTimelineEventsDict(TypedDict):
+class GitHubPullTimelineEventDict(TypedDict):
     event: str
     data: GitHubPullTimelineEvent

🧹 Nitpick comments (3)

backend/analytics_server/mhq/utils/time.py (1)

276-302: Enhanced datetime parsing looks good, but verify timezone handling logic.

The function correctly implements multiple format parsing with proper fallback mechanisms. However, there's a potential issue with timezone handling:

In the loop where "%z" in fmt, the variable dt_without_timezone is misleading since it may actually contain timezone info when parsing formats with %z.

Consider this minor improvement for clarity:

-            if "%z" in fmt:
-                dt_without_timezone = datetime.strptime(normalized_dt_str, fmt)
-            else:
-                dt_without_timezone = datetime.strptime(j_str_dt, fmt).replace(
-                    tzinfo=pytz.UTC
-                )
-
-            return dt_without_timezone.astimezone(pytz.UTC)
+            if "%z" in fmt:
+                parsed_dt = datetime.strptime(normalized_dt_str, fmt)
+            else:
+                parsed_dt = datetime.strptime(j_str_dt, fmt).replace(
+                    tzinfo=pytz.UTC
+                )
+
+            return parsed_dt.astimezone(pytz.UTC)

backend/analytics_server/mhq/exapi/schemas/timeline.py (2)

4-24: Refactor repetitive URL fields in UserDict
The UserDict TypedDict lists many URL attributes that can be grouped into a nested UserUrlsDict for better readability and reuse across schemas.

+class UserUrlsDict(TypedDict):
+    html_url: str
+    followers_url: str
+    following_url: str
+    gists_url: str
+    starred_url: str
+    subscriptions_url: str
+    organizations_url: str
+    repos_url: str
+    events_url: str
+    received_events_url: str

 class UserDict(TypedDict):
     login: str
     id: int
     node_id: str
     avatar_url: str
     gravatar_id: str
-    url: str
-    html_url: str
-    followers_url: str
-    following_url: str
-    gists_url: str
-    starred_url: str
-    subscriptions_url: str
-    organizations_url: str
-    repos_url: str
-    events_url: str
-    received_events_url: str
+    url: str
+    urls: UserUrlsDict
     type: str
     user_view_type: str
     site_admin: bool

96-102: Evaluate necessity of no-op subclasses
Classes like MergedEvent, ClosedEvent, and ReadyForReviewEvent inherit IssueEvent without adding new fields. Unless you expect to extend them, you can simplify the schema by using IssueEvent directly and dispatching based on its event value.

-GitHubPullTimelineEvent = Union[
-    CommitEvent,
-    CommentEvent,
-    IssueEvent,
-    ReviewRequestedEvent,
-    ReadyForReviewEvent,
-    MergedEvent,
-    ClosedEvent,
-    ReviewEvent,
-    CrossReferencedEvent,
-    ConvertToDraftEvent,
-    HeadRefDeletedEvent,
-    HeadRefForcePushedEvent,
-    LabeledEvent,
-    UnlabeledEvent,
-]
+GitHubPullTimelineEvent = Union[
+    CommitEvent,
+    CommentEvent,
+    IssueEvent,             # covers merged, closed, ready_for_review, etc.
+    ReviewRequestedEvent,
+    ReviewEvent,
+    CrossReferencedEvent,
+    ConvertToDraftEvent,
+    HeadRefDeletedEvent,
+    HeadRefForcePushedEvent,
+    LabeledEvent,
+    UnlabeledEvent,
+]

Also applies to: 124-126

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65b1873 and b5bc6d8.

📒 Files selected for processing (11)

backend/analytics_server/mhq/exapi/github.py (2 hunks)
backend/analytics_server/mhq/exapi/models/timeline.py (1 hunks)
backend/analytics_server/mhq/exapi/schemas/timeline.py (1 hunks)
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (5 hunks)
backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (3 hunks)
backend/analytics_server/mhq/store/models/code/enums.py (1 hunks)
backend/analytics_server/mhq/utils/time.py (1 hunks)
backend/analytics_server/tests/factories/models/code.py (2 hunks)
backend/analytics_server/tests/factories/models/exapi/github.py (2 hunks)
backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py (2 hunks)
backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py (6 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (4)

backend/analytics_server/tests/factories/models/code.py (1)

backend/analytics_server/mhq/store/models/code/enums.py (1)

PullRequestEventState (24-27)

backend/analytics_server/tests/factories/models/exapi/github.py (3)

backend/analytics_server/mhq/exapi/models/timeline.py (1)

GithubPRTimelineEvent (9-14)

backend/analytics_server/mhq/store/models/code/enums.py (1)

PullRequestEventType (30-51)

backend/analytics_server/mhq/utils/time.py (1)

time_now (10-11)

backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py (4)

backend/analytics_server/tests/factories/models/exapi/github.py (3)

get_github_pr_timeline_event (164-188)

raw_data (41-48)

raw_data (83-101)

backend/analytics_server/mhq/store/models/code/enums.py (1)

PullRequestEventType (30-51)

backend/analytics_server/mhq/exapi/github.py (3)

timestamp (429-437)

id (444-448)

raw_data (440-441)

backend/analytics_server/tests/factories/models/code.py (1)

get_pull_request_event (80-108)

backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py (4)

backend/analytics_server/mhq/store/models/code/enums.py (2)

PullRequestEventType (30-51)

PullRequestState (18-21)

backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (2)

CodeETLAnalyticsService (17-219)

get_pr_performance (62-136)

backend/analytics_server/mhq/utils/time.py (1)

time_now (10-11)

backend/analytics_server/tests/factories/models/code.py (1)

get_pull_request_event (80-108)

🪛 Ruff (0.11.9)

backend/analytics_server/mhq/exapi/github.py

298-300: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

320-322: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

353-355: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: All file linting

🔇 Additional comments (25)

backend/analytics_server/mhq/store/models/code/enums.py (2)

31-51: LGTM! Comprehensive event type expansion.

The expanded enum properly covers the full range of GitHub pull request timeline events. The uppercase values are consistent and the additions are backward compatible.

25-27:
✅ Verification successful

Breaking change: Verify all references to event state values are updated.

The enum values changed from uppercase (e.g., "APPROVED") to lowercase with underscores (e.g., "approved"). This is a breaking change that could cause runtime errors if not all references are updated consistently.

Verify that all code referencing these enum values has been updated:

🏁 Script executed:
#!/bin/bash
# Description: Search for potential hardcoded references to old enum values
# Expected: No matches for the old uppercase values in non-test code

echo "Searching for hardcoded references to old enum values..."
rg -i "CHANGES_REQUESTED|APPROVED|COMMENTED" --type py | grep -v test | grep -v enum
Length of output: 3683
No outstanding breaking changes for PullRequestEventState usage

All non-test code comparisons use PullRequestEventState.<STATE>.value, which now resolves to the updated lowercase strings. A search for raw "CHANGES_REQUESTED", "APPROVED", and "COMMENTED" found only:

Enum definitions in backend/analytics_server/mhq/store/models/code/enums.py

Valid .value references in:

backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py

backend/analytics_server/mhq/service/code/sync/etl_gitlab_handler.py

Definitions for GitlabNoteType in backend/analytics_server/mhq/exapi/models/gitlab.py (unrelated to PullRequestEventState)

The expanded PullRequestEventType additions are purely additive and non-breaking. No further changes required.
backend/analytics_server/tests/factories/models/code.py (2)

16-16: LGTM! Added import for enum usage.

Good addition to support enum-based event state values.

98-98: LGTM! Proper use of enum value instead of hardcoded string.

This change aligns with the enum modifications and follows best practices by using enum values instead of hardcoded strings.

backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py (4)

16-16: LGTM! Updated import to use new timeline event factory.

Correctly replaced the old review-specific factory with the generic timeline event factory.

18-18: LGTM! Added necessary enum import.

Required for specifying event types in the new timeline event factory.

203-212: LGTM! Tests correctly updated for timeline event structure.

The test functions properly migrated from review-specific events to generic timeline events while maintaining the same test logic. The new factory usage with explicit parameters is clear and correct.

Also applies to: 244-253

222-222: LGTM! Correctly updated attribute reference.

Changed from submitted_at to timestamp to match the new timeline event structure.

Also applies to: 231-231, 260-260, 274-274

backend/analytics_server/mhq/exapi/models/timeline.py (1)

1-14: LGTM!

The GithubPRTimelineEvent dataclass is well-structured with appropriate type annotations and clear field names.

backend/analytics_server/tests/factories/models/exapi/github.py (1)

164-188: LGTM!

The factory function is well-implemented with sensible defaults and proper handling of optional parameters.

backend/analytics_server/mhq/exapi/github.py (1)

283-507: Well-structured implementation of timeline event handling!

The new timeline event fetching and processing functionality is comprehensive and handles various edge cases properly. The Event class provides a clean abstraction for extracting data from different event types.

🧰 Tools

🪛 Ruff (0.11.9)

298-300: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

320-322: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

353-355: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (3)

4-4: LGTM!

The import is correctly added to support the new event type filtering logic.

63-136: Well-implemented ready_for_review event handling!

The refactoring correctly separates review events from ready_for_review events and implements the new cycle time calculation logic. The improvements include:

Proper event type filtering for reviews and ready_for_review events

Safer dictionary access using .get() to prevent KeyError

Logical start time calculation that prefers ready_for_review timestamp

Correct calculation of first_review_time relative to the new start_time

This implementation aligns well with the PR objectives for better handling of draft PR workflows.

144-198: Correct filtering of review events for rework cycles.

The changes appropriately filter only review events for rework cycle calculations, which makes sense since ready_for_review events don't contain review state information. The safer dictionary access pattern using .get() is consistently applied.

backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py (7)

5-5: LGTM!

The import is correctly added to support the new test cases.

611-631: Well-structured test for ready_for_review cycle time calculation.

The test correctly verifies that cycle time is calculated from the ready_for_review event timestamp rather than PR creation time.

633-646: Correct test for fallback behavior.

The test appropriately verifies that cycle time falls back to using PR creation time when no ready_for_review events are present.

648-677: Good test coverage for multiple ready_for_review events.

The test correctly verifies that when multiple ready_for_review events exist, the earliest one is used for cycle time calculation.

679-702: Correct test for first response time with ready_for_review.

The test appropriately verifies that first response time is calculated from the ready_for_review event rather than PR creation time.

704-717: Good test for first response time fallback.

The test correctly verifies that first response time falls back to using PR creation time when no ready_for_review events exist.

719-747: Excellent test for draft PR workflow.

The test effectively models a real-world draft PR scenario where the PR remains in draft status for 2 days before being marked ready for review. The assertions correctly verify both cycle time and first response time calculations.

backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (3)

18-18: LGTM!

The import is correctly added to support the new timeline event handling.

172-184: Clean refactoring to timeline events API.

The change from get_pr_reviews to get_pr_timeline_events with filtering for REVIEW events is well-implemented. This sets up the codebase to handle additional event types in the future while maintaining backward compatibility.

292-314: Correct adaptation to GithubPRTimelineEvent structure.

The method properly handles the new timeline event objects, correctly accessing properties like event.type.value, event.timestamp, and event.user_login. The implementation maintains backward compatibility while supporting the new event model.
backend/analytics_server/mhq/exapi/schemas/timeline.py (1)

1-2:
❓ Verification inconclusive

Ensure Python version compatibility for TypedDict
TypedDict is available in the standard typing module starting from Python 3.8. If the analytics server needs to support older Python versions, you should import TypedDict, Optional, Union, and List from typing_extensions instead of typing.

Run this script to confirm the minimum Python version requirement declared in your project:

🏁 Script executed:
#!/bin/bash
# Check python_requires in setup or pyproject
fd pyproject.toml setup.cfg setup.py | xargs grep -E "python_requires"
Length of output: 222

I couldn’t find any pyproject.toml, setup.cfg, or setup.py at the repo root. Let’s locate any project metadata files and then search them (and the rest of the repo) for Python version constraints:

This will tell us where (if anywhere) the minimum Python version is declared.

🏁 Script executed:
#!/bin/bash
# 1. List any metadata files
fd -e toml -e cfg -e py | sed 's|^\./||'

# 2. Search entire repo for python_requires
grep -R "python_requires" -n .

# 3. If using Poetry, look for Python constraints in pyproject.toml
grep -R "python" -n pyproject.toml || echo "no pyproject.toml"

# 4. Check setup.cfg classifiers for Python versions
grep -R "Programming Language :: Python" -n setup.cfg || echo "no setup.cfg"
Length of output: 11324
Confirm Python version compatibility for TypedDict
TypedDict is only available in the standard typing module from Python 3.8 onward. If this analytics server must support earlier Python versions, switch to importing TypedDict, Optional, Union, and List from typing_extensions instead. I couldn’t locate any python_requires declaration in pyproject.toml, setup.cfg, or setup.py, so please verify your project’s minimum supported Python version and update as needed.

• File: backend/analytics_server/mhq/exapi/schemas/timeline.py (lines 1-2)

backend/analytics_server/mhq/exapi/github.py

backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py

backend/analytics_server/mhq/exapi/schemas/timeline.py

… hint for verification field

backend/analytics_server/mhq/exapi/github.py

backend/analytics_server/mhq/exapi/schemas/timeline.py

backend/analytics_server/mhq/exapi/models/timeline.py

…t with GithubPullRequestTimelineEvents, update related imports, and improve event processing logic

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

backend/analytics_server/mhq/exapi/models/github_timeline.py (3)
73-76: Simplify the conditional logic.

The elif after return is unnecessary and can be simplified.

Apply this diff to remove the unnecessary elif:
-        if isinstance(user_data, dict) and "login" in user_data:
-            return user_data["login"]
-        elif hasattr(user_data, "login"):
-            return user_data.login
+        if isinstance(user_data, dict) and "login" in user_data:
+            return user_data["login"]
+        if hasattr(user_data, "login"):
+            return user_data.login
🧰 Tools

🪛 Pylint (3.3.7)

[refactor] 73-76: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)

94-96: Consider type consistency.

The raw_data property casts GitHubPullTimelineEvent to Dict, which might indicate a type mismatch. Consider whether the data parameter should be typed as Dict[str, Any] from the start if it's primarily used as a dictionary.

107-129: Consider extracting the event type mapping.

The large event type mapping dictionary could be extracted as a class constant for better maintainability and testability.

Consider extracting the mapping as a class constant:
+    EVENT_TYPE_MAPPING = {
+        "assigned": PullRequestEventType.ASSIGNED,
+        "closed": PullRequestEventType.CLOSED,
+        # ... rest of the mapping
+    }
+
     @property
     def type(self) -> Optional[PullRequestEventType]:
-        event_type_mapping = {
-            "assigned": PullRequestEventType.ASSIGNED,
-            # ... rest of the mapping
-        }
-        return event_type_mapping.get(self.event_type, PullRequestEventType.UNKNOWN)
+        return self.EVENT_TYPE_MAPPING.get(self.event_type, PullRequestEventType.UNKNOWN)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 12f4426 and ba5db24.

📒 Files selected for processing (6)

backend/analytics_server/mhq/exapi/github.py (2 hunks)
backend/analytics_server/mhq/exapi/models/github_timeline.py (1 hunks)
backend/analytics_server/mhq/exapi/schemas/timeline.py (1 hunks)
backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (3 hunks)
backend/analytics_server/tests/factories/models/exapi/github.py (2 hunks)
backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py (5 hunks)

🚧 Files skipped from review as they are similar to previous changes (5)

backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py
backend/analytics_server/tests/factories/models/exapi/github.py
backend/analytics_server/mhq/exapi/github.py
backend/analytics_server/mhq/service/code/sync/etl_github_handler.py
backend/analytics_server/mhq/exapi/schemas/timeline.py

🧰 Additional context used

🪛 Pylint (3.3.7)

backend/analytics_server/mhq/exapi/models/github_timeline.py

[refactor] 73-76: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: All file linting

🔇 Additional comments (4)

backend/analytics_server/mhq/exapi/models/github_timeline.py (4)

16-42: Well-designed configuration-driven approach.

The EVENT_CONFIG dictionary provides a clean, maintainable way to handle different event types. The structure is consistent and allows for easy extension.

44-46: Simple and effective constructor.

The constructor is straightforward and follows dataclass patterns well.

48-57: Robust nested value extraction.

The _get_nested_value method implements good defensive programming with proper type checking and graceful handling of missing keys.

78-81: Good defensive logging.

The warning log when login field is missing provides valuable debugging information for data quality issues.

…ing into a class attribute and improve type hinting for raw_data property

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

backend/analytics_server/mhq/exapi/models/github_timeline.py (2)

14-65: Well-designed configuration structure with good separation of concerns.

The class structure effectively separates event-specific configuration from the mapping logic. The EVENT_CONFIG and EVENT_TYPE_MAPPING provide clear documentation of how different GitHub event types are handled.

Consider making the configuration immutable for better safety:

+from typing import Final
+
 @dataclass
 class GithubPullRequestTimelineEvents:
-    EVENT_CONFIG = {
+    EVENT_CONFIG: Final = {
         # ... existing config
     }
-    EVENT_TYPE_MAPPING = {
+    EVENT_TYPE_MAPPING: Final = {
         # ... existing mapping
     }

81-103: Comprehensive user extraction logic with proper fallback handling.

The user property handles different event types appropriately, with special handling for committed events and fallback logic for various data structures.

Consider extracting the user extraction logic into a separate method for better testability:

+    def _extract_user_from_data(self, user_data: Any) -> Optional[str]:
+        """Extract user login from various data structures."""
+        if not user_data:
+            return None
+        if isinstance(user_data, dict) and "login" in user_data:
+            return user_data["login"]
+        if hasattr(user_data, "login"):
+            return user_data.login
+        return None
+
     @property
     def user(self) -> Optional[str]:
         config = self.EVENT_CONFIG.get(self.event_type, self.EVENT_CONFIG["default"])
         actor_path = config["actor_path"]
 
         if not actor_path:
             return None
 
         if self.event_type == "committed":
             return self._get_nested_value(actor_path)
 
-        user_data = self._get_nested_value(actor_path)
-        if not user_data:
-            return None
-        if isinstance(user_data, dict) and "login" in user_data:
-            return user_data["login"]
-        if hasattr(user_data, "login"):
-            return user_data.login
-
-        LOG.warning(
-            f"User data does not contain login field for event type: {self.event_type}"
-        )
-        return None
+        user_data = self._get_nested_value(actor_path)
+        extracted_user = self._extract_user_from_data(user_data)
+        
+        if extracted_user is None and user_data is not None:
+            LOG.warning(
+                f"User data does not contain login field for event type: {self.event_type}"
+            )
+        
+        return extracted_user

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba5db24 and 6b6eecb.

📒 Files selected for processing (1)

backend/analytics_server/mhq/exapi/models/github_timeline.py (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: All file linting

🔇 Additional comments (5)

backend/analytics_server/mhq/exapi/models/github_timeline.py (5)

70-79: Robust implementation of nested value extraction.

The _get_nested_value method provides safe access to nested dictionary values with appropriate error handling.

105-114: Clean timestamp extraction with appropriate error handling.

The timestamp property correctly uses the utility function for ISO time string parsing and handles missing values gracefully.

117-118: Appropriate type casting for raw data access.

The raw_data property provides safe access to the underlying data structure.

121-125: Safe ID extraction with string conversion.

The id property handles potential None values correctly and ensures string output.

127-132: Good enum mapping with appropriate fallback.

The type property provides a clean mapping to the enum with a sensible default for unknown event types.

backend/analytics_server/mhq/exapi/schemas/timeline.py

backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py

backend/analytics_server/mhq/service/code/sync/etl_github_handler.py

backend/analytics_server/mhq/exapi/models/github_timeline.py

…itialization for improved readability

coderabbitai

Actionable comments posted: 0

🔭 Outside diff range comments (1)

backend/analytics_server/mhq/exapi/models/github_timeline.py (1)
134-140: 💡 Verification agent
🧩 Analysis chain

Verify comprehensive event type coverage.

Ensure all possible GitHub timeline event types are handled in the mapping to prevent missing events in analytics.

Let's verify that all GitHub timeline event types are covered:

🏁 Script executed:
#!/bin/bash
# Description: Check if all GitHub timeline event types are covered in the mapping

# Search for any references to timeline event types in the codebase
rg -i "timeline.*event" --type py -A 5 -B 5

# Look for any GitHub event type documentation or examples
rg -i "github.*event.*type" --type py -A 3
Length of output: 30179

🏁 Script executed:
#!/bin/bash
# Show EVENT_CONFIG definition
rg -n "EVENT_CONFIG" -A20 -B5 backend/analytics_server/mhq/exapi/models/github_timeline.py

# Show EVENT_TYPE_MAPPING definition
rg -n "EVENT_TYPE_MAPPING" -A20 -B5 backend/analytics_server/mhq/exapi/models/github_timeline.py
Length of output: 5088

🏁 Script executed:
#!/bin/bash
# Locate PullRequestEventType definition
rg -n "class PullRequestEventType" -A20 -B5 -t py

# If not found, search for Enum usage
rg -n "PullRequestEventType" -A5 -B5 -t py
Length of output: 32757

🌐 Web query:
List all GitHub issue/pull-request timeline event types returned by the GitHub REST API /repos/{owner}/{repo}/issues/{number}/timeline endpoint (official documentation).
💡 Result:

GitHub Issue/Pull-Request Timeline Event Types

The GitHub REST API endpoint /repos/{owner}/{repo}/issues/{number}/timeline returns a timeline of events associated with an issue or pull request. Each event in the timeline has a type specified by the event property. The official documentation provides a comprehensive list of these event types, which cover actions such as labeling, assignment, referencing, and state changes.

Complete List of Timeline Event Types

Below are all the event types that may be returned by the timeline endpoint, as specified by the official GitHub documentation[2][4][6]:

assigned: Someone was assigned to the issue.

unassigned: Someone was unassigned from the issue.

labeled: A label was added.

unlabeled: A label was removed.

milestoned: The issue was added to a milestone.

demilestoned: The issue was removed from a milestone.

renamed: The issue title was changed.

locked: The conversation was locked.

unlocked: The conversation was unlocked.

closed: The issue was closed.

reopened: The issue was reopened.

review_requested: A review was requested from a user or team (pull requests only).

review_request_removed: A requested review was removed (pull requests only).

reviewed: A pull request was reviewed.

ready_for_review: A draft pull request was marked ready for review.

convert_to_draft: A pull request was converted to a draft.

commented: A comment was added.

cross-referenced: Another issue or pull request referenced this issue.

referenced: A commit referenced this issue.

mentioned: The issue was mentioned elsewhere (e.g., in a commit or comment).

subscribed: Someone subscribed to notifications for the issue.

unsubscribed: Someone unsubscribed from notifications for the issue.

head_ref_deleted: The pull request’s branch was deleted.

head_ref_restored: The pull request’s branch was restored.

merged: The pull request was merged.

committed: A commit was pushed to the pull request’s branch.

added_to_project: The issue or pull request was added to a project.

moved_columns_in_project: The issue or pull request moved columns within a project.

removed_from_project: The issue or pull request was removed from a project.

converted_note_to_issue: A note was converted to an issue in a project board.

pinned: The issue was pinned.

unpinned: The issue was unpinned.

transferred: The issue was transferred to another repository.

auto_merge_enabled: Auto-merge was enabled on the pull request.

auto_merge_disabled: Auto-merge was disabled on the pull request.

auto_merge_requested: Auto-merge was requested for the pull request.

deployed: The issue or pull request was deployed via a deployment event.

deployment_environment_changed: The deployment environment for the issue or pull request changed.

Each event type provides specific context about actions or changes that occurred in the lifecycle of an issue or pull request. The documentation may receive updates, so always refer to the official list of issue event types for the most authoritative information[2][4][6].

Citations:

1: https://docs.github.com/en/rest/issues/timeline

2: https://docs.github.com/en/rest/using-the-rest-api/issue-event-types

3: https://docs.github.com/en/enterprise-server@3.14/rest/issues/timeline?apiVersion=2022-11-28

4: https://docs.github.com/en/enterprise-server@3.13/rest/using-the-rest-api/issue-event-types?apiVersion=2022-11-28

5: https://docs.github.com/rest/issues/issues

6: https://docs.github.com/en/enterprise-server@3.14/rest/using-the-rest-api/issue-event-types

7: https://docs.github.com/en/rest/issues/events

8: https://docs.github.com/enterprise-cloud@latest/rest/issues/events

9: https://docs.github.com/rest

10: https://docs.github.com/rest/guides/getting-started-with-the-rest-api
Action Required: Add missing GitHub timeline event types to the mapping

The current EVENT_TYPE_MAPPING covers only ~20 of the ~40+ timeline event types returned by GitHub. Any unmapped event will be marked UNKNOWN and dropped from analytics. Please extend both EVENT_CONFIG (for parsing) and EVENT_TYPE_MAPPING (for type) to include all of:

• milestoned, demilestoned, renamed
• cross-referenced, mentioned
• subscribed, unsubscribed
• head_ref_restored
• added_to_project, moved_columns_in_project, removed_from_project, converted_note_to_issue
• pinned, unpinned, transferred
• auto_merge_enabled, auto_merge_disabled, auto_merge_requested
• deployed, deployment_environment_changed

Also ensure corresponding PullRequestEventType enum members exist (e.g. MILESTONED, DEMILESTONED, etc.) and that each config supplies correct actor_path, timestamp_field, and id_path.

🧹 Nitpick comments (4)

backend/analytics_server/mhq/exapi/models/github_timeline.py (4)

77-87: Enhance the nested value extraction method for better type safety.

The _get_nested_value method could be more robust with input validation and better type handling.

Consider this improvement:

 def _get_nested_value(self, path: str) -> Optional[Any]:
+    if not path or not isinstance(path, str):
+        return None
+        
     keys = path.split(".")
     current = self.data

     for key in keys:
         if isinstance(current, dict) and key in current:
             current = current[key]
         else:
+            LOG.debug(f"Path '{path}' not found in timeline event data at key '{key}'")
             return None
     return current

96-111: Simplify the user property logic for better maintainability.

The user extraction logic has multiple branches and could be streamlined for better readability and maintainability.

Consider refactoring to a cleaner approach:

 @property
 def user(self) -> Optional[str]:
     config = self.EVENT_CONFIG.get(self.event_type, self.EVENT_CONFIG["default"])
     actor_path = config.actor_path

     if not actor_path:
         return None

-    if self.event_type == "committed":
-        return self._get_nested_value(actor_path)
-
     user_data = self._get_nested_value(actor_path)
     if not user_data:
         return None
+        
+    # For committed events, author.name is already a string
+    if self.event_type == "committed":
+        return str(user_data) if user_data else None
+        
+    # For other events, extract login from user object
     if isinstance(user_data, dict) and "login" in user_data:
         return user_data["login"]
     if hasattr(user_data, "login"):
         return user_data.login

     LOG.warning(
         f"User data does not contain login field for event type: {self.event_type}"
     )
     return None

50-71: Consider using a more type-safe approach for event type mapping.

While the current string-based mapping works, consider using an enum or constants to reduce magic strings and improve type safety.

Consider creating constants or using the enum values directly:

from mhq.store.models.code.enums import PullRequestEventType

# Define constants to avoid magic strings
class EventTypes:
    ASSIGNED = "assigned"
    CLOSED = "closed"
    COMMENTED = "commented"
    # ... etc

EVENT_TYPE_MAPPING = {
    EventTypes.ASSIGNED: PullRequestEventType.ASSIGNED,
    EventTypes.CLOSED: PullRequestEventType.CLOSED,
    # ... etc
}

127-133: Add validation for ID extraction to prevent potential issues.

The ID property could benefit from validation to ensure proper string conversion and handle edge cases.

Consider adding validation:

 @property
 def id(self) -> Optional[str]:
     config = self.EVENT_CONFIG.get(self.event_type, self.EVENT_CONFIG["default"])
     id_path = config.id_path
     id_value = self._get_nested_value(id_path)
-    return str(id_value) if id_value is not None else None
+    
+    if id_value is None:
+        return None
+    
+    # Handle different ID value types safely
+    if isinstance(id_value, (str, int)):
+        return str(id_value)
+    
+    LOG.warning(f"Unexpected ID value type for event {self.event_type}: {type(id_value)}")
+    return str(id_value)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6b6eecb and 6657e08.

📒 Files selected for processing (3)

backend/analytics_server/mhq/exapi/models/github_timeline.py (1 hunks)
backend/analytics_server/mhq/exapi/schemas/timeline.py (1 hunks)
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (5 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py
backend/analytics_server/mhq/exapi/schemas/timeline.py

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: All file linting

🔇 Additional comments (1)

backend/analytics_server/mhq/exapi/models/github_timeline.py (1)

14-19: Good improvement addressing previous feedback!

The introduction of GithubPullRequestTimelineEventConfig dataclass effectively addresses the past review concern about robustness by providing structured configuration instead of loose string handling.

backend/analytics_server/mhq/exapi/github.py

backend/analytics_server/mhq/service/code/sync/etl_github_handler.py

…ching methods for improved clarity and error handling

…ts method for better readability

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

backend/analytics_server/mhq/exapi/github.py (3)

318-323: Consider adding a docstring for clarity.

While the implementation is correct, adding a docstring would improve code documentation.

 def _create_timeline_event(self, event_data: Dict) -> GitHubPrTimelineEventsDict:
+    """Create a typed timeline event dictionary from raw event data."""
     return GitHubPrTimelineEventsDict(

324-358: Consider adding optional parameters to limit timeline events.

For PRs with extensive histories, fetching all timeline events could impact performance and API rate limits. Consider adding optional parameters to filter by event type or limit the number of events/pages fetched.

 def get_pr_timeline_events(
-    self, repo_name: str, pr_number: int
+    self, repo_name: str, pr_number: int, 
+    event_types: Optional[List[str]] = None, 
+    max_pages: Optional[int] = None
 ) -> List[GithubPullRequestTimelineEvents]:

359-383: Consider making the warning message more specific.

The validation logic is good, but the warning message could be more helpful by indicating which required field is missing.

-            if all([event.timestamp, event.type, event.id, event.user]):
-                adapted_timeline_events.append(event)
-            else:
-                LOG.warning(
-                    f"Skipping incomplete timeline event: {event_type} with id: {event.id}"
-                )
+            missing_fields = []
+            if not event.timestamp:
+                missing_fields.append("timestamp")
+            if not event.type:
+                missing_fields.append("type")
+            if not event.id:
+                missing_fields.append("id")
+            if not event.user:
+                missing_fields.append("user")
+            
+            if missing_fields:
+                LOG.warning(
+                    f"Skipping incomplete timeline event: {event_type} with id: {event.id}, "
+                    f"missing fields: {', '.join(missing_fields)}"
+                )
+            else:
+                adapted_timeline_events.append(event)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6657e08 and 588c5d4.

📒 Files selected for processing (1)

backend/analytics_server/mhq/exapi/github.py (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: All file linting

🔇 Additional comments (1)

backend/analytics_server/mhq/exapi/github.py (1)

281-317: Implementation looks good with proper error handling.

The _fetch_timeline_events method correctly implements the GitHub timeline API call with comprehensive error handling and proper exception chaining.

Ayyanaruto added 3 commits May 28, 2025 12:06

Implement GitHub PR timeline event handling and related data structures

d24a909

Refactor GitHub PR timeline event handling: rename types, update even…

c6b7f11

…t structure, and improve data processing logic, Add Ready_for_review in calculation

Add tests for cycle time and first response time using ready_for_revi…

07d4718

…ew events

coderabbitai bot reviewed May 29, 2025

View reviewed changes

Refactor error handling to maintain exception context and update type…

12f4426

… hint for verification field

Ayyanaruto changed the title ~~Cycle timing~~ Track Accurate PR cycle timing May 29, 2025

samad-yar-khan suggested changes Jun 4, 2025

View reviewed changes

Refactor GitHub timeline event handling: replace GithubPRTimelineEven…

ba5db24

…t with GithubPullRequestTimelineEvents, update related imports, and improve event processing logic

coderabbitai bot reviewed Jun 6, 2025

View reviewed changes

Refactor GithubPullRequestTimelineEvents: consolidate event type mapp…

6b6eecb

…ing into a class attribute and improve type hinting for raw_data property

coderabbitai bot reviewed Jun 6, 2025

View reviewed changes

samad-yar-khan suggested changes Jun 6, 2025

View reviewed changes

Ayyanaruto added 2 commits June 10, 2025 11:15

Optimized the code for maintainability and scalability

f0107d3

Refactor GithubPullRequestTimelineEvents: streamline configuration in…

6657e08

…itialization for improved readability

coderabbitai bot reviewed Jun 10, 2025

View reviewed changes

adnanhashmi09 requested changes Jul 1, 2025

View reviewed changes

backend/analytics_server/mhq/exapi/github.py Outdated Show resolved Hide resolved

backend/analytics_server/mhq/exapi/github.py Outdated Show resolved Hide resolved

backend/analytics_server/mhq/service/code/sync/etl_github_handler.py Show resolved Hide resolved

Ayyanaruto added 2 commits July 4, 2025 12:10

Refactor GitHub API service: rename and modularize timeline event fet…

fe988b3

…ching methods for improved clarity and error handling

Refactor GithubApiService: improve formatting of _fetch_timeline_even…

588c5d4

…ts method for better readability

coderabbitai bot reviewed Jul 4, 2025

View reviewed changes

Ayyanaruto requested a review from adnanhashmi09 July 4, 2025 11:23

adnanhashmi09 approved these changes Jul 4, 2025

View reviewed changes

adnanhashmi09 merged commit bafb28c into middlewarehq:main Jul 4, 2025
5 checks passed

Track Accurate PR cycle timing #661

Track Accurate PR cycle timing #661

Conversation

Ayyanaruto commented May 29, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Linked Issue(s) #634

Acceptance Criteria fulfillment

Proposed changes (including videos or screenshots)

Further comments

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented May 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Suggested reviewers

Poem

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

GitHub Issue/Pull-Request Timeline Event Types

Complete List of Timeline Event Types

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Ayyanaruto commented May 29, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented May 29, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)