Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix PR cycle time calculation for draft PRs #636

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

Ayyanaruto
Copy link

@Ayyanaruto Ayyanaruto commented Mar 18, 2025

This pull request includes several changes to enhance the processing and analysis of pull requests in the backend/analytics_server module. The key changes include adding a method to retrieve the pull request timeline, updating the create_pr_metrics function to account for the earliest "ready for review" event, and modifying the process_pr function to incorporate this new data.

Enhancements to pull request processing:

Updates to metrics calculation:

Changes to pull request processing flow:

Additional imports:

Linked Issue(s)

Resolves issue #634

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Introduced a new optional parameter for pull request metrics to enhance cycle time calculations.
    • Added a method to retrieve pull request timelines directly from the GitHub API.
    • Enhanced pull request processing with additional metrics related to readiness.
  • Tests

    • Added new tests to ensure the correct functionality of the readiness event retrieval method under various conditions.

- Modified cycle time calculation to start timing when PR first becomes ready for review
- Added logic to detect the first transition from draft to ready state
- First ready event timestamp is used instead of PR creation time when applicable
- Provides more accurate metrics by excluding draft preparation time

Resolves issue middlewarehq#634
@CLAassistant
Copy link

CLAassistant commented Mar 18, 2025

CLA assistant check
All committers have signed the CLA.

Copy link

coderabbitai bot commented Mar 18, 2025

Walkthrough

The changes introduce a new optional parameter, pr_earliest_event, to the create_pr_metrics and get_pr_performance methods in the CodeETLAnalyticsService class. A new method, get_pr_timeline, is added to the GithubApiService class to retrieve pull request timelines. The GithubETLHandler class is updated to pass the new parameter and includes a method to find the earliest "ready for review" event, enhancing functionality related to pull request processing.

Changes

File Change Summary
backend/analytics_server/.../etl_code_analytics.py Updated create_pr_metrics and get_pr_performance to include pr_earliest_event, modifying cycle_time calculation logic based on this new parameter.
backend/analytics_server/.../github.py Added get_pr_timeline method to retrieve pull request timelines using github_repo and pr_number.
backend/analytics_server/.../etl_github_handler.py Updated process_pr and get_repo_pull_requests_data methods to accept and utilize github_repo. Added get_first_ready_for_review_event static method to find the earliest "ready_for_review" event.
backend/analytics_server/.../test_etl_github_handler.py Added three new test methods for get_first_ready_for_review_event to ensure correct functionality under various conditions.

Sequence Diagram(s)

sequenceDiagram
    participant Handler as GithubETLHandler
    participant API as GithubApiService
    participant Repo as GithubRepository
    participant PR as PullRequest

    Handler->>Repo: Retrieve pull requests data
    Handler->>PR: Process pull request
    Handler->>API: Get PR timeline with github_repo and pr.number
    API-->>Handler: Return PR timeline events
    Handler->>Handler: Determine earliest "ready for review" event
    Handler->>Handler: Call create_pr_metrics with pr_earliest_event
Loading

Poem

In the code where bunnies hop,
A new parameter makes us stop!
With timelines and events so neat,
Our metrics now dance to a new beat.
Hooray for changes, let’s all cheer,
For every pull request, we hold dear!
🐇💻✨

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (1)

58-104: Consider adding a small comment explaining the draft PR cycle time logic

While the implementation is correct, adding a brief comment explaining why we're using first_open_time instead of pr.created_at would make the code more maintainable for future developers who might not be familiar with draft PR workflows.

# Add comment before first_ready_event declaration
+        # For draft PRs, cycle time should start when the PR becomes ready for review (state changes to OPEN)
+        # rather than when it was initially created as a draft
         first_ready_event = next(
             (
                 e
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b715c16 and 8712219.

📒 Files selected for processing (1)
  • backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (2 hunks)
🔇 Additional comments (4)
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (4)

58-65: Good implementation of finding the first OPEN state event

The implementation correctly uses next() with a generator expression to efficiently find the first event where the state is OPEN, with a proper fallback to None if no such event exists.


67-69: Well-handled fallback logic for first_open_time

The code appropriately sets first_open_time to either the timestamp of the first OPEN event (if found) or falls back to the PR creation time, which ensures backward compatibility for PRs without explicit state changes.


104-104: Correctly updated cycle_time calculation

This change properly implements the fix by using first_open_time instead of the PR creation time for calculating cycle time, which accurately reflects the time a PR spent in active review.


58-104: Verify this change with edge cases

The implementation looks correct, but it's worth verifying with some edge cases:

  1. PRs that were never in draft state
  2. PRs that have multiple open/draft transitions
  3. PRs created before this feature was implemented

The current implementation should handle all these cases correctly, but testing would confirm.

@adnanhashmi09
Copy link
Contributor

adnanhashmi09 commented Mar 18, 2025

Please get the issue assigned to yourself by commenting on the issue, before you start working on it. This helps others not pick up tasks that are already being worked on by someone else.

@Ayyanaruto
Copy link
Author

Apologies, I forgot to get the issue assigned before starting. I'll make sure to do that next time. Thanks for the reminder!

Copy link
Contributor

@adnanhashmi09 adnanhashmi09 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is incomplete and the logic is flawed. Also please include tests for the changes you make.

@Ayyanaruto Ayyanaruto marked this pull request as draft March 18, 2025 14:57
@Ayyanaruto Ayyanaruto marked this pull request as ready for review March 23, 2025 09:32
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (9)
backend/analytics_server/mhq/exapi/github.py (1)

117-119: Add return type hint, docstring, and error handling.

The new get_pr_timeline method works correctly but could be improved with:

  1. Return type hint
  2. Docstring explaining purpose and parameters
  3. Error handling for API exceptions
-    def get_pr_timeline(self, github_repo:GithubRepository, pr_number:int):
-            issue = github_repo.get_issue(pr_number)
-            return issue.get_timeline()
+    def get_pr_timeline(self, github_repo: GithubRepository, pr_number: int) -> GithubPaginatedList:
+        """
+        Get the timeline of events for a pull request.
+        
+        Args:
+            github_repo: GitHub repository object
+            pr_number: Pull request number
+            
+        Returns:
+            Timeline of events for the pull request
+            
+        Raises:
+            GithubException: If there's an error retrieving the timeline
+        """
+        try:
+            issue = github_repo.get_issue(pr_number)
+            return issue.get_timeline()
+        except GithubException as e:
+            LOG.error(f"Error getting PR timeline for PR #{pr_number}: {e}")
+            raise
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (3)

21-21: Improve the type hint for pr_earliest_event.

The type hint Optional[object] is very generic. A more specific type would improve code readability and enable better IDE support.

-        pr_earliest_event:Optional[object]
+        pr_earliest_event: Optional[Any] = None

Also consider adding type imports:

from typing import Any

56-56: Improve parameter spacing and type hints.

The method signature has inconsistent spacing and uses a generic type for pr_earliest_event.

-    def get_pr_performance(pr: PullRequest, pr_events: [PullRequestEvent],pr_earliest_event:Optional[object]):
+    def get_pr_performance(pr: PullRequest, pr_events: List[PullRequestEvent], pr_earliest_event: Optional[Any] = None):

90-94: Improve indentation and handling of pr_earliest_event.created_at.

The current indentation makes this conditional expression hard to read. Also, there's no fallback if pr_earliest_event exists but doesn't have a created_at attribute.

-        cycle_time = pr.state_changed_at - (
-        pr_earliest_event.created_at.astimezone(timezone.utc) 
-        if pr_earliest_event and isinstance(pr_earliest_event.created_at, datetime) 
-        else pr.created_at
-        )
+        first_open_time = (
+            pr_earliest_event.created_at.astimezone(timezone.utc)
+            if pr_earliest_event and hasattr(pr_earliest_event, 'created_at') and isinstance(pr_earliest_event.created_at, datetime)
+            else pr.created_at
+        )
+        cycle_time = pr.state_changed_at - first_open_time
backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (5)

149-151: Add space after comma in method call.

Maintain consistent spacing in method calls.

-                str(org_repo.id), github_pr, github_repo
+                str(org_repo.id), github_pr, github_repo

160-160: Fix parameter spacing.

The spacing after commas is inconsistent.

-        self, repo_id: str, pr: GithubPullRequest,github_repo:GithubRepository
+        self, repo_id: str, pr: GithubPullRequest, github_repo: GithubRepository

175-176: Improve variable spacing and variable naming.

The code lacks spaces around operators and could use a more descriptive variable name.

-        pr_timeline_events=self._api.get_pr_timeline(github_repo,pr.number)
-        pr_earliest_ready_for_review= self.get_first_ready_for_review_event(pr_timeline_events)
+        pr_timeline_events = self._api.get_pr_timeline(github_repo, pr.number)
+        pr_earliest_ready_for_review = self.get_first_ready_for_review_event(pr_timeline_events)

188-188: Fix parameter spacing.

Add space after comma for consistency.

-            pr_model, pr_events_model_list, pr_commits_model_list,pr_earliest_ready_for_review
+            pr_model, pr_events_model_list, pr_commits_model_list, pr_earliest_ready_for_review

194-202: Improve type annotations and add docstring.

The current type hint for events is too generic. Also, the method would benefit from a docstring explaining its purpose.

-    def get_first_ready_for_review_event(events: List) -> Optional[object]:
+    def get_first_ready_for_review_event(events: List[Any]) -> Optional[Any]:
+        """
+        Find the earliest 'ready_for_review' event from the timeline events.
+        
+        This method filters PR timeline events to find when a PR was first marked
+        as ready for review after being in draft state.
+        
+        Args:
+            events: List of PR timeline events
+            
+        Returns:
+            The earliest 'ready_for_review' event or None if no such events exist
+        """
         ready_events = [
             event for event in events
             if getattr(event, "event", None) == "ready_for_review" and hasattr(event, "created_at")
         ]
         if not ready_events:
             return None
         earliest_event = min(ready_events, key=lambda e: e.created_at)
         return earliest_event

Also, consider importing Any from typing:

from typing import Any
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8712219 and 9c39e98.

📒 Files selected for processing (3)
  • backend/analytics_server/mhq/exapi/github.py (1 hunks)
  • backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (5 hunks)
  • backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (4 hunks)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (4)

21-22: Add default value for optional parameter

The new pr_earliest_event parameter is marked as Optional but doesn't have a default value, which could cause issues if the method is called without this parameter from existing code.

-        pr_earliest_event:Optional[datetime]
+        pr_earliest_event: Optional[datetime] = None

26-26: Fix spacing in parameter list

Missing space after comma for better readability.

-        pr_performance = self.get_pr_performance(pr, pr_events,pr_earliest_event)
+        pr_performance = self.get_pr_performance(pr, pr_events, pr_earliest_event)

56-57: Improve parameter formatting and add default value

The parameter list has inconsistent spacing and type annotation style. Also, similar to the create_pr_metrics method, the optional parameter should have a default value.

-    def get_pr_performance(pr: PullRequest, pr_events: [PullRequestEvent],pr_earliest_event:Optional[datetime]):
+    def get_pr_performance(pr: PullRequest, pr_events: List[PullRequestEvent], pr_earliest_event: Optional[datetime] = None):

91-96: Add comments explaining the purpose of this logic

It would be helpful to add a comment explaining what first_open_time and pr_earliest_event represent, especially since this is a core part of the fix for calculating cycle time for draft PRs. This helps future developers understand the intent of the code.

+        # Determine when the PR was first actually open for review.
+        # For draft PRs, this is when they were marked as "ready for review", not when they were created.
         first_open_time = (
             pr_earliest_event
             if pr_earliest_event and isinstance(pr_earliest_event, datetime)
             else pr.created_at
         ).astimezone(timezone.utc)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c39e98 and c227cc9.

📒 Files selected for processing (2)
  • backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (5 hunks)
  • backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/analytics_server/mhq/service/code/sync/etl_github_handler.py
🔇 Additional comments (1)
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (1)

97-97: LGTM: Correct cycle time calculation

The updated calculation correctly uses first_open_time instead of pr.created_at, addressing the issue with draft PRs where cycle time should be calculated from when the PR was first ready for review.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (3)

1-1: Remove unused import.

The timezone from datetime is imported but not used in the code. It's best to remove unused imports to keep the code clean.

-from datetime import datetime,timedelta,timezone
+from datetime import datetime,timedelta
🧰 Tools
🪛 Ruff (0.8.2)

1-1: datetime.timezone imported but unused

Remove unused import: datetime.timezone

(F401)


56-56: LGTM!

Method signature properly updated to include the new parameter. Consider adding a space after the comma for better readability.

-def get_pr_performance(pr: PullRequest, pr_events: [PullRequestEvent],pr_earliest_event:Optional[datetime]):
+def get_pr_performance(pr: PullRequest, pr_events: [PullRequestEvent], pr_earliest_event: Optional[datetime]):

103-114: Consider updating first_review_time calculation.

The first_review_time calculation still uses pr.created_at and doesn't account for the new first_open_time. For consistency with the cycle time calculation, consider using first_open_time here as well.

            first_review_time=(
-               (first_review.created_at - pr.created_at).total_seconds()
+               (first_review.created_at - first_open_time).total_seconds()
                if first_review
                else -1
            ),
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c227cc9 and eb73659.

📒 Files selected for processing (2)
  • backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (5 hunks)
  • backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/analytics_server/mhq/service/code/sync/etl_github_handler.py
🧰 Additional context used
🪛 Ruff (0.8.2)
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py

1-1: datetime.timezone imported but unused

Remove unused import: datetime.timezone

(F401)

🔇 Additional comments (5)
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (5)

2-2: LGTM!

Good addition of Optional from typing to support the new parameter functionality.


21-21: LGTM!

The addition of the pr_earliest_event parameter enables tracking the earliest "ready for review" event for draft PRs, which is essential for accurate cycle time calculation.


26-26: LGTM!

Correctly passing the new parameter to the get_pr_performance method.


91-98: LGTM! Logic handles draft PRs correctly.

Good implementation of the logic to determine the first_open_time based on the earliest "ready for review" event or the PR creation time. This resolves the issue of cycle time calculation for draft PRs by considering when they actually became ready for review rather than when they were initially created.


183-183: LGTM!

No changes were made to the get_rework_cycles method, but it works correctly with the updated code.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py (1)

370-385: Good edge case handling for no ready events.

This test properly verifies the behavior when no "ready_for_review" events are present in the timeline data, ensuring the method returns None as expected.

Consider extracting the MockTimelineEvent class to reduce duplication with the previous test, though keeping it self-contained is also acceptable for test clarity.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb73659 and d906944.

📒 Files selected for processing (1)
  • backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py (1 hunks)
🔇 Additional comments (2)
backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py (2)

349-368: Well-structured test for finding the earliest ready event.

The test effectively verifies that the get_first_ready_for_review_event method correctly identifies the earliest "ready_for_review" event from a list of mixed timeline events, which is essential for accurately calculating PR cycle time.


387-391: Appropriate empty list edge case handling.

This test verifies that the method correctly handles an empty list of events by returning None, which is an important edge case to cover for production reliability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants