
Conversation

@cammv21
Collaborator

@cammv21 cammv21 commented Jul 15, 2025

Description

This PR introduces the FetchPullRequestsFromGithub service, responsible for fetching all pull requests (both open and closed) from the repositories within the kommitters and kommit-co GitHub organizations.

The implementation establishes the basic structure for the fetcher and refactors the data extraction strategy to use the Octokit library.

Fixes #156

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented on my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Summary by CodeRabbit

  • New Features

    • Introduced automated fetching and storage of GitHub pull request data for specified organizations.
    • Added scheduled tasks to periodically sync pull requests from GitHub to the warehouse.
    • Enhanced data formatting for pull requests, including related issues, reviews, comments, and releases.
    • Expanded warehouse support for the new pull request entity type.
  • Bug Fixes

    • None.
  • Refactor

    • Improved the GitHub data formatter for better contextual awareness and structured JSON output.
  • Tests

    • Added comprehensive tests for fetching and writing pull request data from GitHub.

@cammv21 cammv21 self-assigned this Jul 15, 2025
@cammv21 cammv21 added the 👍 Feature New feature or request label Jul 15, 2025
@coderabbitai

coderabbitai bot commented Jul 15, 2025

Walkthrough

This update introduces a new implementation for fetching GitHub pull requests across organization repositories, including related reviews, comments, issues, and releases. It adds a formatter for structuring pull request data, integrates the new fetcher into the warehouse ingestion pipeline, schedules automated executions, and provides comprehensive RSpec tests for the new functionality.

Changes

| File(s) | Change Summary |
| --- | --- |
| src/implementations/fetch_pull_requests_from_github.rb | New implementation for fetching GitHub pull requests, related entities, and formatting data for warehouse ingestion. |
| src/utils/warehouse/github/pull_requests_format.rb | New formatter class for standardizing pull request data structure. |
| src/utils/warehouse/github/base.rb | Refactored and expanded base class to support context-aware extraction and JSON formatting of GitHub API data. |
| src/implementations/warehouse_ingester.rb | Added support for github_pull_request entity in warehouse ingestion services. |
| src/use_cases_execution/warehouse/notion/warehouse_ingester.rb | Added new fetcher (FetchPullRequestsFromGithub) to the list of warehouse ingester fetch operations. |
| src/use_cases_execution/schedules.rb | Scheduled scripts for fetching pull requests from GitHub for two organizations. |
| src/use_cases_execution/warehouse/github/fetch_kommit_co_pull_requests_from_github.rb | New script to fetch and store Kommit Co organization's pull requests using the new implementation. |
| src/use_cases_execution/warehouse/github/fetch_kommitters_pull_requests_from_github.rb | New script to fetch and store Kommitters organization's pull requests using the new implementation. |
| spec/implementations/warehouse/github/fetch_pull_requests_from_github_spec.rb | New RSpec test suite covering the fetcher's process and write methods with extensive mocking of dependencies. |

Sequence Diagram(s)

sequenceDiagram
    participant Scheduler
    participant Script
    participant Fetcher as FetchPullRequestsFromGithub
    participant GitHubAPI
    participant Formatter
    participant Storage

    Scheduler->>Script: Trigger scheduled PR fetch script
    Script->>Fetcher: Initialize with config and storage
    Fetcher->>GitHubAPI: Authenticate and fetch repositories
    Fetcher->>GitHubAPI: Fetch releases per repository
    loop For each repository
        Fetcher->>GitHubAPI: Fetch pull requests
        loop For each pull request
            Fetcher->>GitHubAPI: Fetch reviews, comments, related issues
            Fetcher->>Formatter: Format PR data with context
        end
    end
    Fetcher->>Storage: Write formatted PRs in paginated batches
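
The loop in the diagram can be sketched roughly as follows. This is a minimal illustration, not the merged implementation: `collect_pull_requests` and the keys of the returned hashes are hypothetical, and `client` is assumed to be any object that answers the Octokit-style calls `org_repos`, `releases`, `pull_requests`, `pull_request_reviews`, and `issue_comments`.

```ruby
# Sketch of the fetch loop from the sequence diagram. `client` is any object
# that responds to the Octokit-style calls used below; the method name and
# the hash keys are assumptions made for illustration.
def collect_pull_requests(client, organization)
  client.org_repos(organization).flat_map do |repo|
    full_name = repo[:full_name]
    releases = client.releases(full_name) # fetched once per repository

    client.pull_requests(full_name, state: 'all').map do |pr|
      {
        pull_request: pr,
        repository: repo,
        releases: releases,
        reviews: client.pull_request_reviews(full_name, pr[:number]),
        comments: client.issue_comments(full_name, pr[:number])
      }
    end
  end
end
```

With a real `Octokit::Client`, list calls return only the first page unless `auto_paginate` is enabled on the client, so a production version would need to account for that.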

Possibly related PRs

Suggested reviewers

  • FelipeGuzmanSierra

Poem

In the warren of code where the pull requests flow,
A rabbit hops swiftly to fetch what we know.
With reviews and releases, and comments in tow,
It gathers the stories from GitHub below.
Now paged and formatted, the data will grow—
🐇 Into the warehouse, the carrots all go!


@cammv21 cammv21 force-pushed the fetch-pull-requests-events branch from 00b493a to 6bc6d53 Compare July 16, 2025 23:06
Collaborator

@FelipeGuzmanSierra FelipeGuzmanSierra left a comment


I left a few comments. Please let me know if you want to discuss them.

@cammv21 cammv21 force-pushed the fetch-pull-requests-events branch from 6bc6d53 to 6170242 Compare July 17, 2025 22:25
@cammv21 cammv21 marked this pull request as ready for review July 17, 2025 22:25

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (1)
src/implementations/fetch_pull_requests_from_github.rb (1)

5-5: Use HTTParty instead of Octokit as per coding guidelines

The coding guidelines specify using HTTParty for HTTP requests. Consider refactoring to use HTTParty with GitHub's REST API instead of the Octokit client.

Also applies to: 78-78
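
For reference, a request against GitHub's REST API in the HTTParty style might look like the sketch below. `list_pull_requests` and the `User-Agent` value are hypothetical; `http` is assumed to behave like HTTParty (it must respond to `get(url, headers:, query:)`), so the snippet itself does not load the gem.

```ruby
# Hypothetical helper: lists one page of pull requests through the REST API.
# `http` is expected to behave like HTTParty; pass `HTTParty` itself after
# `require 'httparty'`.
GITHUB_API = 'https://api.github.com'

def list_pull_requests(http, repo_full_name, token, page: 1, per_page: 100)
  http.get(
    "#{GITHUB_API}/repos/#{repo_full_name}/pulls",
    headers: {
      'Accept' => 'application/vnd.github+json',
      'Authorization' => "Bearer #{token}",
      'User-Agent' => 'bas-use-cases-sketch' # GitHub rejects requests without a User-Agent
    },
    query: { state: 'all', per_page: per_page, page: page }
  )
end
```

The trade-off is that pagination, rate-limit handling, and response parsing, which Octokit wraps, would have to be handled by the caller.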

🧹 Nitpick comments (1)
src/use_cases_execution/warehouse/github/fetch_kommit_co_pull_requests_from_github.rb (1)

32-34: Enhance error handling beyond logging

Currently, errors are only logged to stdout. Consider implementing proper error tracking or alerting for production environments.

 rescue StandardError => e
   Logger.new($stdout).info(e.message)
+  # Consider adding error tracking/alerting here
+  # e.g., send to error monitoring service
 end
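
One possible shape for this, as a sketch only: log at error level and forward the exception to a notifier. `run_with_error_reporting` and `notifier` are hypothetical names; the notifier stands in for whatever error-tracking service the project might adopt.

```ruby
require 'logger'

# Runs the block, logs failures at error level, and forwards them to an
# optional notifier. `notifier` is a hypothetical callable, not an existing
# part of the project.
def run_with_error_reporting(logger: Logger.new($stdout), notifier: nil)
  yield
rescue StandardError => e
  logger.error("#{e.class}: #{e.message}")
  notifier&.call(e)
  nil
end
```

A script could wrap its `execute` call in this block and pass a lambda that posts to the chosen monitoring service.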
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5c7fd7c and 6170242.

📒 Files selected for processing (9)
  • spec/implementations/warehouse/github/fetch_pull_requests_from_github_spec.rb (1 hunks)
  • src/implementations/fetch_pull_requests_from_github.rb (1 hunks)
  • src/implementations/warehouse_ingester.rb (2 hunks)
  • src/use_cases_execution/schedules.rb (1 hunks)
  • src/use_cases_execution/warehouse/github/fetch_kommit_co_pull_requests_from_github.rb (1 hunks)
  • src/use_cases_execution/warehouse/github/fetch_kommitters_pull_requests_from_github.rb (1 hunks)
  • src/use_cases_execution/warehouse/notion/warehouse_ingester.rb (1 hunks)
  • src/utils/warehouse/github/base.rb (3 hunks)
  • src/utils/warehouse/github/pull_requests_format.rb (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.rb

Instructions used from:

Sources:
📄 CodeRabbit Inference Engine

  • CLAUDE.md
src/use_cases_execution/schedules.rb

Instructions used from:

Sources:
📄 CodeRabbit Inference Engine

  • CLAUDE.md
src/implementations/**/*.rb

Instructions used from:

Sources:
📄 CodeRabbit Inference Engine

  • CLAUDE.md
spec/implementations/**/*.rb

Instructions used from:

Sources:
📄 CodeRabbit Inference Engine

  • CLAUDE.md
🧠 Learnings (6)
src/use_cases_execution/warehouse/github/fetch_kommitters_pull_requests_from_github.rb (6)
Learnt from: CR
PR: kommitters/bas_use_cases#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-02T20:14:00.049Z
Learning: Applies to db/build_shared_storage.sql : Update database schema in `db/build_shared_storage.sql` if needed when adding a new use case
Learnt from: CR
PR: kommitters/bas_use_cases#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-02T20:14:00.049Z
Learning: Applies to src/use_cases_execution/*/*.rb : Create use case directory in `src/use_cases_execution/[name]/` with `config.rb` and 4 pipeline files when adding a new use case
Learnt from: juanhiginio
PR: kommitters/bas_use_cases#158
File: spec/implementations/deploy_process_in_operaton/deploy_process_in_operaton_spec.rb:33-42
Timestamp: 2025-07-15T03:46:00.728Z
Learning: In the bas_use_cases repository, implementation test files in `spec/implementations/` follow a minimal testing pattern with a single test that verifies `execute` doesn't return nil, using mocked shared storage dependencies. This is the established convention across all implementation files rather than comprehensive edge case testing.
Learnt from: CR
PR: kommitters/bas_use_cases#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-02T20:14:00.049Z
Learning: Applies to src/implementations/**/*.rb : Create implementations in `src/implementations/` (if they don't already exist) when adding a new use case
Learnt from: juanpabloxk
PR: kommitters/bas_use_cases#104
File: src/use_cases_execution/schedules.rb:119-127
Timestamp: 2025-06-06T20:08:44.949Z
Learning: In the bas_use_cases repository, the user juanpabloxk prefers to keep schedule constants consolidated in a single file (src/use_cases_execution/schedules.rb) rather than extracting them to separate files, even when it exceeds RuboCop's 100-line module limit.
Learnt from: CR
PR: kommitters/bas_use_cases#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-02T20:14:00.049Z
Learning: Applies to src/use_cases_execution/*/config.rb : Database connection configuration should be standardized across all use cases in each `config.rb`
src/use_cases_execution/schedules.rb (2)
Learnt from: CR
PR: kommitters/bas_use_cases#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-02T20:14:00.049Z
Learning: Applies to src/use_cases_execution/schedules.rb : Add schedule constants to `src/use_cases_execution/schedules.rb` when adding a new use case
Learnt from: juanpabloxk
PR: kommitters/bas_use_cases#104
File: src/use_cases_execution/schedules.rb:119-127
Timestamp: 2025-06-06T20:08:44.949Z
Learning: In the bas_use_cases repository, the user juanpabloxk prefers to keep schedule constants consolidated in a single file (src/use_cases_execution/schedules.rb) rather than extracting them to separate files, even when it exceeds RuboCop's 100-line module limit.
src/implementations/warehouse_ingester.rb (5)
Learnt from: CR
PR: kommitters/bas_use_cases#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-02T20:14:00.049Z
Learning: Applies to src/implementations/**/*.rb : Create implementations in `src/implementations/` (if they don't already exist) when adding a new use case
Learnt from: CR
PR: kommitters/bas_use_cases#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-02T20:14:00.049Z
Learning: Applies to src/use_cases_execution/*/config.rb : Database connection configuration should be standardized across all use cases in each `config.rb`
Learnt from: juanhiginio
PR: kommitters/bas_use_cases#158
File: spec/implementations/deploy_process_in_operaton/deploy_process_in_operaton_spec.rb:33-42
Timestamp: 2025-07-15T03:46:00.728Z
Learning: In the bas_use_cases repository, implementation test files in `spec/implementations/` follow a minimal testing pattern with a single test that verifies `execute` doesn't return nil, using mocked shared storage dependencies. This is the established convention across all implementation files rather than comprehensive edge case testing.
Learnt from: juanpabloxk
PR: kommitters/bas_use_cases#104
File: src/use_cases_execution/schedules.rb:119-127
Timestamp: 2025-06-06T20:08:44.949Z
Learning: In the bas_use_cases repository, the user juanpabloxk prefers to keep schedule constants consolidated in a single file (src/use_cases_execution/schedules.rb) rather than extracting them to separate files, even when it exceeds RuboCop's 100-line module limit.
Learnt from: CR
PR: kommitters/bas_use_cases#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-02T20:14:00.049Z
Learning: Applies to **/*.rb : Use explicit imports for modules (e.g., `Date`, `DateTime`); do not rely on Rails autoloading
src/use_cases_execution/warehouse/github/fetch_kommit_co_pull_requests_from_github.rb (7)
Learnt from: CR
PR: kommitters/bas_use_cases#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-02T20:14:00.049Z
Learning: Applies to src/implementations/**/*.rb : Create implementations in `src/implementations/` (if they don't already exist) when adding a new use case
Learnt from: juanhiginio
PR: kommitters/bas_use_cases#158
File: spec/implementations/deploy_process_in_operaton/deploy_process_in_operaton_spec.rb:33-42
Timestamp: 2025-07-15T03:46:00.728Z
Learning: In the bas_use_cases repository, implementation test files in `spec/implementations/` follow a minimal testing pattern with a single test that verifies `execute` doesn't return nil, using mocked shared storage dependencies. This is the established convention across all implementation files rather than comprehensive edge case testing.
Learnt from: CR
PR: kommitters/bas_use_cases#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-02T20:14:00.049Z
Learning: Applies to src/use_cases_execution/*/*.rb : Create use case directory in `src/use_cases_execution/[name]/` with `config.rb` and 4 pipeline files when adding a new use case
Learnt from: CR
PR: kommitters/bas_use_cases#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-02T20:14:00.049Z
Learning: Applies to db/build_shared_storage.sql : Update database schema in `db/build_shared_storage.sql` if needed when adding a new use case
Learnt from: juanpabloxk
PR: kommitters/bas_use_cases#104
File: src/use_cases_execution/schedules.rb:119-127
Timestamp: 2025-06-06T20:08:44.949Z
Learning: In the bas_use_cases repository, the user juanpabloxk prefers to keep schedule constants consolidated in a single file (src/use_cases_execution/schedules.rb) rather than extracting them to separate files, even when it exceeds RuboCop's 100-line module limit.
Learnt from: CR
PR: kommitters/bas_use_cases#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-02T20:14:00.049Z
Learning: Applies to src/use_cases_execution/schedules.rb : Add schedule constants to `src/use_cases_execution/schedules.rb` when adding a new use case
Learnt from: CR
PR: kommitters/bas_use_cases#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-02T20:14:00.049Z
Learning: Applies to src/use_cases_execution/*/config.rb : Database connection configuration should be standardized across all use cases in each `config.rb`
src/implementations/fetch_pull_requests_from_github.rb (3)
Learnt from: CR
PR: kommitters/bas_use_cases#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-02T20:14:00.049Z
Learning: Applies to src/implementations/**/*.rb : Create implementations in `src/implementations/` (if they don't already exist) when adding a new use case
Learnt from: kren-prog
PR: kommitters/bas_use_cases#99
File: src/utils/workspace_helpers/google_chat_space_request_manager.rb:49-59
Timestamp: 2025-06-09T21:43:25.685Z
Learning: In the bas_use_cases Ruby project, avoid assigning HTTParty.post() results to a variable if the method will just return that variable immediately, as it's considered redundant. Only assign to a response variable when the response needs to be processed (e.g., printed) before returning.
Learnt from: juanhiginio
PR: kommitters/bas_use_cases#158
File: spec/implementations/deploy_process_in_operaton/deploy_process_in_operaton_spec.rb:33-42
Timestamp: 2025-07-15T03:46:00.728Z
Learning: In the bas_use_cases repository, implementation test files in `spec/implementations/` follow a minimal testing pattern with a single test that verifies `execute` doesn't return nil, using mocked shared storage dependencies. This is the established convention across all implementation files rather than comprehensive edge case testing.
spec/implementations/warehouse/github/fetch_pull_requests_from_github_spec.rb (3)
Learnt from: CR
PR: kommitters/bas_use_cases#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-02T20:14:00.049Z
Learning: Applies to spec/implementations/**/*.rb : Write comprehensive tests in `spec/implementations/[name]/` when adding a new use case
Learnt from: juanhiginio
PR: kommitters/bas_use_cases#158
File: spec/implementations/deploy_process_in_operaton/deploy_process_in_operaton_spec.rb:33-42
Timestamp: 2025-07-15T03:46:00.728Z
Learning: In the bas_use_cases repository, implementation test files in `spec/implementations/` follow a minimal testing pattern with a single test that verifies `execute` doesn't return nil, using mocked shared storage dependencies. This is the established convention across all implementation files rather than comprehensive edge case testing.
Learnt from: CR
PR: kommitters/bas_use_cases#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-02T20:14:00.049Z
Learning: Applies to src/implementations/**/*.rb : Create implementations in `src/implementations/` (if they don't already exist) when adding a new use case
🧬 Code Graph Analysis (3)
src/utils/warehouse/github/pull_requests_format.rb (1)
src/utils/warehouse/github/base.rb (9)
  • extract_id (29-31)
  • extract_repository_id (53-55)
  • extract_related_issues (77-81)
  • extract_release_id (83-89)
  • format_pg_array (102-106)
  • format_reviews_as_json (91-100)
  • extract_title (73-75)
  • extract_created_at (45-47)
  • extract_merged_at (69-71)
src/implementations/fetch_pull_requests_from_github.rb (1)
src/utils/warehouse/github/pull_requests_format.rb (2)
  • format (12-28)
  • format (15-27)
spec/implementations/warehouse/github/fetch_pull_requests_from_github_spec.rb (1)
src/implementations/fetch_pull_requests_from_github.rb (3)
  • process (41-176)
  • process (48-58)
  • write (63-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Tests
🔇 Additional comments (10)
src/use_cases_execution/warehouse/notion/warehouse_ingester.rb (1)

26-26: LGTM! Fetcher addition follows established pattern.

The addition of FetchPullRequestsFromGithub to the fetchers array is consistent with the existing GitHub fetchers and maintains the logical ordering.

src/implementations/warehouse_ingester.rb (2)

16-16: LGTM! Service require follows established pattern.

The require statement for the github_pull_request service is consistent with other GitHub service imports.


68-69: LGTM! Service mapping follows established pattern.

The SERVICES hash entry for github_pull_request follows the same structure as other GitHub services, with appropriate service class and external key mapping.

src/use_cases_execution/schedules.rb (1)

161-161: LGTM! Scheduling follows established pattern.

The scheduling time and path structure are consistent with other GitHub fetchers in the warehouse sync pipeline.

src/utils/warehouse/github/pull_requests_format.rb (1)

1-32: LGTM! Format class follows established pattern.

The PullRequestsFormat class properly inherits from Base and implements a comprehensive format method that maps all relevant pull request fields. The implementation is consistent with other format classes in the codebase and correctly uses the extraction methods from the base class.

src/use_cases_execution/warehouse/github/fetch_kommitters_pull_requests_from_github.rb (1)

1-35: LGTM! Execution script follows established pattern.

The script properly follows the established pattern for GitHub fetcher execution scripts with correct configuration, error handling, and integration with the shared storage system.

spec/implementations/warehouse/github/fetch_pull_requests_from_github_spec.rb (1)

9-130: Excellent comprehensive test coverage!

The test suite thoroughly covers authentication failures, successful data fetching, formatting, and pagination logic. This goes beyond the minimal testing pattern and provides robust coverage.
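
The pagination being tested can be pictured with a small sketch like the one below. It assumes the fetcher splits formatted records into fixed-size pages before writing; `paginate`, `PER_PAGE`, and the payload keys other than the `github_pull_request` type are illustrative, not taken from the merged code.

```ruby
# Splits formatted records into fixed-size pages. PER_PAGE and the payload
# keys are illustrative, not the values used by the merged fetcher.
PER_PAGE = 100

def paginate(records, per_page: PER_PAGE)
  pages = records.each_slice(per_page).to_a
  pages.each_with_index.map do |content, index|
    {
      type: 'github_pull_request',
      content: content,
      page_index: index + 1,
      total_pages: pages.size
    }
  end
end
```

Each returned hash could then be handed to the shared storage writer one page at a time.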

src/utils/warehouse/github/base.rb (2)

83-89: Review release association logic

The current logic finds the first release published after the PR was merged. This might miss the intended release if there's a delay between merge and release, or if multiple releases happen quickly.

Consider whether this is the intended behavior. You might want to find the release that includes the PR based on commit history or tags instead of just timing.
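
To make the concern concrete, the timing-based rule described above amounts to something like this sketch. `first_release_after_merge` is a hypothetical name; the `merged_at` and `published_at` fields follow the GitHub REST payload.

```ruby
require 'time'

# Timing-based association: pick the earliest release published after the
# merge. The method name is hypothetical; field names follow the GitHub
# REST payload.
def first_release_after_merge(pull_request, releases)
  merged_at = pull_request[:merged_at]
  return nil if merged_at.nil?

  merged_time = Time.parse(merged_at.to_s)
  releases
    .reject { |release| release[:published_at].nil? }
    .select { |release| Time.parse(release[:published_at].to_s) > merged_time }
    .min_by { |release| Time.parse(release[:published_at].to_s) }
end
```

Because only timestamps are compared, a release cut from another branch shortly after the merge would still be selected, which is the case a commit-based or tag-based lookup would avoid.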


69-75: LGTM! Safe hash access patterns

Good use of hash notation with symbols for accessing data, which is safer than method calls on potentially nil objects.

src/implementations/fetch_pull_requests_from_github.rb (1)

116-125: Well-structured data aggregation

Good separation of concerns - fetching all related data in a context hash before formatting. This makes the code more testable and maintainable.
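
The testability benefit can be illustrated with a sketch of a formatter that receives everything it needs through a context hash. `format_pull_request` and its output keys are hypothetical and do not reproduce the fields of the real PullRequestsFormat class.

```ruby
require 'json'

# Hypothetical formatter: receives the pull request plus a context hash that
# was fetched up front, so it performs no API calls of its own.
def format_pull_request(pull_request, context)
  reviews = context.fetch(:reviews, []).map { |r| { id: r[:id], state: r[:state] } }

  {
    'external_pull_request_id' => pull_request[:id],
    'title' => pull_request[:title],
    'reviews' => JSON.generate(reviews),
    'release_id' => context.dig(:release, :id)
  }
end
```

Since the function depends only on its arguments, a spec can exercise it with plain hashes and no mocked GitHub client.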

Collaborator

@FelipeGuzmanSierra FelipeGuzmanSierra left a comment


LGTM! 🚀

@cammv21 cammv21 merged commit 55276d5 into main Jul 17, 2025
3 checks passed
@cammv21 cammv21 deleted the fetch-pull-requests-events branch July 17, 2025 22:37
@coderabbitai coderabbitai bot mentioned this pull request Sep 25, 2025
11 tasks

Labels

👍 Feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement fetching of GitHub Pull Requests

3 participants