@codegen-sh codegen-sh bot commented Mar 23, 2025

This PR creates a comprehensive integrated CI/CD flow example that combines all the existing codegen examples into a cohesive pipeline. The implementation:

  1. Uses modern FileIndex instead of the deprecated VectorIndex for semantic search
  2. Implements an event-driven architecture with a simple event bus for component communication
  3. Provides both Modal deployment and local development options
  4. Includes comprehensive documentation and configuration templates

The flow connects:

  • Linear issues → AI-assisted development → GitHub PRs → Automated code review → Slack notifications

Key components:

  • app.py: Main application with Modal deployment
  • models.py: Shared data models
  • event_bus.py: Simple event bus for communication between components
  • events.py: Event handlers for Linear, GitHub, and Slack
  • agents.py: AI agents for code generation and review
  • utils.py: Utility functions

To use this example:

  1. Create a .env file from the template
  2. Deploy with Modal: modal deploy app.py
  3. Create a Linear issue with the "Codegen" label
  4. The system will automatically analyze the issue, generate code changes, create a PR, and review it

Comment on lines +75 to +83
response = self.client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a software development planner."},
        {"role": "user", "content": prompt}
    ],
    response_format={"type": "json_object"},
    temperature=0
)

Lack of Error Handling in API Calls

The method create_plan in the PlanningAgent class makes a call to the OpenAI API without handling potential exceptions that might occur during the call (e.g., network issues, API limits exceeded). This can lead to unhandled exceptions and application crashes.

Recommendation: Implement try-except blocks around the API calls to handle exceptions gracefully. Log the errors and consider implementing a retry mechanism or returning a default response in case of failure.
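
The recommendation above could be sketched roughly as follows. This is a minimal illustration, not the PR's code: call_with_retries, its parameters, and the choice to return None on failure are all assumptions made for the example.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fn, retrying on failure with exponential backoff.

    Hypothetical helper: returns None if every attempt fails so the
    caller can fall back to a default response.
    """
    for attempt in range(1, retries + 1):
        try:
            return fn()
        except Exception as exc:  # real code should catch the client's specific error types
            logger.warning("API call failed (attempt %d/%d): %s", attempt, retries, exc)
            if attempt < retries:
                # Wait 1x, 2x, 4x ... base_delay between attempts.
                sleep(base_delay * 2 ** (attempt - 1))
    return None
```

Inside create_plan, the existing call could then be passed in as a lambda, for example call_with_retries(lambda: self.client.chat.completions.create(...)), with the caller deciding what to do when None comes back.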

Comment on lines +75 to +83
response = self.client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a software development planner."},
        {"role": "user", "content": prompt}
    ],
    response_format={"type": "json_object"},
    temperature=0
)

Performance Concerns with Synchronous API Calls

The create_plan method in the PlanningAgent class uses a synchronous call to fetch data from an external API, which can be a performance bottleneck if the API's response time is slow. This approach also does not utilize any form of caching, which could improve performance by reducing the number of API calls for similar requests.

Recommendation: Consider using asynchronous calls to improve responsiveness. Additionally, implement a caching mechanism to store and reuse results of similar API requests, reducing the need for repeated calls and improving overall performance.
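
A rough sketch of the async-plus-cache idea, under stated assumptions: CachedPlanner and the fetch callable are hypothetical names, the cache is an in-memory dict keyed by a hash of the exact prompt, and only identical prompts count as "similar" here.

```python
import asyncio
import hashlib
from typing import Awaitable, Callable, Dict


class CachedPlanner:
    """Caches results of an async planning call, keyed by a hash of the prompt."""

    def __init__(self, fetch: Callable[[str], Awaitable[str]]) -> None:
        # fetch is a hypothetical async function that performs the API call.
        self._fetch = fetch
        self._cache: Dict[str, str] = {}

    async def plan(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self._cache:
            # Cache miss: await the API call without blocking the event loop.
            self._cache[key] = await self._fetch(prompt)
        return self._cache[key]
```

Two concurrent calls with the same prompt can both miss the cache in this version; storing the pending task instead of the result would avoid that, at the cost of a little more bookkeeping. The cache also never expires entries, which may or may not suit a long-running deployment.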

# Set up event handlers
await setup_event_handlers()
# Start the event bus
asyncio.create_task(event_bus.start())

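The task created with asyncio.create_task for the event bus is neither stored nor monitored. If event_bus.start() raises, the exception stays on the task object and surfaces only as a "Task exception was never retrieved" log message once the task is garbage-collected, so the application keeps running without a working event bus. In addition, the event loop holds only weak references to tasks, so a task with no saved reference may be garbage-collected before it finishes.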

Recommendation:
Consider using a more robust handling mechanism for background tasks. For example, you could keep a reference to the task and add an exception handler:

self.event_bus_task = asyncio.create_task(event_bus.start())
self.event_bus_task.add_done_callback(self.handle_task_result)

This way, you can log exceptions or take appropriate actions if the task fails.
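
For illustration, a done-callback along these lines might look like the sketch below. handle_task_result is written as a plain function rather than a method, and failing_bus is a hypothetical stand-in for event_bus.start().

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


def handle_task_result(task: asyncio.Task) -> None:
    """Done-callback that logs the outcome of a background task."""
    if task.cancelled():
        # task.exception() would raise CancelledError, so check this first.
        logger.info("Background task %s was cancelled", task.get_name())
        return
    exc = task.exception()
    if exc is not None:
        logger.error("Background task %s failed", task.get_name(), exc_info=exc)


async def main() -> None:
    async def failing_bus() -> None:  # stand-in for event_bus.start()
        raise RuntimeError("event bus crashed")

    # Keep a reference so the task is not garbage-collected while running.
    bus_task = asyncio.create_task(failing_bus(), name="event-bus")
    bus_task.add_done_callback(handle_task_result)

    # Only for this demo: wait until the task is done so the callback fires.
    # asyncio.wait does not re-raise the task's exception.
    await asyncio.wait([bus_task])
```

Running asyncio.run(main()) logs the RuntimeError through the callback. In the real application the callback could also trigger a restart or a shutdown instead of only logging.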


@modal_app.function(
    image=base_image,
    secrets=[modal.Secret.from_dotenv()],

The use of modal.Secret.from_dotenv() relies on the security of the .env file, which must be properly managed to avoid exposing sensitive information. If the .env file is not secured or accidentally included in version control, it could lead to security vulnerabilities.

Recommendation:
Ensure that the .env file is included in your .gitignore file to prevent it from being checked into version control. Additionally, consider using a more secure vault solution for production environments, such as AWS Secrets Manager or HashiCorp Vault, to enhance the security of your application's secrets.

Comment on lines +70 to +81
async def _process_event(self, event: Event) -> None:
    """Process an event by calling all subscribers.

    Args:
        event: The event to process
    """
    if event.type in self.subscribers:
        for callback in self.subscribers[event.type]:
            try:
                await callback(event)
            except Exception as e:
                logger.error(f"Error in subscriber callback: {e}")

Issue: Generic Exception Handling in Event Callbacks

The _process_event method catches and logs exceptions from subscriber callbacks but does not re-raise them or otherwise notify the system of the error (lines 78-81). This approach can lead to silent failures where errors in callbacks do not halt or alter the system's operation, potentially masking significant issues.

Recommendation:
Consider implementing a more robust error handling strategy. Options include re-raising exceptions, implementing a retry mechanism, or notifying the system through an error handling event or callback. This would help in maintaining system integrity and responsiveness in the face of errors.
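
One of the options above, notifying the system through an error callback, could be sketched as follows. The dispatch function, the on_error hook, and the use of a plain dict as the event are assumptions made to keep the example self-contained; they are not the PR's API.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List, Optional

logger = logging.getLogger(__name__)

Callback = Callable[[dict], Awaitable[None]]
ErrorHandler = Callable[[dict, Callback, Exception], Awaitable[None]]


async def dispatch(
    event: dict,
    callbacks: List[Callback],
    on_error: Optional[ErrorHandler] = None,
) -> List[Exception]:
    """Call each subscriber in order and report failures to the caller."""
    failures: List[Exception] = []
    for callback in callbacks:
        try:
            await callback(event)
        except Exception as exc:
            # Log with traceback, remember the failure, and notify the hook.
            logger.exception("Error in subscriber callback %r", callback)
            failures.append(exc)
            if on_error is not None:
                await on_error(event, callback, exc)
    return failures
```

As in the original loop, one failing subscriber does not stop the others, but the failure is no longer silent: the caller receives the list of exceptions, and the hook could publish a dedicated error event or post an alert.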

Comment on lines +17 to +30
    self.subscribers: Dict[EventType, List[Callable]] = {}
    self.event_queue = asyncio.Queue()
    self.running = False

def subscribe(self, event_type: EventType, callback: Callable) -> None:
    """Subscribe to an event type.

    Args:
        event_type: The type of event to subscribe to
        callback: The function to call when the event occurs
    """
    if event_type not in self.subscribers:
        self.subscribers[event_type] = []
    self.subscribers[event_type].append(callback)

Issue: Potential Data Race in Subscriber Management

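The EventBus class manages subscribers in a plain dictionary with no concurrency control (lines 17-30). On a single asyncio event loop, subscribe is synchronous and cannot be interleaved with other coroutines, so it is safe on its own. The risk lies in _process_event, which awaits each callback while iterating over the subscriber list: a callback, or another coroutine that runs during the await, can subscribe or unsubscribe and change the list mid-iteration, so subscribers may be skipped or newly added ones invoked for an event already in flight. If subscribe or unsubscribe is ever called from another thread, the dictionary is unprotected.

Recommendation:
Have _process_event iterate over a copy of the subscriber list so that changes made during dispatch take effect from the next event. Use asyncio.Lock only if subscription changes themselves need to await; note that asyncio.Lock is not thread-safe, so calls from other threads need threading.Lock or loop.call_soon_threadsafe instead.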
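
As a lightweight complement, assuming the bus runs on a single event loop, dispatch can iterate over a snapshot of the subscriber list so that subscription changes made during a callback do not disturb the loop. MiniBus below is a hypothetical, stripped-down stand-in for EventBus, using strings as event types.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, DefaultDict, List

Callback = Callable[[str], Awaitable[None]]


class MiniBus:
    """Stripped-down stand-in for EventBus (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, event_type: str, callback: Callback) -> None:
        self.subscribers[event_type].append(callback)

    def unsubscribe(self, event_type: str, callback: Callback) -> None:
        self.subscribers[event_type].remove(callback)

    async def process(self, event_type: str) -> None:
        # Copy the list so callbacks may subscribe or unsubscribe while we iterate.
        for callback in list(self.subscribers.get(event_type, ())):
            await callback(event_type)
```

With the copy in place, a callback that unsubscribes itself no longer causes the next subscriber to be skipped, which is what happens when a list is mutated while being iterated directly.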

Comment on lines +52 to +170
    repo_name = os.environ.get("GITHUB_REPO", "codegen-sh/codegen-sdk")
    codebase = create_codebase(repo_name)

    # Create a planning agent
    planning_agent = PlanningAgent(codebase)

    # Create a development plan
    plan = planning_agent.create_plan(issue)

    # Comment on the issue with the plan
    plan_comment = f"""
## Development Plan

### Summary
{plan.summary}

### Steps
{chr(10).join([f"- {step}" for step in plan.steps])}

### Changes
{chr(10).join([f"- {change.filepath}: {change.description}" for change in plan.code_changes])}

I'll start working on implementing these changes now.
"""
    comment_on_linear_issue(issue.id, plan_comment)

    # Create a development agent
    dev_agent = DevelopmentAgent(codebase)

    # Generate code changes
    dev_agent.generate_changes(plan)

    # Apply code changes
    updated_changes = generate_code_changes(plan, codebase)
    apply_code_changes(codebase, updated_changes)

    # Create a PR
    pr_result = create_github_pr(codebase, issue, plan)

    # Comment on the issue with the PR link
    pr_comment = f"I've created a PR with the changes: {pr_result['url']}"
    comment_on_linear_issue(issue.id, pr_comment)


async def handle_github_pr_created(event: Event) -> None:
    """Handle a GitHub PR created event.

    Args:
        event: The event to handle
    """
    logger.info("[GITHUB_PR_CREATED] Handling GitHub PR created event")

    # Process the GitHub PR event
    pr = process_github_pr_event(event.payload)

    # Check if the PR has the Codegen label
    if "Codegen" not in pr.labels:
        logger.info(f"PR #{pr.number} does not have the Codegen label, skipping")
        return

    # Create a codebase
    repo_name = os.environ.get("GITHUB_REPO", "codegen-sh/codegen-sdk")
    codebase = create_codebase(repo_name)

    # Create a review agent
    review_agent = ReviewAgent(codebase)

    # Review the PR
    review = review_agent.review_pr(pr)

    # Post a summary comment
    summary_comment = f"""
## Code Review

{review.summary}

### Suggestions
{chr(10).join([f"- {suggestion}" for suggestion in review.suggestions])}

{"I approve this PR! ✅" if review.approval else "I have some concerns that should be addressed before merging. ❌"}
"""
    create_pr_comment(codebase, pr.number, summary_comment)

    # Post individual comments
    for comment in review.comments:
        create_pr_comment(
            codebase,
            pr.number,
            comment["comment"],
            commit_sha=pr.head_sha,
            path=comment["filepath"],
            line=comment["line"]
        )


async def handle_slack_message(event: Event) -> None:
    """Handle a Slack message event.

    Args:
        event: The event to handle
    """
    logger.info("[SLACK_MESSAGE] Handling Slack message event")

    # Get the Slack event data
    slack_event = event.payload

    # Check if it's a message mentioning the bot
    if "app_mention" not in slack_event.get("type", ""):
        return

    # Get the message text
    text = slack_event.get("text", "")

    # Remove the bot mention
    query = text.split(">", 1)[1].strip() if ">" in text else text

    # Create a codebase
    repo_name = os.environ.get("GITHUB_REPO", "codegen-sh/codegen-sdk")
    codebase = create_codebase(repo_name)

The environment variable GITHUB_REPO is read, with the same hard-coded default, in three different handlers (handle_linear_issue_created, handle_github_pr_created, handle_slack_message). The lookup itself is cheap, but the duplicated default is easy to change in one place and miss in another, and handlers could see different values if the variable changes at runtime.

Recommendation:
Extract the access to GITHUB_REPO into a single function or a configuration class that loads all necessary configuration at startup. This centralizes configuration management and removes the duplicated default, making the code cleaner and more maintainable.
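
A configuration object of this kind might look like the sketch below. Settings and get_settings are hypothetical names; only the GITHUB_REPO variable and its default come from the code under review.

```python
import os
from dataclasses import dataclass
from functools import lru_cache
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Immutable application settings (hypothetical, for illustration)."""

    github_repo: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        # The default repository name is defined in exactly one place.
        return cls(github_repo=env.get("GITHUB_REPO", "codegen-sh/codegen-sdk"))


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Load settings once; later calls return the same object."""
    return Settings.from_env()
```

Each handler would then call create_codebase(get_settings().github_repo). Passing a mapping to from_env makes the loader easy to exercise in tests without touching the real environment.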

Comment on lines +89 to +181
    pr_result = create_github_pr(codebase, issue, plan)

    # Comment on the issue with the PR link
    pr_comment = f"I've created a PR with the changes: {pr_result['url']}"
    comment_on_linear_issue(issue.id, pr_comment)


async def handle_github_pr_created(event: Event) -> None:
    """Handle a GitHub PR created event.

    Args:
        event: The event to handle
    """
    logger.info("[GITHUB_PR_CREATED] Handling GitHub PR created event")

    # Process the GitHub PR event
    pr = process_github_pr_event(event.payload)

    # Check if the PR has the Codegen label
    if "Codegen" not in pr.labels:
        logger.info(f"PR #{pr.number} does not have the Codegen label, skipping")
        return

    # Create a codebase
    repo_name = os.environ.get("GITHUB_REPO", "codegen-sh/codegen-sdk")
    codebase = create_codebase(repo_name)

    # Create a review agent
    review_agent = ReviewAgent(codebase)

    # Review the PR
    review = review_agent.review_pr(pr)

    # Post a summary comment
    summary_comment = f"""
## Code Review

{review.summary}

### Suggestions
{chr(10).join([f"- {suggestion}" for suggestion in review.suggestions])}

{"I approve this PR! ✅" if review.approval else "I have some concerns that should be addressed before merging. ❌"}
"""
    create_pr_comment(codebase, pr.number, summary_comment)

    # Post individual comments
    for comment in review.comments:
        create_pr_comment(
            codebase,
            pr.number,
            comment["comment"],
            commit_sha=pr.head_sha,
            path=comment["filepath"],
            line=comment["line"]
        )


async def handle_slack_message(event: Event) -> None:
    """Handle a Slack message event.

    Args:
        event: The event to handle
    """
    logger.info("[SLACK_MESSAGE] Handling Slack message event")

    # Get the Slack event data
    slack_event = event.payload

    # Check if it's a message mentioning the bot
    if "app_mention" not in slack_event.get("type", ""):
        return

    # Get the message text
    text = slack_event.get("text", "")

    # Remove the bot mention
    query = text.split(">", 1)[1].strip() if ">" in text else text

    # Create a codebase
    repo_name = os.environ.get("GITHUB_REPO", "codegen-sh/codegen-sdk")
    codebase = create_codebase(repo_name)

    # Create a research agent
    research_agent = CodeResearchAgent(codebase)

    # Research the query
    answer = research_agent.research(query)

    # Send the response
    from slack_sdk import WebClient
    client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])
    client.chat_postMessage(

The operations that interact with external services such as creating a GitHub PR (create_github_pr) and posting a message to Slack (client.chat_postMessage) do not include error handling. This can lead to unhandled exceptions if the external service fails or is unavailable, potentially causing the application to crash or behave unpredictably.

Recommendation:
Implement try-except blocks around these external service calls to handle possible exceptions gracefully. Log the errors and consider implementing a retry mechanism or alerting mechanisms to handle these failures more robustly.

Comment on lines +24 to +25
payload: Dict
metadata: Optional[Dict] = None

The payload and metadata fields in the Event class are typed as bare Dict, with no key or value types. This is flexible but gives type checkers nothing to verify, so a handler that reads a missing or mistyped key fails only at runtime.

Recommendation: Consider using TypedDict from typing to define the expected structure of payload and metadata. TypedDict is checked by static type checkers such as mypy and is not enforced at runtime, but it documents a clear contract for the data and improves code reliability.

Example:

from dataclasses import dataclass
from typing import Any, Optional, TypedDict

class EventPayload(TypedDict):
    key: str  # Example key
    value: Any  # Expected value type

class EventMetadata(TypedDict, total=False):
    timestamp: float  # Example optional metadata

@dataclass
class Event:
    type: EventType
    payload: EventPayload
    metadata: Optional[EventMetadata] = None


pr: GitHubPR
summary: str
comments: List[Dict]

The comments field in the CodeReview class is a list of dictionaries, which is flexible but does not enforce any structure or type constraints. This can lead to inconsistencies and errors in data handling.

Recommendation: Define a Comment data class or use TypedDict to specify the structure of a comment. This will enhance type safety and make the data handling more predictable.

Example:

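from dataclasses import dataclass
from typing import List, TypedDict

# Keys match how the PR handler reads each entry:
# comment["filepath"], comment["comment"], comment["line"]
class Comment(TypedDict):
    filepath: str
    comment: str
    line: int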

@dataclass
class CodeReview:
    pr: GitHubPR
    summary: str
    comments: List[Comment]
    suggestions: List[str]
    approval: bool
