
CREDO23 (Contributor) commented Jan 5, 2026

Description

  • Migrate the GitHub connector to use gitingest (https://github.com/coderamp)

API Changes

  • This PR includes API changes

Change Type

  • Bug fix
  • New feature
  • Performance improvement
  • Refactoring
  • Documentation
  • Dependency/Build system
  • Breaking change
  • Other (specify):

Testing Performed

  • Tested locally
  • Manual/QA verification

Checklist

  • Follows project coding standards and conventions
  • Documentation updated as needed
  • Dependencies updated as needed
  • No lint/build errors or new warnings
  • All relevant tests are passing

High-level PR Summary

This PR migrates the GitHub connector from file-by-file indexing to bulk repository processing with the gitingest library. The original github_connector.py module (296 lines), which recursively traversed a repository and processed each file individually, is replaced by a modular package under app/connectors/github/ that uses gitingest to convert an entire repository into a single text digest suited to LLM processing. Each repository is therefore indexed as one document instead of one document per file, which changes both the indexing strategy and the document structure. The indexer task is updated to match this bulk model: the date-filtering logic is removed, and documents become coarser-grained (per repository rather than per file).
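The change in indexing granularity can be illustrated with a small, self-contained sketch. It does not call gitingest; instead, a hypothetical `build_repository_document` helper concatenates already-fetched file contents into one text blob per repository and contrasts it with the old one-document-per-file approach. The `Document` dataclass, both function names and the `FILE: path` header format are assumptions for illustration, not the PR's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for an indexed document record."""
    title: str
    content: str
    metadata: dict = field(default_factory=dict)


def build_per_file_documents(repo: str, files: dict[str, str]) -> list[Document]:
    """Old approach (sketch): one document per file."""
    return [
        Document(title=f"{repo}/{path}", content=text, metadata={"repo": repo, "path": path})
        for path, text in sorted(files.items())
    ]


def build_repository_document(repo: str, files: dict[str, str]) -> Document:
    """New approach (sketch): the whole repository becomes a single text document."""
    # Directory listing first, then every file preceded by an assumed header line.
    tree = "\n".join(sorted(files))
    sections = [f"FILE: {path}\n{text}" for path, text in sorted(files.items())]
    content = f"Directory structure:\n{tree}\n\n" + "\n\n".join(sections)
    return Document(title=repo, content=content, metadata={"repo": repo, "file_count": len(files)})


if __name__ == "__main__":
    files = {"README.md": "# Demo\n", "src/main.py": "print('hi')\n"}
    print(len(build_per_file_documents("octo/demo", files)))  # 2 documents
    print(build_repository_document("octo/demo", files).metadata)
```

In this sketch, a repository with N files yields N documents under the old model but exactly one under the new model, which is why per-file metadata (such as a path) no longer applies to each indexed document.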

⏱️ Estimated Review Time: 30-90 minutes

💡 Review Order Suggestion
| Order | File Path |
|---|---|
| 1 | surfsense_backend/pyproject.toml |
| 2 | surfsense_backend/app/connectors/github/constants.py |
| 3 | surfsense_backend/app/connectors/github/__init__.py |
| 4 | surfsense_backend/app/connectors/github/service.py |
| 5 | surfsense_backend/app/connectors/github/client.py |
| 6 | surfsense_backend/app/routes/search_source_connectors_routes.py |
| 7 | surfsense_backend/app/tasks/connector_indexers/github_indexer.py |
| 8 | surfsense_backend/app/connectors/github_connector.py |


vercel bot commented Jan 5, 2026

@CREDO23 is attempting to deploy a commit to the Rohan Verma's projects Team on Vercel.

A member of the Team first needs to authorize it.

recurseml bot left a comment

Review by RecurseML

🔍 Review performed on 70890bd..17c7a19

✨ No bugs found, your code is sparkling clean

✅ Files analyzed, no issues (8)

surfsense_backend/app/connectors/github/__init__.py
surfsense_backend/app/connectors/github/client.py
surfsense_backend/app/connectors/github/constants.py
surfsense_backend/app/connectors/github/service.py
surfsense_backend/app/connectors/github_connector.py
surfsense_backend/app/routes/search_source_connectors_routes.py
surfsense_backend/app/tasks/connector_indexers/github_indexer.py
surfsense_backend/pyproject.toml

- Convert GitIngestService.process_repository() to async using ingest_async()
- Convert GitHubConnector.process_repository() to async
- Update github_indexer to await the async process_repository() call
- Prevents blocking the event loop in Celery tasks
- Improves concurrency, since other coroutines can run while a repository is being ingested
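The async conversion described in this commit can be sketched as follows. Only the names `GitIngestService`, `GitHubConnector` and `process_repository` come from the commit message. The `fake_ingest_async` stub stands in for gitingest's `ingest_async()`, which in real use fetches and digests a repository and returns a `(summary, tree, content)` tuple; the constructor argument, `index_repositories` and `run_indexer_task` are hypothetical. The sketch shows that awaiting a non-blocking call lets several repositories be ingested concurrently on one event loop, while a synchronous Celery task body can bridge into that loop with `asyncio.run()`.

```python
import asyncio
from typing import Optional


async def fake_ingest_async(source: str) -> tuple[str, str, str]:
    """Hypothetical stand-in for gitingest's ingest_async(); returns (summary, tree, content)."""
    await asyncio.sleep(0.1)  # simulates network/disk I/O without blocking the loop
    return (f"Repository: {source}", "README.md", f"content of {source}")


class GitIngestService:
    async def process_repository(self, repo_url: str) -> dict[str, str]:
        # Awaiting yields control to the event loop while ingestion is in progress.
        summary, tree, content = await fake_ingest_async(repo_url)
        return {"summary": summary, "tree": tree, "content": content}


class GitHubConnector:
    def __init__(self, service: Optional[GitIngestService] = None) -> None:
        self.service = service or GitIngestService()

    async def process_repository(self, repo_url: str) -> dict[str, str]:
        # The connector is async as well, so every caller must await it.
        return await self.service.process_repository(repo_url)


async def index_repositories(repo_urls: list[str]) -> list[dict[str, str]]:
    """Hypothetical indexer step: ingest several repositories concurrently."""
    connector = GitHubConnector()
    return await asyncio.gather(*(connector.process_repository(u) for u in repo_urls))


def run_indexer_task(repo_urls: list[str]) -> list[dict[str, str]]:
    """Hypothetical synchronous Celery task body that bridges into asyncio."""
    return asyncio.run(index_repositories(repo_urls))
```

In this sketch, `run_indexer_task(["octo/a", "octo/b", "octo/c"])` takes roughly 0.1 s rather than 0.3 s, because the three simulated ingestions overlap; `asyncio.gather` also returns the results in the same order as the input URLs.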
CREDO23 closed this Jan 6, 2026