@Copilot Copilot AI commented Sep 23, 2025

This PR adds support for bulk download of class embeddings for entire ontologies, addressing the need to retrieve all embeddings for an ontology at once rather than fetching them individually.

New API Endpoint

Added GET /api/v2/ontologies/{onto}/classes/llm_embeddings, which returns a paginated list of all classes that have embeddings in the given ontology.

Parameters:

  • {onto} - Ontology ID (path parameter)
  • page - Page number (optional, default: 0)
  • size - Page size (optional, default: 20)
  • lang - Language code (optional, default: "en")

Example Usage:

# Get first 20 classes with embeddings from EFO
curl "http://localhost:8080/api/v2/ontologies/efo/classes/llm_embeddings?page=0&size=20"

# Get next page with smaller size
curl "http://localhost:8080/api/v2/ontologies/efo/classes/llm_embeddings?page=1&size=10"
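
A client that wants every embedding for an ontology would typically walk the pages until they run out. The following is a minimal Python sketch of such a loop, not code from this PR. It assumes the JSON response exposes an `elements` list and a `totalPages` count (field names assumed for V2PagedResponse, not confirmed here). `build_url`, `fetch_json` and `iter_all_classes` are hypothetical helper names.

```python
import json
from typing import Callable, Dict, Iterator
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8080"  # same host as the curl examples


def build_url(onto: str, page: int = 0, size: int = 20, lang: str = "en") -> str:
    """Build the request URL for one page of class embeddings."""
    query = urlencode({"page": page, "size": size, "lang": lang})
    path = f"/api/v2/ontologies/{quote(onto, safe='')}/classes/llm_embeddings"
    return f"{BASE_URL}{path}?{query}"


def fetch_json(url: str) -> Dict:
    """Fetch a URL and decode its JSON body (stdlib only, no retries)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_all_classes(
    onto: str,
    size: int = 20,
    fetch: Callable[[str], Dict] = fetch_json,
) -> Iterator[Dict]:
    """Yield every class with embeddings, requesting one page at a time.

    Assumption: each response has "elements" (list) and, optionally,
    "totalPages" (int). These names are not confirmed by the PR text.
    """
    page = 0
    while True:
        body = fetch(build_url(onto, page=page, size=size))
        elements = body.get("elements", [])
        yield from elements
        page += 1
        total = body.get("totalPages")
        # Stop on an empty page, or once the reported page count is reached.
        if not elements or (total is not None and page >= total):
            break
```

Passing the fetch function as a parameter keeps the paging logic separate from the HTTP call, so the loop can be exercised without a running server.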

Implementation Details

The endpoint follows the existing API patterns and includes:

  1. Efficient filtering: Only returns classes that belong to the ontology, are defined in that ontology (isDefiningOntology:['true']), and have embeddings available
  2. Proper pagination: Database queries use SKIP/LIMIT with total count calculation for large ontologies
  3. Consistent response format: Uses standard V2PagedResponse<V2Entity> format like other endpoints
  4. Input validation: Validates ontology ID and language parameters following existing patterns
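
The pagination arithmetic in item 2 can be sketched as follows. This is an illustrative Python sketch rather than the PR's Java code. It assumes page numbers are zero-based (consistent with the default of 0), and `page_to_skip_limit` and `total_pages` are hypothetical names.

```python
import math
from typing import Tuple


def page_to_skip_limit(page: int, size: int) -> Tuple[int, int]:
    """Translate a zero-based page number and page size into SKIP/LIMIT values."""
    if page < 0:
        raise ValueError("page must be >= 0")
    if size < 1:
        raise ValueError("size must be >= 1")
    return page * size, size


def total_pages(total_elements: int, size: int) -> int:
    """Number of pages needed to cover total_elements at the given page size."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return math.ceil(total_elements / size)
```

Under these assumptions the defaults (page=0, size=20) map to SKIP 0 LIMIT 20, and a request with page=1 and size=10 maps to SKIP 10 LIMIT 10.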

Technical Changes

  • Neo4j Client: Added getEmbeddingsByOntologyId() method with Cypher query for bulk retrieval
  • Repository Layer: Added corresponding method with validation and JSON transformation
  • Controller Layer: Added new endpoint in V2LLMController following existing patterns

The underlying Cypher query selects the classes defined in the ontology that have embeddings and pages through them in IRI order:

MATCH (c:OntologyClass {ontologyId: $ontologyId, isDefiningOntology:['true']}) 
WHERE c.embeddings IS NOT NULL 
RETURN c 
ORDER BY c.iri 
SKIP $skip LIMIT $limit

Ordering by c.iri gives SKIP/LIMIT a stable sequence, so clients can retrieve a whole ontology's embeddings page by page instead of in one large response.
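
To make the query's semantics concrete, here is a small in-memory analogue in Python. It is a sketch only: the nodes are plain dictionaries standing in for OntologyClass nodes, and `select_classes` is a hypothetical name, not something from the PR.

```python
from typing import Dict, List


def select_classes(
    nodes: List[Dict], ontology_id: str, skip: int, limit: int
) -> List[Dict]:
    """In-memory analogue of the Cypher query: filter, order by IRI, then page."""
    matching = [
        n
        for n in nodes
        if n.get("ontologyId") == ontology_id
        # The property is stored as a list of strings, hence ["true"].
        and n.get("isDefiningOntology") == ["true"]
        # Mirrors WHERE c.embeddings IS NOT NULL.
        and n.get("embeddings") is not None
    ]
    # Mirrors ORDER BY c.iri; a fixed order keeps page boundaries stable.
    matching.sort(key=lambda n: n["iri"])
    # Mirrors SKIP $skip LIMIT $limit.
    return matching[skip:skip + limit]
```

Classes imported from another ontology, and classes without embeddings, are filtered out before paging, so they never occupy a slot on a page.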

Fixes #984.


Co-authored-by: haideriqbal <13108541+haideriqbal@users.noreply.github.com>
@Copilot Copilot AI changed the title from "[WIP] support bulk download of class embeddings (whole ontology at once)" to "Add bulk class embeddings endpoint for ontologies" Sep 23, 2025
@Copilot Copilot AI requested a review from haideriqbal September 23, 2025 22:49
Copilot finished work on behalf of haideriqbal September 23, 2025 22:49

@haideriqbal haideriqbal left a comment

looks good! need to test on dev to make sure correct embeddings are there

@haideriqbal haideriqbal marked this pull request as ready for review October 3, 2025 12:32

Development

Successfully merging this pull request may close these issues.

support bulk download of class embeddings (whole ontology at once)
