Skip to content

Add Feature Extraction Support for API Classifiers #77

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

mohamedelabbas1996
Copy link

@mohamedelabbas1996 mohamedelabbas1996 commented Apr 14, 2025

Description

This PR adds support for returning model feature vectors (embeddings) alongside classification results in the Data Companion API.
The classification pipeline now supports returning a vector embedding per classification, derived from the classification model backbone.

The changes are fully backward-compatible for models that do not implement custom get_features(), as they will fallback to returningNone from the base class.

Related Issues

#752

Screenshots

Detection features clustering visualization using K-means + PCA
image

Copy link

sentry-io bot commented Apr 14, 2025

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: trapdata/api/models/classification.py

Function Unhandled Issue
save_results ValidationError: 15 validation errors for ClassificationResponse ...
Event Count: 2
save_results ValidationError: 10 validation errors for ClassificationResponse ...
Event Count: 2
save_results AttributeError: 'NoneType' object has no attribute 'tolist' ...
Event Count: 1
save_results ValueError: not enough values to unpack (expected 3, got 2) ...
Event Count: 1

Did you find this useful? React with a 👍 or 👎

@mohamedelabbas1996 mohamedelabbas1996 marked this pull request as ready for review April 22, 2025 15:38
@@ -36,7 +36,8 @@ pyobjus = [
{ version = "^1.2.1", platform = "darwin" },
{ version = "^1.2.1", platform = "linux" },
]

plotly = "^5.21.0"
scikit-learn = "^1.3.0"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make these optional dependencies and just use numpy in the tests. unless we need to use them in the core app.

[tool.poetry.extras]
dev = ["plotly", "scikit-learn"]

@@ -287,6 +287,16 @@ def get_model(self):
model.eval()
return model

def get_features(self, batch_input: torch.Tensor) -> torch.Tensor:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work on this method of extracting features! It seems more flexible than our current feature extractor. Perhaps we should add a comment in both feature extractors that the other one exists. And eventually update the old one to use this code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants