@jdpearce4 (Collaborator)

Summary

Introduce a first-class Python client (TranscriptFormerClient) for inference and artifact/data downloads. The client mirrors the CLI configuration behavior while providing a simple, programmatic API that returns an in-memory AnnData object.

Key Features

  • In-memory inference: inference(...) returns an anndata.AnnData without writing to disk.
  • Config parity with CLI: builds a Hydra-compatible config (see the sketch after this list) by:
      • loading the same CLI YAML defaults,
      • applying dataclass overrides from kwargs (InferenceConfig, DataConfig),
      • merging with the checkpoint config.json via the same utility used by the CLI.

  • Convenience downloads:
      • download_model(...) for checkpoints and embeddings.
      • download_data(...) for CellxGene datasets by species.
      • download_dataset(...) for curated sources (e.g., Tabula Sapiens, Bgee).
  • Logging control: optional log_level argument to run quietly or verbosely without affecting global logging.
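As a rough illustration of that config-assembly flow (the YAML path, function name, and merge order below are assumptions, not the repo's actual build_config utility):

```python
import json

from omegaconf import OmegaConf


def build_inference_config(checkpoint_path: str, **overrides):
    """Sketch of the config-assembly flow; not the repo's actual utility."""
    # 1. Start from the same YAML defaults the CLI loads (path is a placeholder).
    cfg = OmegaConf.load("conf/inference_config.yaml")

    # 2. Layer on dataclass-style overrides passed as kwargs
    #    (fields of InferenceConfig / DataConfig).
    cfg = OmegaConf.merge(cfg, OmegaConf.create(overrides))

    # 3. Merge in the checkpoint's config.json, as the CLI does
    #    (merge precedence here is an assumption).
    with open(f"{checkpoint_path}/config.json") as f:
        cfg = OmegaConf.merge(cfg, OmegaConf.create(json.load(f)))

    return cfg
```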

API Surface

  • TranscriptFormerClient.inference(data_file, checkpoint_path, **kwargs) -> anndata.AnnData
      • Accepts most InferenceConfig and DataConfig fields as kwargs (e.g., batch_size, output_keys, gene_col_name, use_raw, use_oom_dataloader, n_data_workers, etc.).
      • Returns a single AnnData with obsm/uns populated per output_keys.
  • TranscriptFormerClient.download_model(model, checkpoint_dir=...) -> None
  • TranscriptFormerClient.download_data(species=[...], output_dir=..., ...) -> int
  • TranscriptFormerClient.download_dataset(dataset, ...) -> anndata.AnnData | None
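A usage sketch based on the surface above (the import path, model name, file paths, and output_keys values are illustrative assumptions):

```python
from transcriptformer import TranscriptFormerClient

# Quiet client; the log_level placement and accepted values are assumptions.
client = TranscriptFormerClient(log_level="WARNING")

# Fetch a checkpoint first (the model name shown here is illustrative).
client.download_model("tf_sapiens", checkpoint_dir="./checkpoints")

# In-memory inference: returns an anndata.AnnData, nothing written to disk.
adata = client.inference(
    data_file="cells.h5ad",
    checkpoint_path="./checkpoints/tf_sapiens",
    batch_size=8,
    output_keys=["embeddings"],
)
print(adata.obsm.keys())  # populated according to output_keys
```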

jdpearce4 and others added 9 commits July 22, 2025 17:21
…ce and artifact downloading

- Deleted `download_artifacts.py` and `inference.py` scripts as they are now replaced by CLI commands.
- Updated CLI commands to improve user experience and added progress tracking for downloads and extractions.
- Enhanced inference configuration to support backward compatibility for checkpoint paths.
- Updated documentation in the inference configuration YAML file to clarify model types and embedding options.
jdpearce4 requested a review from Copilot on August 25, 2025 at 23:20
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR introduces a first-class Python client (TranscriptFormerClient) that enables programmatic inference and downloads for TranscriptFormer models. The client provides in-memory inference operations that return AnnData objects directly, mirroring CLI configuration behavior while offering a simplified API.

Key Changes

  • Adds Python client with inference(), download_model(), download_data(), and download_dataset() methods
  • Refactors config merging logic into reusable utility for consistency between CLI and client
  • Enhances inference configuration with new checkpoint_path and model_type fields
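For reference, the new fields might look roughly like this in the dataclass (types and defaults below are assumptions, not the actual definition in dataclasses.py):

```python
from dataclasses import dataclass


@dataclass
class InferenceConfig:
    # ...existing fields elided...
    # New in this PR: point directly at a checkpoint directory; model_type is
    # retained alongside it for backward compatibility (exact semantics may differ).
    checkpoint_path: str | None = None
    model_type: str | None = None
```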

Reviewed Changes

Copilot reviewed 10 out of 13 changed files in this pull request and generated 3 comments.

Summary per file:

  • src/transcriptformer/client/client.py: core client implementation with inference and download methods
  • src/transcriptformer/config/build_config.py: shared config-merging utility extracted from the CLI
  • src/transcriptformer/data/dataclasses.py: added checkpoint_path and model_type fields to InferenceConfig
  • src/transcriptformer/cli/inference.py: refactored to use the shared config-merging utility
  • src/transcriptformer/cli/download_artifacts.py: performance improvements to progress tracking
  • src/transcriptformer/__init__.py: package-level client exports
  • README.md: documentation for Python client usage


jdpearce4 changed the title from "feat: python client; resolves #45" to "feat: python client; resolves #44" on Aug 25, 2025