WIP: (feat) Add meta synthetic data kit as an inline provider #2311

Draft: wants to merge 3 commits into main

alinaryan (Contributor)

What does this PR do?

Adds a comprehensive test suite for the synthetic data kit provider, covering both unit and integration tests. The suite validates the provider's functionality, configuration handling, and error cases in line with Llama Stack's testing guidelines.

Test Plan

  1. Unit Tests (tests/unit/providers/inline/synthetic_data_generation/test_synthetic_data_kit.py):

    pytest tests/unit/providers/inline/synthetic_data_generation/test_synthetic_data_kit.py -v

    Verifies:

    • Configuration initialization and validation
    • Environment variable handling via sample_run_config()
    • Basic synthetic data generation
    • Filtering functionality
    • Custom model specification
  2. Integration Tests (tests/integration/providers/inline/synthetic_data_generation/test_synthetic_data_kit_integration.py):

    # Start vLLM server on port 8000 first
    python -m vllm.entrypoints.api_server --model meta-llama/Llama-3.2-3B-Instruct --port 8000
    
    # Then run integration tests
    pytest tests/integration/providers/inline/synthetic_data_generation/test_synthetic_data_kit_integration.py -v

    Verifies:

    • End-to-end provider functionality with LlamaStackAsLibraryClient
    • Error handling for invalid inputs
    • Environment configuration integration
    • Response format and content validation
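
The configuration checks listed under the unit tests could look roughly like the sketch below. It is only an illustration: `ExampleKitConfig`, its fields, the `VLLM_URL` and `SDK_MODEL` variable names, the `resolve_placeholders` helper, and the `${env.NAME:default}` placeholder format are all assumptions made here, not the provider's actual code.

```python
import re
from dataclasses import dataclass
from typing import Any


# Hypothetical stand-in for the provider's config class. The real class,
# its fields, and its defaults are not shown in this PR description.
@dataclass
class ExampleKitConfig:
    api_base: str = "http://localhost:8000/v1"
    model: str = "meta-llama/Llama-3.2-3B-Instruct"

    @classmethod
    def sample_run_config(cls, **kwargs: Any) -> dict[str, Any]:
        # Assumed placeholder format: ${env.NAME:default}
        return {
            "api_base": "${env.VLLM_URL:http://localhost:8000/v1}",
            "model": "${env.SDK_MODEL:meta-llama/Llama-3.2-3B-Instruct}",
        }


_PLACEHOLDER = re.compile(r"\$\{env\.(\w+):([^}]*)\}")


def resolve_placeholders(raw: dict[str, str], env: dict[str, str]) -> dict[str, str]:
    """Replace each ${env.NAME:default} with env[NAME], falling back to the default."""

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        return env.get(name, default)

    return {key: _PLACEHOLDER.sub(substitute, value) for key, value in raw.items()}


def test_sample_run_config_uses_defaults():
    # With no environment overrides, the sample config resolves to the defaults.
    resolved = resolve_placeholders(ExampleKitConfig.sample_run_config(), env={})
    assert ExampleKitConfig(**resolved) == ExampleKitConfig()


def test_sample_run_config_honours_environment():
    # An override for one variable must not disturb the other field.
    env = {"VLLM_URL": "http://example.test:9000/v1"}
    resolved = resolve_placeholders(ExampleKitConfig.sample_run_config(), env=env)
    assert resolved["api_base"] == "http://example.test:9000/v1"
    assert resolved["model"] == "meta-llama/Llama-3.2-3B-Instruct"
```

Passing the environment in as a plain dictionary keeps the tests independent of the real process environment, so they behave the same on a developer machine and in CI.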

Prerequisites:

  • vLLM server running locally on port 8000
  • Access to meta-llama/Llama-3.2-3B-Instruct model
  • Python environment with test dependencies installed
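
Because the integration tests depend on a server listening on port 8000, a small preflight check can make a missing server obvious before the suite runs. The helper below is a minimal sketch using only the standard library; `server_is_listening` is a hypothetical name and is not part of this PR. It confirms only that something accepts TCP connections on the port, not that the model has finished loading.

```python
import socket


def server_is_listening(host: str = "127.0.0.1", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

One way to use it would be as the condition of a `pytest.mark.skipif` marker on the integration tests, so they are skipped with a clear reason when nothing is listening on port 8000.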

facebook-github-bot added the "CLA Signed" label on May 29, 2025
alinaryan marked this pull request as draft on May 29, 2025 20:29
alinaryan changed the title from "WIP: (feat) Add meta synthetic data kit" to "WIP: (feat) Add meta synthetic data kit as an inline provider" on May 29, 2025
alinaryan added 3 commits on May 30, 2025 12:14
This establishes the API contract and prepares for provider integration in a future commit.

Signed-off-by: Alina Ryan <aliryan@redhat.com>
…_generation API

The synthetic_data_kit provider integration enables high-quality synthetic dataset
generation for fine-tuning LLMs. This commit sets up the initial provider
registration and fixes provider resolution to properly handle type casting and
imports, ensuring proper integration with llama-stack's provider system.

Implementation of the actual provider functionality will follow in a subsequent
commit.

Signed-off-by: Alina Ryan <aliryan@redhat.com>
These tests follow Llama Stack's provider testing guidelines to validate:
- Configuration handling and environment variables work as expected
- Provider implementation behaves correctly in both unit and integration scenarios
- Error cases are properly handled
- Integration with Llama Stack's client SDK functions properly

Signed-off-by: Alina Ryan <aliryan@redhat.com>
Labels: CLA Signed, new-in-tree-provider
3 participants