
[Gap Analysis] Implement In-House Model Serving Framework #308

@ooples

Description


User Story

As a developer who has trained a model with AiDotNet, I want to easily deploy it as a high-performance, production-ready REST API, so that I can integrate its predictions into my applications without needing a complex external serving solution.


Phase 1: Core Server and Model Management

Goal: Create the foundational web server project and the API endpoints for dynamically loading and managing models.

AC 1.1: Create the Serving Project (2 points)

Requirement: Set up a new ASP.NET Core Web API project for the server.

  • Create a new C# project: src/Serving/AiDotNet.Serving.
  • The project must be configured as an ASP.NET Core Web API.
  • Add necessary project references to the core AiDotNet library.

AC 1.2: Implement Model Repository (3 points)

Requirement: Create a thread-safe, singleton service to store and manage loaded models.

  • Create a new file: src/Serving/ModelRepository.cs.
  • Define a public class ModelRepository<T> as a singleton service.
  • It must contain a private ConcurrentDictionary<string, IModel<T>> to store the loaded models, mapping a model name to the model instance.
  • Implement methods: AddModel(string name, IModel<T> model), GetModel(string name), and RemoveModel(string name).
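A minimal sketch of the repository described above. `IModel<T>` here is a stand-in declaration for the core library's actual model interface, and the exact method signatures are assumptions:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for AiDotNet's real model interface.
public interface IModel<T> { }

public class ModelRepository<T>
{
    // Thread-safe map from model name to loaded model instance.
    private readonly ConcurrentDictionary<string, IModel<T>> _models = new();

    public void AddModel(string name, IModel<T> model) => _models[name] = model;

    public IModel<T>? GetModel(string name) =>
        _models.TryGetValue(name, out var model) ? model : null;

    public bool RemoveModel(string name) => _models.TryRemove(name, out _);

    // Used by GET /models to list loaded model names.
    public IReadOnlyCollection<string> ListModels() => _models.Keys.ToArray();
}
```

The singleton lifetime would be established at registration time, e.g. `builder.Services.AddSingleton<ModelRepository<double>>();` in `Program.cs`.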

AC 1.3: Create Model Management API (5 points)

Requirement: Create an ASP.NET controller with endpoints to manage the models in the repository.

  • Create a new controller: src/Serving/Controllers/ModelsController.cs.
  • POST /models Endpoint:
    • This endpoint will accept a request body containing a model_name and a model_path.
    • It must load a saved AiDotNet model from the model_path using the library's existing deserialization logic.
    • It will then add the loaded model to the ModelRepository with the given name.
  • GET /models Endpoint:
    • This endpoint will return a list of the names of all currently loaded models from the ModelRepository.
  • DELETE /models/{model_name} Endpoint:
    • This endpoint will remove the specified model from the ModelRepository.
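The three endpoints above could take roughly this shape. `ModelSerializer.Load` is a placeholder for the library's existing deserialization entry point, and `double` is used as the concrete numeric type for illustration:

```csharp
using Microsoft.AspNetCore.Mvc;

public record LoadModelRequest(string ModelName, string ModelPath);

[ApiController]
[Route("models")]
public class ModelsController : ControllerBase
{
    private readonly ModelRepository<double> _repository;

    public ModelsController(ModelRepository<double> repository) =>
        _repository = repository;

    // POST /models — load a saved model from disk and register it by name.
    [HttpPost]
    public IActionResult LoadModel([FromBody] LoadModelRequest request)
    {
        var model = ModelSerializer.Load<double>(request.ModelPath); // placeholder call
        _repository.AddModel(request.ModelName, model);
        return Ok();
    }

    // GET /models — list the names of all currently loaded models.
    [HttpGet]
    public IActionResult ListModels() => Ok(_repository.ListModels());

    // DELETE /models/{modelName} — unload the named model.
    [HttpDelete("{modelName}")]
    public IActionResult RemoveModel(string modelName) =>
        _repository.RemoveModel(modelName) ? NoContent() : NotFound();
}
```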

Phase 2: High-Performance Inference Endpoint

Goal: Implement the /predict endpoint with dynamic request batching to maximize throughput.

AC 2.1: Implement Request Batching Service (13 points)

Requirement: Create the core service that collects, batches, and processes inference requests.

  • Create a new file: src/Serving/RequestBatcher.cs.
  • Define a public class RequestBatcher<T> as a singleton service.
  • It will contain a ConcurrentQueue to hold incoming requests. Each item in the queue will be a tuple containing the request data and a TaskCompletionSource to signal when the result is ready.
  • Background Worker: The RequestBatcher will start a background task (Task.Run) in its constructor that runs an infinite loop.
  • Batching Logic (inside the loop):
    1. await Task.Delay(10); (the batching window; the delay should be configurable).
    2. Dequeue all currently pending requests from the queue.
    3. If there are requests, collate their individual input data into a single, large batch tensor.
    4. Run the batch tensor through the appropriate model (this requires passing the model to the batcher).
    5. De-collate the model's output tensor back into individual results.
    6. For each original request, use its TaskCompletionSource to set its result, which unblocks the waiting HTTP request.
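The loop above can be sketched as follows. This is a simplified shape using flat arrays instead of the library's tensor type, and a `Func` delegate standing in for the model's batched forward pass; both are assumptions for illustration:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class RequestBatcher<T>
{
    private readonly ConcurrentQueue<(T[] Input, TaskCompletionSource<T[]> Completion)> _queue = new();
    private readonly Func<T[][], T[][]> _runBatch; // stand-in for the model's batched forward pass
    private readonly int _windowMs;

    public RequestBatcher(Func<T[][], T[][]> runBatch, int windowMs = 10)
    {
        _runBatch = runBatch;
        _windowMs = windowMs;
        _ = Task.Run(ProcessLoopAsync); // background worker started in the constructor
    }

    // Called by the /predict endpoint; completes when the batch containing
    // this request has been processed.
    public Task<T[]> EnqueueAsync(T[] input)
    {
        var tcs = new TaskCompletionSource<T[]>(TaskCreationOptions.RunContinuationsAsynchronously);
        _queue.Enqueue((input, tcs));
        return tcs.Task;
    }

    private async Task ProcessLoopAsync()
    {
        while (true)
        {
            await Task.Delay(_windowMs); // the batching window

            var pending = new List<(T[] Input, TaskCompletionSource<T[]> Completion)>();
            while (_queue.TryDequeue(out var item)) pending.Add(item);
            if (pending.Count == 0) continue;

            try
            {
                // Collate inputs into one batch, run one forward pass, de-collate.
                var batch = pending.Select(p => p.Input).ToArray();
                var results = _runBatch(batch);
                for (var i = 0; i < pending.Count; i++)
                    pending[i].Completion.SetResult(results[i]);
            }
            catch (Exception ex)
            {
                // Propagate failures to every waiting HTTP request in the batch.
                foreach (var (_, tcs) in pending) tcs.TrySetException(ex);
            }
        }
    }
}
```

`TaskCreationOptions.RunContinuationsAsynchronously` matters here: without it, `SetResult` would run the awaiting controller's continuation synchronously on the batcher's background thread.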

AC 2.2: Create the /predict Endpoint (5 points)

Requirement: Create the public-facing API endpoint that users will call for inference.

  • Create a new controller: src/Serving/Controllers/InferenceController.cs.
  • Implement a POST /predict/{model_name} endpoint.
  • Endpoint Logic:
    • This method will not call the model directly.
    • It will create a TaskCompletionSource.
    • It will add the request data and the TaskCompletionSource to the RequestBatcher's queue.
    • It will then await the TaskCompletionSource.Task to get the result.
    • Once the result is available (set by the batcher), it will be returned as the HTTP response.
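A sketch of the endpoint logic, assuming the `RequestBatcher` shape described in AC 2.1. For brevity it injects a single batcher and ignores per-model routing; a real implementation would look up (or create) one batcher per loaded model:

```csharp
using Microsoft.AspNetCore.Mvc;
using System.Threading.Tasks;

[ApiController]
[Route("predict")]
public class InferenceController : ControllerBase
{
    private readonly RequestBatcher<double> _batcher;

    public InferenceController(RequestBatcher<double> batcher) => _batcher = batcher;

    // POST /predict/{modelName}
    [HttpPost("{modelName}")]
    public async Task<IActionResult> Predict(string modelName, [FromBody] double[] input)
    {
        // The controller never calls the model directly; it enqueues the
        // request and awaits the TaskCompletionSource set by the batcher.
        var result = await _batcher.EnqueueAsync(input);
        return Ok(result);
    }
}
```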

Phase 3: Configuration and Testing

Goal: Make the server configurable and verify its functionality, especially the batching mechanism.

AC 3.1: Add Configuration (3 points)

Requirement: Allow server settings to be configured via appsettings.json.

  • In the appsettings.json file, add a section for ServingSettings.
  • Add configuration options for Port, BatchingWindowMilliseconds, and an array of ModelsToLoadOnStartup (each with a name and path).
  • The server must load these settings on startup.
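An illustrative shape for the `appsettings.json` section (key names here follow the requirement but the exact schema is an assumption):

```json
{
  "ServingSettings": {
    "Port": 5000,
    "BatchingWindowMilliseconds": 10,
    "ModelsToLoadOnStartup": [
      { "Name": "my-model", "Path": "models/my-model.bin" }
    ]
  }
}
```

This would typically be bound to a `ServingSettings` options class via `builder.Services.Configure<ServingSettings>(builder.Configuration.GetSection("ServingSettings"));`.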

AC 3.2: Integration Test (8 points)

Requirement: Create an end-to-end test that proves the server works and that batching is effective.

  • In a test project, use WebApplicationFactory to host the server in-memory.
  • Test Logic:
    1. Create and save a simple AiDotNet model that can be loaded by the server.
    2. Use an HttpClient to call the POST /models endpoint to load the model.
    3. Create a list of 10 concurrent tasks, where each task sends a unique request to the POST /predict/{model_name} endpoint.
    4. Run all 10 tasks concurrently using Task.WhenAll().
    5. Assert that all 10 tasks complete successfully and that each received its correct, corresponding response.
    6. (Advanced) Use a mock model to verify that its Forward method was called only once with a batch size of 10, proving that the dynamic batching worked correctly.
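Steps 2–5 above could be sketched as an xUnit test. This assumes the server's `Program` class is visible to the test project and that a test model file exists at the given path; the mock-model assertion in step 6 is omitted:

```csharp
using System.Linq;
using System.Net.Http.Json;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc.Testing;
using Xunit;

public class ServingTests : IClassFixture<WebApplicationFactory<Program>>
{
    private readonly System.Net.Http.HttpClient _client;

    public ServingTests(WebApplicationFactory<Program> factory) =>
        _client = factory.CreateClient();

    [Fact]
    public async Task ConcurrentPredictionsAllSucceed()
    {
        // Load the previously saved test model via the management API.
        var load = await _client.PostAsJsonAsync("/models",
            new { ModelName = "test-model", ModelPath = "test-model.bin" });
        load.EnsureSuccessStatusCode();

        // Fire 10 unique prediction requests concurrently.
        var tasks = Enumerable.Range(0, 10)
            .Select(i => _client.PostAsJsonAsync("/predict/test-model", new[] { (double)i }))
            .ToArray();
        var responses = await Task.WhenAll(tasks);

        // All 10 must succeed; correctness of each body would be asserted
        // against the known model output for its input.
        Assert.All(responses, r => r.EnsureSuccessStatusCode());
    }
}
```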

Definition of Done

  • All checklist items are complete.
  • A new AiDotNet.Serving project is created.
  • A user can start the server, load a model via a REST API call, and get predictions from it.
  • The server correctly batches concurrent requests into a single model execution.
  • Integration tests verify the end-to-end functionality.

⚠️ CRITICAL ARCHITECTURAL REQUIREMENTS

Before implementing this user story, you MUST review:

Mandatory Implementation Checklist

1. INumericOperations Usage (CRITICAL)

  • Include protected static readonly INumericOperations<T> NumOps = MathHelper.GetNumericOperations<T>(); in base class
  • NEVER hardcode double, float, or specific numeric types - use generic T
  • NEVER use default(T) - use NumOps.Zero instead
  • Use NumOps.Zero, NumOps.One, NumOps.FromDouble() for values
  • Use NumOps.Add(), NumOps.Multiply(), etc. for arithmetic
  • Use NumOps.LessThan(), NumOps.GreaterThan(), etc. for comparisons
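A small illustration of the required pattern. `INumericOperations<T>` and `MathHelper` come from the AiDotNet core library; the base-class name and `Sum` helper are hypothetical:

```csharp
public abstract class ServingComponentBase<T>
{
    // Required pattern: obtain numeric operations once, statically.
    protected static readonly INumericOperations<T> NumOps =
        MathHelper.GetNumericOperations<T>();

    protected T Sum(System.Collections.Generic.IEnumerable<T> values)
    {
        var total = NumOps.Zero;            // never default(T)
        foreach (var v in values)
            total = NumOps.Add(total, v);   // never '+' directly on T
        return total;
    }
}
```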

2. Inheritance Pattern (REQUIRED)

  • Create I{FeatureName}.cs in src/Interfaces/ (root level, NOT subfolders)
  • Create {FeatureName}Base.cs in src/{FeatureArea}/ inheriting from interface
  • Create concrete classes inheriting from Base class (NOT directly from interface)

3. PredictionModelBuilder Integration (REQUIRED)

  • Add private field: private I{FeatureName}<T>? _{featureName}; to PredictionModelBuilder.cs
  • Add Configure method taking ONLY interface (no parameters):
    public IPredictionModelBuilder<T, TInput, TOutput> Configure{FeatureName}(I{FeatureName}<T> {featureName})
    {
        _{featureName} = {featureName};
        return this;
    }
  • Use feature in Build() with default: var {featureName} = _{featureName} ?? new Default{FeatureName}<T>();
  • Verify feature is ACTUALLY USED in execution flow

4. Beginner-Friendly Defaults (REQUIRED)

  • Constructor parameters with defaults from research/industry standards
  • Document WHY each default was chosen (cite papers/standards)
  • Validate parameters and throw ArgumentException for invalid values

5. Property Initialization (CRITICAL)

  • NEVER use default! operator
  • String properties: = string.Empty;
  • Collections: = new List<T>(); or = new Vector<T>(0);
  • Numeric properties: appropriate default or NumOps.Zero

6. Class Organization (REQUIRED)

  • One class/enum/interface per file
  • ALL interfaces in src/Interfaces/ (root level)
  • Namespace mirrors folder structure (e.g., src/Regularization/namespace AiDotNet.Regularization)

7. Documentation (REQUIRED)

  • XML documentation for all public members
  • <b>For Beginners:</b> sections with analogies and examples
  • Document all <param>, <returns>, <exception> tags
  • Explain default value choices

8. Testing (REQUIRED)

  • Minimum 80% code coverage
  • Test with multiple numeric types (double, float)
  • Test default values are applied correctly
  • Test edge cases and exceptions
  • Integration tests for PredictionModelBuilder usage

⚠️ Failure to follow these requirements will result in repeated implementation mistakes and PR rejections.

See full details: .github/USER_STORY_ARCHITECTURAL_REQUIREMENTS.md
