Variable series length support for foundation models #3125

Draft
Kurokabe wants to merge 1 commit into master from variable_length_dataset

@Kurokabe

Collaborator

Hi @dennisbader and @daidahao,

Here is my draft PR to support variable-length fine-tuning and inference on foundation models.

The main changes are:

  • VariableLengthTorchTrainingDataset (new class in training_dataset.py): a TorchTrainingDataset subclass that accepts series shorter than input_chunk_length by left-padding the past window with NaN. This allows fit_from_dataset() to handle heterogeneous datasets without requiring per-window input_chunk_length tuning or silently dropping short series. Covariates and sample weights are intentionally not supported for now.

  • FoundationModel._build_inference_dataset override (new method in foundation_model.py): transparently left-pads short series with NaN before passing them to SequentialTorchInferenceDataset, so that predict() works on short series without any manual pre-processing from callers. The padding logic mirrors what VariableLengthTorchTrainingDataset does during training.
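To make the padding behaviour described in the two bullets above concrete, here is a minimal sketch. It is not the code from this PR: `left_pad_with_nan` is a hypothetical helper, and plain Python lists stand in for the arrays used in the real datasets.

```python
import math


def left_pad_with_nan(values, input_chunk_length):
    """Return the most recent `input_chunk_length` values, left-padded with NaN.

    Hypothetical helper for illustration only; not part of the Darts API.
    """
    if input_chunk_length < 1:
        raise ValueError("input_chunk_length must be at least 1")
    # Keep at most the last `input_chunk_length` observations.
    window = list(values[-input_chunk_length:])
    # Fill the missing positions on the left so the window has a fixed length.
    n_missing = input_chunk_length - len(window)
    return [math.nan] * n_missing + window


short_series = [1.0, 2.0, 3.0]
padded = left_pad_with_nan(short_series, input_chunk_length=5)
print(padded)  # [nan, nan, 1.0, 2.0, 3.0]
```

A series that is already long enough is only trimmed to its last `input_chunk_length` values, so no NaN is introduced in that case.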

Note that for now, only inference has been tested end-to-end. dev_fev_tasks_mini_validation.ipynb and fev_tasks_mini.yaml are development artifacts that I've included in case you want to reproduce the validation runs; they will be removed before merging.

One thing I can't fully explain: the notebook compares three approaches.

  • Step 1 uses an adaptive input_chunk_length, processed window-by-window.

  • Step 2 uses a fixed input_chunk_length=32 with manual NaN pre-padding before fit(), also processed window-by-window.

  • Step 3 uses VariableLengthTorchTrainingDataset with the same fixed input_chunk_length=32 in a single pass over all series.

Steps 2 and 3 match each other exactly, which validates that VariableLengthTorchTrainingDataset is equivalent to manual pre-padding. Step 1, however, produces different results than steps 2 and 3, and I can't explain why, since the only difference is the input_chunk_length value used per window. It's likely a context-length effect rather than a batching artefact, but I'm not certain. Do you have any insight on this?
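As a rough illustration of the three input constructions compared above, the sketch below builds the past window each approach would hand to the model for one short series. All function names are hypothetical and the logic is my reading of the description, not the notebook's code. It only shows that the inputs differ in length between step 1 and steps 2/3; whether that accounts for the different outputs is not established here.

```python
import math

FIXED_ICL = 32  # fixed input_chunk_length used in steps 2 and 3


def adaptive_window(values, max_icl=FIXED_ICL):
    """Step 1 (assumed): shrink input_chunk_length to the series length, no padding."""
    icl = min(len(values), max_icl)
    return list(values[-icl:])


def prepad_then_window(values, icl=FIXED_ICL):
    """Step 2 (assumed): NaN-pad the whole series first, then take the last `icl` values."""
    padded_series = [math.nan] * max(0, icl - len(values)) + list(values)
    return padded_series[-icl:]


def pad_at_window_time(values, icl=FIXED_ICL):
    """Step 3 (assumed): take the window first, then left-pad it with NaN."""
    window = list(values[-icl:])
    return [math.nan] * (icl - len(window)) + window


def same_with_nan(a, b):
    """Element-wise equality that treats NaN as equal to NaN."""
    return len(a) == len(b) and all(
        x == y or (math.isnan(x) and math.isnan(y)) for x, y in zip(a, b)
    )


series = [float(i) for i in range(10)]  # shorter than FIXED_ICL
step1 = adaptive_window(series)
step2 = prepad_then_window(series)
step3 = pad_at_window_time(series)

print(len(step1), len(step2), len(step3))  # 10 32 32
print(same_with_nan(step2, step3))         # True
print(same_with_nan(step1, step2))         # False
```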

One idea I had for a potential follow-up: instead of a dedicated VariableLengthTorchTrainingDataset, we could relax the short-series validation in ShiftedTorchTrainingDataset (currently a hard error in _get_end_of_output_idx) and handle the NaN padding in a collate_fn passed to the DataLoader.
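A minimal sketch of what such a collate function could look like, under simplifying assumptions: each sample is reduced to a `(past_values, future_values)` pair, plain lists stand in for tensors, and `make_nan_pad_collate` is a hypothetical name. The real Darts samples carry more fields, so this is only meant to show the shape of the idea.

```python
import math


def make_nan_pad_collate(input_chunk_length):
    """Build a collate function that left-pads past windows with NaN.

    Hypothetical helper; samples are assumed to be (past_values, future_values).
    """

    def collate(batch):
        padded_pasts = []
        futures = []
        for past, future in batch:
            # Trim to the most recent values, then pad on the left if too short.
            window = list(past[-input_chunk_length:])
            n_missing = input_chunk_length - len(window)
            padded_pasts.append([math.nan] * n_missing + window)
            futures.append(list(future))
        return padded_pasts, futures

    return collate


collate = make_nan_pad_collate(input_chunk_length=4)
pasts, futures = collate([
    ([1.0, 2.0], [3.0]),            # short sample, needs two NaNs
    ([1.0, 2.0, 3.0, 4.0], [5.0]),  # already the right length
])
print([len(p) for p in pasts])  # [4, 4]
```

With PyTorch, a function like `collate` would be passed through the `collate_fn` argument of `DataLoader` and would stack the padded windows into tensors instead of returning lists.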

Let me know what you think :)

…gth inputs and pre-pad smaller inputs during inference for foundation models
@review-notebook-app


Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.



@codecov

codecov Bot commented May 29, 2026


Codecov Report

❌ Patch coverage is 19.73684% with 61 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.15%. Comparing base (40af46d) to head (e71a8e2).
⚠️ Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...arts/utils/data/torch_datasets/training_dataset.py | 9.23% | 59 Missing ⚠️ |
| darts/models/forecasting/foundation_model.py | 81.81% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3125      +/-   ##
==========================================
- Coverage   96.54%   96.15%   -0.39%     
==========================================
  Files         160      160              
  Lines       17261    17361     +100     
==========================================
+ Hits        16664    16693      +29     
- Misses        597      668      +71     

@daidahao

Contributor

Hi @Kurokabe, thank you for inviting me to review this PR. Adding variable-length support to foundation models would be a great feature, and I truly appreciate your efforts in exploring the design and implementation.

To be completely honest, I found it a bit difficult to fully evaluate the PR at its current stage, mainly for two reasons:

  1. FoundationModel still requires a predefined input_chunk_length, which has been a pain point when calling fit() and predict(). Most foundation models support flexible input_chunk_length, but the Darts TorchForecastingModel workflow currently prevents this.

    Ideally, we would want to support flexible length at declaration time, for example:

    model = Chronos2Model(input_chunk_length=None, output_chunk_length=24)

    When calling model.fit() or model.predict(), Darts could pad or trim the time series and set an ad hoc input_chunk_length dynamically, while input_chunk_length itself remains None to indicate flexibility.

    The current VariableLengthTorchTrainingDataset and _build_inference_dataset() address the padding part, but they always cap the series length at the initially declared input_chunk_length and do not set a dynamic input_chunk_length value. Also missing is support for allowing users to declare a flexible input_chunk_length for FoundationModel.

    That said, the current PR could be included as part of a larger solution for “variable length for foundation models”; for example, VariableLengthTorchTrainingDataset could be used transparently when input_chunk_length=None.

  2. Example usage and docstrings are missing. It would be helpful if you could provide a minimal example of using the new VariableLengthTorchTrainingDataset in the PR description. Also, if we are changing the inference behavior of FoundationModel, this should be clearly documented in the docstring, along with any caveats.

    The notebook is also too large and verbose. I would suggest a smaller example, possibly without FEV, that shows the discrepancy between the three methods.
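To illustrate the flexible-length idea from point 1, here is a rough sketch of how an ad hoc input length could be chosen per call while the declared value stays None. Everything here is hypothetical: `resolve_input_chunk_length`, `pad_or_trim`, and the `max_context` bound are illustrative names, not existing Darts API, and the sketch ignores the output_chunk_length needed for training samples.

```python
import math


def resolve_input_chunk_length(declared, series_lengths, max_context=None):
    """Pick the effective input length for one fit()/predict() call.

    declared: user-declared input_chunk_length, or None meaning "flexible".
    series_lengths: lengths of the series passed to this call.
    max_context: optional upper bound, e.g. a model's context limit (assumed).
    """
    if declared is not None:
        return declared  # fixed length: behave as today
    effective = max(series_lengths)
    if max_context is not None:
        effective = min(effective, max_context)
    return effective


def pad_or_trim(values, length):
    """Trim to the last `length` values, or left-pad with NaN if too short."""
    window = list(values[-length:])
    return [math.nan] * (length - len(window)) + window


all_series = [[1.0] * 10, [2.0] * 50, [3.0] * 200]
icl = resolve_input_chunk_length(
    declared=None,
    series_lengths=[len(s) for s in all_series],
    max_context=128,
)
windows = [pad_or_trim(s, icl) for s in all_series]
print(icl, [len(w) for w in windows])  # 128 [128, 128, 128]
```

In this sketch the shortest series is padded with 118 NaN values and the longest is trimmed to its last 128 observations, while a non-None declared value is returned unchanged.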

What do you think?


2 participants