
Add per-timeframe feature processing support #178

Merged
BY571 merged 5 commits into main from feature/per-timeframe-feature-processing
Feb 5, 2026

Conversation

BY571 (Collaborator) commented Feb 5, 2026

Summary

  • Allow different feature processing functions per timeframe (e.g., [process_1min, process_5min]; see the sketch below)
  • Support different feature dimensions per timeframe in observation specs
  • Add input validation with helpful error messages
  • Maintain full backward compatibility (single function still works)

Closes #177
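
A minimal sketch of the new call shape. This assumes a DataFrame-in/DataFrame-out contract for the processing functions and illustrative column names; it is not taken from the codebase:

import pandas as pd

def process_1min(df: pd.DataFrame) -> pd.DataFrame:
    # Three features for the 1-minute timeframe (names are illustrative).
    df = df.copy()
    df["features_close"] = df["close"]
    df["features_return"] = df["close"].pct_change().fillna(0.0)
    df["features_volume"] = df["volume"]
    return df

def process_5min(df: pd.DataFrame) -> pd.DataFrame:
    # Five features for the 5-minute timeframe, so a different output dimension.
    df = df.copy()
    df["features_close"] = df["close"]
    df["features_return"] = df["close"].pct_change().fillna(0.0)
    df["features_spread"] = df["high"] - df["low"]
    df["features_sma_12"] = df["close"].rolling(12).mean().bfill()
    df["features_volume"] = df["volume"]
    return df

# One function per timeframe, in the same order as the configured timeframes.
env = SequentialTradingEnv(
    df, config,
    feature_preprocessing_fn=[process_1min, process_5min],
)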

Changes

Core Implementation:

  • sampler.py: Accept a list of processing functions; add get_num_features_per_timeframe() and get_feature_keys_per_timeframe() methods (sketched after this list)
  • offline_base.py: Update _build_observation_specs() to handle per-timeframe feature counts with validation
  • sequential.py: Use per-timeframe feature counts for spec building
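
Roughly, the new accessors expose per-timeframe information like this (keys and counts are illustrative, and the exact timeframe key format depends on the configured timeframes):

num_features_per_tf = sampler.get_num_features_per_timeframe()
# e.g. {"1min": 3, "5min": 5} -> one observation width per timeframe
feature_keys_per_tf = sampler.get_feature_keys_per_timeframe()
# e.g. {"1min": ["features_close", "features_return", "features_volume"],
#       "5min": ["features_close", "features_return", "features_spread", ...]}
# The environment then passes num_features_per_tf into _build_observation_specs().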

Validation:

  • Raise ValueError if a feature processing function produces no features_* columns (example below)
  • Raise ValueError if num_features dict is missing timeframe keys
  • Include timeframe name in error messages for easier debugging
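
For example, a function that never adds the features_ prefix now fails fast. This sketch is illustrative and not taken from the test suite; the exact timeframe token in the message depends on how the timeframe is keyed:

def broken_processing(df):
    df = df.copy()
    df["close_return"] = df["close"].pct_change()  # no 'features_' prefix, so nothing is picked up
    return df

# Passing this for any timeframe raises something like:
# ValueError: Feature processing function for timeframe 1min produced no columns starting
# with 'features_'. Ensure your function adds columns like 'features_close', 'features_volume', etc.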

Test plan

  • 455 offline tests passing
  • Backward compatibility verified (single function works)
  • Different feature dimensions verified (3 vs 5 features)
  • Multi-step episode shape consistency verified
  • Error handling for invalid inputs verified
  • All review agents passed (TorchRL, Code Simplifier, Test Analyzer)

🤖 Generated with Claude Code

BY571 and others added 5 commits February 5, 2026 11:30
Enable different feature processing functions for each timeframe,
allowing different feature engineering logic and output dimensions.

Changes:
- MarketDataObservationSampler: Accept list of callables for
  per-timeframe processing (backward compatible with single callable)
- Add get_feature_keys_per_timeframe() and get_num_features_per_timeframe()
- Update _build_observation_specs() to handle per-TF feature counts
- Update SequentialTradingEnv to use per-TF feature counts

Usage:
  env = SequentialTradingEnv(
      df, config,
      feature_preprocessing_fn=[process_1min, process_5min]
  )

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address test coverage gaps identified by PR review:
- Add TestOneStepPerTimeframeFeatures class with tests for different
  feature dimensions per timeframe in OneStep environments
- Add TestSLTPPerTimeframeFeatures class with tests for different
  feature dimensions per timeframe in SLTP environments
- Add multi-step episode test to verify observation shapes remain
  consistent throughout an entire episode (not just reset and first step)

All 461 offline tests pass.

Refs: #177

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address high-priority gaps from test analyzer:
1. Add validation in sampler.py: raise ValueError if feature processing
   function produces no columns starting with 'features_'
2. Add validation in offline_base.py: raise ValueError if num_features
   dict is missing keys for any timeframe

Tests added:
- test_feature_processing_fn_with_no_features_raises
- test_feature_processing_fn_partial_failure_raises
- test_feature_processing_with_no_features_raises (env-level)

All 464 offline tests pass.

Refs: #177

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Apply test-justification skill to eliminate duplicate coverage:

Deleted (8 tests):
- test_sequential.py: test_env_single_function_backward_compatible
  (covered by sampler test - env just passes through)
- test_sequential.py: test_feature_processing_with_no_features_raises
  (covered by sampler test - validation is in sampler)
- test_sequential_sltp.py: TestSLTPPerTimeframeFeatures class (3 tests)
  (SLTP inherits from base - no SLTP-specific code path for this feature)
- test_onestep.py: TestOneStepPerTimeframeFeatures class (3 tests)
  (OneStep inherits from base - no OneStep-specific code path)

Consolidated (2 tests → 1 parametrized):
- test_feature_processing_fn_with_no_features_raises
- test_feature_processing_fn_partial_failure_raises
  → Single parametrized test with [single_tf, multi_tf_partial] cases

Result: 456 tests (down from 464), all passing.
Test reduction: 47% fewer per-timeframe feature tests with same coverage.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Merge test_env_with_different_feature_dimensions and
test_multi_step_episode_maintains_shapes into single test.

The multi-step test is a superset - it verifies spec shapes,
reset shapes, AND shape consistency over 30 steps.

Result: 455 tests (down from 456), all passing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
BY571 (Collaborator, Author) commented Feb 5, 2026

@claude review

claude bot commented Feb 5, 2026

Claude finished @BY571's task in 1m 52s


Code Review

Todo List

  • Read repository guidelines (CLAUDE.md)
  • Review implementation files
  • Review test files
  • Provide comprehensive feedback

Review Summary

This PR successfully implements per-timeframe feature processing support while maintaining full backward compatibility. The implementation is well-structured, thoroughly tested, and includes excellent error handling with informative messages.

Strengths

1. Clean API Design

  • Backward compatible: single function still works exactly as before
  • Natural extension: list of functions for per-timeframe customization
  • Type hints are clear: FeatureProcessingFn = Optional[Union[Callable, Sequence[Callable]]]
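
In practice, both accepted forms look like this (process_all_timeframes is a hypothetical single function; other constructor arguments are abbreviated):

# Single callable: applied to every timeframe, exactly as before this PR.
env = SequentialTradingEnv(df, config, feature_preprocessing_fn=process_all_timeframes)

# Sequence of callables: one per timeframe, enabling different feature dimensions.
env = SequentialTradingEnv(df, config, feature_preprocessing_fn=[process_1min, process_5min])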

2. Excellent Error Handling

The validation is comprehensive and provides actionable error messages:

# sampler.py:100-105
if len(cols_to_keep) == 0:
    raise ValueError(
        f"Feature processing function for timeframe {tf.obs_key_freq()} "
        f"produced no columns starting with '{features_start_with}'. "
        f"Ensure your function adds columns like 'features_close', 'features_volume', etc."
    )
# offline_base.py:157-161
if missing_keys:
    raise ValueError(
        f"num_features dict is missing keys for timeframes: {missing_keys}. "
        f"Expected keys: {market_data_keys}, got: {list(num_features.keys())}"
    )

These error messages include:

  • What went wrong
  • Which timeframe is affected
  • How to fix it

3. Comprehensive Testing

  • 266 lines of sampler tests covering:
    • Backward compatibility
    • Different feature dimensions per timeframe
    • Mismatched list lengths
    • Edge cases
  • 96 lines of environment integration tests
  • All 455 offline tests passing

4. Thoughtful Backward Compatibility

The get_feature_keys() method now raises a helpful error when timeframes have different features:

# sampler.py:368-376
def get_feature_keys(self) -> List[str]:
    """Get feature columns (raises error if timeframes have different features).
    
    For backward compatibility with single-function feature processing.
    If timeframes have different features, use get_feature_keys_per_timeframe() instead.
    """
    if not all(c == columns[0] for c in columns):
        raise ValueError(
            "Timeframes have different features. Use get_feature_keys_per_timeframe() instead. "
            f"Feature counts per timeframe: {self.get_num_features_per_timeframe()}"
        )

This is better than the old assert because it:

  • Explains the issue
  • Suggests the solution
  • Provides diagnostic information
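
Concretely, with timeframes of different widths the two accessors behave roughly like this (keys, feature names, and counts are illustrative):

sampler.get_feature_keys_per_timeframe()
# e.g. {"1min": ["features_close", "features_return", "features_volume"],
#       "5min": ["features_close", "features_return", "features_spread", "features_sma_12", "features_volume"]}

sampler.get_feature_keys()
# raises ValueError: "Timeframes have different features. Use get_feature_keys_per_timeframe()
# instead. Feature counts per timeframe: {'1min': 3, '5min': 5}"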

5. Clean Implementation

  • Changes are minimal and focused (448 additions, 12 deletions)
  • Logic is well-organized: normalize input → validate → process
  • No code duplication

Minor Observations

1. Type Annotation Consistency

In sampler.py:54, you explicitly type processing_fns:

processing_fns: List[Optional[Callable]] = [None] * len(time_frames)

But in lines 57 and 60, the type annotation is omitted:

processing_fns = [feature_processing_fn] * len(time_frames)
processing_fns = list(feature_processing_fn)

Suggestion: For consistency, you could add type annotations to all branches, but this is a very minor point and the code is clear as-is.

2. Documentation Clarity

The new methods have excellent docstrings:

  • get_feature_keys(): Explains when to use it and when to use alternatives
  • get_feature_keys_per_timeframe(): Clear return type description
  • get_num_features_per_timeframe(): Simple and clear

All docstrings follow the existing style.

3. Test Coverage

The test coverage is excellent:

    • Tests verify shapes over 30 steps, catching any shape inconsistencies (sketched after this list)
  • Tests cover both sampler and environment levels
  • Edge cases are tested (mismatched lengths, empty features)
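
A minimal sketch of that kind of check, assuming a standard TorchRL rollout and the illustrative processing functions from the PR description; the actual fixtures in the test suite will differ:

def test_observation_shapes_stay_consistent_over_episode():
    env = SequentialTradingEnv(
        df, config,
        feature_preprocessing_fn=[process_1min, process_5min],
    )
    rollout = env.rollout(max_steps=30)  # standard TorchRL rollout
    # Every observation entry keeps its per-timeframe feature shape at every step.
    for key in env.observation_spec.keys(True, True):
        assert rollout[key].shape[1:] == env.observation_spec[key].shape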

Code Quality Metrics

| Aspect | Rating | Notes |
| --- | --- | --- |
| Backward Compatibility | ⭐⭐⭐⭐⭐ | Perfect - single function still works |
| Error Messages | ⭐⭐⭐⭐⭐ | Informative and actionable |
| Test Coverage | ⭐⭐⭐⭐⭐ | Comprehensive with 362 new test lines |
| API Design | ⭐⭐⭐⭐⭐ | Natural and intuitive |
| Documentation | ⭐⭐⭐⭐⭐ | Clear docstrings with examples |
| Code Organization | ⭐⭐⭐⭐⭐ | Logical and maintainable |

Specific File Reviews

sampler.py (torchtrade/envs/offline/infrastructure/sampler.py:52-65)

The normalization logic is clean and handles all cases:

if feature_processing_fn is None:
    processing_fns: List[Optional[Callable]] = [None] * len(time_frames)
elif callable(feature_processing_fn):
    processing_fns = [feature_processing_fn] * len(time_frames)
else:
    processing_fns = list(feature_processing_fn)
    if len(processing_fns) != len(time_frames):
        raise ValueError(...)

offline_base.py (torchtrade/envs/core/offline_base.py:154-161)

The validation ensures the dictionary has all required keys before accessing them:

if isinstance(num_features, dict):
    missing_keys = set(market_data_keys) - set(num_features.keys())
    if missing_keys:
        raise ValueError(...)

This prevents cryptic KeyError exceptions.

sequential.py (torchtrade/envs/offline/sequential.py:206-208)

Clean integration - the change is minimal and uses the new API correctly:

num_features_per_tf = self.sampler.get_num_features_per_timeframe()
self._build_observation_specs(account_state, num_features_per_tf)
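
For intuition, per-timeframe counts translate into differently shaped observation entries along these lines. This is a sketch, not the environment's actual spec-building code; seq_len and the keys are hypothetical, and older TorchRL versions name these specs CompositeSpec / UnboundedContinuousTensorSpec:

from torchrl.data import Composite, Unbounded

num_features_per_tf = {"market_data_1min": 3, "market_data_5min": 5}  # illustrative keys/counts
seq_len = 20  # hypothetical observation window length

# One spec entry per timeframe, each with its own feature width.
market_specs = Composite(
    {key: Unbounded(shape=(seq_len, n)) for key, n in num_features_per_tf.items()}
)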

Conclusion

This is production-ready code that successfully implements the feature request (#177) while maintaining backward compatibility and code quality. The implementation is:

  • ✅ Well-designed
  • ✅ Thoroughly tested
  • ✅ Properly documented
  • ✅ Backward compatible
  • ✅ Ready to merge

Recommendation: Approve and merge 🚀

The PR description accurately reflects the changes, all tests pass, and the code quality is excellent. Great work on the implementation and especially on the comprehensive error handling!


BY571 merged commit 4ca194e into main Feb 5, 2026
2 checks passed
BY571 deleted the feature/per-timeframe-feature-processing branch February 5, 2026 14:50
