# Engineering Design

This document lists key literature that has informed the development of this package. It is not an exhaustive list; it highlights the most relevant works. Our design is built explicitly for flexibility, unlike many time series machine learning and deep learning packages that enforce rigid preprocessing constraints. We intentionally adopt familiar software engineering patterns, inspired by scikit-learn, to provide a modular and adaptable framework. The only assumption we impose is that features must be organized in a context window preceding the target variable. This lets users focus on their core applications while remaining compatible with SHAP and other explainability methods.
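
As a brief illustration of this layout, the sketch below uses hypothetical helper names (not this package's API) to turn a univariate series into rows of lagged features followed by the target, in a shape that any scikit-learn style estimator and SHAP explainer can consume.

```python
# A minimal sketch (assumed helper names, not this package's API) of the
# context-window layout described above: each row of X holds the `window`
# feature values that immediately precede the corresponding target in y,
# so any tabular estimator and SHAP explainer can consume it directly.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
import shap


def make_context_windows(series: np.ndarray, window: int) -> tuple[np.ndarray, np.ndarray]:
    """Turn a 1-D series into (X, y) where X[i] holds the `window` values before y[i]."""
    X = np.lib.stride_tricks.sliding_window_view(series[:-1], window)
    y = series[window:]
    return X, y


series = np.sin(np.linspace(0.0, 20.0, 500))        # toy univariate series
X, y = make_context_windows(series, window=12)      # X: (488, 12), y: (488,)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)               # per-lag feature attributions
shap_values = explainer.shap_values(X[:100])
```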

| Category | Title | Authors | Publication | Summary |
| --- | --- | --- | --- | --- |
| Regulatory Literature | Machine learning algorithms for financial asset price forecasting | Ndikum, P. | arXiv preprint, 2020 | Discusses the application of machine learning algorithms to forecasting financial asset prices, with implications for regulatory frameworks. |
| Regulatory Literature | Advancing Investment Frontiers: Industry-grade Deep Reinforcement Learning for Portfolio Optimization | Ndikum, P., & Ndikum, S. | arXiv preprint, 2024 | Explores deep reinforcement learning approaches for portfolio optimization, emphasizing industry-grade applications and regulatory considerations. |
| Scientific Literature | SHAP-based insights for aerospace PHM: Temporal feature importance, dependencies, robustness, and interaction analysis | Alomari, Y., & Andó, M. | Results in Engineering, 2024 | Explores SHAP-based methods for analyzing temporal feature importance in aerospace prognostics and health management. |
| Scientific Literature | Feature importance explanations for temporal black-box models | Sood, A., & Craven, M. | AAAI Conference on Artificial Intelligence, 2022 | Introduces the TIME framework for explaining temporal black-box models using feature importance. |
| Scientific Literature | WindowSHAP: An efficient framework for explaining time-series classifiers based on Shapley values | Nayebi, A., Tipirneni, S., Reddy, C. K., et al. | Journal of Biomedical Informatics, 2023 | Proposes the WindowSHAP framework for explaining time-series classifiers, improving both computational efficiency and explanation quality. |
| Scientific Literature | The sliding window and SHAP theory—an improved system with a long short-term memory network model for state of charge prediction in electric vehicle application | Gu, X., See, K. W., Wang, Y., et al. | Energies, 2021 | Combines sliding-window and SHAP techniques to enhance LSTM-based state-of-charge prediction for electric vehicles. |
| Scientific Literature | Cross-Frequency Time Series Meta-Forecasting | Van Ness, M., Shen, H., Wang, H., et al. | arXiv preprint, 2023 | Proposes the CFA model, which handles varying frequencies in time series data and supports flexible, universal modeling assumptions in forecasting. |
| Scientific Literature | Unified Training of Universal Time Series Forecasting Transformers | Woo, G., Liu, C., Kumar, A., et al. | arXiv preprint, 2024 | Introduces Moirai, a transformer that scales across multiple time series forecasting tasks without heavy preprocessing constraints. |
| Scientific Literature | Universal Time-Series Representation Learning: A Survey | Trirat, P., Shin, Y., Kang, J., et al. | arXiv preprint, 2024 | Surveys universal models for time series, outlining how generalization across datasets is achieved with minimal assumptions. |

## Partitioning Guidelines

The following heuristics are derived from key papers in the field and are designed to ensure that data partitions used in temporal analysis are robust and appropriate for machine learning tasks.

| Heuristic | Rule | Reasoning | Sources |
| --- | --- | --- | --- |
| Minimum samples | ≥ 3,000 samples | Ensures sufficient data for analysis and model training. | Grinsztajn et al. (2022), Shwartz-Ziv and Armon (2021), Gorishniy et al. (2021) |
| Maximum samples | ≤ 50,000 samples | Upper bound for medium-sized datasets; larger datasets may require different handling. | Shwartz-Ziv and Armon (2021), Gorishniy et al. (2021) |
| Minimum features | ≥ 4 features | Ensures dataset complexity and avoids overly simplistic data. | Grinsztajn et al. (2022), Shwartz-Ziv and Armon (2021), Gorishniy et al. (2021) |
| Maximum features | < 500 features | Avoids high-dimensional data issues that are problematic for many models. | Grinsztajn et al. (2022), Shwartz-Ziv and Armon (2021) |
| Feature-to-sample ratio | d/n < 1/10 | Ensures a sufficient number of samples relative to the number of features, mitigating the risk of overfitting. | Grinsztajn et al. (2022) |
| Categorical feature cardinality | ≤ 20 unique values | Keeps categorical features at a manageable, low cardinality. | Grinsztajn et al. (2022) |
| Numerical feature uniqueness | ≥ 10 unique values | Ensures sufficient variability in numerical features, crucial for robust modeling. | Grinsztajn et al. (2022) |
| Binary numerical features | Convert to categorical if exactly 2 unique values | Ensures models interpret binary features correctly. | Grinsztajn et al. (2022) |
| Class balance (for classification) | Equal samples per class | Keeps the learning problem balanced, which is crucial for model accuracy and fairness. | Grinsztajn et al. (2022) |
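
As an illustration only, the sketch below encodes the table's thresholds as a simple check over a pandas DataFrame. The function name and return format are hypothetical and not part of this package's API; the class-balance rule is omitted because it applies only to classification targets.

```python
# A minimal sketch, assuming a pandas DataFrame of features plus a target
# column, that checks the partitioning heuristics listed above. Thresholds
# are copied from the table; the function itself is illustrative only.
import pandas as pd


def check_partition(df: pd.DataFrame, target: str) -> list[str]:
    """Return a list of heuristic violations for a candidate partition."""
    issues = []
    features = df.drop(columns=[target])
    n, d = features.shape

    if n < 3_000:
        issues.append(f"too few samples: {n} < 3,000")
    if n > 50_000:
        issues.append(f"too many samples for a medium-sized dataset: {n} > 50,000")
    if d < 4:
        issues.append(f"too few features: {d} < 4")
    if d >= 500:
        issues.append(f"too many features: {d} >= 500")
    if d / n >= 0.1:
        issues.append(f"feature-to-sample ratio too high: {d}/{n} >= 1/10")

    for col in features.columns:
        unique = features[col].nunique()
        if pd.api.types.is_numeric_dtype(features[col]):
            if unique == 2:
                issues.append(f"{col}: binary numeric feature, convert to categorical")
            elif unique < 10:
                issues.append(f"{col}: numeric feature with only {unique} unique values")
        elif unique > 20:
            issues.append(f"{col}: categorical cardinality {unique} > 20")
    return issues
```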

## Contributor Guidelines

Contributors are encouraged to reference relevant literature when making contributions. Please ensure that the appropriate citations are included in this document and the codebase where applicable.