diff --git a/scripts/anomaly_detection/README.md b/scripts/anomaly_detection/README.md
index 28426ce..66e4515 100644
--- a/scripts/anomaly_detection/README.md
+++ b/scripts/anomaly_detection/README.md
@@ -1,11 +1,11 @@
 # Anomaly Detection
 
-Anomaly detection is vital in industry and operations. Previous methods ([Xu et al., 2021](https://arxiv.org/abs/2110.02642); [Wu et al., 2022](https://openreview.net/forum?id=ju_Uqw384Oq)) typically tackle the unsupervised scenario in a reconstructed- tive approach, where a model is trained to reconstruct the input series, and the output is regarded as the normal series. Based on our generative model, we cope with anomaly detection in a predictive approach, which utilizes the observed segments to predict the future segment, and the predicted segment will be established as the standard to be compared with the actual value received. Unlike the previous method requiring to collect time series of a period for reconstruction, our predictive approach allows for segment-level anomaly detection on the fly. Thus, the task is converted into a next token prediction task.
+Anomaly detection is vital in industry and operations. Previous methods ([Xu et al., 2021](https://arxiv.org/abs/2110.02642); [Wu et al., 2022](https://openreview.net/forum?id=ju_Uqw384Oq)) typically tackle the unsupervised scenario in a reconstructive approach, where a model is trained to reconstruct the input series and the output is regarded as the normal series. Based on our generative model, we cope with anomaly detection in a predictive approach: the observed segments are used to predict the future segment, and the predicted segment is established as the standard against which the actual received values are compared. Unlike previous methods, which require collecting a period of time series for reconstruction, our predictive approach allows for segment-level anomaly detection on the fly. Thus, the task is converted into a next-token prediction task.
 
 ## Dataset
 
-[UCR Anomaly Archive](https://arxiv.org/pdf/2009.13807) is evaluated as the benchmark. Download datasets: [Google Drive](https://drive.google.com/file/d/1yffcQBcMLasQcT7cdotjOVcg-2UKRarw/view?usp=drive_link) and [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/6bc31f9a003b4d75a10b/).
+[UCR Anomaly Archive](https://arxiv.org/pdf/2009.13807) is evaluated as the benchmark. We provide the processed dataset: [Google Drive](https://drive.google.com/file/d/1yffcQBcMLasQcT7cdotjOVcg-2UKRarw/view?usp=drive_link) and [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/6bc31f9a003b4d75a10b/).
 
 ## Task Description
 
@@ -14,7 +14,7 @@ Anomaly detection is vital in industry and operations. Previous methods ([Xu et
 
 Credit from UCR Anomaly Archive ([Wu & Keogh, 2021](https://arxiv.org/pdf/2009.13807))

-We introduce UCR Anomaly Archive ([Wu & Keogh, 2021](https://arxiv.org/pdf/2009.13807)) that contains 250 tasks. In each task, a single normal time series is provided for training, and the model should locate the position of an anomaly in the test series. We first train a predictive model on the training set and calculate the MSE between the predicted series and ground truth on the test set. We use the MSE of all segments as the confidence level, the segments with higher than $\alpha$ quantile of confidence are labeled as potential positions of anomalies.
+We introduce the UCR Anomaly Archive ([Wu & Keogh, 2021](https://arxiv.org/pdf/2009.13807)), which contains 250 tasks. In each task, a single normal time series is provided for training, and the model should locate the position of an anomaly in the test series. We first train a predictive model on the training set and calculate the MSE between the predicted series and the ground truth on the test set. We use the MSE of each segment as its confidence level; segments whose confidence exceeds a given quantile are labeled as potential positions of anomalies.
 
 ## Scripts
diff --git a/scripts/forecast/README.md b/scripts/forecast/README.md
index 949f00e..6bbcd29 100644
--- a/scripts/forecast/README.md
+++ b/scripts/forecast/README.md
@@ -1,4 +1,4 @@
-# Large Time Series Model for Time Series Forecasting
+# Time Series Forecasting
 
 ## Dataset
 
@@ -10,16 +10,17 @@ Time series forecasting is essential and presents challenges in real-world appli
 
 ## Task Description
 
-### Few-shot
+### Few-shot Generalization
+
 To construct data-scarce scenarios, we perform retrieval with a uniform interval in the training split according to the sampling ratio and conduct random shuffling at the end of each epoch to train the model. We keep the same validation and testing sets as the original downstream datasets and train the baseline models and Timer with the same set of training samples.
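The uniform-interval retrieval described above can be sketched as follows; the helper name and the per-epoch shuffling detail are illustrative assumptions rather than the repository's actual implementation:

```python
import random

def subsample_train_indices(num_train, subset_rand_ratio, seed=0):
    """Pick training samples at a uniform interval given a sampling ratio.

    E.g. a ratio of 0.05 keeps every 20th window of the training split.
    The kept indices are reshuffled so each epoch sees a new order,
    while the validation and test splits stay untouched.
    """
    if not 0.0 < subset_rand_ratio <= 1.0:
        raise ValueError("subset_rand_ratio must be in (0, 1]")
    interval = max(1, round(1 / subset_rand_ratio))
    indices = list(range(0, num_train, interval))  # uniform-interval retrieval
    random.Random(seed).shuffle(indices)           # random shuffling per epoch
    return indices

# 5% few-shot scenario over 1000 training windows keeps 50 samples.
few_shot = subsample_train_indices(1000, 0.05)
```

A ratio of 1.0 reduces to full-shot training, which is the spirit of the `--subset_rand_ratio` option below.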
 
 Set `--subset_rand_ratio` to decide the ratio of training samples in few-shot scenarios.
 
 ### Direct Multi-step (DMS) and Iterative Multi-step (IMS) Forecasting
 
-GPT has the flexibility to address unfixed context length and excels at multi-step generation by iteratively sliding and enlarging input tokens while small time series models generally refrain from iterative multi-step forecasting to mitigate error accumulation.
+GPT has the flexibility to address unfixed context lengths and excels at multi-step generation by iteratively sliding and enlarging input tokens. However, small time series models generally refrain from iterative multi-step forecasting to mitigate error accumulation.
 
-Timer adopts autoregression, a.k.a. Iterative Multi-step (IMS) in time series forecasting. During inference, we concatenate the prediction with the previous lookback series and iteratively generate the next token until reaching the desired length. We also provide implementations of Direct Multi-step (DMS)) approach for typical encoder-only forecasters.
+Timer adopts autoregression, a.k.a. Iterative Multi-step (IMS) forecasting. During inference, we concatenate the prediction with the previous lookback series and iteratively generate the next token until reaching the desired length. We also implement the Direct Multi-step (DMS) approach for typical encoder-only forecasters.
 
 Set `--use_ims` to evaluate decoder-only models in the IMS way. If the option is not activated, the script evaluates encoder-only models for a fair comparison.
 
@@ -28,8 +29,8 @@ Set `--use_ims` to evaluate decoder-only models in the IMS way. If the option is
 
 To train with your time series dataset, you can try out the following steps:
 
-1. Read through the ```CIDatasetBenchmark``` and ```CIAutoRegressionDatasetBenchmark```classes under the ```data_provider/data_loader``` folder, which provides the functionality to load and process time series files and train the model in DMS mode or IMS mode.
-2. The file should be ```csv``` format with the first column containing the timestamp and the following columns containing the variates of time series.
+1. Read through the ```CIDatasetBenchmark``` and ```CIAutoRegressionDatasetBenchmark``` classes under the ```data_provider/data_loader``` folder, which provide the functionality to load and process time series files and evaluate models in DMS mode or IMS mode.
+2. The file should be in ```csv``` format, with the first column containing timestamps and the following columns containing the variates of the time series.
 
 ## Scripts
diff --git a/scripts/imputation/README.md b/scripts/imputation/README.md
index 6edebfd..7b6df95 100644
--- a/scripts/imputation/README.md
+++ b/scripts/imputation/README.md
@@ -1,4 +1,4 @@
-# Large Time Series Model for Time Series Segment Imputation
+# Time Series Imputation
 
 Imputation is ubiquitous in real-world applications, aiming to fill corrupted time series based on partially observed data. However, while various machine learning algorithms and simple linear interpolation can effectively cope with corruptions that randomly happen at the point level, real-world corruptions typically result from prolonged monitor shutdowns and require a continuous period of recovery. Consequently, imputation becomes ever more challenging when attempting to recover a span of time points encompassing intricate series variations.
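To make the point-level versus segment-level contrast above concrete, here is a minimal sketch (hypothetical helpers, not from the repository) of the two corruption patterns; the contiguous span in the second is what defeats simple interpolation:

```python
import random

def point_mask(length, ratio, seed=0):
    """Point-level corruption: isolated timestamps dropped at random,
    which linear interpolation can usually bridge."""
    rng = random.Random(seed)
    return [rng.random() < ratio for _ in range(length)]

def segment_mask(length, span, seed=0):
    """Segment-level corruption: one contiguous span is lost, e.g. a
    prolonged monitor shutdown, and must be recovered as a whole."""
    rng = random.Random(seed)
    start = rng.randrange(0, length - span + 1)
    return [start <= t < start + span for t in range(length)]

# Both masks hide roughly 20% of a 100-step series, in very different patterns.
pm = point_mask(100, 0.2)
sm = segment_mask(100, 20)
```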