
Commit

fix task readme
WenWeiTHU committed Jun 5, 2024
1 parent 6845dc7 commit 1ce4bbd
Showing 3 changed files with 11 additions and 10 deletions.
6 changes: 3 additions & 3 deletions scripts/anomaly_detection/README.md
@@ -1,11 +1,11 @@
# Anomaly Detection

Anomaly detection is vital in industry and operations. Previous methods ([Xu et al., 2021](https://arxiv.org/abs/2110.02642); [Wu et al., 2022](https://openreview.net/forum?id=ju_Uqw384Oq)) typically tackle the unsupervised scenario in a reconstructed- tive approach, where a model is trained to reconstruct the input series, and the output is regarded as the normal series. Based on our generative model, we cope with anomaly detection in a predictive approach, which utilizes the observed segments to predict the future segment, and the predicted segment will be established as the standard to be compared with the actual value received. Unlike the previous method requiring to collect time series of a period for reconstruction, our predictive approach allows for segment-level anomaly detection on the fly. Thus, the task is converted into a next token prediction task.
Anomaly detection is vital in industry and operations. Previous methods ([Xu et al., 2021](https://arxiv.org/abs/2110.02642); [Wu et al., 2022](https://openreview.net/forum?id=ju_Uqw384Oq)) typically tackle the unsupervised scenario in a reconstructive approach, where a model is trained to reconstruct the input series and the output is regarded as the normal series. Based on our generative model, we cope with anomaly detection in a predictive approach, which utilizes the observed segments to predict the future segment; the predicted segment serves as the standard against which the actually received values are compared. Unlike previous methods, which require collecting a period of time series for reconstruction, our predictive approach allows for segment-level anomaly detection on the fly. Thus, the task is converted into a next-token prediction task.


## Dataset

[UCR Anomaly Archive](https://arxiv.org/pdf/2009.13807) is evaluated as the benchmark. Download datasets: [Google Drive](https://drive.google.com/file/d/1yffcQBcMLasQcT7cdotjOVcg-2UKRarw/view?usp=drive_link) and [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/6bc31f9a003b4d75a10b/).
[UCR Anomaly Archive](https://arxiv.org/pdf/2009.13807) is adopted as the benchmark. We provide the processed dataset via [Google Drive](https://drive.google.com/file/d/1yffcQBcMLasQcT7cdotjOVcg-2UKRarw/view?usp=drive_link) and [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/6bc31f9a003b4d75a10b/).


## Task Description
@@ -14,7 +14,7 @@ Anomaly detection is vital in industry and operations. Previous methods ([Xu et
<img src="../../figures/anomaly_detection_dataset.png" alt="Credit from UCR Anomaly Archive (Wu & Keogh, 2021)" align=center />
</p>

We introduce UCR Anomaly Archive ([Wu & Keogh, 2021](https://arxiv.org/pdf/2009.13807)) that contains 250 tasks. In each task, a single normal time series is provided for training, and the model should locate the position of an anomaly in the test series. We first train a predictive model on the training set and calculate the MSE between the predicted series and ground truth on the test set. We use the MSE of all segments as the confidence level, the segments with higher than $\alpha$ quantile of confidence are labeled as potential positions of anomalies.
We adopt the UCR Anomaly Archive ([Wu & Keogh, 2021](https://arxiv.org/pdf/2009.13807)), which contains 250 tasks. In each task, a single normal time series is provided for training, and the model should locate the position of an anomaly in the test series. We first train a predictive model on the training set and calculate the MSE between the predicted series and the ground truth on the test set. We use the MSE of each segment as its confidence level; segments whose confidence exceeds a given quantile are labeled as potential positions of anomalies.
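
The thresholding step can be sketched as follows. This is a minimal illustration rather than the repository's evaluation code; the per-segment MSE array and the quantile level are assumed inputs:

```python
import numpy as np

def flag_anomalous_segments(segment_mse: np.ndarray, quantile: float = 0.99) -> np.ndarray:
    """Label segments whose prediction error exceeds the given quantile.

    segment_mse: per-segment MSE between the predicted and observed series.
    quantile: confidence threshold (an assumed hyperparameter).
    Returns a boolean mask marking potential anomaly positions.
    """
    threshold = np.quantile(segment_mse, quantile)
    return segment_mse > threshold
```

Segments flagged by the mask correspond to the candidate anomaly positions reported for each task.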

## Scripts

13 changes: 7 additions & 6 deletions scripts/forecast/README.md
@@ -1,4 +1,4 @@
# Large Time Series Model for Time Series Forecasting
# Time Series Forecasting

## Dataset

@@ -10,16 +10,17 @@ Time series forecasting is essential and presents challenges in real-world appli

## Task Description

### Few-shot
### Few-shot Generalization

To construct data-scarce scenarios, we subsample the training split at a uniform interval determined by the sampling ratio and randomly shuffle the samples at the end of each epoch during training. We keep the validation and test sets of the original downstream datasets unchanged and train the baseline models and Timer on the same set of training samples.

Set `--subset_rand_ratio` to control the ratio of training samples used in few-shot scenarios.
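
As a rough sketch of this subsampling, assuming a list of training-sample indices; the helper below is illustrative and not the repository's implementation of `--subset_rand_ratio`:

```python
import random

def subsample_training_indices(indices: list, subset_rand_ratio: float) -> list:
    """Keep roughly `subset_rand_ratio` of the samples at a uniform interval, then shuffle."""
    if subset_rand_ratio >= 1.0:
        return list(indices)
    stride = max(1, round(1.0 / subset_rand_ratio))  # uniform retrieval interval
    subset = list(indices[::stride])
    random.shuffle(subset)  # reshuffled again at the end of each epoch in practice
    return subset
```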

### Direct Multi-step (DMS) and Iterative Multi-step (IMS) Forecasting

GPT has the flexibility to address unfixed context length and excels at multi-step generation by iteratively sliding and enlarging input tokens while small time series models generally refrain from iterative multi-step forecasting to mitigate error accumulation.
GPT-style models have the flexibility to handle unfixed context lengths and excel at multi-step generation by iteratively sliding and enlarging the input tokens. However, small time series models generally refrain from iterative multi-step forecasting to mitigate error accumulation.

Timer adopts autoregression, a.k.a. Iterative Multi-step (IMS) in time series forecasting. During inference, we concatenate the prediction with the previous lookback series and iteratively generate the next token until reaching the desired length. We also provide implementations of Direct Multi-step (DMS)) approach for typical encoder-only forecasters.
Timer adopts autoregression, a.k.a. Iterative Multi-step (IMS) forecasting. During inference, we concatenate the prediction with the previous lookback series and iteratively generate the next token until reaching the desired length. We also implement the Direct Multi-step (DMS) approach for typical encoder-only forecasters.

Set `--use_ims` to evaluate decoder-only models in the IMS way. If the option is not activated, the script evaluates encoder-only models in the DMS way for a fair comparison.
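
A minimal sketch of the IMS rollout, assuming a `model` that maps a lookback window to its prediction and a fixed `token_len` output segment (names and shapes are illustrative, not the repository's inference code):

```python
import torch

@torch.no_grad()
def ims_forecast(model, lookback: torch.Tensor, pred_len: int, token_len: int) -> torch.Tensor:
    """Iterative multi-step forecasting: predict one token, append it, repeat.

    lookback: [batch, context_len] observed series.
    Returns:  [batch, pred_len] forecast assembled from the generated tokens.
    """
    context = lookback
    generated = []
    total = 0
    while total < pred_len:
        next_token = model(context)[:, -token_len:]          # next segment of token_len points
        generated.append(next_token)
        total += token_len
        context = torch.cat([context, next_token], dim=-1)   # slide and enlarge the input tokens
    return torch.cat(generated, dim=-1)[:, :pred_len]
```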

@@ -28,8 +29,8 @@ Set `--use_ims` to evaluate decoder-only models in the IMS way. If the option is

To train with your time series dataset, you can try out the following steps:

1. Read through the ```CIDatasetBenchmark``` and ```CIAutoRegressionDatasetBenchmark```classes under the ```data_provider/data_loader``` folder, which provides the functionality to load and process time series files and train the model in DMS mode or IMS mode.
2. The file should be ```csv``` format with the first column containing the timestamp and the following columns containing the variates of time series.
1. Read through the ```CIDatasetBenchmark``` and ```CIAutoRegressionDatasetBenchmark``` classes under the ```data_provider/data_loader``` folder, which provide the functionality to load and process time series files and evaluate models in DMS mode or IMS mode.
2. The file should be in ```csv``` format, with the first column containing timestamps and the following columns containing the variates of the time series (see the sketch below).
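
For reference, a file in this layout can be loaded with pandas as sketched below; the file name and column handling are placeholders, not part of the repository:

```python
import pandas as pd

# First column: timestamps; remaining columns: one variate per column.
df = pd.read_csv("my_dataset.csv", parse_dates=[0], index_col=0)
values = df.to_numpy(dtype="float32")  # shape: [num_timesteps, num_variates]
print(values.shape)
```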

## Scripts

2 changes: 1 addition & 1 deletion scripts/imputation/README.md
@@ -1,4 +1,4 @@
# Large Time Series Model for Time Series Segment Imputation
# Time Series Imputation

Imputation is ubiquitous in real-world applications, aiming to fill corrupted time series based on partially observed data. While various machine learning algorithms and even simple linear interpolation can effectively cope with corruptions that occur randomly at the point level, real-world corruptions typically result from prolonged monitor shutdowns and require recovering a continuous period. Consequently, imputation becomes especially challenging when attempting to recover a span of time points encompassing intricate series variations.
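
To make the contrast concrete, point-level and segment-level corruption can be simulated roughly as follows (an illustrative sketch; the mask ratio and segment length are hypothetical, not the settings used in this repository):

```python
import numpy as np

def point_level_mask(length: int, mask_ratio: float = 0.25, seed: int = 0) -> np.ndarray:
    """Randomly corrupt individual points; simple interpolation handles this well."""
    rng = np.random.default_rng(seed)
    return rng.random(length) < mask_ratio  # True marks a corrupted point

def segment_level_mask(length: int, seg_len: int = 24, seed: int = 0) -> np.ndarray:
    """Corrupt one continuous span, mimicking a prolonged monitor shutdown."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(length, dtype=bool)
    start = int(rng.integers(0, max(1, length - seg_len)))
    mask[start:start + seg_len] = True
    return mask
```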

