Merge pull request #229 from WenjieDu/dev

Fix a bug in CRLI, switch to BSD-3 license
WenjieDu · Nov 6, 2023 · 7577570
2 parents b4cf5d8 + e59ef4f
Showing 190 changed files with 284 additions and 916 deletions.
diff --git a/LICENSE b/LICENSE
diff --git a/README.md b/README.md
@@ -17,7 +17,7 @@
         <img alt="the latest release version" src="https://img.shields.io/github/v/release/wenjiedu/pypots?color=EE781F&include_prereleases&label=Release&logo=github&logoColor=white">
     </a>
     <a href="https://github.com/WenjieDu/PyPOTS/blob/main/LICENSE">
-        <img alt="GPL-v3 license" src="https://img.shields.io/badge/License-GPL--v3-E9BB41?logo=opensourceinitiative&logoColor=white">
+        <img alt="BSD-3 license" src="https://img.shields.io/badge/License-BSD--3-E9BB41?logo=opensourceinitiative&logoColor=white">
     </a>
     <a href="https://github.com/WenjieDu/PyPOTS/blob/main/README.md#-community">
         <img alt="Community" src="https://img.shields.io/badge/join_us-community!-C8A062">
@@ -79,37 +79,39 @@ The rest of this readme file is organized as follows:
 
 
 ## ❖ PyPOTS Ecosystem
-At PyPOTS, time series datasets are taken as coffee beans, and POTS datasets are incomplete coffee beans with missing parts that have their own meanings.
+At PyPOTS, things are related to coffee, which we're familiar with. Yes, this is a coffee universe!
 As you can see, there is a coffee pot in the PyPOTS logo.
+And what else? Please read on ;-)
 
 <a href="https://github.com/WenjieDu/TSDB">
-    <img src="https://pypots.com/figs/pypots_logos/TSDB_logo_FFBG.svg" align="left" width="130" alt="TSDB logo"/>
+    <img src="https://pypots.com/figs/pypots_logos/TSDB_logo_FFBG.svg" align="left" width="140" alt="TSDB logo"/>
 </a>
 
-👈 To make various open-source time-series datasets readily available to our users,
-PyPOTS gets supported by its ecosystem library <i>Time Series Data Beans (TSDB)</i>, a toolbox making loading time-series datasets super easy!
+👈 Time series datasets are taken as coffee beans at PyPOTS, and POTS datasets are incomplete coffee beans with missing parts that have their own meanings.
+To make various public time-series datasets readily available to users,
+<i>Time Series Data Beans (TSDB)</i> is created to make loading time-series datasets super easy!
 Visit [TSDB](https://github.com/WenjieDu/TSDB) right now to know more about this handy tool 🛠, and it now supports a total of 168 open-source datasets!
 
 <a href="https://github.com/WenjieDu/PyGrinder">
-    <img src="https://pypots.com/figs/pypots_logos/PyGrinder_logo_FFBG.svg" align="right" width="130" alt="PyGrinder logo"/>
+    <img src="https://pypots.com/figs/pypots_logos/PyGrinder_logo_FFBG.svg" align="right" width="140" alt="PyGrinder logo"/>
 </a>
 
-👉 To simulate the real-world data beans with missingness, the ecosystem library [PyGrinder](https://github.com/WenjieDu/PyGrinder), 
-a toolkit helping grind your coffee beans into incomplete ones, is created. Missing patterns fall into three categories according to Robin's theory[^13]: 
-MCAR (missing completely at random), MAR (missing at random), and MNAR (missing not at random). 
-PyGrinder supports all of them and additional functionalities related to missingness. 
+👉 To simulate the real-world data beans with missingness, the ecosystem library [PyGrinder](https://github.com/WenjieDu/PyGrinder),
+a toolkit helping grind your coffee beans into incomplete ones, is created. Missing patterns fall into three categories according to Robin's theory[^13]:
+MCAR (missing completely at random), MAR (missing at random), and MNAR (missing not at random).
+PyGrinder supports all of them and additional functionalities related to missingness.
 With PyGrinder, you can introduce synthetic missing values into your datasets with a single line of code.
 
 <a href="https://github.com/WenjieDu/BrewPOTS">
-    <img src="https://pypots.com/figs/pypots_logos/BrewPOTS_logo_FFBG.svg" align="left" width="130" alt="BrewPOTS logo"/>
+    <img src="https://pypots.com/figs/pypots_logos/BrewPOTS_logo_FFBG.svg" align="left" width="140" alt="BrewPOTS logo"/>
 </a>
 
 👈 Now we have the beans, the grinder, and the pot, how to brew us a cup of coffee? Tutorials are necessary!
-Considering the future workload, PyPOTS tutorials is released in a single repo,
+Considering the future workload, PyPOTS tutorials are released in a single repo,
 and you can find them in [BrewPOTS](https://github.com/WenjieDu/BrewPOTS).
 Take a look at it now, and learn how to brew your POTS datasets!
 
-☕️ Enjoy it and have fun!
+☕️ Welcome to the universe of PyPOTS. Enjoy it and have fun!
 
 
 ## ❖ Installation
@@ -131,8 +133,9 @@ conda update  -c conda-forge pypots  # update pypots to the latest version
 Alternatively, you can install from the latest source code with the latest features but may be not officially released yet:
 > pip install https://github.com/WenjieDu/PyPOTS/archive/main.zip
 
+
 ## ❖ Usage
-Besides [BrewPOTS](https://github.com/WenjieDu/BrewPOTS), you can also find a simple and quick-start tutorial notebook 
+Besides [BrewPOTS](https://github.com/WenjieDu/BrewPOTS), you can also find a simple and quick-start tutorial notebook
 on Google Colab with [this link](https://colab.research.google.com/drive/1HEFjylEy05-r47jRy0H9jiS_WhD0UWmQ?usp=sharing).
 If you have further questions, please refer to PyPOTS documentation [docs.pypots.com](https://docs.pypots.com).
 You can also [raise an issue](https://github.com/WenjieDu/PyPOTS/issues) or [ask in our community](#-community).
@@ -162,7 +165,8 @@ dataset = {"X": X}
 print(dataset["X"].shape)  # (11988, 48, 37), 11988 samples, 48 time steps, 37 features
 # Model training. This is PyPOTS showtime.
 saits = SAITS(n_steps=48, n_features=37, n_layers=2, d_model=256, d_inner=128, n_heads=4, d_k=64, d_v=64, dropout=0.1, epochs=10)
-saits.fit(dataset)  # train the model. Here I use the whole dataset as the training set, because ground truth is not visible to the model.
+# Here I use the whole dataset as the training set because ground truth is not visible to the model, you can also split it into train/val/test sets
+saits.fit(dataset)
 imputation = saits.impute(dataset)  # impute the originally-missing values and artificially-missing values
 mae = cal_mae(imputation, X_intact, indicating_mask)  # calculate mean absolute error on the ground truth (artificially-missing values)
 ```
@@ -174,12 +178,12 @@ PyPOTS supports imputation, classification, clustering, and forecasting tasks on
 
 |   ***`Imputation`***   |     🚥      |                                                                                        🚥                                                                                         |    🚥    |
 |:----------------------:|:-----------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:--------:|
-|        **Type**        |  **Abbr.**  |                                                                    **Full name of the algorithm/model/paper**                                                                     | **Year** |
+|        **Type**        |  **Abbr.**  |                                                                       **Full name of the algorithm/model**                                                                        | **Year** |
 |       Neural Net       |    SAITS    |                                                               Self-Attention-based Imputation for Time Series [^1]                                                                |   2023   |
 |       Neural Net       | Transformer | Attention is All you Need [^2];<br>Self-Attention-based Imputation for Time Series [^1];<br><sub>Note: proposed in [^2], and re-implemented as an imputation model in [^1].</sub> |   2017   |
 |       Neural Net       |    CSDI     |                                              Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation [^12]                                              |   2021   |
-|       Neural Net       |   US-GAN    |                                                 Generative Semi-supervised Learning for Multivariate Time Series Imputation [^10]                                                 |   2021   |
-|       Neural Net       |   GP-VAE    |                                                              GP-VAE: Deep Probabilistic Time Series Imputation [^11]                                                              |   2020   |
+|       Neural Net       |   US-GAN    |                                                          Unsupervised GAN for Multivariate Time Series Imputation [^10]                                                           |   2021   |
+|       Neural Net       |   GP-VAE    |                                                                   Gaussian Process Variational Autoencoder [^11]                                                                  |   2020   |
 |       Neural Net       |    BRITS    |                                                              Bidirectional Recurrent Imputation for Time Series [^3]                                                              |   2018   |
 |       Neural Net       |    M-RNN    |                                                                  Multi-directional Recurrent Neural Network [^9]                                                                  |   2019   |
 |         Naive          |    LOCF     |                                                                         Last Observation Carried Forward                                                                          |    -     |
@@ -212,7 +216,7 @@ Here is [an incomplete list of them](https://scholar.google.com/scholar?as_ylo=2
 
 ``` bibtex
 @article{du2023PyPOTS,
-title={{PyPOTS: a Python toolbox for machine learning on Partially-Observed Time Series}},
+title={{PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series}},
 author={Wenjie Du},
 year={2023},
 eprint={2305.18811},
@@ -224,14 +228,14 @@ doi={10.48550/arXiv.2305.18811},
 ```
 
 > Wenjie Du. (2023).
-> PyPOTS: a Python toolbox for machine learning on Partially-Observed Time Series.
+> PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series.
 > arXiv, abs/2305.18811.https://arxiv.org/abs/2305.18811
 
 or
 
 ``` bibtex
 @inproceedings{du2023PyPOTS,
-title={{PyPOTS: a Python toolbox for machine learning on Partially-Observed Time Series}},
+title={{PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series}},
 booktitle={9th SIGKDD workshop on Mining and Learning from Time Series (MiLeTS'23)},
 author={Wenjie Du},
 year={2023},
@@ -240,7 +244,7 @@ url={https://arxiv.org/abs/2305.18811},
 ```
 
 > Wenjie Du. (2023).
-> PyPOTS: a Python toolbox for machine learning on Partially-Observed Time Series.
+> PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series.
 > In *9th SIGKDD workshop on Mining and Learning from Time Series (MiLeTS'23)*. https://arxiv.org/abs/2305.18811
 
 
@@ -268,16 +272,17 @@ Your star is your recognition to PyPOTS, and it matters!
     </i></b>
 </summary>
 <a href="https://github.com/WenjieDu/PyPOTS/stargazers">
-    <img alt="PyPOTS stargazers" src="https://reporoster.com/stars/dark/WenjieDu/PyPOTS">
+    <img alt="PyPOTS stargazers" src="http://reporoster.com/stars/dark/WenjieDu/PyPOTS">
 </a>
 <br>
 <a href="https://github.com/WenjieDu/PyPOTS/network/members">
-    <img alt="PyPOTS forkers" src="https://reporoster.com/forks/dark/WenjieDu/PyPOTS">
+    <img alt="PyPOTS forkers" src="http://reporoster.com/forks/dark/WenjieDu/PyPOTS">
 </a>
 </details>
 
 👀 Check out a full list of our users' affiliations [on PyPOTS website here](https://pypots.com/users/)!
 
+
 ## ❖ Community
 We care about the feedback from our users, so we're building PyPOTS community on
 
@@ -289,6 +294,7 @@ We care about the feedback from our users, so we're building PyPOTS community on
 If you have any suggestions or want to contribute ideas or share time-series related papers, join us and tell.
 PyPOTS community is open, transparent, and surely friendly. Let's work together to build and improve PyPOTS!
 
+
 [//]: # (Use APA reference style below)
 [^1]: Du, W., Cote, D., & Liu, Y. (2023). [SAITS: Self-Attention-based Imputation for Time Series](https://doi.org/10.1016/j.eswa.2023.119619). *Expert systems with applications*.
 [^2]: Vaswani, A., Shazeer, N.M., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., & Polosukhin, I. (2017). [Attention is All you Need](https://papers.nips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html). *NeurIPS 2017*.
@@ -302,7 +308,7 @@ PyPOTS community is open, transparent, and surely friendly. Let's work together
 [^10]: Miao, X., Wu, Y., Wang, J., Gao, Y., Mao, X., & Yin, J. (2021). [Generative Semi-supervised Learning for Multivariate Time Series Imputation](https://ojs.aaai.org/index.php/AAAI/article/view/17086). *AAAI 2021*.
 [^11]: Fortuin, V., Baranchuk, D., Raetsch, G. & Mandt, S. (2020). [GP-VAE: Deep Probabilistic Time Series Imputation](https://proceedings.mlr.press/v108/fortuin20a.html). *AISTATS 2020*.
 [^12]: Tashiro, Y., Song, J., Song, Y., & Ermon, S. (2021). [CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation](https://proceedings.neurips.cc/paper/2021/hash/cfe8504bda37b575c70ee1a8276f3486-Abstract.html). *NeurIPS 2021*.
-[^13]: Rubin, D. B. (1976). [Inference and missing data](https://academic.oup.com/biomet/article-abstract/63/3/581/270932). *Biometrika*, 63(3), 581-592.
+[^13]: Rubin, D. B. (1976). [Inference and missing data](https://academic.oup.com/biomet/article-abstract/63/3/581/270932). *Biometrika*.
 
 
 <details>