Merge pull request #224 from WenjieDu/dev

Enable hyperparameter tuning with NNI, fix dependency error in testing_CI, and update docs
WenjieDu · Oct 31, 2023 · b4cf5d8
2 parents cf9b627 + ad9a8b4

Showing 21 changed files with 378 additions and 243 deletions.
diff --git a/.github/workflows/testing_ci.yml b/.github/workflows/testing_ci.yml
@@ -15,7 +15,7 @@ jobs:
         runs-on: ${{ matrix.os }}
         defaults:
             run:
-                shell: bash {0}
+                shell: bash
         strategy:
             fail-fast: false
             matrix:
@@ -51,15 +51,15 @@ jobs:
               run: |
                   which python
                   which pip
-                  python -m pip install --upgrade pip
+                  pip install --upgrade pip
                   pip install torch==${{ matrix.torch-version }} -f https://download.pytorch.org/whl/cpu
                   python -c "import torch; print('PyTorch:', torch.__version__)"
 
             - name: Install other dependencies
               run: |
-                  pip install pypots
+                  pip install -r requirements.txt
                   pip install torch-geometric==2.3.1 torch-scatter==2.1.1 torch-sparse==0.6.17 -f "https://data.pyg.org/whl/torch-${{ matrix.torch-version }}+cpu.html"
-                  pip install -e ".[dev]"
+                  pip install pypots[dev]
 
             - name: Fetch the test environment details
               run: |

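The switch from `shell: bash {0}` to `shell: bash` changes how failures inside a multi-line `run:` block are handled. According to the GitHub Actions documentation, `bash` on Linux/macOS runners expands to `bash --noprofile --norc -eo pipefail {0}`. A custom template such as `bash {0}` runs the script without `-e`, so a failing `pip install` in the middle of a step is ignored and only the last command's exit status counts. The Python sketch below reproduces that difference locally. It assumes a `bash` executable on `PATH`, and `run_step` is a hypothetical helper, not part of any CI tooling.

```python
import os
import shutil
import subprocess
import tempfile

# Step body: the first command fails, the last one succeeds.
STEP_SCRIPT = "false\necho 'still running after a failed command'\n"


def run_step(shell_template: str) -> int:
    """Run STEP_SCRIPT through a GitHub-Actions-style shell template.

    "{0}" is replaced by the path of a temporary script file, mirroring how
    Actions substitutes the generated script path. Returns the exit code.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(STEP_SCRIPT)
        path = f.name
    try:
        # Split the template first so a path containing spaces stays one argument.
        cmd = [part.replace("{0}", path) for part in shell_template.split()]
        return subprocess.run(cmd, capture_output=True, text=True).returncode
    finally:
        os.remove(path)


if __name__ == "__main__" and shutil.which("bash"):
    # Old template: no -e, so `false` is ignored and echo's status (0) wins.
    print("bash {0}:", run_step("bash {0}"))
    # What `shell: bash` expands to on Linux/macOS runners: fails with status 1.
    print("shell: bash:", run_step("bash --noprofile --norc -eo pipefail {0}"))
```

With the old template, a broken dependency install could leave the step green and only show up later as confusing import errors in the test step.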
diff --git a/README.md b/README.md
@@ -4,7 +4,7 @@
 
 <h2 align="center">Welcome to PyPOTS</h2>
 
-<p align="center"><i>a Python toolbox for data mining on Partially-Observed Time Series</i></p>
+<p align="center"><i>a Python toolbox for machine learning on Partially-Observed Time Series</i></p>
 
 <p align="center">
     <a href="https://docs.pypots.com/en/latest/install.html#reasons-of-version-limitations-on-dependencies">
@@ -55,32 +55,21 @@
 ⦿ `Motivation`: Due to all kinds of reasons like failure of collection sensors, communication error,
 and unexpected malfunction, missing values are common to see in time series from the real-world environment.
 This makes partially-observed time series (POTS) a pervasive problem in open-world modeling and prevents advanced
-data analysis. Although this problem is important, the area of data mining on POTS still lacks a dedicated toolkit.
+data analysis. Although this problem is important, the area of machine learning on POTS still lacks a dedicated toolkit.
 PyPOTS is created to fill in this blank.
 
-⦿ `Mission`: PyPOTS (pronounced "Pie Pots") is born to become a handy toolbox that is going to make data mining on POTS easy rather than
+⦿ `Mission`: PyPOTS (pronounced "Pie Pots") is born to become a handy toolbox that is going to make machine learning on POTS easy rather than
 tedious, to help engineers and researchers focus more on the core problems in their hands rather than on how to deal
-with the missing parts in their data. PyPOTS will keep integrating classical and the latest state-of-the-art data mining
+with the missing parts in their data. PyPOTS will keep integrating classical and the latest state-of-the-art machine learning
 algorithms for partially-observed multivariate time series. For sure, besides various algorithms, PyPOTS is going to
 have unified APIs together with detailed documentation and interactive examples across algorithms as tutorials.
 
 🤗 **Please** star this repo to help others notice PyPOTS if you think it is a useful toolkit.
 **Please** properly [cite PyPOTS](https://github.com/WenjieDu/PyPOTS#-citing-pypots) in your publications
 if it helps with your research. This really means a lot to our open-source research. Thank you!
 
-<a href="https://github.com/WenjieDu/TSDB">
-    <img src="https://pypots.com/figs/pypots_logos/TSDB_logo_FFBG.svg?sanitize=true" align="left" width="160" alt="TSDB logo"/>
-</a>
-
-To make various open-source time-series datasets readily available to our users,
-PyPOTS gets supported by its subproject [TSDB (Time-Series Data Beans)](https://github.com/WenjieDu/TSDB),
-a toolbox making loading time-series datasets super easy!
-
-Visit [TSDB](https://github.com/WenjieDu/TSDB) right now to know more about this handy tool 🛠!
-It now supports a total of 168 open-source datasets.
-<br clear="left">
-
 The rest of this readme file is organized as follows:
+[**❖ PyPOTS Ecosystem**](#-pypots-ecosystem),
 [**❖ Installation**](#-installation),
 [**❖ Usage**](#-usage),
 [**❖ Available Algorithms**](#-available-algorithms),
@@ -89,6 +78,40 @@ The rest of this readme file is organized as follows:
 [**❖ Community**](#-community).
 
 
+## ❖ PyPOTS Ecosystem
+At PyPOTS, time series datasets are treated as coffee beans, and POTS datasets are incomplete coffee beans whose missing parts carry their own meanings.
+That is why there is a coffee pot in the PyPOTS logo.
+
+<a href="https://github.com/WenjieDu/TSDB">
+    <img src="https://pypots.com/figs/pypots_logos/TSDB_logo_FFBG.svg" align="left" width="130" alt="TSDB logo"/>
+</a>
+
+👈 To make various open-source time-series datasets readily available to our users,
+PyPOTS is supported by its ecosystem library <i>Time Series Data Beans (TSDB)</i>, a toolbox that makes loading time-series datasets super easy!
+Visit [TSDB](https://github.com/WenjieDu/TSDB) right now to learn more about this handy tool 🛠. It now supports a total of 168 open-source datasets!
+
+<a href="https://github.com/WenjieDu/PyGrinder">
+    <img src="https://pypots.com/figs/pypots_logos/PyGrinder_logo_FFBG.svg" align="right" width="130" alt="PyGrinder logo"/>
+</a>
+
+👉 To simulate real-world data beans with missingness, we created the ecosystem library [PyGrinder](https://github.com/WenjieDu/PyGrinder),
+a toolkit that helps grind your coffee beans into incomplete ones. Missing patterns fall into three categories according to Rubin's theory[^13]:
+MCAR (missing completely at random), MAR (missing at random), and MNAR (missing not at random).
+PyGrinder supports all three categories, along with additional missingness-related functionality.
+With PyGrinder, you can introduce synthetic missing values into your datasets with a single line of code.
+
+<a href="https://github.com/WenjieDu/BrewPOTS">
+    <img src="https://pypots.com/figs/pypots_logos/BrewPOTS_logo_FFBG.svg" align="left" width="130" alt="BrewPOTS logo"/>
+</a>
+
+👈 Now that we have the beans, the grinder, and the pot, how do we brew a cup of coffee? Tutorials are necessary!
+Considering the future workload, PyPOTS tutorials are released in a dedicated repo,
+and you can find them in [BrewPOTS](https://github.com/WenjieDu/BrewPOTS).
+Take a look at it now, and learn how to brew your POTS datasets!
+
+☕️ Enjoy it and have fun!
+
+
 ## ❖ Installation
 You can refer to [the installation instruction](https://docs.pypots.com/en/latest/install.html) in PyPOTS documentation for a guideline with more details.
 
@@ -108,24 +131,15 @@ conda update  -c conda-forge pypots  # update pypots to the latest version
 Alternatively, you can install from the latest source code with the latest features but may be not officially released yet:
 > pip install https://github.com/WenjieDu/PyPOTS/archive/main.zip
 
-
 ## ❖ Usage
-<a href="https://github.com/WenjieDu/BrewPOTS">
-    <img src="https://pypots.com/figs/pypots_logos/BrewPOTS_logo_FFBG.svg?sanitize=true" align="left" width="160" alt="BrewPOTS logo"/>
-</a>
-
-PyPOTS tutorials have been released. Considering the future workload, I separate the tutorials into a single repo,
-and you can find them in [BrewPOTS](https://github.com/WenjieDu/BrewPOTS).
-Take a look at it now, and learn how to brew your POTS datasets!
-
-You can also find a simple and quick-start tutorial notebook on Google Colab with
-[this link](https://colab.research.google.com/drive/1HEFjylEy05-r47jRy0H9jiS_WhD0UWmQ?usp=sharing).
+Besides [BrewPOTS](https://github.com/WenjieDu/BrewPOTS), you can also find a simple quick-start tutorial notebook
+on Google Colab with [this link](https://colab.research.google.com/drive/1HEFjylEy05-r47jRy0H9jiS_WhD0UWmQ?usp=sharing).
 If you have further questions, please refer to PyPOTS documentation [docs.pypots.com](https://docs.pypots.com).
-Besides, you can also [raise an issue](https://github.com/WenjieDu/PyPOTS/issues) or [ask in our community](#-community).
+You can also [raise an issue](https://github.com/WenjieDu/PyPOTS/issues) or [ask in our community](#-community).
 
 Below is a usage example of imputing missing values in time series with PyPOTS; click its summary line to collapse or expand it.
 
-<details>
+<details open>
 <summary><b>Click here to see an example applying SAITS on PhysioNet2012 for imputation:</b></summary>
 
 ``` python
@@ -198,7 +212,7 @@ Here is [an incomplete list of them](https://scholar.google.com/scholar?as_ylo=2
 
 ``` bibtex
 @article{du2023PyPOTS,
-title={{PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series}},
+title={{PyPOTS: a Python toolbox for machine learning on Partially-Observed Time Series}},
 author={Wenjie Du},
 year={2023},
 eprint={2305.18811},
@@ -210,14 +224,14 @@ doi={10.48550/arXiv.2305.18811},
 ```
 
 > Wenjie Du. (2023).
-> PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series.
+> PyPOTS: a Python toolbox for machine learning on Partially-Observed Time Series.
 > arXiv, abs/2305.18811. https://arxiv.org/abs/2305.18811
 
 or
 
 ``` bibtex
 @inproceedings{du2023PyPOTS,
-title={{PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series}},
+title={{PyPOTS: a Python toolbox for machine learning on Partially-Observed Time Series}},
 booktitle={9th SIGKDD workshop on Mining and Learning from Time Series (MiLeTS'23)},
 author={Wenjie Du},
 year={2023},
@@ -226,7 +240,7 @@ url={https://arxiv.org/abs/2305.18811},
 ```
 
 > Wenjie Du. (2023).
-> PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series.
+> PyPOTS: a Python toolbox for machine learning on Partially-Observed Time Series.
 > In *9th SIGKDD workshop on Mining and Learning from Time Series (MiLeTS'23)*. https://arxiv.org/abs/2305.18811
 
 
@@ -288,6 +302,7 @@ PyPOTS community is open, transparent, and surely friendly. Let's work together
 [^10]: Miao, X., Wu, Y., Wang, J., Gao, Y., Mao, X., & Yin, J. (2021). [Generative Semi-supervised Learning for Multivariate Time Series Imputation](https://ojs.aaai.org/index.php/AAAI/article/view/17086). *AAAI 2021*.
 [^11]: Fortuin, V., Baranchuk, D., Raetsch, G. & Mandt, S. (2020). [GP-VAE: Deep Probabilistic Time Series Imputation](https://proceedings.mlr.press/v108/fortuin20a.html). *AISTATS 2020*.
 [^12]: Tashiro, Y., Song, J., Song, Y., & Ermon, S. (2021). [CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation](https://proceedings.neurips.cc/paper/2021/hash/cfe8504bda37b575c70ee1a8276f3486-Abstract.html). *NeurIPS 2021*.
+[^13]: Rubin, D. B. (1976). [Inference and missing data](https://academic.oup.com/biomet/article-abstract/63/3/581/270932). *Biometrika*, 63(3), 581-592.
 
 
 <details>

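The README states that PyGrinder can introduce synthetic missingness with a single line of code. To illustrate what the simplest of Rubin's categories, MCAR, means in practice, here is a minimal NumPy sketch. `simulate_mcar` is a hypothetical helper written for this note, not PyGrinder's API; consult the PyGrinder documentation for its actual function names and signatures.

```python
import numpy as np


def simulate_mcar(X: np.ndarray, rate: float, seed: int = 0) -> np.ndarray:
    """Hypothetical helper (not PyGrinder's API) that applies MCAR masking.

    Every entry is independently set to NaN with probability `rate`, no matter
    what its value is; entries that are already NaN stay NaN.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = np.random.default_rng(seed)
    X_missing = X.astype(float, copy=True)
    drop = rng.random(X.shape) < rate  # Bernoulli(rate) mask, independent of X
    X_missing[drop] = np.nan
    return X_missing


if __name__ == "__main__":
    # 100 samples x 48 time steps x 5 features, fully observed.
    X = np.random.default_rng(1).normal(size=(100, 48, 5))
    X_missing = simulate_mcar(X, rate=0.1)
    print(f"missing rate: {np.isnan(X_missing).mean():.3f}")  # roughly 0.1
```

Because the drop mask is drawn independently of `X`, whether an entry goes missing does not depend on any observed or unobserved value. Simulating MAR or MNAR would instead require conditioning the mask on observed values or on the values being dropped, respectively.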
diff --git a/docs/index.rst b/docs/index.rst
@@ -82,23 +82,55 @@ Welcome to PyPOTS docs!
 **Please** properly `cite PyPOTS <https://docs.pypots.com/en/latest/milestones.html#citing-pypots>`_ in your publications
 if it helps with your research. This really means a lot to our open-source research. Thank you!
 
-.. image:: https://pypots.com/figs/pypots_logos/TSDB_logo_FFBG.svg?sanitize=true
-   :width: 170
-   :alt: TSDB
+The rest of this page is organized as follows:
+`❖ PyPOTS Ecosystem <#id1>`_,
+`❖ Installation <#id2>`_,
+`❖ Usage <#id4>`_,
+`❖ Available Algorithms <#id6>`_,
+`❖ Citing PyPOTS <#id19>`_,
+`❖ Contribution <#id20>`_,
+`❖ Community <#id21>`_.
+
+
+❖ PyPOTS Ecosystem
+^^^^^^^^^^^^^^^^^^^
+At PyPOTS, time series datasets are treated as coffee beans, and POTS datasets are incomplete coffee beans whose missing parts carry their own meanings.
+That is why there is a coffee pot in the PyPOTS logo.
+
+.. image:: https://pypots.com/figs/pypots_logos/TSDB_logo_FFBG.svg
+   :width: 130
+   :alt: TSDB logo
    :align: left
    :target: https://github.com/WenjieDu/TSDB
 
-To make various open-source time-series datasets readily available to our users, PyPOTS gets supported by its sub-project `TSDB (Time-Series Data Beans) <https://github.com/WenjieDu/TSDB>`_, a toolbox making loading time-series datasets super easy!
+👈 To make various open-source time-series datasets readily available to our users,
+PyPOTS is supported by its ecosystem library *Time Series Data Beans (TSDB)*, a toolbox that makes loading time-series datasets super easy!
+Visit `TSDB <https://github.com/WenjieDu/TSDB>`_ right now to learn more about this handy tool 🛠. It now supports a total of 168 open-source datasets!
+
+.. image:: https://pypots.com/figs/pypots_logos/PyGrinder_logo_FFBG.svg
+   :width: 130
+   :alt: PyGrinder logo
+   :align: right
+   :target: https://github.com/WenjieDu/PyGrinder
 
-Visit `TSDB <https://github.com/WenjieDu/TSDB>`_ right now to know more about this handy tool 🛠! It now supports a total of 168 open-source datasets.
+👉 To simulate real-world data beans with missingness, we created the ecosystem library `PyGrinder <https://github.com/WenjieDu/PyGrinder>`_,
+a toolkit that helps grind your coffee beans into incomplete ones. Missing patterns fall into three categories according to Rubin's theory :cite:`rubin1976missing`:
+MCAR (missing completely at random), MAR (missing at random), and MNAR (missing not at random).
+PyGrinder supports all three categories, along with additional missingness-related functionality.
+With PyGrinder, you can introduce synthetic missing values into your datasets with a single line of code.
 
-The rest of this readme file is organized as follows:
-`❖ Installation <#id1>`_,
-`❖ Usage <#id3>`_,
-`❖ Available Algorithms <#id4>`_,
-`❖ Citing PyPOTS <#id14>`_,
-`❖ Contribution <#id15>`_,
-`❖ Community <#id16>`_.
+.. image:: https://pypots.com/figs/pypots_logos/BrewPOTS_logo_FFBG.svg
+   :width: 130
+   :alt: BrewPOTS logo
+   :align: left
+   :target: https://github.com/WenjieDu/BrewPOTS
+
+👈 Now that we have the beans, the grinder, and the pot, how do we brew a cup of coffee? Tutorials are necessary!
+Considering the future workload, PyPOTS tutorials are released in a dedicated repo,
+and you can find them in `BrewPOTS <https://github.com/WenjieDu/BrewPOTS>`_.
+Take a look at it now, and learn how to brew your POTS datasets!
+
+☕️ Enjoy it and have fun!
 
 
 ❖ Installation
@@ -110,18 +142,12 @@ Refer to the page `Installation <install.html>`_ to see different ways of instal
 
 ❖ Usage
 ^^^^^^^^
-.. image:: https://pypots.com/figs/pypots_logos/BrewPOTS_logo_FFBG.svg?sanitize=true
-   :width: 160
-   :alt: BrewPOTS logo
-   :align: left
-   :target: https://github.com/WenjieDu/BrewPOTS
-
-PyPOTS tutorials have been released. Considering the future workload, I separate the tutorials into a single repo,
-and you can find them in `BrewPOTS <https://github.com/WenjieDu/BrewPOTS>`_.
-Take a look at it now, and brew your POTS dataset into a cup of coffee!
+Besides `BrewPOTS <https://github.com/WenjieDu/BrewPOTS>`_, you can also find a simple quick-start tutorial notebook
+on Google Colab with `this link <https://colab.research.google.com/drive/1HEFjylEy05-r47jRy0H9jiS_WhD0UWmQ?usp=sharing>`_.
+You can also `raise an issue <https://github.com/WenjieDu/PyPOTS/issues>`_ or `ask in our community <#id21>`_.
 
-If you have further questions, please refer to PyPOTS documentation `docs.pypots.com <https://docs.pypots.com>`_.
-Besides, you can also `raise an issue <https://github.com/WenjieDu/PyPOTS/issues>`_ or `ask in our community <#id14>`_.
+Additionally, a usage example of imputing missing values in time series with PyPOTS is available in the
+`Quick-start Examples <https://docs.pypots.com/en/latest/examples.html>`_ section.
 
 
 ❖ Available Algorithms

diff --git a/docs/references.bib b/docs/references.bib
@@ -445,3 +445,16 @@ @inproceedings{tashiro2021csdi
 year={2021},
 url={https://openreview.net/forum?id=VzuIzbRDrum}
 }
+
+@article{rubin1976missing,
+ISSN = {00063444},
+URL = {http://www.jstor.org/stable/2335739},
+author = {Donald B. Rubin},
+journal = {Biometrika},
+number = {3},
+pages = {581--592},
+publisher = {[Oxford University Press, Biometrika Trust]},
+title = {Inference and Missing Data},
+volume = {63},
+year = {1976}
+}
diff --git a/environment-dev.yml b/environment-dev.yml
@@ -46,5 +46,7 @@ dependencies:
     - conda-forge::jupyterlab
 
     - pip:
-          # doc
-          - sphinxcontrib-gtagjs
+        # doc
+        - sphinxcontrib-gtagjs
+        # hyperparameter tuning
+        - nni
diff --git a/pypots/__init__.py b/pypots/__init__.py
@@ -25,6 +25,8 @@
 __version__ = "0.1.4"
 
 
+from . import imputation, classification, clustering, forecasting, optim, data, utils
+
 __all__ = [
     "imputation",
     "classification",

diff --git a/pypots/classification/base.py b/pypots/classification/base.py
@@ -6,6 +6,7 @@
 # License: GPL-v3
 
 
+import os
 from abc import abstractmethod
 from typing import Optional, Union
 
@@ -16,6 +17,11 @@
 from ..base import BaseModel, BaseNNModel
 from ..utils.logging import logger
 
+try:
+    import nni
+except ImportError:
+    pass
+
 
 class BaseClassifier(BaseModel):
     """The abstract class for all PyPOTS classification models.
@@ -332,11 +338,18 @@ def _train_model(
                     )
                 else:
                     self.patience -= 1
-                    if self.patience == 0:
-                        logger.info(
-                            "Exceeded the training patience. Terminating the training procedure..."
-                        )
-                        break
+
+                if os.getenv("enable_tuning", False):
+                    nni.report_intermediate_result(mean_loss)
+                    if epoch == self.epochs - 1 or self.patience == 0:
+                        nni.report_final_result(self.best_loss)
+
+                if self.patience == 0:
+                    logger.info(
+                        "Exceeded the training patience. Terminating the training procedure..."
+                    )
+                    break
+
         except Exception as e:
             logger.error(f"Exception: {e}")
             if self.best_model_dict is None:

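The tuning hook above reports every epoch's `mean_loss` through `nni.report_intermediate_result` and sends `self.best_loss` through `nni.report_final_result` on the last epoch or when patience runs out. Two details are worth noting. First, `os.getenv("enable_tuning", False)` returns the raw string whenever the variable is set, so `enable_tuning=0` or `enable_tuning=false` would still switch reporting on. Second, because a failed `import nni` is silently swallowed, enabling tuning without NNI installed would raise a `NameError` at the first report, which the surrounding `except Exception` block would catch and log rather than report clearly. The sketch below isolates the control flow under stated assumptions: `tuning_enabled` is a hypothetical stricter check, the list of losses stands in for real training, the two callbacks stand in for the NNI functions (so NNI is not needed to run it), and the patience reset on improvement is assumed because the hunk does not show that branch.

```python
import os
from typing import Callable, List


def tuning_enabled() -> bool:
    """Hypothetical stricter check of the `enable_tuning` environment variable.

    os.getenv("enable_tuning", False) returns the raw string when the variable
    is set, so "0" or "false" would still be truthy; compare explicitly instead.
    """
    return os.getenv("enable_tuning", "").strip().lower() in {"1", "true", "yes"}


def train_with_reporting(
    epoch_losses: List[float],
    patience: int,
    report_intermediate: Callable[[float], None],
    report_final: Callable[[float], None],
) -> float:
    """Early-stopping loop with the reporting order used in the hunk above.

    `epoch_losses` stands in for per-epoch validation losses; the callbacks
    stand in for nni.report_intermediate_result / nni.report_final_result.
    """
    best_loss = float("inf")
    remaining = patience
    n_epochs = len(epoch_losses)
    for epoch, mean_loss in enumerate(epoch_losses):
        if mean_loss < best_loss:
            best_loss = mean_loss
            remaining = patience  # assumed reset on improvement
        else:
            remaining -= 1

        if tuning_enabled():
            report_intermediate(mean_loss)
            # A single final report: on the last epoch or when patience is exhausted.
            if epoch == n_epochs - 1 or remaining == 0:
                report_final(best_loss)

        if remaining == 0:
            break  # "Exceeded the training patience"
    return best_loss


if __name__ == "__main__":
    os.environ["enable_tuning"] = "1"
    intermediate, final = [], []
    best = train_with_reporting(
        [0.9, 0.7, 0.8, 0.85], patience=2,
        report_intermediate=intermediate.append, report_final=final.append,
    )
    print(best, intermediate, final)  # 0.7 [0.9, 0.7, 0.8, 0.85] [0.7]
```

Checking the two exit conditions with `or` keeps the final metric from being sent twice when the last epoch also exhausts patience, which matters because NNI expects one final result per trial.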
diff --git a/pypots/classification/grud/modules/__init__.py b/pypots/classification/grud/modules/__init__.py
@@ -6,9 +6,7 @@
 # License: GLP-v3
 
 from .core import _GRUD
-from pypots.modules.rnn import TemporalDecay
 
 __all__ = [
     "_GRUD",
-    "TemporalDecay",
 ]
diff --git a/pypots/classification/grud/modules/core.py b/pypots/classification/grud/modules/core.py
@@ -16,7 +16,7 @@
 import torch.nn as nn
 import torch.nn.functional as F
 
-from pypots.modules.rnn import TemporalDecay
+from ....modules.rnn import TemporalDecay
 
 
 class _GRUD(nn.Module):

diff --git a/pypots/cli/pypots_cli.py b/pypots/cli/pypots_cli.py
@@ -10,6 +10,7 @@
 from .dev import DevCommand
 from .doc import DocCommand
 from .env import EnvCommand
+from .tuning import TuningCommand
 
 
 def main():
@@ -22,6 +23,7 @@ def main():
     DevCommand.register_subcommand(commands_parser)
     DocCommand.register_subcommand(commands_parser)
     EnvCommand.register_subcommand(commands_parser)
+    TuningCommand.register_subcommand(commands_parser)
 
     # parse all arguments
     args = parser.parse_args()
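The CLI change follows a registration pattern: each command class exposes `register_subcommand(commands_parser)`, and `main()` then calls `parser.parse_args()`. The hunk does not show `TuningCommand` itself or how parsed arguments are dispatched, so the following is a self-contained `argparse` sketch of that pattern only. The `BaseCommand` base class, the `tuning` sub-parser name, the `--model`/`--config` flags, and the `func` dispatch attribute are illustrative assumptions, not the real options of the PyPOTS tuning command.

```python
import argparse
from abc import ABC, abstractmethod
from typing import List, Optional


class BaseCommand(ABC):
    """Hypothetical base class: every subcommand knows how to register itself."""

    @staticmethod
    @abstractmethod
    def register_subcommand(subparsers) -> None:
        """Attach this command's sub-parser to the shared subparsers object."""

    @abstractmethod
    def run(self) -> str:
        """Execute the command and return a status message."""


class TuningCommand(BaseCommand):
    """Illustrative stand-in for the real TuningCommand (flags are assumptions)."""

    def __init__(self, model: str, config: str):
        self.model = model
        self.config = config

    @staticmethod
    def register_subcommand(subparsers) -> None:
        parser = subparsers.add_parser("tuning", help="hyperparameter tuning (sketch)")
        parser.add_argument("--model", required=True)  # hypothetical flag
        parser.add_argument("--config", required=True)  # hypothetical flag
        # Store a factory so main() can build the command from the parsed args.
        parser.set_defaults(func=lambda args: TuningCommand(args.model, args.config))

    def run(self) -> str:
        return f"tuning {self.model} with {self.config}"


def main(argv: Optional[List[str]] = None) -> Optional[str]:
    parser = argparse.ArgumentParser(prog="pypots-cli")
    commands_parser = parser.add_subparsers(help="pypots-cli command helpers")
    TuningCommand.register_subcommand(commands_parser)

    # parse all arguments
    args = parser.parse_args(argv)
    if not hasattr(args, "func"):
        parser.print_help()  # no subcommand given
        return None
    return args.func(args).run()


if __name__ == "__main__":
    print(main(["tuning", "--model", "SAITS", "--config", "search_space.json"]))
```

Keeping registration inside each command class means adding a new command touches only one import and one `register_subcommand` call in `main()`, which matches the two-line change in this hunk.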