Wenjie Updates #4

Merged
merged 83 commits into from
Apr 11, 2023
425fc81
feat: enable auto reply on PRs created by new contributors;
WenjieDu Nov 29, 2022
144c4bf
feat: simplify requirements to speed up the installation process of P…
WenjieDu Dec 1, 2022
905ee65
feat: remove torch_geometric from the setup file as well to speed up …
WenjieDu Dec 1, 2022
3f119f8
doc: update README to add the usage example;
WenjieDu Dec 4, 2022
c9955a2
feat: print all outputs during test with pytest;
WenjieDu Dec 20, 2022
0d2f36a
Merge pull request #28 from WenjieDu/dev
WenjieDu Dec 20, 2022
fee98c0
feat: add MANIFEST.in to remove the test dir from the released package;
WenjieDu Dec 21, 2022
87ff2a1
fix: the bug of separating the code-coverage report;
WenjieDu Dec 21, 2022
2d478fd
fix: capture the error caused by singular matrix existence in VaDER;
WenjieDu Dec 21, 2022
154d014
doc: update the documentation;
WenjieDu Dec 21, 2022
196d1e9
doc: add the doc of all implemented modules;
WenjieDu Dec 25, 2022
601bd81
fix: add the dependencies of PyPOTS into the doc building requirement…
WenjieDu Dec 29, 2022
c53f6fb
doc: update README;
WenjieDu Jan 8, 2023
11e3ac7
Merge pull request #29 from WenjieDu/dev
WenjieDu Jan 13, 2023
f8da4f6
feat: add the lazy-loading strategy for BaseDataset;
WenjieDu Jan 16, 2023
456293a
doc: update README;
WenjieDu Feb 8, 2023
e0bb1b7
feat: add limitations on lib dependencies;
WenjieDu Jan 17, 2023
4116399
Merge pull request #33 from WenjieDu/dev
WenjieDu Feb 9, 2023
cf51ce2
feat: add class Logger to help present logs better;
WenjieDu Feb 12, 2023
7ec0f03
feat: replace print with logger;
WenjieDu Feb 12, 2023
5509128
feat: add the func create_dir_if_not_exist() in pypots.utils.files;
WenjieDu Feb 15, 2023
82126a2
fix: TypeError when using logger with mistake;
WenjieDu Feb 15, 2023
8e70636
refactor: update the logger;
WenjieDu Feb 15, 2023
cf0acde
feat: add the test cases for logging;
WenjieDu Feb 16, 2023
7c76e3a
feat: add the attribute __all__ into __init__ files;
WenjieDu Feb 16, 2023
8c584b5
doc: update README;
WenjieDu Feb 18, 2023
df2414b
feat: add the file lazy-loading strategy for classes derived from Bas…
WenjieDu Feb 19, 2023
831e9d4
doc: fix the reference ;
WenjieDu Feb 24, 2023
dc3c005
fix: update the dependencies;
WenjieDu Mar 9, 2023
818e7ef
Merge pull request #37 from WenjieDu/dev
WenjieDu Mar 20, 2023
e7b72bd
doc: update README to add pypots installation with conda;
WenjieDu Mar 28, 2023
0611df1
feat: separate the input data assembling functions of training, valid…
WenjieDu Mar 29, 2023
7cdd393
Merge pull request #38 from WenjieDu/dev
WenjieDu Mar 29, 2023
19c5bb3
doc: update the reference info;
WenjieDu Mar 29, 2023
343c8d8
Merge branch 'lazy_loading_dataset' into dev
WenjieDu Mar 30, 2023
3c56ce2
fix: imputation models applying MIT do not need use DatasetForMIT on …
WenjieDu Mar 30, 2023
5927909
fix: only import h5py when needed;
WenjieDu Mar 30, 2023
4a9c5be
feat: move check_input() to BaseDataset;
WenjieDu Mar 30, 2023
c71c8fa
fix: correct mistaken operator from & to ^;
WenjieDu Mar 30, 2023
af4586a
fix: turn imputation to numpy.ndarray in the validation stage;
WenjieDu Mar 30, 2023
fababb1
feat: update the data given and input logic to support loading datase…
WenjieDu Mar 30, 2023
7dfbf87
fix: bugs in Dataset classes' functions with lazy-loading strategy;
WenjieDu Mar 31, 2023
fdc1459
fix: update the dependencies;
WenjieDu Mar 31, 2023
ee5270a
feat: add testing cases for lazy-loading datasets;
WenjieDu Mar 31, 2023
8a4f682
doc: update README;
WenjieDu Mar 31, 2023
0fb57d4
feat: v0.0.10 is ready;
WenjieDu Mar 31, 2023
72eaf20
fix: running testing cases for forecasting models and lazy-loading da…
WenjieDu Mar 31, 2023
fa5f5b6
fix: running testing cases for logging;
WenjieDu Mar 31, 2023
e9aea74
fix: try to fix the BlockingIOError, see below message for details;
WenjieDu Mar 31, 2023
46fca41
refactor: test scripts;
WenjieDu Mar 31, 2023
13a7cd1
fix: use annotation @pytest.mark.xdist_group to help pytest-dist exec…
WenjieDu Mar 31, 2023
9ad9c7e
fix: fix some warnings while running VaDER;
WenjieDu Mar 31, 2023
e7bee57
fix: move dataset saving into test steps;
WenjieDu Mar 31, 2023
235c607
fix: the error file name of test_data.py;
WenjieDu Mar 31, 2023
f64dda9
Merge pull request #39 from WenjieDu/dev
WenjieDu Mar 31, 2023
f7fa13e
doc: update the documentation;
WenjieDu Apr 4, 2023
634c25a
doc: update the documentation;
WenjieDu Apr 4, 2023
3ac3185
Merge `dev` into `main` to update the documentation and add doc-gener…
WenjieDu Apr 4, 2023
8a856d8
refactor: preprocessing functions of specific dataset now move to mod…
WenjieDu Apr 6, 2023
88780e6
fix: solve the problem of circular import;
WenjieDu Apr 6, 2023
5def912
refactor: don't save data into h5 files if the datasets already exit;
WenjieDu Apr 6, 2023
6a103c7
feat: add issue templates of bug report, feature request, and model a…
WenjieDu Apr 7, 2023
4654961
Add issue templates (#41)
WenjieDu Apr 7, 2023
629ad6c
feat: turn the given device (str or torch.device) into torch.device;
WenjieDu Apr 7, 2023
e887390
feat: enable save training logs into `tb_file_saving_path` in BaseMod…
WenjieDu Apr 7, 2023
37b54ea
feat: enable set num_workers of DataLoader and typing annotation;
WenjieDu Apr 8, 2023
acd262e
feat: add typing annotations in the functions in `data` and `utils`;
WenjieDu Apr 8, 2023
034a298
feat: add python version 3.11 of all three platforms in the testing w…
WenjieDu Apr 8, 2023
11a7529
fix: numpy.float is deprecated;
WenjieDu Apr 8, 2023
ebe9cec
Merge branch 'main' into dev
WenjieDu Apr 8, 2023
fc01480
Decrease testing python version 3.11 to 3.10, and remove fixed depend…
WenjieDu Apr 8, 2023
4646f5c
Merge pull request #42 from WenjieDu/dev
WenjieDu Apr 8, 2023
823f0af
feat: add daily testing workflow;
WenjieDu Apr 9, 2023
5c6dfa4
feat: make imputation models val_X_intact and val_indicating_mask sho…
WenjieDu Apr 9, 2023
4aff213
fix: invalid attribute;
WenjieDu Apr 9, 2023
0b89440
fix: invalid `cron` attribute, 7 is not standard, should use 0 to rep…
WenjieDu Apr 9, 2023
8ca4952
doc: update README, split the table of the available algos according …
WenjieDu Apr 9, 2023
e55325b
Merge pull request #44 from WenjieDu/dev
WenjieDu Apr 9, 2023
538b4c3
refactor: move gene_incomplete_random_walk_dataset and gene_physionet…
WenjieDu Apr 10, 2023
52dc756
fix: correct the mistaken path to environment_for_pip_test.txt;
WenjieDu Apr 10, 2023
a8066ee
fix: fix error the caused by renaming file `test_logging` to `test_ut…
WenjieDu Apr 10, 2023
12d0a2a
feat: remove `pull_request` trigger to avoid duplicate CI running;
WenjieDu Apr 10, 2023
8a91b7b
Merge pull request #45 from WenjieDu/dev
WenjieDu Apr 10, 2023
fix: solve the problem of circular import;
moved the functions of parsing delta to util.py.
WenjieDu committed Apr 6, 2023
commit 88780e6cce471f77fcb4b45dbe1aef6be5d6ae66
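
For context, the kind of circular import this commit resolves can be reproduced in isolation. The sketch below is only an illustration under assumptions: the packages `cyclic_pkg` and `fixed_pkg` and their contents are hypothetical and do not mirror the actual PyPOTS import chain. It shows two modules that import from each other failing at import time, and the same code working once the shared helper lives in a leaf module that imports neither of them.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write_package(root, name, files):
    """Create a throwaway package `name` under `root` from {filename: source}."""
    pkg_dir = Path(root) / name
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")
    for filename, source in files.items():
        (pkg_dir / filename).write_text(source)


root = tempfile.mkdtemp()
sys.path.insert(0, root)

# Hypothetical layout with a cycle: data -> models -> data.
write_package(root, "cyclic_pkg", {
    "data.py": (
        "from cyclic_pkg.models import Model\n\n"
        "def parse_delta():\n    return 'delta'\n"
    ),
    "models.py": (
        "from cyclic_pkg.data import parse_delta\n\n"
        "class Model:\n    pass\n"
    ),
})

# Hypothetical layout without a cycle: the helper sits in utils,
# which imports nothing from the package.
write_package(root, "fixed_pkg", {
    "utils.py": "def parse_delta():\n    return 'delta'\n",
    "data.py": (
        "from fixed_pkg.utils import parse_delta\n"
        "from fixed_pkg.models import Model\n"
    ),
    "models.py": (
        "from fixed_pkg.utils import parse_delta\n\n"
        "class Model:\n    pass\n"
    ),
})
importlib.invalidate_caches()

try:
    importlib.import_module("cyclic_pkg.data")
    cyclic_import_failed = False
except ImportError:
    # models.py asks for parse_delta before data.py has finished executing.
    cyclic_import_failed = True

fixed_data = importlib.import_module("fixed_pkg.data")
print(cyclic_import_failed, fixed_data.parse_delta())
```

Moving the helper into a module with no intra-package imports is the same shape of fix as relocating the delta-parsing functions in this commit.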
5 changes: 4 additions & 1 deletion pypots/data/__init__.py
@@ -9,16 +9,19 @@
 from pypots.data.dataset_for_brits import DatasetForBRITS
 from pypots.data.dataset_for_grud import DatasetForGRUD
 from pypots.data.dataset_for_mit import DatasetForMIT
+
 from pypots.data.generating import (
     generate_random_walk,
     generate_random_walk_for_classification,
 )
-from pypots.data.integration import (
+
+from pypots.data.utils import (
     masked_fill,
     mcar,
     pickle_load,
     pickle_dump,
 )
+
 from pypots.data.load_specific_datasets import (
     list_supported_datasets,
     load_specific_dataset,
2 changes: 1 addition & 1 deletion pypots/data/dataset_for_brits.py
@@ -9,7 +9,7 @@
 import torch

 from pypots.data.base import BaseDataset
-from pypots.data.dataset_for_grud import torch_parse_delta
+from pypots.data.utils import torch_parse_delta


 class DatasetForBRITS(BaseDataset):
85 changes: 2 additions & 83 deletions pypots/data/dataset_for_grud.py
@@ -6,95 +6,15 @@
 # License: GLP-v3


-import numpy as np
 import torch

 from pypots.data.base import BaseDataset
+from pypots.data.utils import torch_parse_delta
 from pypots.imputation.locf import LOCF


-def torch_parse_delta(missing_mask):
-    """Generate time-gap (delta) matrix from missing masks. Please refer to :cite:`che2018GRUD` for its math definition.
-
-    Parameters
-    ----------
-    missing_mask : torch.tensor, shape of [n_steps, n_features] or [n_samples, n_steps, n_features]
-        Binary masks indicate missing values.
-
-    Returns
-    -------
-    delta, torch.tensor,
-        Delta matrix indicates time gaps of missing values.
-    """
-
-    def cal_delta_for_single_sample(mask):
-        """calculate single sample's delta. The sample's shape is [n_steps, n_features]."""
-        d = []
-        for step in range(n_steps):
-            if step == 0:
-                d.append(torch.zeros(1, n_features, device=device))
-            else:
-                d.append(
-                    torch.ones(1, n_features, device=device) + (1 - mask[step]) * d[-1]
-                )
-        d = torch.concat(d, dim=0)
-        return d
-
-    device = missing_mask.device
-    if len(missing_mask.shape) == 2:
-        n_steps, n_features = missing_mask.shape
-        delta = cal_delta_for_single_sample(missing_mask)
-    else:
-        n_samples, n_steps, n_features = missing_mask.shape
-        delta_collector = []
-        for m_mask in missing_mask:
-            delta = cal_delta_for_single_sample(m_mask)
-            delta_collector.append(delta.unsqueeze(0))
-        delta = torch.concat(delta_collector, dim=0)
-
-    return delta
-
-
-def numpy_parse_delta(missing_mask):
-    """Generate time-gap (delta) matrix from missing masks. Please refer to :cite:`che2018GRUD` for its math definition.
-
-    Parameters
-    ----------
-    missing_mask : np.ndarray, shape of [n_steps, n_features] or [n_samples, n_steps, n_features]
-        Binary masks indicate missing values.
-
-    Returns
-    -------
-    delta, np.ndarray,
-        Delta matrix indicates time gaps of missing values.
-    """
-
-    def cal_delta_for_single_sample(mask):
-        """calculate single sample's delta. The sample's shape is [n_steps, n_features]."""
-        d = []
-        for step in range(seq_len):
-            if step == 0:
-                d.append(np.zeros(n_features))
-            else:
-                d.append(np.ones(n_features) + (1 - mask[step]) * d[-1])
-        d = np.asarray(d)
-        return d
-
-    if len(missing_mask.shape) == 2:
-        seq_len, n_features = missing_mask.shape
-        delta = cal_delta_for_single_sample(missing_mask)
-    else:
-        n_samples, seq_len, n_features = missing_mask.shape
-        delta_collector = []
-        for m_mask in missing_mask:
-            delta = cal_delta_for_single_sample(m_mask)
-            delta_collector.append(delta)
-        delta = np.asarray(delta_collector)
-    return delta
-
-
 class DatasetForGRUD(BaseDataset):
-    """Dataset class for model GRUD.
+    """Dataset class for model GRU-D.

     Parameters
     ----------
@@ -113,7 +33,6 @@ class DatasetForGRUD(BaseDataset):

     def __init__(self, data, file_type="h5py"):
         super().__init__(data, file_type)
-
         self.locf = LOCF()

         if not isinstance(self.data, str):  # data from array
28 changes: 0 additions & 28 deletions pypots/data/integration.py

This file was deleted.

111 changes: 111 additions & 0 deletions pypots/data/utils.py
@@ -0,0 +1,111 @@
+"""
+Data utils.
+"""
+
+# Created by Wenjie Du <wenjay.du@gmail.com>
+# License: GLP-v3
+
+import numpy as np
+import torch
+
+
+import pycorruptor as corruptor
+from tsdb import (
+    pickle_load as _pickle_load,
+    pickle_dump as _pickle_dump,
+)
+
+pickle_load = _pickle_load
+pickle_dump = _pickle_dump
+
+
+def cal_missing_rate(X):
+    return corruptor.cal_missing_rate(X)
+
+
+def masked_fill(X, mask, val):
+    return corruptor.masked_fill(X, mask, val)
+
+
+def mcar(X, rate, nan=0):
+    return corruptor.mcar(X, rate, nan)
+
+
+def torch_parse_delta(missing_mask):
+    """Generate time-gap (delta) matrix from missing masks. Please refer to :cite:`che2018GRUD` for its math definition.
+
+    Parameters
+    ----------
+    missing_mask : torch.tensor, shape of [n_steps, n_features] or [n_samples, n_steps, n_features]
+        Binary masks indicate missing values.
+
+    Returns
+    -------
+    delta, torch.tensor,
+        Delta matrix indicates time gaps of missing values.
+    """
+
+    def cal_delta_for_single_sample(mask):
+        """calculate single sample's delta. The sample's shape is [n_steps, n_features]."""
+        d = []
+        for step in range(n_steps):
+            if step == 0:
+                d.append(torch.zeros(1, n_features, device=device))
+            else:
+                d.append(
+                    torch.ones(1, n_features, device=device) + (1 - mask[step]) * d[-1]
+                )
+        d = torch.concat(d, dim=0)
+        return d
+
+    device = missing_mask.device
+    if len(missing_mask.shape) == 2:
+        n_steps, n_features = missing_mask.shape
+        delta = cal_delta_for_single_sample(missing_mask)
+    else:
+        n_samples, n_steps, n_features = missing_mask.shape
+        delta_collector = []
+        for m_mask in missing_mask:
+            delta = cal_delta_for_single_sample(m_mask)
+            delta_collector.append(delta.unsqueeze(0))
+        delta = torch.concat(delta_collector, dim=0)
+
+    return delta
+
+
+def numpy_parse_delta(missing_mask):
+    """Generate time-gap (delta) matrix from missing masks. Please refer to :cite:`che2018GRUD` for its math definition.
+
+    Parameters
+    ----------
+    missing_mask : np.ndarray, shape of [n_steps, n_features] or [n_samples, n_steps, n_features]
+        Binary masks indicate missing values.
+
+    Returns
+    -------
+    delta, np.ndarray,
+        Delta matrix indicates time gaps of missing values.
+    """
+
+    def cal_delta_for_single_sample(mask):
+        """calculate single sample's delta. The sample's shape is [n_steps, n_features]."""
+        d = []
+        for step in range(seq_len):
+            if step == 0:
+                d.append(np.zeros(n_features))
+            else:
+                d.append(np.ones(n_features) + (1 - mask[step]) * d[-1])
+        d = np.asarray(d)
+        return d
+
+    if len(missing_mask.shape) == 2:
+        seq_len, n_features = missing_mask.shape
+        delta = cal_delta_for_single_sample(missing_mask)
+    else:
+        n_samples, seq_len, n_features = missing_mask.shape
+        delta_collector = []
+        for m_mask in missing_mask:
+            delta = cal_delta_for_single_sample(m_mask)
+            delta_collector.append(delta)
+        delta = np.asarray(delta_collector)
+    return delta
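
To make the recurrence in the two delta functions concrete, here is a small worked example. It is a standalone sketch, not the PyPOTS function itself: the helper name `parse_delta_single` and the toy mask are made up for illustration, and the code only mirrors the single-sample update shown in this diff (delta is 0 at the first step, then `1 + (1 - mask[step]) * delta[step - 1]`), assuming 1 marks an observed value and 0 a missing one.

```python
import numpy as np


def parse_delta_single(mask):
    """Apply the diff's single-sample recurrence to a [n_steps, n_features] mask."""
    n_steps, n_features = mask.shape
    delta = np.zeros((n_steps, n_features))
    for step in range(1, n_steps):
        # The gap resets to 1 where the current value is observed,
        # and keeps growing by 1 while values are missing.
        delta[step] = 1 + (1 - mask[step]) * delta[step - 1]
    return delta


# Toy mask with 5 steps and 2 features (1 = observed, 0 = missing).
mask = np.array(
    [
        [1, 1],
        [0, 1],
        [0, 0],
        [1, 0],
        [1, 0],
    ]
)
delta = parse_delta_single(mask)
print(delta[:, 0])  # feature 0: [0. 1. 2. 1. 1.]
print(delta[:, 1])  # feature 1: [0. 1. 2. 3. 4.]
```

For a batch shaped `[n_samples, n_steps, n_features]`, the functions in the diff apply this same per-sample computation to each sample and stack the results.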
8 changes: 6 additions & 2 deletions pypots/imputation/brits.py
@@ -1,7 +1,11 @@
 """
 PyTorch BRITS model for the time-series imputation task.
-Some part of the code is from https://github.com/caow13/BRITS.
+
+Notes
+-----
+Partial implementation uses code from https://github.com/caow13/BRITS.
 """
+
 # Created by Wenjie Du <wenjay.du@gmail.com>
 # License: GPL-v3

@@ -16,7 +20,7 @@
 from torch.utils.data import DataLoader

 from pypots.data.dataset_for_brits import DatasetForBRITS
-from pypots.data.integration import mcar, masked_fill
+from pypots.data.utils import mcar, masked_fill
 from pypots.imputation.base import BaseNNImputer
 from pypots.utils.metrics import cal_mae

7 changes: 5 additions & 2 deletions pypots/imputation/saits.py
@@ -1,6 +1,9 @@
 """
 PyTorch SAITS model for the time-series imputation task.
-Some part of the code is from https://github.com/WenjieDu/SAITS.
+
+Notes
+-----
+Partial implementation uses code from https://github.com/WenjieDu/SAITS.
 """

 # Created by Wenjie Du <wenjay.du@gmail.com>
@@ -14,7 +17,7 @@

 from pypots.data.base import BaseDataset
 from pypots.data.dataset_for_mit import DatasetForMIT
-from pypots.data.integration import mcar, masked_fill
+from pypots.data.utils import mcar, masked_fill
 from pypots.imputation.base import BaseNNImputer
 from pypots.imputation.transformer import EncoderLayer, PositionalEncoding
 from pypots.utils.metrics import cal_mae
7 changes: 5 additions & 2 deletions pypots/imputation/transformer.py
@@ -1,6 +1,9 @@
 """
 PyTorch Transformer model for the time-series imputation task.
-Some part of the code is from https://github.com/WenjieDu/SAITS.
+
+Notes
+-----
+Partial implementation uses code from https://github.com/WenjieDu/SAITS.
 """

 # Created by Wenjie Du <wenjay.du@gmail.com>
@@ -14,7 +17,7 @@

 from pypots.data.base import BaseDataset
 from pypots.data.dataset_for_mit import DatasetForMIT
-from pypots.data.integration import mcar, masked_fill
+from pypots.data.utils import mcar, masked_fill
 from pypots.imputation.base import BaseNNImputer
 from pypots.utils.metrics import cal_mae
