Develop metatree #54


Merged
merged 40 commits into from
Dec 24, 2022
7d294fb
Change _GenNode to _Node
yuta-nakahara Dec 12, 2022
a9942ec
Reduce arguments of _Node.__init__()
yuta-nakahara Dec 12, 2022
de99d95
Revise set_h_params
yuta-nakahara Dec 13, 2022
1249653
Modify gen_params
yuta-nakahara Dec 14, 2022
159b126
Bug fix
yuta-nakahara Dec 14, 2022
c06e23f
Remove _LearnNode
yuta-nakahara Dec 16, 2022
e7347ad
Revise set_params and estimate_params
yuta-nakahara Dec 16, 2022
99cdd2e
Revise set_h_params and make_prediction
yuta-nakahara Dec 16, 2022
41279d7
Pass self.rng to SubModel.GenModel
yuta-nakahara Dec 16, 2022
040474f
Create _metatree.py
yuta-nakahara Dec 16, 2022
0e0192a
Add pos_int_vec
yuta-nakahara Dec 16, 2022
691a0e7
Merge branch 'develop-check' into develop-metatree-continuous
yuta-nakahara Dec 16, 2022
e267f23
Add pos_ints
yuta-nakahara Dec 16, 2022
d6d1b17
Merge branch 'develop-check' into develop-metatree-continuous
yuta-nakahara Dec 16, 2022
09ebf4f
Revise set_h_params in GenModel
yuta-nakahara Dec 16, 2022
9a468a7
Merge branch 'develop-metatree' into develop-metatree-continuous
yuta-nakahara Dec 16, 2022
6f102fd
Revise set_h, h0, hn_params_recursion
yuta-nakahara Dec 16, 2022
fdc2878
Merge branch 'develop-metatree' into develop-metatree-continuous
yuta-nakahara Dec 16, 2022
8b4e35c
Add continuous features to GenModel
yuta-nakahara Dec 17, 2022
04cab6a
Revise visualize_model
yuta-nakahara Dec 17, 2022
dd81760
Modify __init__, get, set, visualize
yuta-nakahara Dec 18, 2022
7445e62
Faster copy of list and array
yuta-nakahara Dec 18, 2022
c2a9b00
Modify update_posterior
yuta-nakahara Dec 18, 2022
5a8458d
Modify _given_MT
yuta-nakahara Dec 18, 2022
c107fd3
Modify estimate_params
yuta-nakahara Dec 18, 2022
d7b2ea0
Modify calc_pred_dist
yuta-nakahara Dec 18, 2022
2bffa9f
Modify make_prediction
yuta-nakahara Dec 18, 2022
3b52aa3
Acceleration
yuta-nakahara Dec 18, 2022
d1583f3
Delete MAP prediction
yuta-nakahara Dec 18, 2022
42aa4d6
Merge branch 'develop' into develop-metatree-continuous
yuta-nakahara Dec 20, 2022
7f27f22
Refine sub_h0_params and sub_hn_params sharing
yuta-nakahara Dec 21, 2022
6be85f0
Refine description
yuta-nakahara Dec 21, 2022
6bd570c
Add reshaping of x
yuta-nakahara Dec 22, 2022
8574e98
Merge branch 'develop' into develop-metatree-continuous
yuta-nakahara Dec 22, 2022
f3da427
Support categorical and linearregression
yuta-nakahara Dec 23, 2022
80d7867
Revise documents
yuta-nakahara Dec 23, 2022
0b6d5eb
Delete _metatree_x_discrete.py
yuta-nakahara Dec 23, 2022
ae6c960
Merge pull request #53 from yuta-nakahara/develop-metatree-continuous
yuta-nakahara Dec 23, 2022
725ea62
Revise index search
yuta-nakahara Dec 24, 2022
3b4defb
Delete metatree_test.py
yuta-nakahara Dec 24, 2022
16 changes: 16 additions & 0 deletions bayesml/_check.py
@@ -47,6 +47,16 @@ def nonneg_ints(val,val_name,exception_class):
return val
raise(exception_class(val_name + " must be int or a numpy.ndarray whose dtype is int. Its values must be non-negative (including 0)."))

def pos_ints(val,val_name,exception_class):
try:
return pos_int(val,val_name,exception_class)
except:
pass
if type(val) is np.ndarray:
if np.issubdtype(val.dtype,np.integer) and np.all(val>0):
return val
raise(exception_class(val_name + " must be int or a numpy.ndarray whose dtype is int. Its values must be positive (not including 0)."))

def int_vec(val,val_name,exception_class):
if type(val) is np.ndarray:
if np.issubdtype(val.dtype,np.integer) and val.ndim == 1:
@@ -59,6 +69,12 @@ def nonneg_int_vec(val,val_name,exception_class):
return val
raise(exception_class(val_name + " must be a 1-dimensional numpy.ndarray whose dtype is int. Its values must be non-negative (including 0)."))

def pos_int_vec(val,val_name,exception_class):
if type(val) is np.ndarray:
if np.issubdtype(val.dtype,np.integer) and val.ndim == 1 and np.all(val>0):
return val
raise(exception_class(val_name + " must be a 1-dimensional numpy.ndarray whose dtype is int. Its values must be positive (not including 0)."))

def nonneg_int_vecs(val,val_name,exception_class):
if type(val) is np.ndarray:
if np.issubdtype(val.dtype,np.integer) and val.ndim >= 1 and np.all(val>=0):
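The two new validators follow the existing pattern in `_check.py`: return the value unchanged if it passes, otherwise raise the caller-supplied exception class. A minimal standalone sketch (re-implemented here for illustration, not imported from bayesml) behaves like this:

```python
import numpy as np

def pos_int_vec(val, val_name, exception_class):
    # Accept only a 1-D integer ndarray whose entries are all strictly positive.
    if type(val) is np.ndarray:
        if np.issubdtype(val.dtype, np.integer) and val.ndim == 1 and np.all(val > 0):
            return val
    raise exception_class(
        val_name + " must be a 1-dimensional numpy.ndarray whose dtype is int. "
        "Its values must be positive (not including 0).")

pos_int_vec(np.array([1, 2, 3]), 'x', ValueError)   # passes, returns the array
try:
    pos_int_vec(np.array([0, 2]), 'x', ValueError)  # 0 is rejected
except ValueError as e:
    print(e)
```

Note that `np.issubdtype(val.dtype, np.integer)` rejects float arrays even when their values happen to be whole numbers, which keeps the check strict.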
11 changes: 10 additions & 1 deletion bayesml/bernoulli/_bernoulli.py
@@ -290,6 +290,9 @@ def get_hn_params(self):
"""
return {"hn_alpha":self.hn_alpha, "hn_beta":self.hn_beta}

def _check_sample(self,x):
return _check.ints_of_01(x,'x',DataFormatError)

def update_posterior(self,x):
"""Update the hyperparameters of the posterior distribution using training data.

@@ -298,7 +301,7 @@ def update_posterior(self,x):
x : numpy.ndarray
All the elements must be 0 or 1.
"""
_check.ints_of_01(x,'x',DataFormatError)
x = self._check_sample(x)
self.hn_alpha += np.count_nonzero(x==1)
self.hn_beta += np.count_nonzero(x==0)
return self
@@ -424,6 +427,12 @@ def calc_pred_dist(self):
self.p_theta = self.hn_alpha / (self.hn_alpha + self.hn_beta)
return self

def _calc_pred_density(self,x):
if x:
return self.p_theta
else:
return 1.0-self.p_theta

def make_prediction(self,loss="squared"):
"""Predict a new data point under the given criterion.

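The Bernoulli refactor splits input checking (`_check_sample`) and pointwise density evaluation (`_calc_pred_density`) out of the public methods, so metatree can call them internally. A self-contained sketch of that flow under a Beta posterior (names and constructor are illustrative, not the actual bayesml API):

```python
import numpy as np

class BernoulliPredictive:
    """Sketch of the refactored flow: update_posterior -> calc_pred_dist -> _calc_pred_density."""
    def __init__(self, h0_alpha=0.5, h0_beta=0.5):
        self.hn_alpha = h0_alpha
        self.hn_beta = h0_beta
        self.p_theta = 0.5

    def update_posterior(self, x):
        # Beta posterior update: count ones and zeros, as in the diff.
        x = np.asarray(x)
        self.hn_alpha += np.count_nonzero(x == 1)
        self.hn_beta += np.count_nonzero(x == 0)
        return self

    def calc_pred_dist(self):
        # Posterior-predictive P(x=1) is the posterior mean of theta.
        self.p_theta = self.hn_alpha / (self.hn_alpha + self.hn_beta)
        return self

    def _calc_pred_density(self, x):
        return self.p_theta if x else 1.0 - self.p_theta

m = BernoulliPredictive(h0_alpha=1.0, h0_beta=1.0)
m.update_posterior([1, 1, 1, 0]).calc_pred_dist()
print(m._calc_pred_density(1))  # (1+3)/(2+4) = 0.666...
```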
36 changes: 24 additions & 12 deletions bayesml/categorical/_categorical.py
@@ -321,6 +321,22 @@ def get_hn_params(self):
"""
return {"hn_alpha_vec": self.hn_alpha_vec}

# default onehot option is False because it is used in metatree
def _check_sample(self,x,onehot=False):
if onehot:
_check.onehot_vecs(x,'x',DataFormatError)
if x.shape[-1] != self.c_degree:
raise(DataFormatError(f"x.shape[-1] must be c_degree:{self.c_degree}"))
return x.reshape(-1,self.c_degree)
else:
_check.nonneg_ints(x,'x',DataFormatError)
if np.max(x) >= self.c_degree:
raise(DataFormatError(
'np.max(x) must be smaller than self.c_degree: '
+f'np.max(x) = {np.max(x)}, self.c_degree = {self.c_degree}'
))
return x

def update_posterior(self,x,onehot=True):
"""Update the hyperparameters of the posterior distribution using training data.

@@ -336,22 +352,12 @@ def update_posterior(self,x,onehot=True):
If True, the input sample must be one-hot encoded,
by default True.
"""
x = self._check_sample(x,onehot)
if onehot:
_check.onehot_vecs(x,'x',DataFormatError)
if x.shape[-1] != self.c_degree:
raise(DataFormatError(f"x.shape[-1] must be c_degree:{self.c_degree}"))
x = x.reshape(-1,self.c_degree)
self.hn_alpha_vec[:] += x.sum(axis=0)
else:
_check.nonneg_ints(x,'x',DataFormatError)
if np.max(x) >= self.c_degree:
raise(DataFormatError(
'np.max(x) must be smaller than self.c_degree: '
+f'np.max(x) = {np.max(x)}, self.c_degree = {self.c_degree}'
))
for k in range(self.c_degree):
self.hn_alpha_vec[k] += np.count_nonzero(x==k)

return self

def _update_posterior(self,x):
@@ -396,7 +402,10 @@ def estimate_params(self, loss="squared",dict_out=False):
return (self.hn_alpha_vec - 1) / (np.sum(self.hn_alpha_vec) - self.c_degree)
else:
warnings.warn("MAP estimate of theta_vec doesn't exist for the current hn_alpha_vec.",ResultWarning)
return None
if dict_out:
return {'theta_vec':None}
else:
return None
elif loss == "KL":
return ss_dirichlet(alpha=self.hn_alpha_vec)
else:
@@ -476,6 +485,9 @@ def calc_pred_dist(self):
"""Calculate the parameters of the predictive distribution."""
self.p_theta_vec[:] = self.hn_alpha_vec / self.hn_alpha_vec.sum()
return self

def _calc_pred_density(self,x):
return self.p_theta_vec[x]

def make_prediction(self,loss="squared",onehot=True):
"""Predict a new data point under the given criterion.
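The categorical `_check_sample` defaults to `onehot=False` because metatree passes integer labels rather than one-hot rows, and `_calc_pred_density` then reduces to an index lookup into `p_theta_vec`. A standalone sketch mirroring that logic (function names are illustrative, not the bayesml API):

```python
import numpy as np

def check_categorical_sample(x, c_degree, onehot=False):
    # Mirrors the new _check_sample: the onehot=False path accepts integer
    # labels in {0, ..., c_degree-1}; the onehot=True path accepts one-hot rows.
    x = np.asarray(x)
    if onehot:
        if x.shape[-1] != c_degree:
            raise ValueError(f"x.shape[-1] must be c_degree:{c_degree}")
        return x.reshape(-1, c_degree)
    if np.max(x) >= c_degree:
        raise ValueError("np.max(x) must be smaller than c_degree")
    return x

labels = check_categorical_sample(np.array([0, 2, 1]), c_degree=3)
rows = check_categorical_sample(np.eye(3), c_degree=3, onehot=True)

# _calc_pred_density is just p_theta_vec[x]; here p_theta_vec is the
# normalized hn_alpha_vec, as in calc_pred_dist.
p_theta_vec = np.array([2.0, 1.0, 1.0]) / 4.0
print(p_theta_vec[labels])  # densities at labels 0, 2, 1
```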
8 changes: 7 additions & 1 deletion bayesml/exponential/_exponential.py
@@ -292,6 +292,9 @@ def get_hn_params(self):
"""
return {"hn_alpha":self.hn_alpha, "hn_beta":self.hn_beta}

def _check_sample(self,x):
return _check.pos_floats(x, 'x', DataFormatError)

def update_posterior(self,x):
"""Update the hyperparameters of the posterior distribution using training data.

@@ -300,7 +303,7 @@ def update_posterior(self,x):
x : numpy.ndarray
All the elements must be positive real numbers.
"""
_check.pos_floats(x, 'x', DataFormatError)
x = self._check_sample(x)
try:
self.hn_alpha += x.size
except:
@@ -420,6 +423,9 @@ def calc_pred_dist(self):
self.p_lambda = self.hn_beta
return self

def _calc_pred_density(self,x):
return ss_lomax.pdf(x,c=self.p_kappa,scale=self.p_lambda)

def make_prediction(self,loss="squared"):
"""Predict a new data point under the given criterion.

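The exponential model's new `_calc_pred_density` uses the Lomax distribution, which is the posterior predictive when a Gamma posterior over the rate is integrated out. A sketch checking `scipy.stats.lomax` against the closed form (the parameter values are arbitrary illustrations):

```python
import numpy as np
from scipy.stats import lomax

# With a Gamma(hn_alpha, hn_beta) posterior over the rate, the predictive is
# Lomax: p(x) = (kappa / lam) * (1 + x / lam) ** -(kappa + 1),
# where calc_pred_dist sets p_kappa = hn_alpha and p_lambda = hn_beta.
hn_alpha, hn_beta = 3.0, 2.0
p_kappa, p_lambda = hn_alpha, hn_beta

x = 1.5
density = lomax.pdf(x, c=p_kappa, scale=p_lambda)
closed_form = (p_kappa / p_lambda) * (1.0 + x / p_lambda) ** (-(p_kappa + 1.0))
print(np.isclose(density, closed_form))  # True
```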
43 changes: 31 additions & 12 deletions bayesml/linearregression/_linearregression.py
@@ -501,6 +501,24 @@ def get_hn_params(self):
"""
return {"hn_mu_vec":self.hn_mu_vec, "hn_lambda_mat":self.hn_lambda_mat, "hn_alpha":self.hn_alpha, "hn_beta":self.hn_beta}

def _check_sample_x(self,x):
_check.float_vecs(x,'x',DataFormatError)
if x.shape[-1] != self.c_degree:
raise(DataFormatError(f"x.shape[-1] must be c_degree:{self.c_degree}"))

def _check_sample_y(self,y):
_check.floats(y,'y',DataFormatError)

def _check_sample(self,x,y):
self._check_sample_x(x)
self._check_sample_y(y)
if type(y) is np.ndarray:
if x.shape[:-1] != y.shape:
raise(DataFormatError(f"x.shape[:-1] and y.shape must be the same."))
elif x.shape[:-1] != ():
raise(DataFormatError(f"If y is a scalar, x.shape[:-1] must be the empty tuple ()."))
return x.reshape(-1,self.c_degree), np.ravel(y)

def update_posterior(self, x, y):
"""Update the hyperparameters of the posterior distribution using training data.

@@ -512,18 +530,7 @@ def update_posterior(self, x, y):
y : numpy ndarray
float array.
"""
_check.float_vecs(x,'x',DataFormatError)
if x.shape[-1] != self.c_degree:
raise(DataFormatError(f"x.shape[-1] must be c_degree:{self.c_degree}"))
_check.floats(y,'y',DataFormatError)
if type(y) is np.ndarray:
if x.shape[:-1] != y.shape:
raise(DataFormatError(f"x.shape[:-1] and y.shape must be the same."))
elif x.shape[:-1] != ():
raise(DataFormatError(f"If y is a scalar, x.shape[:-1] must be the empty tuple ()."))

x = x.reshape(-1,self.c_degree)
y = np.ravel(y)
x,y = self._check_sample(x,y)

hn1_Lambda = np.array(self.hn_lambda_mat)
hn1_mu = np.array(self.hn_mu_vec)
@@ -536,6 +543,8 @@ def update_posterior(self, x, y):

def _update_posterior(self, x, y):
"""Update posterior without input check."""
x = x.reshape(-1,self.c_degree)
y = np.ravel(y)
hn1_Lambda = np.array(self.hn_lambda_mat)
hn1_mu = np.array(self.hn_mu_vec)
self.hn_lambda_mat += x.T @ x
@@ -703,6 +712,16 @@ def calc_pred_dist(self, x):
self.p_nu = 2.0 * self.hn_alpha
return self

def _calc_pred_dist(self, x):
"""Calculate predictive distribution without check."""
self.p_m = x @ self.hn_mu_vec
self.p_lambda = self.hn_alpha / self.hn_beta / (1.0 + x @ np.linalg.solve(self.hn_lambda_mat,x))
self.p_nu = 2.0 * self.hn_alpha
return self

def _calc_pred_density(self,y):
return ss_t.pdf(y,loc=self.p_m, scale=1.0/np.sqrt(self.p_lambda), df=self.p_nu)

def make_prediction(self,loss="squared"):
"""Predict a new data point under the given criterion.

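The linear-regression `_calc_pred_dist`/`_calc_pred_density` pair evaluates the Student-t posterior predictive for a single input vector. A standalone sketch of that computation (the function and its arguments are illustrative stand-ins for the class attributes in the diff):

```python
import numpy as np
from scipy.stats import t as ss_t

def calc_pred_density(x, y, hn_mu_vec, hn_lambda_mat, hn_alpha, hn_beta):
    # Predictive location, precision, and degrees of freedom, as in
    # _calc_pred_dist; the predictive of Bayesian linear regression with a
    # normal-gamma prior is Student's t.
    p_m = x @ hn_mu_vec
    p_lambda = hn_alpha / hn_beta / (1.0 + x @ np.linalg.solve(hn_lambda_mat, x))
    p_nu = 2.0 * hn_alpha
    return ss_t.pdf(y, df=p_nu, loc=p_m, scale=1.0 / np.sqrt(p_lambda))

x = np.array([1.0, 0.5])
density = calc_pred_density(
    x, y=0.3,
    hn_mu_vec=np.zeros(2), hn_lambda_mat=np.eye(2),
    hn_alpha=1.0, hn_beta=1.0)
print(density > 0.0)  # True: a proper density value
```

With `hn_mu_vec = 0` the predictive is centered at zero, so the density is symmetric in `y`.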
30 changes: 6 additions & 24 deletions bayesml/metatree/__init__.py
@@ -4,20 +4,19 @@
r"""
The stochastic data generative model is as follows:

* :math:`\mathcal{X}` : a space of an explanatory variable (a finite set)
* :math:`\boldsymbol{x}=[x_1, \ldots, x_d] \in \mathcal{X}^d` : an explanatory variable
* :math:`\boldsymbol{x}=[x_1, \ldots, x_p, x_{p+1}, \ldots , x_{p+q}]` : an explanatory variable. The first :math:`p` variables are continuous. The other :math:`q` variables are categorical.
* :math:`\mathcal{Y}` : a space of an objective variable
* :math:`y \in \mathcal{Y}` : an objective variable
* :math:`D_\mathrm{max} \in \mathbb{N}` : the maximum depth of trees
* :math:`T` : :math:`|\mathcal{X}|`-ary regular tree whose depth is smaller than or equal to :math:`D_\mathrm{max}`, where "regular" means that all inner nodes have :math:`k` child nodes.
* :math:`T` : a tree whose depth is smaller than or equal to :math:`D_\mathrm{max}`
* :math:`\mathcal{T}` : a set of :math:`T`
* :math:`s` : a node of a tree
* :math:`\mathcal{S}` : a set of :math:`s`
* :math:`\mathcal{I}(T)` : a set of inner nodes of :math:`T`
* :math:`\mathcal{L}(T)` : a set of leaf nodes of :math:`T`
* :math:`\boldsymbol{k}=(k_s)_{s \in \mathcal{S}}` : feature assign vector where :math:`k_s \in \{1,2,\ldots,d\}`
* :math:`\boldsymbol{k}=(k_s)_{s \in \mathcal{S}}` : feature assignment vector where :math:`k_s \in \{1, 2,\ldots,p+q\}`. If :math:`k_s \leq p`, the node :math:`s` has a threshold.
* :math:`\boldsymbol{\theta}=(\theta_s)_{s \in \mathcal{S}}` : a set of parameter
* :math:`s(\boldsymbol{x}) \in \mathcal{L}(T)` : a leaf node of :math:`T` corresponding to :math:`\boldsymbol{x}`
* :math:`s(\boldsymbol{x}) \in \mathcal{L}(T)` : a leaf node of :math:`T` corresponding to :math:`\boldsymbol{x}`, which is determined according to :math:`\boldsymbol{k}` and the thresholds.

.. math::
p(y | \boldsymbol{x}, \boldsymbol{\theta}, T, \boldsymbol{k})=p(y | \theta_{s(\boldsymbol{x})})
@@ -103,29 +102,12 @@
\qquad + g_{n,s} \mathbb{E}_{\tilde{q}_{s_{\mathrm{child}}}(y_{n+1} | \boldsymbol{x}_{n+1}, \boldsymbol{x}^n, y^n, M_{T_b, \boldsymbol{k}_b})} [Y_{n+1} | \boldsymbol{x}_{n+1}, \boldsymbol{x}^n, y^n, \boldsymbol{k}_b] ,& ({\rm otherwise}).
\end{cases}
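The recursion for :math:`\tilde{q}_s` above mixes each node's own model with its child on the path determined by :math:`\boldsymbol{x}`, weighted by the posterior split probability :math:`g_{n,s}`. A minimal sketch of that recursion with hypothetical `Node` objects (not the actual `_metatree` API):

```python
class Node:
    def __init__(self, g, leaf_density, child=None):
        self.g = g                        # posterior split probability g_{n,s}
        self.leaf_density = leaf_density  # q_s(y | x, ...) at this node
        self.child = child                # child on the path s(x); None for a leaf

def tilde_q(node, y):
    # Leaf of the meta-tree: just the node's own density.
    if node.child is None:
        return node.leaf_density(y)
    # Inner node: mix own density with the child's recursive value.
    return ((1.0 - node.g) * node.leaf_density(y)
            + node.g * tilde_q(node.child, y))

# Two-level path: the root mixes its own model with its child's.
leaf = Node(g=0.0, leaf_density=lambda y: 0.8 if y == 1 else 0.2)
root = Node(g=0.5, leaf_density=lambda y: 0.5, child=leaf)
print(tilde_q(root, 1))  # 0.5*0.5 + 0.5*0.8 = 0.65
```

The expectation and mode recursions in the surrounding docstring have the same shape, with the leaf density replaced by the leaf expectation or maximum.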

The maximum value of the predictive distribution can be calculated as follows.

.. math::
\max_{y_{n+1}} p(y_{n+1}| \boldsymbol{x}_{n+1}, \boldsymbol{x}^n, y^n) = \max_{b = 1, \dots , B} \left\{ p(\boldsymbol{k}_b | \boldsymbol{x}^n, y^n) \max_{y_{n+1}} \tilde{q}_{s_{\lambda}}(y_{n+1}|\boldsymbol{x}_{n+1},\boldsymbol{x}^n, y^n, M_{T_b, \boldsymbol{k}_b}) \right\},

where the maximum value of :math:`\tilde{q}` is recursively given as follows.

.. math::
&\max_{y_{n+1}} \tilde{q}_s(y_{n+1} | \boldsymbol{x}_{n+1}, \boldsymbol{x}^n, y^n, M_{T_b, \boldsymbol{k}_b}) \\
&= \begin{cases}
\max_{y_{n+1}} q_s(y_{n+1} | \boldsymbol{x}_{n+1}, \boldsymbol{x}^n, y^n, \boldsymbol{k}_b),& (s \ {\rm is \ the \ leaf \ node \ of} \ M_{T_b, \boldsymbol{k}_b}),\\
\max \{ (1-g_{n,s}) \max_{y_{n+1}} q_s(y_{n+1} | \boldsymbol{x}_{n+1}, \boldsymbol{x}^n, y^n, \boldsymbol{k}_b), \\
\qquad \qquad g_{n,s} \max_{y_{n+1}} \tilde{q}_{s_{\mathrm{child}}}(y_{n+1} | \boldsymbol{x}_{n+1}, \boldsymbol{x}^n, y^n, M_{T_b, \boldsymbol{k}_b}) \} ,& ({\rm otherwise}).
\end{cases}

The mode of the predictive distribution can also be calculated by using the above equation.

References

* Dobashi, N.; Saito, S.; Nakahara, Y.; Matsushima, T. Meta-Tree Random Forest: Probabilistic Data-Generative Model and Bayes Optimal Prediction. *Entropy* 2021, 23, 768. https://doi.org/10.3390/e23060768
* Nakahara, Y.; Saito, S.; Kamatsuka, A.; Matsushima, T. Probability Distribution on Full Rooted Trees. *Entropy* 2022, 24, 328. https://doi.org/10.3390/e24030328
"""
from ._metatree_x_discrete import GenModel
from ._metatree_x_discrete import LearnModel
from ._metatree import GenModel
from ._metatree import LearnModel

__all__ = ["GenModel", "LearnModel"]