
Release 0.3.1 #92


Merged · 50 commits · Apr 29, 2025

Commits
4404755
Merge pull request #86 from bayesml/release-0.3.0
yuta-nakahara Mar 9, 2025
a0a2d0a
Merge pull request #88 from bayesml/main-hotfix
yuta-nakahara Mar 25, 2025
8f46ad2
Batch updating of p_lambda
yuta-nakahara Mar 26, 2025
a080495
Update _linearregression.py
yuta-nakahara Apr 2, 2025
0a8b3c1
Batch prediction in metatree
yuta-nakahara Apr 3, 2025
73d3de4
Transpose calc_pred_density
yuta-nakahara Apr 3, 2025
64a9883
add demo on unittest
1jonao Apr 4, 2025
8edffcb
Fix a bug about input shapes
yuta-nakahara Apr 6, 2025
6a7dc96
added test_metatree.py
1jonao Apr 7, 2025
23f55db
changed directory name from BayesML/unittest/ to BayesML/tests/
1jonao Apr 7, 2025
f2141cd
Update test_metatree.py
yuta-nakahara Apr 9, 2025
3be614e
Add categorical
yuta-nakahara Apr 9, 2025
904118b
add test for bernoulli metatree but the tests(categorical, bernoulli)…
1jonao Apr 9, 2025
c485f8b
Update test_metatree.py
1jonao Apr 9, 2025
1ce2232
Update calc_pred_density for all submodels
yuta-nakahara Apr 9, 2025
76268ff
Add poisson, exponential
yuta-nakahara Apr 9, 2025
9df4407
add test_metatree_normal_batch_pred
1jonao Apr 9, 2025
6dbeef0
Modify variable names
yuta-nakahara Apr 10, 2025
ff9dd6b
Update docstrings
yuta-nakahara Apr 16, 2025
94b5884
Update docstrings
yuta-nakahara Apr 16, 2025
c0d39ba
Remove tmp files
yuta-nakahara Apr 16, 2025
087c56c
Merge pull request #89 from bayesml/develop-batch_prediction
yuta-nakahara Apr 16, 2025
50af009
Update _linearregression.py
yuta-nakahara Apr 24, 2025
dfff4b3
Update _bernoulli.py
yuta-nakahara Apr 24, 2025
b6e3c5d
Update _categorical.py
yuta-nakahara Apr 24, 2025
a718962
Update _exponential.py
yuta-nakahara Apr 24, 2025
8ab7a75
Update _exponential.py
yuta-nakahara Apr 24, 2025
8faf8ac
Update _multivariatenormal.py
yuta-nakahara Apr 24, 2025
5cd744f
Update _normal.py
yuta-nakahara Apr 26, 2025
33ddc5f
Update _exponential.py
yuta-nakahara Apr 26, 2025
45a63b7
Update _normal.py
yuta-nakahara Apr 26, 2025
ab02073
Update _poisson.py
yuta-nakahara Apr 26, 2025
c26c164
Update _metatree.py
yuta-nakahara Apr 26, 2025
a55888e
Update docstrings
yuta-nakahara Apr 26, 2025
b82c4cb
Merge pull request #90 from bayesml/develop-fit_predict
yuta-nakahara Apr 26, 2025
ce79e99
Update README_jp
yuta-nakahara Apr 27, 2025
8e3cda1
Update README_jp.md
yuta-nakahara Apr 27, 2025
1c4ca74
Update README and e-mail
yuta-nakahara Apr 27, 2025
071175e
Update README.md
yuta-nakahara Apr 27, 2025
03fa108
Update README
yuta-nakahara Apr 27, 2025
7ab709e
Update index.rst
yuta-nakahara Apr 27, 2025
884f9a8
Update setup.py
yuta-nakahara Apr 27, 2025
c079d2d
Update docstring for update_posterior
yuta-nakahara Apr 27, 2025
0db79cb
Update _metatree.py
yuta-nakahara Apr 28, 2025
23f8013
Add paper info
yuta-nakahara Apr 28, 2025
1327517
Update metatree_prediction_interval.ipynb
yuta-nakahara Apr 29, 2025
4fe5d79
Update description of algorithms
yuta-nakahara Apr 29, 2025
ef4999d
Update developers.rst
yuta-nakahara Apr 29, 2025
377e719
Update web pages in docs
yuta-nakahara Apr 29, 2025
437e333
Merge pull request #91 from bayesml/develop-web_pages
yuta-nakahara Apr 29, 2025
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -1,6 +1,6 @@
<!--
Document Author
Yuta Nakahara <y.nakahara@waseda.jp>
Shota Saito <shota.s@gunma-u.ac.jp>
-->
# How to contribute
2 changes: 1 addition & 1 deletion CONTRIBUTING_jp.md
@@ -1,6 +1,6 @@
<!--
Document Author
Yuta Nakahara <y.nakahara@waseda.jp>
-->
# How to contribute

214 changes: 162 additions & 52 deletions README.md
@@ -1,6 +1,6 @@
<!--
Document Author
Yuta Nakahara <y.nakahara@waseda.jp>
Shota Saito <shota.s@gunma-u.ac.jp>
-->

@@ -14,29 +14,34 @@ Shota Saito <shota.s@gunma-u.ac.jp>

<img src="./doc/logos/BayesML_logo.png" width="600">

# Your First Library for Bayesian Machine Learning

BayesML contributes to society at large through promoting education, research, and application of machine learning based on Bayesian statistics and Bayesian decision theory.

## Characteristics

BayesML has the following characteristics.
* **Easy-to-use:**
  * You can use pre-defined Bayesian statistical models by simply importing them. You don't need to define models yourself as in PyMC or Stan.
* **Bayesian Decision Theoretic API:**
  * BayesML's API corresponds to the structure of decision-making based on Bayesian decision theory. Bayesian decision theory is a unified framework for handling various decision-making processes, such as parameter estimation and prediction of new data. Therefore, BayesML enables intuitive operations for a wider range of decision-making tasks than the fit-predict style API adopted in libraries like scikit-learn. Moreover, many of our models also implement fit-predict functions (see the sketch after this list).
* **Model Visualization Functions:**
  * All packages have methods to visualize the probabilistic data generative model, data generated from that model, and the posterior distribution learned from the data in two- or three-dimensional spaces. Thus, you can effectively understand the characteristics of probabilistic data generative models and algorithms through generating synthetic data and learning from them.
* **Fast Algorithms Using Conjugate Prior Distributions:**
  * Many of our learning algorithms adopt exact calculation methods or variational Bayesian methods that effectively use the conjugacy between probabilistic data generative models and prior distributions. Therefore, they are much faster than general-purpose MCMC methods and are also suitable for online learning. Some algorithms adopt MCMC methods, but these are MCMC methods specialized for each model, taking advantage of conjugacy.

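For example, models that provide the fit-predict interface can be used in a scikit-learn-like style. The following is a minimal sketch, assuming `fit` and `predict` wrap the decision-theoretic methods introduced in the tutorial below; the exact signatures and the expected input shape are our assumptions, not confirmed API.

```python
# A minimal sketch, assuming a scikit-learn-style fit-predict interface.
import numpy as np
from bayesml import linearregression

# Hypothetical data; the expected input shape is an assumption.
x = np.random.randn(100, 2)
y = x @ np.array([1.0, 1.0]) + np.random.randn(100) / np.sqrt(10)

model = linearregression.LearnModel(c_degree=2)
model.fit(x, y)           # assumed wrapper around update_posterior
y_hat = model.predict(x)  # assumed wrapper around calc_pred_dist + make_prediction
```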

For more details, see our [website](https://bayesml.github.io/BayesML/).

## News

* Our algorithm for the meta-tree model has been accepted to AISTATS 2025! For more details, please see the links below.
* [Paper](https://proceedings.mlr.press/v258/nakahara25a.html)
* [Code Example](https://bayesml.github.io/BayesML/examples/metatree_prediction_interval.html)

## Installation

Please use the following command to install BayesML.

```bash
pip install bayesml
```

@@ -48,76 +53,181 @@ The following are required.
* MatplotLib (>= 3.5)
* Scikit-learn (>= 1.1)

## Tutorial

Each model in BayesML has two classes. One is `GenModel`, which can be used to generate parameters from prior or posterior distributions and to generate data. The other is `LearnModel`, which can be used to estimate posterior distributions from data and to calculate predictive distributions. Each has an API that aligns with Bayesian decision theory. Let's look at how to use each, with the `linearregression` model as an example.

### Synthetic Data Generation with `GenModel`

First, let's import the library.

```python
import numpy as np
from bayesml import linearregression
```

Next, we create an instance of the probabilistic data generative model. As a constant of the model, we specify the dimension of the regression coefficients (including the constant term) as `c_degree=2`; as parameters, we specify the regression coefficients `theta_vec = np.array([1,1])` and the precision (inverse variance) of the noise term `tau = 10`.

```python
gen_model = linearregression.GenModel(
    c_degree = 2, # degree
    theta_vec = np.array([1,1]), # coefficients
    tau = 10, # noise precision
)
```

You can visualize the characteristics of the created model by the following method.

```python
gen_model.visualize_model()
```

>Output:
>theta_vec:
>[1. 1.]
>tau:
>10.0
>![png](./doc/images/README_LR1.png)

To generate a sample and save it to variables `x` and `y`, we use the following method:

```python
x,y = gen_model.gen_sample(sample_size=100)
```

Let's also generate test data for later use.

```python
x_test,y_test = gen_model.gen_sample(sample_size=100)
```

### Learning and Decision Making with `LearnModel`

Let's use `LearnModel` to learn a model from the data we just generated.

Of course, the data that can be used with `LearnModel` is not limited to data generated from `GenModel`. You can analyze various real-world data.

First, let's create an instance of the learning model. Here, we only specify the degree `c_degree = 2` as a constant of the model, but you can also specify hyperparameters for the prior distribution.

```python
learn_model = linearregression.LearnModel(
    c_degree = 2, # degree
)
```

A method for visualizing the posterior distribution of parameters is implemented in `LearnModel`. If you visualize the posterior distribution at this point, the prior distribution will be displayed since learning from data has not yet been performed.

```python
learn_model.visualize_posterior()
```

>Output:
>![png](./doc/images/README_LR2.png)

To update the posterior distribution through learning from data, we use the following method.

```python
learn_model.update_posterior(x,y)
```
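Because `LearnModel` keeps its posterior hyperparameters between calls, data can also be supplied in chunks. Below is a minimal online-learning sketch, assuming successive `update_posterior` calls accumulate evidence into the same posterior, in line with the online-learning emphasis in the Characteristics section.

```python
# Online-learning sketch (assumption: repeated update_posterior calls
# keep refining the same posterior rather than starting over).
online_model = linearregression.LearnModel(c_degree=2)
for i in range(0, 100, 10):
    online_model.update_posterior(x[i:i+10], y[i:i+10])  # 10 points at a time
```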

If you visualize the updated posterior distribution, you can see that the density of the posterior distribution has moved closer to the true parameters used to generate `x` and `y`.

```python
learn_model.visualize_posterior()
```

>![bernoulli_example3](./doc/images/README_ex_img3.png)
>Output:
>![png](./doc/images/README_LR3.png)

To make decisions such as parameter estimation and prediction of new data based on the learned model, we proceed as follows.

For parameter estimation, we use the `estimate_params` method. By specifying the `loss` option as `squared`, you can obtain an estimate that minimizes the Bayes risk function based on the squared error loss function. The resulting value is the expected value of the posterior distribution.

```python
learn_model.estimate_params(loss="squared",dict_out=True)
```

>Output:
>{'theta_vec': array([0.99846525, 0.96263024]), 'tau': 6.9036925167513195}

If you specify the `loss` option as `abs`, you can obtain an estimate that minimizes the Bayes risk function based on the absolute error loss function. The resulting value is the median of the posterior distribution, which is why the estimated value of `tau` differs from the previous one.

```python
learn_model.estimate_params(loss="abs",dict_out=True)
```

>Output:
>{'theta_vec': array([0.99846525, 0.96263024]), 'tau': 6.858623148933392}
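As a side check, the two estimates of `tau` above are consistent with a Gamma marginal posterior. The following sketch assumes the marginal posterior of `tau` is a Gamma distribution with shape `hn_alpha` and rate `hn_beta` (values taken from the `get_hn_params` output shown later in this tutorial) and uses SciPy to compute the median.

```python
# Consistency check (assumption: tau's marginal posterior is
# Gamma(shape=hn_alpha, rate=hn_beta)).
from scipy import stats

hn_alpha, hn_beta = 51.0, 7.387351026461872
print(hn_alpha / hn_beta)                                # ~6.9037: mean, matches the squared-error estimate
print(stats.gamma.ppf(0.5, a=hn_alpha, scale=1/hn_beta)) # ~6.8586: median, matches the absolute-error estimate
```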

To predict new data, we first use the following method to calculate the predictive distribution for new explanatory variables.

```python
learn_model.calc_pred_dist(x_test)
```

Next, we use the `make_prediction` method to obtain predicted values. As with parameter estimation, you can specify the loss function via the `loss` option. (In this example, the same predicted values are returned whether you assume the squared-error loss or the absolute-error loss, since the posterior predictive distribution is symmetric; see the check below.)

```python
y_pred = learn_model.make_prediction(loss="squared")
```
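Because of this symmetry, we can check that the two losses give identical predictions. A quick sketch, assuming `make_prediction` accepts `loss="abs"` just as `estimate_params` does:

```python
# Quick check (assumption: make_prediction accepts loss="abs").
# For a symmetric predictive distribution the mean and median coincide.
y_pred_abs = learn_model.make_prediction(loss="abs")
print(np.allclose(y_pred, y_pred_abs))  # expected: True
```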

Let's calculate the mean squared error.

```python
mse = np.sum((y_test - y_pred)**2) / len(y_test)
print(f"MSE: {mse}")
```

>Output:
>MSE: 0.09020880284291456

Given that the precision (inverse variance) of the noise term used for data generation was 10, i.e., a noise variance of 0.1, even a perfect predictor would incur a mean squared error of about 0.1; the value above shows that the predictions are sufficiently accurate.

### Sampling from Posterior Distribution Using `GenModel`

`GenModel` can also be used to sample parameters from the posterior distribution learned by `LearnModel`, or to sample new data from the posterior predictive distribution.

First, the hyperparameters of the posterior distribution learned by `LearnModel` can be obtained as follows.

```python
hn_params = learn_model.get_hn_params()
print(hn_params)
```

>Output:
>{'hn_mu_vec': array([0.99846525, 0.96263024]), 'hn_lambda_mat': array([[ 99.87503339, 5.96145913], [ 5.96145913, 101. ]]), 'hn_alpha': 51.0, 'hn_beta': 7.387351026461872}

By passing these to `GenModel`, you can sample parameters from the posterior distribution.

We create a new `GenModel` instance for parameter sampling and pass the hyperparameters through the `set_h_params` method. (In the example below, we unpack the values of the dictionary `hn_params` by applying `*` to `hn_params.values()`; this is a Python feature, not BayesML functionality.)

```python
posterior_gen_model = linearregression.GenModel(
    c_degree = 2, # degree
)
posterior_gen_model.set_h_params(*hn_params.values())
```

We use the `gen_params` method to generate parameters and the `get_params` method to retrieve the generated parameters. If you want to draw multiple samples, repeat the following in a `for` loop, as in the sketch below.

```python
posterior_gen_model.gen_params()
print(posterior_gen_model.get_params())
```

>Output:
>{'theta_vec': array([1.00935782, 0.93804208]), 'tau': 5.50775630793475}
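For example, the following sketch draws 1,000 samples (an arbitrary number chosen for illustration) and stacks them into arrays.

```python
# Draw 1,000 parameter samples from the posterior by repeating
# gen_params/get_params in a loop.
theta_samples, tau_samples = [], []
for _ in range(1000):
    posterior_gen_model.gen_params()
    params = posterior_gen_model.get_params()
    theta_samples.append(params['theta_vec'])
    tau_samples.append(params['tau'])
theta_samples = np.array(theta_samples)  # shape (1000, 2)
```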

To sample new data from the posterior predictive distribution, we generate data after sampling parameters. When we generated the synthetic data, we did not provide explanatory variables as arguments to `gen_sample` (see [here](#synthetic-data-generation-with-genmodel)), but you can also specify them explicitly as follows.

```python
posterior_gen_model.gen_params()
_,y_new = posterior_gen_model.gen_sample(x=x_test[:10])
print(f"y_new: {y_new}")
```

>Output:
>y_new: [-0.49532975 2.03473075 1.13758759 -0.46735058 -0.71902336 -0.09288005 0.89463227 2.07886012 2.81211771 1.60020635]
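Repeating this posterior-predictive sampling also yields simple Monte Carlo prediction intervals. The following is a sketch of our own (the percentile summary is plain NumPy, not a BayesML API); it mirrors the idea behind the prediction-interval example linked in the News section.

```python
# Monte Carlo prediction intervals: repeatedly sample parameters and new
# data, then summarize the draws with empirical percentiles.
draws = []
for _ in range(1000):
    posterior_gen_model.gen_params()
    _, y_draw = posterior_gen_model.gen_sample(x=x_test[:10])
    draws.append(y_draw)
draws = np.stack(draws)                               # shape (1000, 10)
lower, upper = np.percentile(draws, [5, 95], axis=0)  # 90% intervals
```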

## Package list

@@ -148,22 +258,22 @@ When you use BayesML for your academic work, please provide the following bibliographic information.

Plain text

```text
Y. Nakahara, N. Ichijo, K. Shimada, Y. Iikubo,
S. Saito, K. Kazama, T. Matsushima, BayesML Developers, ``BayesML,''
Python package version 0.3.1,
[Online] https://github.com/bayesml/BayesML
```

BibTeX

```bibtex
@misc{bayesml,
author = {Nakahara, Yuta and Ichijo, Naoki and Shimada, Koshi and
Iikubo, Yuji and Saito, Shota and Kazama, Koki and
Matsushima, Toshiyasu and {BayesML Developers}},
title = {{BayesML}},
howpublished = {Python package version 0.3.1},
note = {\url{https://github.com/bayesml/BayesML}},
year = {2025}
}
```