
Conversation

vpratz
Collaborator

@vpratz vpratz commented Aug 5, 2025

This PR adds a pairs_quantity plotting function that provides a pairs plot of an arbitrary quantity.

For a trained BasicWorkflow, one could use it like this:

def identity(x, *args, **kwargs):
    return x

test_data = simulator.sample(500)
samples = workflow.sample(num_samples=10, conditions=test_data)
values = bf.diagnostics.metrics.posterior_contraction(samples, test_data, aggregation=identity)
fig = bf.diagnostics.pairs_quantity(values["values"], test_data)

The current implementation only supports structures as returned by the values field of posterior_contraction, but it should be generalizable to other settings. A similar non-paired plot (containing only the diagonal) could be nice as well.

Below is an example of the plot:

[image: example pairs_quantity plot]

@vpratz vpratz requested a review from paul-buerkner August 5, 2025 17:10
Provides the same functionality as the diagonal of `pairs_quantity`.

codecov bot commented Aug 5, 2025

Codecov Report

❌ Patch coverage is 96.68874% with 5 lines in your changes missing coverage. Please review.

Files with missing lines                              Patch %   Lines
bayesflow/diagnostics/plots/plot_quantity.py          92.06%    5 Missing ⚠️

Files with missing lines                              Coverage Δ
bayesflow/diagnostics/__init__.py                     100.00% <ø> (ø)
...sflow/diagnostics/metrics/posterior_contraction.py 100.00% <100.00%> (ø)
bayesflow/diagnostics/plots/__init__.py               100.00% <100.00%> (ø)
bayesflow/diagnostics/plots/calibration_ecdf.py       72.72% <100.00%> (-4.89%) ⬇️
bayesflow/diagnostics/plots/pairs_quantity.py         100.00% <100.00%> (ø)
bayesflow/utils/dict_utils.py                         69.81% <100.00%> (+6.30%) ⬆️
bayesflow/diagnostics/plots/plot_quantity.py          92.06% <92.06%> (ø)

@paul-buerkner
Contributor

Thank you! That looks good to me as a basic implementation. The documentation doesn't mention the required dimensionality of values. Could this be clarified?

@elseml
Member

elseml commented Aug 6, 2025

Nice! I'd add a "Contraction" label to the colorbar and the y-axis of the diagonal, both for consistency with other plots and to make the equivalence between the colorbar and the right-hand side y-axes clear. The cleanest solution would imo be to place the labels at the colorbar and at the right side of only the last diagonal element's y-axis, since this prevents cluttering the main plot.

@paul-buerkner
Contributor

We cannot provide a default metric name if the user decides, via the values argument, which metric is shown. But of course, we could add convenience functionality allowing values to also be a string naming a built-in supported metric, for example "contraction" or "z-score".

@vpratz
Collaborator Author

vpratz commented Aug 6, 2025

@elseml Thanks for the feedback! The colorbar can already be labeled by providing the label parameter, but I like the idea with the last diagonal element and will add it there as well.
@paul-buerkner Exactly. I think the main remaining task is to decide what values can be, to implement the different options, and to document them properly. Currently, I think the most natural options would be:

  • a numpy array (optionally a VariableArray from dict_utils)
  • The output dict from a metric function, which has to be called without aggregating
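A minimal sketch of these two input forms (the array shape follows the posterior_contraction example above; the dict keys are hypothetical, for illustration only):

```python
import numpy as np

# Option 1: "values" as a plain numpy array, one row per dataset and
# one column per variable (the shape produced by posterior_contraction
# with an identity aggregation).
num_datasets, num_variables = 500, 2
rng = np.random.default_rng(1)
values = rng.uniform(size=(num_datasets, num_variables))
# fig = bf.diagnostics.pairs_quantity(values, test_data)

# Option 2: "values" as the full output dict of a metric function
# called without aggregating (hypothetical structure):
metric_output = {"values": values, "metric_name": "Posterior Contraction"}
# fig = bf.diagnostics.pairs_quantity(metric_output, test_data)
```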

We could think about renaming values to quantity and allowing a function or function name to be passed there as well. But then we would also need the parameter estimates, and it might be confusing to have both.
What do you think?

@paul-buerkner
Contributor

paul-buerkner commented Aug 6, 2025

I have thought about the same question of whether we should allow functions too. If we aim to also support test_quantities similar to the SBC diagnostics, we would have to provide estimates anyway. If quantity (or however we call it) is an array or dict of arrays, ignore estimates and test_quantities. If quantity is a function, then require estimates and support test_quantities. We can of course choose to support test_quantities only later, since it is not high priority. However, we should think about it already now, because quantity and test_quantities will have clashing terminology. Perhaps values is not too bad after all, even if it can optionally be a function.

@vpratz vpratz self-assigned this Aug 10, 2025
@vpratz
Collaborator Author

vpratz commented Aug 10, 2025

@paul-buerkner Thanks a lot for the input. The many different options make it a bit convoluted, but I have now implemented the following:

  • values as array
  • values as dict from a metric function with aggregation=None
  • values as a function that has to return one of the above

I have also included the test_quantities; here, too, we have two cases:

  1. the quantity is scalar, so we do not require estimates, as we do not have to calculate it per variable; the test quantities only provide additional axes
  2. the quantity is a vector with one entry per variable; here we require estimates and a function to calculate the values

Here we could skip 1. to simplify the code; this would then correspond to the description in your comment, but I think having the more general implementation does not hurt. I have created helper functions to avoid code duplication; I'm not sure yet where we want to put them.

Let me know what you think.

Example
fig = bf.diagnostics.pairs_quantity(
    values=bf.diagnostics.posterior_contraction,
    targets=test_data,
    estimates=samples,
    test_quantities=test_quantities,
    variable_names=[r"$\theta_1$", r"$\theta_2$"],
)
[image: example pairs_quantity plot with test quantities]

@paul-buerkner
Contributor

This looks very cool!

Two quick questions:

  • Is aggregation = None equal to aggregation = identity (with a user-defined identity function)? I assume we would definitely want the None feature so users don't need to define identity first.
  • I don't quite get the different options concerning test_quantities. Why do we have to distinguish the two cases here but not(?) in the SBC plots?

@vpratz
Collaborator Author

vpratz commented Aug 12, 2025

Thanks for taking a look. Regarding your questions:

  • Yes, aggregation=None should behave like an identity, i.e., not aggregate over datasets but instead return the individual values, as we require them for plotting. If I read it correctly, for now this only makes sense for posterior_contraction. It would be viable for z-scores as well, for example, but as far as I can tell we have not implemented that as a metric yet.
  • I just discussed this with @han-ol, and it probably makes sense to remove option 1. The idea was basically: if we have a value that is a property of the dataset, e.g. its log-prob, we do not need to recalculate it even if test_quantities is present, which might be nice if the quantity is costly or cannot easily be recalculated. test_quantities would then just supply new axes to plot against. This can become muddy quite quickly, though (imagine the user wants to calculate the mean posterior contraction over all variables, which would be a scalar value that should change), and it is an edge case that probably no one will ever use, so for the sake of maintainability and clarity I would remove it. Do you agree?
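A small numpy illustration of the intended aggregation behavior, using the common definition contraction = 1 - posterior variance / prior variance (a sketch, not the library implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
num_datasets, num_variables = 500, 2

# toy per-dataset, per-variable variances
prior_var = rng.uniform(1.0, 2.0, size=(num_datasets, num_variables))
post_var = prior_var * rng.uniform(0.1, 0.9, size=(num_datasets, num_variables))

# posterior contraction per dataset and variable
contraction = 1.0 - post_var / prior_var

# aggregation=None (or identity): keep per-dataset values for plotting
per_dataset = contraction                      # shape (500, 2)

# with an aggregation such as np.median: one summary value per variable
aggregated = np.median(contraction, axis=0)    # shape (2,)
```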

@paul-buerkner
Contributor

  • Okay. I think we should add a z-score metric too but of course this will be a different issue.
  • Agreed.

@vpratz
Collaborator Author

vpratz commented Aug 14, 2025

Thanks a lot for your input. I think the functions are now ready for a final review, @paul-buerkner.
I added a few tests and updated a few details to incorporate the changes discussed above.

@vpratz vpratz changed the title [WIP] Add pairs plot for arbitrary quantities Add pairs plot for arbitrary quantities Aug 18, 2025
Contributor

@paul-buerkner paul-buerkner left a comment


Looks good to me, thank you! Just found a typo in the doc, I think. After fixing it, feel free to merge :-)

@vpratz vpratz merged commit c1407df into dev Aug 19, 2025
9 checks passed
@vpratz vpratz deleted the feat-pairs-plot branch August 19, 2025 09:16
@stefanradev93 stefanradev93 mentioned this pull request Aug 26, 2025
stefanradev93 added a commit that referenced this pull request Aug 27, 2025
* fix trainable parameters in distributions (#520)

* Improve numerical precision in MVNScore.log_prob

* add log_gamma diagnostic (#522)

* add log_gamma diagnostic

* add missing export for log_gamma

* add missing export for gamma_null_distribution, gamma_discrepancy

* fix broken unit tests

* rename log_gamma module to sbc

* add test_log_gamma unit test

* add return information to log_gamma doc string

* fix typo in docstring, use fixed-length np array to collect log_gammas instead of appending to an empty list

* Breaking changes: Fix bugs regarding counts in standardization layer (#525)

* standardization: add test for multi-input values (failing)

This test reveals two bugs in the standardization layer:

- count is updated multiple times
- batch_count is too small, as the sizes from reduce_axes have to be
  multiplied

* breaking: fix bugs regarding count in standardization layer

Fixes #524

This fixes the two bugs described in c4cc133:

- count was accidentally updated, leading to wrong values
- count was calculated wrongly, as only the batch size was used. Correct
  is the product of all reduce dimensions. This led to wrong standard
  deviations

While the batch dimension is the same for all inputs, the size of the
second dimension might vary. For this reason, we need to introduce an
input-specific `count` variable. This breaks serialization.
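The count bug described above can be isolated in a small numpy sketch (toy code, not the actual layer):

```python
import numpy as np

# A batch of shape (batch, set_size, features), standardized over the
# first two axes: each feature's statistics are computed from
# batch * set_size values, so the running count must grow by that
# product, not by the batch size alone.
x = np.arange(24, dtype=float).reshape(2, 3, 4)
reduce_axes = (0, 1)

correct_count = int(np.prod([x.shape[a] for a in reduce_axes]))  # 2 * 3 = 6
buggy_count = x.shape[0]                                         # only 2

# each of the 4 feature means averages 6 values, matching correct_count
mean = x.mean(axis=reduce_axes)
```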

* fix assert statement in test

* rename log_gamma to calibration_log_gamma (#527)

* simple fix

* Hotfix: numercial stability of non-log-stabilized sinkhorn plan (#531)

* fix numerical stability issues in sinkhorn plan

* improve test suite

* fix ultra-strict convergence criterion in log_sinkhorn_plan

* update dependencies

* add comment about convergence check

* update docsting to reflect fixes

* sinkhorn_plan now returns a transport plan with uniform marginal distributions

* add unit test for sinkhorn_plan

* fix sinkhorn function by sampling from the logits of the transpose of the plan, instead of the plan directly

* sinkhorn(x1, x2) now samples from log(plan) to receive assignments such that x2[assignments] matches x1

* re-enable test_assignment_is_optimal() for method='sinkhorn'

* log_sinkhorn now correctly uses log_plan instead of keras.ops.exp(log_plan), log_sinkhorn_plan returns logits of the transport plan

* add unit tests for log_sinkhorn_plan

* fix faulty indexing with tensor for tensorflow backend

* re-add numItermax for ot pot test
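For reference, the log-stabilized Sinkhorn normalization at the heart of these fixes can be sketched in plain numpy (an illustration of the technique with made-up names, not the BayesFlow code):

```python
import numpy as np

def log_sinkhorn_plan(cost, n_iter=50):
    """Sinkhorn iterations in log space: alternately normalize rows and
    columns of log_plan via logsumexp, avoiding the under/overflow that
    occurs when multiplying small exponentials directly."""
    def logsumexp(a, axis):
        m = a.max(axis=axis, keepdims=True)
        return m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))

    log_plan = -cost
    for _ in range(n_iter):
        log_plan = log_plan - logsumexp(log_plan, axis=1)  # rows sum to 1
        log_plan = log_plan - logsumexp(log_plan, axis=0)  # cols sum to 1
    return log_plan

cost = np.abs(np.arange(4)[:, None] - np.arange(4)[None, :]).astype(float)
plan = np.exp(log_sinkhorn_plan(cost))
# at convergence, both marginals are (approximately) uniform
```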

---------

Co-authored-by: Daniel Habermann <133031176+daniel-habermann@users.noreply.github.com>

* isinstance sequence

* Pass correct training stage in compute_metrics (#534)

* Pass correct training stage in CouplingFlow.compute_metrics

* Pass correct training stage in CIF and PointInferenceNetwork

* Custom test quantity support for calibration_ecdf (#528)

* Custom test quantity support for calibration_ecdf

* rename variable [no ci]

* Consistent defaults for variable_keys/names in calibration_ecdf with test quantiles

* Tests for calibration_ecdf with test_quantities

* Remove redundant and simplify comments

* Fix docstrings and typehints

---------

Co-authored-by: stefanradev93 <stefan.radev93@gmail.com>

* Log gamma test fix (#535)

* fix test_calibration_log_gamma_end_to_end unit test failing too often than expected

* set alpha to 0.1% in binom.ppf

* fix typo in comment

* Stateless adapters (#536)

* Remove stateful adapter features

* Fix tests

* Fix typo

* Remove nnpe from adapter

* Bring back notes [skip ci]

* Remove unncessary restriction to kwargs only [skip ci]

* Remove old super call [skip ci]

* Robustify type [skip ci]

* remove standardize from multimodal sim notebook [no ci]

* add draft module docstring to augmentations module [no ci]

Feel free to modify.

* adapt and run neurocognitive modeling notebook [no ci]

* adapt cCM playground notebook [no ci]

* adapt signature of Adapter.standardize

* add parameters missed in previous commit

* Minor NNPE polishing

* remove stage in docstring from OnlineDataset

---------

Co-authored-by: Lasse Elsemüller <60779710+elseml@users.noreply.github.com>
Co-authored-by: Valentin Pratz <git@valentinpratz.de>

* Fix training strategies in BasicWorkflow

* move multimodal data notebook to regular examples [no ci]

* make pip install call on homepage more verbose [no ci]

* remove deprecated summaries function

The function was renamed to summarize in v2.0.4.

* detail subsampling behavior docs for SIR simulator [no ci]

fixes #518

* move DiffusionModel from experimental to networks

Stabilizes the DiffusionModel class. A deprecation warning for the
DiffusionModel class in the experimental module was added.

* Add citation for resnet (#537) [no ci]

* added citation for resnet

* minor formatting

---------

Co-authored-by: Valentin Pratz <git@valentinpratz.de>

* Bump up version [skip ci]

* Allow separate inputs to subnets for continuous models (#521)

Introduces easy access to the different inputs x, t and conditions, to allow for specialized processing of each input, which can be beneficial for more advanced use cases.

---------

Co-authored-by: Valentin Pratz <git@valentinpratz.de>

* Auto-select backend (#543)

* add automatic backend detection and selection

* Fix typo

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Add priority ordering of backends

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: stefanradev93 <stefan.radev93@gmail.com>

* Breaking: parameterize MVNormalScore by inverse cholesky factor to improve stability (#545)

* breaking: parameterize MVNormalScore by inverse cholesky factor

The log_prob can be completely calculated using the inverse cholesky
factor L^{-1}. Using this also stabilizes the initial loss, and speeds
up computation.

This commit also contains two optimizations.
Moving the computation of the precision matrix into the einsum, and
using the sum of the logs instead of the log of a product.

As the parameterization changes, this is a breaking change.
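The identity behind this parameterization, sketched for reference (standard multivariate-normal algebra, with A = L^{-1} denoting the inverse Cholesky factor):

```latex
% With \Sigma = L L^\top and A = L^{-1} (lower triangular), the
% precision matrix is \Sigma^{-1} = A^\top A, and
% \log|\Sigma| = 2\sum_i \log L_{ii} = -2\sum_i \log A_{ii}, so
\log p(x) = -\tfrac{d}{2}\log(2\pi) + \sum_{i=1}^{d} \log A_{ii}
            - \tfrac{1}{2}\,\lVert A\,(x - \mu) \rVert^2
```

This shows why no matrix inversion is needed at evaluation time and why the log-determinant becomes a sum of logs of the diagonal entries.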

* Add right_side_scale_inverse and test [no ci]

The transformation necessary to undo standardization for a Cholesky
factor of the precision matrix is x_ij = x_ij' / sigma_j, which is now
implemented by a right_side_scale_inverse transformation_type.

* Stop skipping MVN tests

* Remove stray keyword argument in fill_triangular_matrix

* Rename cov_chol_inv to precision_chol and update docstrings [no ci]

* rename precision_chol to precision_cholesky_factor

to improve clarity.

* rename cov_chol to covariance_cholesky_factor

* remove check_approximator_multivariate_normal_score function [no ci]

---------

Co-authored-by: han-ol <g@hans.olischlaeger.com>

* fix unconditional sampling in ContinuousApproximator (#548)

- batch shape was calculated from inference_conditions even if they are
  known to be None
- add approximator test for unconditional setting

* Test quantities Linear Regression Starter notebook (#544)

* Implementation of log-lik test quantity for SBC in starter notebook

* update data-dependent test-quantities example

* Small typo fixes in linear regression notebook

---------

Co-authored-by: Paul-Christian Bürkner <paul.buerkner@gmail.com>

* fix: optimizer was not used in workflow with multiple fits

For the optimizer to be used, the approximator.compile function has to
be called. This was not the case. I adapted the `setup_optimizer`
function to match the description in its docstring, and made the
compilation conditional on its output. The output indicates if a new
optimizer was configured.

* fix: remove extra deserialize call for SummaryNetwork

The extra call leads to the DTypePolicy to be deserialized. This is then
passed as a class, and cannot be handled by autoconf, leading to the
error discussed in
#549

* Compatibility: deserialize when get_config was overridden

* unify log_prob signature in PointApproximator [no ci]

ContinuousApproximator and BasicWorkflow allow passing the data
positionally, we can allow the same for the PointApproximator.

* Tutorial on spatial data with Gaussian Random Fields (#540) [no ci]

The tutorial uses the experimental ResNet class to build a summary
network for spatial data.

* Support non-array data in test_quantity calibration ecdf [no ci]

Simulator outputs are allowed to be of type int or float, and
consequently have no batch dimension. This needs to be considered
in the broadcasting of inference_conditions for data based SBC
test quantities.

"examples/Linear_Regression_Starter.ipynb" contains an example where this is
necessary, where N is a non-batchable integer.

* import calibration_log_gamma in diagnostics namespace [no ci]

* Add wrapper around scipy.integrate.solve_ivp for integration

* minor fixes and improvements to the pairs plot functions

- pass target color to legend
- do not use common norm, so that prior stays visible in kdeplots
- do not share y on the diagonal, so that all marginal distributions
  stay visible, even if one is very peaked

* fix: layers were not deserialized for Sequential and Residual

As layers were passed with the `*layers` syntax, they could not be
passed as keyword arguments. In `from_config`, however, this was
attempted, leading to the layers to be ignored during reserialization.
This commit fixes this by taking the layers from `kwargs` if they are
passed as a keyword argument.

* add serialization tests for Sequential and Residual

* Fix: ensure checkpoint filepath exists before training

Previously, choosing a non-existent directory as checkpoint_filepath
would lead to silently not saving at all.

* Revert 954c16c since it was unnecessary

The alleged issue didn't exist, and checkpoint folders are created by
the keras callback automatically already.

I misread tests on this and didn't catch that the problem I was seeing
was caused by a different part of my pipeline.

* improvements to diagnostic plots (#556)

* improvements to diagnostics plots

add markersize parameter, add tests, support dataset_id for
pairs_samples

Fixes #554.

* simplify test_calibration_ecdf_from_quantiles

* Add pairs plot for arbitrary quantities (#550)

Add pairs_quantity and plot_quantity functions that allow plotting of quantities that can be calculated for each individual dataset. Currently, for the provided metrics this is only useful for posterior contraction, but could be useful for posterior z-score and other quantities as well.

* minor fix in diffusion edm schedule (#560)

* minor fix in diffusion edm schedule

* DeepSet: Adapt output dimension of invariant module inside the equivariant module (#557) (#561)

* adapt output dim of invariant module in equivariant module

See #557. The DeepSet showed bad performance and was not able to learn
diverse summary statistics. Reducing the dimension of the output of the
invariant module inside the equivariant module improves this, probably
because the individual information of each set member gains importance
compared to the shared information provided by the invariant module.

There might be better settings for this, so we might update the default
later on. However, this is already an improvement over the previous
setting.

* DeepSet: adapt docstring to reflect code

* pairs_posterior: inconsistent type hint fix (#562)

* allow exploding variance type in EDM schedule

* fix type hint

* Bump up version [skip ci]

* Fix instructions for backend spec [skip ci]

* Add New Flow Matching Schedules (#565)

* add fm schedule

* add fm schedule

* add comments

* expose time_power_law_alpha

* Improve doc [skip ci]

---------

Co-authored-by: stefanradev93 <stefan.radev93@gmail.com>

* change default integration method to rk45

for DiffusionModel and FlowMatching. Euler shows significant deviations
when computing the log-prob, which risks misleading users regarding the
performance of the networks.

rk45 is slower, but the problem is heavily reduced with this method.
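The size of the effect can be illustrated with a toy ODE, using fixed-step Euler versus classic RK4 as a stand-in for the adaptive RK45 method (an illustration only, not BayesFlow code):

```python
import numpy as np

# Integrate dy/dt = y on [0, 1]; the exact result is e ≈ 2.71828.
# With the same number of steps, Euler drifts visibly while the
# fourth-order scheme is accurate to about 2e-6.
def f(t, y):
    return y

h, steps = 0.1, 10

euler = 1.0
for k in range(steps):
    euler += h * f(k * h, euler)

rk4 = 1.0
for k in range(steps):
    t = k * h
    k1 = f(t, rk4)
    k2 = f(t + h / 2, rk4 + h / 2 * k1)
    k3 = f(t + h / 2, rk4 + h / 2 * k2)
    k4 = f(t + h, rk4 + h * k3)
    rk4 += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

exact = np.e  # Euler error ≈ 0.12, RK4 error ≈ 2e-6
```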

* fix nan to num inverse

* fix setting markersize in lotka volterra notebook

* fix: actually set KERAS_BACKEND to chosen backend

Add warning if KERAS_BACKEND and actually loaded backend do not match.
This can happen if keras is imported before BayesFlow.

* Fix warning msg

---------

Co-authored-by: Valentin Pratz <112951103+vpratz@users.noreply.github.com>
Co-authored-by: han-ol <g@hans.olischlaeger.com>
Co-authored-by: Daniel Habermann <133031176+daniel-habermann@users.noreply.github.com>
Co-authored-by: Valentin Pratz <git@valentinpratz.de>
Co-authored-by: arrjon <jonas.arruda@uni-bonn.de>
Co-authored-by: Lars <lars@kuehmichel.de>
Co-authored-by: Hans Olischläger <106988117+han-ol@users.noreply.github.com>
Co-authored-by: Lasse Elsemüller <60779710+elseml@users.noreply.github.com>
Co-authored-by: Leona Odole <88601208+eodole@users.noreply.github.com>
Co-authored-by: Jonas Arruda <69197639+arrjon@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Paul-Christian Bürkner <paul.buerkner@gmail.com>
Co-authored-by: The-Gia Leo Nguyen <Leo.Nguyen@gmx.de>