Skip to content

Conversation

carloslihu
Copy link

  • Python code is now formatted with black and isort, and has been refactored according to PEP 8 style guides.
  • Python code partially commented with google docstring format.
  • C++ code partially commented with doxygen docstring format.
  • Scott's and Normal Reference Rule's bandwidth calculation have been reordered and commented.
  • ArcOperatorSet::update_incoming_arcs_scores formulas have been reordered and commented.

carloslihu and others added 23 commits March 8, 2025 11:38
@davenza
Copy link
Owner

davenza commented Aug 10, 2025

Thank you for the PR @carloslihu . I am reviewing the PR. I don't understand the bug in hillclimbing.hpp (lines 152-160). Can you provide more detail?

I see some commented tests in tests/learning/algorithms/hillclimbing_test.py (I think they are related), but I do not know where is the "generate_normal_data_dep(1000)" (line 9) is defined.

@carloslihu
Copy link
Author

carloslihu commented Aug 11, 2025

Hello David, thank you for reviewing my changes, they are a lot:

  • The bug I'm talking about is about performing HC with cross-validation when you have a variable with low variability (but not 0 variance). For instance, a variable A with training values [0, 0, ..., 0, 1].
    When performing the validated k-fold cross-validation, it finds problems fitting the distribution when the "1" value is not found in the training fold.
    It is not a bug, more like a problem I found with some datasets. I do not have a clear solution for the algorithm. You may ignore the comment; I will remove it.

  • I still have pending to review that the python tests are still working and cleaning comments.
    It's a minor change, but I have planned to finish this month.
    Please wait for them, thanks

@carloslihu
Copy link
Author

Another issue is that I have retrocompatibility with the tests until the tag v0.5.2
However, afterwards I started using for KDE by default the diagonal covariance as the bandwidth matrix, instead of the full covariance (as suggested here)

This KDE learning faster and removes the Singular Covariance Matrix error.
Although it may return less accurate Bandwidth Matrices, in my experience, the differences are minimal.

If you like this change, I could adapt the tests so that they pass the results given this new method.

@davenza
Copy link
Owner

davenza commented Aug 11, 2025

Thank you @carloslihu. I will try to create a test with a case of almost 0 variance to try to control it. First, I wanna know where it is raising the error.

About the diagonal matrices: yes, many authors recommend using diagonal matrices. Less parameters to estimate and usually small degradation of performance. That is why I defined ProductKDE (https://pybnesian.readthedocs.io/en/latest/api/factors.html#pybnesian.ProductKDE).

I know that ProductKDE is not exactly the same as KDE with diagonal bandwidth matrix, but:

  • Do you remember if the methods to estimate the bandwidth are equal (in ProductKDE and diagonal KDE)?
  • If previous question is no. Do you think we need an specific implementation for the estimation of bandwidth in diagonal KDEs?

@carloslihu
Copy link
Author

carloslihu commented Aug 12, 2025

Hello David,
After analyzing the definition of PyBNesian ProductKDE and the definition of diagonal bandwidth matrix KDE found in the literature.
I think they represent the same concept with different notation.

For my implementation I wanted to use ProductKDE for Hill Climbing, but by lack of skill and time, I wasn't able to do so.
Therefore, my Hack Fix was to manually remove the non-diagonal elements from the covariance matrix in
NormalReferenceRule.hpp and ScottsBandwidth.hpp

This is just a temporary fix, I think that the best solution would be to allow the option of allowing the selection of your ProductKDE implementation for the Hill Climbing algorithm, but my lack of knowledge wouldn't let me.

@carloslihu
Copy link
Author

Hello David,
For the moment I have rebased my master branch to tag v0.5.2 (before adapting my diagonal bandwidth fix)
This way you may merge my latest changes without altering your functionality.

I will continue using the diagonal bandwidth in another personal branch (feature/diagonal-bandwidth) for the moment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants