Enable histogram with variable bin lengths #216

Closed
mansenfranzen opened this issue Jul 17, 2019 · 2 comments
Labels
feature request 💬 Requests for new features

mansenfranzen commented Jul 17, 2019

First of all, thanks for the work on this great package. We've just discovered it recently and it will surely provide some benefit for our daily work in our team.

Pain point
For numerical columns with a dense region of values and a few extreme outliers, the default histogram provides little insight: almost all values fall into a single bin (the dense region), while the remaining bins are nearly empty (the outliers).

Solution
Instead of fixed bin widths, it would be useful to offer algorithms with variable bin widths that account for unevenly distributed densities. AstroML already has two implementations of this (see here). The algorithms are not too complicated, so they could be vendored without introducing a new dependency (see here) @jakevdp.
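
To illustrate the difference, here is a minimal sketch in plain Python. It is not one of the AstroML algorithms: it uses equal-frequency (quantile) bin edges as a simple stand-in for variable-width binning, and the helper names (`fixed_width_edges`, `equal_frequency_edges`, `histogram`) and the sample data are made up for this example.

```python
from bisect import bisect_right


def fixed_width_edges(values, n_bins):
    """Return n_bins + 1 equally spaced edges spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    return [lo + i * (hi - lo) / n_bins for i in range(n_bins + 1)]


def equal_frequency_edges(values, n_bins):
    """Return edges placed at evenly spaced quantiles of the data.

    Assumption for this sketch: quantiles are linearly interpolated between
    neighbouring sorted values. Duplicate edges caused by ties are dropped,
    so fewer than n_bins bins may come back.
    """
    data = sorted(values)
    n = len(data)
    edges = []
    for i in range(n_bins + 1):
        pos = i * (n - 1) / n_bins      # fractional index of the i-th quantile
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        edges.append(data[lo] + (data[hi] - data[lo]) * (pos - lo))
    return sorted(set(edges))


def histogram(values, edges):
    """Count values per bin; bins are half-open except the last, which is closed."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue                    # ignore values outside the edge range
        idx = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[idx] += 1
    return counts


# Hypothetical column: a dense region (0..99) plus two extreme outliers.
column = list(range(100)) + [10_000, 50_000]

fixed_counts = histogram(column, fixed_width_edges(column, 4))
variable_counts = histogram(column, equal_frequency_edges(column, 4))
print("fixed width:   ", fixed_counts)
print("variable width:", variable_counts)
```

With fixed widths, nearly every value lands in the first bin; with edges that follow the data, the dense region is spread across several narrow bins and the outliers share one wide bin. When plotting variable-width bins, bar heights are usually normalized by bin width (a density) so that wide bins are not visually overstated.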

mansenfranzen added the feature request 💬 Requests for new features label Jul 17, 2019

sbrugman (Collaborator) commented

Great suggestion. I'll look into it.

sbrugman added a commit that referenced this issue Jul 17, 2019
- Added Variable bin sizing via Bayesian Boxing (feature request [#216])
- PyCharm integration, console attempts to detect file type.
- Fixed bug [#215].
- Updated the `missingno` package to 0.4.2, fixing the font size in the `bar` diagram.
- Various optimizations

jakevdp commented Jul 17, 2019

The Astropy implementations are a better reference: http://docs.astropy.org/en/stable/visualization/histogram.html

chanedwin pushed a commit to chanedwin/pandas-profiling that referenced this issue Oct 11, 2020
- Added Variable bin sizing via Bayesian Boxing (feature request [ydataai#216])
- PyCharm integration, console attempts to detect file type.
- Fixed bug [ydataai#215].
- Updated the `missingno` package to 0.4.2, fixing the font size in the `bar` diagram.
- Various optimizations