Commit

Update README.md
elena-pascal authored Sep 22, 2022
1 parent 1b6b998 commit 5f6ba99
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -27,17 +27,17 @@ as implemented by `scipy`
the fit parameters are: FitParams(a=2.2195448757323435, loc=0.0)
the negative log likelihood is: 7.86098064513523

- ![image](fits.png)
+ ![image](fits.png)

## Notes
The goodness of fit is determined here using the maximum likelihood approach. The fit itself is done by [`scipy.stats.fit`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.fit.html#scipy.stats.fit), which optimises for the parameters that maximise the likelihood. Note that `scipy.stats.fit` is a new feature introduced in scipy 1.9.0 that allows seamless fitting of both discrete and continuous distributions.
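
As a rough illustration of the call described above, here is a minimal sketch that fits a Poisson distribution to synthetic counts with `scipy.stats.fit`. The choice of `scipy.stats.poisson`, the synthetic data, and the bounds on `mu` are assumptions made for this example; they are not taken from the script in this repository.

```python
import numpy as np
from scipy import stats

# Synthetic count data (an assumption for this sketch, not the repo's data).
rng = np.random.default_rng(seed=0)
data = rng.poisson(lam=4.0, size=500)

# scipy.stats.fit needs finite bounds for the shape parameter `mu`;
# `loc` is left out, so it stays fixed at its default of 0.
bounds = {"mu": (0, 20)}
result = stats.fit(stats.poisson, data, bounds=bounds)

print("the fit parameters are:", result.params)
print("the negative log likelihood is:", result.nllf())
```

The default optimiser is stochastic, so the fitted value of `mu` can differ slightly between runs; for a Poisson distribution it should land close to the sample mean.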

The goodness of fit metric shown here is the negative log likelihood, i.e. the negative sum of the logarithm of the [Probability Mass Function (PMF)](https://docs.scipy.org/doc/scipy/tutorial/stats/discrete.html?highlight=fit#probability-mass-function-pmf) evaluated at each data point.
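
To make the metric concrete, the sketch below computes a negative log likelihood by hand: it sums the log of the PMF over all data points and flips the sign. The Poisson PMF, the helper names, and the toy data are assumptions chosen for illustration only.

```python
import math


def poisson_log_pmf(k, mu):
    """Log of the Poisson PMF: log(mu**k * exp(-mu) / k!)."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def negative_log_likelihood(data, mu):
    """Negative sum of the log PMF over every observation."""
    return -sum(poisson_log_pmf(k, mu) for k in data)


data = [2, 3, 1, 4, 2]  # toy counts with sample mean 2.4

# A lower value means the parameter explains the data better.
nll_at_mean = negative_log_likelihood(data, mu=2.4)
nll_elsewhere = negative_log_likelihood(data, mu=5.0)
print(nll_at_mean, nll_elsewhere)
```

Lower values indicate a better fit, which is why maximising the likelihood is the same as minimising this quantity.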

## Bounds
- Fitting requires some rough information about parameters bounds or a guess value to start from. Here, I hard-coded very rough bounds that could very well fail for a variety of data edge cases. These bounds were tested on datasets sampled from uniform distributions.
+ Fitting requires some rough information about parameters bounds or a guess value to start from. Here, I hardcoded very rough bounds that could very well fail for a variety of data edge cases. These bounds were tested on datasets sampled from uniform distributions.
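
One possible way to avoid fully hardcoded bounds is to derive them from the data. The sketch below does this for a single Poisson-style rate parameter; the helper `rough_rate_bounds`, its `margin` argument, and the parameter name `mu` are hypothetical and do not appear in the script.

```python
def rough_rate_bounds(data, margin=1.5):
    """Return loose bounds for a non-negative rate parameter.

    The sample mean always lies between 0 and the largest observation
    for count data, so scaling the maximum by `margin` gives an
    interval that contains the maximum likelihood estimate of a
    Poisson rate.
    """
    upper = max(1.0, margin * max(data))
    return {"mu": (0.0, upper)}


data = [0, 3, 1, 7, 2, 2]
bounds = rough_rate_bounds(data)
print(bounds)
```

A dictionary of this shape can be passed as the `bounds` argument of `scipy.stats.fit`. Other distributions would need their own rules, so this is only a starting point.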

## Future work
The script could easily be extended to fit any other [scipy discrete distribution](https://docs.scipy.org/doc/scipy/tutorial/stats/discrete.html?highlight=maximum%20likelihood#discrete-distributions-in-scipy-stats). If additional distributions are required, one would have to manually write the distribution function, and then manually find the parameters that maximise the likelihood, i.e. the product of the PMF values over the data.
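
For a distribution written by hand, the manual route could look roughly like the sketch below: define the log PMF, build the negative log likelihood, and minimise it numerically. The geometric distribution on k = 1, 2, ... is used purely as a stand-in, and the function names and toy data are assumptions for this example.

```python
import math

from scipy.optimize import minimize_scalar


def geometric_log_pmf(k, p):
    """Log PMF of a hand-written geometric distribution on k = 1, 2, ..."""
    return (k - 1) * math.log1p(-p) + math.log(p)


def negative_log_likelihood(p, data):
    """Negative sum of the log PMF over the data for success probability p."""
    return -sum(geometric_log_pmf(k, p) for k in data)


data = [1, 2, 1, 3, 5, 1, 2, 4, 1, 2]

# Bounded scalar minimisation keeps p strictly inside (0, 1).
res = minimize_scalar(
    negative_log_likelihood,
    bounds=(1e-6, 1 - 1e-6),
    args=(data,),
    method="bounded",
)
p_hat = res.x
print("estimated p:", p_hat)
```

For this particular distribution the maximum likelihood estimate has the closed form 1 / mean(data), which gives a quick sanity check on the numerical result.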

- Ideally the choise of distributions would not be hardcoded and the user could choose based on looking at their data what distributions to try.
+ Ideally, the choice of distributions would not be hardcoded and the user could choose based on looking at their data what distributions to try.
