@Smit-create
Member

@Smit-create Smit-create commented Mar 1, 2020

Added sampling methods of continuous random variables

References to other Issues or PRs

Closes #17057
Related to #19061

Brief description of what is fixed or changed

Other comments

Release Notes

  • stats
    • Added sampling methods for continuous variables
    • Added library option in sample
    • sample returns an iterator object since version 1.7

@sympy-bot

sympy-bot commented Mar 1, 2020

Hi, I am the SymPy bot (v158). I'm here to help you write a release notes entry. Please read the guide on how to write release notes.

Your release notes are in good order.

Here is what the release notes will look like:

This will be added to https://github.com/sympy/sympy/wiki/Release-Notes-for-1.7.

Note: This comment will be updated with the latest check if you edit the pull request. You need to reload the page to see it.

Update

The release notes on the wiki have been updated.

@sylee957 sylee957 added the stats label Mar 3, 2020
@Smit-create
Member Author

Please review.

@Smit-create Smit-create changed the title from "[WIP] Added sampling methods of crv_types" to "Added sampling methods of crv_types" Mar 4, 2020
@Smit-create
Member Author

This is ready for review. @czgdp1807

@czgdp1807
Member

The diff coverage is quite low. Can you add more tests, modify the current tests, or merge master to increase the coverage?

@czgdp1807 czgdp1807 mentioned this pull request Mar 4, 2020
3 tasks
@Smit-create
Member Author

Screenshot from 2020-03-04 16-21-20
Code coverage has now increased to 100%.

@Smit-create
Member Author

@czgdp1807 Does it look good to go?

@Smit-create Smit-create requested a review from czgdp1807 March 5, 2020 11:06
@czgdp1807
Member

LGTM. Will merge if no one raises objections.

@oscarbenjamin
Collaborator

The code here is very repetitive. It would be better to make a data structure that stores the different functions like in lambdify and the code printers.

Would it make more sense to handle this as part of lambdify and the code printers?

@Smit-create
Member Author

> Would it make more sense to handle this as part of lambdify and the code printers?

I will look into this suggestion.

@oscarbenjamin
Collaborator

It probably makes more sense to group the code together in one class for each of the libraries, with a mapping from sympy types to the other library's types, rather than having each of the sympy classes repeat the boilerplate for calling into the external library. That is what I mean by saying that it is repetitive. If we have m other libraries and k sympy distribution classes, then we will end up having to add m*k methods to connect them all. Since m is smaller than k, it makes more sense to have m classes for the output, each of which has a mapping of size k that efficiently describes how to connect them.
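
A minimal sketch of this layout, with stand-in names throughout: `StdlibSampler` and `InverseCDFSampler` are hypothetical backends built on Python's standard `random` and `math` modules rather than on real external libraries, and the `Normal` and `Exponential` dataclasses stand in for sympy distribution classes. The only point is that each backend carries a single mapping keyed by distribution type, so supporting a new backend means adding one class rather than one method per distribution.

```python
import math
import random
from dataclasses import dataclass


# Hypothetical stand-ins for sympy distribution classes.
@dataclass
class Normal:
    mean: float
    std: float


@dataclass
class Exponential:
    rate: float


class StdlibSampler:
    """Backend 1: delegates to the standard library's `random` module."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        # One mapping of size k instead of k methods spread over k classes.
        self.rv_map = {
            Normal: lambda d: self.rng.gauss(d.mean, d.std),
            Exponential: lambda d: self.rng.expovariate(d.rate),
        }

    def sample(self, dist):
        return self.rv_map[type(dist)](dist)


class InverseCDFSampler:
    """Backend 2: inverse-transform sampling from a uniform draw."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.rv_map = {
            # Exponential quantile function: -ln(1 - u) / rate.
            Exponential: lambda d: -math.log(1.0 - self.rng.random()) / d.rate,
        }

    def sample(self, dist):
        return self.rv_map[type(dist)](dist)


# m backends, each with its own mapping; no distribution class is touched.
backends = {
    "stdlib": StdlibSampler(seed=0),
    "inverse_cdf": InverseCDFSampler(seed=0),
}
draw = backends["stdlib"].sample(Exponential(rate=2.0))
```

Here the distribution classes know nothing about the backends; a backend that does not support a distribution simply has no entry for it in its `rv_map`.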

@Smit-create
Member Author

Thanks, @oscarbenjamin! I get your point. I will think of a way to work on that.

@Smit-create
Member Author

I have designed another approach. I will explain it using sample_scipy as an example.

We will create five dictionaries: one maps each distribution to the corresponding random variable imported from scipy, and the other four map sympy arguments to scipy arguments. Generally, scipy has four main arguments: a, b, loc, and scale. The four argument dictionaries fill these in from the attributes of the corresponding random variable class.
Finally, we return the sample using the mapped random variable from the first dictionary and the arguments from the other four dictionaries.
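
One way to read this proposal, as a sketch only: the dictionary names (`scipy_name`, `a_map`, `b_map`, `loc_map`, `scale_map`), the stand-in dataclasses, and the helper `scipy_call_spec` are all hypothetical. The block only builds the scipy object name and keyword arguments, without importing scipy, so it runs on its own.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for sympy distribution classes.
@dataclass
class GammaDistribution:
    k: float
    theta: float


@dataclass
class UniformDistribution:
    left: float
    right: float


# Dictionary 1: which scipy.stats object each class corresponds to.
scipy_name = {"GammaDistribution": "gamma", "UniformDistribution": "uniform"}

# Dictionaries 2-5: how to fill each scipy argument from the attributes.
a_map = {"GammaDistribution": lambda d: float(d.k)}
b_map = {}  # neither example distribution has a second shape parameter
loc_map = {"UniformDistribution": lambda d: float(d.left)}
scale_map = {
    "GammaDistribution": lambda d: float(d.theta),
    "UniformDistribution": lambda d: float(d.right) - float(d.left),
}


def scipy_call_spec(dist):
    """Return (scipy_name, kwargs) for `dist`, or None if it is unsupported."""
    name = type(dist).__name__
    if name not in scipy_name:
        return None
    kwargs = {}
    for arg, mapping in (("a", a_map), ("b", b_map),
                         ("loc", loc_map), ("scale", scale_map)):
        if name in mapping:
            kwargs[arg] = mapping[name](dist)
    return scipy_name[name], kwargs


spec = scipy_call_spec(UniformDistribution(left=2, right=5))
```

With scipy installed, the result could then be used as `getattr(scipy.stats, name).rvs(size=size, **kwargs)`.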

@Smit-create
Member Author

I have designed a function that works fine with scipy:


def _sample_scipy(dist, size):
    """Sample from dist with scipy; return None if dist is not supported."""
    from math import exp

    dist_list = ['BetaDistribution', 'BetaPrimeDistribution',
    'CauchyDistribution', 'ChiDistribution', 'ChiSquaredDistribution',
    'ExponentialDistribution', 'GammaDistribution', 'GammaInverseDistribution',
    'LogNormalDistribution', 'NormalDistribution', 'GaussianInverseDistribution',
    'ParetoDistribution', 'UniformDistribution']

    if dist.__class__.__name__ not in dist_list:
        return None
    # Imported lazily so that scipy is only needed when sampling is requested.
    from scipy.stats import (beta, betaprime, cauchy, chi, chi2, expon, gamma,
                             invgamma, lognorm, norm, invgauss, pareto, uniform)

    # Maps each sympy distribution class name to a function that converts the
    # sympy parameters to scipy's shape/loc/scale arguments and draws samples.
    scipy_rv_map = {
        # BetaDistribution stores its parameters as alpha and beta.
        'BetaDistribution': lambda dist, size: beta.rvs(
            a=float(dist.alpha), b=float(dist.beta), size=size),
        'BetaPrimeDistribution': lambda dist, size: betaprime.rvs(
            a=float(dist.alpha), b=float(dist.beta), size=size),
        'CauchyDistribution': lambda dist, size: cauchy.rvs(
            loc=float(dist.x0), scale=float(dist.gamma), size=size),
        'ChiDistribution': lambda dist, size: chi.rvs(df=float(dist.k), size=size),
        'ChiSquaredDistribution': lambda dist, size: chi2.rvs(df=float(dist.k), size=size),
        # scipy parameterizes the exponential by scale = 1/rate.
        'ExponentialDistribution': lambda dist, size: expon.rvs(
            loc=0, scale=1/float(dist.rate), size=size),
        'GammaDistribution': lambda dist, size: gamma.rvs(
            a=float(dist.k), loc=0, scale=float(dist.theta), size=size),
        'GammaInverseDistribution': lambda dist, size: invgamma.rvs(
            a=float(dist.a), loc=0, scale=float(dist.b), size=size),
        # For the log-normal, s is the std of the log and scale is exp(mean).
        'LogNormalDistribution': lambda dist, size: lognorm.rvs(
            s=float(dist.std), loc=0, scale=exp(float(dist.mean)), size=size),
        'NormalDistribution': lambda dist, size: norm.rvs(
            loc=float(dist.mean), scale=float(dist.std), size=size),
        'GaussianInverseDistribution': lambda dist, size: invgauss.rvs(
            mu=float(dist.mean)/float(dist.shape), scale=float(dist.shape), size=size),
        'ParetoDistribution': lambda dist, size: pareto.rvs(
            b=float(dist.alpha), scale=float(dist.xm), size=size),
        'UniformDistribution': lambda dist, size: uniform.rvs(
            loc=float(dist.left), scale=float(dist.right)-float(dist.left), size=size),
    }
    return scipy_rv_map[dist.__class__.__name__](dist, size)

This will remove all _sample_scipy methods from individual classes, and just a single function will be used.
@oscarbenjamin @czgdp1807 Does this look good to commit for all libraries?
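
The least obvious part of the mapping is the parameter conversion, so here is a small standalone check of two of the conversions (exponential and log-normal). It relies on scipy's documented loc-scale convention, where the density at `x` is `f0((x - loc) / scale) / scale`. The helper names below are hypothetical, and the densities are written out by hand so that the block needs neither scipy nor sympy.

```python
import math


def loc_scale_pdf(standard_pdf, x, loc=0.0, scale=1.0):
    """Density of a loc-scale family: f0((x - loc) / scale) / scale."""
    return standard_pdf((x - loc) / scale) / scale


def std_expon_pdf(y):
    """Standard exponential density, exp(-y) for y >= 0."""
    return math.exp(-y) if y >= 0 else 0.0


def std_lognorm_pdf(y, s):
    """Standard log-normal density with shape parameter s."""
    if y <= 0:
        return 0.0
    return math.exp(-math.log(y) ** 2 / (2 * s * s)) / (s * y * math.sqrt(2 * math.pi))


x = 0.7

# Exponential with rate lambda corresponds to scale = 1 / lambda.
rate = 2.0
expon_via_scale = loc_scale_pdf(std_expon_pdf, x, scale=1 / rate)
expon_direct = rate * math.exp(-rate * x)

# Log-normal with log-mean mu and log-std sigma corresponds to
# s = sigma and scale = exp(mu).
mu, sigma = 0.3, 0.8
lognorm_via_scale = loc_scale_pdf(
    lambda y: std_lognorm_pdf(y, sigma), x, scale=math.exp(mu))
lognorm_direct = math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (
    x * sigma * math.sqrt(2 * math.pi))
```

Both pairs agree up to floating-point rounding, which is why the exponential entry passes `scale=1/rate` and the log-normal entry needs the standard deviation as `s` and `exp(mean)` as `scale`.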

@oscarbenjamin
Collaborator

That looks better. There's probably a way to organise this nicely into one class for each external library and perhaps factor some code out to a base class.

@Smit-create
Member Author

Can anyone please have a look at the failing test https://travis-ci.org/github/sympy/sympy/jobs/677325425?

@Smit-create
Member Author

Please review this.

@czgdp1807
Member

LGTM. Ready for merge.

@Smit-create
Member Author

Smit-create commented May 2, 2020

@czgdp1807 I think we can merge this if no one has any objections.

@czgdp1807
Member

Let's merge this after the 1.6 branch is created. This PR makes some public API changes, and various changes will be made to the sampling APIs in the coming months, so 1.7 will be a more appropriate release for this change.

@czgdp1807 czgdp1807 added the GSoC label May 5, 2020
@czgdp1807
Member

Finally, it's good to go in. Will merge it by tonight.

@czgdp1807 czgdp1807 merged commit 6c4df17 into sympy:master May 7, 2020
         return FinitePSpace(domain, density)

-    def sample(self, size=()):
+    def sample(self, size=(1,), library='scipy'):
Contributor

What is the purpose of substituting () with (1,) or 1? The tuple notation should represent the dimensions of the N-dimensional array you want returned.
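
To make the distinction concrete, here is a small sketch that uses nested lists in place of arrays; `draw` is a hypothetical helper, not sympy's `sample`. Under NumPy-style shape semantics, `()` denotes a zero-dimensional result (a single scalar), while `(1,)` denotes a one-dimensional array of length one.

```python
import random


def draw(size=(), rng=random):
    """Return standard-normal draws arranged as nested lists of shape `size`."""
    if len(size) == 0:
        # Empty shape tuple: zero dimensions, so return a bare scalar.
        return rng.gauss(0.0, 1.0)
    # Otherwise build the leading dimension and recurse on the rest.
    return [draw(size[1:], rng) for _ in range(size[0])]


scalar = draw(())        # a single float
one_elem = draw((1,))    # a list holding one float
grid = draw((2, 3))      # two rows of three floats
```

Changing the default from `()` to `(1,)` therefore changes the type of the default result from a scalar to a one-element container.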
