
remove normalization of likelihood on cl-test #117


Merged
5 commits merged into SCECcode:master on Jul 12, 2021

Conversation

bayonato89
Contributor

No description provided.

# set seed for the likelihood test
if seed is not None:
numpy.random.seed(seed)
# def _poisson_likelihood_test(forecast_data, observed_data, num_simulations=1000, random_numbers=None,
Collaborator

i think this comment somehow made it into the commit by accident.

Comment on lines 493 to 501
"""
Computes the likelihood-test from CSEP using an efficient simulation based approach.
Args:
forecast_data (numpy.ndarray): nd array where [:, -1] are the magnitude bins.
observed_data (numpy.ndarray): same format as observation.
num_simulations: default number of simulations to use for likelihood based simulations
seed: used for reproducibility of the prng
random_numbers (numpy.ndarray): can supply an explicit list of random numbers, primarily used for software testing
use_observed_counts (bool): if true, will simulate catalogs using the observed events, if false will draw from poisson distribution
Collaborator

these lines should be indented

Contributor Author

Indented, Bill.

@bayonato89
Contributor Author

The lines are properly indented now :)

@wsavran
Collaborator

wsavran commented Jun 17, 2021

fixes #78

- greatly simplified logic to choose number of simulated events and handling of simulated likelihood values
- fixed formatting issue of line breaks and tabs vs. spaces
- cleaned up other various formatting issues in poisson_evaluations.py
@wsavran
Collaborator

wsavran commented Jun 17, 2021

the ci build/test failed because of a python tabs vs. spaces issue in the poisson_evaluations.py file. @bayonato89 i fetched the branch from your forked copy and fixed the spacing issues in the commit. the build and tests now run without error.

i believe the original implementation of this pull request didn't produce the correct normalization / conditioning for the cl test. i went through the _poisson_likelihood_test(...) function and greatly simplified the logic for choosing the number of events to simulate and whether we want to normalize the likelihoods. before i merge these changes, @bayonato89 @mjw98 and @khawajasim do you all mind reviewing the changes to make sure that we have the correct form implemented?

desired behavior for normalizing and conditioning

| Test name | normalize_likelihood | use_observed_counts |
| --- | --- | --- |
| M-test | True | True |
| S-test | True | True |
| CL-test | False | True |
| L-test | False | False |
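
the table can be read as the flag values each public test forwards to `_poisson_likelihood_test(...)`. as a minimal sketch (the dispatcher and keyword passing below are assumed for illustration, not quoted from poisson_evaluations.py), the mapping looks roughly like this:

```python
# illustrative only: flag combinations from the table above,
# forwarded to the shared simulation routine
TEST_FLAGS = {
    "M-test":  dict(normalize_likelihood=True,  use_observed_counts=True),
    "S-test":  dict(normalize_likelihood=True,  use_observed_counts=True),
    "CL-test": dict(normalize_likelihood=False, use_observed_counts=True),
    "L-test":  dict(normalize_likelihood=False, use_observed_counts=False),
}

def run_poisson_test(name, forecast_data, observed_data, **kwargs):
    # hypothetical dispatcher, shown only to make the flag mapping concrete
    return _poisson_likelihood_test(forecast_data, observed_data,
                                    **TEST_FLAGS[name], **kwargs)
```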

implementation from previous commit

# used for conditional-likelihood, magnitude, and spatial tests to normalize the rate-component of the forecasts.
if use_observed_counts:
    scale = n_obs / n_fore
    if normalize_likelihood:
        expected_forecast_count = numpy.sum(forecast_data)
        log_bin_expectations = numpy.log(forecast_data.ravel())
    else:    
        expected_forecast_count = int(n_obs)
        log_bin_expectations = numpy.log(forecast_data.ravel() * scale)
else:
    expected_forecast_count = numpy.sum(forecast_data)
    log_bin_expectations = numpy.log(forecast_data.ravel())

# ... code omitted ...

for idx in range(num_simulations):
    if use_observed_counts:
        if normalize_likelihood:
            num_events_to_simulate = int(n_obs)
        else:    
            num_events_to_simulate = expected_forecast_count
    else:
        num_events_to_simulate = int(numpy.random.poisson(expected_forecast_count))

the error appears to be in the `if normalize_likelihood:` clause, because if we are normalizing the likelihood we want to scale the rates to the observed rates. additionally, the m-test and s-test are called with `normalize_likelihood=True` and the l-test and cl-test are called with `normalize_likelihood=False`. i propose a fix to this, along with some simplification of the logic, below.

simplified implementation

expected_forecast_count = numpy.sum(forecast_data)
log_bin_expectations = numpy.log(forecast_data.ravel())
# used for conditional-likelihood, magnitude, and spatial tests to normalize the rate-component of the forecasts
if use_observed_counts and normalize_likelihood:
    scale = n_obs / n_fore
    expected_forecast_count = int(n_obs)
    log_bin_expectations = numpy.log(forecast_data.ravel() * scale)

# ... code omitted ...

for idx in range(num_simulations):
    if use_observed_counts:
        num_events_to_simulate = int(n_obs)
    else:
        num_events_to_simulate = int(numpy.random.poisson(expected_forecast_count))

in this implementation, we scale the rates only when we are normalizing the likelihood and simulating using the observed event counts; otherwise the rates aren't scaled. we now either simulate using the number of observed events or a number drawn from a poisson distribution. the simplified version matches our desired behavior.
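
to make that easy to verify, here is a small standalone sketch (toy values; the variable names mirror the snippet above but the data are made up for illustration) that walks the four flag combinations through the simplified branching and reports whether the rates get scaled and how the per-simulation event count is chosen:

```python
import numpy

# toy forecast and observed count, for illustration only
forecast_data = numpy.array([[0.4, 0.6], [1.0, 2.0]])
n_fore = numpy.sum(forecast_data)   # total forecast rate
n_obs = 5                           # observed event count

for name, normalize_likelihood, use_observed_counts in [
    ("M-test", True, True),
    ("S-test", True, True),
    ("CL-test", False, True),
    ("L-test", False, False),
]:
    # rates are scaled only when conditioning on observed counts AND normalizing
    rates_scaled = use_observed_counts and normalize_likelihood
    scale = n_obs / n_fore if rates_scaled else 1.0
    # event count: observed count when conditioning, poisson draw otherwise
    if use_observed_counts:
        num_events_to_simulate = int(n_obs)
    else:
        num_events_to_simulate = int(numpy.random.poisson(n_fore))
    print(f"{name}: rates_scaled={rates_scaled}, scale={scale:.3f}, "
          f"num_events_to_simulate={num_events_to_simulate}")
```

running this reproduces the desired-behavior table: only the m-test and s-test scale the rates, and only the l-test draws its event count from a poisson distribution.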

@codecov-commenter

Codecov Report

Merging #117 (88d171d) into master (415ebfc) will decrease coverage by 0.04%.
The diff coverage is 76.92%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #117      +/-   ##
==========================================
- Coverage   57.23%   57.19%   -0.05%     
==========================================
  Files          19       19              
  Lines        3136     3133       -3     
  Branches      452      452              
==========================================
- Hits         1795     1792       -3     
  Misses       1231     1231              
  Partials      110      110              
Impacted Files Coverage Δ
csep/core/poisson_evaluations.py 42.46% <76.92%> (-0.78%) ⬇️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 415ebfc...88d171d. Read the comment docs.

@wsavran wsavran merged commit 0605783 into SCECcode:master Jul 12, 2021
@wsavran
Collaborator

wsavran commented Jul 12, 2021

@bayonato89 i'm merging this into master. nice job on finding this issue!
