
Conversation

@JoseKling JoseKling commented Jul 21, 2025

Statistics

Implemented two versions of the Kolmogorov-Smirnov distance as a test statistic.

  1. Time-rescaled event times against a uniform distribution
  2. Time-rescaled interevent times against a unit exponential distribution

Tests

Bootstrap and non-bootstrap based tests

  1. Bootstrap

    1. Estimate process
    2. Calculate test statistic
    3. Simulate data with estimated process
    4. Estimate processes from each simulation
    5. Calculate test statistic for each simulation
    6. Compare test statistic from data and from simulations to get the p value
  2. Non-bootstrap
    Same as above, but skip step 4; the process estimated in step 1 is used to calculate the test statistic for each simulation.
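The bootstrap procedure above can be sketched roughly as follows. This is only an illustration: `fit`, `simulate`, and `ks_statistic` are placeholder names for whatever the package actually provides, not its real API.

```julia
using Random

# Hypothetical sketch of the parametric bootstrap test described above.
# `fit`, `simulate`, and `ks_statistic` stand in for the package's real functions.
function bootstrap_pvalue(fit, simulate, ks_statistic, h;
                          n_sims=1000, rng=Random.default_rng())
    pp_hat = fit(h)                      # 1. estimate process from the data
    stat   = ks_statistic(pp_hat, h)     # 2. test statistic on the data
    sim_stats = map(1:n_sims) do _
        h_sim  = simulate(rng, pp_hat)   # 3. simulate from the estimated process
        pp_sim = fit(h_sim)              # 4. re-estimate on each simulation
        ks_statistic(pp_sim, h_sim)      # 5. statistic for each simulation
    end
    # 6. Monte Carlo p-value with the usual +1 correction (never exactly 0)
    (count(>=(stat), sim_stats) + 1) / (n_sims + 1)
end
```

The non-bootstrap variant would reuse `pp_hat` in step 5 instead of re-fitting each simulation.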

Issues

  • Type and method names
    Perhaps the names can be improved. If you have any suggestions...

  • Type piracy
    Tried implementing a `fit` method for the `Dirac` distribution (there isn't one in Distributions.jl) so that the generic `fit` method would work for `UnivariatePoissonProcess`, but the tests complain about type piracy. Decided to implement a `fit` method specifically for unmarked Poisson processes instead.
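For context, the piracy problem and the chosen workaround can be sketched like this. `SimplePoissonProcess` is a hypothetical stand-in for the package's own type, not its actual name:

```julia
# Type piracy would be: extending a function we don't own (Distributions.fit)
# on a type we don't own (Dirac), e.g.
#
#     Distributions.fit(::Type{<:Dirac}, x::AbstractVector) = Dirac(first(x))
#
# Neither the function nor the type belongs to this package, so loading it
# could silently change behavior for unrelated code. The non-pirating
# alternative dispatches on a type this package owns:
struct SimplePoissonProcess
    rate::Float64
end

# A `fit` specialized for the unmarked Poisson process: the MLE of the
# rate is the number of events divided by the observation window length.
fit(::Type{SimplePoissonProcess}, times::AbstractVector, duration::Real) =
    SimplePoissonProcess(length(times) / duration)
```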

- Bootstrap and non-bootstrap based tests
- KSDistance statistic for hypothesis testing
    - Time-rescaled event times against a uniform distribution
    - Time-rescaled interevent times against an exponential distribution
- Tests take types instead of instances of point processes
- `NoBootstrapTest` may take an instance

TODO:
- Add Tests
- Statistics now work for general types
- Added an `AbstractRNG` argument to the tests to control simulations
- `BootstrapTest` and `NoBootstrapTest` are subtypes of `HypothesisTest`
    - Fields have fixed types, as done in `HypothesisTests.jl`
- Fixed some mistakes

TODO: Add tests

codecov bot commented Aug 4, 2025

Codecov Report

❌ Patch coverage is 98.79518% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/HypothesisTests/point_process_tests.jl | 90.00% | 1 Missing ⚠️ |


@JoseKling JoseKling mentioned this pull request Oct 28, 2025
ISSUES:
- Type piracy: needed to implement `fit` for the `Dirac` distribution
- Names: type and method names could maybe be improved
- PPTest: Fields in `BootstrapTest` and `NoBootstrapTest` make sense
          only for simulation based tests. Anything more general?
@JoseKling JoseKling marked this pull request as ready for review November 17, 2025 10:54
`Distributions.jl` does not provide a `fit` method for `Dirac`, so a separate
`fit` method for unmarked Poisson processes was added.
@JoseKling JoseKling requested a review from rsenne December 4, 2025 10:56
on the event times according to the selected distribution.
=#
function transform(::Type{<:Uniform}, pp::AbstractPointProcess, h)
(length(h.times) < 1) && return 1.0 # No events ⇒ maximum distance
Collaborator

I think this line could be a problem. First, it creates a type instability: in the normal branching logic we return a tuple (inter_transf, Uniform(1)), but in this branch we return just 1.0. Would it make more sense to throw an error? 1.0 doesn't have a clear statistical meaning to me, since we can't really do any kind of GOF here. E.g., if I witnessed a single observation, how could I say whether or not it fits my model without strong prior info? We could also throw a warning and do something like:

if length(h.times) < 2
    @warn "No inter-event times: length(h.times) = $(length(h.times))"
    return Float64[], Uniform(1)
end

Owner Author

Yes, this is a remnant of some old code; it does not make sense.
In principle, we could just return the result of the time_transform as it is, whether it is empty or not.
That said, performing a test on an empty history would not make sense, so it might be better to throw an error if the argument h in NoBootstrapTest or BootstrapTest is an empty history. What can legitimately happen is that one of the simulated histories inside the loop is empty.
The question is how to define the empirical cumulative distribution when the number of samples is 0. To me, it makes sense to define it as $\hat{F}(t) = 0$ (which is not really a probability distribution, but this is a degenerate case); therefore, the KS distance between this and any non-degenerate distribution (like Uniform(0, 1)) would be 1.
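That degenerate case can be checked with a minimal sketch (not the package's implementation): with zero samples the empirical CDF is identically 0, so its sup-distance to any CDF that reaches 1 is 1.

```julia
# Empirical CDF value at t; defined as 0 everywhere for an empty sample.
ecdf_val(xs, t) = isempty(xs) ? 0.0 : count(<=(t), xs) / length(xs)

# KS distance between the ECDF of `xs` and a target CDF `F`, evaluated on a grid.
ks_distance(xs, F, grid) = maximum(abs(ecdf_val(xs, t) - F(t)) for t in grid)

F_unif(t) = clamp(t, 0.0, 1.0)   # CDF of Uniform(0, 1)
```

With `xs = Float64[]`, the distance on a grid over [0, 1] is exactly 1, matching the convention discussed above.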

end

function transform(::Type{<:Exponential}, pp::AbstractPointProcess, h)
(length(h.times) < 2) && return 1.0 # If `h` has fewer than 2 events, there are no interevent times
Collaborator

Same comment as in the uniform case.

- Fixed tests for empty histories
- Added `@testset`s for clarity
JoseKling commented Dec 11, 2025

I think everything was resolved.
The only issue is that I get an error in the tests here, but not locally. I will have a look at it later.

Two remarks:

Not sure about all the names, especially NoBootstrapTest and PPTest. Of course, this is just a small detail, but if you have something better...

And I kept the value of the test statistic equal to 1 when a simulated event history is empty, but I raise an error when the history to be tested is empty. The exception is when NoBootstrapTest is called with an empty history and an instantiated process (as opposed to being called with a type).
For example, if the history is empty and the process is a Poisson(10), then the p-value would be the percentage of simulations that generated an empty history, which is exactly how probable it is for the process to produce an empty history.

- `pp::Union{AbstractPointProcess, Type{<:AbstractPointProcess}}`: the null hypothesis model family
- `h::History`: the observed event history
- `n_sims::Int=1000`: number of simulations to perform for the test
- `rng::AbstractRNG=default_rng()test statistics from simulated data`: Random number generator
Collaborator

Suggested change
- `rng::AbstractRNG=default_rng()test statistics from simulated data`: Random number generator
- `rng::AbstractRNG=default_rng()`: Random number generator

rsenne commented Dec 11, 2025

My thoughts on naming:

  1. `PPTest` I think should be `PointProcessTest`; it's more verbose, but I think the Julia style recommendations generally prefer verbosity to non-canonical acronyms.
  2. `NoBootstrapTest` would better be described as `MonteCarloTest`; I think it's better to describe what it does rather than what it doesn't do.
  3. `BootstrapTest` is good to me. If you want to be more specific, `ParametricBootstrapTest` is an alternative.

Comment on lines 25 to 27
function StatsAPI.pvalue(bt::BootstrapTest)
(count(>=(bt.stat), bt.sim_stats) + 1) / (bt.n_sims + 1)
end
Collaborator

Suggested change
function StatsAPI.pvalue(bt::BootstrapTest)
(count(>=(bt.stat), bt.sim_stats) + 1) / (bt.n_sims + 1)
end
function StatsAPI.pvalue(bt::BootstrapTest)
return (count(>=(bt.stat), bt.sim_stats) + 1) / (bt.n_sims + 1)
end

Calculate the p-value of a goodness-of-fit test on a process.

# Arguments
- `::PPTest`: the bootstrap test result object
Collaborator

Suggested change
- `::PPTest`: the bootstrap test result object
- `::PPTest`: the test result object

rsenne commented Dec 11, 2025

Left my final comments as well!

JoseKling commented Dec 12, 2025

Incorporated the minor suggestions, but, more importantly, refactored the multi-threaded part of the tests.

I was getting an error in the test run (lts) saying that the vector `rngs = [Xoshiro(rand(rng, UInt)) for _ in 1:Threads.nthreads()]` was being accessed at index `nthreads + 1`. I found out that the thread ids inside the loop were no longer from 1 to `nthreads`, but from 2 to `nthreads + 1`.
I looked more into it and found this Julia Language Blog post and this Discord discussion. Good to keep in mind.
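For reference, the migration-safe pattern avoids indexing a buffer by `Threads.threadid()` altogether: each spawned task gets its own RNG, seeded from the master RNG before spawning. This is a sketch of the general idea, not this PR's exact code:

```julia
using Random

# Instead of `rngs[Threads.threadid()]` -- which breaks once tasks can
# migrate between threads -- give every chunk of work its own RNG up front.
function parallel_draws(master_rng, n_sims; nchunks=Threads.nthreads())
    chunks = Iterators.partition(1:n_sims, cld(n_sims, nchunks))
    tasks = map(collect(chunks)) do chunk
        rng = Xoshiro(rand(master_rng, UInt))  # seeded serially, before @spawn
        Threads.@spawn [rand(rng) for _ in chunk]
    end
    reduce(vcat, fetch.(tasks))
end
```

Seeding happens serially on the spawning task, so the master RNG is never touched concurrently, and no task ever needs to know which thread it runs on.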

rsenne commented Dec 13, 2025

Looks good to me. Just approved the merge

@JoseKling
Owner Author

Added a lock for accessing the master rng in the tests. Although `default_rng()` is thread-safe and would not cause problems without the lock, this works for any rng.
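The locked access might look roughly like this (a sketch with hypothetical names, not the PR's exact code):

```julia
using Random

# Serialize draws from the shared master RNG so that seeding per-task RNGs
# is safe even for RNG types that are not thread-safe.
const MASTER_LOCK = ReentrantLock()

function fresh_rng(master_rng)
    seed = lock(MASTER_LOCK) do
        rand(master_rng, UInt)   # the only access to the shared state
    end
    return Xoshiro(seed)         # independent RNG handed to one task
end
```

Only the single `rand` call on the master RNG is guarded; each task then works with its own `Xoshiro`, so there is no further contention.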

@JoseKling JoseKling merged commit 3f0e8ac into main Dec 16, 2025
5 checks passed
@JoseKling JoseKling deleted the GoFTest branch December 19, 2025 15:20