-
Notifications
You must be signed in to change notification settings - Fork 2
Standard Hawkes Process #59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- src/hawkes/hawkes_process.jl
- Methods: rand, fit, time_change, ground_intensity,
integrated_ground_intensity, logdensityof
- tests/hawkes.jl
- Some unit tests for Hawkes processes
- history.jl
- added 'event_times(h, tmin, tmax)'
- added 'event_marks(h, tmin, tmax)'
- simulation.jl
- added 'simulate_poisson' - for other methods to use
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
jucheval
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not check every line, but there are already lots of changes requested and questions.
test/hawkes.jl
Outdated
| @test isa(h_sim, History{Nothing,Float64}) | ||
| # Estimation | ||
| hp_est = fit(HawkesProcess, h_sim) | ||
| @test isa(hp_est, HawkesProcess) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a test to check that the estimated parameters are close to the real ones ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am no sure how to do this, to be honest.
One way would be to simulate many processes with known parameters and then get the mean/median of the estimations, but how would I define a bound for how far off this can be from the true parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is to give us a warning in case of a harmful change in a future update.
You could chose a history and a rng seed such that the estimation is "close" to the true parameters. Then, add a unit test to check that this distance to the true parameters do not deteriorate in the next updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is the best approach, but, in principle, this would show us if something changes the algorithm too much.
Could make it simpler by just fixing a seed and checking if it is close to the median, if this is too much.
Also, I saved the result of a simulation in a text file, because this is recommended in the Random.jl documentation, as they might change the algorithm and, therefore, change the results in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Julien is right, checking proximity between estimated and true parameters is an excellent way to guard against mistakes or breaking changes.
You can use StableRNGs.jl to get reproducible random seeds across versions and platforms, which makes text files unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Room for improvement is good ; )
About your question, the problem is just a poor explanation in the comments.
In reality this t90 is the inverse of the time it takes for a 90% decay. The only reason for sampling the inverse of t90 instead of t90 is that this ensures that the distribution of
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, got it. I assumed it was something like this but wanted to make sure there wasn't a bug. I'll move forward on optimizing a bit!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I optimized to significantly improve performance and increase memory efficiency. The main change replaces the previous O(n²) double loop used to compute descendant probabilities with an O(n) recurrence formulation based on running sums (A and S). This eliminates redundant computations and scales efficiently with the number of events. I also introduced a few small buffers to avoid unnecessary array allocations and removed redundant exponential recalculations within the EM loop. This caused the fit time to go from 55s for ~8k events to 10.53 ms without changing the final estimates. All tests still pass as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rsenne!
Completely missed that! It is even in the paper, actually. Anyway, that's the good thing about having extra pair of eyes going through it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happens to the best of us, glad I could help :)
- Fixed `logdensityof` - HawkesProcess is now parametric and has as check in the inner constructor - Added explanation for `generate_descendants` and choice of initial guess in `fit`
- Fixed logdensityof - HawkesProcess is now parametric and has as check in the inner constructor - Added explanation for generate_descendants and choice of initial guess in fit Fixed formatting
- Changed parameter names of `HawkesProcess` - Added test for estimation
- Changed parameter names of `HawkesProcess` - Added test for estimation Forgot to run the formatter
- Changed parameter names of `HawkesProcess` - Added test for estimation Forgot to run the formatter Removed the docstring from `simulate_poisson_times` causing docs deployment problem.
gdalle
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome that you're getting started with new processes!
I left a few comments to try and guide you towards idiomatic Julia code, let me know what you think!
src/hawkes/hawkes_process.jl
Outdated
| Expectation-Maximization algorithm from [E. Lewis, G. Mohler (2011)](https://api.semanticscholar.org/CorpusID:2105771)). | ||
| The relevant calculations are in page 4, equations 6-13. | ||
|
|
||
| Let (t_1 < ... < t_N) be the event times over the interval [0, T). We use the immigrant-descendant representation, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible, one should refrain from using LaTeX in docstrings because it renders badly in the REPL. If you have to include formulas or symbols, it's better to put them inside backticks so that they display as monospace code, and then add a page to the Documenter.jl documentation for proper LaTeX equations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can visualize the docstring by typing
?fit
in a Julia REPL
src/hawkes/hawkes_process.jl
Outdated
| T = duration(h) | ||
| norm_ts = h.times .* (N / T) # Average inter-event time equal to 1. Numerical stability | ||
| n_iters = 0 | ||
| step = step_tol + 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whenever you use floating point numbers like 1.0, you're forcing the entire computation to happen in Float64, even if the user provided Float32 everywhere. This could be replaced by one(T) for the right choice of T.
What is the right choice of T? Probably the result of the promotion between the time type of the History and the parameter type of the HawkesProcess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the types returned by rand and time_change too. The rule I used is the following:
time_change(hp::HawkesProcess{HP}, h::{M,H})
- Returns a
History{M,H}, same type as argument
fit(::Type{HawkesProcess{HP}}, h::History{M,H})
- Returns a
HawkesProcess{HP}, the exact type passed as an argument - If argument passed is
HawkesProcess(without a type parmeter), returnHawkesProcess{T}, whereT == promote_type(Float64, H).
rand(hp::HawkesProcess{HP}, tmin, tmax)
- Returns a
History{M,H}whereH == promote_type(typeof(tmin), typeof(tmax))
src/hawkes/hawkes_process.jl
Outdated
| # Internal function for simulating Hawkes processes | ||
| # The first generation, gen_0, is the `immigrants`, which is a set of event times. | ||
| # For each t_g ∈ gen_n, simulate an inhomogeneous Poisson process over the interval [t_g, T] | ||
| # with intensity λ(t) = α exp(-ω(t - t_g)) with the inverse method. | ||
| # gen_{n+1} is the set of all events simulated from all events in gen_n. | ||
| # The algorithm stops when the simulation from one generation results in no further events. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use block comments like so:
#=
First line
second line
=#
src/hawkes/hawkes_process.jl
Outdated
| function generate_descendants(rng::AbstractRNG, immigrants::Vector, T, α, ω) | ||
| descendants = eltype(immigrants)[] | ||
| next_gen = immigrants | ||
| while ~isempty(next_gen) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| while ~isempty(next_gen) | |
| while !isempty(next_gen) |
src/history.jl
Outdated
| Return the sorted vector of event times between `tmin` and `tmax` for `h`. | ||
| """ | ||
| function event_times(h::History, tmin, tmax) | ||
| h.times[searchsortedfirst(h.times, tmin):(searchsortedfirst(h.times, tmax) - 1)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be a @view?
test/hawkes.jl
Outdated
| @test isa(h_sim, History{Nothing,Float64}) | ||
| # Estimation | ||
| hp_est = fit(HawkesProcess, h_sim) | ||
| @test isa(hp_est, HawkesProcess) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Julien is right, checking proximity between estimated and true parameters is an excellent way to guard against mistakes or breaking changes.
You can use StableRNGs.jl to get reproducible random seeds across versions and platforms, which makes text files unnecessary.
test/hawkes_times.txt
Outdated
| @@ -0,0 +1 @@ | |||
| [0.01017337415246089, 0.013640560097159748, 0.013925428236956083, 0.014608995173263944, 0.015735173436628665, 0.017731455643465788, 0.018173140414903033, 0.019398584582704777, 0.02169310038049055, 0.026539311564590558, 0.03535296046523939, 0.035385747642310054, 0.04888791827373706, 0.049245437622213895, 0.05207317680771159, 0.05417399567567511, 0.06282683568521101, 0.07081244234996387, 0.0791015159737285, 0.0861333024505229, 0.087551141093, 0.08772509630067638, 0.09411335483093886, 0.09724491891582732, 0.0988625084503767, 0.10010837332313449, 0.10135018373031557, 0.10363364868768966, 0.10592656977963455, 0.10715180725875367, 0.11028563316661419, 0.1134331444864608, 0.12272955724493595, 0.12726221430956405, 0.12836304084244704, 0.12844801940952075, 0.13597112047166005, 0.15072073156516308, 0.15114633616168527, 0.15451628188316555, 0.1615175726493574, 0.17404961232856206, 0.19135505688083287, 0.22076206529908327, 0.23982012455994361, 0.24316977218179758, 0.24474350243009585, 0.24644188741540635, 0.24758741892662547, 0.26335250468309224, 0.2657968651067172, 0.26846031955131877, 0.2714547249648518, 0.2752925737618863, 0.27565342198838594, 0.27609484665183454, 0.2793520184225446, 0.28043167139411607, 0.28266772707753685, 0.2892355684930643, 0.28935629867942364, 0.2910561718681186, 0.29157899048672486, 0.30523440749876696, 0.31427715680728796, 0.3160799465423002, 0.3186409783533404, 0.31928256511258085, 0.3561905127846101, 0.35726422828011145, 0.37542826109696614, 0.3761737738905845, 0.383211879818265, 0.38412250885804855, 0.3892202718510143, 0.38973687161088244, 0.3903222364578105, 0.3921460092118373, 0.3927565640326585, 0.3929931412548428, 0.39550199927854035, 0.4004287509915203, 0.4012584947658139, 0.4188933113083467, 0.4222797931631154, 0.42547015568968916, 0.43096758007962505, 0.4318550285906202, 0.44109809732147987, 0.4411113062102109, 0.4414891960155095, 0.4419159595553975, 0.4455996720525952, 0.4496160036372285, 0.4498795159779206, 0.45469339795091446, 0.45799213961322816, 0.4582125417978449, 0.4593459923561455, 0.460330625273055, 0.46151099501089304, 0.4663379688495272, 0.4668223709854475, 0.4689158086068933, 0.47513195296839694, 0.48261581524557473, 0.5018848417890917, 0.5082613332032163, 0.5098025975578675, 0.5103574120239697, 0.5105994460715356, 0.5251305262758121, 0.5394744828867348, 0.5398929198384044, 0.5451644369828262, 0.5456458484033474, 0.5461471626817158, 0.5562494823653819, 0.5577837518050861, 0.5629099224574312, 0.5639313234396448, 0.5652484663007217, 0.56579448745316, 0.569761662149659, 0.5772534932071732, 0.57747271973829, 0.581312103742883, 0.5897117080989583, 0.6042594925582441, 0.6062770267185059, 0.6146506742853325, 0.619164121825413, 0.6233280853442464, 0.6562295767400433, 0.657191314194358, 0.6683396665157701, 0.6934098253153014, 0.7098168085084123, 0.7142551495263125, 0.7149292250425998, 0.7222898663646184, 0.7230225303240918, 0.7258703766070407, 0.7329257086975677, 0.734965964575541, 0.7367052753163055, 0.7369917675462134, 0.7377952054108926, 0.7431668020456873, 0.7442335599332027, 0.7494356056840505, 0.7592464069425915, 0.76513378821372, 0.7778400193541839, 0.783570871666886, 0.7892533149681009, 0.8151907848154715, 0.8163624371624438, 0.8203512192351479, 0.8231907075189483, 0.8254450315531231, 0.8327541619684787, 0.8340446391585701, 0.8365483397626655, 0.8380870238704485, 0.8396006152087313, 0.8424109556863868, 0.8449279007257848, 0.8493882720082039, 0.8499247624654398, 0.8570201197254786, 0.8708262595580382, 0.8757516578617132, 0.8851984472410247, 0.8943124667367437, 0.9099513775965703, 0.9158323560687229, 0.916124781335173, 0.9186099384299968, 0.9207850430116301, 0.9261702419474741, 0.9266763718478324, 0.927525059408, 0.9326181870421033, 0.9332024579347786, 0.9369755618858636, 0.9371907441403706, 0.9389797249122629, 0.939953948070523, 0.9420001470511369, 0.943798275663379, 0.9466861545841257, 0.947173907438536, 0.94880346770712, 0.9500849345227314, 0.953356815578679, 0.9535829930299445, 0.9596111063743575, 0.9686141854288621, 0.9810644372158931] No newline at end of file | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above, this is not needed
- Added docstring to `HawkesProcess` - Avoided LaTeX in docstrings - Remove all `@inbounds`
- Reference parameters inside `hawkes_times.txt` file, instead of hard coded in tests. - Fixed return types for `rand`, `fit`, and `time_change` and added tests.
|
Hello. |
|
Anything else we want to do for this PR? |
|
I think all issues were resolved... |
Are we good to merge then? What would be good to work on next? |
jucheval
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I quickly looked over the changes and they seem fine to me.
|
Since this is a big PR and @gdalle was quite involved, I just want to make sure. Is this good to go or do you have anything to add? |
|
This is your project now, fire away :) |
src/hawkes/hawkes_process.jl
tests/hawkes.jl
history.jl
simulation.jl