Crosscheck with Paper, complex matrix optimizations #137
- use general vmul! etc methods in the BlockDiagonal methods
- better support for irregularly sized square blocks
https://carstenbauer.github.io/MonteCarlo.jl/dev/examples/triangular_Hubbard/ clearly does not |
I benchmarked the new model with StructArrays - thermalization is twice as fast (roughly 0.5s -> 0.25s in my test), but measurements are just as slow (2s -> 2s, occupation and pairing susceptibility). I see two potential reasons for this:
Making use of StructArrays in measurements is tough atm. There is no left vs right multiplier like in |
- remove preceding / from relative paths
- don't put preceding / on relative path on update
- make FileWrapper hold absolute paths
- skip saving unloaded BufferedConfigRecorder
I think LoopVectorization is what's making tests slow. Precompilation doesn't help, but puts the time spent on the testset rather than include...
Judging from the Hofmann Berg paper, it's not the matrix element that gets daggered but the prefactor in the Hamiltonian. That is the same as taking the transposed element, because T must be Hermitian.
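To make the Hermiticity argument concrete, here is a minimal sketch. The 3-site hopping matrix and the value of `t` are made up for illustration and are not taken from the paper or from MonteCarlo.jl. It only checks that, for a Hermitian T, conjugating an element gives the transposed element.

```julia
using LinearAlgebra

# Hypothetical complex hopping amplitude (illustrative value only)
t = 1.0 + 0.5im

# Hypothetical 3-site hopping matrix, built to be Hermitian
T = ComplexF64[
    0.0      t        conj(t)
    conj(t)  0.0      t
    t        conj(t)  0.0
]

# True if conjugating ("daggering") every element gives the transposed element
function conj_equals_transpose(T::AbstractMatrix)
    return all(conj(T[i, j]) == T[j, i] for i in axes(T, 1), j in axes(T, 2))
end

@assert ishermitian(T)
@assert conj_equals_transpose(T)
```

So whether the code conjugates the prefactor or picks `T[j, i]` instead of `T[i, j]`, the result is the same as long as T is Hermitian.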
- add cache for maps
- add templates (as a light recipe for iterators)
- generalize Quad methods to allow discontinuous directions
- update measurements to use objects instead of types
I'm done with all of those
on-site needed for s-wave
This pr will add a crosscheck/example comparing to https://arxiv.org/pdf/1912.08848.pdf.
TODO
General Changes:
- `dagger` to `swapop` (see Better syntax for DQMC measurements #135)
- `superfluid_stiffness` based on this paper
Complex Linear Algebra
The model from the reference paper uses complex hoppings. To run it efficiently, this PR implements all the complex linear algebra that had been neglected so far. It also cleans up `BlockDiagonal` so that it works with ComplexF64 and can be combined with complex StructArrays. Closes #78
- `vmul!` (and alike) methods fall back onto other `vmul!` methods. This performs a little bit worse, but makes things generic. Maybe `@inline` would help?
- With `BLAS.set_num_threads(1)` our methods win. I assume that in the one-simulation-per-core setting (i.e. on a supercomputer) the single-threaded BLAS performance is the relevant one.
Lattice Iterator revamp/changes
The reference paper calculates the current-current correlations with unsynchronized directions. However, it's not just every bond up to the K-th farthest. It's every bond that has a hopping associated with it, excluding reversed bonds (these are explicitly included in the formula before applying Wick's theorem) and excluding bonds with `dir[bond][1] == 0`. This gets rid of quite a few bonds. With our current iterators and NN and NNN bonds, for example, we'd include 9 directions (on-site, 4x NN, 4x NNN) but only really need 3 (+x NN, 2 +x NNN). And because it's unsynchronized, the number of directions enters squared, so we do 9²/3² = 9 times the work.
To fix this I've revamped the lattice iterator system here. The main goal was to allow passing directional indices directly, e.g. `EachLocalQuadByDistance([2, 5, 9])`. I also wanted to stop relying on only one of each lattice iterator type being created at the start of the simulation to keep memory usage down. So I made the following changes:
There is now a `LatticeIteratorCache` attached to DQMC. Perhaps later I will put this together with the lattice. The cache contains a dict that gets filled with maps such as `dir_idx -> (src, trg)`, which are accessible via `cache[Dir2SrcTrg()]` and can be created via `push!(cache, Dir2SrcTrg(), lattice)`. Every lattice iterator will create the maps it needs (no duplicates) and holds references to them for iteration. So now the difference between 1 and 1000 `EachLocalQuadByDistance` iterators is just 3000 integers and 3000 references.
The second change I made was that every iterator now has a template version and a runtime version. The template is light, used as a field in DQMCMeasurements and can be saved without extra care. The runtime version (same type name with _ in front) follows from the template and will construct the maps it needs during creation.
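The cache and the template/runtime split described above could be sketched roughly as follows. This is not the MonteCarlo.jl implementation: every name starting with `Sketch` (or `_Sketch`), the toy chain lattice, and the `build_map`, `runtime` and `count_quads` helpers are hypothetical stand-ins, chosen only to show how shared maps and light templates fit together.

```julia
# Hypothetical stand-ins for illustration; these are NOT the MonteCarlo.jl types.
struct SketchDir2SrcTrg end              # key type, plays the role of Dir2SrcTrg()

struct SketchCache
    maps::Dict{Any, Any}                 # key => map, filled on demand
end
SketchCache() = SketchCache(Dict{Any, Any}())

# Toy lattice: periodic chain of N sites; direction d means "hop by d-1 sites".
struct SketchChain
    N::Int
end

# dir_idx -> list of (src, trg) pairs
function build_map(::SketchDir2SrcTrg, l::SketchChain)
    return [[(src, mod1(src + d - 1, l.N)) for src in 1:l.N] for d in 1:l.N]
end

# Build a map only if it is not cached yet, so all iterators share one copy.
function Base.push!(cache::SketchCache, key, lattice)
    get!(() -> build_map(key, lattice), cache.maps, key)
    return cache
end
Base.getindex(cache::SketchCache, key) = cache.maps[key]

# Template: light, only stores the requested direction indices.
struct SketchQuadByDistance
    directions::Vector{Int}
end

# Runtime version (same name with _ in front): holds a reference to the shared map.
struct _SketchQuadByDistance
    directions::Vector{Int}
    dir2srctrg::Vector{Vector{Tuple{Int, Int}}}
end

function runtime(t::SketchQuadByDistance, cache::SketchCache, lattice)
    push!(cache, SketchDir2SrcTrg(), lattice)
    return _SketchQuadByDistance(t.directions, cache[SketchDir2SrcTrg()])
end

# Unsynchronized directions: every (d1, d2) pair is visited, so work grows quadratically.
function count_quads(it::_SketchQuadByDistance)
    n = 0
    for d1 in it.directions, d2 in it.directions
        n += length(it.dir2srctrg[d1]) * length(it.dir2srctrg[d2])
    end
    return n
end

cache = SketchCache()
chain = SketchChain(4)
a = runtime(SketchQuadByDistance([2, 3]), cache, chain)
b = runtime(SketchQuadByDistance([2]), cache, chain)
@assert a.dir2srctrg === b.dir2srctrg    # both iterators reference the same map
@assert count_quads(a) == 4 * count_quads(b)
```

In this toy setup a second iterator adds only its direction indices and one reference, and doubling the number of directions quadruples the work, which is the same scaling behind the 9²/3² = 9 estimate.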