Crosscheck with Paper, complex matrix optimizations #137

Merged: 59 commits merged into carstenbauer:master on Nov 1, 2021

Conversation

@ffreyer (Collaborator) commented on Sep 12, 2021

This PR will add a crosscheck/example comparing to https://arxiv.org/pdf/1912.08848.pdf.

TODO

  • finish complex linalg optimizations
  • implement model
  • get matching observables (t5 = 0, L = 8 for simplicity)
    • Z - spin susceptibility (figure 1d red)
    • Superfluid Stiffness (figure 2b)
    • s-wave pairing correlation (figure 3a, solid blue)
    • charge susceptibility (figure 3a, solid red)
  • document the crosscheck as an example
  • update other documentation

General Changes:

  • switch to $\Delta(i) \Delta^\dagger(k) + \Delta^\dagger(i) \Delta(k)$ for the pairing correlation. This matches the paper above (removing terms that are 0 in DQMC) and some others (e.g. the triangular crosscheck); see the sketch after this list.
  • fix some issues with loading moved BufferedConfigRecorder files and saving uninitialized BufferedConfigRecorder
  • better heuristic for counting the number of nearest neighbors (by actually counting equal-length distances)
  • add warning if T is not (approximately) hermitian
  • rename dagger to swapop (see Better syntax for DQMC measurements #135)
  • add superfluid_stiffness based on this paper
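
For context, here is a minimal sketch of what the combined estimator reduces to after Wick's theorem, assuming a spin-symmetric equal-time Greens function G[i, k] = ⟨c_i c†_k⟩ (per spin sector) and s-wave pairs Δ(i) = c_{i↑} c_{i↓}. This is an illustration, not the package's actual kernel:

```julia
# Illustrative sketch only (not MonteCarlo.jl's pc kernel): symmetrized s-wave
# pairing correlation ⟨Δ(i)Δ†(k) + Δ†(i)Δ(k)⟩ from a spin-symmetric equal-time
# Greens function G with G[i, k] = ⟨c_i c†_k⟩ per spin sector.
function combined_pairing_sketch(G::AbstractMatrix, i::Integer, k::Integer)
    # ⟨Δ(i)Δ†(k)⟩ = G↑(i, k) * G↓(i, k)
    term1 = G[i, k] * G[i, k]
    # ⟨Δ†(i)Δ(k)⟩ = (δ_ki - G(k, i))↑ * (δ_ki - G(k, i))↓
    IG = (i == k) - G[k, i]
    return term1 + IG * IG
end
```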

Complex Linear Algebra

The model from the reference paper uses complex hoppings. To run it efficiently, this PR implements all the previously neglected complex linear algebra. It also cleans up BlockDiagonal so that it works with ComplexF64 and can be combined with complex StructArrays. Closes #78

  • cleanup/fix BlockDiagonal for generic matrices (i.e. complex)
    • let all vmul! (and similar) methods fall back onto other vmul! methods. This performs a little worse but keeps things generic. Maybe @inline would help?
    • remove some resulting redundancies
    • generalize most methods to arbitrarily sized blocks (though typically still square)
  • finish implementing Complex StructArrays
    • closes complex matrices are not optimized #78
    • note that BLAS uses threads by default while our LoopVectorization methods do not. With multiple threads BLAS outperforms us; with BLAS.set_num_threads(1) our methods win. I assume the single-threaded BLAS performance is the relevant one in the one-simulation-per-core setting (i.e. on a supercomputer). A sketch of such a kernel follows right after this list.
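
For illustration, a matrix multiply over the split real/imaginary storage might look roughly like the sketch below. This is not the package's actual vmul! implementation, just a minimal version assuming the complex matrices are StructArrays exposing .re and .im field arrays:

```julia
using StructArrays, LoopVectorization

# Sketch: C = A * B for complex matrices stored as StructArrays, operating on
# the separate real and imaginary backing arrays with LoopVectorization.
function vmul_sketch!(C::StructArray{ComplexF64, 2},
                      A::StructArray{ComplexF64, 2},
                      B::StructArray{ComplexF64, 2})
    Cre, Cim = C.re, C.im
    Are, Aim = A.re, A.im
    Bre, Bim = B.re, B.im
    @turbo for i in axes(Are, 1), j in axes(Bre, 2)
        cre = zero(eltype(Cre))
        cim = zero(eltype(Cim))
        for k in axes(Are, 2)
            # (a + ib)(c + id) = (ac - bd) + i(ad + bc)
            cre += Are[i, k] * Bre[k, j] - Aim[i, k] * Bim[k, j]
            cim += Are[i, k] * Bim[k, j] + Aim[i, k] * Bre[k, j]
        end
        Cre[i, j] = cre
        Cim[i, j] = cim
    end
    return C
end
```

For the threading comparison above, LinearAlgebra.BLAS.set_num_threads(1) is what pins BLAS to a single thread.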

Lattice Iterator revamp/changes

The reference paper calculates the current-current correlations with unsynchronized directions. However, it's not just every bond up to the K-th farthest: it's every bond that has a hopping associated with it, excluding reversed bonds (these are explicitly included in the formula pre-Wick's) and bonds with dir[bond][1] == 0. This gets rid of quite a few bonds. With our current iterators and NN and NNN bonds, for example, we'd include 9 directions (on-site, 4x NN, 4x NNN) but only really need 3 (the +x NN direction and the 2 NNN directions with positive x component). And because the directions are unsynchronized, their number enters squared, so we would do 9²/3² = 9 times the necessary work.

To fix this I've revamped the lattice iterator system here. The main goal was to allow passing directional indices directly, e.g. EachLocalQuadByDistance([2, 5, 9]). I also wanted to keep memory usage down without relying on only one of each lattice iterator type being created at the start of the simulation. So I made the following changes:

There is now a LatticeIteratorCache attached to DQMC. Perhaps later I will merge this with the lattice. The cache contains a dict that gets filled with maps such as dir_idx -> (src, trg), which are accessible via cache[Dir2SrcTrg()] and can be created via push!(cache, Dir2SrcTrg(), lattice). Every lattice iterator creates the maps it needs (no duplicates) and holds references to them for iteration. So now the difference between 1 and 1000 EachLocalQuadByDistance iterators is just 3000 integers and 3000 references.
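
A rough sketch of that mechanism (LatticeIteratorCache, Dir2SrcTrg and the cache[...] / push!(...) calls are taken from the description above; the Dict layout and the directions_to_pairs helper are made up for illustration):

```julia
# Illustrative sketch, not the package code: a cache of lattice-derived maps
# that iterators share by reference instead of rebuilding them.
struct Dir2SrcTrg end   # key: "directional index -> list of (src, trg) pairs"

struct LatticeIteratorCache
    maps::Dict{DataType, Any}
end
LatticeIteratorCache() = LatticeIteratorCache(Dict{DataType, Any}())

# Hypothetical helper: group all (src, trg) site pairs of `lattice` by the
# index of their direction, so result[dir_idx] is a Vector{Tuple{Int, Int}}.
function directions_to_pairs(lattice)
    # ... would be filled from the lattice geometry ...
    return Vector{Tuple{Int, Int}}[]
end

# Build each map at most once; repeated push!-es with the same key are no-ops.
function Base.push!(cache::LatticeIteratorCache, ::Dir2SrcTrg, lattice)
    get!(() -> directions_to_pairs(lattice), cache.maps, Dir2SrcTrg)
    return cache
end

Base.getindex(cache::LatticeIteratorCache, ::Dir2SrcTrg) = cache.maps[Dir2SrcTrg]
```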

The second change I made is that every iterator now has a template version and a runtime version. The template is lightweight, is used as a field in DQMCMeasurements, and can be saved without extra care. The runtime version (same type name with a _ in front) follows from the template and constructs the maps it needs during creation.
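
Continuing the sketch above, the template/runtime split could look roughly like this (the underscore naming follows the description; the fields and constructor are illustrative):

```julia
# Illustrative sketch: lightweight template vs. heavier runtime iterator.
struct EachLocalQuadByDistance          # template: cheap, serializable
    dir_idxs::Vector{Int}
end

struct _EachLocalQuadByDistance{M}      # runtime version built from the template
    dir_idxs::Vector{Int}
    dir2srctrg::M                       # reference into the shared cache
end

# The runtime iterator registers the map it needs in the cache (creating it
# only if it does not exist yet) and keeps a reference to it.
function _EachLocalQuadByDistance(template::EachLocalQuadByDistance, cache, lattice)
    push!(cache, Dir2SrcTrg(), lattice)          # from the sketch above
    _EachLocalQuadByDistance(template.dir_idxs, cache[Dir2SrcTrg()])
end
```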

@ffreyer (Collaborator, Author) commented on Sep 21, 2021

https://carstenbauer.github.io/MonteCarlo.jl/dev/examples/triangular_Hubbard/ clearly does not use $\Delta^\dagger \Delta + \Delta \Delta^\dagger$ for its pairing correlation. It fits way worse than $\Delta \Delta^\dagger$. This needs to be adjusted if I go through with switching the default from pc_kernel to pc_combined_kernel.
The reference for this PR clearly does use the combined kernel though...

@ffreyer (Collaborator, Author) commented on Sep 22, 2021

I benchmarked the new model with StructArrays - thermalization is twice as fast (roughly 0.5s -> 0.25s in my test), but measurements are just as slow (2s -> 2s, occupation and pairing susceptibility).

I see two potential reasons for this:

  1. The real and imaginary parts of the Greens function are not indexed in separate loops, which is not cache friendly.
  2. Lattice iterators themselves are probably not cache friendly. They're essentially semi-random lists of indices. But given that they are static per simulation there is probably some optimization potential here as well...

@ffreyer (Collaborator, Author) commented on Sep 27, 2021

Making use of StructArrays in measurements is tough at the moment. There is no left vs. right multiplier like in mul!, matrix elements could technically be multiplied by i, or conjugates could be taken. So instead I'm just converting complex StructArrays back to plain complex matrices. This isn't optimal, but it's still a lot better... (~2s -> ~1.33s)
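
For illustration, the conversion amounts to something like this sketch (variable names made up):

```julia
using StructArrays

# A StructArray-backed complex matrix keeps its real and imaginary parts in
# two separate Float64 arrays.
G_split = StructArray{ComplexF64}((rand(8, 8), rand(8, 8)))

# Convert back to a plain dense complex matrix before running measurement kernels.
G_dense = complex.(G_split.re, G_split.im)    # Matrix{ComplexF64}
```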

I think LoopVectorization is what's making the tests slow. Precompilation doesn't help, it just moves the time spent from the include to the testset...
@ffreyer changed the title from "Crosscheck with Paper" to "Crosscheck with Paper, complex matrix optimizations" on Oct 12, 2021
@ffreyer merged commit b348442 into carstenbauer:master on Nov 1, 2021
@ffreyer mentioned this pull request on Dec 13, 2021