
Simplify color algebra (A-iB)(M)(A+iB) as AMA+BMB#457

Merged
valassi merged 17 commits into madgraph5:master from valassi:quad
May 20, 2022

Conversation

@valassi
Member

@valassi valassi commented May 18, 2022

This is a spinoff of #155 about color algebra optimisations.

While discussing with @roiser we realised that one can rewrite (A-iB)(M)(A+iB) as AMA+BMB in the color algebra calculation (where A and B are real vectors of dimension ncol and M is a real symmetric ncol*ncol matrix).

This will in any case simplify tensor core calculations (as there is no need to deal with complex arithmetics, one can just use two real quadratic forms).

I was also naively expecting that this might improve performance by a factor 2 or 4, because some calculations are no longer needed. It turns out that the physics results are unchanged, but performance is also exactly the same.

I imagine this may be because, in the older implementation, the C++ compiler was already optimising away the calculations whose results are never stored. Essentially the change is this:


      // Sum and square the color flows to get the matrix element
      // (compute |M|^2 by squaring |M|, taking into account colours)
      fptype_sv deltaMEs = { 0 }; // all zeros https://en.cppreference.com/w/c/language/array_initialization#Notes
      for( int icol = 0; icol < ncolor; icol++ )
      {
        cxtype_sv ztemp_sv = cxzero_sv();
        for( int jcol = 0; jcol < ncolor; jcol++ )
          ztemp_sv += cf[icol][jcol] * jamp_sv[jcol];
        // OLD implementation (slower?)
        //deltaMEs += cxreal( ztemp_sv * cxconj( jamp_sv[icol] ) ) / denom[icol];
        // NEW implementation (faster?) 
        // Rewrite the quadratic form (A-iB)(M)(A+iB) as AMA - iBMA + iBMA + BMB = AMA + BMB!
        deltaMEs += ( cxreal( ztemp_sv ) * cxreal( jamp_sv[icol] ) +
                      cximag( ztemp_sv ) * cximag( jamp_sv[icol] ) ) / denom[icol];
      }

This is almost a no-op, but I would merge it anyway. Maybe I will first rerun a full set of tests, just as a good habit.

I only tested one log and essentially there is no difference with upstream/master:
ed0890b

valassi added 14 commits May 18, 2022 14:53
Revert "[quad] rerun ggttgg with rewriting (A-iB)(M)(A+iB) as AMA + BMB... actually not faster?!"
This reverts commit a4f6df6.
… AMA + BMB - faster than first attempt, but as fast as upstream/master?!
…- faster than first attempt, but as fast as upstream/master?!
Revert "[quad] rerun ggttgg with new rewrite of (A-iB)(M)(A+iB) as AMA + BMB - faster than first attempt, but as fast as upstream/master?!"
This reverts commit 4f459d9334a41840cec0a4cbc223401aa2c36bac.
…and performance as upstream/master (?!)

It is possible that in the old implementation the compiler was optimising away the calculation
of the imaginary part of ztemp_sv * cxconj( jamp_sv[icol] ), which is not used/stored anywhere?

Anyway, keep the new implementation because it is clearer - and potentially easier for tensor cores
valassi added 3 commits May 19, 2022 09:49
NB I have not yet rerun all logs - just checked on ggtt that no performance difference is expected
…ange in performance

STARTED  AT Wed May 18 21:32:23 CEST 2022
ENDED(1) AT Wed May 18 22:40:43 CEST 2022
ENDED(2) AT Wed May 18 23:06:23 CEST 2022
ENDED(3) AT Wed May 18 23:09:53 CEST 2022
ENDED(4) AT Wed May 18 23:12:16 CEST 2022
ENDED(5) AT Wed May 18 23:14:35 CEST 2022
…sier merging

Revert "[quad] rerun 56 logs manu from allTees.sh - all ok, essentially no change in performance"
This reverts commit 1e5162c.
@valassi
Member Author

valassi commented May 20, 2022

All tests passed, I am self-merging

@valassi valassi merged commit b590089 into madgraph5:master May 20, 2022
valassi added a commit to valassi/madgraph4gpu that referenced this pull request May 20, 2022
valassi added a commit to valassi/madgraph4gpu that referenced this pull request May 20, 2022
STARTED  AT Thu May 19 20:20:35 CEST 2022
./tput/teeThroughputX.sh -mad -flt -hrd -makej -makeclean -eemumu -ggtt -ggttg -ggttgg -ggttggg
ENDED(1) AT Thu May 19 22:31:12 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -flt -hrd -makej -makeclean -eemumu -ggtt -ggttgg -inlonly
ENDED(2) AT Thu May 19 22:44:12 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -eemumu -ggtt -ggttgg -flt -bridge
ENDED(3) AT Thu May 19 22:48:12 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Thu May 19 22:50:41 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Thu May 19 22:53:04 CEST 2022 [Status=0]
