
Simplify color algebra (A-iB)(M)(A+iB) as AMA+BMB#457

Merged
valassi merged 17 commits into madgraph5:master from valassi:quad
May 20, 2022

Conversation

@valassi
Member

@valassi valassi commented May 18, 2022

This is a spinoff of #155 about color algebra optimisations.

While discussing with @roiser we realised that one can rewrite (A-iB)(M)(A+iB) as AMA+BMB in the color algebra calculation (where A and B are real vectors of dimension ncol and M is a real symmetric ncol*ncol matrix).

This will in any case simplify tensor core calculations (as there is no need to deal with complex arithmetics, one can just use two real quadratic forms).

I was also naively expecting that this might improve performance by a factor 2 or 4, because some calculations are no longer needed. It turns out that the physics results are unchanged, but performance is also exactly the same.

I imagine this may be because, in the older implementation, the C++ compiler was already optimising away the calculations whose results are never stored. Essentially the change is this:


      // Sum and square the color flows to get the matrix element
      // (compute |M|^2 by squaring |M|, taking into account colours)
      fptype_sv deltaMEs = { 0 }; // all zeros https://en.cppreference.com/w/c/language/array_initialization#Notes
      for( int icol = 0; icol < ncolor; icol++ )
      {
        cxtype_sv ztemp_sv = cxzero_sv();
        for( int jcol = 0; jcol < ncolor; jcol++ )
          ztemp_sv += cf[icol][jcol] * jamp_sv[jcol];
        // OLD implementation (slower?)
        //deltaMEs += cxreal( ztemp_sv * cxconj( jamp_sv[icol] ) ) / denom[icol];
        // NEW implementation (faster?) 
        // Rewrite the quadratic form (A-iB)(M)(A+iB) as AMA - iBMA + iBMA + BMB = AMA + BMB!
        deltaMEs += ( cxreal( ztemp_sv ) * cxreal( jamp_sv[icol] ) +
                      cximag( ztemp_sv ) * cximag( jamp_sv[icol] ) ) / denom[icol];
      }

This is almost a no-op, but I would merge it anyway. Maybe I will first rerun a full set of tests, just as a good habit.

I only tested one log and essentially there is no difference with upstream/master:
ed0890b

valassi added 14 commits May 18, 2022 14:53
Revert "[quad] rerun ggttgg with rewriting (A-iB)(M)(A+iB) as AMA + BMB... actually not faster?!"
This reverts commit a4f6df6.
… AMA + BMB - faster than first attempt, but as fast as upstream/master?!
…- faster than first attempt, but as fast as upstream/master?!
Revert "[quad] rerun ggttgg with new rewrite of (A-iB)(M)(A+iB) as AMA + BMB - faster than first attempt, but as fast as upstream/master?!"
This reverts commit 4f459d9334a41840cec0a4cbc223401aa2c36bac.
…and performance as upstream/master (?!)

It is possible that in the old implementation the compiler was optimising away the calculation
of the imaginary part of ztemp_sv * cxconj( jamp_sv[icol] ), which is not used/stored anywhere?

Anyway, keep the new implementation because it is clearer - and potentially easier for tensor cores
valassi added 3 commits May 19, 2022 09:49
NB I have not yet rerun all logs - just checked on ggtt that no performance difference is expected
…ange in performance

STARTED  AT Wed May 18 21:32:23 CEST 2022
ENDED(1) AT Wed May 18 22:40:43 CEST 2022
ENDED(2) AT Wed May 18 23:06:23 CEST 2022
ENDED(3) AT Wed May 18 23:09:53 CEST 2022
ENDED(4) AT Wed May 18 23:12:16 CEST 2022
ENDED(5) AT Wed May 18 23:14:35 CEST 2022
…sier merging

Revert "[quad] rerun 56 logs manu from allTees.sh - all ok, essentially no change in performance"
This reverts commit 1e5162c.
@valassi
Member Author

valassi commented May 20, 2022

All tests passed, I am self-merging

@valassi valassi merged commit b590089 into madgraph5:master May 20, 2022
valassi added a commit to valassi/madgraph4gpu that referenced this pull request May 20, 2022
valassi added a commit to valassi/madgraph4gpu that referenced this pull request May 20, 2022
STARTED  AT Thu May 19 20:20:35 CEST 2022
./tput/teeThroughputX.sh -mad -flt -hrd -makej -makeclean -eemumu -ggtt -ggttg -ggttgg -ggttggg
ENDED(1) AT Thu May 19 22:31:12 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -flt -hrd -makej -makeclean -eemumu -ggtt -ggttgg -inlonly
ENDED(2) AT Thu May 19 22:44:12 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -eemumu -ggtt -ggttgg -flt -bridge
ENDED(3) AT Thu May 19 22:48:12 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Thu May 19 22:50:41 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Thu May 19 22:53:04 CEST 2022 [Status=0]
