Simplify color algebra (A-iB)(M)(A+iB) as AMA+BMB #457
Merged: valassi merged 17 commits into madgraph5:master on May 20, 2022
Conversation
…tually not faster?!
Revert "[quad] rerun ggttgg with rewriting (A-iB)(M)(A+iB) as AMA + BMB... actually not faster?!" This reverts commit a4f6df6.
… AMA + BMB - faster than first attempt, but as fast as upstream/master?!
…- faster than first attempt, but as fast as upstream/master?!
Revert "[quad] rerun ggttgg with new rewrite of (A-iB)(M)(A+iB) as AMA + BMB - faster than first attempt, but as fast as upstream/master?!" This reverts commit 4f459d9334a41840cec0a4cbc223401aa2c36bac.
…ot faster than upstream/master
…and performance as upstream/master (?!) It is possible that in the old implementation the compiler was optimising away the calculation of the imaginary part of ztemp_sv * cxconj( jamp_sv[icol] ), which is not used/stored anywhere? Anyway, keep the new implementation because it is clearer - and potentially easier for tensor cores
…sults for physics and performance
NB I have not yet rerun all logs - just checked on ggtt that no performance difference is expected
…ange in performance
STARTED AT Wed May 18 21:32:23 CEST 2022
ENDED(1) AT Wed May 18 22:40:43 CEST 2022
ENDED(2) AT Wed May 18 23:06:23 CEST 2022
ENDED(3) AT Wed May 18 23:09:53 CEST 2022
ENDED(4) AT Wed May 18 23:12:16 CEST 2022
ENDED(5) AT Wed May 18 23:14:35 CEST 2022
…sier merging Revert "[quad] rerun 56 logs manu from allTees.sh - all ok, essentially no change in performance" This reverts commit 1e5162c.
valassi (Member, Author): All tests passed, I am self merging
valassi added a commit to valassi/madgraph4gpu that referenced this pull request on May 20, 2022
valassi added a commit to valassi/madgraph4gpu that referenced this pull request on May 20, 2022
STARTED AT Thu May 19 20:20:35 CEST 2022
./tput/teeThroughputX.sh -mad -flt -hrd -makej -makeclean -eemumu -ggtt -ggttg -ggttgg -ggttggg
ENDED(1) AT Thu May 19 22:31:12 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -flt -hrd -makej -makeclean -eemumu -ggtt -ggttgg -inlonly
ENDED(2) AT Thu May 19 22:44:12 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -eemumu -ggtt -ggttgg -flt -bridge
ENDED(3) AT Thu May 19 22:48:12 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Thu May 19 22:50:41 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Thu May 19 22:53:04 CEST 2022 [Status=0]
This is a spinoff of #155 about color algebra optimisations.
While discussing with @roiser we realised that one can rewrite (A-iB)(M)(A+iB) as AMA+BMB in the color algebra calculation (where A,B are vectors and M is a matrix, with ncol and ncol*ncol dimensions).
This will in any case simplify tensor core calculations (as there is no need to deal with complex arithmetic: one can just use two real quadratic forms).
I was also naively expecting that this might improve performance by a factor of 2 or 4, because some calculations are no longer needed. It turns out that the physics results are unchanged, but performance is also exactly the same.
I imagine this may be because, in the older implementation, the C++ compiler was in any case optimising away some calculations, skipping those whose results are never stored. Essentially the change is this:
madgraph4gpu/epochX/cudacpp/gg_ttgg/SubProcesses/P1_Sigma_sm_gg_ttxgg/CPPProcess.cc
Line 1942 in 6f5688b
This is almost a no-op but I would merge it anyway. Maybe I will first rerun a full set of tests just for good habit.
I only tested one log and essentially there is no difference with upstream/master:
ed0890b