fix link order in PETSc easyblock for SCOTCH >= 7.x #3069

Merged

boegel
Member

@boegel boegel commented Jan 5, 2024

(created using `eb --new-pr`)

When building petsc4py on top of PETSc/3.17.4-foss-2022a, I ran into the following problem during the `from petsc4py import PETSc` sanity check:

ImportError: /software/PETSc/3.17.4-foss-2022a/lib/libpetsc.so.3.17: undefined symbol: SCOTCH_dgraphOrderTreeDist

This is caused by #2796, which changed the link order of the SCOTCH libraries to alphabetical (on my request 🤦).
It's vital that libptscotch (which provides SCOTCH_dgraphOrderTreeDist) is linked in after libptscotchparmetisv3 (which requires SCOTCH_dgraphOrderTreeDist).
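
To illustrate why alphabetical ordering breaks here: typical Unix linkers scan static archives left to right, so an archive that provides a symbol has to be listed after the archive that needs it. The sketch below is not the easyblock code; `link_order_violations` and the `requires` mapping are hypothetical names used only to show the check, and the only dependency encoded is the one stated above.

```python
def link_order_violations(link_order, requires):
    """Return (consumer, provider) pairs where the provider is listed before the consumer.

    link_order: list of library file names in the order passed to the linker
    requires:   dict mapping a library to the libraries that provide its symbols
    (both are illustrative inputs, not part of EasyBuild)
    """
    position = {lib: idx for idx, lib in enumerate(link_order)}
    violations = []
    for consumer, providers in requires.items():
        for provider in providers:
            if consumer not in position or provider not in position:
                continue  # library not linked at all, nothing to check
            if position[provider] < position[consumer]:
                violations.append((consumer, provider))
    return violations


# dependency taken from the PR description:
# libptscotchparmetisv3 needs SCOTCH_dgraphOrderTreeDist from libptscotch
requires = {'libptscotchparmetisv3.a': ['libptscotch.a']}

# '.' sorts before 'p', so alphabetical order puts the provider first
alphabetical = sorted(['libptscotchparmetisv3.a', 'libptscotch.a'])
# order with the provider after the consumer
fixed = ['libptscotchparmetisv3.a', 'libptscotch.a']

print(link_order_violations(alphabetical, requires))
print(link_order_violations(fixed, requires))
```

With the alphabetical list the check reports the `libptscotchparmetisv3.a` / `libptscotch.a` pair; with the provider moved after the consumer it reports nothing.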

@boegel boegel added the bug fix label Jan 5, 2024
@boegel boegel added this to the release after 4.9.0 milestone Jan 5, 2024
@boegel
Member Author

boegel commented Jan 5, 2024

@boegelbot please test @ generoso
CORE_CNT=16
EB_ARGS="PETSc-3.15.1-intel-2021a.eb PETSc-3.17.4-foss-2022a.eb PETSc-3.18.4-intel-2021b.eb PETSc-3.19.2-foss-2022b.eb"

@boegelbot

@boegel: Request for testing this PR well received on login1

PR test command 'EB_PR=3069 EB_ARGS="PETSc-3.15.1-intel-2021a.eb PETSc-3.17.4-foss-2022a.eb PETSc-3.18.4-intel-2021b.eb PETSc-3.19.2-foss-2022b.eb" EB_CONTAINER= EB_REPO=easybuild-easyblocks /opt/software/slurm/bin/sbatch --job-name test_PR_3069 --ntasks="16" ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12516

Test results coming soon (I hope)...

- notification for comment with ID 1879230009 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

'libptscotcherr.a', 'libptscotchparmetisv3.a', 'libscotch.a',
'libscotcherr.a']
# which is the reason for this new code;
# note: order matters here, don't sort these alphabetically!
Member

but it's much easier for human eyes! :p

lgtm

Member Author

Yeah... :)

@boegel
Member Author

boegel commented Jan 6, 2024

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS PETSc-3.17.4-foss-2022a.eb
  • SUCCESS PETSc-3.19.2-foss-2022b.eb

Build succeeded for 2 out of 2 (2 easyconfigs in total)
node3123.skitty.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (skylake_avx512), Python 3.6.8
See https://gist.github.com/boegel/73b19fe05e6b65080d4d2c8e79424ddb for a full test report.

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

Build succeeded for 3 out of 4 (4 easyconfigs in total)
cnx1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/c9761e8065636243c8b2f277a49746ad for a full test report.

@migueldiascosta
Member

Hm, another issue with Intel MPI on generoso?

@boegel
Member Author

boegel commented Jan 6, 2024

> Hm, another issue with Intel MPI on generoso?

Hmm, yeah, maybe... I wouldn't let this block this PR though, since PETSc-3.18.4-intel-2021b.eb is not affected by these changes at all (it uses SCOTCH 6.x, and the fix only applies to SCOTCH >= 7.x).

@boegel
Member Author

boegel commented Jan 6, 2024

@boegelbot please test @ jsc-zen2
CORE_CNT=8
EB_ARGS="PETSc-3.15.1-intel-2021a.eb PETSc-3.17.4-foss-2022a.eb PETSc-3.18.4-intel-2021b.eb PETSc-3.19.2-foss-2022b.eb"

@easybuilders easybuilders deleted a comment from boegelbot Jan 6, 2024
@easybuilders easybuilders deleted a comment from boegelbot Jan 6, 2024
@boegelbot

@boegel: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster

PR test command 'EB_PR=3069 EB_ARGS="PETSc-3.15.1-intel-2021a.eb PETSc-3.17.4-foss-2022a.eb PETSc-3.18.4-intel-2021b.eb PETSc-3.19.2-foss-2022b.eb" EB_REPO=easybuild-easyblocks /opt/software/slurm/bin/sbatch --mem-per-cpu=4000M --job-name test_PR_3069 --ntasks="8" ~/boegelbot/eb_from_pr_upload_jsc-zen2.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 4017

Test results coming soon (I hope)...

- notification for comment with ID 1879595628 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel
Member Author

boegel commented Jan 6, 2024

More info from configure.log for failure with PETSc-3.18.4-intel-2021b.eb on generoso:

          UNABLE to CONFIGURE with GIVEN OPTIONS    (see configure.log for details):
 -------------------------------------------------------------------------------
 Timeout: Unable to run MPI program with /project/boegelbot/Rocky8/haswell/software/impi/2021.4.0-intel-compilers-2021.4.0/mpi/2021.4.0/bin/mpiexec -n 1
     (1) make sure this is the correct program to run MPI jobs
     (2) your network may be misconfigured; see https://petsc.org/release/faq/#mpi-network-misconfigure
     (3) you may have VPN running whose network settings may not play nice with MPI
 *******************************************************************************
   File "/tmp/boegelbot/PETSc/3.18.4/intel-2021b/petsc-3.18.4/config/configure.py", line 461, in petsc_configure
     framework.configure(out = sys.stdout)
   File "/tmp/boegelbot/PETSc/3.18.4/intel-2021b/petsc-3.18.4/config/BuildSystem/config/framework.py", line 1412, in configure
     self.processChildren()
   File "/tmp/boegelbot/PETSc/3.18.4/intel-2021b/petsc-3.18.4/config/BuildSystem/config/framework.py", line 1400, in processChildren
     self.serialEvaluation(self.childGraph)
   File "/tmp/boegelbot/PETSc/3.18.4/intel-2021b/petsc-3.18.4/config/BuildSystem/config/framework.py", line 1375, in serialEvaluation
     child.configure()
   File "/tmp/boegelbot/PETSc/3.18.4/intel-2021b/petsc-3.18.4/config/BuildSystem/config/package.py", line 1222, in configure
     self.executeTest(self.configureLibrary)
   File "/tmp/boegelbot/PETSc/3.18.4/intel-2021b/petsc-3.18.4/config/BuildSystem/config/base.py", line 138, in executeTest
     ret = test(*args,**kargs)
   File "/tmp/boegelbot/PETSc/3.18.4/intel-2021b/petsc-3.18.4/config/BuildSystem/config/packages/MPI.py", line 881, in configureLibrary
     self.executeTest(self.configureMPIEXEC)
   File "/tmp/boegelbot/PETSc/3.18.4/intel-2021b/petsc-3.18.4/config/BuildSystem/config/base.py", line 138, in executeTest
     ret = test(*args,**kargs)
   File "/tmp/boegelbot/PETSc/3.18.4/intel-2021b/petsc-3.18.4/config/BuildSystem/config/packages/MPI.py", line 377, in configureMPIEXEC
     raise RuntimeError('Timeout: %s' % error_message)

This looks like a very (very) basic MPI test: https://gitlab.com/petsc/petsc/-/blob/80f88c66aac8f9d1abe2df64bcd926bcfcc2a09a/config/BuildSystem/config/packages/MPI.py#L380

I can reproduce this issue easily when running in an interactive job started with `srun -c 16 --time 600 --pty /bin/bash`.

After running `for x in $(env | grep SLURM| cut -f1 -d=); do unset $x; done` to unset all `$SLURM_*` environment variables, configure works fine.
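
For reference, the same workaround can be sketched in Python as a filtered copy of the environment that is handed to a child process. This is only an illustration: `env_without_slurm` is a hypothetical helper, not something EasyBuild or PETSc provides, and unlike the `grep SLURM` loop above it only drops variables whose *name* starts with `SLURM_`.

```python
import os


def env_without_slurm(env=None):
    """Return a copy of the environment without any SLURM_* variables.

    env: mapping of variable names to values; defaults to os.environ
    (hypothetical helper, shown only to illustrate the workaround)
    """
    source = os.environ if env is None else env
    return {name: value for name, value in source.items()
            if not name.startswith('SLURM_')}


# example with a made-up environment
example = {'SLURM_JOB_ID': '12516', 'SLURM_NTASKS': '16', 'PATH': '/usr/bin'}
print(env_without_slurm(example))
```

The resulting dictionary could then be passed as the `env` argument of `subprocess.run` when starting the configure step by hand, leaving the environment of the calling shell untouched.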

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

Build succeeded for 8 out of 10 (4 easyconfigs in total)
jsczen2c1.int.jsc-zen2.easybuild-test.cluster - Linux Rocky Linux 8.5, x86_64, AMD EPYC 7742 64-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/boegelbot/b900abf19ea35a00a92f5668dbce4ac0 for a full test report.

Member

@migueldiascosta migueldiascosta left a comment

lgtm, and the failures in the test reports are neither related to nor affected by these changes

@migueldiascosta
Member

Going in, thanks @boegel!

@migueldiascosta migueldiascosta merged commit c68d093 into easybuilders:develop Jan 10, 2024
47 checks passed
@migueldiascosta
Member

@boegel SciPy-bundle/2021.05-intel-2021a at jsc-zen2 seems to be broken though, which will likely affect other tests

@boegel boegel deleted the 20240105213202_new_pr_petsc branch February 11, 2024 13:37

3 participants