xrootd file open on the grid sometimes fail with status code 139 #6948
Comments
Unfortunately this is the sort of issue that could have been easier to track/discuss on JIRA. But since ROOT doesn't use that anymore, here we go...
My suspicion is that the grid nodes in question put some locally installed XRootD version high up in the library search path of the jobs. I don't know how they would do that, but that's my educated guess.
ATLAS analysis releases using ROOT 6.18/04 (https://gitlab.cern.ch/atlas/atlasexternals/-/blob/1.0.65/External/ROOT/CMakeLists.txt) use XRootD 4.10.0 (https://gitlab.cern.ch/atlas/atlasexternals/-/blob/1.0.65/External/XRootD/CMakeLists.txt), while releases using ROOT 6.16/00 (https://gitlab.cern.ch/atlas/atlasexternals/-/blob/1.0.60/External/ROOT/CMakeLists.txt) used XRootD 4.8.4 (https://gitlab.cern.ch/atlas/atlasexternals/-/blob/1.0.60/External/XRootD/CMakeLists.txt).
My educated guess is that the XRootD version force-fed into your jobs, @rdschaffer, is binary compatible with XRootD 4.8.4, but not with 4.10.0 (or newer).
However, we definitely need some follow-up from our grid experts on this. @rodwalker, would it be possible to look at the problematic jobs / grid nodes for this?
Cheers,
Attila
|
Hi,
The whole ENV is dumped in, e.g.,
https://bigpanda.cern.ch//media/filebrowser/5e40cf5d-179e-4126-ad56-e0bb0173cbd5/panda/tarball_PandaJob_4911855304_CERN/payload.stdout
Does that give any clue?
LD_LIBRARY_PATH=/srv/workDir/usr/HZZAnalRun2Code/1.0.0/InstallArea/x86_64-centos7-gcc8-opt/lib:/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib:/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib:/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib64:/cvmfs/atlas.cern.ch/repo/sw/software/21.2/sw/lcg/releases/gcc/8.3.0-cebb0/x86_64-centos7/lib:/cvmfs/atlas.cern.ch/repo/sw/software/21.2/sw/lcg/releases/gcc/8.3.0-cebb0/x86_64-centos7/lib64:/cvmfs/atlas.cern.ch/repo/sw/software/21.2/sw/lcg/releases/binutils/2.30-e5b21/x86_64-centos7/lib:/.singularity.d/libs
Cheers,
Rod.
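One way to sanity-check a search path like the one above is to walk it in order and report the first directory that actually contains a given library. The sketch below is only an illustration: `first_dir_with_lib` is a hypothetical helper name, and it mimics only the `LD_LIBRARY_PATH` part of the dynamic loader's lookup (it ignores `LD_PRELOAD`, RPATH/RUNPATH and the system cache).

```sh
# Print the first directory in a colon-separated search path that contains
# the given library file. Hypothetical helper, not part of any ATLAS tool.
first_dir_with_lib() {
    fdl_lib=$1
    fdl_rest="$2:"                        # trailing colon so the last entry is handled too
    while [ -n "$fdl_rest" ]; do
        fdl_dir=${fdl_rest%%:*}           # text before the first colon
        fdl_rest=${fdl_rest#*:}           # text after the first colon
        [ -n "$fdl_dir" ] || fdl_dir=.    # an empty entry means the current directory
        if [ -e "$fdl_dir/$fdl_lib" ]; then
            printf '%s\n' "$fdl_dir"
            return 0
        fi
    done
    return 1                              # library not found in any listed directory
}
```

Inside a job this could be called as `first_dir_with_lib libXrdCl.so.2 "$LD_LIBRARY_PATH"`; for the path dumped above one would expect the AnalysisBaseExternals `lib` directory, provided nothing else overrides the lookup.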
|
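Hi Rod,
What does
LD_PRELOAD=/srv/workDir/96340ef3-75b1-46cf-8910-8a2f76b7068c/$LIB/wrapper.so
do? That would be my first suspect. Since $LD_LIBRARY_PATH lists our software directories in the correct order, based on just that XRootD should be found under:
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/
Do you know what that preload is (supposed to be) doing exactly?
Cheers,
Attila
|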
Hi,
It overloads some network-related commands to provide a record of what users are accessing remotely. It creates
https://bigpanda.cern.ch//media/filebrowser/5e40cf5d-179e-4126-ad56-e0bb0173cbd5/panda/tarball_PandaJob_4911855304_CERN/pandatracerlog.txt
with entries such as
2020-12-04 18:55:07.949713 : INFO connect: ::2001:1458:301:62:0:0:1094 cmd: runH4lAnalRun2
where IPv6 always rings alarm bells with me. This would be a node/site/RSE dependence.
Cheers,
Rod.
…On Mon, 14 Dec 2020 at 10:31, Attila Krasznahorkay ***@***.***> wrote:
Hi Rod,
What does
LD_PRELOAD=/srv/workDir/96340ef3-75b1-46cf-8910-8a2f76b7068c/$LIB/wrapper.so
do? That would be my first suspect. Since $LD_LIBRARY_PATH lists our
software directories in the correct order, based on just that XRootD
*should* be found under:
[bash][thor]:~ > ls -l /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrd*
lrwxrwxrwx 1 cvmfs cvmfs 19 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdAppUtils.so -> libXrdAppUtils.so.1
lrwxrwxrwx 1 cvmfs cvmfs 23 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdAppUtils.so.1 -> libXrdAppUtils.so.1.0.0
-rwxr-xr-x 1 cvmfs cvmfs 74512 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdAppUtils.so.1.0.0
-rwxr-xr-x 1 cvmfs cvmfs 18432 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdBlacklistDecision-4.so
-rwxr-xr-x 1 cvmfs cvmfs 82136 Sep 10 03:21 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdBwm-4.so
-rwxr-xr-x 1 cvmfs cvmfs 13552 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCksCalczcrc32-4.so
lrwxrwxrwx 1 cvmfs cvmfs 17 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdClient.so -> libXrdClient.so.2
lrwxrwxrwx 1 cvmfs cvmfs 21 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdClient.so.2 -> libXrdClient.so.2.0.0
-rwxr-xr-x 1 cvmfs cvmfs 663320 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdClient.so.2.0.0
-rwxr-xr-x 1 cvmfs cvmfs 42096 Sep 10 03:21 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdClProxyPlugin-4.so
lrwxrwxrwx 1 cvmfs cvmfs 13 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so -> libXrdCl.so.2
lrwxrwxrwx 1 cvmfs cvmfs 17 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2 -> libXrdCl.so.2.0.0
-rwxr-xr-x 1 cvmfs cvmfs 1416944 Sep 10 03:20 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2.0.0
lrwxrwxrwx 1 cvmfs cvmfs 21 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCryptoLite.so -> libXrdCryptoLite.so.1
lrwxrwxrwx 1 cvmfs cvmfs 25 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCryptoLite.so.1 -> libXrdCryptoLite.so.1.0.0
-rwxr-xr-x 1 cvmfs cvmfs 13632 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCryptoLite.so.1.0.0
lrwxrwxrwx 1 cvmfs cvmfs 17 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCrypto.so -> libXrdCrypto.so.1
lrwxrwxrwx 1 cvmfs cvmfs 21 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCrypto.so.1 -> libXrdCrypto.so.1.0.0
-rwxr-xr-x 1 cvmfs cvmfs 129112 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCrypto.so.1.0.0
-rwxr-xr-x 1 cvmfs cvmfs 222064 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCryptossl-4.so
lrwxrwxrwx 1 cvmfs cvmfs 14 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdFfs.so -> libXrdFfs.so.2
lrwxrwxrwx 1 cvmfs cvmfs 18 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdFfs.so.2 -> libXrdFfs.so.2.0.0
-rwxr-xr-x 1 cvmfs cvmfs 65152 Sep 10 03:21 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdFfs.so.2.0.0
-rwxr-xr-x 1 cvmfs cvmfs 271416 Sep 10 03:21 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdFileCache-4.so
-rwxr-xr-x 1 cvmfs cvmfs 13104 Sep 10 03:21 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdHttp-4.so
-rwxr-xr-x 1 cvmfs cvmfs 115880 Sep 10 03:21 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdHttpTPC-4.so
lrwxrwxrwx 1 cvmfs cvmfs 20 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdHttpUtils.so -> libXrdHttpUtils.so.1
lrwxrwxrwx 1 cvmfs cvmfs 24 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdHttpUtils.so.1 -> libXrdHttpUtils.so.1.0.0
-rwxr-xr-x 1 cvmfs cvmfs 206640 Sep 10 03:21 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdHttpUtils.so.1.0.0
-rwxr-xr-x 1 cvmfs cvmfs 18824 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdN2No2p-4.so
-rwxr-xr-x 1 cvmfs cvmfs 13304 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdOssSIgpfsT-4.so
lrwxrwxrwx 1 cvmfs cvmfs 23 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdPosixPreload.so -> libXrdPosixPreload.so.1
lrwxrwxrwx 1 cvmfs cvmfs 27 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdPosixPreload.so.1 -> libXrdPosixPreload.so.1.0.0
-rwxr-xr-x 1 cvmfs cvmfs 87568 Sep 10 03:21 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdPosixPreload.so.1.0.0
lrwxrwxrwx 1 cvmfs cvmfs 16 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdPosix.so -> libXrdPosix.so.2
lrwxrwxrwx 1 cvmfs cvmfs 20 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdPosix.so.2 -> libXrdPosix.so.2.0.0
-rwxr-xr-x 1 cvmfs cvmfs 195944 Sep 10 03:21 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdPosix.so.2.0.0
-rwxr-xr-x 1 cvmfs cvmfs 1001552 Sep 10 03:26 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdProofd.so
-rwxr-xr-x 1 cvmfs cvmfs 83216 Sep 10 03:21 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdPss-4.so
-rwxr-xr-x 1 cvmfs cvmfs 70544 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdSec-4.so
-rwxr-xr-x 1 cvmfs cvmfs 220600 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdSecgsi-4.so
-rwxr-xr-x 1 cvmfs cvmfs 19480 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdSecgsiAUTHZVO-4.so
-rwxr-xr-x 1 cvmfs cvmfs 23808 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdSecgsiGMAPDN-4.so
-rwxr-xr-x 1 cvmfs cvmfs 53384 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdSeckrb5-4.so
-rwxr-xr-x 1 cvmfs cvmfs 25152 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdSecProt-4.so
-rwxr-xr-x 1 cvmfs cvmfs 142864 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdSecpwd-4.so
-rwxr-xr-x 1 cvmfs cvmfs 45192 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdSecsss-4.so
-rwxr-xr-x 1 cvmfs cvmfs 19320 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdSecunix-4.so
lrwxrwxrwx 1 cvmfs cvmfs 17 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdServer.so -> libXrdServer.so.2
lrwxrwxrwx 1 cvmfs cvmfs 21 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdServer.so.2 -> libXrdServer.so.2.0.0
-rwxr-xr-x 1 cvmfs cvmfs 1040472 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdServer.so.2.0.0
-rwxr-xr-x 1 cvmfs cvmfs 134808 Sep 10 03:21 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdSsi-4.so
lrwxrwxrwx 1 cvmfs cvmfs 17 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdSsiLib.so -> libXrdSsiLib.so.1
lrwxrwxrwx 1 cvmfs cvmfs 21 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdSsiLib.so.1 -> libXrdSsiLib.so.1.0.0
-rwxr-xr-x 1 cvmfs cvmfs 161352 Sep 10 03:21 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdSsiLib.so.1.0.0
-rwxr-xr-x 1 cvmfs cvmfs 18544 Sep 10 03:21 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdSsiLog-4.so
lrwxrwxrwx 1 cvmfs cvmfs 19 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdSsiShMap.so -> libXrdSsiShMap.so.1
lrwxrwxrwx 1 cvmfs cvmfs 23 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdSsiShMap.so.1 -> libXrdSsiShMap.so.1.0.0
-rwxr-xr-x 1 cvmfs cvmfs 39624 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdSsiShMap.so.1.0.0
-rwxr-xr-x 1 cvmfs cvmfs 76664 Sep 10 03:21 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdThrottle-4.so
lrwxrwxrwx 1 cvmfs cvmfs 16 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdUtils.so -> libXrdUtils.so.2
lrwxrwxrwx 1 cvmfs cvmfs 20 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdUtils.so.2 -> libXrdUtils.so.2.0.0
-rwxr-xr-x 1 cvmfs cvmfs 763032 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdUtils.so.2.0.0
lrwxrwxrwx 1 cvmfs cvmfs 14 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdXml.so -> libXrdXml.so.2
lrwxrwxrwx 1 cvmfs cvmfs 18 Sep 10 13:12 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdXml.so.2 -> libXrdXml.so.2.0.0
-rwxr-xr-x 1 cvmfs cvmfs 122928 Sep 10 03:19 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdXml.so.2.0.0
-rwxr-xr-x 1 cvmfs cvmfs 13104 Sep 10 03:21 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdXrootd-4.so
[bash][thor]:~ >
Do you know what that preload is (supposed to be) doing exactly?
Cheers,
Attila
|
Hey @Axel-Naumann, Have you found a moment to have a look at this?
|
@simonmichal would you have a recommendation what to look at? |
Well, I doubt there are some out-of-band data being sent/received. @rodwalker, @rdschaffer would it be possible to reproduce the problem with xrootd client logs switched on ( Regarding ABI compatibility, we ensure ABI forward compatibility, meaning that it is safe to link an application built with an older version of xrootd, with a newer version of the library (e.g. one can build his application with say |
OK, I ran with XRD_LOGLEVEL=Dump, and you can see the response after === stderr === saying:
Unable to process directory /alrb/.xrootd/client.plugins.d: [ERROR] OS Error: No such file or directory
Log file:
The file
root://marsedpm.in2p3.fr:1094//dpm/in2p3.fr/home/atlas/atlasdatadisk/rucio/mc16_13TeV/9c/ab/DAOD_HIGG2D1.23315577._000001.pool.root.1
of course opens correctly with a simple TFile::Open in any interactive ROOT session.
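For anyone repeating this test, a small wrapper can switch on the client logging for a single command only, so the rest of the job environment stays untouched. This is a sketch under assumptions: `run_with_xrd_dump` is a hypothetical helper name, and `XRD_LOGFILE` is, to my knowledge, the XRootD client variable that redirects the log from stderr to a file; please check it against the client documentation for the version in use.

```sh
# Run one command with XRootD client logging at Dump level, written to a file.
# Hypothetical helper; XRD_LOGLEVEL is the variable used in this thread,
# XRD_LOGFILE is assumed to redirect the client log to the given file.
run_with_xrd_dump() {
    rxd_log=$1
    shift
    # env sets the variables for this one command only
    env XRD_LOGLEVEL=Dump XRD_LOGFILE="$rxd_log" "$@"
}
```

A call such as `run_with_xrd_dump xrdcl.log ./main` would then leave the client log in `xrdcl.log` next to the job's usual stdout and stderr.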
|
The above is running in Marseilles: CCIN2P3-CCPM. Another for reading from eos from the CERN-T0 facility: |
Is it possible to determine the exact version of xrootd client that is being used? Unfortunately, the crash happens before the client logs in so I cannot see it from logs. The server reported protocol version |
Does this help: 2020-12-16 12:22:18,612 | INFO | Thread-1 | gfal2 | connect | [gfal_module_load] plugin /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/emi/4.0.2-1_200423.fix1/usr/lib64/gfal2-plugins//libgfal_plugin_xrootd.so loaded with success |
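Marseilles job logs are in:
https://bigpanda.cern.ch/filebrowser/?guid=00354dec-89f9-4687-bc9e-d0151ddff358&lfn=panda.um.group.phys-higgs.user.schaffer.mc16_13TeV.500995.H4lMinitree_nominal.0.16e..201216_01.log.23578674.000051.log.tgz&site=IN2P3-CPPM/SCORE&scope=panda&fileid=23156311480
and CERN job logs are in:
https://bigpanda.cern.ch/filebrowser/?guid=52428b18-b810-4194-be8a-fb11e92bc4f8&lfn=panda.um.group.phys-higgs.user.schaffer.mc16_13TeV.500995.H4lMinitree_nominal.0.16e..201216_01.log.23578674.000050.log.tgz&site=CERN-T0/SCORE&scope=panda&fileid=23156311459
|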
Hi,
I submitted a job with compiled code that just opens the Marseille file (code at the bottom):
https://bigpanda.cern.ch/job?pandaid=4923453571
It uses the same release, and it works! I am not sure if anything else is different, but it points at the specific analysis code rather than a pure TFile open problem.
Cheers,
Rod.
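$ cat main.C
#include <iostream>
#include <string>
#include "TFile.h"
using namespace std;
int main()
{
  // Open the remote file over xrootd; TFile::Open returns a null pointer on failure
  TFile* davixFile = TFile::Open("root://eosatlas.cern.ch:1094//eos/atlas/atlasdatadisk/rucio/mc16_13TeV/25/31/DAOD_HIGG2D1.23315648._000001.pool.root.1","READ");
  if (!davixFile || davixFile->IsZombie()) {
    cerr << "could not open file" << endl;
    return 1;
  }
  cout << "coucou 5" << endl;
  // List the file contents, then close it
  davixFile->ls();
  davixFile->Close();
  return 0;
}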
…On Wed, 16 Dec 2020 at 15:50, rdschaffer ***@***.***> wrote:
Marseilles job logs are in:
marseilles
<https://bigpanda.cern.ch/filebrowser/?guid=00354dec-89f9-4687-bc9e-d0151ddff358&lfn=panda.um.group.phys-higgs.user.schaffer.mc16_13TeV.500995.H4lMinitree_nominal.0.16e..201216_01.log.23578674.000051.log.tgz&site=IN2P3-CPPM/SCORE&scope=panda&fileid=23156311480>
and Cern jobs logs are in:
Cern
<https://bigpanda.cern.ch/filebrowser/?guid=52428b18-b810-4194-be8a-fb11e92bc4f8&lfn=panda.um.group.phys-higgs.user.schaffer.mc16_13TeV.500995.H4lMinitree_nominal.0.16e..201216_01.log.23578674.000050.log.tgz&site=CERN-T0/SCORE&scope=panda&fileid=23156311459>
|
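Hi Rod,
😕 So, how did you compile that code exactly? Just g++ main.cxx, right? In that case XRootD would be picked up from /usr, which doesn't tell us much about our problem, since RD's test job will pick up XRootD from:
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/
This is why I said at the beginning that I'm suspicious about the LD_PRELOAD setting. If that library wants to use XRootD, but it was compiled against a different version of XRootD than what the analysis release comes with, then we're in trouble. Note that all ATLAS releases come with their own version of XRootD, not just the analysis releases. So any grid node setup that wants to force one particular version of XRootD on the job will give us a really bad time...
Best,
Attila
|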
Right, forgot to mention I don't know what I'm doing with C. A colleague gave me
g++ $(root-config --cflags --libs) -o main main.C
which I'm doing after the asetup of the same release. Does ldd answer the question?
$ ldd main.mars | grep -i root
libROOTVecOps.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libROOTVecOps.so (0x00007f8b07f3d000)
libROOTDataFrame.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libROOTDataFrame.so (0x00007f8b062af000)
libROOTNTuple.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libROOTNTuple.so (0x00007f8b04041000)
I don't think the LD_PRELOAD has any xroot stuff - it is at a lower level to get network ops (hton or whatever). The code is from
wget http://pandaserver.cern.ch:25085/trf/user/runGen-00-00-02
chmod u+x runGen-00-00-02
./runGen-00-00-02
less pandawnutil/tracer/wrapper.c
Cheers,
Rod.
$ ldd main.mars
linux-vdso.so.1 => (0x00007fff34109000)
libCore.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libCore.so (0x00007f09e6284000)
libImt.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libImt.so (0x00007f09e6077000)
libRIO.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libRIO.so (0x00007f09e5adb000)
libNet.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libNet.so (0x00007f09e57fc000)
libHist.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libHist.so (0x00007f09e5210000)
libGraf.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libGraf.so (0x00007f09e4e22000)
libGraf3d.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libGraf3d.so (0x00007f09e4b71000)
libGpad.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libGpad.so (0x00007f09e488a000)
libROOTVecOps.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libROOTVecOps.so (0x00007f09e45b2000)
libTree.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libTree.so (0x00007f09e4233000)
libTreePlayer.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libTreePlayer.so (0x00007f09e3eae000)
libRint.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libRint.so (0x00007f09e3c85000)
libPostscript.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libPostscript.so (0x00007f09e3a0d000)
libMatrix.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libMatrix.so (0x00007f09e3695000)
libPhysics.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libPhysics.so (0x00007f09e3448000)
libMathCore.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libMathCore.so (0x00007f09e3037000)
libThread.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libThread.so (0x00007f09e2de4000)
libMultiProc.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libMultiProc.so (0x00007f09e2bd7000)
libROOTDataFrame.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libROOTDataFrame.so (0x00007f09e2924000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f09e2720000)
libstdc++.so.6 => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/sw/lcg/releases/gcc/8.3.0-cebb0/x86_64-centos7/lib64/libstdc++.so.6 (0x00007f09e2397000)
libm.so.6 => /lib64/libm.so.6 (0x00007f09e2095000)
libgcc_s.so.1 => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/sw/lcg/releases/gcc/8.3.0-cebb0/x86_64-centos7/lib64/libgcc_s.so.1 (0x00007f09e1e7d000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f09e1c61000)
libc.so.6 => /lib64/libc.so.6 (0x00007f09e1893000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f09e1631000)
libz.so.1 => /lib64/libz.so.1 (0x00007f09e141b000)
/lib64/ld-linux-x86-64.so.2 (0x00007f09e6946000)
libtbb.so.2 => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libtbb.so.2 (0x00007f09e11db000)
libssl.so.10 => /lib64/libssl.so.10 (0x00007f09e0f69000)
libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f09e0b06000)
libvdt.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libvdt.so (0x00007f09e08fe000)
libROOTNTuple.so => /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libROOTNTuple.so (0x00007f09e06b6000)
librt.so.1 => /lib64/librt.so.1 (0x00007f09e04ae000)
libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f09e0261000)
libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f09dff78000)
libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f09dfd74000)
libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f09dfb41000)
libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f09df931000)
libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f09df72d000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f09df513000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f09df2ec000)
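Note that the ldd output above lists no libXrd* library at all. ldd only reports link-time dependencies, so a library that is loaded later at run time (which, as far as I understand, is how ROOT brings in its XRootD client plugin; treat that as an assumption) shows up only in the running process. On Linux the memory map of a process can be filtered for this. The sketch below uses a hypothetical helper name, `mapped_libs`, and assumes library paths contain no spaces.

```sh
# List the distinct mapped files whose path contains a given substring.
# $1: a maps file such as /proc/<pid>/maps, $2: substring to look for.
# Hypothetical helper; assumes paths without spaces (path is field 6).
mapped_libs() {
    awk -v pat="$2" 'NF >= 6 && index($6, pat) > 0 { print $6 }' "$1" | sort -u
}
```

While the test binary is running, `mapped_libs /proc/<pid>/maps libXrd` would then show from which directory the XRootD libraries were really loaded, which is the question the ldd output cannot answer.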
…On Wed, 16 Dec 2020 at 16:08, Attila Krasznahorkay ***@***.***> wrote:
Hi Rod,
😕 So, how did you compile that code exactly? Just g++ main.cxx, right?
In that case XRootD would be picked up from /usr. Which doesn't tell us
much about our problem. Since RD's test job will pick up XRootD from:
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/
This is why I said at the beginning, that I'm suspicious about the
LD_PRELOAD setting. If that library wants to use XRootD, but it was
compiled against a different version of XRootD than what the analysis
release comes with, then we're in trouble. Note that *all* ATLAS releases
come with their own version of XRootD, not just the analysis releases. So
any grid node setup that wants to force one particular version of XRootD on
the job, will give us a really bad time...
Best,
Attila
|
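Hmm... That in principle looks fine... So okay, your test job is relevant. Unfortunately I'm running out of ideas. The XRootD build in AnalysisBaseExternals does depend on a couple of libraries from the OS. But these should only be things that are part of HEP_OSlibs. So the worker nodes should not really have different versions of them...
Could the version of some of these not be "well defined" on the grid nodes? |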
RD says it is line 1244 that causes a SIGSEGV
https://gitlab.cern.ch/HZZ/HZZSoftware/HZZAnalRun2Code/-/blob/changes-for-v25-fJVT/H4lAnalysisRun2/Root/H4lAnalRun2Init.cxx
At least nothing of the ATH_MSG_ERROR in the subsequent lines makes it into
the log.
I'm not sure how close I can get to that in my test.
Cheers,
Rod.
…On Wed, 16 Dec 2020 at 16:50, Attila Krasznahorkay ***@***.***> wrote:
Hmm... That in principle looks fine... So okay, your test job *is*
relevant.
Unfortunately I'm running out of ideas. The XRootD build in
AnalysisBaseExternals does depend on a couple of libraries from the OS.
But these should only be things that are part of HEP_OSlibs. So the worker
nodes should not really have different versions of them...
[bash][lxplus730]:~ > ldd -r /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrd*.so | grep " /lib" | sed "s/\(.*\) (0x.*)/\1/g" | sort | uniq
libc.so.6 => /lib64/libc.so.6
libcom_err.so.2 => /lib64/libcom_err.so.2
libcrypt.so.1 => /lib64/libcrypt.so.1
libcrypto.so.10 => /lib64/libcrypto.so.10
libcurl.so.4 => /lib64/libcurl.so.4
libdl.so.2 => /lib64/libdl.so.2
libfreebl3.so => /lib64/libfreebl3.so
libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2
libidn.so.11 => /lib64/libidn.so.11
libk5crypto.so.3 => /lib64/libk5crypto.so.3
libkeyutils.so.1 => /lib64/libkeyutils.so.1
libkrb5.so.3 => /lib64/libkrb5.so.3
libkrb5support.so.0 => /lib64/libkrb5support.so.0
liblber-2.4.so.2 => /lib64/liblber-2.4.so.2
libldap-2.4.so.2 => /lib64/libldap-2.4.so.2
libm.so.6 => /lib64/libm.so.6
libnspr4.so => /lib64/libnspr4.so
libnss3.so => /lib64/libnss3.so
libnssutil3.so => /lib64/libnssutil3.so
libpcre.so.1 => /lib64/libpcre.so.1
libplc4.so => /lib64/libplc4.so
libplds4.so => /lib64/libplds4.so
libpthread.so.0 => /lib64/libpthread.so.0
libresolv.so.2 => /lib64/libresolv.so.2
librt.so.1 => /lib64/librt.so.1
libsasl2.so.3 => /lib64/libsasl2.so.3
libselinux.so.1 => /lib64/libselinux.so.1
libsmime3.so => /lib64/libsmime3.so
libssh2.so.1 => /lib64/libssh2.so.1
libssl.so.10 => /lib64/libssl.so.10
libssl3.so => /lib64/libssl3.so
libz.so.1 => /lib64/libz.so.1
[bash][lxplus730]:~ >
Could the version of some of these not be "well defined" on the grid nodes?
|
Hi Rod, Well, I added a 'print' before and after line 1244 in the current jobs (I didn't check it in). So this looks like:
and in the log, one sees:
H4lAnalRun2 INFO processEvents: try to open file: root://eosatlas.cern.ch:1094//eos/atlas/atlasdatadisk/rucio/mc16_13TeV/25/31/DAOD_HIGG2D1.23315648._000001.pool.root.1
=== stderr ===
So one sees the 'try to open file', then there is the TFile::Open, and nothing else. So I conclude that this is coming from the Open.
|
Well, one thing that is clear is that this problem seems to be associated with specific sites. For my 'test' job: The sites that are successful either have local reading, or they use xrootd without problems. The latter are: For the failures, these are all just xrootd problems, at sites: So I would suspect some difference in the xrootd installation between these two sites. (I personally have no idea how to check this.) |
@rdschaffer : could you add the following code to your job:
and then at the beginning of your
This will print paths of all the loaded shared libraries to stdout. |
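The snippet referred to above did not survive in this copy of the thread. As a minimal sketch of one way to print the loaded shared libraries on Linux with glibc, `dl_iterate_phdr` can be used. The helper names `loaded_libraries` and `print_loaded_libraries` are hypothetical and this is not necessarily the code that was originally posted.

```cpp
#include <link.h>    // dl_iterate_phdr, struct dl_phdr_info (glibc)
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Callback invoked once per loaded object; collects its path.
static int collect_library(struct dl_phdr_info* info, std::size_t, void* data) {
  auto* paths = static_cast<std::vector<std::string>*>(data);
  // The main executable is reported with an empty name, so skip it.
  if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0')
    paths->push_back(info->dlpi_name);
  return 0;  // zero means "keep iterating"
}

// Hypothetical helper: paths of all shared objects mapped into the process right now.
std::vector<std::string> loaded_libraries() {
  std::vector<std::string> paths;
  dl_iterate_phdr(collect_library, &paths);
  return paths;
}

// Hypothetical helper: print one path per line to stdout.
void print_loaded_libraries() {
  for (const auto& path : loaded_libraries())
    std::printf("%s\n", path.c_str());
}
```

Calling `print_loaded_libraries()` at the top of `main` only shows what is loaded at that moment. Libraries that are loaded on demand (for example through `dlopen`) appear only after they have been loaded, so a second call after the first remote open may list more entries.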
Hi @simonmichal, Jobs are running. For a "failed" job at our CERN T0 reading from eos, have a look here. Let me know if you don't have access, and I'll make a pdf file.
|
I don't see libXrxxx in the list. Would this appear later after a request to xrootd has been made? I put the call at the very beginning, as suggested: int main( int argc, char* argv[] ) {
|
@rodwalker : hmm, let me dwell on this for a minute ... |
I'm a bit puzzled here, if I link dummy |
Hi,
I had just switched to gdb. Just submitted a test with your -ex too. Will
send the outputs.
I also got the core file - can you do anything with that? I see
(gdb) backtrace
#0 0x00002ac87cbfee5d in __res_context_search () from /lib64/libresolv.so.2
#1 0x00002ac889833f09 in _nss_dns_gethostbyname4_r () from /lib64/libnss_dns.so.2
#2 0x00002ac8530161c4 in gaih_inet.constprop.8 () from /lib64/libc.so.6
#3 0x00002ac853017564 in getaddrinfo () from /lib64/libc.so.6
#4 0x00002ac888e18ffd in XrdNetAddr::Set(char const*, int) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdUtils.so.2
#5 0x00002ac888e20d6b in XrdNetUtils::MyHostName(char const*, char const**) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdUtils.so.2
#6 0x00002ac88936148d in XrdCl::XRootDTransport::GenerateLogIn(XrdCl::HandShakeData*, XrdCl::XRootDChannelInfo*) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#7 0x00002ac889365b6e in XrdCl::XRootDTransport::HandShakeMain(XrdCl::HandShakeData*, XrdCl::AnyObject&) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#8 0x00002ac889365cac in XrdCl::XRootDTransport::HandShake(XrdCl::HandShakeData*, XrdCl::AnyObject&) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#9 0x00002ac8893c0580 in XrdCl::AsyncSocketHandler::OnReadWhileHandshaking() () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#10 0x00002ac8893c0915 in XrdCl::AsyncSocketHandler::Event(unsigned char, XrdCl::Socket*) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#11 0x00002ac889355c4b in (anonymous namespace)::SocketCallBack::Event(XrdSys::IOEvents::Channel*, void*, int) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#12 0x00002ac888df384a in XrdSys::IOEvents::Poller::CbkXeq(XrdSys::IOEvents::Channel*, int, int, char const*) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdUtils.so.2
#13 0x00002ac888df4d57 in XrdSys::IOEvents::PollE::Dispatch(XrdSys::IOEvents::Channel*, unsigned int) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdUtils.so.2
#14 0x00002ac888df4f62 in XrdSys::IOEvents::PollE::Begin(XrdSysSemaphore*, int&, char const**) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdUtils.so.2
#15 0x00002ac888df1655 in XrdSys::IOEvents::BootStrap::Start(void*) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdUtils.so.2
#16 0x00002ac888dfa488 in XrdSysThread_Xeq () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdUtils.so.2
#17 0x00002ac85277aea5 in start_thread () from /lib64/libpthread.so.0
#18 0x00002ac85302e8dd in sysctl () from /lib64/libc.so.6
#19 0x0000000000000000 in ?? ()
Cheers,
Rod.
|
The gdb output with your -ex ....
H4lAnalRun2 INFO processEvents: try to open file:
root://dcgftp.usatlas.bnl.gov:1096//pnfs/usatlas.bnl.gov/BNLT0D1/rucio/mc16_13TeV/90/56/DAOD_HIGG2D1.23315538._000001.pool.root.1
[New Thread 0x2aaae4902700 (LWP 2522)]
[New Thread 0x2aaae50e7700 (LWP 2523)]
[New Thread 0x2aaae58cc700 (LWP 2524)]
[New Thread 0x2aaae60b1700 (LWP 2525)]
[New Thread 0x2aaae6896700 (LWP 2526)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2aaae4902700 (LWP 2522)]
0x00002aaad6426e5d in __res_context_search () from /lib64/libresolv.so.2
Thread 6 (Thread 0x2aaae6896700 (LWP 2526)):
#0 0x00002aaaac84fbf9 in syscall () from /lib64/libc.so.6
#1 0x00002aaae2becb5d in XrdCl::JobManager::RunJobs() () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#2 0x00002aaae2bece99 in RunRunnerThread () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#3 0x00002aaaabfa1ea5 in start_thread () from /lib64/libpthread.so.0
#4 0x00002aaaac8558dd in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x2aaae60b1700 (LWP 2525)):
#0 0x00002aaaac84fbf9 in syscall () from /lib64/libc.so.6
#1 0x00002aaae2becb5d in XrdCl::JobManager::RunJobs() () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#2 0x00002aaae2bece99 in RunRunnerThread () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#3 0x00002aaaabfa1ea5 in start_thread () from /lib64/libpthread.so.0
#4 0x00002aaaac8558dd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x2aaae58cc700 (LWP 2524)):
#0 0x00002aaaac84fbf9 in syscall () from /lib64/libc.so.6
#1 0x00002aaae2becb5d in XrdCl::JobManager::RunJobs() () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#2 0x00002aaae2bece99 in RunRunnerThread () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#3 0x00002aaaabfa1ea5 in start_thread () from /lib64/libpthread.so.0
#4 0x00002aaaac8558dd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x2aaae50e7700 (LWP 2523)):
#0 0x00002aaaabfa8e9d in nanosleep () from /lib64/libpthread.so.0
#1 0x00002aaae2622ca6 in XrdSysTimer::Wait(int) () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdUtils.so.2
#2 0x00002aaae2b9113a in XrdCl::TaskManager::RunTasks() () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#3 0x00002aaae2b91299 in RunRunnerThread () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#4 0x00002aaaabfa1ea5 in start_thread () from /lib64/libpthread.so.0
#5 0x00002aaaac8558dd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x2aaae4902700 (LWP 2522)):
#0 0x00002aaad6426e5d in __res_context_search () from /lib64/libresolv.so.2
#1 0x00002aaae305bf09 in _nss_dns_gethostbyname4_r () from
/lib64/libnss_dns.so.2
#2 0x00002aaaac83d1c4 in gaih_inet.constprop.8 () from /lib64/libc.so.6
#3 0x00002aaaac83e564 in getaddrinfo () from /lib64/libc.so.6
#4 0x00002aaae2640ffd in XrdNetAddr::Set(char const*, int) () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdUtils.so.2
#5 0x00002aaae2648d6b in XrdNetUtils::MyHostName(char const*, char
const**) () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdUtils.so.2
#6 0x00002aaae2b8948d in
XrdCl::XRootDTransport::GenerateLogIn(XrdCl::HandShakeData*,
XrdCl::XRootDChannelInfo*) () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#7 0x00002aaae2b8db6e in
XrdCl::XRootDTransport::HandShakeMain(XrdCl::HandShakeData*,
XrdCl::AnyObject&) () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#8 0x00002aaae2b8dcac in
XrdCl::XRootDTransport::HandShake(XrdCl::HandShakeData*,
XrdCl::AnyObject&) () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#9 0x00002aaae2be8580 in
XrdCl::AsyncSocketHandler::OnReadWhileHandshaking() () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#10 0x00002aaae2be8915 in XrdCl::AsyncSocketHandler::Event(unsigned
char, XrdCl::Socket*) () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#11 0x00002aaae2b7dc4b in (anonymous
namespace)::SocketCallBack::Event(XrdSys::IOEvents::Channel*, void*,
int) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#12 0x00002aaae261b84a in
XrdSys::IOEvents::Poller::CbkXeq(XrdSys::IOEvents::Channel*, int, int,
char const*) () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdUtils.so.2
#13 0x00002aaae261cd57 in
XrdSys::IOEvents::PollE::Dispatch(XrdSys::IOEvents::Channel*, unsigned
int) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdUtils.so.2
#14 0x00002aaae261cf62 in
XrdSys::IOEvents::PollE::Begin(XrdSysSemaphore*, int&, char const**)
() from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdUtils.so.2
#15 0x00002aaae2619655 in XrdSys::IOEvents::BootStrap::Start(void*) ()
from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdUtils.so.2
#16 0x00002aaae2622488 in XrdSysThread_Xeq () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdUtils.so.2
#17 0x00002aaaabfa1ea5 in start_thread () from /lib64/libpthread.so.0
#18 0x00002aaaac8558dd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaad704dc80 (LWP 2495)):
#0 0x00002aaaabfa5a35 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1 0x00002aaae26224c6 in XrdSysCondVar::Wait() () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdUtils.so.2
#2 0x00002aaae2bb4558 in
XrdCl::File::Open(std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&,
XrdCl::OpenFlags::Flags, XrdCl::Access::Mode, unsigned short) () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdCl.so.2
#3 0x00002aaae23dcfd9 in TNetXNGFile::TNetXNGFile(char const*, char
const*, char const*, char const*, int, int, bool) () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libNetxNG.so
#4 0x00002aaae23ddecc in TNetXNGFile::TNetXNGFile(char const*, char
const*, char const*, int, int, bool) () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libNetxNG.so
#5 0x00002aaae1fef643 in ?? ()
#6 0x00002aaa00000000 in ?? ()
#7 0x00007ffffffeb7c4 in ?? ()
#8 0x0000000001b8de70 in ?? ()
#9 0x00002aaaabf493f8 in vtable for TString () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libCore.so
#10 0x0000005e00000061 in ?? ()
#11 0x0000000001bc8380 in ?? ()
#12 0x00002aaaabf493f8 in vtable for TString () from
/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libCore.so
#13 0x0000000000000000 in ?? ()
On Thu, 14 Jan 2021 at 14:20, Axel Naumann ***@***.***>
wrote:
> Can you submit the job with gdb -ex r -ex "thread apply all bt" --args
> python... instead of valgrind? That should give us a backtrace at the
> point of the crash.
|
@rodwalker : before examining the core dump could you install could you then print the value of |
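@simonmichal XrdNetUtils::MyHostName() does getaddrinfo() for the local iface? Looks like that fails here? Okay stepping back to the sideline to watch ;-) |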
Hi,
I can't install things on lxplus. Is it enough for you to have the core and
exe?
/afs/cern.ch/user/w/walkerr/public/RD.core
/afs/cern.ch/user/w/walkerr/public/runH4lAnalRun2
Cheers,
Rod.
…On Thu, 14 Jan 2021 at 15:36, Axel Naumann ***@***.***> wrote:
@simonmichal <https://github.com/simonmichal> XrdNetUtils::MyHostName()
does getaddrinfo() for the local iface? Looks like that fails here? Okay
stepping back to the sideline to watch ;-)
|
@rodwalker : yes, it should be enough, thanks a lot |
@rodwalker : against which version of xrootd was the executable built? the one on lxplus or some version from cvmfs? |
Hi @simonmichal, We build against cvmfs. Attila gave the versions above, use XRootD 4.10.0, in his Dec 14th entry. |
-rwxr-xr-x 1 cvmfs cvmfs 13104 Sep 10 03:21 /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.139/InstallArea/x86_64-centos7-gcc8-opt/lib/libXrdXrootd-4.so (from above) |
Could it be that the iP obtained from the 'host name' which in the log seems to be: Attempting connection to [::ffff:10.42.38.55]:1096 will have iP == '[' ?? Looks like the name before the first ':' is iP, if I am not mistaken... |
it could also be the call to @Axel-Naumann : are there debug symbols on cvmfs? Were the libs rebuilt or installed from EPEL? |
Right. It is trying to get the local host name, not for the file... |
Rob found: acas1035.usatlas.bnl.gov for gethostname in some C code... Should he also try to call getaddrinfo? Not sure if hostHints is the default or not. |
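For such a check on a worker node, a small standalone test outside of ROOT and XRootD might look like the sketch below. It calls `gethostname` and then `getaddrinfo` on the result. The hints used here (`AF_UNSPEC`, `SOCK_STREAM`) are an assumption for illustration and are not taken from what XRootD passes internally; the function names are hypothetical.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: the local host name, or an empty string on failure.
std::string local_host_name() {
  char name[256] = {0};
  if (gethostname(name, sizeof(name) - 1) != 0) return "";
  return name;
}

// Hypothetical helper: resolve `host` with getaddrinfo() and print each address.
// Returns the number of addresses found, or -1 if the lookup failed.
int count_addresses(const std::string& host) {
  addrinfo hints;
  std::memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_UNSPEC;      // accept IPv4 and IPv6 (assumed hints)
  hints.ai_socktype = SOCK_STREAM;  // one entry per address instead of one per socket type

  addrinfo* result = nullptr;
  const int rc = getaddrinfo(host.c_str(), nullptr, &hints, &result);
  if (rc != 0) {
    std::fprintf(stderr, "getaddrinfo(%s): %s\n", host.c_str(), gai_strerror(rc));
    return -1;
  }

  int count = 0;
  for (addrinfo* p = result; p != nullptr; p = p->ai_next) {
    char text[NI_MAXHOST];
    // NI_NUMERICHOST formats the address without a reverse DNS lookup.
    if (getnameinfo(p->ai_addr, p->ai_addrlen, text, sizeof(text),
                    nullptr, 0, NI_NUMERICHOST) == 0)
      std::printf("%s -> %s\n", host.c_str(), text);
    ++count;
  }
  freeaddrinfo(result);
  return count;
}
```

Running `count_addresses(local_host_name())` inside the same job environment (same `LD_LIBRARY_PATH`, same linked libraries) would show whether the lookup itself misbehaves there. A plain build of this file may well succeed where the full analysis binary fails.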
Well, it segvs in here: https://github.com/xrootd/xrootd/blob/stable-4.12.x/src/XrdNet/XrdNetAddr.cc#L268, right? For now, the only scenario where something could go wrong, that I see is when That said, there are not so many reasons for
|
Is there a way to see the variable values in the core with gdb? I don't think that we can understand this without seeing them. A simple gethostname works properly, as expected. |
Hi,
I would have thought the relevant hostname is that of the xroot door,
rather than the WN hostname. This one has a port and maybe the weird IPv6
form.
// Convert the address as appropriate. Note that we do accept RFC5156 deprecated
// IPV4 mapped IPV6 addresses(i.e. [::a.b.c.d]. This is historical.
but the BNL ones look like [ffff:a.b.c.d]. That ok too?
Cheers,
Rod.
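On the address forms: the one in the log, `::ffff:10.42.38.55`, is the IPv4-mapped form, while `::a.b.c.d` is the older, deprecated IPv4-compatible form. A quick way to tell them apart with the standard socket API is sketched below; the enum and function names are made up for this example, and the input is the bare address without brackets or port.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string>

enum class AddrKind { Invalid, V4Mapped, V4Compat, OtherV6 };

// Hypothetical helper: classify a textual IPv6 address (no brackets, no port).
AddrKind classify_ipv6(const std::string& text) {
  in6_addr addr;
  // inet_pton returns 1 only for a well-formed address of the given family.
  if (inet_pton(AF_INET6, text.c_str(), &addr) != 1) return AddrKind::Invalid;
  if (IN6_IS_ADDR_V4MAPPED(&addr)) return AddrKind::V4Mapped;  // ::ffff:a.b.c.d
  if (IN6_IS_ADDR_V4COMPAT(&addr)) return AddrKind::V4Compat;  // ::a.b.c.d
  return AddrKind::OtherV6;
}
```

With this, `classify_ipv6("::ffff:10.42.38.55")` gives `V4Mapped`, and a string written as `ffff:10.42.38.55` (without the leading `::`) is rejected as `Invalid`, since it is not a complete IPv6 address.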
|
To inspect the variables we need debug symbols, they need to come from the same build because gdb is validating the crc. |
@krasznaa, is it possible to make a dbg build for xrootd that would be available on cvmfs? Not sure if this is easy. |
Normally the release build is stripped from debug symbols and they are installed in a separate location (e.g. /usr/lib/debug), you guys don't do this for cvmfs builds? |
Hi,
1) Jobs at BNL fail regardless of the source file.
Backs up the gethostname theory.
2) I cannot reproduce with a simple C program containing a TFile::Open,
built after setting up the same release
g++ $(root-config --cflags --libs) -o main main.C
and running after the same setup.
Contradicts (1)
I have not tried to build the binary that fails, so I cannot really say my
small C program is built in the same way. Maybe RD could do that, or
brutally strip down the code until it either starts working or is a simple
TFile::Open.
I am really out of ideas unless the contradiction between 1 and 2 can be removed.
Could we build an xroot lib with more debug statements, and pass that with
the job?
I could ask sites about worker node characteristics to find a pattern, but
I'm not sure what to ask about.
IPv6, nscd, dns?
Cheers,
Rod.
…On Fri, 15 Jan 2021 at 06:55, Axel Naumann ***@***.***> wrote:
@gganis <https://github.com/gganis> @peremato
<https://github.com/peremato> would you know whether the xrootd libraries
have their symbols stripped, or who might know?
|
Hi there, OK, I think that I have found the culprit, but I don't understand the reasons.
The main difference of our current 'minitree production' jobs this round is that we included a build of the MCFM physics generator with the build of our analysis code and we use it while running to calculate physics matrix elements for each event.
So to test this, I rebuilt our analysis code without building MCFM, and of course no longer calculate the matrix elements. I submitted this to the BNL site, and we read the files fine:
H4lAnalRun2 INFO processEvents: try to open file: root://dcgftp.usatlas.bnl.gov:1096//pnfs/usatlas.bnl.gov/BNLT0D1/rucio/mc16_13TeV/84/1f/DAOD_HIGG2D1.23315636._000001.pool.root.1
which as you may remember is the TFile::Open on a file which would use xrootd access. The job continues fine, reading 6 files, as one would expect. Now there is no matrix element calculated before we start reading the events.
So it must be that somehow linking in the MCFM libraries causes problems for calling the gethostname. I must admit that I have no idea how/why this would 'interfere', since MCFM is not run at all before the TFile::Open.
So I think that we can let this bug report rest for now. If anyone might have ideas on how to check or fix the MCFM problem, suggestions are welcome. But I no longer think that xrootd has a problem. This is clearly a problem in how we have set up our client code.
Thanks all for your time spent on this!
|
@simonmichal just FYI - it's not that @rdschaffer my first guess would be a stack exhaustion. You can check with changing the |
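For the stack exhaustion guess, one thing that can be inspected from inside the job is which stack limits it actually runs with. The sketch below prints the soft `RLIMIT_STACK` limit and the stack size that a thread created with default attributes would get. The function names are hypothetical, and a library may create its threads with explicit attributes, in which case the real size of those threads can differ from the default reported here.

```cpp
#include <pthread.h>
#include <sys/resource.h>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: soft limit for the main thread's stack in bytes.
// RLIM_INFINITY means "unlimited"; 0 is returned if the query fails.
rlim_t main_stack_limit() {
  rlimit lim;
  if (getrlimit(RLIMIT_STACK, &lim) != 0) return 0;
  return lim.rlim_cur;
}

// Hypothetical helper: stack size of a thread created with default attributes.
std::size_t default_thread_stack_size() {
  pthread_attr_t attr;
  std::size_t size = 0;
  if (pthread_attr_init(&attr) != 0) return 0;
  pthread_attr_getstacksize(&attr, &size);
  pthread_attr_destroy(&attr);
  return size;
}

// Hypothetical helper: print both values to stdout.
void print_stack_limits() {
  const rlim_t lim = main_stack_limit();
  if (lim == RLIM_INFINITY)
    std::printf("RLIMIT_STACK (soft): unlimited\n");
  else
    std::printf("RLIMIT_STACK (soft): %llu bytes\n",
                static_cast<unsigned long long>(lim));
  std::printf("default thread stack: %zu bytes\n", default_thread_stack_size());
}
```

Comparing the output of `print_stack_limits()` between a working and a failing site, or between the builds with and without MCFM, would at least show whether the limits differ. On older glibc versions this needs to be built with `-pthread`.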
@Axel-Naumann : correct, just to clarify my theory, Anyway, I'm glad it has been sorted out :-) |
Re-joining the discussion a bit late...
Installing debug symbols for our analysis releases on CVMFS would be pretty difficult. Our builds do produce a separate RPM for the debug symbols of our own code. (Though we didn't even use that machinery for the analysis releases yet.) But when we build XRootD for our standalone analysis release, we don't bother with the "RelWithDebInfo" CMake build mode.
https://gitlab.cern.ch/atlas/atlasexternals/-/blob/1.0/External/XRootD/CMakeLists.txt#L55-60
This is because every "external project" has a different implementation for this. And coding up how we would produce just one "ATLAS RPM" that contains just the debug symbol files for all the externals seemed way too much trouble. For such debugging we would use a full-on Debug build instead.
But the more relevant thing: Does XRootD, or any of the I/O libraries that it uses, make use of OpenMP? Putting aside all the weird linking issue possibilities, the one unusual thing that RD's MCFM build does is that it sets the following environment variable for the jobs:
export OMP_STACKSIZE=16000
Since apparently MCFM does use OpenMP. (This I checked.) Though I don't know why this variable would need to be set manually. With a quick Google search I saw that for instance gfal, at least at one point, used OpenMP. So I wonder if maybe OpenMP is responsible for something here. It shouldn't interfere with ROOT's usage of TBB (at least I don't think so), but maybe with some I/O library? |
@krasznaa : no, we don't use OpenMP in xrootd. |
@rdschaffer, it could be interesting to try running your job at BNL with:
For both of these of course you need to edit in your submission directory. It's likely that OpenMP is just a red herring, but this way at least we would know for sure. |
@krasznaa neither removing |
Hi there,
Running root-based reading analysis jobs in ATLAS, we are having problems trying to understand why some jobs fail on certain sites at file open when reading remote files with xrootd. We are using ROOT version 6.18/04. (I don't think that we have problems with 6.16/00, and a few tests indicate that 6.20/06 also had this problem.)
What we see is that for a file open:
on a grid site node, the job exits with status code 139, which I believe is SIGURG - Urgent condition on socket (4.2BSD).
The status code from TApplication::HandleException is 128 + root enum, and 11 is kSigUrgent.
See:
https://root.cern.ch/doc/master/TApplication_8cxx_source.html#l00602
https://root.cern.ch/doc/master/TSysEvtHandler_8h_source.html#l00107
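As a cross-check on the reading of 139: by the common shell convention, a process killed by signal N is reported with status 128 + N. Read that way, 139 is signal 11, which is SIGSEGV on Linux rather than SIGURG, and that matches the "Program received signal SIGSEGV" in the gdb output posted in this thread. Whether ROOT's own 128 + enum path played a role here is not established. A small sketch of the decoding, with hypothetical function names:

```cpp
#include <csignal>
#include <cstring>  // strsignal (POSIX)
#include <string>

// Hypothetical helper: signal number encoded in a shell-style exit status,
// or 0 if the status does not look like "128 + signal".
constexpr int signal_from_exit_status(int status) {
  return (status > 128 && status < 128 + 64) ? status - 128 : 0;
}

// Hypothetical helper: human-readable description of an exit status.
std::string describe_exit_status(int status) {
  const int sig = signal_from_exit_status(status);
  if (sig == 0) return "exit code " + std::to_string(status);
  const char* name = strsignal(sig);  // e.g. a textual name for signal 11
  return "signal " + std::to_string(sig) + " (" + (name ? name : "unknown") + ")";
}
```

For example, `signal_from_exit_status(139)` gives 11, which equals `SIGSEGV` on Linux.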
Running the same program interactively on the same file works fine. And it seems that only some sites with remote reading are failing. So we would like to ask for help in trying to track this down.
Currently, there is no stack trace to help understand things, and a simple 'print' just after TFile::Open is not printed.
I tried to add:
thinking that void TApplication::HandleException (https://root.cern.ch/doc/master/TApplication_8cxx_source.html#l00602) might throw an exception, but this does not work.
So suggestions would be welcome. Is there a way to get a stack trace or more information on what is going on in the I/O part of this file open?
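One general point that may explain why catching an exception does not help: a segmentation fault is delivered as a signal, not as a C++ exception, so a `try`/`catch` block never sees it. A rough sketch of getting at least a raw stack dump without a debugger is a signal handler based on glibc's `backtrace`. This assumes Linux with glibc, the function names are hypothetical, and it assumes nothing else (such as the framework's own signal handling) replaces the handler afterwards.

```cpp
#include <execinfo.h>  // backtrace, backtrace_symbols_fd (glibc)
#include <signal.h>
#include <unistd.h>
#include <cstring>

// Signal handler: dump the crashing thread's stack to stderr,
// then restore the default action and re-raise so the exit status is unchanged.
extern "C" void dump_backtrace_and_die(int sig) {
  void* frames[64];
  const int depth = backtrace(frames, 64);
  backtrace_symbols_fd(frames, depth, STDERR_FILENO);  // writes directly, no malloc
  signal(sig, SIG_DFL);
  raise(sig);
}

// Hypothetical helper: install the handler for SIGSEGV; returns true on success.
bool install_crash_handler() {
  // Call backtrace() once up front so its support library is already
  // loaded before any crash happens.
  void* warmup[1];
  backtrace(warmup, 1);

  struct sigaction sa;
  std::memset(&sa, 0, sizeof(sa));
  sa.sa_handler = dump_backtrace_and_die;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;
  return sigaction(SIGSEGV, &sa, nullptr) == 0;
}
```

Calling `install_crash_handler()` early in `main` would print raw frame addresses and symbol names on a crash. If the crash is itself caused by a stack overflow, the handler would additionally need an alternate signal stack (`sigaltstack` with `SA_ONSTACK`), which this sketch leaves out.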
I don't know how to add in watchers for people in ATLAS, or a mailing list. But I did find @krasznaa.