Compiler issues on newly reinstalled test-osuosl-aix72-ppc64-1 #2334

Closed
Haroon-Khel opened this issue Oct 4, 2021 · 24 comments

@Haroon-Khel
Contributor

Building jdk16, I encountered this error:

checking if static build is enabled... disabled, default
configure: error: xlclang++ version output check failed, output: /home/jenkins/temurin-build/workspace/build/src/build/.configure-support/generated-configure.sh: line 71904: xlclang++: command not found
configure exiting with result code 1
Error: No configurations found for /home/jenkins/temurin-build/workspace/build/src.

This may be caused by an incorrect PATH variable during the build, so I'll retest, but I also hit this error while building jdk11:

Creating hotspot/variant-server/tools/adlc/adlc from 13 file(s)
Compiling 1 files for BUILD_JFR_TOOLS
/opt/freeware/bin/bash: fork: retry: Resource temporarily unavailable
/opt/freeware/bin/bash: fork: retry: Resource temporarily unavailable
/opt/freeware/bin/bash: fork: retry: Resource temporarily unavailable
/opt/freeware/bin/bash: fork: retry: Resource temporarily unavailable
gmake[3]: *** [gensrc/GensrcAdlc.gmk:71: /home/jenkins/temurin-build/workspace/build/src/build/aix-ppc64-normal-server-release/hotspot/variant-server/tools/adlc/objs/output_c.o] Error 127
gmake[3]: *** Waiting for unfinished jobs....
/opt/freeware/bin/bash: fork: retry: Resource temporarily unavailable
/opt/IBM/xlC/13.1.3/bin/.orig/xlC_r: 1501-222 (S) cannot fork process - Resource temporarily unavailable
@sxa
Member

sxa commented Oct 4, 2021

configure: error: xlclang++ version output check failed, output: /home/jenkins/temurin-build/workspace/build/src/build/.configure-support/generated-configure.sh: line 71904: xlclang++: command not found

That does sound like a PATH issue, although I thought PATH should be set correctly in aix.sh.
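If the build environment does not put the xlC 16 driver directory on PATH, configure cannot find xlclang++. A minimal sketch of an idempotent PATH fix, assuming the driver lives in /opt/IBM/xlC/16.1.0/bin (the directory whose .orig subdirectory shows up in the later jdk16 log in this thread). prepend_path is a hypothetical helper, not something taken from aix.sh.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already there,
# so sourcing an environment script twice does not duplicate entries.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                          # already present: leave PATH alone
        *) PATH="$1${PATH:+:$PATH}" ;;        # otherwise put it first
    esac
    export PATH
}
```

For example, `prepend_path /opt/IBM/xlC/16.1.0/bin` followed by `command -v xlclang++`, which should print the driver's location before configure is rerun.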

/opt/freeware/bin/bash: fork: retry: Resource temporarily unavailable

Hmmm, I've got a feeling we've seen that one before... It may be the same as #1320, although annoyingly I don't seem to have pasted the error message in there.

@sxa sxa added this to the October 2021 milestone Oct 4, 2021
@sxa
Member

sxa commented Oct 4, 2021

See also adoptium/temurin-build#1450 (comment), which mentions maxuproc as a problem in the past; other comments in that issue may also help.

@Haroon-Khel
Contributor Author

With the correct PATH variable, there still seems to be a problem with the jdk16 build:

/opt/freeware/bin/bash: fork: retry: Resource temporarily unavailable
/opt/freeware/bin/bash: fork: retry: Resource temporarily unavailable
/opt/freeware/bin/bash: fork: retry: Resource temporarily unavailable
/opt/IBM/xlC/16.1.0/bin/.orig/xlclang++: error: 1501-222 cannot fork process - Resource temporarily unavailable
/opt/freeware/bin/bash: fork: retry: Resource temporarily unavailable
/opt/freeware/bin/bash: fork: retry: Resource temporarily unavailable
/opt/IBM/xlC/16.1.0/bin/.orig/xlclang++: error: 1501-222 cannot fork process - Resource temporarily unavailable
gmake[3]: *** [gensrc/GensrcAdlc.gmk:63: /home/jenkins/temurin-build/workspace/build/src/build/aix-ppc64-server-release/hotspot/variant-server/tools/adlc/objs/opcodes.o] Error 127
gmake[3]: *** Waiting for unfinished jobs....
/opt/freeware/bin/bash: fork: retry: Resource temporarily unavailable

Digging around, some sources attribute this to a ulimit issue. I'll see what I can find.
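`fork: retry: Resource temporarily unavailable` is bash reporting that fork() failed with EAGAIN, which is what happens when a per-user or system-wide process limit would be exceeded. A rough way to see whether jenkins is close to such a limit during a parallel gmake run is to count its processes. A minimal sketch, assuming a POSIX-style `ps` that accepts `-e -o user=`; count_user_procs is a hypothetical helper name.

```shell
# Hypothetical helper: count the processes owned by one user.
# Reads `ps -e -o user=` style output (one user name per line) on stdin.
count_user_procs() {
    awk -v u="$1" '$1 == u { n++ } END { print n + 0 }'
}
```

For example, `ps -e -o user= | count_user_procs jenkins` while the build is running, compared against whatever process limit applies to that user.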

@aixtools
Contributor

aixtools commented Oct 5, 2021

I thought the playbooks did user creation/verification:

new build:

root@ojdk03:[/root]ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        4194304
memory(kbytes)       unlimited
coredump(blocks)     unlimited
nofiles(descriptors) 2000
threads(per process) unlimited
processes(per user)  unlimited
root@ojdk03:[/root]su - jenkins
jenkins@ojdk03:[/home/jenkins]ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         131072
stack(kbytes)        32768
memory(kbytes)       32768
coredump(blocks)     unlimited
nofiles(descriptors) unlimited
threads(per process) unlimited
processes(per user)  unlimited

older system:

# ulimit -a
time(seconds)        unlimited
file(blocks)         2097151
data(kbytes)         131072
stack(kbytes)        32768
memory(kbytes)       32768
coredump(blocks)     2097151
nofiles(descriptors) 2000
threads(per process) unlimited
processes(per user)  unlimited
# su - jenkins
-bash-5.0$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) 131072
file size               (blocks, -f) unlimited
max memory size         (kbytes, -m) 32768
open files                      (-n) unlimited
pipe size            (512 bytes, -p) 64
stack size              (kbytes, -s) 32768
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
  • Personally, I'm not happy that the two systems are SO different, so I looked at other systems, e.g., the build-osuosl-* machines.
  • OJDK01: Almost identical (root vs jenkins)
root@p8-aix2-ojdk01:[/root]ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        32768
memory(kbytes)       32768
coredump(blocks)     unlimited
nofiles(descriptors) 2000
threads(per process) unlimited
processes(per user)  unlimited
root@p8-aix2-ojdk01:[/root]su - jenkins
jenkins@p8-aix2-ojdk01:[/home/jenkins]ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        32768
memory(kbytes)       32768
coredump(blocks)     unlimited
nofiles(descriptors) unlimited
threads(per process) unlimited
processes(per user)  unlimited
  • OJDK02: Different again...
root@p8-aix2-ojdk02:[/root]ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        4194304
memory(kbytes)       unlimited
coredump(blocks)     unlimited
nofiles(descriptors) 2000
threads(per process) unlimited
processes(per user)  unlimited
root@p8-aix2-ojdk02:[/root]su - jenkins
jenkins@p8-aix2-ojdk02:[/home/jenkins]ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         131072
stack(kbytes)        32768
memory(kbytes)       32768
coredump(blocks)     unlimited
nofiles(descriptors) unlimited
threads(per process) unlimited
processes(per user)  unlimited
  • In short, it looks like the playbooks are either not being used or not effective, or manual changes are being made. Sigh.
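Comparing these listings by eye is error-prone. A minimal sketch for pulling one limit out of AIX ksh-style `ulimit -a` output so root and jenkins can be compared across nodes; ulimit_value is a hypothetical helper, and it does not handle the different layout bash prints (as in the older system's jenkins shell above).

```shell
# Hypothetical helper: print one value from AIX ksh-style `ulimit -a` output
# on stdin, e.g. key "data(kbytes)" on "data(kbytes)   131072" prints 131072.
# It does not understand bash's different `ulimit -a` layout.
ulimit_value() {
    awk -v k="$1" '
        # match the whole key at the start of the line, followed by blanks
        substr($0, 1, length(k)) == k && substr($0, length(k) + 1, 1) ~ /[ \t]/ {
            print $NF    # the value is the last field ("unlimited" or a number)
            exit
        }'
}
```

For example, run `ulimit -a | ulimit_value 'data(kbytes)'` once as root and once as jenkins on each node and compare the results.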

@aixtools
Contributor

aixtools commented Oct 5, 2021

It looks like maxuproc was not fixed by the playbooks: ojdk03 is still at 128.

10.1.0.7        ojdk01.bak      # build-osuosl-aix71-ppc64-1
maxuproc 512 Maximum number of PROCESSES allowed per user True
10.1.0.8        ojdk02.bak      # build-osuosl-aix71-ppc64-2
maxuproc 512 Maximum number of PROCESSES allowed per user True
10.1.0.12       ojdk03.bak      # test-osuosl-aix72-ppc64-1
maxuproc 128 Maximum number of PROCESSES allowed per user True
10.1.0.16       ojdk04.bak      # test-osuosl-aix72-ppc64-2
maxuproc 512 Maximum number of PROCESSES allowed per user True
10.1.0.4        ojdk05.bak      # test-osuosl-aix71-ppc64-1
maxuproc 512 Maximum number of PROCESSES allowed per user True
10.1.0.5        ojdk06.bak      # test-osuosl-aix71-ppc64-2
maxuproc 512 Maximum number of PROCESSES allowed per user True
  • Should this be fixed manually, or will the playbooks be tested again?
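A minimal sketch for auditing a listing in the format above (an /etc/hosts-style line with the node name after `#`, followed by that node's `lsattr -El sys0 -a maxuproc` line) and flagging nodes below a minimum. How the listing itself was collected is not shown in the thread, so that part is left out; low_maxuproc_hosts is a hypothetical name.

```shell
# Hypothetical helper: read "host line, then maxuproc line" pairs on stdin
# and print every node whose maxuproc is below the given minimum.
low_maxuproc_hosts() {
    awk -v min="$1" '
        /#/              { host = $NF }                          # node name after "#"
        $1 == "maxuproc" { if ($2 + 0 < min + 0) print host }    # value is field 2
    '
}
```

Fed the listing above with a minimum of 512, it should print only test-osuosl-aix72-ppc64-1.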

@Haroon-Khel
Contributor Author

Haroon-Khel commented Oct 5, 2021

I did run the entire AIX playbook on ojdk03. Which role in the playbooks should have fixed this? If there isn't already something in the playbooks that should fix this, I'll add a line that changes maxuproc to 512 for the jenkins user.

@aixtools
Contributor

aixtools commented Oct 5, 2021

This is where it has historically been done:

./playbooks/scripts/AIX_filesystem_config.sh: chdev -l sys0 -a maxuproc=512

And, iirc, this is processed/managed via: ./playbooks/AdoptOpenJDK_AIX_Playbook/roles/aixfs/tasks/main.yml

P.S. It is a system-wide setting, not a per-user setting, so no user can be given a value higher than the system setting.
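Because maxuproc is a single sys0 attribute, an idempotent fix only needs to raise it when the current value is too low. A minimal sketch of that decision; maxuproc_fix_cmd is a hypothetical helper, and reading the current value with `lsattr -El sys0 -a maxuproc -F value` is an assumption about how it would be wired up on the node.

```shell
# Hypothetical helper: given the current and the wanted maxuproc values,
# print the chdev command that raises the system-wide sys0 limit, or print
# nothing when the current value is already high enough (never lowers it).
maxuproc_fix_cmd() {
    if [ "$1" -lt "$2" ]; then
        echo "chdev -l sys0 -a maxuproc=$2"
    fi
}
```

On a node, as root, something like `cur=$(lsattr -El sys0 -a maxuproc -F value)`, then `cmd=$(maxuproc_fix_cmd "$cur" 512)` and run `$cmd` only if it is non-empty. The same check could let a playbook task report a change only when one was actually made.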

@Haroon-Khel
Contributor Author

I see, my bad. By 'entire playbook' I meant all of it except that role, which I often leave out because @sej-jackson suggested to me long ago that the way the script, AIX_filesystem_config.sh, expands each filesystem is not recommended. IIRC we have an issue in this repo addressing this.

@aixtools
Contributor

aixtools commented Oct 5, 2021

Yes, that was the first issue I opened. Because of my many misunderstandings (about how Ansible works), and because we had moved on, I dropped it as something I couldn't resolve.

Perhaps I should start over again. :)

@sxa
Member

sxa commented Oct 5, 2021

I've no idea why the maxuproc option ended up in the filesystem creation script, but it should definitely be set elsewhere. This issue demonstrates precisely why :-)

@aixtools
Contributor

aixtools commented Oct 5, 2021

That is LONG before my time. Probably because it was easier to add it to a script (could it even have predated the use of Ansible?) than to meet Ansible's requirements for command execution.

That was one of the reasons (i.e., the file contains more than just filesystem commands) I opened issue #1547 about that file.

@aixtools
Contributor

Is this still open?

@sxa
Member

sxa commented Nov 19, 2021

@Haroon-Khel What is the status of this - do we still have compiler issues on newly installed 7.2 machines?

@sxa sxa modified the milestones: October 2021, December 2021 Dec 1, 2021
@aixtools
Contributor

@sxa @Haroon-Khel

The only 'issue' is that the compiler on adopt08 is more strict, because it is patched, than the GA compiler (no patches), as far as xlC v16 is concerned.

Both are functioning as 'jenkins' nodes, and IMHO this issue can be closed.

@aixtools
Contributor

aixtools commented Jan 6, 2022

This was probably the maxuproc setting, which @Haroon-Khel fixed both manually and in the playbooks.

Still think it can be closed :p

@sxa sxa modified the milestones: December 2021, 2022-01 (January) Jan 6, 2022
@aixtools
Contributor

aixtools commented Jan 9, 2022

OK. Found it.

  • AIX 7.2 (7200-04-02-2028) (I'll guess AIX 7.2 TL4) includes strftime_l() (a POSIX routine), but AIX 7.2 (7200-02-05-1938) does not.
  • The XLC16 include file /opt/IBM/xlC/16.1.0/include2/c++/support/ibm/xlocale.h contains a code block that provides its own inline strftime_l() as a static declaration, and this breaks the build:
In file included from /home/jenkins/temurin-build/workspace/build/src/omr/ddr/test/sample1.cpp:23:
In file included from /opt/IBM/xlC/16.1.0/include2/c++/iostream:38:
In file included from /opt/IBM/xlC/16.1.0/include2/c++/ios:216:
In file included from /opt/IBM/xlC/16.1.0/include2/c++/__locale:25:
/opt/IBM/xlC/16.1.0/include2/c++/support/ibm/xlocale.h:369:8: error: static declaration of 'strftime_l' follows non-static declaration
size_t strftime_l(char *__s, size_t __size, const char *__fmt,
       ^
/usr/include/time.h:318:23: note: previous declaration is here
  extern size_t       strftime_l(char *__restrict__, size_t, const char *__restrict__, const struct tm *__restrict__,locale_t loc);
                      ^
[  1%] Building CXX object runtime/omr/ddr/lib/ddr-ir/CMakeFiles/omr_ddr_ir.dir/EnumUDT.cpp.o
1 error generated.
Error while processing /home/jenkins/temurin-build/workspace/build/src/omr/ddr/test/sample1.cpp.
  • I'm not sure whom to report this to in order to get the message to the XLC developers. It is a bug, in that the include file needs to check for a specific AIX 7.2 level and skip the code block when the OS is at or above that level.
  • So either we reinstall adopt03 again at a lower code level, or we wait for a fix and apply it.
  • For performance reasons, perhaps reinstall at the AIX 7.2 TL2 level.
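The level strings quoted here look like `oslevel -s` output in the form VVRR-TL-SP-build date, so 7200-04-02-2028 decodes to AIX 7.2 TL4 SP2 and 7200-02-05-1938 to AIX 7.2 TL2 SP5. A small sketch for decoding them when deciding which nodes may carry the xlc16 label; oslevel_tl is a hypothetical helper name.

```shell
# Hypothetical helper: decode an `oslevel -s` string such as 7200-04-02-2028
# (VVRR-TL-SP-build date) into "7.2 TL4 SP2"; the build date is ignored.
oslevel_tl() {
    echo "$1" | awk -F- '{ printf "%d.%d TL%d SP%d\n", substr($1, 1, 1), substr($1, 2, 1), $2, $3 }'
}
```

For example, `oslevel_tl "$(oslevel -s)"` on each node.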

@sxa
Member

sxa commented Jan 10, 2022

The only 'issue' is that the compiler on adopt08 is more strict

Just to be clear, given your last comment: is the above statement still true, or is the xlc include file that's trying to use strftime_l() the same for both compiler versions?

From an Adopt perspective, if it's only happening in the main OpenJDK build jobs and not when building the tests, I'd be OK with not resolving this one (since we don't do any of the regular builds on AIX 7.2)... although, as you say, there is an argument for raising it with the compiler team too.

@aixtools
Contributor

aixtools commented Jan 10, 2022

The above statement is true on adopt03, whose OS we had upgraded to AIX 7.2 TL4.

  • The issue is that AIX 7.2 TL4 (or even TL3) now includes strftime_l() in /usr/lib/*.a, which was not there previously. The xlC16 include file assumes the function is absent.
  • As the function exists, it is declared in /usr/include/time.h (from the bos.adt.include fileset), and builds fail.
  • As such, disabling the xlc16 tag is still correct, but otherwise the node should be usable.
  • Update: adopt08 works as expected because it runs AIX 7.1 TL5 and does not have strftime_l(), as xlC16 expects.

@aixtools
Contributor

aixtools commented Jan 11, 2022

FYI: I found an AIX 7.2 TL3 system, and it has the function in its rte as well (i.e., in the include files, so I expect it in the libraries too):

  • 7200-03-06-2038
[/root]find /usr/include -name \*.h | xargs grep strftime_l
/usr/include/time.h:  extern size_t strftime_l();
/usr/include/time.h:  extern size_t       strftime_l(char *__restrict__, size_t, const char *__restrict__, const struct tm *__restrict__, locale_t);
  • Again, this does NOT impact aix715 systems, ONLY the xlc16 tagging of adopt03.
  • IMHO adopt03 should be fine with the ci.role.test tag.

@aixtools
Contributor

  • New patch applied and fully functional:
Your final archive was created at /home/jenkins/temurin-build/workspace/build/src/build/aix-ppc64-server-release/images/OpenJDK.tar.gz
Moving the artifact to /home/jenkins/temurin-build/workspace/target/
build.sh : 17:12:22 : All done!

@sxa
Member

sxa commented Jan 13, 2022

OK, I've re-enabled ci.role.test on the machine. Let's hope it performs OK with the pipelines this evening (should be JDK11 and 17).

@aixtools
Contributor

And close in the morning :)

@aixtools
Contributor

aixtools commented Jan 14, 2022

@sxa sxa closed this as completed Jan 14, 2022
@aixtools
Contributor

P.S. Only today am I changing the label on adopt03 to allow xlc16 again.
