Race condition when building with parallel make #200

Open
bbhtt opened this issue Oct 22, 2024 · 5 comments

@bbhtt

bbhtt commented Oct 22, 2024

This started happening after moving to parallel make from -j1. The relevant portion seems to be:

make[1]: [Makefile:365: install-doc-libs] Error 1 (ignored)
make[1]: Leaving directory '/buildstream-build/bst_build_dir/doc'
make[1]: Entering directory '/buildstream-build/bst_build_dir/lib/ss'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/buildstream-build/bst_build_dir/lib/ss'
	LD symlinks
making all in lib/e2p
make[1]: Entering directory '/buildstream-build/bst_build_dir/lib/e2p'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/buildstream-build/bst_build_dir/lib/e2p'
making all in lib/support
symlinks.o: file not recognized: file format not recognized
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:383: symlinks] Error 1
make[2]: Leaving directory '/buildstream-build/bst_build_dir/util'
make[1]: *** [Makefile:221: ../../util/symlinks] Error 2
make[1]: Leaving directory '/buildstream-build/bst_build_dir/lib/et'
make: *** [Makefile:457: install-libs-recursive] Error 1
make: *** Waiting for unfinished jobs....
make[1]: Entering directory '/buildstream-build/bst_build_dir/lib/support'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/buildstream-build/bst_build_dir/lib/support'
making all in lib/ext2fs
make[1]: Entering directory '/buildstream-build/bst_build_dir/lib/ext2fs'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/buildstream-build/bst_build_dir/lib/ext2fs'

It errors at various other places as well, though.

Full log: 3537ff57-build.20241022-000542.txt

@bbhtt
Author

bbhtt commented Oct 22, 2024

Another instance of a different failure, but still due to parallel make:

make[1]: Entering directory '/buildstream-build/bst_build_dir/misc'
	MKDIR_P /usr/bin /usr/bin /usr/bin /usr/share/man/man1 /usr/share/man/man8 /usr/lib/x86_64-linux-gnu /etc
	SYMLINK /usr/lib/x86_64-linux-gnu/libcom_err.so
	INSTALL /usr/bin/mke2fs
	INSTALL /usr/bin/badblocks
	INSTALL /usr/bin/tune2fs
	INSTALL /usr/bin/dumpe2fs
/buildstream-install/usr/lib/x86_64-linux-gnu/libcom_err.so.2: file or directory does not exist
	INSTALL /usr/bin/logsave
make[1]: *** [Makefile:465: install-shlibs] Error 1
make[1]: Leaving directory '/buildstream-build/bst_build_dir/lib/et'
make: *** [Makefile:457: install-shlibs-libs-recursive] Error 1

@tytso
Owner

tytso commented Oct 25, 2024

Looking at the make log, it looks like make somehow thinks that util/symlinks has gotten out of date, and multiple instances of make are trying to rebuild it and stepping on each other. It's not something I can replicate on my setup, and it's certainly not showing up on the Debian autobuilders.
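To make the suspected mechanism concrete, here is a minimal Python sketch of a simplified version of make's out-of-date rule: a target is rebuilt if it is missing or any prerequisite has a strictly newer mtime. This is an assumption-laden model, not GNU make's implementation, and `needs_rebuild` is a hypothetical helper. If the file system reports a timestamp for `util/symlinks` that looks older than one of its prerequisites, every recursive make that reaches that rule (the log shows both `lib/et` and `util` involved) could decide to relink it at the same time.

```python
import os
import tempfile


def needs_rebuild(target: str, prerequisites: list[str]) -> bool:
    """Simplified model of make's rule (hypothetical helper): rebuild when the
    target is missing or any prerequisite's mtime is strictly newer."""
    if not os.path.exists(target):
        return True
    target_mtime = os.stat(target).st_mtime_ns
    return any(os.stat(p).st_mtime_ns > target_mtime for p in prerequisites)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        obj = os.path.join(d, "symlinks.o")
        exe = os.path.join(d, "symlinks")
        open(obj, "wb").close()
        print(needs_rebuild(exe, [obj]))  # True: target does not exist yet
        open(exe, "wb").close()
        # Make the target look one second older than its object file, as a
        # misreported or stale timestamp might.
        obj_ns = os.stat(obj).st_mtime_ns
        os.utime(exe, ns=(obj_ns - 1_000_000_000, obj_ns - 1_000_000_000))
        print(needs_rebuild(exe, [obj]))  # True: a spurious relink
```

In a parallel build, two processes acting on that spurious `True` would both write `symlinks`, and one linker can read a half-written `symlinks.o`, which is consistent with the "file format not recognized" error above.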

What kind of file system is /buildstream-build? Would it happen to be NFS, or some kind of file system with one-second timestamp granularity?
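One way to answer the granularity question is a quick probe. The Python sketch below sets a scratch file's mtime to a value with a non-zero fractional second and reads it back; if the fraction is lost, the file system (or a layer in front of it, such as a FUSE daemon) is storing whole-second timestamps. The helper name `preserves_subsecond_mtime` and the probe timestamp are arbitrary choices for illustration, not part of e2fsprogs or BuildStream.

```python
import os
import sys
import tempfile

# Arbitrary probe mtime in nanoseconds with a non-zero sub-second part.
PROBE_NS = 1_700_000_000_123_456_789


def preserves_subsecond_mtime(directory: str) -> bool:
    """Return True if a file created in `directory` keeps a sub-second mtime."""
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        os.utime(path, ns=(PROBE_NS, PROBE_NS))
        observed = os.stat(path).st_mtime_ns
        # A whole-second file system truncates the fractional part to zero.
        return observed % 1_000_000_000 != 0
    finally:
        os.unlink(path)


if __name__ == "__main__":
    # Pass the directory to test, e.g. the build directory; defaults to ".".
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    print(preserves_subsecond_mtime(directory))
```

This only checks what `stat()` reports after an explicit `utime()`; it does not exercise timestamps set implicitly by writes during a build.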

@bbhtt
Copy link
Author

bbhtt commented Oct 25, 2024

It's FUSE-based; more specifically, it uses buildbox-fuse: https://gitlab.com/BuildGrid/buildbox/buildbox/-/tree/master/fuse?ref_type=heads

@tytso
Owner

tytso commented Oct 25, 2024

Can you replicate the problem without using the FUSE-based file system? I have a feeling that it may be screwing up by incorrectly caching file timestamps, or something else that would make it a bug in your storage stack. As I said, it's not something I can replicate, nor has anyone else reported it to me using a standard Linux file system.
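A rough way to look for stale timestamp or attribute caching is to modify a file repeatedly and confirm that every fresh `stat()` sees the update. This Python sketch (the helper `stat_tracks_writes` is hypothetical) appends one byte per round and checks that the reported size matches and that the mtime never goes backwards. A `False` result on /buildstream-build but `True` on a local disk would point at the storage stack; a `True` result does not rule out every caching bug, for instance one that only shows up across separate processes.

```python
import os
import sys
import tempfile


def stat_tracks_writes(directory: str, rounds: int = 5) -> bool:
    """Append to a scratch file `rounds` times; after each write, check that a
    fresh stat() reports the new size and an mtime that has not gone backwards."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        last_mtime = os.stat(path).st_mtime_ns
        for i in range(1, rounds + 1):
            os.write(fd, b"x")
            os.fsync(fd)  # push the write through before stat-ing by path
            st = os.stat(path)
            if st.st_size != i or st.st_mtime_ns < last_mtime:
                return False
            last_mtime = st.st_mtime_ns
        return True
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    # Pass the directory to test, e.g. the build directory; defaults to ".".
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    print(stat_tracks_writes(directory))
```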

@bbhtt
Author

bbhtt commented Oct 26, 2024

BuildStream is the primary development/build environment we use, so testing outside it is probably not ideal.

I don't think it's an issue with the file system, though: we managed to switch several (20+) modules to parallel make in https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/merge_requests/21041, and e2fsprogs was the only one showing various race conditions in CI.

The build even passed with parallel make once (although we had to revert it the next day since the race condition happened again).

I can ask someone more familiar with buildbox-fuse to follow up here if you suspect it's an issue with the /buildstream-build file system.
