Inconsistent rootfs used through SSTATE cache #130

Closed
deribaucourt opened this issue Apr 3, 2024 · 4 comments · Fixed by #137

Comments

@deribaucourt

Hello,

I found a compilation inconsistency in the way genimage.bbclass uses the shared state for the deploy task.
The genimage class inherits deploy. This allows reusing previously built images if the recipe's hashes haven't changed. Meanwhile, the genimage.bbclass documentation also recommends adding a dependency on the underlying rootfs image recipe in the form:

do_genimage[depends] += "core-image-minimal:do_image_complete"

However, image recipes like core-image-minimal do not inherit deploy.bbclass. This means they don't generate and reuse sstate artifacts. Hence, if I compile my genimage image in a new build directory with a populated, matching sstate cache (which happens a lot in CI), a new rootfs for core-image-minimal is reassembled, but the genimage .img is taken from the sstate cache. If the rootfs build steps are not reproducible, for instance if I want a build timestamp like below, then the timestamps in the rootfs .tar.bz2 and inside the genimage .img do not match!

REPRODUCIBLE_TIMESTAMP_ROOTFS = ""

Steps to reproduce:

  1. Add REPRODUCIBLE_TIMESTAMP_ROOTFS = "" in the rootfs image recipe
  2. Configure an external SSTATE_CACHE directory
  3. Create a genimage recipe from that rootfs image recipe
  4. Run bitbake to compile the genimage image
  5. Remove the build directory, keep the SSTATE cache
  6. Run bitbake to compile the genimage image again, while using the SSTATE cache
  7. The /etc/version contained in the .tar.bz2 and .img do not match. This can also be seen easily from the symlink names in tmp/deploy
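
A minimal sketch of the configuration behind the steps above, assuming the external cache is set through `SSTATE_DIR` in `local.conf`. The cache path and the recipe file names are hypothetical, and the genimage recipe is reduced to the two lines that matter here:

```bitbake
# conf/local.conf (step 2): keep the sstate cache outside the build directory.
# The path is a placeholder.
SSTATE_DIR = "/srv/yocto/sstate-cache"

# Rootfs image recipe, e.g. a core-image-minimal.bbappend (step 1):
# same setting as in the report, used to get a per-build timestamp.
REPRODUCIBLE_TIMESTAMP_ROOTFS = ""

# genimage image recipe, hypothetical name my-disk-image.bb (step 3).
# The genimage config in SRC_URI is omitted for brevity.
inherit genimage
do_genimage[depends] += "core-image-minimal:do_image_complete"
```

Steps 4 to 6 then amount to running `bitbake my-disk-image`, deleting the build directory while keeping the sstate directory, and running the same command again.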

I noticed that other image bundling Yocto classes do not inherit the deploy class. It seems reserved for packages which have a deploy output, like kernels and bootloaders. For instance, image.bbclass and swupdate.bbclass do not inherit deploy. I question whether genimage should inherit it either, because of the bug explained above. I think we should write directly into DEPLOY_DIR_IMAGE like those examples do. What is your opinion on this?

Thank you very much for your support!

@ejoerns
Member

ejoerns commented Jun 4, 2024

@deribaucourt This is (more or less) a known general 'issue' with the image handling that you pointed out.

Sstate and stamp handling always make the assumption that if the input does not change the task does not need to be re-run and cache artifacts can be used.

If I get it right, the example you provided would be the same for all steps in the build chain. For example, if you have a custom package that puts the current timestamp into a file, then rebuilding the rootfs won't update this file either, correct?

The known mechanisms for fixing or working around this are to define explicit dependencies on the variables that change ([vardeps]) or to mark tasks that should always run as [nostamp] (with the known consequence of always forcing a rebuild downstream).
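
As a hedged sketch of the first option, assuming the changing value is carried in a variable (the name `MY_BUILD_ID` is hypothetical and not taken from this thread), the rootfs task can be made to depend on it so that its signature changes whenever the value does:

```bitbake
# Hypothetical variable, set per build by CI or in local.conf.
MY_BUILD_ID ?= "dev"

# Include the variable's value in the do_rootfs task signature.
do_rootfs[vardeps] += "MY_BUILD_ID"
```

This only helps if the value that ends up in the rootfs is actually derived from that variable. Tasks that depend on `do_rootfs` would normally get a new signature as well.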

For most cases, it is not really an issue if the rootfs rebuilds but does not end up in an image. But for the use case you mentioned or for a trusted boot chain with generated verification information that must exactly match the rootfs archive/file system it is.
Having a reproducible image build is desired for the secure boot use case (and theoretically possible) but hard to gain and prone to breaking.

The underlying reason why genimage handling is a bit specific is of course because genimage is not implemented as an IMAGE_FSTYPE (like wic is). However, doing all in the do_image_* steps also does not resolve all issues, especially in the mentioned secure boot context.

So far my argument would have been: If one intentionally breaks assumptions of the bitbake task handling (reproducibility), then one also needs to explicitly handle this. However, I am not fully sure anymore if this is the best approach.

The image handling also uses sstate handling, but disables the actual artifact generation. A workaround one could add to a genimage recipe, too:

SSTATE_SKIP_CREATION:task-deploy = '1'

The other way would be to set in the rootfs recipe that intends to always update the timestamp file:

do_rootfs[nostamp] = "1"

But as already mentioned, while I find this valid for the use case you mentioned, the problem for trusted boot remains.

Thus, further suggestions are welcome.

@deribaucourt
Author

> @deribaucourt This is (more or less) a known general 'issue' with the image handling that you pointed out.
>
> Sstate and stamp handling always make the assumption that if the input does not change the task does not need to be re-run and cache artifacts can be used.
>
> If I get it right, the example you provided would be the same for all steps in the build chain. For example, if you have a custom package that puts the current timestamp into a file, then rebuilding the rootfs won't update this file either, correct?

Yes, but the observable difference would show up in the generated package under build/tmp/deploy/rpm, since packages don't usually inherit deploy.

> The known mechanisms for fixing or working around this are to define explicit dependencies on the variables that change ([vardeps]) or to mark tasks that should always run as [nostamp] (with the known consequence of always forcing a rebuild downstream).
>
> For most cases, it is not really an issue if the rootfs rebuilds but does not end up in an image. But for the use case you mentioned or for a trusted boot chain with generated verification information that must exactly match the rootfs archive/file system it is. Having a reproducible image build is desired for the secure boot use case (and theoretically possible) but hard to gain and prone to breaking.

For reference, I haven't encountered this problem because of secure boot, but because of delta (rdiff) updates, which also require exact equality between the images. I also have another use case which relies on this timestamp for version management, and it breaks with this issue as well.

> The underlying reason why genimage handling is a bit specific is of course because genimage is not implemented as an IMAGE_FSTYPE (like wic is). However, doing all in the do_image_* steps also does not resolve all issues, especially in the mentioned secure boot context.

If this difference with other bundling solutions is justified, then I guess we can close this issue. We have documented plenty of workarounds now :)

> So far my argument would have been: If one intentionally breaks assumptions of the bitbake task handling (reproducibility), then one also needs to explicitly handle this. However, I am not fully sure anymore if this is the best approach.
>
> The image handling also uses sstate handling, but disables the actual artifact generation. A workaround one could add to a genimage recipe, too:
>
> SSTATE_SKIP_CREATION:task-deploy = '1'
>
> The other way would be to set in the rootfs recipe that intends to always update the timestamp file:
>
> do_rootfs[nostamp] = "1"
>
> But as already mentioned, while I find this valid for the use case you mentioned, the problem for trusted boot remains.
>
> Thus, further suggestions are welcome.

Currently to ensure I don't run into these errors, we do our release builds without an sstate cache. That way we ensure the build is fresh and not messed with. But I'm going to add those workarounds to explicitly not reuse my image's deploy in the sstate. It will be useful for developers.

Thanks a lot for your analysis and answer!

@ejoerns
Member

ejoerns commented Jun 6, 2024

We have discussed this indirectly in another context where we also had the issues with sstate artifacts that I described above.

We concluded that setting

SSTATE_SKIP_CREATION:task-deploy = '1'

should be part of the class by default. One reason is reproducibility, but another main one is that generating the artifacts for disk images consumes a notable amount of disk space while not speeding up the build significantly.

While testing I also came across another issue that, when fixed, will probably make the class more efficient anyway since it avoids unnecessary rebuilds at another level.

@deribaucourt Thank you for your input!

@deribaucourt
Author

Yes I agree that sstate artifacts for image recipes are rarely useful.
