RFC: TensorFlow Dockerfile Assembler #8
Conversation
| Status | Proposed |
|:------ |:-------- |
| **ID** | <this will be allocated on approval> |
This ID line can be deleted; it was left in the template in error. The ID is actually the filename of the RFC. I've fixed the template now.
Fixed, thanks!
cc @gunan
It would be nice to mention:
...which means that you can dynamically set multiple FROM images. My first draft used ARGs and FROMs in a single Dockerfile to manipulate build stages. [The resulting Dockerfile](https://gist.github.com/angersson/3d2b5ae6a01de4064b1c3fe7a56e3821)
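For readers who haven't used the feature, here is a minimal, hypothetical sketch of the ARG-plus-multiple-FROM pattern described above; it is not the contents of the linked gist, and the image names are placeholders.

```dockerfile
# Hypothetical sketch (not the linked gist). Build args declared before the
# first FROM can select the base image of each stage at build time.
ARG CPU_BASE=ubuntu:16.04
ARG GPU_BASE=nvidia/cuda:9.0-base-ubuntu16.04

FROM ${CPU_BASE} AS cpu
RUN echo "CPU-only image contents go here"

FROM ${GPU_BASE} AS gpu
RUN echo "GPU image contents go here"

# Which stage gets built, and from which base, is then chosen from outside, e.g.:
#   docker build --target gpu --build-arg GPU_BASE=some/other:base -t tf-gpu .
```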
The Dockerfile in this gist is truncated.
Also, I'm not sure why ARGs and multi-stage builds need to work together.
ARG is for templating, allowing you to have a generic Dockerfile.
Multi-stage builds are for avoiding shipping your build dependencies.
So I'm confused; to me they are totally different concepts.
Actually, when I took a quick look at the TF Dockerfiles, I thought multi-stage builds were not a good fit because:
- devel images need to maintain those build dependencies
- non-devel images already install from pip packages and don't have build dependencies (python deps excluded)
- many dependencies are installed through the package manager (`apt-get`), which doesn't work with multi-stage builds anyway (see the sketch below)
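For reference, a minimal sketch of the conventional multi-stage pattern under discussion, with hypothetical file and image names: only files explicitly copied with `COPY --from` reach the final image, which is why `apt-get`-installed build dependencies don't carry over.

```dockerfile
# Hypothetical sketch of a conventional multi-stage build.
FROM ubuntu:16.04 AS builder
# Build-time dependencies installed here never reach the final image.
RUN apt-get update && apt-get install -y build-essential
COPY hello.c /src/hello.c
RUN gcc -o /src/hello /src/hello.c

FROM ubuntu:16.04
# Only explicitly copied artifacts cross the stage boundary.
COPY --from=builder /src/hello /usr/local/bin/hello
CMD ["hello"]
```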
I think I may have been in the middle of updating the file when I decided to drop that approach. I fixed the line.
I've revised this entire section since your comment (which illustrated that this portion of the doc was poorly explained, thanks) to clarify why I included this explanation. Regular multi-stage builds wouldn't be very useful at all, but the support for multiple FROMs that they added would let us do very evil things that don't work very well in the long run. I included the gist to show how bad such a Dockerfile can get (it's not meant to be useful aside from that).
"Multi-stage Building" is a powerful new Dockerfile feature that supports | ||
multiple FROM statements in one Dockerfile. It is meant to be used for creating | ||
artifacts with one image before using those artifacts in another image, but you |
I don't understand why you say "but" here. Is the intended meaning "in addition"?
It's a mixture of both -- I revised this whole section, so it should be better now. Thanks!
Is it a requirement of this proposal that the end-user sees full Dockerfiles and can do a single
@flx42 Thanks for mentioning the tags. This design is meant to be a foundational change that lets us better approach discussions about what-tags-should-we-have; the images I've started with mirror the existing Dockerfiles and aren't trying to make great strides yet. I've updated the doc to better explain this. I've also revised some sections that you commented on.
I'm not sure how many users want this, but I've seen at least a handful of GitHub issues related to Dockerfile build problems. Because of that, and my own preference when working with Docker (I really like to see a simple Dockerfile, because it's easier to understand if all I want to do is build or learn), committing concrete files like this seems like a pretty good way to handle things.
Thanks! The reason I'm asking about the "single Dockerfile requirement" in #8 (comment) is that you can achieve the same kind of composition you have here with multiple small Dockerfiles. For instance, if you have a generic
This is similar to your approach: you have a generic component that adds Jupyter to an existing image. But the composition of these Dockerfiles is expressed through
At the CI level, this can be templated too, for the Python version for instance. This doesn't introduce any other format/scripting on top of the Docker tools, but you don't have this single generated Dockerfile. You can still achieve layer sharing by doing the builds in the right order, with a "tree" of Dockerfiles where only the leaves are pushed as tags.
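The snippets from this comment were not preserved, but the composition being described might look roughly like the following sketch; the file name, image tags, and flags are illustrative assumptions, not part of the original comment.

```dockerfile
# jupyter.Dockerfile -- a hypothetical generic component that adds Jupyter
# on top of whatever base image is supplied at build time.
ARG BASE_IMAGE
FROM ${BASE_IMAGE}
RUN pip install --no-cache-dir jupyter
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--allow-root"]

# The composition is expressed at build time rather than inside the file:
#   docker build -f jupyter.Dockerfile \
#     --build-arg BASE_IMAGE=tensorflow/tensorflow:latest \
#     -t tensorflow/tensorflow:latest-jupyter .
#   docker build -f jupyter.Dockerfile \
#     --build-arg BASE_IMAGE=tensorflow/tensorflow:latest-gpu \
#     -t tensorflow/tensorflow:latest-gpu-jupyter .
```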
Ah, right, now I understand. I didn't really think about using multiple files with multiple FROMs. I'll add a new section to the doc that describes that method and some of the tradeoffs for it. One of the obvious downsides I can see with any process that offloads build complexity to the

The two end results would be quite similar, only different in where complexity lives (it also looks like the multi-stage builds you've described might build a bit faster, because of definitive re-use of stages compared to the implied re-use here). I've also already written all of the logic for this design, so only maintenance costs factor into the extra effort. Thanks for suggesting this!
Thanks everyone for your review and contribution!
TensorFlow Dockerfile Assembler
Summary
This document describes a new way to manage TensorFlow's Dockerfiles. Instead of handling complexity via an on-demand build script, Dockerfile maintainers manage re-usable chunks called partials, which are assembled into documented, standard, committed-to-repo Dockerfiles that don't need extra scripts to build. The assembly process is also decoupled from the system that builds and uploads the Docker images, which can safely be handled by separate CI scripts.
Important: This document is slim. The real meat of the design has already been implemented in this PR to tensorflow/tensorflow.
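As a rough illustration only (the real partials and the spec format live in the PR linked above), the assembled output of a couple of partials might look like this; the marker comments and partial names are assumptions made for the sake of the example.

```dockerfile
# Hypothetical assembled Dockerfile; marker comments are illustrative only.
# --- partial: ubuntu/python ---
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y python python-pip
# --- partial: jupyter ---
RUN pip install --no-cache-dir jupyter
# --- end of partials ---
# The assembled file is committed to the repo and builds with a plain
# `docker build .`, with no extra scripts involved.
```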