-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean arttifacts before and after builds #5409
Conversation
@@ -845,9 +845,11 @@ def build_docs_html(self): | |||
if self.build_force: | |||
html_builder.force() | |||
html_builder.append_conf() | |||
html_builder.clean() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should refactor build_docs_html
to use build_docs_class
, but not so trivial code needs to be moved around, so keeping simple this for now. I'm creating a new issue to refactor this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
It does not seem to save too much space, but considering that we were cleaning before building, it doesn't make sense to keep that useless data until the next build.
A next step should be trying to remove the whole project after building, but this is first step to test how it goes.
If we go in this direction, this should be configurable since we can't do this in the corporate site.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think it might be easier that this approach. I believe we can avoid mounting a volume into docker and it will be the same effect as explicitly cleaning the files -- also, it treats storage as truly ephemeral if we don't mount a volume into docker.
Perhaps under feature flag as well, we probably want to test this approach more
Also also, it can't affect our commercial hosting, where some repos are very large.
We need that to do the clone step, at least you mean generating the artifacts inside the container, which I believe will complicate the process of copying the artifacts to the servers, also, the docker container will be running for more time, and maybe it afffects the performance. https://docs.docker.com/storage/
|
Do you mean with this approach? I don't think so, we already clean the builders for html and pdf (the main/bigger ones). There isn't much to lose there. We'll need to add a feature flag in the case we want to remove the whole repo after each build. |
Ah this is true, we still need the volume mount so the build servers can sync artifacts. I suppose we could alter the container creation to capture only the build output in a volume if we wanted. What opinions do you have on that? Is it worth doing that on top of this PR? I think the work here makes sense on it's own, if we get more aggressive we can consider a feature flag that changes the volume mount to just an output volume only. You bring up a good point about storage vs build time though. If we move too far away from using storage, we're just going to have longer build times, which means more build servers, and the savings from less disk space will slowly be lost. Archiving to blob storage, if performant enough, is probably the most balanced approach that we'll strike. This will greatly depend on the speed of VM <-> blob storage transfer of course. |
For me looks like the same, basically we are removing that data from inside the container, mounting another volume we still need to remove the data from outside. Or maybe do you mean only generate the build inside the container and the copy that to a volume? Because we do some kind of copy first of the data from the So, the next step that I wanted to take is refactor our builders to not do that extra copy from the With that I don't see any harm in this PR, and we can keep doing the things above in a later step ^ |
In case we want to change the volumes mounted on the container, we could mount the ones that we need using one per purpose (instead of one:
The rest of the operation can be done in any other directory inside the container and will be completely removed after the container gets killed. This approach, also gives us the flexibility of mounting 1) only for corporate site for example --where we need to keep the repos longer. |
If we do #5409 (comment), we still need to remove the generated file, which this PR is about :) |
We don't want to remove artifacts, we want to remove all of the build data (checkout, environment, artifacts) In fact, removing the artifacts is probably the last thing we want to do, since they need to be synced from the webs, and that should be done after syncing, not in the build logic. |
We copy those to another directory before doing the cleaning, so they are synced anyway. |
This feels like wasted effort if we're going to delete all the files from the server after the build. We should work on deleting all the checkout and environment, not trying to optimize small things that won't matter after doing that work. |
ok, I'm closing this PR, I think we are going for another path now. |
We were doing this for some builders like html and pdf, but not for all ot them.
We are going to save some space after this.
This doesn't affect the builders so much, due that we use several builders, not just one. So depending of the load balancer a user could be using an empty or new build.
A next step should be trying to remove the whole project after building, but this is first step to test how it goes.