-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a hint for indicating that a Tree is topologically sorted #230
Add a hint for indicating that a Tree is topologically sorted #230
Conversation
dac402b
to
0266a4b
Compare
0266a4b
to
e81e679
Compare
Also cc @peterebden, as he made changes in this area as well in #223 |
01a52cc
to
cd66598
Compare
cd66598
to
e37ee76
Compare
Thanks everyone for your time discussing this proposal today. I've made the change suggested by @sstriker, documenting that manual construction of Tree objects without using the Protobuf library is recommended. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mentioned that this implies a limit of 16(?) possible fields in this message- I think it might be worth mentioning that so we don't forget about it. Otherwise this lgtm.
I think this was @bergsieker asking whether or not we should require that the tag is encoded as a single byte. In my opinion this is acceptable, for the following reason. The tag is essentially just a variable length integer, having value I have just extended the phrasing to clarify this. |
I'm currently trying to improve the performance of handling of large output directories (Tree messages), having sizes in the order of hundreds of megabytes. In the process, I have realised that there is a lot of value in enforcing that the Directory messages contained in them are topologically sorted. Two practical use cases: - When instantiating the contents of a Tree on a local file system, having the Tree be topologically sorted allows you to immediately create files and directories in the right place. - When needing to resolve the properties of a single file by path, a topologically sorted Tree permits resolution by doing a simple forward scan. Especially when other features like compression are taken into account, it's useful if Tree messages can be processed in a streaming manner. One practical issue is that most Protobuf libraries don't offer APIs for processing messages in a streaming manner. This means that implementors who want to achive these optimisations will need to write their own message parsers; at least for the Tree tree itself. To make this as painless as possible, we also require that the Tree is stored in some normal form. Fixes: bazelbuild#229
e37ee76
to
c3e4d01
Compare
remote-apis PR 230 added a way where producers of Tree messages can indicate that the directories contained within are stored in topological order. The advantage of using such an ordering is that it permits instantiation of such objects onto a local file system in a streaming fashion. The same holds for lookups of individual paths. Even though Bazel currently does not gain from this, this change at least modifies Bazel's REv2 client to emit topologically sorted trees. This makes it possible for tools such as Buildbarn's bb-browser to process them more efficiently. More details: - bazelbuild/remote-apis#229 - bazelbuild/remote-apis#230
Draft PR against Bazel to start generating topologically sorted trees: bazelbuild/bazel#16463 I will mark it ready for review as soon as this PR lands. |
I have submitted the following PR against remote-apis, which adds a hint to ActionResult to indicate that the Directory messages contained in a Tree are topologically sorted: bazelbuild/remote-apis#230 The advantage of having Tree objects in this form is that it makes it possible by doing instantiation of Tree objects on a local file system using a simple forward scan. As this change hasn't been merged yet, this comment only adds the logic for generating topologically sorted Tree objects. I will only add the code for announcing the hint as soon as the linked PR gets merged.
I have submitted the following PR against remote-apis, which adds a hint to ActionResult to indicate that the Directory messages contained in a Tree are topologically sorted: bazelbuild/remote-apis#230 The advantage of having Tree objects in this form is that it makes it possible by doing instantiation of Tree objects on a local file system using a simple forward scan. As this change hasn't been merged yet, this comment only adds the logic for generating topologically sorted Tree objects. I will only add the code for announcing the hint as soon as the linked PR gets merged.
@bergsieker Would you be interested in merging this PR for me? Thanks! |
remote-apis PR 230 added a way where producers of Tree messages can indicate that the directories contained within are stored in topological order. The advantage of using such an ordering is that it permits instantiation of such objects onto a local file system in a streaming fashion. The same holds for lookups of individual paths. Even though Bazel currently does not gain from this, this change at least modifies Bazel's REv2 client to emit topologically sorted trees. This makes it possible for tools such as Buildbarn's bb-browser to process them more efficiently. More details: - bazelbuild/remote-apis#229 - bazelbuild/remote-apis#230
remote-apis PR 230 added a way where producers of Tree messages can indicate that the directories contained within are stored in topological order. The advantage of using such an ordering is that it permits instantiation of such objects onto a local file system in a streaming fashion. The same holds for lookups of individual paths. Even though Bazel currently does not gain from this, this change at least modifies Bazel's REv2 client to emit topologically sorted trees. This makes it possible for tools such as Buildbarn's bb-browser to process them more efficiently. More details: - bazelbuild/remote-apis#229 - bazelbuild/remote-apis#230
remote-apis PR 230 added a way where producers of Tree messages can indicate that the directories contained within are stored in topological order. The advantage of using such an ordering is that it permits instantiation of such objects onto a local file system in a streaming fashion. The same holds for lookups of individual paths. Even though Bazel currently does not gain from this, this change at least modifies Bazel's REv2 client to emit topologically sorted trees. This makes it possible for tools such as Buildbarn's bb-browser to process them more efficiently. More details: - bazelbuild/remote-apis#229 - bazelbuild/remote-apis#230 Partial commit for third_party/*, see #16463. Signed-off-by: Sunil Gowroji <sgowroji@google.com>
remote-apis PR 230 added a way where producers of Tree messages can indicate that the directories contained within are stored in topological order. The advantage of using such an ordering is that it permits instantiation of such objects onto a local file system in a streaming fashion. The same holds for lookups of individual paths. Even though Bazel currently does not gain from this, this change at least modifies Bazel's REv2 client to emit topologically sorted trees. This makes it possible for tools such as Buildbarn's bb-browser to process them more efficiently. More details: - bazelbuild/remote-apis#229 - bazelbuild/remote-apis#230 Closes #16463. PiperOrigin-RevId: 487196375 Change-Id: Iafcfd617fc101fec7bfa943552113ce57ab8041b
remote-apis PR 230 added a way where producers of Tree messages can indicate that the directories contained within are stored in topological order. The advantage of using such an ordering is that it permits instantiation of such objects onto a local file system in a streaming fashion. The same holds for lookups of individual paths. Even though Bazel currently does not gain from this, this change at least modifies Bazel's REv2 client to emit topologically sorted trees. This makes it possible for tools such as Buildbarn's bb-browser to process them more efficiently. More details: - bazelbuild/remote-apis#229 - bazelbuild/remote-apis#230 Closes #16463. PiperOrigin-RevId: 487196375 Change-Id: Iafcfd617fc101fec7bfa943552113ce57ab8041b
remote-apis PR 230 added a way where producers of Tree messages can indicate that the directories contained within are stored in topological order. The advantage of using such an ordering is that it permits instantiation of such objects onto a local file system in a streaming fashion. The same holds for lookups of individual paths. Even though Bazel currently does not gain from this, this change at least modifies Bazel's REv2 client to emit topologically sorted trees. This makes it possible for tools such as Buildbarn's bb-browser to process them more efficiently. More details: - bazelbuild/remote-apis#229 - bazelbuild/remote-apis#230 Partial commit for third_party/*, see #16463. Signed-off-by: Sunil Gowroji <sgowroji@google.com>
* Emit Tree objects in topological order remote-apis PR 230 added a way where producers of Tree messages can indicate that the directories contained within are stored in topological order. The advantage of using such an ordering is that it permits instantiation of such objects onto a local file system in a streaming fashion. The same holds for lookups of individual paths. Even though Bazel currently does not gain from this, this change at least modifies Bazel's REv2 client to emit topologically sorted trees. This makes it possible for tools such as Buildbarn's bb-browser to process them more efficiently. More details: - bazelbuild/remote-apis#229 - bazelbuild/remote-apis#230 Closes #16463. PiperOrigin-RevId: 487196375 Change-Id: Iafcfd617fc101fec7bfa943552113ce57ab8041b * Emit Tree objects in topological order remote-apis PR 230 added a way where producers of Tree messages can indicate that the directories contained within are stored in topological order. The advantage of using such an ordering is that it permits instantiation of such objects onto a local file system in a streaming fashion. The same holds for lookups of individual paths. Even though Bazel currently does not gain from this, this change at least modifies Bazel's REv2 client to emit topologically sorted trees. This makes it possible for tools such as Buildbarn's bb-browser to process them more efficiently. More details: - bazelbuild/remote-apis#229 - bazelbuild/remote-apis#230 Partial commit for third_party/*, see #16463. Signed-off-by: Sunil Gowroji <sgowroji@google.com> Signed-off-by: Sunil Gowroji <sgowroji@google.com> Co-authored-by: Ed Schouten <eschouten@apple.com>
I have submitted the following PR against remote-apis, which adds a hint to ActionResult to indicate that the Directory messages contained in a Tree are topologically sorted: bazelbuild/remote-apis#230 The advantage of having Tree objects in this form is that it makes it possible by doing instantiation of Tree objects on a local file system using a simple forward scan. As this change hasn't been merged yet, this comment only adds the logic for generating topologically sorted Tree objects. I will only add the code for announcing the hint as soon as the linked PR gets merged.
I'm currently trying to improve the performance of handling of large output directories (Tree messages), having sizes in the order of hundreds of megabytes. In the process, I have realised that there is a lot of value in enforcing that the Directory messages contained in them are topologically sorted. Two practical use cases:
When instantiating the contents of a Tree on a local file system, having the Tree be topologically sorted allows you to immediately create files and directories in the right place.
When needing to resolve the properties of a single file by path, a topologically sorted Tree permits resolution (of paths without ..) by doing a simple forward scan.
Especially when other features like compression are taken into account, it's useful if Tree messages can be processed in a streaming manner.
One practical issue is that most Protobuf libraries don't offer APIs for processing messages in a streaming manner. This means that implementors who want to achieve these optimisations will need to write their own message parsers; at least for the Tree itself. To make this as painless as possible, we also require that the Tree is stored in some normal form.
Fixes: #229