-
Notifications
You must be signed in to change notification settings - Fork 248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Adding Rewriter #709
Conversation
I think that you should go by the In theory, I think that for SQLite at least it shouldn't matter what order you write in, since it is sorted by the timestamp field. But, this is probably not a safe assumption to make for all possible storage implementations. |
class BaseWriterInterface; | ||
} // namespace writer_interfaces | ||
|
||
class ROSBAG2_CPP_PUBLIC Stitcher final |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring for all classes please. What exactly is the intended purpose of this class? That helps inform all following review
* \param storage_uris A vector of URI of the storage to open. | ||
* \param output_uri The uri of the storage to write to. | ||
**/ | ||
void open(const std::vector<std::string> & storage_uris, const std::string & output_uri); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API is "too low-level" - you should let existing classes do the things they already know how to do. E.g. if you are trying to stitch together "files within a single bag" - then you should point at the bag, not the files. This should probably be something like
void open(const std::vector<std::string> & storage_uris, const std::string & output_uri); | |
void open(rosbag2_storage::StorageOptions input_storage, rosbag2_storage::StorageOptions output_storage); |
This allows you to simply pass input_storage
to Reader::open()
(and output_storage
to Writer::open()
)
What this API also suggests is that this class is not just a "Stitcher" - it's a "BagReshaper" or a "BagConverter" - what it would actually do is
- Open an existing bag that was recorded with some configuration
- Write all messages from that bag to a new bag with some new configuration.
This should allow you to convert from one storage implementation to another one, split large bagfile(s) into more smaller ones, combine many small bagfiles into fewer larger ones. The "Stitch into one" assumption is too limited as an API!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok so do you imagine this API to really be doing the "reshaping" and then the "front-end" would be the system that decides what is to be shaped and how?
In other words, should I rename and redo this from stitcher
to reshaper
, with the details of how it is being reshaped (stitched, or split) to the consumer of the API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - I think that this would be more appropriate as a Reshaper
, and "stitching a multibag into a monobag" is just one possible behavior from given inputs.
Given your implementation of "read messages one at a time and send to writer" - this more general implementation should look exactly the same under the hood, but provide much more flexible functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To go one step further - I think this API is much too large - as far as I can tell given the feature description, the public API should be a single function, like follows:
class Rewriter
{
public:
Rewriter(reader, writer);
void rewrite(StorageOptions input_options, StorageOptions output_options);
};
Naming things is hard... but I'm almost thinking that, given the description of the functionality, that it's more of a "Rewriter", because it reads a bag, and writes it again with new settings to a new location. It doesn't have to change the shape, it may be changing the storage implementation, compression, or serialization format, without changing the shape.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok yeah, I think I mimicked the API around the Reader api. But I totally agree it can be stripped down.
However, Is there not a need to pass in ConverterOptions as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mimicked the API around the Reader api
It's better to think about what functionality you're trying to provide. Your implementation will call the Reader/Writer APIs, but this thing isn't the same type of thing, so there is no reason for it to have the same API.
However, Is there not a need to pass in ConverterOptions as well?
Yes, probably
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See discussion here: #709 (comment)
I think there is no way to do a many to one bagfile without this API knowing about multiple bags to read.
by the way, thanks for opening the draft - with the Galactic freeze on the horizon (May 17 absolute last date, preferring not to accept new features after May 1) it's best not to have any surprises |
Just checking in, are you planning to try and get this in before the May 1 date? |
I am not certain, I will definitely try my best. I plan to get you responses to this first round of comments by early next week. |
Reshaper/stitcher was too limiting, we need a rewriter instead.
* \param output_storage_options Storage options for the output bagfile | ||
* \param output_converter_options Converter options for the output bagfile | ||
**/ | ||
void rewrite_many( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think rewrite
would be used for "conversion" (#717 , #599) but since the rosbag2_cpp::Writer
cannot open a bag for rewriting , there had to be a way to only open the bag once for writing while opening multiple readers, and therefore I thought rewrite_many
was appropriate.
(
* Opens a new bagfile and prepare it for writing messages. The bagfile must not exist. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you trying to combine multiple "bags", or multiple "files"? This remains unclear from this comment. A single Reader can read a multi-file bag. Can you clarify what you're trying to do here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I'll note that "convert" is a heavily overloaded term that can refer to a variety of independent features, as we're learning in the conversation on #717, so I'd avoid the term right now in favor of more specific language.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emersonknapp I am sorry I am having a bit of trouble understand what you mean about "bags" v.s. "files".
Is there a minimal example you could point me to about reading multifiles with the reader? I found the test, but I am having trouble following
https://github.com/ros2/rosbag2/blob/master/rosbag2_cpp/test/rosbag2_cpp/test_multifile_reader.cpp
From my understanding, the reader takes in either a string URI or StorageOptions:
void open( |
struct StorageOptions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In rosbag2, a "bag" is not a single file on disk, it's a directory, potentially containing many individual files.
For example if I do a recording like the following, observe the contents of the bag from ls
$ ros2 bag record -a -o mybag
[INFO] [1618420012.176030068] [rosbag2_storage]: Opened database 'mybag/mybag_0.db3' for READ_WRITE.
[INFO] [1618420012.176144426] [rosbag2_recorder]: Listening for topics...
[INFO] [1618420012.178912237] [rosbag2_recorder]: Subscribed to topic '/rosout'
[INFO] [1618420012.180516824] [rosbag2_recorder]: Subscribed to topic '/parameter_events'
[INFO] [1618420024.978740077] [rosbag2_recorder]: Subscribed to topic '/chatter'
^C[INFO] [1618420027.180706513] [rclcpp]: signal_handler(signal_value=2)
[INFO] [1618420027.387157257] [rosbag2_cpp]: Writing remaining messages from cache to the bag. It may take a while
$ ls mybag
metadata.yaml mybag_0.db3
In this case, mybag
is the "ROS 2 Bag" - and mybag/mybag_0.db3
is a single file within that bag.
You can see in a case with splitting - (--max-bag-size
causes splitting when a file gets large enough) - that there are multiple files within the bag.
$ rm -rf mybag
$ ros2 bag record -a -o mybag --max-bag-size 86016
[INFO] [1618420123.290318343] [rosbag2_storage]: Opened database 'mybag/mybag_0.db3' for READ_WRITE.
[INFO] [1618420123.290484810] [rosbag2_recorder]: Listening for topics...
[INFO] [1618420123.292763350] [rosbag2_recorder]: Subscribed to topic '/rosout'
[INFO] [1618420123.294727032] [rosbag2_recorder]: Subscribed to topic '/parameter_events'
[INFO] [1618420123.296239889] [rosbag2_recorder]: Subscribed to topic '/chatter'
[INFO] [1618420127.495646435] [rosbag2_cpp]: Writing remaining messages from cache to the bag. It may take a while
[INFO] [1618420127.498400134] [rosbag2_storage]: Opened database 'mybag/mybag_1.db3' for READ_WRITE.
^C[INFO] [1618420129.424583789] [rclcpp]: signal_handler(signal_value=2)
[INFO] [1618420129.627688074] [rosbag2_cpp]: Writing remaining messages from cache to the bag. It may take a while
$ ls mybag
metadata.yaml mybag_0.db3 mybag_1.db3
When you pass a storage_uri
to rosbag2
- you pass the directory mybag
- e.g. ros2 bag play mybag
, you aren't passing it the path of the individual file. It then, using the metadata, knows to open the files in sequence automatically using the SequentialReader
.
StorageOptions
contains a storage_uri
string, so it's a superset of configuration for the reader - both of them are expecting this directory.
Given this context - are you trying to "combine files within a split bag into a new single-file bag"? Or are you trying to "combine multiple bags with their own metadata into a single bag"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah! ok I did not realize the storage_uri
was describing the directory. So essentially, I am trying to "combine files within a split bag into a new single-file bag" , so I do not need the rewrite_many
function yes.
@emersonknapp Hey, needless to say, I don't think I will have this done by the Code Freeze date. Some other priorities came up. |
These things happen - are you still interested in continuing to work on it? Galactic is only a short-term distribution - if we can get this into the H-turtle release next year, it will reach far more users anyways. It's a valuable feature. |
Yes, I am still interested in continuing this development :) |
Opening up a draft so I can make sure we are in communication / not straying too far from the original vision.
This addresses the "backend" portion of #668 the stitcher function.
I have several big TODOs here before this would be ready:
rosbag2_cpp
. This is where I am having trouble. How do I mock up files for testing the stitching here? Do I forego this test at this level and opt instead for doing full system test?rosbag2_tests/resources/reindex_test_bags/multiple_files/
or something similar to perform these tests)