Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Adding Rewriter #709

Closed
wants to merge 8 commits into from
Closed

Conversation

vinnnyr
Copy link

@vinnnyr vinnnyr commented Apr 5, 2021

Opening up a draft so I can make sure we are in communication / not straying too far from the original vision.

This addresses the "backend" portion of #668 the stitcher function.

I have several big TODOs here before this would be ready:

  • Sorting by chronology (is it safe to sort by alphabetical order on file name? Should I sort by the first time stamp on the first message of each bag?)
  • Testing in rosbag2_cpp. This is where I am having trouble. How do I mock up files for testing the stitching here? Do I forego this test at this level and opt instead for doing full system test?
  • System Level Testing (This I think I have a clear path forward for. I was going to use the rosbag2_tests/resources/reindex_test_bags/multiple_files/ or something similar to perform these tests)

@emersonknapp
Copy link
Collaborator

emersonknapp commented Apr 5, 2021

Sorting by chronology (is it safe to sort by alphabetical order on file name? Should I sort by the first time stamp on the first message of each bag?)

I think that you should go by the metadata.yaml list of files, in order. Alphabetical is not going to work, in the case that we have 10+ bagfiles in the bag (bag_3.db3 will come after bag_12.db3) in alphasort.

In theory, I think that for SQLite at least it shouldn't matter what order you write in, since it is sorted by the timestamp field. But, this is probably not a safe assumption to make for all possible storage implementations.

class BaseWriterInterface;
} // namespace writer_interfaces

class ROSBAG2_CPP_PUBLIC Stitcher final
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring for all classes please. What exactly is the intended purpose of this class? That helps inform all following review

* \param storage_uris A vector of URI of the storage to open.
* \param output_uri The uri of the storage to write to.
**/
void open(const std::vector<std::string> & storage_uris, const std::string & output_uri);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API is "too low-level" - you should let existing classes do the things they already know how to do. E.g. if you are trying to stitch together "files within a single bag" - then you should point at the bag, not the files. This should probably be something like

Suggested change
void open(const std::vector<std::string> & storage_uris, const std::string & output_uri);
void open(rosbag2_storage::StorageOptions input_storage, rosbag2_storage::StorageOptions output_storage);

This allows you to simply pass input_storage to Reader::open() (and output_storage to Writer::open())

What this API also suggests is that this class is not just a "Stitcher" - it's a "BagReshaper" or a "BagConverter" - what it would actually do is

  1. Open an existing bag that was recorded with some configuration
  2. Write all messages from that bag to a new bag with some new configuration.

This should allow you to convert from one storage implementation to another one, split large bagfile(s) into more smaller ones, combine many small bagfiles into fewer larger ones. The "Stitch into one" assumption is too limited as an API!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok so do you imagine this API to really be doing the "reshaping" and then the "front-end" would be the system that decides what is to be shaped and how?

In other words, should I rename and redo this from stitcher to reshaper, with the details of how it is being reshaped (stitched, or split) to the consumer of the API?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - I think that this would be more appropriate as a Reshaper, and "stitching a multibag into a monobag" is just one possible behavior from given inputs.

Given your implementation of "read messages one at a time and send to writer" - this more general implementation should look exactly the same under the hood, but provide much more flexible functionality.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To go one step further - I think this API is much too large - as far as I can tell given the feature description, the public API should be a single function, like follows:

class Rewriter
{
public:
Rewriter(reader, writer);
void rewrite(StorageOptions input_options, StorageOptions output_options);
};

Naming things is hard... but I'm almost thinking that, given the description of the functionality, that it's more of a "Rewriter", because it reads a bag, and writes it again with new settings to a new location. It doesn't have to change the shape, it may be changing the storage implementation, compression, or serialization format, without changing the shape.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok yeah, I think I mimicked the API around the Reader api. But I totally agree it can be stripped down.

However, Is there not a need to pass in ConverterOptions as well?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mimicked the API around the Reader api

It's better to think about what functionality you're trying to provide. Your implementation will call the Reader/Writer APIs, but this thing isn't the same type of thing, so there is no reason for it to have the same API.

However, Is there not a need to pass in ConverterOptions as well?

Yes, probably

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See discussion here: #709 (comment)

I think there is no way to do a many to one bagfile without this API knowing about multiple bags to read.

@emersonknapp
Copy link
Collaborator

by the way, thanks for opening the draft - with the Galactic freeze on the horizon (May 17 absolute last date, preferring not to accept new features after May 1) it's best not to have any surprises

@emersonknapp emersonknapp mentioned this pull request Apr 6, 2021
5 tasks
@emersonknapp
Copy link
Collaborator

Just checking in, are you planning to try and get this in before the May 1 date?

@vinnnyr
Copy link
Author

vinnnyr commented Apr 9, 2021

Just checking in, are you planning to try and get this in before the May 1 date?

I am not certain, I will definitely try my best. I plan to get you responses to this first round of comments by early next week.

Reshaper/stitcher was too limiting, we need a rewriter instead.
* \param output_storage_options Storage options for the output bagfile
* \param output_converter_options Converter options for the output bagfile
**/
void rewrite_many(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think rewrite would be used for "conversion" (#717 , #599) but since the rosbag2_cpp::Writer cannot open a bag for rewriting , there had to be a way to only open the bag once for writing while opening multiple readers, and therefore I thought rewrite_many was appropriate.

(

* Opens a new bagfile and prepare it for writing messages. The bagfile must not exist.
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you trying to combine multiple "bags", or multiple "files"? This remains unclear from this comment. A single Reader can read a multi-file bag. Can you clarify what you're trying to do here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'll note that "convert" is a heavily overloaded term that can refer to a variety of independent features, as we're learning in the conversation on #717, so I'd avoid the term right now in favor of more specific language.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emersonknapp I am sorry I am having a bit of trouble understand what you mean about "bags" v.s. "files".

Is there a minimal example you could point me to about reading multifiles with the reader? I found the test, but I am having trouble following
https://github.com/ros2/rosbag2/blob/master/rosbag2_cpp/test/rosbag2_cpp/test_multifile_reader.cpp

From my understanding, the reader takes in either a string URI or StorageOptions:


Copy link
Collaborator

@emersonknapp emersonknapp Apr 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In rosbag2, a "bag" is not a single file on disk, it's a directory, potentially containing many individual files.

For example if I do a recording like the following, observe the contents of the bag from ls

$ ros2 bag record -a -o mybag
[INFO] [1618420012.176030068] [rosbag2_storage]: Opened database 'mybag/mybag_0.db3' for READ_WRITE.
[INFO] [1618420012.176144426] [rosbag2_recorder]: Listening for topics...
[INFO] [1618420012.178912237] [rosbag2_recorder]: Subscribed to topic '/rosout'
[INFO] [1618420012.180516824] [rosbag2_recorder]: Subscribed to topic '/parameter_events'
[INFO] [1618420024.978740077] [rosbag2_recorder]: Subscribed to topic '/chatter'
^C[INFO] [1618420027.180706513] [rclcpp]: signal_handler(signal_value=2)
[INFO] [1618420027.387157257] [rosbag2_cpp]: Writing remaining messages from cache to the bag. It may take a while

$ ls mybag
metadata.yaml  mybag_0.db3

In this case, mybag is the "ROS 2 Bag" - and mybag/mybag_0.db3 is a single file within that bag.

You can see in a case with splitting - (--max-bag-size causes splitting when a file gets large enough) - that there are multiple files within the bag.

$ rm -rf mybag

$ ros2 bag record -a -o mybag --max-bag-size 86016
[INFO] [1618420123.290318343] [rosbag2_storage]: Opened database 'mybag/mybag_0.db3' for READ_WRITE.
[INFO] [1618420123.290484810] [rosbag2_recorder]: Listening for topics...
[INFO] [1618420123.292763350] [rosbag2_recorder]: Subscribed to topic '/rosout'
[INFO] [1618420123.294727032] [rosbag2_recorder]: Subscribed to topic '/parameter_events'
[INFO] [1618420123.296239889] [rosbag2_recorder]: Subscribed to topic '/chatter'
[INFO] [1618420127.495646435] [rosbag2_cpp]: Writing remaining messages from cache to the bag. It may take a while
[INFO] [1618420127.498400134] [rosbag2_storage]: Opened database 'mybag/mybag_1.db3' for READ_WRITE.
^C[INFO] [1618420129.424583789] [rclcpp]: signal_handler(signal_value=2)
[INFO] [1618420129.627688074] [rosbag2_cpp]: Writing remaining messages from cache to the bag. It may take a while

$ ls mybag
metadata.yaml  mybag_0.db3  mybag_1.db3

When you pass a storage_uri to rosbag2 - you pass the directory mybag - e.g. ros2 bag play mybag, you aren't passing it the path of the individual file. It then, using the metadata, knows to open the files in sequence automatically using the SequentialReader.

StorageOptions contains a storage_uri string, so it's a superset of configuration for the reader - both of them are expecting this directory.

Given this context - are you trying to "combine files within a split bag into a new single-file bag"? Or are you trying to "combine multiple bags with their own metadata into a single bag"?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah! ok I did not realize the storage_uri was describing the directory. So essentially, I am trying to "combine files within a split bag into a new single-file bag" , so I do not need the rewrite_many function yes.

@vinnnyr vinnnyr changed the title [WIP] 688/add stitcher [WIP] Adding Rewriter Apr 10, 2021
@vinnnyr
Copy link
Author

vinnnyr commented Apr 26, 2021

@emersonknapp Hey, needless to say, I don't think I will have this done by the Code Freeze date. Some other priorities came up.

@emersonknapp
Copy link
Collaborator

emersonknapp commented Apr 26, 2021

These things happen - are you still interested in continuing to work on it?

Galactic is only a short-term distribution - if we can get this into the H-turtle release next year, it will reach far more users anyways. It's a valuable feature.

@vinnnyr
Copy link
Author

vinnnyr commented Apr 26, 2021

Yes, I am still interested in continuing this development :)

@emersonknapp
Copy link
Collaborator

emersonknapp commented Sep 24, 2021

I'm going to close this as stale for now, please feel free to reopen if you have new material to review. I will note that @gbiggs is working on this feature right now for #831

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants