-
Notifications
You must be signed in to change notification settings - Fork 106
Description
Abstract
The ObjectStore trait -- as designed currently -- is a middle ground of somewhat competing design goals. I think we can do better than that.
Requirements
The trait serves two groups of API users:
Object Store Users
Humans and machines that want to use an object store through a unified interface.
They usually like to write only the relevant parts, e.g.:
- "get object
foo/bar.txt" ==>store.get("foo/bar.txt").await? - "get first 10 bytes of
foo/bar.txt" ==>store.get("foo/bar.txt").with_range(..10).await?
Object Store Implementations
Parties that implement a new object store or wrap an existing one. They usually want to implement the methods that determine the behavior of the object store, ideally without surprises (like opinionated default implementations).
Status Quo
Due to this dual nature the trait has accumulated an increasing number of methods, a bunch of them with default implementation. Let's have a look at the "get" methods:
arrow-rs-object-store/src/lib.rs
Lines 633 to 662 in 0c3152c
| /// Return the bytes that are stored at the specified location. | |
| async fn get(&self, location: &Path) -> Result<GetResult> { | |
| self.get_opts(location, GetOptions::default()).await | |
| } | |
| /// Perform a get request with options | |
| async fn get_opts(&self, location: &Path, options: GetOptions) -> Result<GetResult>; | |
| /// Return the bytes that are stored at the specified location | |
| /// in the given byte range. | |
| /// | |
| /// See [`GetRange::Bounded`] for more details on how `range` gets interpreted | |
| async fn get_range(&self, location: &Path, range: Range<u64>) -> Result<Bytes> { | |
| let options = GetOptions { | |
| range: Some(range.into()), | |
| ..Default::default() | |
| }; | |
| self.get_opts(location, options).await?.bytes().await | |
| } | |
| /// Return the bytes that are stored at the specified location | |
| /// in the given byte ranges | |
| async fn get_ranges(&self, location: &Path, ranges: &[Range<u64>]) -> Result<Vec<Bytes>> { | |
| coalesce_ranges( | |
| ranges, | |
| |range| self.get_range(location, range), | |
| OBJECT_STORE_COALESCE_DEFAULT, | |
| ) | |
| .await | |
| } |
All except for get_ranges basically map to get_opts and there is no reason for a store to override any of the defaults. And even for get_ranges we could come up with a sensible mapping to get_opts if the range parameter would support multi-ranges similar to HTTP.
Now let's look at "rename":
arrow-rs-object-store/src/lib.rs
Lines 773 to 782 in 0c3152c
| /// Move an object from one path to another in the same object store. | |
| /// | |
| /// By default, this is implemented as a copy and then delete source. It may not | |
| /// check when deleting source that it was the same object that was originally copied. | |
| /// | |
| /// If there exists an object at the destination, it will be overwritten. | |
| async fn rename(&self, from: &Path, to: &Path) -> Result<()> { | |
| self.copy(from, to).await?; | |
| self.delete(from).await | |
| } |
arrow-rs-object-store/src/lib.rs
Lines 793 to 799 in 0c3152c
| /// Move an object from one path to another in the same object store. | |
| /// | |
| /// Will return an error if the destination already has an object. | |
| async fn rename_if_not_exists(&self, from: &Path, to: &Path) -> Result<()> { | |
| self.copy_if_not_exists(from, to).await?; | |
| self.delete(from).await | |
| } |
I think it is out of question that a store implementation should definitely override these if there's any way to perform key renames without a full "get + put".
Proposal
I propose to remove all default implementations from the trait and only have a single method per operation, so "get" would look like this:
async fn get(&self, location: &Path, options: GetOptions) -> Result<GetResult>;Note that the location is NOT part of options so that GetOptions can still implement Default and a user only needs to specify the options of interest:
// get bytes of file
store.get(
&location,
Default::default(),
).await?.bytes().await?;
// get range
store.get(
&location,
GetOptions{
range: (..10).into(),
..Default::default()
},
).await?.bytes().await?;A similar mapping can be done for "rename".
I think that strikes a good balance between boilerplate / verbosity of the user API and the clarity of the interface.