can we let z5 write to object storage directly #129
Comments
Great!
What exactly do you have in mind? aws-s3?
z5 does not support cloud object stores yet. However, there is an n5-java implementation for aws-s3 and google buckets. Zarr-python also supports some cloud stores. I am very interested in supporting this directly in z5 as well and would be happy to help out if you or @weilewei wanted to contribute to this. It would be good if you could elaborate on your use-case a bit more. What exactly do you need? Would an implementation along the lines of n5-aws serve your purposes?
I guess the easiest MVP would be to factor the C++ end to handle the chunking and compression, communicating with Python via lists of (block_index, bytes). That would work for block-aligned reads and writes; for non-aligned IO you'd have to negotiate with Python to get the edge blocks (index-bytes tuples) to pass in to C++.
However, with that in place, you could pretty rapidly expand into any object storage Python supports with optional dependencies. I suspect you could even use some of the utilities in zarr-python for handling them.
I am thinking about aws-s3. I hope that z5 can have the feature directly. I'd like to help, but I don't know how yet. For the details, I need to look into it and discuss with other people who are familiar with object storage.
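The (block_index, bytes) exchange proposed above for block-aligned IO could look roughly like the following. This is only a sketch under stated assumptions: `encode_blocks`, `decode_blocks` and `block_key` are hypothetical names, not part of z5, and pure Python with `zlib` stands in for the chunking and compression that the C++ side would own. A plain dict plays the role of the object store.

```python
import zlib
from itertools import product


def encode_blocks(rows, chunks):
    """Split a block-aligned 2D uint8 array (list of rows) into (block_index, bytes) pairs.

    Hypothetical stand-in for the C++ side: it chunks and compresses.
    """
    height, width = len(rows), len(rows[0])
    ch, cw = chunks
    if height % ch or width % cw:
        raise ValueError("shape must be a multiple of the chunk shape")
    blocks = []
    for bi, bj in product(range(height // ch), range(width // cw)):
        # Flatten one block in row-major order, then compress it.
        raw = bytes(
            rows[r][c]
            for r in range(bi * ch, (bi + 1) * ch)
            for c in range(bj * cw, (bj + 1) * cw)
        )
        blocks.append(((bi, bj), zlib.compress(raw)))
    return blocks


def decode_blocks(blocks, shape, chunks):
    """Rebuild the 2D array from (block_index, bytes) pairs."""
    height, width = shape
    ch, cw = chunks
    rows = [[0] * width for _ in range(height)]
    for (bi, bj), payload in blocks:
        for k, value in enumerate(zlib.decompress(payload)):
            r, c = divmod(k, cw)
            rows[bi * ch + r][bj * cw + c] = value
    return rows


def block_key(index):
    """Map a block index to an object key (assumed layout, e.g. '0/1')."""
    return "/".join(str(i) for i in index)


# The Python side only moves opaque bytes into whatever store it supports.
data = [[r * 6 + c for c in range(6)] for r in range(4)]
store = {}
for index, payload in encode_blocks(data, (2, 3)):
    store[block_key(index)] = payload
```

Because the Python side never looks inside the payloads, the dict could be swapped for any key-value style object store client. Non-aligned IO is not covered here; as noted above, it would need the edge blocks to be fetched first.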
@clbarnes Yes, that would probably be the fastest way to get something working, but I think it would add more value if we implemented a complete C++ solution (that can then be wrapped to Python). This way z5 would allow access to zarr/n5 cloud storage from C++, which is currently not available.
@halehawk OK, I also think that aws-s3 is the first cloud storage that should be implemented.
From the implementation perspective, looking into the AWS C++ SDK <https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/welcome.html> is a good starting point.
Great, please share any feedback that you get. In the meantime I will have a look into how to integrate cloud storage support in the z5 C++ codebase.
That's a good starting point, I will check on it.
I had a look into how to integrate an AWS (or other cloud storage) backend into the z5 C++ API. It took some refactoring, but I arrived at an implementation that should work, see #130. The main idea is to separate the backend implementations into separate namespaces. You can find the implementation for the default filesystem backend <https://github.com/constantinpape/z5/tree/cloud-storage/include/z5/filesystem> and a mock-up implementation for AWS <https://github.com/constantinpape/z5/tree/cloud-storage/include/z5/s3>. Now we would need to actually implement the AWS part using the AWS SDK. Any help here would be very welcome! Let me know if there are any questions. Note that the changes in the C++ API are breaking, so merging this would imply bumping the version to 2. Also, I haven't adapted the Python bindings yet, but that should be fairly straightforward.
Looks reasonable to me. We would just use S3 instead of the Boost filesystem in your mock-up implementation for AWS.
I need to figure out how to get an Amazon S3 account now.
Exactly.
I will have a look into getting an AWS account and setting up some test data too.
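To illustrate the idea of keeping one interface with interchangeable storage backends, here is a small sketch. It is written in Python for brevity, whereas the branch discussed above does this in C++ with separate namespaces. All class and function names (`ChunkBackend`, `FilesystemBackend`, `MockObjectStoreBackend`, `copy_chunk`) are hypothetical and are not z5's API; the object-store class is an in-memory mock, and a real one would call an object store SDK instead.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class ChunkBackend(ABC):
    """Minimal interface every storage backend is assumed to provide."""

    @abstractmethod
    def write(self, key: str, payload: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...

    @abstractmethod
    def exists(self, key: str) -> bool: ...


class FilesystemBackend(ChunkBackend):
    """Stores each chunk as a file below a root directory."""

    def __init__(self, root):
        self.root = Path(root)

    def _path(self, key):
        return self.root.joinpath(*key.split("/"))

    def write(self, key, payload):
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(payload)

    def read(self, key):
        return self._path(key).read_bytes()

    def exists(self, key):
        return self._path(key).is_file()


class MockObjectStoreBackend(ChunkBackend):
    """In-memory mock of a bucket; no network access, for illustration only."""

    def __init__(self, bucket):
        self.bucket = bucket
        self._objects = {}

    def write(self, key, payload):
        self._objects[key] = bytes(payload)

    def read(self, key):
        return self._objects[key]

    def exists(self, key):
        return key in self._objects


def copy_chunk(src: ChunkBackend, dst: ChunkBackend, key: str) -> None:
    """Code written against the interface works with either backend."""
    dst.write(key, src.read(key))


with tempfile.TemporaryDirectory() as tmp:
    local = FilesystemBackend(tmp)
    remote = MockObjectStoreBackend("example-bucket")
    local.write("volume/0/1", b"compressed chunk bytes")
    copy_chunk(local, remote, "volume/0/1")
    copied = remote.read("volume/0/1")
```

The point of the design is that chunking and compression code only ever sees the interface, so adding S3 (or another store) means adding one backend rather than touching the rest of the library.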
Short Update on this:
We have now integrated z5 into CESM (an earth system model). We want to test it in the cloud, writing output directly to object storage. Do you know whether we can do this with the current z5, or do we need additional setup to write to object storage?