Add Backblaze B2 integration for backups #134014
base: dev
await hass.async_add_executor_job(
    backblaze.authorize_account,
    "production",
    entry.data[CONF_APPLICATION_KEY_ID],
    entry.data[CONF_APPLICATION_KEY],
)
bucket = await hass.async_add_executor_job(
    backblaze.get_bucket_by_id, entry.data[CONF_BUCKET]
)
Wrap in a sync function and do this with 1 executor call.
I did in other places, not sure why I've skipped this one. Will do 👍
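A minimal sketch of the "one executor call" suggestion, assuming nothing about the real integration beyond the two calls quoted above. `FakeB2Api` and `authorize_and_get_bucket` are hypothetical names, and `loop.run_in_executor` stands in for `hass.async_add_executor_job` so the example runs without Home Assistant.

```python
import asyncio
from typing import Any


class FakeB2Api:
    """Hypothetical stand-in for the synchronous Backblaze client."""

    def __init__(self) -> None:
        self.authorized = False

    def authorize_account(self, realm: str, key_id: str, key: str) -> None:
        self.authorized = True

    def get_bucket_by_id(self, bucket_id: str) -> dict[str, Any]:
        if not self.authorized:
            raise RuntimeError("not authorized")
        return {"id": bucket_id}


def authorize_and_get_bucket(
    api: FakeB2Api, key_id: str, key: str, bucket_id: str
) -> dict[str, Any]:
    """Run both blocking calls back to back in the same worker thread."""
    api.authorize_account("production", key_id, key)
    return api.get_bucket_by_id(bucket_id)


async def async_setup(
    api: FakeB2Api, key_id: str, key: str, bucket_id: str
) -> dict[str, Any]:
    loop = asyncio.get_running_loop()
    # One hop to the executor instead of two.
    return await loop.run_in_executor(
        None, authorize_and_get_bucket, api, key_id, key, bucket_id
    )
```

In the integration itself, the same sync helper would presumably be passed to `hass.async_add_executor_job` with the values from `entry.data`.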
try:
    await self._hass.async_add_executor_job(
        self._bucket.upload_bytes,
        b"".join([chunk async for chunk in stream]),
This means you're pushing the whole backup into memory. This won't work for bigger backups.
Does upload_bytes take an iterator? If so, you could write a sync iterator that uses run_coroutine_threadsafe to get a megabyte of data from the async iterator.
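The sync-iterator idea could look roughly like the sketch below. It is an illustration only: `_gather` and `iter_sync` are hypothetical names, and the generator is assumed to be consumed from a worker thread while the event loop keeps running.

```python
import asyncio
from collections.abc import AsyncIterator, Iterator

ONE_MEGABYTE = 1024 * 1024


async def _gather(stream: AsyncIterator[bytes], min_size: int) -> bytes:
    """Collect chunks until at least min_size bytes are available or the stream ends."""
    parts: list[bytes] = []
    total = 0
    while total < min_size:
        try:
            chunk = await stream.__anext__()
        except StopAsyncIteration:
            break
        parts.append(chunk)
        total += len(chunk)
    return b"".join(parts)


def iter_sync(
    stream: AsyncIterator[bytes],
    loop: asyncio.AbstractEventLoop,
    min_size: int = ONE_MEGABYTE,
) -> Iterator[bytes]:
    """Sync iterator; call it from a worker thread, never from the loop thread."""
    while True:
        # Schedule the read on the event loop and block this thread on the result.
        future = asyncio.run_coroutine_threadsafe(_gather(stream, min_size), loop)
        data = future.result()
        if not data:
            return
        yield data


async def demo() -> list[int]:
    async def source() -> AsyncIterator[bytes]:
        for i in range(5):
            yield bytes([i]) * 400_000

    loop = asyncio.get_running_loop()
    stream = source()
    # The blocking consumer runs in the default executor.
    return await loop.run_in_executor(
        None, lambda: [len(block) for block in iter_sync(stream, loop)]
    )
```

Only about one block is held in memory at a time, instead of the whole backup.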
Yeah that is the bad side here... and no, it doesn't take an iterator.
There is a method that takes any stream, which makes a lot of sense... (I expect that to be more common than an iterator for this)
But I have no clue on how to wrap an async iterator into a stream. A BufferedReader (or BufferedWriter) could maybe work? Dunno.
Working on this made me wonder why we made this an iterator to begin with 😬 Although it is nice in many cases, the case I am working with in this PR isn't uncommon.
I think I have an idea how to wrap it. Will whip something up during the day.
Wrote a BufferSyncIteratorToSyncStream on top of the Python io stream base to handle this case. Used your suggestion, including a buffer: once the buffer is depleted, we run the coroutine again and read up to a set buffer size.
Set the buffer to use in this case to 8 megabytes.
I have tested it by dropping a few gigabytes of binary files into my Home Assistant configuration folder. It passed nicely, but felt a little on the slow side (I have nothing to back this up, it is just a feeling), which indicates there might be room for improvement in terms of efficiency.
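For readers who want a picture of such a wrapper, here is a rough sketch built on `io.RawIOBase`. It is not the code from this PR: the class name `AsyncIteratorReader` and its internals are assumptions, and only the 8 megabyte default is taken from the comment above.

```python
import asyncio
import io
from collections.abc import AsyncIterator


class AsyncIteratorReader(io.RawIOBase):
    """Hypothetical read-only stream over an async iterator of bytes."""

    def __init__(
        self,
        stream: AsyncIterator[bytes],
        loop: asyncio.AbstractEventLoop,
        buffer_size: int = 8 * 1024 * 1024,
    ) -> None:
        super().__init__()
        self._stream = stream
        self._loop = loop
        self._buffer_size = buffer_size
        self._buffer = b""
        self._pos = 0

    def readable(self) -> bool:
        return True

    async def _fill(self) -> bytes:
        """Pull chunks until the buffer size is reached or the stream ends."""
        parts: list[bytes] = []
        total = 0
        while total < self._buffer_size:
            try:
                chunk = await self._stream.__anext__()
            except StopAsyncIteration:
                break
            parts.append(chunk)
            total += len(chunk)
        return b"".join(parts)

    def readinto(self, b) -> int:
        # Refill only when the current buffer is depleted.
        if self._pos >= len(self._buffer):
            future = asyncio.run_coroutine_threadsafe(self._fill(), self._loop)
            self._buffer = future.result()
            self._pos = 0
        n = min(len(b), len(self._buffer) - self._pos)
        b[:n] = self._buffer[self._pos : self._pos + n]
        self._pos += n
        return n  # 0 signals end of stream


async def demo() -> bytes:
    async def source() -> AsyncIterator[bytes]:
        for part in (b"abc", b"defg", b"hi"):
            yield part

    loop = asyncio.get_running_loop()
    reader = AsyncIteratorReader(source(), loop, buffer_size=4)
    # The blocking read must happen in a worker thread, not on the loop.
    return await loop.run_in_executor(None, reader.read)
```

Each refill costs one round trip between the worker thread and the event loop, so the buffer size trades memory for fewer hops; whether that explains the slowness mentioned above is untested.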
Hi @frenck, I just took your code and replaced the Backblaze client with the boto3 client to support any kind of S3 bucket. I tested it with IDrive e2, Azure Storage, and MinIO. Is this something for the current beta, or for later? What do you think?
    "bucket": "Bucket"
},
"data_description": {
    "bucket": "Select the bucked to store backups in."
- "bucket": "Select the bucked to store backups in."
+ "bucket": "Select the bucket to store backups in."
Could you please also test downloading a backup? I am getting this error (in my S3 implementation): TypeError: object async_generator can't be used in 'await' expression. I cannot test this with Backblaze as I do not have an account, but this line, stream = await agent.async_download_backup(backup_id) (in http.py:66), seems to be wrong. Without the await, it works.
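One way to hit this TypeError, shown as a standalone sketch rather than the actual agent code: if the download method is written as an async generator (it contains `yield`), calling it already returns the iterator, so awaiting the call fails. If it is a coroutine that returns an async iterator, the `await` is needed. The function names below are hypothetical.

```python
import asyncio
from collections.abc import AsyncIterator


async def download_as_generator() -> AsyncIterator[bytes]:
    """Async generator function: calling it returns the iterator directly."""
    yield b"part-1"
    yield b"part-2"


async def download_as_coroutine() -> AsyncIterator[bytes]:
    """Coroutine function: awaiting it returns an async iterator."""
    return download_as_generator()


async def main() -> tuple[bool, list[bytes]]:
    try:
        # Awaiting an async generator object raises TypeError.
        await download_as_generator()
        await_worked = True
    except TypeError:
        await_worked = False

    # Awaiting the coroutine variant works and yields the chunks.
    stream = await download_as_coroutine()
    chunks = [chunk async for chunk in stream]
    return await_worked, chunks
```

Which of the two shapes the backup agent interface expects is not stated in this thread, so the fix may belong in either the caller or the agent.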
B2 might already be covered by this integration
At this point, it works, it is using the SDK provided by Backblaze themselves, which is all super nice.
However, their library is sync-only, and built on top of requests. This makes it less ideal for our backup agent implementation. I do want to test this on some bigger backup sizes.
Additionally, there is a mypy error left that I've not resolved. Dunno why, I'm probably overlooking something simple at this point.
🤗 Feel free to jump in and push improvements to this branch directly ❤️
Proposed change
This PR adds the first steps in adding an integration for Backblaze B2.
The integration provides a backup agent that works with the Home Assistant backup solution introduced in Home Assistant 2025.1.
Type of change
Additional information
Checklist
ruff format homeassistant tests
If user exposed functionality or configuration variables are added/changed:
If the code communicates with devices, web services, or third-party tools:
Updated and included derived files by running: python3 -m script.hassfest.
requirements_all.txt. Updated by running python3 -m script.gen_requirements_all.
To help with the load of incoming pull requests: