Add Backblaze B2 integration for backups #134014

Draft
wants to merge 2 commits into
base: dev

@frenck (Member) commented Dec 25, 2024

⚠️ This PR/integration isn't ready yet.

At this point, it works. It uses the SDK provided by Backblaze themselves, which is all super nice.

However, their library is sync-only and built on top of requests, which makes it less than ideal for our backup agent implementation. I still want to test this with some bigger backup sizes.

Additionally, there is one mypy error left that I haven't resolved. Dunno why; I'm probably overlooking something simple at this point.

🤗 Feel free to jump in and push improvements to this branch directly ❤️

Proposed change

This PR takes the first steps toward adding an integration for Backblaze B2.

The integration provides a backup agent that works with the Home Assistant backup solution introduced in Home Assistant 2025.1.

[Six screenshots of the integration, captured 2024-12-25]

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Ruff (ruff format homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.

To help with the load of incoming pull requests:

Comment on lines +38 to +46
await hass.async_add_executor_job(
    backblaze.authorize_account,
    "production",
    entry.data[CONF_APPLICATION_KEY_ID],
    entry.data[CONF_APPLICATION_KEY],
)
bucket = await hass.async_add_executor_job(
    backblaze.get_bucket_by_id, entry.data[CONF_BUCKET]
)
Member:

Wrap in a sync function and do this with 1 executor call.

Member Author:

I did in other places, not sure why I've skipped this one. Will do 👍

try:
    await self._hass.async_add_executor_job(
        self._bucket.upload_bytes,
        b"".join([chunk async for chunk in stream]),
Member:

This means you're pushing the whole backup into memory, which won't work for bigger backups.

Does upload_bytes take an iterator? If so, you could write a sync iterator that uses run_coroutine_threadsafe to get a megabyte of data from the async iterator.

Member Author (@frenck, Dec 26, 2024):

Yeah that is the bad side here... and no, it doesn't take an iterator.

There is a method that takes any stream, which makes a lot of sense... (I expect that to be more common than an iterator for this)

But I have no clue on how to wrap an async iterator into a stream. A BufferedReader (or BufferedWriter) could maybe work? Dunno.

Working on this made me wonder why we made this an iterator to begin with 😬 Although it's nice in many cases, the case I'm working with in this PR isn't uncommon.

Member Author:

I think I have an idea how to wrap it. Will whip something up during the day.

Member Author (@frenck, Dec 26, 2024):

Wrote a BufferSyncIteratorToSyncStream on top of the Python io stream base class to handle this case. I used your suggestion and added a buffer: once the buffer is depleted, the coroutine runs again and reads up to a set buffer size.

Set the buffer to use in this case to 8 megabytes.

I tested it by dropping a few gigabytes of binary files into my Home Assistant configuration folder. It passed nicely, but it felt a little on the slow side (just a feeling, I have nothing to back it up), which suggests there might be room for improvement in terms of efficiency.
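
For readers following the thread, here is a minimal sketch of the general idea discussed above: a sync, read-only stream that refills a buffer from an async iterator via `asyncio.run_coroutine_threadsafe`. It is not the code from this PR; `AsyncIteratorStream`, `chunks`, and the tiny demo buffer size are assumptions made up for illustration.

```python
import asyncio
import io
from collections.abc import AsyncIterator


class AsyncIteratorStream(io.RawIOBase):
    """Hypothetical sync, read-only stream over an async iterator of bytes."""

    def __init__(
        self,
        source: AsyncIterator[bytes],
        loop: asyncio.AbstractEventLoop,
        buffer_size: int = 8 * 1024 * 1024,
    ) -> None:
        super().__init__()
        self._source = source.__aiter__()
        self._loop = loop
        self._buffer_size = buffer_size
        self._buffer = bytearray()
        self._exhausted = False

    def readable(self) -> bool:
        return True

    async def _fill(self) -> bytes:
        """Runs on the event loop: collect chunks until the buffer size is reached."""
        data = bytearray()
        while len(data) < self._buffer_size:
            try:
                data += await self._source.__anext__()
            except StopAsyncIteration:
                self._exhausted = True
                break
        return bytes(data)

    def readinto(self, target) -> int:
        """Runs in a worker thread: refill only when the buffer is depleted."""
        if not self._buffer and not self._exhausted:
            future = asyncio.run_coroutine_threadsafe(self._fill(), self._loop)
            self._buffer += future.result()
        count = min(len(target), len(self._buffer))
        target[:count] = self._buffer[:count]
        del self._buffer[:count]
        return count  # 0 signals end of stream


async def chunks() -> AsyncIterator[bytes]:
    for i in range(5):
        yield bytes([i]) * 3


async def main() -> bytes:
    loop = asyncio.get_running_loop()
    stream = AsyncIteratorStream(chunks(), loop, buffer_size=4)
    # The blocking consumer (an upload call in the PR) must run off the loop.
    return await loop.run_in_executor(None, stream.read)


result = asyncio.run(main())
```

The stream must only be read from a worker thread; calling `read()` on the event loop thread itself would block the loop that `_fill` needs to run on.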

mkohns commented Dec 27, 2024

Hi @frenck, I just took your code and replaced the Backblaze client with a boto3 client to support any kind of S3 bucket. I tested it with IDrive e2, Azure Storage, and MinIO. Is this something for the current beta, or for later? What do you think?

    "bucket": "Bucket"
},
"data_description": {
    "bucket": "Select the bucked to store backups in."

Suggested change
- "bucket": "Select the bucked to store backups in."
+ "bucket": "Select the bucket to store backups in."

mkohns commented Dec 27, 2024

Could you please also test downloading a backup?
Reproduction:
-> select a backup
-> click on the three dots at backblaze
-> select "download from this location"

I am getting the error (in my S3 implementation)

TypeError: object async_generator can't be used in 'await' expression
File "/workspaces/ha-core/homeassistant/components/backup/http.py", line 66, in get
    stream = await agent.async_download_backup(backup_id)

I can not test this with Backblaze as I do not have an account.

But this line (http.py:66) seems to be wrong:

stream = await agent.async_download_backup(backup_id)

Without the await, it works.

mbrevda commented Jan 7, 2025

B2 might already be covered by this integration
