Skip to content

Conversation

Standard8
Copy link
Member

This fixes downloading & reading of reading of attachments wrt the remote settings dumps. The example code, and the attachment reading code were using parts of the record for the file name on disk, this makes them consistent.

I have tested locally that loading the attachments from the packaging works correctly with this change.

Pull Request checklist

  • Breaking changes: This PR follows our breaking change policy
    • This PR follows the breaking change policy:
      • This PR has no breaking API changes, or
      • There are corresponding PRs for our consumer applications that resolve the breaking changes and have been approved
  • Quality: This PR builds and tests run cleanly
    • Note:
      • For changes that need extra cross-platform testing, consider adding [ci full] to the PR title.
      • If this pull request includes a breaking change, consider cutting a new release after merging.
  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Changelog: This PR includes a changelog entry in CHANGELOG.md or an explanation of why it does not need one
    • Any breaking changes to Swift or Kotlin binding APIs are noted explicitly
  • Dependencies: This PR follows our dependency management guidelines
    • Any new dependencies are accompanied by a summary of the due diligence applied in selecting them.

Branch builds: add [firefox-android: branch-name] to the PR title.

@Standard8 Standard8 requested a review from gruberb April 8, 2025 21:24
@Standard8
Copy link
Member Author

@gruberb Something I noticed after uploading this, I'm wondering if it would be better to use the record_id rather than the filename. For example, I noticed the addons-bloomfilter uses the same filename in two different records.

Using the record_id would also provide stability in the filename when the record is updated, which might be useful when looking at git history.

@Standard8 Standard8 changed the title Fix downloading and reading of attachments from remote settings dumps, and add search-config-icons dumps Bug 1955390 - Fix downloading and reading of attachments from remote settings dumps, and add search-config-icons dumps Apr 8, 2025
@gruberb
Copy link
Member

gruberb commented Apr 9, 2025

Using the record_id would also provide stability in the filename when the record is updated, which might be useful when looking at git history.

I have no strong opinion about this. Whatever works best. Would this mean we also use the record_id as the filename in the tree? This might make it harder to know what the file should actually represent went manually going through. Instead of names, we would have just random numbers?

Copy link
Member

@gruberb gruberb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I like the consistency, thanks for fixing this!

@Standard8
Copy link
Member Author

Using the record_id would also provide stability in the filename when the record is updated, which might be useful when looking at git history.

I have no strong opinion about this. Whatever works best. Would this mean we also use the record_id as the filename in the tree? This might make it harder to know what the file should actually represent went manually going through. Instead of names, we would have just random numbers?

Yeah, we'd just have random numbers, similar to this.

I think my concern is that filenames do have the potential to clash - as far as I know, we don't enforce them being distinct. So we could get to the stage where there's a published collection with clashing filenames, and we don't see a problem until we try and generate the dumps.

@leplatrem @bendk Any thoughts/preferences here?

@bendk
Copy link
Contributor

bendk commented Apr 9, 2025

Using the record_id would also provide stability in the filename when the record is updated, which might be useful when looking at git history.

I have no strong opinion about this. Whatever works best. Would this mean we also use the record_id as the filename in the tree? This might make it harder to know what the file should actually represent went manually going through. Instead of names, we would have just random numbers?

Yeah, we'd just have random numbers, similar to this.

I think my concern is that filenames do have the potential to clash - as far as I know, we don't enforce them being distinct. So we could get to the stage where there's a published collection with clashing filenames, and we don't see a problem until we try and generate the dumps.

@leplatrem @bendk Any thoughts/preferences here?

If filename is not guaranteed to be unique, then I think record_id is better. I'd defer to @leplatrem though.

Another option would be some combination ([record_id].[filename]). I personally don't think it's worth the extra complexity, but if seeing the filename in the repo was important to you or some other devs then this could be an option.

@leplatrem
Copy link
Contributor

leplatrem commented Apr 10, 2025

I will try to answer this clearly, but the way it's done on desktop was heavily driven by the addons blocklist use-case (done by Rob Wu a few years ago 🙏).

In short: on Desktop, the attachment ID is "configurable", and it is record.id by default. I would use record IDs, and leave this customization for the future if we need.

Context

The addons blocklist v3 is AFAIK the only use-case that uses the filename as the attachmentId (source), and probably the only collection that has different records with the same filename.
As you can see, on desktop, in the dumps folder, the filenames are used:
https://searchfox.org/mozilla-central/source/services/settings/dumps/blocklists/addons-bloomfilters

A .meta.json file is used to store the attributes of the record associated with the attachment file.
This allows the client to lookup and compare the data in the dump when calling .download() (source)

Their collection is a bloomfilter where there is a base file, that gets completed with stashes. We would have to ask Rob if we would need details, but I think that for now we can ignore this specificity, and go with record.id.

Would this mean we also use the record_id as the filename in the tree? This might make it harder to know what the file should actually represent went manually going through.

Not all collections use randomly assigned UUIDs for record IDs, so I don't think it would be an issue!

@Standard8 Standard8 force-pushed the bug-1955390-icon-dumps branch from df697a0 to fa9d630 Compare April 11, 2025 10:53
@Standard8
Copy link
Member Author

Thank you everyone for the comments. I've gone with the record id, since that seems to be better for the long-run. This also simplifies the patch, undoing the changes to examples/remote-settings-cli/src/dump/client.rs and the existing files from the regions collection.

I'll merge this later once tests have run.

@Standard8 Standard8 added this pull request to the merge queue Apr 11, 2025
Merged via the queue into mozilla:main with commit a2c71d1 Apr 11, 2025
15 checks passed
@Standard8 Standard8 deleted the bug-1955390-icon-dumps branch April 11, 2025 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants