Expand Compression Dictionary format description #39441
Preview URLs, Flaws (1), External URLs (1)
(comment last updated: 2025-05-13 00:09:46)
@@ -76,7 +76,11 @@ Compression Dictionary Transport can achieve an order of magnitude more compress
## Dictionary format
A compression dictionary is a "raw" file that does not follow any specific format, nor have a specific {{Glossary("MIME type")}}. They are regular files that can be used to compress other files with similar content and so can be text files or even binary. For example, [WASM](/en-US/docs/WebAssembly) binary files are large resources that can also benefit from delta compression.
A compression dictionary is a "raw" file that does not follow any specific format, nor have a specific {{Glossary("MIME type")}}. They are regular files that can be used to compress other files with similar content. This is why previous versions of files can be used as in delta compression.
Thanks @tunetheweb. It is difficult to know if this addresses the original issue without knowing what problems developers had with the text. Can you expand?
Every sentence in the first paragraph is problematic:
- A compression dictionary is a "raw" file that does not follow any specific format, nor have a specific {{Glossary("MIME type")}}.
  - Just to be clear, you mean "in the general case", right?
  - In other words it can literally be any file, you just need some mechanism in your toolchain to reference a part of it and then you can have a second file that says (say) "new version is everything up to line
- They are regular files that can be used to compress other files with similar content.
  - Files don't compress files. Tools compress files. They "can be used in the compression of other files".
  - See comment below this list.
- This is why previous versions of files can be used as in delta compression.
  - You can't draw that from the previous statement.
What I think you are getting at here is that the dictionary can be any file at all. The algorithm decides how the new file is built from the dictionary.
You also say "this is a regular file". Is the implication that this is not a special dictionary file, i.e. you might use version1.js as the dictionary for version2.js? Or is the implication that it must not be a special file?
> Thanks @tunetheweb. It is difficult to know if this addresses the original issue without knowing what problems developers had with the text. Can you expand?
Ah, that's fair. I did tag in @rmarx, as he was the one who raised the issue with me. The basic concern is that it's not well understood that there is no "dictionary format": people either use previous files (delta compression) or a list of common text/symbols.
> Every sentence in the first paragraph is problematic:
Yeah, I agree.
> In other words it can literally be any file, you just need some mechanism in your toolchain to reference a part of it and then you can have a second file that says (say) "new version is everything up to line
Well, if by "in your toolchain" you mean to compress your files, and then for the browser to decompress them, then yes.
> This is why previous versions of files can be used as in delta compression.
> You can't draw that from the previous statement.
What I meant was "because dictionaries don't follow a specific format and are just a list of bytes to be referenced, even previous files can be used as dictionaries (known as delta compression, since you effectively only provide the delta)."
> What I think you are getting at here is that the dictionary can be any file at all. The algorithm decides how the new file is built from the dictionary.
Yes. This is exactly what I'm trying to say. Though in practice it's usual to use the previous version, or a custom-built dictionary of commonly used symbols.
> You also say "this is a regular file". Is the implication that this is not a special dictionary file, i.e. you might use version1.js as the dictionary for version2.js? Or is the implication that it must not be a special file?
The implication is that there is no special dictionary file format. So anything can be used, and the best dictionaries have a lot of overlap with the content being compressed, which allows optimal compression. Previous versions of a file typically have a huge amount of overlap, so they make great dictionaries.
Let me take another stab at re-writing this to make these points better.
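A minimal sketch of the delta-compression point above, assuming Python's standard `zlib` module as a stand-in: Compression Dictionary Transport itself uses Brotli and Zstandard, which are not in the Python standard library, but zlib's `zdict` parameter similarly accepts an arbitrary byte string as a preset dictionary. The file contents and helper names are hypothetical. The "dictionary" here is simply the raw bytes of version 1; nothing about it is a special format.

```python
import zlib


def compress_with_dictionary(data: bytes, dictionary: bytes) -> bytes:
    # Any raw bytes can serve as the dictionary; zlib only uses the last 32 KiB of it.
    compressor = zlib.compressobj(level=9, zdict=dictionary)
    return compressor.compress(data) + compressor.flush()


def decompress_with_dictionary(payload: bytes, dictionary: bytes) -> bytes:
    # The receiver must hold exactly the same dictionary bytes as the sender.
    decompressor = zlib.decompressobj(zdict=dictionary)
    return decompressor.decompress(payload) + decompressor.flush()


# Hypothetical "version 1" and "version 2" of a script: v2 is v1 plus one new function.
version1 = b"".join(
    b"export function handler%d(event) { return process(event, %d); }\n" % (i, i)
    for i in range(200)
)
version2 = version1 + b"export function newFeature() { return 42; }\n"

without_dictionary = zlib.compress(version2, 9)
with_dictionary = compress_with_dictionary(version2, dictionary=version1)

print(len(version2), len(without_dictionary), len(with_dictionary))
assert decompress_with_dictionary(with_dictionary, version1) == version2
```

Because almost all of version 2 already appears in the dictionary, the compressed output mostly consists of back-references into version 1, plus the few genuinely new bytes.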
Another typical approach is to list common strings (for example your HTML templates) together into a new `dictionary.txt` file so it can be used to compress HTML pages on the website. You can optimize this further by using specialized tooling, for example [Brotli's dictionary generator](https://github.com/google/brotli/blob/master/research/dictionary_generator.cc).
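A minimal sketch of the "list common strings" approach, again assuming Python's `zlib` preset dictionaries as a stand-in for Brotli or Zstandard. The site fragments, page, and helper names are hypothetical. The dictionary is just the fragments concatenated into one raw byte string (the equivalent of a `dictionary.txt` file); zlib's own documentation suggests putting the most commonly used strings towards the end of a preset dictionary.

```python
import zlib

# Hypothetical template fragments shared by every page on an example site.
COMMON_STRINGS = [
    '<!DOCTYPE html>\n<html lang="en">\n<head>\n<meta charset="utf-8">\n',
    '<link rel="stylesheet" href="/static/site.css">\n',
    '<nav class="site-nav"><a href="/">Home</a> <a href="/products">Products</a> <a href="/about">About</a></nav>\n',
    '<footer class="site-footer">&copy; Example Store. All rights reserved.</footer>\n</body>\n</html>\n',
]


def build_dictionary(strings: list[str]) -> bytes:
    # No header or special format: the dictionary is just the strings' bytes, back to back.
    return "".join(strings).encode("utf-8")


def compress(page: bytes, dictionary: bytes = b"") -> bytes:
    # Compress with the preset dictionary if one is given, otherwise without.
    if dictionary:
        compressor = zlib.compressobj(level=9, zdict=dictionary)
    else:
        compressor = zlib.compressobj(level=9)
    return compressor.compress(page) + compressor.flush()


dictionary = build_dictionary(COMMON_STRINGS)
page = (
    COMMON_STRINGS[0]
    + COMMON_STRINGS[1]
    + "<title>Apples</title>\n</head>\n<body>\n"
    + COMMON_STRINGS[2]
    + "<main><h1>Apples</h1><p>Fresh apples, picked this week.</p></main>\n"
    + COMMON_STRINGS[3]
).encode("utf-8")

print(len(page), len(compress(page)), len(compress(page, dictionary)))
```

Small pages like this one compress poorly on their own, because there is little repetition within a single page; the shared dictionary supplies that repetition from outside the page.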
To be clear, is this an example of how you might construct a compression dictionary? I ask because "Another typical" needs to be relative to something.
I also think you're saying that specialized tooling can create this dictionary for you more efficiently, e.g. by working out which strings are most used or in some other way best for compression?
Took another stab at this.
> To be clear, is this an example of how you might construct a compression dictionary? I ask because "Another typical" needs to be relative to something.
By "Another typical" I meant "an alternative to using previous versions is...".
> I also think you're saying that specialized tooling can create this dictionary for you more efficiently, e.g. by working out which strings are most used or in some other way best for compression?
If you just list strings you think are needed, then you'll probably end up listing duplicates. For example, if you were a fruit store you might list "apple", "pineapple", "orange", etc. as common words. But "apple" is contained in "pineapple", so specialized tooling can spot these things and create a more optimal list.
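A toy sketch of one thing such tooling can do, assuming candidates are plain strings: drop any candidate already contained in a longer one, since the compressor can still reference "apple" inside "pineapple". The function name is hypothetical, and real generators such as the Brotli one linked above do considerably more (for example, choosing substrings based on how often they occur across sample files).

```python
def remove_contained_strings(candidates: list[str]) -> list[str]:
    """Keep only candidates that are not substrings of another candidate."""
    # dict.fromkeys drops exact duplicates while preserving the original order.
    unique = list(dict.fromkeys(candidates))
    return [
        s for s in unique
        if not any(s != other and s in other for other in unique)
    ]


words = ["apple", "pineapple", "orange", "apple"]
print(remove_contained_strings(words))  # ['pineapple', 'orange']
```

Keeping only "pineapple" loses nothing, because a dictionary match can start anywhere inside the dictionary bytes, not just at the start of a listed word.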
files/en-us/web/http/guides/compression_dictionary_transport/index.md
…ndex.md Co-authored-by: Hamish Willee <hamishwillee@gmail.com>
@hamishwillee, thanks for your feedback. I think you raised lots of valid points. I've taken another stab at it. PTAL.
Thanks for the clarification and updates @tunetheweb - helps a lot.
I've done minor tweaks but this is good.
Description
I was asked to expand upon the Compression Dictionary format section since there is some developer confusion about this.
Motivation
Clarify format. FYI @rmarx
Additional details
Related issues and pull requests
Originally added in #38974