
Decompress object streams asynchronously when it's possible #20601

Open
calixteman wants to merge 1 commit into mozilla:master from calixteman:decompress_obj_stream


@calixteman
Contributor

Most of the time, object streams are compressed using FlateDecode (and, in the future, BrotliDecode).
To improve performance, we can decompress those streams with a built-in decompressor, but this has to be done asynchronously. Since it cannot be done while fetching, which is synchronous, we need to do it as part of the PDF parsing process.
The drawback is that this requires more memory, since we need to keep both the compressed and uncompressed versions of the object streams in memory until parsing is done.
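A minimal sketch of the idea, assuming a JavaScript runtime that provides the Compression Streams API (`DecompressionStream`, available in modern browsers and Node.js 18+). FlateDecode data uses the zlib format, which that API calls `"deflate"`. Object streams are inflated asynchronously in a pre-pass during parsing, and a later synchronous fetch reads the cached result, falling back to a synchronous decoder otherwise. `inflateAsync`, `ObjectStreamCache`, `syncDecode`, and the `{ num, filter, bytes }` stream shape are hypothetical illustrations, not the actual pdf.js code from this PR.

```javascript
// Hypothetical helper: inflate zlib-wrapped data (PDF FlateDecode) with the
// platform's asynchronous DecompressionStream ("deflate" = zlib format).
async function inflateAsync(bytes) {
  const inflated = new Blob([bytes])
    .stream()
    .pipeThrough(new DecompressionStream("deflate"));
  return new Uint8Array(await new Response(inflated).arrayBuffer());
}

// Hypothetical cache of decompressed object streams, keyed by object number.
class ObjectStreamCache {
  #decoded = new Map(); // object number -> inflated Uint8Array

  // Asynchronous pre-pass, run as part of parsing before any synchronous
  // fetch needs the data. Only FlateDecode streams are handled here;
  // everything else is left to the synchronous path.
  async prefetch(streams) {
    const flate = streams.filter((s) => s.filter === "FlateDecode");
    await Promise.all(
      flate.map(async (s) => {
        try {
          this.#decoded.set(s.num, await inflateAsync(s.bytes));
        } catch {
          // Corrupt data: leave it to the synchronous decoder instead.
        }
      })
    );
  }

  // Synchronous lookup used while fetching objects: prefer the pre-inflated
  // bytes, otherwise decode synchronously with the caller-supplied function.
  getSync(stream, syncDecode) {
    return this.#decoded.get(stream.num) ?? syncDecode(stream);
  }

  // Release the inflated copies once parsing is done; until then both the
  // compressed and uncompressed bytes are held in memory.
  clear() {
    this.#decoded.clear();
  }
}
```

Calling `clear()` once parsing finishes bounds the extra memory to the parsing phase, which is the trade-off the description accepts in exchange for using the asynchronous built-in decompressor.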

@calixteman
Contributor Author

/botio test

@moz-tools-bot
Collaborator

From: Bot.io (Linux m4)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/3e2fd092ee6f1f3/output.txt

@moz-tools-bot
Collaborator

From: Bot.io (Windows)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/e7b88f8a76dd3a1/output.txt

@timvandermeij (Contributor) left a comment


r=me, with the comment addressed and passing tests. Thanks!

@moz-tools-bot
Collaborator

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/3e2fd092ee6f1f3/output.txt

Total script time: 42.07 mins

  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 1

Image differences available at: http://54.241.84.105:8877/3e2fd092ee6f1f3/reftest-analyzer.html#web=eq.log

@moz-tools-bot
Collaborator

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/e7b88f8a76dd3a1/output.txt

Total script time: 146.76 mins

  • Unit tests: FAILED
  • Integration Tests: FAILED
  • Regression tests: FAILED
  errors: 1055

Image differences available at: http://54.193.163.58:8877/e7b88f8a76dd3a1/reftest-analyzer.html#web=eq.log

@calixteman force-pushed the decompress_obj_stream branch 2 times, most recently from 7399e09 to a60f53f on February 2, 2026 18:28
@calixteman
Contributor Author

/botio test

@moz-tools-bot
Collaborator

From: Bot.io (Windows)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/4d22a147cc5d465/output.txt

@moz-tools-bot
Collaborator

From: Bot.io (Linux m4)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/3def7312905b772/output.txt

@moz-tools-bot
Collaborator

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/3def7312905b772/output.txt

Total script time: 43.01 mins

  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 1

Image differences available at: http://54.241.84.105:8877/3def7312905b772/reftest-analyzer.html#web=eq.log

@moz-tools-bot
Collaborator

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/4d22a147cc5d465/output.txt

Total script time: 56.13 mins

  • Unit tests: FAILED
  • Integration Tests: FAILED
  • Regression tests: FAILED
  errors: 1218

Image differences available at: http://54.193.163.58:8877/4d22a147cc5d465/reftest-analyzer.html#web=eq.log

@calixteman force-pushed the decompress_obj_stream branch from a60f53f to e7900fc on February 4, 2026 14:29
@calixteman
Contributor Author

/botio test

@moz-tools-bot
Collaborator

From: Bot.io (Linux m4)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/4828dfa2ddb9125/output.txt

@moz-tools-bot
Collaborator

From: Bot.io (Windows)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/b235e05cf63ad98/output.txt

@moz-tools-bot
Collaborator

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/b235e05cf63ad98/output.txt

Total script time: 59.57 mins

  • Unit tests: FAILED
  • Integration Tests: FAILED
  • Regression tests: FAILED
  errors: 1149

Image differences available at: http://54.193.163.58:8877/b235e05cf63ad98/reftest-analyzer.html#web=eq.log

@moz-tools-bot
Collaborator

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/4828dfa2ddb9125/output.txt

Total script time: 60.00 mins


3 participants