Skip to content

Experimental/dcz #91

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 12, 2025
Merged

Experimental/dcz #91

merged 4 commits into from
Aug 12, 2025

Conversation

kjdev
Copy link
Owner

@kjdev kjdev commented Aug 8, 2025

Summary by CodeRabbit

  • New Features

    • Added a new "dcz" dictionary-based output compression mode; responses reflect dictionary-aware encoding.
  • Bug Fixes

    • Improved handling when requested dictionaries are missing or mismatched; emits warnings and falls back safely.
  • Tests

    • Added several PHPT tests validating dictionary-enabled and error scenarios for zstd/dcz output compression.
  • Chores

    • Updated repository attributes to treat new binary extensions appropriately.

Copy link

coderabbitai bot commented Aug 8, 2025

Walkthrough

Adds DCZ (dictionary-aware zstd) encoding to the PHP zstd extension: detects "dcz" Accept-Encoding, computes/verifies dictionary SHA-256 via Available-Dictionary header, prefixes DCZ outputs with a digest header, updates response headers, and adds four PHPT tests plus .gitattributes entries for .zstd and .dic files.

Changes

Cohort / File(s) Change Summary
Compression Handler and Encoding Support
zstd.c
Adds DCZ encoding support alongside zstd, new encoding flags, a dict_digest[32] in the context, dictionary SHA-256 computation and base64 verification against HTTP_AVAILABLE_DICTIONARY, DCZ output header prefixing, adjustments to Content-Encoding/Vary handling, and relevant includes.
Test Cases for DCZ and Dictionary Handling
tests/ob_dcz_001.phpt, tests/ob_dcz_002.phpt, tests/ob_dcz_003.phpt, tests/ob_dcz_004.phpt
Adds four PHPT tests covering successful DCZ use, unavailable/invalid dictionary scenarios, warnings for mismatches, and header/content expectations under CGI SAPI.
Git Attributes Update
.gitattributes
Marks *.zstd and *.dic as binary so Git treats those file types as binary (no line-ending/diff text handling).

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant PHP_CGI
    participant ZSTD_Handler

    Client->>PHP_CGI: HTTP Request (Accept-Encoding includes zstd/dcz, Available-Dictionary?)
    PHP_CGI->>ZSTD_Handler: start compression with configured dictionary
    ZSTD_Handler->>ZSTD_Handler: detect encodings (zstd/dcz)
    alt DCZ requested and dictionary present
        ZSTD_Handler->>ZSTD_Handler: load dict, compute SHA-256, base64-encode
        ZSTD_Handler->>PHP_CGI: compare with Available-Dictionary header
        alt digest matches
            ZSTD_Handler->>PHP_CGI: compress, prefix 40-byte DCZ header
            PHP_CGI->>Client: Response (Content-Encoding: dcz, Vary: Accept-Encoding, Available-Dictionary)
        else mismatch or missing
            ZSTD_Handler->>PHP_CGI: emit warning, disable DCZ
            PHP_CGI->>Client: Response (Content-Encoding: zstd, Vary: Accept-Encoding)
        end
    else only zstd
        ZSTD_Handler->>PHP_CGI: compress normally (zstd)
        PHP_CGI->>Client: Response (Content-Encoding: zstd, Vary: Accept-Encoding)
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Refactor #86 — Modifies zstd.c dictionary loading and context creation; likely overlaps with DCZ dictionary handling and context changes.
  • Fead/add dict args #90 — Adds dictionary support and passes dictionary args into context creation; related to the new dict_digest and dictionary verification flow.

Poem

In my burrow I nibble bytes and play,
I stash a digest to guide the way.
dcz and zstd now hop in line,
Tests keep them honest, headers fine.
A rabbit's little patch — compress away! 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43bff60 and ff9a729.

📒 Files selected for processing (6)
  • .gitattributes (1 hunks)
  • tests/ob_dcz_001.phpt (1 hunks)
  • tests/ob_dcz_002.phpt (1 hunks)
  • tests/ob_dcz_003.phpt (1 hunks)
  • tests/ob_dcz_004.phpt (1 hunks)
  • zstd.c (10 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • .gitattributes
  • tests/ob_dcz_004.phpt
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-20T19:39:56.258Z
Learnt from: thekid
PR: kjdev/php-ext-zstd#77
File: zstd.stub.php:13-22
Timestamp: 2025-04-20T19:39:56.258Z
Learning: The php-ext-zstd library aims to maintain compatibility with PHP 7, so PHP 8-specific syntax features like union types should be avoided in favor of PHPDoc annotations.

Applied to files:

  • tests/ob_dcz_003.phpt
  • tests/ob_dcz_002.phpt
  • tests/ob_dcz_001.phpt
  • zstd.c
🔇 Additional comments (9)
zstd.c (9)

31-33: LGTM! Essential includes for DCZ functionality.

The new includes are necessary for the DCZ implementation:

  • php_hash.h and php_hash_sha.h for computing dictionary SHA-256 digest
  • base64.h for base64 encoding the digest for header comparison

86-86: LGTM! Dictionary digest storage added to context.

The 32-byte dict_digest field is properly sized for SHA-256 and will store the computed dictionary hash for DCZ validation.


114-114: LGTM! Proper digest initialization.

The dictionary digest is correctly initialized to zero bytes in the context initialization function.


1261-1262: LGTM! Well-designed encoding flag constants.

The bit flag design allows for independent control of ZSTD and DCZ encodings, enabling proper fallback behavior when DCZ requirements aren't met.


1280-1285: LGTM! Proper dual encoding detection.

The Accept-Encoding parsing correctly detects both "zstd" and "dcz" encodings independently, allowing the DCZ flag to be set alongside ZSTD when both are present in the client request.


1322-1364: LGTM! Comprehensive DCZ dictionary validation with proper memory management.

The DCZ validation logic correctly:

  • Computes SHA-256 digest of the dictionary
  • Base64-encodes the digest for comparison
  • Validates against the HTTP_AVAILABLE_DICTIONARY header (accounting for the colon-wrapped format)
  • Properly releases memory on failure paths (as fixed in the previous review)
  • Disables DCZ encoding when validation fails

The implementation properly handles the Available-Dictionary header format with leading/trailing colons.


1378-1380: LGTM! Proper encoding validation.

The context start function correctly fails when no encoding is enabled, preventing compression when neither ZSTD nor DCZ requirements are met.


1406-1416: LGTM! Correct DCZ header prefixing.

The DCZ implementation properly prefixes the output with the required 40-byte header:

  • 8-byte magic marker (\x5e\x2a\x4d\x18\x20\x00\x00\x00)
  • 32-byte dictionary digest

The header is only added on the start operation, ensuring it appears once at the beginning of the compressed stream.


1498-1534: LGTM! Proper HTTP response header management.

The header logic correctly differentiates between DCZ and standard ZSTD:

  • DCZ: Content-Encoding: dcz + Vary: Accept-Encoding, Available-Dictionary
  • ZSTD: Content-Encoding: zstd + Vary: Accept-Encoding

This ensures clients can properly identify the encoding type and understand which headers affect caching behavior.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch experimental/dcz

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@kjdev
Copy link
Owner Author

kjdev commented Aug 8, 2025

@coderabbitai review

Copy link

coderabbitai bot commented Aug 8, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (3)
tests/ob_dcz_001.phpt (1)

14-15: Minor: Prefer DIR for readability

Equivalent on PHP 7+, consistent with modern PHPT style.

-ini_set('zstd.output_compression_dict', dirname(__FILE__) . '/data.dic');
+ini_set('zstd.output_compression_dict', __DIR__ . '/data.dic');
tests/ob_dcz_002.phpt (1)

1-2: Clarify test title

Both 002 and 003 use the same title. Consider making this say “not found available-dictionary” to distinguish from the mismatch case.

tests/ob_dcz_003.phpt (1)

1-2: Clarify test title

Rename to reflect the mismatch case (requested vs actual digest).

---TEST--
-output handler: dcz: invalid available-dictionary
+--TEST--
+output handler: dcz: invalid available-dictionary (mismatch)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b480694 and 43bff60.

📒 Files selected for processing (6)
  • .gitattributes (1 hunks)
  • tests/ob_dcz_001.phpt (1 hunks)
  • tests/ob_dcz_002.phpt (1 hunks)
  • tests/ob_dcz_003.phpt (1 hunks)
  • tests/ob_dcz_004.phpt (1 hunks)
  • zstd.c (10 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: thekid
PR: kjdev/php-ext-zstd#77
File: zstd.stub.php:13-22
Timestamp: 2025-04-20T19:39:56.258Z
Learning: The php-ext-zstd library aims to maintain compatibility with PHP 7, so PHP 8-specific syntax features like union types should be avoided in favor of PHPDoc annotations.
📚 Learning: 2025-04-20T19:39:56.258Z
Learnt from: thekid
PR: kjdev/php-ext-zstd#77
File: zstd.stub.php:13-22
Timestamp: 2025-04-20T19:39:56.258Z
Learning: The php-ext-zstd library aims to maintain compatibility with PHP 7, so PHP 8-specific syntax features like union types should be avoided in favor of PHPDoc annotations.

Applied to files:

  • .gitattributes
  • tests/ob_dcz_003.phpt
  • tests/ob_dcz_004.phpt
  • tests/ob_dcz_002.phpt
  • zstd.c
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: ci (8.1, x86, nts, vs16, false, apcu)
  • GitHub Check: ci (8.1, x86, ts, vs16, true)
  • GitHub Check: ci (8.1, x86, nts, vs16, true)
  • GitHub Check: ci (8.2, x86, ts, vs16, true)
  • GitHub Check: ci (8.2, x86, nts, vs16, true, apcu)
  • GitHub Check: ci (8.1, x64, ts, vs16, false, apcu)
  • GitHub Check: ci (8.1, x64, nts, vs16, true)
  • GitHub Check: ci (8.2, x64, ts, vs16, true)
  • GitHub Check: ci (8.2, x86, nts, vs16, false, apcu)
  • GitHub Check: ci (8.2, x64, ts, vs16, false)
  • GitHub Check: ci (8.2, x64, nts, vs16, true)
  • GitHub Check: ci (8.3, x86, nts, vs16, false, apcu)
  • GitHub Check: ci (8.3, x64, nts, vs16, true, apcu)
  • GitHub Check: ci (8.3-zts, php)
  • GitHub Check: ci (8.3, apcu, php)
  • GitHub Check: ci (8.3, php)
  • GitHub Check: ci (8.4, apcu, php)
🔇 Additional comments (9)
tests/ob_dcz_002.phpt (1)

19-20: LGTM: Expected warning message

Message pattern matches the “no Available-Dictionary header” scenario while requesting dcz.

tests/ob_dcz_003.phpt (1)

21-22: LGTM: Warning message covers digest mismatch

Format string with %s for actual digest is appropriate.

tests/ob_dcz_004.phpt (1)

20-21: External fixture confirmed present
The file tests/files/ob_data.zstd exists in the repository; no further action is required.

zstd.c (6)

31-33: LGTM!

The additional includes are appropriate for the SHA-256 hashing and base64 encoding functionality required for DCZ dictionary verification.


86-86: LGTM!

The 32-byte digest field correctly matches the SHA-256 output size, and proper initialization with memset ensures no uninitialized memory.

Also applies to: 114-114


1261-1262: LGTM!

Well-designed bit flags that allow for tracking multiple encoding types simultaneously.


1281-1284: LGTM!

The encoding detection correctly identifies both "zstd" and "dcz" encodings from the Accept-Encoding header and properly sets the corresponding bit flags.


1376-1380: LGTM!

The output handler correctly:

  • Validates that at least one encoding is enabled before proceeding
  • Properly constructs the 36-byte DCZ header with marker byte, "DCZ" identifier, and dictionary digest
  • Allocates the correct buffer size accounting for the header overhead

Also applies to: 1403-1414


1496-1502: LGTM!

The HTTP response headers are correctly configured:

  • Proper Content-Encoding header based on the active encoding
  • Appropriate Vary headers to ensure correct cache behavior, including Available-Dictionary for DCZ

Also applies to: 1521-1532

@kjdev kjdev force-pushed the experimental/dcz branch 2 times, most recently from fa55acc to 780319a Compare August 8, 2025 03:40
@kjdev kjdev force-pushed the experimental/dcz branch from 780319a to ff9a729 Compare August 8, 2025 04:00
@kjdev
Copy link
Owner Author

kjdev commented Aug 12, 2025

@coderabbitai review

Copy link

coderabbitai bot commented Aug 12, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@kjdev kjdev merged commit ff9a729 into master Aug 12, 2025
98 checks passed
@kjdev kjdev deleted the experimental/dcz branch August 12, 2025 01:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant