Add MVP LZ4 component #661

whyitfor · 2025-10-23T02:33:11Z

I have reviewed the OFRAK contributor guide and attest that this pull request is in accordance with it.
I have made or updated a changelog entry for the changes in this pull request.

One sentence summary of this PR (This should go in the CHANGELOG!)

Link to Related Issue(s)

Please describe the changes in your request.

Anyone you think should look at this, specifically?

whyitfor · 2025-10-23T02:39:20Z

ofrak_core/src/ofrak/core/lz4.py

+    Supports all LZ4 frame formats:
+    - LZ4 default frame (modern format with metadata)
+    - LZ4 legacy frame (older format for backward compatibility)
+    - LZ4 skippable frames (metadata containers)
+    """


Details about the type of LZ4 compression need to be captured during unpacking and stored as attributes.

Alternatively, is this information that can be gathered as part of identification?
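As a rough illustration of what "captured as attributes" could look like, here is a minimal sketch that reads the frame descriptor of a modern LZ4 frame using only the standard library. The names `Lz4FrameInfo` and `parse_lz4_frame_header` are hypothetical and not part of OFRAK or this PR; the sketch also assumes the header checksum byte does not need to be verified at this stage.

```python
import struct
from dataclasses import dataclass
from typing import Optional

LZ4_MODERN_MAGIC = 0x184D2204

# Block maximum size codes stored in bits 6-4 of the BD byte.
_BLOCK_MAX_SIZES = {
    4: 64 * 1024,
    5: 256 * 1024,
    6: 1024 * 1024,
    7: 4 * 1024 * 1024,
}


@dataclass
class Lz4FrameInfo:  # hypothetical name, for illustration only
    block_independence: bool
    block_checksum: bool
    content_checksum: bool
    block_max_size: int
    content_size: Optional[int]
    dictionary_id: Optional[int]


def parse_lz4_frame_header(data: bytes) -> Lz4FrameInfo:
    """Parse the descriptor of a modern LZ4 frame (header checksum not verified)."""
    if len(data) < 7:
        raise ValueError("too short for an LZ4 frame header")
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic != LZ4_MODERN_MAGIC:
        raise ValueError("not a modern LZ4 frame")

    flg, bd = data[4], data[5]
    if (flg >> 6) != 0b01:
        raise ValueError("unsupported LZ4 frame version")
    size_code = (bd >> 4) & 0x7
    if size_code not in _BLOCK_MAX_SIZES:
        raise ValueError("invalid block maximum size code")

    has_content_size = bool(flg & 0x08)
    has_dictionary_id = bool(flg & 0x01)
    # Magic (4) + FLG (1) + BD (1) + optional fields + header checksum (1).
    needed = 6 + (8 if has_content_size else 0) + (4 if has_dictionary_id else 0) + 1
    if len(data) < needed:
        raise ValueError("truncated frame descriptor")

    offset = 6
    content_size = None
    if has_content_size:
        (content_size,) = struct.unpack_from("<Q", data, offset)
        offset += 8
    dictionary_id = None
    if has_dictionary_id:
        (dictionary_id,) = struct.unpack_from("<I", data, offset)

    return Lz4FrameInfo(
        block_independence=bool(flg & 0x20),
        block_checksum=bool(flg & 0x10),
        content_checksum=bool(flg & 0x04),
        block_max_size=_BLOCK_MAX_SIZES[size_code],
        content_size=content_size,
        dictionary_id=dictionary_id,
    )
```

Since only the first few bytes are needed, something along these lines could plausibly run during either identification or unpacking; which is the better fit is the open question above.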

rbs-jacob · 2025-10-23T14:08:13Z

ofrak_core/src/ofrak/core/lz4.py

+class Lz4Identifier(Identifier):
+    """
+    Identify LZ4 compressed data by checking magic bytes.
+
+    Recognizes all LZ4 frame types:
+    - Modern/default frames (0x184D2204)
+    - Legacy frames (0x184C2102)
+    - Skippable frames (0x184D2A50-0x184D2A5F)
+    """
+
+    id = b"Lz4Identifier"
+    targets = (GenericBinary,)
+
+    async def identify(self, resource: Resource, config=None) -> None:
+        data = await resource.get_data(Range(0, 4))
+
+        if len(data) < 4:
+            return
+
+        # Check for modern frame
+        if data == LZ4_MODERN_MAGIC:
+            resource.add_tag(Lz4ModernData)
+            return
+
+        # Check for legacy frame
+        if data == LZ4_LEGACY_MAGIC:
+            resource.add_tag(Lz4LegacyData)
+            return
+
+        # Check for skippable frames
+        # Format: 0x5X 0x2A 0x4D 0x18 where X is 0-F
+        if data[1:4] == b"\x2a\x4d\x18" and 0x50 <= data[0] <= 0x5F:
+            resource.add_tag(Lz4SkippableData)
+            return


It registers magic MIME identifiers, so we don't also need this identifier.

We actually do need these: without them, images are not tagged correctly. The magic MIME ones can probably be removed.
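For reference while discussing which identification path to keep, the magic check itself can be expressed as a single little-endian integer comparison. This is only a standalone sketch: `Lz4FrameKind` and `classify_lz4_magic` are hypothetical names, and it does not touch OFRAK's `Identifier` or magic MIME machinery.

```python
import struct
from enum import Enum
from typing import Optional


class Lz4FrameKind(Enum):  # hypothetical name, for illustration only
    MODERN = "modern"
    LEGACY = "legacy"
    SKIPPABLE = "skippable"


def classify_lz4_magic(data: bytes) -> Optional[Lz4FrameKind]:
    """Return the LZ4 frame kind indicated by the first four bytes, if any."""
    if len(data) < 4:
        return None
    # LZ4 magic numbers are stored little-endian on disk.
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic == 0x184D2204:
        return Lz4FrameKind.MODERN
    if magic == 0x184C2102:
        return Lz4FrameKind.LEGACY
    if 0x184D2A50 <= magic <= 0x184D2A5F:
        return Lz4FrameKind.SKIPPABLE
    return None
```

Comparing the decoded integer against the range `0x184D2A50` to `0x184D2A5F` covers all sixteen skippable magics without slicing individual bytes.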

rbs-jacob · 2025-10-23T14:09:08Z

ofrak_core/src/ofrak/core/lz4.py

+@dataclass
+class Lz4ModernData(Lz4Data):
+    """
+    LZ4 modern frame format (default).
+
+    The modern LZ4 frame format includes:
+    - Frame descriptor with flags
+    - Optional content size and dictionary ID
+    - Block independence flags
+    - Optional checksums (content and block)
+    - End mark
+    """
+
+
+@dataclass
+class Lz4LegacyData(Lz4Data):
+    """
+    LZ4 legacy frame format.
+
+    Older LZ4 format predating the frame specification:
+    - Simpler structure
+    - No checksums or metadata
+    - Fixed 8MB max block size
+    - Deprecated but still encountered in the wild
+    """
+
+
+@dataclass
+class Lz4SkippableData(Lz4Data):
+    """
+    LZ4 skippable frame.
+
+    Special frame type for embedding metadata or application-specific data:
+    - Not compressed data
+    - Contains arbitrary bytes
+    - LZ4 parsers can safely skip these frames
+    - Typically used alongside regular frames
+    """


Having these other types isn't actually helpful. We just want one tag for regular Lz4Data.

rbs-jacob · 2025-10-23T14:10:56Z

ofrak_core/tests/components/test_lz4_component.py

+    """
+
+    def write_lz4(self, lz4_path: Path):
+        compressed_data = lz4.frame.compress(self.INITIAL_DATA)


We probably want to have static test files instead of doing compression on the fly. That way we're not testing the LZ4 library against itself, but rather testing it against real LZ4 instances found in the wild.

Yup, this was the next step
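One possible shape for the static-fixture approach is sketched below, assuming each checked-in compressed file is paired with a pinned SHA-256 of its expected plaintext. `check_fixture` is a hypothetical helper, and the decompressor is injected so the sketch stays independent of any particular library; in the real test it would presumably be `lz4.frame.decompress` or the OFRAK unpacker.

```python
import hashlib
from pathlib import Path
from typing import Callable


def check_fixture(
    compressed_path: Path,
    expected_sha256: str,
    decompress: Callable[[bytes], bytes],
) -> bytes:
    """Decompress a static fixture and compare its plaintext to a pinned digest."""
    plaintext = decompress(compressed_path.read_bytes())
    actual = hashlib.sha256(plaintext).hexdigest()
    if actual != expected_sha256:
        raise AssertionError(
            f"{compressed_path.name}: expected {expected_sha256}, got {actual}"
        )
    return plaintext
```

Pinning the digest of the plaintext rather than regenerating the compressed bytes means the test fails if the fixture or the decompressor changes behavior, instead of round-tripping the library against itself.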

Add MVP LZ4 component

ba46b60

Update lz4 components

70fed98

rbs-jacob requested changes Oct 23, 2025

Apply suggestions from code review

c371c7d

Co-authored-by: Jacob Strieb <99368685+rbs-jacob@users.noreply.github.com>
