Skip to content

Conversation

@whyitfor
Copy link
Contributor

  • I have reviewed the OFRAK contributor guide and attest that this pull request is in accordance with it.
  • I have made or updated a changelog entry for the changes in this pull request.

One sentence summary of this PR (This should go in the CHANGELOG!)

Link to Related Issue(s)

Please describe the changes in your request.

Anyone you think should look at this, specifically?

Comment on lines 32 to 36
Supports all LZ4 frame formats:
- LZ4 default frame (modern format with metadata)
- LZ4 legacy frame (older format for backward compatibility)
- LZ4 skippable frames (metadata containers)
"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Details about type of LZ4 compression need to be captured during unpacking and stored as attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, is this information that can be gathered as part of identification?

Comment on lines +73 to +106
class Lz4Identifier(Identifier):
"""
Identify LZ4 compressed data by checking magic bytes.
Recognizes all LZ4 frame types:
- Modern/default frames (0x184D2204)
- Legacy frames (0x184C2102)
- Skippable frames (0x184D2A50-0x184D2A5F)
"""

id = b"Lz4Identifier"
targets = (GenericBinary,)

async def identify(self, resource: Resource, config=None) -> None:
data = await resource.get_data(Range(0, 4))

if len(data) < 4:
return

# Check for modern frame
if data == LZ4_MODERN_MAGIC:
resource.add_tag(Lz4ModernData)
return

# Check for legacy frame
if data == LZ4_LEGACY_MAGIC:
resource.add_tag(Lz4LegacyData)
return

# Check for skippable frames
# Format: 0x5X 0x2A 0x4D 0x18 where X is 0-F
if data[1:4] == b"\x2a\x4d\x18" and 0x50 <= data[0] <= 0x5F:
resource.add_tag(Lz4SkippableData)
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It registers magic mime identifiers, so we don't also need this identifier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually do need these -- without them images are not tagged correctly. The Magic mime ones can probably be removed.

Comment on lines +33 to +70
@dataclass
class Lz4ModernData(Lz4Data):
"""
LZ4 modern frame format (default).
The modern LZ4 frame format includes:
- Frame descriptor with flags
- Optional content size and dictionary ID
- Block independence flags
- Optional checksums (content and block)
- End mark
"""


@dataclass
class Lz4LegacyData(Lz4Data):
"""
LZ4 legacy frame format.
Older LZ4 format predating the frame specification:
- Simpler structure
- No checksums or metadata
- Fixed 8MB max block size
- Deprecated but still encountered in the wild
"""


@dataclass
class Lz4SkippableData(Lz4Data):
"""
LZ4 skippable frame.
Special frame type for embedding metadata or application-specific data:
- Not compressed data
- Contains arbitrary bytes
- LZ4 parsers can safely skip these frames
- Typically used alongside regular frames
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having these other types isn't actually helpful. We just want one tag for regular Lz4Data.

"""

def write_lz4(self, lz4_path: Path):
compressed_data = lz4.frame.compress(self.INITIAL_DATA)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to have static test files instead of doing compression on the fly. That way we're not testing the LZ4 library against itself, but rather testing it against real LZ4 instances found in the wild.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, this was the next step

Co-authored-by: Jacob Strieb <99368685+rbs-jacob@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants