[DEV] Actions, Message Digests, Documentation Updates #17

Merged
merged 17 commits into from
Dec 30, 2024
Split 'basics' tutorial
---

+ Fixes compile errors with layer.c
MatrixEditor committed Dec 30, 2024
commit db031b412e658151a455929605525d3d117bb310
24 changes: 23 additions & 1 deletion docs/sphinx/source/library/fields/common.rst
@@ -10,6 +10,9 @@ Numeric Structs
.. autoclass:: caterpillar.py.PyStructFormattedField
:members:

.. versionchanged:: 2.4.0
:code:`FormatField` renamed to :code:`PyStructFormattedField`

.. autoattribute:: caterpillar.py.uint8

.. autoattribute:: caterpillar.py.int8
@@ -61,6 +64,9 @@ Bytes, Strings
.. autoclass:: caterpillar.py.Memory
:members:

.. versionchanged:: 2.4.0
Removed :code:`encoding` argument

.. autoclass:: caterpillar.py.Bytes
:members:

@@ -70,6 +76,9 @@ Bytes, Strings
.. autoclass:: caterpillar.py.Prefixed
:members:

.. versionadded:: 2.4.0
Added support for arbitrary structs

.. autoclass:: caterpillar.py.CString
:members:

@@ -91,8 +100,12 @@ Special Structs
.. autoclass:: caterpillar.py.Aligned
:members:

.. versionadded:: 2.4.0

.. autofunction:: caterpillar.py.align

.. versionadded:: 2.4.0

.. autoclass:: caterpillar.py.Computed
:members:

@@ -103,4 +116,13 @@ Special Structs
:members:

.. autoclass:: caterpillar.py.Const
:members:

.. autoclass:: caterpillar.py.Lazy
:members:

.. autoclass:: caterpillar.py.Uuid
:members:

.. versionchanged:: 2.4.0
:code:`uuid` renamed to :code:`Uuid`
97 changes: 97 additions & 0 deletions docs/sphinx/source/tutorial/basics/bytes.rst
@@ -0,0 +1,97 @@
.. _tutorial-basics-bytes:

**************
Byte Sequences
**************

When working with binary data, sometimes you need to deal with raw byte
sequences. *Caterpillar* provides several structs to handle these byte
sequences efficiently, whether they are stored in memory, byte arrays,
or prefixed with length information.

Memory
~~~~~~

The :class:`~caterpillar.py.Memory` struct is ideal when you need to handle
data that can be wrapped by a :code:`memoryview`. It allows you to define
fields with a specified size (static or dynamic) and is especially useful
for printing out unpacked objects in a readable way.


.. tab-set::

    .. tab-item:: Python

        >>> m = F(Memory(5))  # static size; dynamic size is allowed too
        >>> pack(bytes(range(5)), m)
        b'\x00\x01\x02\x03\x04'
        >>> unpack(m, _)
        <memory at 0x00000204FDFA4411>

Bytes
~~~~~

If you need direct access to byte sequences, the :class:`~caterpillar.py.Bytes`
struct is the solution. This struct converts a :code:`memoryview` to :code:`bytes`
for easy manipulation. You can define fields with static, dynamic, or greedy
sizes based on your needs.

.. tab-set::

    .. tab-item:: Python

        >>> bytes_obj = Bytes(5)  # static, dynamic and greedy size allowed

    .. tab-item:: Caterpillar C

        >>> b = octetstring(5)  # static, dynamic size allowed
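
The difference between a view and a copy can be seen with the standard library
alone. The following is a minimal sketch that does not call *Caterpillar*; the
variable names are illustrative only.

```python
# Standard-library illustration only; this does not use Caterpillar.
data = bytearray(range(5))   # mutable buffer: 00 01 02 03 04
view = memoryview(data)      # wraps the buffer without copying it
copy = bytes(view)           # creates an independent, immutable copy

data[0] = 0xFF               # mutate the underlying buffer

view_first = view[0]         # the view reflects the change
copy_first = copy[0]         # the copy keeps the old value
view_hex = view.hex()        # readable rendering of the wrapped bytes
```

A view avoids copying large payloads, while a :code:`bytes` copy is the safer
choice when the data must not change underneath you.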

Let's implement a struct for the `fdAT <https://www.w3.org/TR/png/#fdAT-chunk>`_ chunk
of the PNG format, which stores frame data. In this case, we use the :code:`Memory`
struct to handle the frame data.

.. tab-set::

    .. tab-item:: Python

        .. code-block:: python
            :caption: Implementation for the frame data chunk

            @struct(order=BigEndian)    # <-- endianness as usual
            class FDATChunk:
                sequence_number: uint32
                # We use a Memory instance here rather than Bytes()
                frame_data: Memory(parent.length - 4)

    .. tab-item:: Caterpillar C

        .. code-block:: python
            :caption: Implementation for the frame data chunk

            parent = ContextPath("parent.obj")

            @struct(endian=BIG_ENDIAN)
            class FDATChunk:
                sequence_number: u32
                frame_data: octetstring(parent.length - 4)
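
For comparison, here is roughly what the struct above describes when written by
hand with the standard library. This is a sketch, not *Caterpillar* code: the
function name :code:`parse_fdat` is hypothetical, and the chunk data is assumed
to be already extracted from the surrounding PNG chunk.

```python
import struct


def parse_fdat(chunk_data: bytes):
    """Split fdAT chunk data into (sequence_number, frame_data)."""
    if len(chunk_data) < 4:
        raise ValueError("fdAT chunk data must hold a 4-byte sequence number")
    # ">I" means big-endian unsigned 32-bit integer
    (sequence_number,) = struct.unpack_from(">I", chunk_data, 0)
    # The remaining (length - 4) bytes are the frame data; slicing a
    # memoryview avoids copying them.
    frame_data = memoryview(chunk_data)[4:]
    return sequence_number, frame_data


seq, frame = parse_fdat(b"\x00\x00\x00\x07" + b"\xAA\xBB\xCC")
```

The :code:`parent.length - 4` expression in the struct plays the same role as the
slice here: everything after the sequence number belongs to the frame data.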

.. admonition:: Challenge

    If you feel ready for a more advanced structure, try implementing the
    `zTXt <https://www.w3.org/TR/png/#11zTXt>`_ chunk for compressed textual data.

    .. dropdown:: Solution

        Python API only:

        .. code-block:: python
            :caption: Sample implementation of the *zTXt* chunk

            @struct                     # <-- actually, we don't need a specific byteorder
            class ZTXTChunk:
                keyword: CString(...)   # <-- variable length
                compression_method: uint8
                # Okay, we haven't introduced this struct yet, but Memory() or Bytes()
                # would have been okay, too.
                text: ZLibCompressed(parent.length - lenof(this.keyword) - 1)
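
To check your understanding of the chunk layout, the same fields can be decoded
by hand. The sketch below uses only the standard library; :code:`parse_ztxt` is a
hypothetical helper, and it assumes the layout from the PNG specification: a
Latin-1 keyword, a null separator, one compression method byte (0 for zlib), and
the compressed text.

```python
import zlib


def parse_ztxt(chunk_data: bytes):
    """Decode zTXt chunk data into (keyword, compression_method, text)."""
    keyword_raw, separator, rest = chunk_data.partition(b"\x00")
    if not separator or not rest:
        raise ValueError("zTXt chunk needs a null separator and a method byte")
    compression_method = rest[0]
    if compression_method != 0:
        raise ValueError("only compression method 0 (zlib) is defined")
    # Everything after the method byte is the zlib-compressed text.
    text = zlib.decompress(rest[1:])
    return keyword_raw.decode("latin-1"), compression_method, text.decode("latin-1")


sample = b"Comment\x00\x00" + zlib.compress(b"hello")
keyword, method, text = parse_ztxt(sample)
```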
58 changes: 58 additions & 0 deletions docs/sphinx/source/tutorial/basics/context.rst
@@ -0,0 +1,58 @@
.. _tutorial-basics_context:

*******
Context
*******

In *Caterpillar*, the context is a special feature that keeps track of the current
packing or unpacking process. The context allows you to reference the current object
being packed or parsed using the special variable :code:`this`. It also provides
access to the parent object, if applicable, through the variable :code:`parent`.

The context is useful when defining structs that depend on other fields' values.
For example, you can reference the length of a string field and use it to define
the length of another field dynamically.

.. tab-set::

    .. tab-item:: Python

        .. code-block:: python
            :caption: Understanding the *context*

            @struct
            class Format:
                length: uint8
                foo: CString(this.length)   # <-- just reference the length field

    .. tab-item:: Caterpillar C

        .. code-block:: python
            :caption: Understanding the *context*

            this = ContextPath("obj")

            @struct
            class Format:
                length: u8
                foo: cstring(this.length)
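
The dependency that :code:`this.length` expresses is the usual length-prefixed
pattern. Written by hand with the standard library it could look like the sketch
below; :code:`parse_format` is a hypothetical name, and the code is not how
*Caterpillar* implements the lookup.

```python
import struct


def parse_format(data: bytes) -> dict:
    """Read a uint8 length followed by that many bytes."""
    (length,) = struct.unpack_from("B", data, 0)  # the 'length' field
    foo = data[1:1 + length]                      # sized by the field above
    if len(foo) != length:
        raise ValueError("input is shorter than the declared length")
    return {"length": length, "foo": foo}


parsed = parse_format(b"\x03abcXYZ")
```

The second field can only be read after the first one is known, which is why the
struct definition needs a reference that is resolved at runtime.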


Runtime Length of Objects
~~~~~~~~~~~~~~~~~~~~~~~~~

In certain cases, you may need to retrieve the runtime length of a variable within
the context of the current object. The special class :code:`lenof` provides this
functionality. It applies the :code:`len()` function to the object you're referencing
and returns the length.
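
The idea of a deferred length can be sketched in a few lines of plain Python.
The helper :code:`length_of` below is hypothetical and only mirrors the behaviour
described above; it is not the :code:`lenof` class itself.

```python
from operator import itemgetter


def length_of(getter):
    """Return a callable that measures the referenced value when invoked."""
    def evaluate(context):
        return len(getter(context))
    return evaluate


# Nothing is measured here; the length is taken once a context is supplied.
keyword_length = length_of(itemgetter("keyword"))
measured = keyword_length({"keyword": b"Comment"})
```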

Context Paths
~~~~~~~~~~~~~

You can use context paths to access elements of a sequence or nested structures. For
example, if you have a field :code:`foobar` that is a sequence, you can access its
elements like this:

>>> path = this.foobar[0] # Access elements of a sequence within the current context
>>> path(context)
...
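
One way to picture such a path is as a recorded list of attribute and item
lookups that is replayed against a context later. The :code:`Path` class below is
a simplified, hypothetical stand-in written for illustration; the real context
path class may differ in both API and behaviour.

```python
class Path:
    """Hypothetical lazy path: records lookups, replays them on a context."""

    def __init__(self, steps=()):
        self._steps = tuple(steps)

    def __getattr__(self, name):
        # Only called for names that are not real attributes.
        if name.startswith("_"):
            raise AttributeError(name)
        return Path(self._steps + (("attr", name),))

    def __getitem__(self, key):
        return Path(self._steps + (("item", key),))

    def __call__(self, context):
        value = context
        for kind, key in self._steps:
            if kind == "item" or isinstance(value, dict):
                value = value[key]
            else:
                value = getattr(value, key)
        return value


this = Path()
path = this.foobar[0]                     # nothing is evaluated yet
result = path({"foobar": [10, 20, 30]})   # lookups are replayed here
```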
35 changes: 35 additions & 0 deletions docs/sphinx/source/tutorial/basics/index.rst
@@ -0,0 +1,35 @@
.. _turotial-basics:

**************
Basic Concepts
**************

In this section, we will introduce some fundamental techniques commonly used when working
with binary file formats. These concepts lay the foundation for understanding more advanced
topics covered in the next chapter.

.. note::
    To simplify field definitions in your structs, we can use shortcuts. For instance, instead
    of manually specifying field types, we can leverage the :code:`F` function from the
    :code:`caterpillar.shortcuts` module to create fields. However, if you don't want to
    wrap everything within a :class:`~caterpillar.py.Field`, you can use the :code:`as_field`
    option when packing or unpacking.

    >>> from caterpillar.shortcuts import F
    >>> field = F(uint8)

    or wrap the struct directly

    >>> pack(0xFF, uint8, as_field=True)


.. toctree::
    :maxdepth: 2
    :caption: Basic Concepts

    stdtypes
    string
    bytes
    padding
    context
    other
83 changes: 83 additions & 0 deletions docs/sphinx/source/tutorial/basics/other.rst
@@ -0,0 +1,83 @@
.. _tutorial-basic_other:

=================
Common Structures
=================

In addition to the basic field types we've already covered, *Caterpillar* offers more
advanced struct types for handling complex data structures. These types can simplify
parsing and packing operations, especially when dealing with constants, compression,
or specialized data handling.

Constants
---------

In many binary formats, constants or "magic bytes" are used to identify the start
of a file or data stream. *Caterpillar* allows you to define and automatically
validate these constants against the parsed data, saving you from checking
them by hand every time.

For instance, a PNG file starts with a known sequence of magic bytes:
:code:`\x89PNG\x0D\x0A\x1A\x0A`. You can define these constants directly in your
struct like so:

.. code-block:: python
    :caption: Starting the *main* PNG struct

    @struct(order=BigEndian) # <-- will be relevant later on
    class PNG:
        magic: b"\x89PNG\x0D\x0A\x1A\x0A"
        # other fields will be defined at the end of this tutorial.


For raw constant values, *Caterpillar* provides the :class:`~caterpillar.py.Const`
struct, which allows you to define constant values that need to be packed or
unpacked.

>>> const = Const(0xbeef, uint32)
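
The validation that a constant field performs corresponds to a check like the
one below. This is a standard-library sketch with a hypothetical
:code:`check_magic` helper, shown only to make the behaviour concrete.

```python
PNG_MAGIC = b"\x89PNG\x0D\x0A\x1A\x0A"


def check_magic(data: bytes, magic: bytes = PNG_MAGIC) -> bytes:
    """Verify the leading magic bytes and return the data that follows."""
    if data[:len(magic)] != magic:
        raise ValueError(f"bad magic: expected {magic!r}")
    return data[len(magic):]


remainder = check_magic(PNG_MAGIC + b"IHDR")
```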

Compression
-----------

*Caterpillar* also supports common compression formats such as `zlib`, `lzma`, `bz2`,
and, if the library is installed, `lzo`. This allows you to handle compressed data
within your struct definitions easily.

>>> compressed = ZLibCompressed(100) # length or struct here applicable
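
A fixed-length compressed field boils down to "take this many bytes, then
decompress them". The following sketch shows that step with the standard
:code:`zlib` module; :code:`read_compressed` is a hypothetical helper and not
part of *Caterpillar*.

```python
import zlib


def read_compressed(stream: bytes, length: int):
    """Decompress the first `length` bytes and return (data, remaining bytes)."""
    blob, rest = stream[:length], stream[length:]
    return zlib.decompress(blob), rest


payload = zlib.compress(b"caterpillar" * 4)
data, rest = read_compressed(payload + b"TAIL", len(payload))
```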

Specials
--------

There are several special structs for handling more advanced or less common scenarios.

Computed
~~~~~~~~

The :code:`Computed` struct allows you to define a runtime computed variable that doesn't
actually pack any data. While you could use a :code:`@property` or method to represent
this, :code:`Computed` is useful when you need to calculate a value during the packing
or unpacking process.

You might want to compute the real gamma value for a PNG chunk, based on another field
in the struct:

.. code-block:: python
    :caption: Example implementation of the *gAMA* chunk

    @struct(order=BigEndian)    # <-- same as usual
    class GAMAChunk:
        gamma: uint32
        gamma_value: Computed(this.gamma / 100000)
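
Done by hand, the same chunk takes one read and one division. The sketch below
uses only the standard library; :code:`parse_gama` is a hypothetical name chosen
for this example.

```python
import struct


def parse_gama(chunk_data: bytes) -> dict:
    """Read the stored gamma and derive the real value from it."""
    (gamma,) = struct.unpack(">I", chunk_data)  # big-endian uint32
    # The derived value is never stored in the stream, only computed.
    return {"gamma": gamma, "gamma_value": gamma / 100000}


parsed = parse_gama(struct.pack(">I", 45455))
```

Only four bytes are consumed here, which matches the point made above: the
computed field adds a value to the result without touching the data stream.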

Pass
~~~~

The :code:`Pass` struct is used when no action should be taken during the packing or unpacking
process. It doesn't affect the data stream in any way.

Use :code:`Pass` as a placeholder where a field is required but nothing should be read or written:

>>> @struct
... class Format:
...     foo: Pass  # This won't affect the stream and will store None
...
22 changes: 22 additions & 0 deletions docs/sphinx/source/tutorial/basics/padding.rst
@@ -0,0 +1,22 @@
.. _tutorial-basics_padding:

*******
Padding
*******

.. attention::
This section is subject to change if :code:`Padding` is implemented.


In binary file formats, padding is often used to align data to certain byte
boundaries. *Caterpillar* provides a way to handle padding within structs.
However, it is important to note that *Caterpillar* doesn't store any data
associated with the padding itself unless explicitly defined. If you need
to retain or manipulate the padding content, you can use the :code:`Bytes` or
:code:`Memory` field types.

If you want to apply padding to a struct, you can simply specify the padding
length using the :code:`padding` keyword. This is useful when you need to ensure
that certain fields are aligned or when the structure requires reserved spaces.

>>> field = padding[10] # greedy or dynamic size
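
How many padding bytes an alignment requires is simple modular arithmetic. The
helpers below are a hypothetical standard-library sketch of that calculation and
do not reflect how *Caterpillar* handles padding internally.

```python
def padding_for(offset: int, alignment: int) -> int:
    """Number of bytes needed to advance `offset` to the next boundary."""
    return -offset % alignment


def pad(data: bytes, alignment: int, fill: bytes = b"\x00") -> bytes:
    """Append fill bytes until the length is a multiple of `alignment`."""
    return data + fill * padding_for(len(data), alignment)


padded = pad(b"abcde", 4)
```

An offset that already sits on a boundary needs no padding, so
:code:`padding_for(8, 4)` evaluates to zero.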