Skip to content

Incomplete functionality #3

@snake-biscuits

Description

@snake-biscuits

I was looking at using this library for converting some python objects that use the struct module into C++ definitions

Unfortunately I found that this repo only handles a small subset of valid struct format strings

struct allows spaces in format strings, as well as integers preceding an identifier.
"I 128s 2h" is a valid format for the struct module, but not for this repo

Instead it fails on the first space, since it isn't listed as a valid character, rather than skipping it
The same for any integer.

I've made my own implementation to handle this, rather than a fork, since I also have to handle my custom objects
(The binary data is represented in nested structures, so my definitions also have to take that into account)

One common special case to look out for is strings:
With struct both "5c" & "5s" represent the equivalent of char[5] in C
However, the objects returned are different:

>>> import struct
>>> struct.unpack("5c", b"\x00\x01\x02\x03\x04")
(0, 1, 2, 3, 4)
>>> struct.unpack("5s", b"words")
b"words"

This is an important restriction, in writing my own function to standardise the format strings,
I had to include a case for strings, to ensure I recognised it as the single object struct would expect

Unfortunately my solution is a little clunky, but I'll share it here if you think you'd find it useful:

def simplify(_format):
	"""Converts a format string for the struct module into a list of type chars"""
    _format = _format.replace(" ", "")
    if _format[-1].isnumeric():
        raise RuntimeError("_format is invalid (ends in a number)")
    out, i = list(), 0
    while i < len(_format):
        char = _format[i]
        if char.isnumeric(): # find the whole number
            j = 1
            while _format[i:i+j].isnumeric():
                j += 1
            j -= 1
            count = int(_format[i:i + j])
            i += j
            f = _format[i]
            # NOTE: f is only assumed to be valid, validity is not checked here
            if f == "s":  # string
                out.append(f"{count}s")
            else:
                out.extend([f] * count)  # "2I" -> ["I", "I"]
        elif char.isalpha():  # lazily collect type
            out.append(_format[i])
        elif char == "?":  # bool is not an alphabetical character, but is still valid
            out.append("?")
        else:
            raise RuntimeError(f"Invalid character '{char}' in _format") 
        i += 1
    return out

This could probably be tidier but in my testing it seems to work with the format strings I'm using
(no pointer types, padding, size_t or 0 length types)

Using this with the dictionary lookups in struct_parse should handle both spaces and types preceeded by counts
Strings are passed through as "128s" etc., so a special case would need to be added
Presumably noting the type as string and also the size

I hope this is useful in expanding this repo to support all valid format strings the struct module accepts

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions