-
Hi, I'm trying to parse an image file, but the question is general. So far I am opening the file and reading the first four bytes as:

```mojo
with open(test_image, "r") as f:
    file_contents = f.read_bytes(4)
```

What would the Mojonic (Mojical?) way be to interpret this data as an unsigned 32-bit integer instead? Should I even be using this list, or should I rather take a step back and be dealing with the …
-
For reference, this seems to do the trick, but there must be a better way:

```mojo
fn chunk_to_int32(chunk: List[SIMD[DType.int8, 1]]) raises -> UInt32:
    if len(chunk) != 4:
        raise Error("Chunk must be 4 bytes long")
    var chunk_sum: UInt32 = 0
    # Cast through uint8 so each signed byte becomes 0..255 before widening.
    # chunk[0] is treated as the most significant byte (big-endian order).
    chunk_sum += chunk[3].cast[DType.uint8]().cast[DType.uint32]()
    chunk_sum += chunk[2].cast[DType.uint8]().cast[DType.uint32]() << 8
    chunk_sum += chunk[1].cast[DType.uint8]().cast[DType.uint32]() << 16
    chunk_sum += chunk[0].cast[DType.uint8]().cast[DType.uint32]() << 24
    return chunk_sum
```

So that:

```mojo
var numbers = List[SIMD[DType.int8, 1]](0xff, 0xff, 0xff, 0xff)
print(chunk_to_int32(numbers))  # 4294967295
```

Also, this assumes … Is there an example of low-level byte magic in Mojo somewhere?
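A minimal sketch of a slightly more general big-endian reader, assuming the bytes arrive as `List[Int8]` (as with `read_bytes` in the Mojo version used in this thread) and using a hypothetical helper name, `be_uint32_at`. It reads four bytes at an arbitrary offset, which is handy when pulling several header fields out of one buffer, and combines them with shifts and `|` rather than `+=`:

```mojo
fn be_uint32_at(data: List[Int8], offset: Int) raises -> UInt32:
    # Hypothetical helper: decode the 4 bytes starting at `offset` as a
    # big-endian (most significant byte first) unsigned 32-bit integer.
    if offset < 0 or offset + 4 > len(data):
        raise Error("need 4 bytes starting at offset")
    # Go through uint8 first so each byte is treated as 0..255, not -128..127.
    var b0 = data[offset].cast[DType.uint8]().cast[DType.uint32]()
    var b1 = data[offset + 1].cast[DType.uint8]().cast[DType.uint32]()
    var b2 = data[offset + 2].cast[DType.uint8]().cast[DType.uint32]()
    var b3 = data[offset + 3].cast[DType.uint8]().cast[DType.uint32]()
    # The earliest byte is the most significant one; OR the shifted bytes together.
    return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3
```

For a file header you would pass the list returned by `read_bytes` plus the byte offset of the field you want. Because the value is assembled arithmetically, the result does not depend on the host CPU's native byte order.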
-
I think bitcasting the pointer is the normal way to do this. This is unlikely to remain the "recommended" way for long, as special handling for bytes is likely to be added. See this or this.

A code example:

```mojo
from collections import List

fn bytes_to_uint32(owned list: List[Int8]) raises -> List[UInt32]:
    if len(list) % 4 != 0:
        raise Error("List[Int8] length must be a multiple of 4 to convert to List[UInt32]")
    var result_length = len(list) // 4
    # Get the data pointer with ownership.
    # This avoids copying and makes sure only one List owns a pointer to the underlying address.
    var ptr_to_int8 = list.steal_data()
    # Reinterpret the same memory as UInt32 values (read in the host's native byte order).
    var ptr_to_uint32 = ptr_to_int8.bitcast[UInt32]()
    # With the nightly branch of the Mojo stdlib we can just call:
    # return List[UInt32](ptr_to_uint32, result_length, result_length)
    var result = List[UInt32]()
    result.data = ptr_to_uint32
    result.capacity = result_length
    result.size = result_length
    return result

fn main() raises:
    var bytes = List[Int8](
        0x0, 0x0, 0x0, 0x0,
        0x1, 0x0, 0x0, 0x0,
        0xff, 0x0, 0x0, 0x0,
        0x0, 0x1, 0x0, 0x0,
        0x0, 0xff, 0x0, 0x0,
        0x0, 0x0, 0x1, 0x0,
        0x0, 0x0, 0xff, 0x0,
        0x0, 0x0, 0x0, 0x1,
        0x0, 0x0, 0x0, 0xff,
        0xff, 0xff, 0xff, 0xff
    )
    var uint32 = bytes_to_uint32(bytes)
    for i in uint32:
        print(i[])
```
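The bitcast reinterprets the bytes in the host CPU's native byte order, which is little endian on x86-64 and on most ARM systems. If you want little endian regardless of the host, a shift-based decode is one option. This is a sketch under the same `List[Int8]` assumption, with a hypothetical helper name, `le_uint32_at`:

```mojo
fn le_uint32_at(data: List[Int8], offset: Int) raises -> UInt32:
    # Hypothetical helper: decode the 4 bytes starting at `offset` as a
    # little-endian (least significant byte first) unsigned 32-bit integer.
    # Unlike a pointer bitcast, the result does not depend on the host byte order.
    if offset < 0 or offset + 4 > len(data):
        raise Error("need 4 bytes starting at offset")
    var result: UInt32 = 0
    # Walk from the last (most significant) byte down to the first,
    # shifting the accumulated value up by one byte each step.
    for i in range(3, -1, -1):
        result = (result << 8) | data[offset + i].cast[DType.uint8]().cast[DType.uint32]()
    return result
```

On a little-endian host this should agree with `bytes_to_uint32` for each group of four bytes; the trade-off is that it builds each value individually instead of reusing the buffer without copying.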
-
Hi, thanks for the response @mikowals! Your solution also doesn't work as I would (naively?) expect:

```mojo
var numbers_128 = List[SIMD[DType.int8, 1]](0x0, 0x0, 0x0, 0x80)
var numbers_255 = List[SIMD[DType.int8, 1]](0x0, 0x0, 0x0, 0xff)
var numbers_65535 = List[SIMD[DType.int8, 1]](0x0, 0x0, 0xff, 0xff)

# My initial take:
print(chunk_to_int32(numbers_128))    # 128
print(chunk_to_int32(numbers_255))    # 255
print(chunk_to_int32(numbers_65535))  # 65535

# Your solution:
print(bytes_to_uint32(numbers_128)[0])    # 2147483648
print(bytes_to_uint32(numbers_255)[0])    # 4278190080
print(bytes_to_uint32(numbers_65535)[0])  # 4294901760
```

However, I realized this might be an endian thing, so with a little byteswapping:

```mojo
from math.bit import bswap

print(bswap(bytes_to_uint32(numbers_128)[0]))    # 128
print(bswap(bytes_to_uint32(numbers_255)[0]))    # 255
print(bswap(bytes_to_uint32(numbers_65535)[0]))  # 65535
```

That is what I would expect (and it parses my image as expected). Any idea why this would be? I'll admit I've only been working in Python for the last 2.5 years and feel like I'm having to re-learn manual memory management, so my expectations might be wrong here.
-
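Yes, you wanted big endian, but from your question above I thought you wanted little-endian bytes converted to UInt32, so my implementation is little endian: the pointer bitcast reads the bytes in the machine's native order, which is little endian on your machine. Using … In big endian, the earliest bytes contribute the biggest values, so …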
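As a worked check, write the four bytes in file order as $b_0, b_1, b_2, b_3$. The two interpretations are:

$$
v_{\text{BE}} = b_0 \cdot 2^{24} + b_1 \cdot 2^{16} + b_2 \cdot 2^{8} + b_3,
\qquad
v_{\text{LE}} = b_0 + b_1 \cdot 2^{8} + b_2 \cdot 2^{16} + b_3 \cdot 2^{24}
$$

For the bytes `0x0, 0x0, 0x0, 0x80`, only $b_3 = 128$ is nonzero, so $v_{\text{BE}} = 128$ while $v_{\text{LE}} = 128 \cdot 2^{24} = 2147483648$. For `0x0, 0x0, 0xff, 0xff`, $v_{\text{BE}} = 255 \cdot 2^{8} + 255 = 65535$ while $v_{\text{LE}} = 255 \cdot 2^{16} + 255 \cdot 2^{24} = 4294901760$. These match the outputs reported in the thread, and `bswap` maps one interpretation to the other by reversing the byte order.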
-
If your data is big enough for it to be worthwhile, it would also be easy to add another function wrapper around …