Summary
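When large inputs are processed in chunks, launch_amplitude_encode() is called with state_len (the size of the full state vector) instead of chunk_len (the size of the current chunk). As a result, the kernel writes state_len elements starting at the chunk offset. For the first chunk the offset is 0, so the write stays in bounds; for every later chunk the offset is nonzero, so offset + state_len exceeds state_len and the write runs past the end of the state vector.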
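The arithmetic behind the overrun can be checked with a small host-side model. This is only a sketch, not the actual kernel or its launcher: it assumes offsets and lengths are counted in elements, that chunks are laid out back to back in a state vector of state_len elements, and the helper names write_range and overruns are hypothetical.

```python
def write_range(offset, length):
    """Half-open index range [offset, offset + length) that one launch would write."""
    return offset, offset + length


def overruns(state_len, chunk_len, use_chunk_len):
    """Return the chunk offsets whose write range runs past the end of the state vector."""
    bad = []
    for offset in range(0, state_len, chunk_len):
        # The last chunk may be shorter than chunk_len.
        this_chunk = min(chunk_len, state_len - offset)
        # Buggy call passes state_len; fixed call passes the actual chunk size.
        length = this_chunk if use_chunk_len else state_len
        _, end = write_range(offset, length)
        if end > state_len:
            bad.append(offset)
    return bad


state_len, chunk_len = 16, 4
print(overruns(state_len, chunk_len, use_chunk_len=False))  # [4, 8, 12]
print(overruns(state_len, chunk_len, use_chunk_len=True))   # []
```

With state_len passed as the length, every chunk except the one at offset 0 is reported as out of bounds; with the chunk size passed instead, none are.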