Memory Layout of CompMatr in v4 #540
Is the internal CPU buffer contiguous?
Currently no. While the GPU memory is flat, the CPU memory is allocated as non-contiguous mallocs here:

```cpp
qcomp** cpu_allocMatrix(qindex dim) {

    // TODO:
    // the design of storing the CPU matrix elements as a 2D structure will impede
    // performance for many qubits; the allocated heap memories for each row
    // have no guarantee to reside near each other, so that their access/iteration in
    // hot loops may incur unnecessary caching penalties. Consider storing the
    // elements as a flat array, like we do for the GPU memory. This makes manual
    // modification by the user trivially harder (changing [r][c] to [r*n+c]),
    // but should improve caching, and significantly simplify allocation and its
    // validation; no more enumerating nested pointers! Benchmark this scenario.

    // allocate outer array
    qcomp** rows = (qcomp**) malloc(dim * sizeof *rows); // nullptr if failed

    // if that did not fail, allocate each inner array
    if (rows != nullptr)
        for (qindex r=0; r<dim; r++)
            rows[r] = cpu_allocArray(dim); // nullptr if failed

    // caller will validate whether mallocs were successful
    return rows;
}
```

This was done mostly out of uncertainty about the intended user interface for modifying an existing matrix:

```cpp
CompMatr m = createCompMatr(4);
m.cpuElems[0][1] = 3;
m.cpuElems[10][10] = 5;
syncCompMatr(m);
```

**Problem**

The previous use-case is affected by the underlying matrix format. Flattening the CPU memory would change it to:

```cpp
m.cpuElems[0*(1<<4)+1] = 3;
m.cpuElems[10*(1<<4)+10] = 5;
syncCompMatr(m);
```

We cannot simply provide a bespoke function to do the algebra for them, like

```cpp
setCompMatrElem(m, 10, 10, 5);
```

since such a function would necessarily sync to GPU memory to be consistent with the postcondition of the other API setters. And that would involve many gratuitous syncs when the user updates elements iteratively (which is slow even when we only sync each set amplitude).

**Solution?**

We could make a new, additional field: a flat CPU array maintained alongside the existing 2D elements. That way, users can still modify the matrix through the convenient 2D interface, while the backend iterates the contiguous flat memory in hot loops. The main apparent drawback is that we're needlessly maintaining another heap array. I'm persuaded to implement this right away.
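One way to reconcile contiguity with the familiar `[r][c]` syntax is to allocate a single flat buffer and expose row pointers that merely alias into it. This is a hypothetical sketch, not QuEST's committed design; the names `FlatMatrix` and `allocFlatMatrix` are invented for illustration:

```cpp
#include <complex>
#include <cstdlib>

using qcomp  = std::complex<double>;
using qindex = long long int;

// Hypothetical layout: one contiguous allocation backs the matrix, and a
// small array of row pointers aliases into it, so user code keeps the
// familiar rows[r][c] indexing while hot loops iterate the flat buffer.
struct FlatMatrix {
    qcomp*  flat; // dim*dim contiguous elements, zero-initialised
    qcomp** rows; // rows[r] = &flat[r*dim]; a 2D view, not separate mallocs
    qindex  dim;
};

FlatMatrix allocFlatMatrix(qindex dim) {
    FlatMatrix m;
    m.dim  = dim;
    m.flat = (qcomp*)  std::calloc(dim * dim, sizeof *m.flat); // nullptr if failed
    m.rows = (qcomp**) std::malloc(dim * sizeof *m.rows);      // nullptr if failed

    // point each row into the flat buffer (aliases, not new allocations)
    if (m.flat != nullptr && m.rows != nullptr)
        for (qindex r = 0; r < dim; r++)
            m.rows[r] = &m.flat[r * dim];

    // caller validates the allocations, as with cpu_allocMatrix above
    return m;
}
```

With this layout, `m.rows[10][10] = 5;` and `m.flat[10*m.dim + 10] = 5;` write the same element, and the GPU upload becomes a single contiguous copy.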
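For completeness, the sync-cost concern about a bespoke setter can be illustrated with a deferred-sync pattern. This is a hypothetical sketch, not QuEST API; `SyncedMatrix`, `setElem`, and `flush` are invented names. Each setter touches only CPU memory and raises a flag, and one explicit flush performs the single bulk GPU upload:

```cpp
#include <complex>
#include <vector>

using qcomp = std::complex<double>;

// Hypothetical deferred-sync matrix: setters never talk to the GPU; they
// only write CPU memory and mark the matrix dirty. One flush() at the end
// replaces the many per-element syncs a syncing setter would incur.
struct SyncedMatrix {
    std::vector<qcomp> cpuElems; // flat storage, dim*dim elements
    long long dim   = 0;
    bool      dirty = false;

    void setElem(long long r, long long c, qcomp v) {
        cpuElems[r * dim + c] = v;
        dirty = true;            // defer the GPU upload
    }

    void flush() {
        if (dirty) {
            // a single bulk upload would go here, e.g. one copy of
            // cpuElems.data() to device memory
            dirty = false;
        }
    }
};
```

A loop of `setElem` calls followed by one `flush()` then costs one transfer rather than one per element, matching the argument above about iterative updates.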