Closed

iotop and the simple-minded C program below indicate we're nowhere
near IO-bound in df.to_csv: it runs roughly 10-15x slower than raw formatted writes.
It might be possible to speed things up considerably with a fast path
for special cases (numeric-only frames) that don't need the fancy quoting and other
bells and whistles provided by the underlying Python csv module; a rough sketch of that
idea follows the benchmark results below.
```c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int i;
    FILE *f;
    char fmt[] = "%f,%f,%f,%f,%f\n";

    /* loop forever so the sustained write rate shows up in iotop */
    while (1) {
        f = fopen("out.csv", "wb");
        for (i = 0; i < 1000000; i++) {
            fprintf(f, fmt, 1.0, 2.0, 3.0, 4.0, 5.0);
        }
        fclose(f);
    }
}
```
This sustains about 30 MB/s on my machine (without even batching writes)
vs ~2-3 MB/s for the new (0.11.0) Cython df.to_csv().
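
For concreteness, here's a rough sketch (not measured against the real codepath) of what a numeric-only fast path could look like: format the floats directly, as the C program does, and skip the csv module's quoting machinery entirely. The frame shape, file names, and the `to_csv_numeric_fast` helper are made up for illustration.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical all-float frame, roughly matching the C benchmark's
# 1,000,000 rows x 5 columns.
df = pd.DataFrame(np.random.randn(1000000, 5))


def to_csv_numeric_fast(frame, path):
    # All-float frames need no quoting or escaping, so a single row
    # format string (as in the C program above) is enough.
    values = frame.values
    fmt = ",".join(["%f"] * values.shape[1]) + "\n"
    with open(path, "w") as f:
        for row in values:
            f.write(fmt % tuple(row))


start = time.perf_counter()
df.to_csv("out_pandas.csv", index=False)
print("df.to_csv:         %.2fs" % (time.perf_counter() - start))

start = time.perf_counter()
to_csv_numeric_fast(df, "out_fast.csv")
print("numeric-only path: %.2fs" % (time.perf_counter() - start))
```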
Need to check whether it's the stringifying, the quoting logic, the memory layout, or something
else that accounts for the difference.
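
One quick way to start narrowing that down would be to profile a single to_csv call and look at the cumulative times; the frame here is just an arbitrary all-float test case.

```python
import cProfile
import pstats

import numpy as np
import pandas as pd

# Arbitrary all-float frame, same shape as the C benchmark writes.
df = pd.DataFrame(np.random.randn(1000000, 5))

# Profile one to_csv call; the cumulative-time view should show whether
# float stringification, the csv writer, or per-row iteration dominates.
cProfile.run('df.to_csv("out.csv")', "to_csv.prof")
pstats.Stats("to_csv.prof").sort_stats("cumulative").print_stats(15)
```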
This should also yield insights for any future binary serialization format
we implement.
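
For a rough upper bound on what a binary format could achieve on the same data (again just a sketch; the file name is arbitrary and this ignores dtypes, the index, and any metadata):

```python
import time

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000000, 5))

# Raw float64 dump of the underlying array in native byte order: no text
# conversion at all, so this is close to the disk-bound upper limit.
start = time.perf_counter()
df.values.tofile("out.bin")
elapsed = time.perf_counter() - start
print("binary dump: %.2fs (~%.0f MB/s)" % (elapsed, df.values.nbytes / elapsed / 1e6))
```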