serialize function (serialize()) struggling with larger data/unoptimized for it #2135
Description
I've been trying to build a graph with several million triples and serialize it to a Turtle file. However, the time taken by .serialize(destination="file.ttl") scales strangely. For example, one file (28.4 MB, approx. 450k lines/triples) took approximately 3.5 minutes, whereas a file 5x as large (142 MB, 2.2 million lines/triples) took 86.5 minutes, roughly 25x longer. Timings were measured with Python's time.time() before and after the call.
I would expect serialization to scale linearly at worst, but it does not seem to for larger graphs. This suggests to me there may be a problem with I/O calls: perhaps they are not optimised, or too many are being made. In theory, serialization should be limited only by hardware. Any solutions or potential fixes would be great.
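To put a number on "not linear", here is a minimal sketch of the arithmetic, using only the two measurements quoted above. It assumes the time grows like size**k between the two runs and solves for k; the helper name scaling_exponent is just for illustration, and two data points are of course only a rough indication.

```python
import math

# Figures reported above: (file size in MB, serialization time in minutes).
small = (28.4, 3.5)
large = (142.0, 86.5)


def scaling_exponent(small, large):
    """Return k such that time grows like size**k between the two runs."""
    size_ratio = large[0] / small[0]
    time_ratio = large[1] / small[1]
    return math.log(time_ratio) / math.log(size_ratio)


k = scaling_exponent(small, large)
print(f"size ratio: {large[0] / small[0]:.1f}")
print(f"time ratio: {large[1] / small[1]:.1f}")
print(f"estimated exponent k: {k:.2f}")
```

Linear scaling would give k = 1. These two runs give a value close to 2, which looks more like quadratic growth.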
For reference, here is the cProfile output for serializing the 28.4 MB file:
2151663231 function calls (2151492056 primitive calls) in 493.452 seconds

Ordered by: cumulative time
List reduced from 332 to 15 due to restriction <15>

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.001    0.001  493.452  493.452 graph.py:1137(serialize)
            1    0.067    0.067  493.427  493.427 turtle.py:226(serialize)
      1844631    2.419    0.000  479.026    0.000 turtle.py:272(getQName)
      1659054    0.781    0.000  473.365    0.000 graph.py:1047(compute_qname)
      1659054    2.130    0.000  472.393    0.000 __init__.py:489(compute_qname)
       119998  126.341    0.001  440.080    0.004 __init__.py:842(get_longest_namespace)
            1    0.197    0.197  383.829  383.829 turtle.py:100(preprocess)
       400029    1.088    0.000  382.030    0.001 turtle.py:257(preprocessTriple)
    955226858  239.332    0.000  314.192    0.000 term.py:223(startswith)
        58983    0.051    0.000  108.685    0.002 turtle.py:318(statement)
        58983    0.101    0.000  108.576    0.002 turtle.py:322(s_default)
838640/776999    0.382    0.000  104.461    0.000 turtle.py:337(path)
       828239    0.507    0.000  103.381    0.000 turtle.py:344(p_default)
       828239    1.219    0.000  101.938    0.000 turtle.py:350(label)
   1005472237   79.328    0.000   79.328    0.000 {method 'startswith' of 'str' objects}
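For anyone who wants to reproduce a profile like this, here is a minimal sketch using only the standard library's cProfile and pstats modules. The helper name profile_call is hypothetical, and the workload shown is a stand-in so the snippet runs without rdflib installed.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, limit=15, **kwargs):
    """Run func under cProfile; return (result, report text).

    The report is sorted by cumulative time and cut to `limit` rows.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return result, buffer.getvalue()


# Stand-in workload (an assumption, not the real serialization call).
_, report = profile_call(sorted, range(100_000), key=lambda n: -n)
print(report)
```

With rdflib, and assuming g is an already populated rdflib.Graph, the call would look something like profile_call(g.serialize, destination="file.ttl", format="turtle").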
Cheers,
Peter