Using 8 threads gives only a ~2x speedup for containment checks, and a ~1.1x speedup for ray intersections.
Off the top of my head, there's probably some combination of 3 sources:
1. The Python-Rust bridge. This would manifest as the query spending a significant portion of its time thrashing a single CPU at the start and end of the query, and possibly the multithreading gains getting worse with the number of queries, but improving with the number of rays cast for each containment query. It would be improved by #1 (Use rust-numpy for data IO).
2. The constant startup cost of rayon. This would manifest as the multithreading gains improving with the number of queries. Unavoidable, unless there's a lighter runtime available (smol?). Could automatically switch to single-threaded for a small number of queries when `threads=True`?
3. The per-task cost of the work-stealing job scheduler, which scales with `N = number of queries`, if the ray casts are very cheap. Would be improved by chunking the queries so that `N = number of chunks`.
Some combination of 2 and 3 is certainly already a problem: benchmarked containment checks are ~2.5x faster on 0 threads than on 1.
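A rough sketch of the automatic switch floated in point 2, in Python. Everything here is hypothetical: `effective_threads`, the `MIN_PARALLEL_QUERIES` cutoff and its value are placeholders rather than anything in the library, and it assumes that 0 threads means "take the serial path with no thread pool at all". A real cutoff would have to come from benchmarks.

```python
import os
from typing import Union

# Hypothetical cutoff below which the pool is assumed to cost more than it saves.
# The value is a placeholder, not a measured number.
MIN_PARALLEL_QUERIES = 10_000


def effective_threads(
    n_queries: int,
    threads: Union[bool, int],
    min_parallel: int = MIN_PARALLEL_QUERIES,
) -> int:
    """Decide how many threads to request; 0 means the serial path (assumption)."""
    if not threads:
        # threads=False or threads=0: the caller explicitly asked for serial.
        return 0
    if n_queries < min_parallel:
        # Too few queries to amortise pool startup and scheduling overhead.
        return 0
    if threads is True:
        # threads=True: use every available core.
        return os.cpu_count() or 1
    return int(threads)
```

With the placeholder cutoff, `effective_threads(500, True)` falls back to 0, while `effective_threads(1_000_000, 8)` keeps the requested 8.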
The fact that containment checks (multiple ray casts per task) get a better speedup than ray intersections (1 ray cast per task) implies that 3 is definitely a factor.
Gains are still not great with #28, which eliminates the Python-Rust bridge (although I think there's still a copy involved on the Rust side) and at least some of rayon's startup cost (because it uses the global thread pool rather than building a new one for every query). So I guess it's the job scheduler sapping our gains.
However, reorganising to use chunks would be a massive faff, so I'm unlikely to fix this any time soon.
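For reference, the shape of the chunked approach, sketched in Python rather than in the Rust code it would actually have to live in. `chunk_bounds`, `run_chunked` and `query_one` are hypothetical names, and `ThreadPoolExecutor` only stands in for the scheduler: the point is that the pool sees one task per chunk instead of one per query, not that this would be fast in Python.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

Q = TypeVar("Q")
R = TypeVar("R")


def chunk_bounds(n_items: int, n_chunks: int) -> List[range]:
    """Split range(n_items) into at most n_chunks contiguous, near-equal ranges."""
    if n_items <= 0:
        return []
    n_chunks = max(1, min(n_chunks, n_items))
    base, extra = divmod(n_items, n_chunks)
    bounds = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one more item each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append(range(start, stop))
        start = stop
    return bounds


def run_chunked(
    queries: Sequence[Q], query_one: Callable[[Q], R], n_chunks: int
) -> List[R]:
    """Run query_one over all queries, scheduling one task per chunk."""

    def run_chunk(idxs: range) -> List[R]:
        # Each task is a plain serial loop over its own slice of the queries.
        return [query_one(queries[i]) for i in idxs]

    chunks = chunk_bounds(len(queries), n_chunks)
    if not chunks:
        return []
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # pool.map yields results in submission order, so output order is kept.
        per_chunk = pool.map(run_chunk, chunks)
        return [result for chunk in per_chunk for result in chunk]
```

Setting `n_chunks` to the thread count (or a small multiple of it, to leave some room for load balancing) would make the scheduling overhead scale with the number of chunks rather than the number of queries.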