Closed
Description
- C++, 16 threads: ~2.2s -> ~2.1s, ~4.76% faster
- C++, 4 threads: ~4.4s -> ~3.8s, ~15.8% faster
- C++, 1 thread: ~15.55s -> ~13.2s, ~17.8% faster
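For reference, the percentages above are consistent with a throughput-style speedup, old time divided by new time, minus one. A minimal sketch of that arithmetic, assuming that is how the figures were derived (`percent_faster` is a hypothetical helper, not something in the repo):

```cpp
// Hypothetical helper: percent speedup when a run drops from `before_s`
// to `after_s` seconds, assuming the formula (before / after - 1) * 100.
double percent_faster(double before_s, double after_s) {
    return (before_s / after_s - 1.0) * 100.0;
}
```

Under that assumption, `percent_faster(4.4, 3.8)` comes out near 15.8, matching the 4-thread line.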
There are a few potential bottlenecks affecting scaling to more threads:
1. disk access
2. more competition for the `strtok_mtx` mutex
3. execution units, because hyperthreading
4. edit: can't forget memory bandwidth & cache.
# 2 is the most interesting here. I can remove the need for a mutex by using `strcspn` instead of `strtok`, with no performance hit.
The effect may not even be noticeable, but it's still worth checking out.
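A rough sketch of the idea, not the actual project code: `strtok` keeps its position in hidden static state, which is why calls from multiple threads need a lock, while `strspn`/`strcspn` only read the buffer they are handed. The function name `split_tokens` and its signature are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper, not from the project: splits `text` on any character
// in `delims`. It keeps no shared state and does not modify `text`, so
// concurrent calls from several threads need no mutex.
std::vector<std::string> split_tokens(const char* text, const char* delims) {
    std::vector<std::string> tokens;
    const char* p = text;
    while (*p != '\0') {
        p += std::strspn(p, delims);                // skip a run of delimiters, as strtok does
        std::size_t len = std::strcspn(p, delims);  // length of the next token
        if (len == 0) break;                        // only delimiters were left
        tokens.emplace_back(p, len);
        p += len;
    }
    return tokens;
}
```

Like `strtok`, this version collapses consecutive delimiters, so empty fields are skipped. Copying each token into a `std::string` is a simplification here; a version that works in place on the buffer would avoid the allocations.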
A new lab server with 1 or 2 Xeon E5-2420s would make an interesting testbed:
- decreased disk bottleneck
  - RAID array?
  - solid state drive?
- increased processing bottleneck
  - less CPU speed
  - less memory bandwidth. No, wait... less cache. I'd wanna keep this out of the equation...
- with 2 CPUs, we can see how it scales up to 12 or 24 threads
WRT # 3, I can already take a closer look on lab2 at how it scales from 4->8 threads and from 8->16 threads.
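One way to take that closer look is a small timing harness that runs the same workload at different thread counts. This is only a sketch under assumptions: `time_with_threads` is a hypothetical name, and the `work` callback stands in for whatever per-item processing the real program does.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical harness: runs work(i) for i in [0, items) across `n_threads`
// threads and returns the elapsed wall-clock time in seconds.
double time_with_threads(unsigned n_threads, std::size_t items,
                         const std::function<void(std::size_t)>& work) {
    std::atomic<std::size_t> next{0};
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n_threads; ++t) {
        pool.emplace_back([&] {
            // Each thread claims the next unprocessed item until none remain.
            for (std::size_t i = next.fetch_add(1); i < items; i = next.fetch_add(1)) {
                work(i);
            }
        });
    }
    for (auto& th : pool) th.join();
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling it with 4, 8, and 16 threads on the same input and comparing the returned times would show where the scaling flattens out. Depending on the toolchain, building may need `-pthread`.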
Also, how did things change between 8b2c5cd & f12c775?
- The outer loop, splitting .list files into lines, was converted to use `strcspn` instead of `strtok`. No noticeable speed change, but only tested on BiggaTomato (4 cores) IIRC.
- The more intensive inner loop, splitting lines into fields, was left alone. This could potentially have more effect.