TravelerList ctor: strtok_mtx #274

@yakra

#272 (comment):

  • C++, 16 threads: ~2.2s -> ~2.1s, ~4.76% faster
  • C++, 4 threads: ~4.4s -> ~3.8s, ~15.8% faster
  • C++, 1 thread: ~15.55s -> ~13.2s, ~17.8% faster

There are a few potential bottlenecks affecting scaling to more threads:

  1. disk access
  2. more competition for the strtok_mtx mutex
  3. execution units, because hyperthreading
  4. edit: can't forget memory bandwidth & cache.

# 2 is the most interesting here. I can remove the need for a mutex by using strcspn instead of strtok, with no performance hit.
The effect may not even be noticeable, but it's still worth checking out.
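As a rough illustration of why strcspn removes the need for strtok_mtx: strtok keeps its position in hidden static state shared by all threads, while strcspn only reads its arguments and returns a length. The sketch below is not the TravelerList code; `split_fields` and the delimiter set are hypothetical names chosen for the example. It pairs strcspn with strspn to skip runs of delimiters the way strtok would.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper, not taken from the project: split `line` into fields
// on any character in `delims`. Unlike strtok, it keeps no static state and
// does not modify `line`, so concurrent calls need no mutex.
std::vector<std::string> split_fields(const char* line, const char* delims)
{
    std::vector<std::string> fields;
    const char* p = line;
    for (;;) {
        p += std::strspn(p, delims);                // skip a run of delimiters
        if (*p == '\0') break;                      // nothing left to read
        std::size_t len = std::strcspn(p, delims);  // length of the next field
        fields.emplace_back(p, len);                // copy the field out
        p += len;
    }
    return fields;
}
```

Each thread can call this on its own buffer without locking. The trade-off is that fields are copied into std::string objects here for simplicity; code that wants to avoid the copies could store pointer/length pairs instead.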

A new lab server with 1 or 2 Xeon E5-2420s would make an interesting testbed:

  • decreased disk bottleneck
    • RAID array?
    • solid state drive?
  • increased processing bottleneck
    • less CPU speed
    • less memory bandwidth (No, wait...)
    • less cache (I'd wanna keep this out of the equation...)
  • with 2 CPUs, we can see how it scales up to 12 or 24 threads

WRT # 3, I can already take a closer look on lab2 at how it scales from 4->8 threads and from 8->16 threads.

Also, how did things change between 8b2c5cd & f12c775?

  • The outer loop, splitting .list files into lines, was converted to use strcspn instead of strtok. No noticeable speed change, but only tested on BiggaTomato (4 cores) IIRC.
  • The more intensive inner loop, splitting lines into fields, was left alone. This could potentially have more effect.
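For reference, here is a minimal sketch of what both loops could look like with strcspn in place of strtok. It assumes a NUL-terminated buffer holding a whole .list file and whitespace-separated fields; `parse_list` is a hypothetical name, and the real constructor does considerably more per line than this.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical sketch: turn a whole .list buffer into lines of fields.
// All state lives in locals, so several threads can run this at once
// on different buffers without taking a lock.
std::vector<std::vector<std::string>> parse_list(const char* data)
{
    std::vector<std::vector<std::string>> lines;
    const char* line = data;
    while (*line) {
        // Outer loop: find the end of the current line.
        const char* end = line + std::strcspn(line, "\r\n");

        // Inner loop: split the line into fields.
        std::vector<std::string> fields;
        const char* p = line;
        while (p < end) {
            p += std::strspn(p, " \t");                     // skip blanks
            if (p == end) break;
            std::size_t len = std::strcspn(p, " \t\r\n");   // next field
            fields.emplace_back(p, len);
            p += len;
        }
        if (!fields.empty()) lines.push_back(std::move(fields));

        // Step over the line terminator(s), including blank lines.
        line = end + std::strspn(end, "\r\n");
    }
    return lines;
}
```

Because the inner loop runs once per field rather than once per line, it is the place where a lock taken around strtok would be contended most often.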
