furdarius/gofalsesharing

False Sharing with Go

False sharing is a common problem in shared-memory parallel processing. It occurs when two or more cores modify independent data that happen to reside on the same cache line, so that each core holds its own copy of that line.

When one core writes to the line, the copies held by the other cores are invalidated. Even though another core may not be using the written data (reading or writing), it may be using a different element on the same cache line. That core must then reload the line before it can access its own data again.

The cache hardware ensures data coherency, but at a potentially high performance cost if false sharing is frequent. A good technique to identify false sharing problems is to catch unexpected sharp increases in last-level cache misses using hardware counters or other performance tools.

Benchmark

Sum the elements of an array (size 10^8)

go test -run=XXX -bench=. -cpu=1,2,4,6,8,12,16,24,32,56 -benchtime=10s

Results

BenchmarkSum/Linear            	                     100	 118133518 ns/op
BenchmarkSum/Linear-2          	                     100	 123964604 ns/op
BenchmarkSum/Linear-4          	                     100	 112477528 ns/op
BenchmarkSum/Linear-6          	                     100	 123335032 ns/op
BenchmarkSum/Linear-8          	                     100	 123343898 ns/op
BenchmarkSum/Linear-12          	             100	 110501346 ns/op
BenchmarkSum/Linear-16          	             100	 120919665 ns/op
BenchmarkSum/Linear-24          	             100	 120565232 ns/op
BenchmarkSum/Linear-32          	             100	 116581446 ns/op
BenchmarkSum/Linear-56          	             100	 108527032 ns/op
BenchmarkSum/ParallelFalseSharing            	     100	 231289258 ns/op
BenchmarkSum/ParallelFalseSharing-2          	     100	 117786360 ns/op
BenchmarkSum/ParallelFalseSharing-4          	     200	  64357195 ns/op
BenchmarkSum/ParallelFalseSharing-6          	     300	  47391438 ns/op
BenchmarkSum/ParallelFalseSharing-8          	     500	  37229853 ns/op
BenchmarkSum/ParallelFalseSharing-12         	     500	  27098008 ns/op
BenchmarkSum/ParallelFalseSharing-16         	    1000	  22183358 ns/op
BenchmarkSum/ParallelFalseSharing-24         	    1000	  18418561 ns/op
BenchmarkSum/ParallelFalseSharing-32         	    1000	  16435079 ns/op
BenchmarkSum/ParallelFalseSharing-56         	    1000	  14559299 ns/op
BenchmarkSum/ParallelWithPadding             	     100	 229699936 ns/op
BenchmarkSum/ParallelWithPadding-2           	     100	 118146717 ns/op
BenchmarkSum/ParallelWithPadding-4           	     200	  59917481 ns/op
BenchmarkSum/ParallelWithPadding-6           	     300	  42033348 ns/op
BenchmarkSum/ParallelWithPadding-8           	     500	  30706079 ns/op
BenchmarkSum/ParallelWithPadding-12          	    1000	  21592191 ns/op
BenchmarkSum/ParallelWithPadding-16          	    1000	  17484888 ns/op
BenchmarkSum/ParallelWithPadding-24          	    1000	  13178152 ns/op
BenchmarkSum/ParallelWithPadding-32          	    2000	   9742292 ns/op
BenchmarkSum/ParallelWithPadding-56          	    2000	   9075207 ns/op
BenchmarkSum/ParallelLocalVariable            	     300	  47909122 ns/op
BenchmarkSum/ParallelLocalVariable-2          	     500	  24776753 ns/op
BenchmarkSum/ParallelLocalVariable-4          	    1000	  12702117 ns/op
BenchmarkSum/ParallelLocalVariable-6          	    2000	   8882727 ns/op
BenchmarkSum/ParallelLocalVariable-8          	    2000	   6479442 ns/op
BenchmarkSum/ParallelLocalVariable-12         	    3000	   4589380 ns/op
BenchmarkSum/ParallelLocalVariable-16         	    5000	   3779905 ns/op
BenchmarkSum/ParallelLocalVariable-24         	    5000	   3476589 ns/op
BenchmarkSum/ParallelLocalVariable-32         	    5000	   3301371 ns/op
BenchmarkSum/ParallelLocalVariable-56         	    5000	   2962546 ns/op

Go version and CPU

$ go version
go version go1.12 linux/amd64

$ lscpu 
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                56
On-line CPU(s) list:   0-55
Thread(s) per core:    2
Core(s) per socket:    14
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
Stepping:              2
CPU MHz:               2600.000
CPU max MHz:           2600.0000
CPU min MHz:           1200.0000
BogoMIPS:              5193.50
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              35840K
NUMA node0 CPU(s):     0-13,28-41
NUMA node1 CPU(s):     14-27,42-55
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb invpcid_single kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts

False Sharing detection

Using the Linux perf c2c tool

Set the maximum sample rate

# echo 100000 > /proc/sys/kernel/perf_event_max_sample_rate

Record BenchmarkSum/ParallelFalseSharing

# perf c2c record -F 60000 -a --all-user go test -run=XXX -bench=BenchmarkSum/ParallelFalseSharing -cpu=4 -benchtime=5s
# perf c2c report -NN --stdio

=================================================
            Trace Event Information              
=================================================
  Total records                     :     591181
  Locked Load/Store Operations      :      56871
  Load Operations                   :     260941
  Loads - uncacheable               :          0
  Loads - IO                        :          0
  Loads - Miss                      :        452
  Loads - no mapping                :        718
  Load Fill Buffer Hit              :      65782
  Load L1D hit                      :     188522
  Load L2D hit                      :       1363
  Load LLC hit                      :       2167
  Load Local HITM                   :         43
  Load Remote HITM                  :          0
  Load Remote HIT                   :          0
  Load Local DRAM                   :       1937
  Load Remote DRAM                  :          0
  Load MESI State Exclusive         :       1937
  Load MESI State Shared            :          0
  Load LLC Misses                   :       1937
  LLC Misses to Local DRAM          :      100.0%
  LLC Misses to Remote DRAM         :        0.0%
  LLC Misses to Remote cache (HIT)  :        0.0%
  LLC Misses to Remote cache (HITM) :        0.0%
  Store Operations                  :     330240
  Store - uncacheable               :          0
  Store - no mapping                :        187
  Store L1D Hit                     :     315973
  Store L1D Miss                    :      14080
  No Page Map Rejects               :        688
  Unable to parse data source       :          0

=================================================
    Global Shared Cache Line Event Information   
=================================================
  Total Shared Cache Lines          :         35
  Load HITs on shared lines         :      34424
  Fill Buffer Hits on shared lines  :      10786
  L1D hits on shared lines          :      23583
  L2D hits on shared lines          :          2
  LLC hits on shared lines          :         49
  Locked Access on shared lines     :         63
  Store HITs on shared lines        :      38356
  Store L1D hits on shared lines    :      35934
  Total Merged records              :      38399

=================================================
