
Hardware

Instructions

Division operations are expensive (up to 92 cycles for 64-bit integer division on x86) 1
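Because of that cost, compilers (and hand-tuned code) replace division by a constant with cheaper operations where possible. A minimal sketch of the classic strength reduction for power-of-two divisors (helper names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// For unsigned values and a constant power-of-two divisor, a division
// can be replaced by a shift and a modulo by a mask. Compilers do this
// automatically, which is one reason constant divisors are much cheaper
// than divisors known only at runtime.
inline uint32_t div_by_8(uint32_t x) {
    return x >> 3; // identical to x / 8 for unsigned x
}

inline uint32_t mod_by_8(uint32_t x) {
    return x & 7;  // identical to x % 8 for unsigned x
}
```

For non-power-of-two constants, compilers use a multiply by a precomputed "magic number" plus shifts instead of a hardware divide.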

Instructions Retired

https://software.intel.com/en-us/forums/intel-vtune-amplifier-xe/topic/311170 The instructions-retired counter ignores branch mispredictions; while the CPU is stalled it is not retiring instructions, so aim to maximize this metric when reducing cache misses.

Stack

Scheduling

Context Switching

Memory

See: Caching In: Understand, Measure, and Use Your CPU Cache More Effectively

  • TLB (Translation Lookaside Buffer: a cache for page table entries)
    • multi-level, like the data caches (e.g. separate L1 instruction/data TLBs backed by a shared L2 TLB)
    • measurable via CPU perf counters (miss/hit rate, walk duration, i.e. how long a lookup takes)
    • on older hardware flushed on every process context switch
    • on newer hardware an ASID (Address Space Identifier) is added so that not all TLB entries have to be flushed
  • Page Table (virtual to physical memory address)
    • OS managed
    • page fault when address not in memory (need to bring in from disk)
  • Page Size
    • tradeoff when using big pages: fewer pages and quicker lookups vs. wasted memory space and more disk paging
    • page size is adjustable (often requires reboot when changed)
      • JVM flag: -XX:+UseLargePages
      • linux: cat /proc/meminfo
  • http://minnie.tuhs.org/CompArch/Lectures/week06.html
  • http://landley.net/writing/memory-faq.txt
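The base page size can be queried at runtime; a small sketch using POSIX `sysconf` (the function name is standard POSIX, values are platform-dependent):

```cpp
#include <cassert>
#include <unistd.h>

// Returns the base page size in bytes for this process.
// Typically 4096 on x86-64 Linux; huge pages (2 MiB / 1 GiB on x86-64)
// are configured separately, e.g. checked via /proc/meminfo and enabled
// for the JVM with -XX:+UseLargePages.
long base_page_size() {
    return sysconf(_SC_PAGESIZE);
}
```

Note the page size is always a power of two, since virtual addresses are split into a page number and an in-page offset by simple bit masking.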

Readings

Cache

Coherency

Misses

  • 3 kinds of misses 1
    • Compulsory miss: desired data was never in the cache and therefore must be loaded for the first time
    • Capacity miss: the working set is larger than the cache, so previously loaded data was evicted
    • Conflict miss: data was evicted because too many addresses map to the same cache set

Prefetching

Prefetch = eagerly loading data, e.g. adjacent cache lines. Prefetching works best when data is aligned sequentially and access patterns are predictable/sequential. When prefetching works and when it does not: https://t.co/SBzIKrD3wS https://mechanical-sympathy.blogspot.co.at/2012/08/memory-access-patterns-are-important.html
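As a sketch of the difference (function names are illustrative; actual measurement is left to perf), the two loops below compute the same sum but present very different access patterns to the hardware prefetcher:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sequential pass: consecutive addresses, so the prefetcher can stream
// cache lines ahead of the loop.
int64_t sum_sequential(const std::vector<int>& v) {
    int64_t s = 0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i];
    return s;
}

// Strided pass: jumps `stride` elements at a time, touching a new cache
// line on nearly every access once stride * sizeof(int) exceeds the
// 64-byte line size, which defeats simple next-line prefetching.
int64_t sum_strided(const std::vector<int>& v, std::size_t stride) {
    int64_t s = 0;
    for (std::size_t start = 0; start < stride; ++start)
        for (std::size_t i = start; i < v.size(); i += stride)
            s += v[i];
    return s;
}
```

On large arrays the sequential version is typically several times faster despite doing the same arithmetic, purely due to memory access order.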

Predictable Access Patterns

  • Temporal Locality: referring to the same data within a short time span
  • Spatial Locality: referring to data that is close together (cohesion)
  • Sequential Locality: referring to data that is laid out linearly in memory (array)
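Spatial and sequential locality are why traversal order matters for a 2-D array stored in row-major order; a small sketch (illustrative names):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// rows x cols matrix stored flat in row-major order (the C/C++ convention).
// Row-major traversal visits consecutive addresses: good spatial and
// sequential locality. Column-major traversal strides `cols` elements per
// step and misses the cache far more often on large matrices.
long sum_row_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long s = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            s += m[r * cols + c]; // consecutive addresses
    return s;
}

long sum_col_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long s = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            s += m[r * cols + c]; // stride of cols elements per access
    return s;
}
```

This is the example Scott Meyers uses in the cache talk listed below: same work, same result, very different cache behavior.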

Tooling

perf, likwid, lmbench, toplev

Groups

https://groups.google.com/forum/#!forum/mechanical-sympathy

Videos

| Name | Recorded At | Speaker | Language/Platform | Rating | Description |
| --- | --- | --- | --- | --- | --- |
| A Crash Course in Modern Hardware | Devoxx | Cliff Click | HW,OS,JVM,Java | 8 | Really a crash course but still quite good |
| CPU caches and why you care | code::dive conference 2014 | Scott Meyers | HW,C++ | 9 | Classic one about caches, must watch |
| History of Memory Models | MIT course | ? | Theory | ? | Not completely watched yet |
| Caching in: understand, measure, and use your CPU Cache more effectively | JavaOne 2015 | Richard Warburton | HW | 9 | Easy intro |
| Writing Fast Code I | code::dive 2015 | Andrei Alexandrescu | C++/HW | 9 | Low level |
| Writing Fast Code 2 | code::dive 2015 | Andrei Alexandrescu | C++/HW | 9 | Low level |
| Fastware | ACCU 2016 | Andrei Alexandrescu | - | - | |
| Writing Quick Code in C++, Quickly | GoingNative 2013 | Andrei Alexandrescu | - | - | |
| Optimization Tips - Mo' Hustle Mo' Problems | CppCon 2014 | Andrei Alexandrescu | | 8 | Very low level |
| Data Oriented Design and C++ | CppCon 2014 | Mike Acton | | 9 | Low level and interesting but very limited use |