Reasonably written C++ code is naturally fast, thanks to C++'s excellent low-penalty abstractions and a memory model close to the machine.
However, a large category of applications has no bound on desired speed, meaning there is no point of diminishing returns in making code faster. Better speed means less power consumed for the same work, a larger workload for the same data center expense, better features for the end user, more capacity for machine learning, better analytics, and more.
Optimizing has always been an art, and in particular optimizing C++ on contemporary hardware has become a task of formidable complexity. This is because modern hardware has a few peculiarities that are insufficiently understood and explored. This talk discusses a few such effects and guides the attendee on how to navigate design and implementation options in search of better performance.
- Inlining: dos, don'ts, and surprising behaviors
- I-Cache and D-Cache impact
- Atomics matter
- Prefer zero of all values
- To Elide or to Move: are move constructors all they're cracked up to be?
This talk is aimed at C++ programmers interested in writing high-performance software.
- Beware Compiler's Most Vexing Inlining
- Cost and benefit considerations
- Dark Matter: cdtors
- Beware Inline Destructors
- Controlling inlining
- Case Study: Custom shared_ptr
- Atomics Matter
- Unwitting Sharing
- Classic Implementation
- Lazy Refcount Allocation
- Skip Last Decrement
- Prefer Zero of All
- Use Dedicated Allocators
- Use Smaller Counters
- To Elide or to Move: That Is the Question
- Conventional Wisdom
- The Efficiency Argument
- The Composability Argument
- The Measurements Argument
- Recap: How to Elide