A really interesting talk. As usual, metrics are important - but, also as usual, the *right* metrics are important. If you think metric X predicts algorithmic performance, you need to actually show that’s true. And even if it used to be true, times change and it may not be true anymore.
In this case, the idea that counting compares and swaps would predict sorting performance was wrong, because branch prediction - and, to a lesser extent, the cache - in modern processors has a noticeable effect on running time. Compares that are branch-predictor friendly are better than those that are not. Better still is using arithmetic, or knowledge of the initial state, to replace compares entirely.
And not all swaps and compares are the same anymore. The time to access data in L1 vs. L2 cache is different - and both are far faster than going out to RAM.
It’s been clear for a long time that a lot of algorithm and data structure books don’t quite tie into the real world, but it was interesting to see a walk-through of why.
Below is a reminder of how long things take to happen on a computer, with a scaled-up version in time units we can reason about. So, for instance, spending the time to run Zippy when writing to or reading from disk can be a cost worth paying.
| Access | Latency (ns) | Scaled latency |
| --- | --- | --- |
| L1 cache reference | 0.5 | 1 second |
| Branch mispredict | 5.0 | 10 seconds |
| L2 cache reference | 7.0 | 14 seconds |
| Mutex lock/unlock | 25.0 | 1 minute |
| Main memory reference | 100.0 | 4 minutes |
| Compress 1K bytes with Zippy | 3,000.0 | 1.5 hours |
| Send 1K bytes over 1 Gbps network | 10,000.0 | 5.5 hours |
| Read 4K randomly from SSD* | 150,000.0 | 3.5 days |
| Read 1 MB sequentially from memory | 250,000.0 | 6 days |
| Round trip within same datacenter | 500,000.0 | 11.5 days |
| Read 1 MB sequentially from SSD* | 1,000,000.0 | 3 weeks |
| Disk seek | 10,000,000.0 | 7.5 months |
| Read 1 MB sequentially from disk | 20,000,000.0 | 1 year & 3 months |