A really interesting talk. As usual, metrics are important - but, also as usual, the *right* metrics are important. If you think metric X predicts algorithmic performance, you need to actually show that’s true. And even if it used to be true, times change and it may not be true anymore.
In this case, the idea that counting compares and swaps would predict sorting performance was wrong, because branch prediction - and, to a lesser extent, the cache - in modern processors has a noticeable effect on running time. Compares that are branch-predictor friendly are better than those that are not. Better still is using arithmetic, or knowledge of the initial state, to replace compares entirely.
And not all swaps and compares are the same anymore. The time to access data in L1 vs. L2 cache is different - and both are far faster than going out to RAM.
It’s been clear for a long time that a lot of algorithm and data structure books don’t quite tie into the real world, but it was interesting to see a walk-through of why.
Below is a reminder of how long things take to happen on a computer, with a scaled-up version in time units we can reason about. So, for instance, spending the time to run Zippy when writing to or reading from disk can be a cost worth paying.
| Access | Latency (ns) | Scaled latency |
| --- | --- | --- |
| L1 cache reference | 0.5 | 1 second |
| Branch mispredict | 5.0 | 10 seconds |
| L2 cache reference | 7.0 | 14 seconds |
| Mutex lock/unlock | 25.0 | 1 minute |
| Main memory reference | 100.0 | 4 minutes |
| Compress 1K bytes with Zippy | 3,000.0 | 1.5 hours |
| Send 1K bytes over 1 Gbps network | 10,000.0 | 5.5 hours |
| Read 4K randomly from SSD* | 150,000.0 | 3.5 days |
| Read 1 MB sequentially from memory | 250,000.0 | 6 days |
| Round trip within same datacenter | 500,000.0 | 11.5 days |
| Read 1 MB sequentially from SSD* | 1,000,000.0 | 3 weeks |
| Disk seek | 10,000,000.0 | 7.5 months |
| Read 1 MB sequentially from disk | 20,000,000.0 | 1 year & 3 months |