Brain Phrye



Performance


A really interesting talk. As usual, metrics matter - but, also as usual, it's the right metrics that matter. If you think metric X predicts algorithmic performance, you need to actually show that it does. And even if it used to, times change and maybe it no longer does.

In this case, the idea that counting compares and swaps would predict sort algorithm performance turned out to be wrong, because branch prediction - and, to a lesser extent, caching - in modern processors has a noticeable effect on running time. Compares that are branch-predictor friendly are better than those that aren't; better still is replacing compares entirely with arithmetic or with knowledge of the initial state.

And not all swaps and compares cost the same anymore. Accessing data in L1 vs L2 cache takes different amounts of time - and both are far faster than accessing data in RAM.

It’s been clear for a long time that a lot of algorithm and data structure books don’t quite tie into the real world, but it was interesting to see a walk through of it.

Just a reminder of how long things take to happen on a computer, with a scaled-up version in time units we can reason about. For instance, spending time on Zippy compression when writing to or reading from disk can be a cost worth paying.

| Access                               | Latency (ns) | Scaled latency       |
|--------------------------------------|-------------:|----------------------|
| L1 cache reference                   |          0.5 | = 1 second           |
| Branch mispredict                    |          5.0 | = 10 seconds         |
| L2 cache reference                   |          7.0 | = 14 seconds         |
| Mutex lock/unlock                    |         25.0 | = 1 minute           |
| Main memory reference                |        100.0 | = 4 minutes          |
| Compress 1K bytes with Zippy         |      3,000.0 | = 1.5 hours          |
| Send 1K bytes over 1 Gbps network    |     10,000.0 | = 5.5 hours          |
| Read 4K randomly from SSD*           |    150,000.0 | = 3.5 days           |
| Read 1 MB sequentially from memory   |    250,000.0 | = 6 days             |
| Round trip within same datacenter    |    500,000.0 | = 11.5 days          |
| Read 1 MB sequentially from SSD*     |  1,000,000.0 | = 3 weeks            |
| Disk seek                            | 10,000,000.0 | = 7.5 months         |
| Read 1 MB sequentially from disk     | 20,000,000.0 | = 1 year & 3 months  |