Cache Issues (time)

Minimise conditionals and branches in inner loops

Estimate your system’s ‘working set’

Mind your memory hierarchy

Understand caching models and how they affect you

Be careful of data layout in memory

Is the language’s uniform memory model optimal?


Notes:

EXAMPLE: QuantLib correlated random number generation for financial models: rearranging loops (reducing stride) gave a speed-up of about 20x to 40x.
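To illustrate what "rearranging loops (reducing stride)" means, here is a minimal sketch. It is not the QuantLib code. The function names and the flat row-major matrix layout are assumptions chosen for illustration. Both functions compute the same sum; only the loop order, and therefore the distance between consecutive memory accesses, differs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: the matrix is stored row-major in a flat vector,
// so element (r, c) lives at index r * cols + c.

// Inner loop walks down a column: consecutive accesses are `cols`
// elements apart, so each access may touch a different cache line.
double sum_column_order(const std::vector<double>& m,
                        std::size_t rows, std::size_t cols) {
    double total = 0.0;
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            total += m[r * cols + c];  // stride: cols * sizeof(double) bytes
        }
    }
    return total;
}

// Loops interchanged: the inner loop now walks contiguous memory,
// so every element of each fetched cache line is used before moving on.
double sum_row_order(const std::vector<double>& m,
                     std::size_t rows, std::size_t cols) {
    double total = 0.0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            total += m[r * cols + c];  // stride: sizeof(double) bytes
        }
    }
    return total;
}
```

Any speed difference depends on the matrix size relative to the cache and on the target hardware, so it is worth measuring both versions rather than assuming a particular gain.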

CHECKLIST:

Do you have good tools to measure the effect of changes (in CPU cycles vs real time)?

Can you estimate working set size?

Do you know anything about target hardware/OS and its interaction with your memory-access patterns?

Will you have to compromise because you have more than one hardware target, for example?

Modern languages have a more-or-less uniform memory model. Are there target-specific improvements you can make use of? Is ROMised data and code faster or slower than that loaded into RAM, for example?
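The first two checklist questions can be sketched in code. This is a rough illustration under stated assumptions: the names `working_set_bytes`, `fits_in_cache`, `time_it` and the 32 KiB cache figure are hypothetical, and the real cache size has to be looked up for each target. `std::clock` reports processor time rather than cycle counts; true cycle counts usually need hardware counters read through a profiler.

```cpp
#include <chrono>
#include <cstddef>
#include <ctime>

// Rough working-set estimate: bytes the hot loop touches repeatedly.
constexpr std::size_t working_set_bytes(std::size_t elements,
                                        std::size_t bytes_per_element) {
    return elements * bytes_per_element;
}

// Assumed cache capacity for illustration only; check the target's data sheet.
constexpr std::size_t kAssumedCacheBytes = 32 * 1024;

constexpr bool fits_in_cache(std::size_t bytes,
                             std::size_t cache_bytes = kAssumedCacheBytes) {
    return bytes <= cache_bytes;
}

struct Timing {
    double cpu_seconds;   // processor time charged to this process
    double wall_seconds;  // elapsed real time
};

// Runs `work` once and reports both processor time and elapsed time.
// A large gap between the two suggests the code was waiting (I/O,
// scheduling, paging) rather than computing.
template <typename F>
Timing time_it(F&& work) {
    const std::clock_t cpu_start = std::clock();
    const auto wall_start = std::chrono::steady_clock::now();
    work();
    const auto wall_end = std::chrono::steady_clock::now();
    const std::clock_t cpu_end = std::clock();
    return {static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC,
            std::chrono::duration<double>(wall_end - wall_start).count()};
}
```

For example, a table of 1000 doubles gives `working_set_bytes(1000, sizeof(double))`, which is 8000 bytes where `double` is 8 bytes wide, and that fits within the assumed 32 KiB figure. A single run of `time_it` is noisy, so repeated runs are advisable before drawing conclusions.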