Slide 7 of 16
EXAMPLE: QuantLib's correlated random-number generation for financial models: rearranging loops to reduce stride gave roughly a 20x to 40x speedup.
The working set
- Smaller is almost always better
- Cache misses go undetected by many profiling tools, yet they have a huge impact on performance
- Use unit or small stride in data structures
Mind your memory hierarchy
- Registers, instruction pipeline
- Level 1, 2, ... cache
- Main memory
- Secondary and remote storage (if available)
- Do you have good tools to measure the effect of changes (in CPU cycles vs real time)?
- Can you estimate working set size?
- Do you know anything about target hardware/OS and its interaction with your memory-access patterns?
- Will you have to compromise because you have more than one hardware target, for example?
- Modern languages present a more-or-less uniform memory model, but are there target-specific improvements you can make use of? Is ROMised code and data faster or slower than code and data loaded into RAM, for example?