Slow and Steady: Measuring and Tuning Multicore Interference

If you have a question about this talk, please contact John Wickerson.

Practice Talk for RTAS

Now ubiquitous, multicore processors provide replicated compute cores that allow independent programs to run in parallel. However, shared resources, such as last-level caches, can cause otherwise-independent programs to interfere with one another, leading to significant and unpredictable effects on their execution time. Indeed, prior work has shown that specially crafted enemy programs can cause software systems of interest to experience orders-of-magnitude slowdowns when the two are run in parallel on a multicore processor. This undermines the suitability of these processors for tasks that have real-time constraints.
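
For illustration, the following is a minimal sketch of a generic enemy program of this kind: it repeatedly walks a buffer assumed to be larger than the last-level cache, continually evicting a co-running victim's cache lines. The buffer size, stride, and access pattern here are assumptions made for the sketch, not the tuned enemy programs discussed in the talk.

/*
 * Illustrative cache-thrashing "enemy" sketch (assumptions: 16 MiB buffer
 * exceeds the last-level cache; 64-byte cache lines). Runs until killed.
 */
#include <stdlib.h>
#include <stdint.h>

#define BUF_BYTES  (16u * 1024u * 1024u)  /* assumed larger than the LLC */
#define LINE_BYTES 64u                    /* typical cache-line size */

int main(void)
{
    volatile uint8_t *buf = malloc(BUF_BYTES);
    if (!buf)
        return 1;

    /* Touch one byte per cache line, forever, to keep the shared LLC churning
       and evict lines belonging to a victim running on another core. */
    for (;;) {
        for (size_t i = 0; i < BUF_BYTES; i += LINE_BYTES)
            buf[i]++;
    }
}

The talk's actual enemy programs are specialised per architecture; the sketch only shows how a co-running process can keep a shared last-level cache under constant pressure.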

In this work, we explore the design and evaluation of techniques for empirically testing interference using enemy programs, with an eye towards reliability (how reproducible the interference results are) and portability (how interference testing can be effective across chips). We first show that different methods of measurement yield significantly different magnitudes of, and variation in, observed interference effects when applied to an enemy process that was shown to be particularly effective in prior work. We propose a method of measurement based on percentiles and confidence intervals, and show that it provides both competitive and reproducible observations. The reliability of our measurements allows us to explore auto-tuning, where enemy programs are further specialised per architecture. We evaluate three different tuning approaches (random search, simulated annealing, and Bayesian optimisation) on five different multicore chips, spanning x86 and ARM architectures. To show that our tuned enemy programs generalise to applications, we evaluate the slowdowns caused by our approach on the AutoBench and CoreMark benchmark suites. We observe statistically larger slowdowns compared to those from prior work in 35 out of 105 benchmark/board combinations, and our method achieves a slowdown factor increase of 3.8x compared with prior work in the best case.
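
As a rough illustration of the percentile-and-confidence-interval style of measurement described above, the sketch below computes the 90th percentile of a set of hypothetical victim execution times and a bootstrap 95% confidence interval for that percentile. The sample values, percentile level, and resample count are assumptions for the sketch; it is not the paper's exact procedure.

/*
 * Hedged sketch: report a high percentile of repeated execution-time
 * measurements together with a bootstrap confidence interval.
 */
#include <stdio.h>
#include <stdlib.h>

enum { NSAMP = 8, RESAMPLES = 1000 };

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Approximate q-th percentile (0 <= q <= 1) of n sorted samples. */
static double percentile(const double *sorted, int n, double q)
{
    return sorted[(int)(q * (n - 1))];
}

int main(void)
{
    /* Hypothetical victim execution times (seconds), measured while an
       enemy program runs on a sibling core. */
    double samples[NSAMP] = { 1.02, 1.05, 1.10, 1.31, 1.07, 1.50, 1.04, 1.22 };
    double boot[RESAMPLES];

    qsort(samples, NSAMP, sizeof samples[0], cmp_double);
    printf("90th percentile: %.3f s\n", percentile(samples, NSAMP, 0.90));

    /* Bootstrap: resample with replacement, recompute the percentile, and
       take the 2.5th/97.5th percentiles of those estimates as a 95% CI. */
    srand(42);
    for (int r = 0; r < RESAMPLES; r++) {
        double tmp[NSAMP];
        for (int i = 0; i < NSAMP; i++)
            tmp[i] = samples[rand() % NSAMP];
        qsort(tmp, NSAMP, sizeof tmp[0], cmp_double);
        boot[r] = percentile(tmp, NSAMP, 0.90);
    }
    qsort(boot, RESAMPLES, sizeof boot[0], cmp_double);
    printf("95%% CI: [%.3f, %.3f] s\n",
           boot[(int)(0.025 * (RESAMPLES - 1))],
           boot[(int)(0.975 * (RESAMPLES - 1))]);
    return 0;
}

Reporting a high percentile with a confidence interval, rather than a mean, is one plausible way to make interference observations sensitive to the tail behaviour that matters for real-time tasks while keeping them reproducible across repeated runs.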

Ultimately, we anticipate that our approach will be valuable for 'first pass' evaluation when investigating which multicore processors are suitable for real-time tasks.

This talk is part of the CAS Talks series.
