Scheduling weakly consistent C concurrency for reconfigurable hardware

If you have a question about this talk, please contact George A Constantinides.

Lock-free algorithms, in which threads synchronise not via coarse-grained mutual exclusion but via fine-grained atomic operations (atomics), have been shown empirically to be the fastest class of multi-threaded algorithms in the realm of conventional processors. This talk explores how these algorithms can be compiled from C to reconfigurable hardware via high-level synthesis (HLS).

We focus on framing the implementation of atomics as a scheduling problem, in which software instructions are assigned to hardware clock cycles. We first show that typical HLS scheduling constraints are insufficient to implement atomics, because they permit some instruction reorderings that, though sound in a single-threaded context, demonstrably cause erroneous results when synthesising multi-threaded programs. We then show that correct behaviour can be restored by imposing additional intra-thread constraints among the memory operations. In addition, we show that the pipelining of loops containing atomics can be supported by injecting further inter-iteration constraints.
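To illustrate the kind of reordering at issue, here is a minimal message-passing sketch in C11. It is not taken from the talk: the names `payload`, `ready`, `publish` and `try_consume` are hypothetical, and the two functions are assumed to run in different threads. Within `publish`, the two stores touch different addresses, so a scheduler that only tracks single-threaded dependences would see no reason to keep them in order.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state, for illustration only. */
static int payload;        /* ordinary (non-atomic) data */
static atomic_bool ready;  /* atomic flag; zero-initialised, i.e. false */

/* Writer thread: store the payload, then raise the flag. */
void publish(int value) {
    payload = value;             /* (1) plain store */
    atomic_store(&ready, true);  /* (2) SC atomic store: (1) must stay before it */
}

/* Reader thread: returns true and fills *out only if the flag was seen. */
bool try_consume(int *out) {
    if (!atomic_load(&ready))    /* (3) SC atomic load */
        return false;
    *out = payload;              /* (4) plain load: must stay after (3) */
    return true;
}
```

If (2) were scheduled in an earlier clock cycle than (1), or (4) earlier than (3), the reader could observe the flag as set and still read a stale payload. Extra intra-thread ordering constraints between such memory operations are what rule this out.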

We extend the LegUp HLS tool to support two memory models, making it capable of synthesising both sequentially-consistent (SC) and weakly-consistent (weak) atomics correctly, as defined by the 2011 revision of the C standard. Weak atomics necessitate fewer constraints than SC atomics, but suffice for many multi-threaded algorithms. We confirm, via automatic model-checking, that we correctly implement the semantics in accordance with the C standard. A case study on a circular buffer suggests that, on average, circuits synthesised from programs that schedule atomics correctly can be 6x faster than an existing lock-based implementation of atomics, that weak atomics can yield a further 1.3x speedup, and that pipelining can yield a further 1.3x speedup.
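For readers unfamiliar with weak atomics, the following is a rough sketch of a single-producer, single-consumer circular buffer written with C11 acquire/release operations. It is an assumption-laden illustration, not the code used in the case study: `ring_t`, `ring_init`, `ring_push`, `ring_pop` and `CAPACITY` are hypothetical names, and exactly one producer thread and one consumer thread are assumed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CAPACITY 8u  /* hypothetical size; one slot is always left empty */

typedef struct {
    int data[CAPACITY];
    atomic_size_t head;  /* next slot to read; written only by the consumer */
    atomic_size_t tail;  /* next slot to write; written only by the producer */
} ring_t;

void ring_init(ring_t *r) {
    atomic_init(&r->head, (size_t)0);
    atomic_init(&r->tail, (size_t)0);
}

/* Producer side: returns false if the buffer is full. */
bool ring_push(ring_t *r, int value) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t next = (tail + 1) % CAPACITY;
    if (next == atomic_load_explicit(&r->head, memory_order_acquire))
        return false;
    r->data[tail] = value;
    /* Release: the data write above becomes visible before the new tail. */
    atomic_store_explicit(&r->tail, next, memory_order_release);
    return true;
}

/* Consumer side: returns false if the buffer is empty. */
bool ring_pop(ring_t *r, int *out) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    if (head == atomic_load_explicit(&r->tail, memory_order_acquire))
        return false;
    *out = r->data[head];
    /* Release: the data read above completes before the slot is freed. */
    atomic_store_explicit(&r->head, (head + 1) % CAPACITY, memory_order_release);
    return true;
}
```

Replacing each `_explicit` call with plain `atomic_load` or `atomic_store` gives the SC variant of the same buffer, since those default to `memory_order_seq_cst`. The weak version asks for less ordering, which is the property that lets a scheduler impose fewer constraints.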

This talk is part of the CAS Talks series.
