[Barrelfish-users] [Barrelfish] Intel SCC latency measurements for MPB operations

Tue Mar 1 16:59:52 CET 2011

Konstantin,

I'm not sure what communication stack your graphs are using (whether
this is bare-metal RCCE, RCCE over Linux TCP, etc.).  I'm also assuming
this isn't using Barrelfish (as we only released SCC support an hour
or two ago).

On Barrelfish we require a trap to kernel mode for inter-core messages,
which in practice dominates the time taken to access the MPB (once
you're in the kernel, we can transfer a cache line to another core's
on-tile MPB in a hundred clocks or so, as Intel advertises).  Also, due to
queuing issues, Barrelfish's interconnect driver for the SCC passes
message payloads in DDR3, and uses the MPB for the metadata.
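
To put the "hundred clocks or so" and the 5 microsecond RCCE figure on
the same scale, here is a back-of-the-envelope sketch.  The 533 MHz core
clock is an assumption (it is not stated anywhere in this thread, and the
SCC clocks are configurable), and the function names are made up for
illustration.

```python
# Rough unit conversion between clock cycles and microseconds.
# ASSUMPTION: a 533 MHz core clock; this value does not come from the
# thread, so substitute the frequency your SCC is actually configured for.
ASSUMED_CORE_HZ = 533e6


def cycles_to_us(cycles, hz=ASSUMED_CORE_HZ):
    """Convert a cycle count to microseconds at the given clock rate."""
    return cycles / hz * 1e6


def us_to_cycles(us, hz=ASSUMED_CORE_HZ):
    """Convert microseconds to a cycle count at the given clock rate."""
    return us * 1e-6 * hz


# ~100 clocks for one MPB access (hardware latency table).
hw_access_us = cycles_to_us(100)

# 5 microseconds minimum RCCE ping-pong latency (measured by an application).
rcce_cycles = us_to_cycles(5.0)

# How many 100-clock accesses would fit in one RCCE round trip.
gap_factor = rcce_cycles / 100

print(f"100 clocks        ~ {hw_access_us:.2f} us")
print(f"5 us round trip   ~ {rcce_cycles:.0f} clocks")
print(f"gap               ~ {gap_factor:.0f}x a single MPB access")
```

Under that assumed clock, 100 clocks is roughly 0.19 microseconds and
5 microseconds is roughly 2665 clocks, so the two figures differ by more
than an order of magnitude.  Software overhead (several MPB accesses per
message, or a kernel trap) has plenty of room to account for the gap.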

  -- Mothy

On 03/01/2011 04:53 PM, Haas, Werner wrote:
> Konstantin,
>
> I work at Intel Labs so let me try answering the RCCE-related part: The
> latency table reflects the numbers from looking at the actual hardware,
> i.e. without taking software operation into account. The RCCE round-trip
> times, however, were measured by running an actual application, i.e.
> they rather reflect the efficiency of one particular communication
> algorithm than hardware properties. I do not know the precise number but
> there are actually several MPB accesses involved in passing data via RCCE.
>
> Please also note that the bypass mode should _not_ be used as we have
> a hardware bug which can lead to reading incorrect data. Unfortunately
> this greatly reduces the benefit of using the on-die SRAM vs. off-die DDR3.
>
> Best regards,
>
> Werner
>
> *From:*Konstantin Zertsekel [mailto:zertsekel at gmail.com]
> *Sent:* Tuesday, March 01, 2011 4:32 PM
> *To:* barrelfish-users at lists.inf.ethz.ch
> *Cc:* Dan Tsafrir; Konstantin Zertsekel; Roei; Ido Shamay; Avi
> Mendelson; Prof. Assaf Schuster
> *Subject:* [Barrelfish-users] Intel SCC latency measurements for MPB
> operations
>
> Hi all,
> I am working on a project with the Intel SCC chip where communication
> latency is an important factor.
> Now, according to the RCCE inter-core Ping-Pong test, the minimum latency is
> 5 microseconds (see this graph:
> https://picasaweb.google.com/lh/photo/DP8vsfafDDHqcf_SZGzZMg?feat=directlink).
> But according to Intel's latency table for various memory accesses (see
> this table:
> https://picasaweb.google.com/lh/photo/j-m4PkXxumRCoCQy3jmAuQ?feat=directlink),
> the minimum latency (when accessing the local MPB with bypass) is
> measured in ~100 clocks, not microseconds.
> What is the reason for such a wide gap between the RCCE implementation and
> the hardware latency table?
> I guess memory access latency measurements of all kinds are must-know
> information for porting Barrelfish to SCC...
> Thanks, KostaZ.