[Barrelfish-users] Intel SCC latency measurements for MPB operations

Wed Mar 2 13:30:52 CET 2011

Konstantin,

To the best of my knowledge we do not have any SW test routines to measure these latencies. You may want to ask in the MARC forum (http://communities.intel.com/community/marc), though.

I consider the numbers highly trustworthy because we obtained them by simulating the logic we built. If software is used to derive these results, one has to take uncertainties in program execution into account. Even in a BareMetal environment and with this simple in-order core, I doubt that you can measure times with single-clock-cycle precision, because there is jitter between pairs of RDTSC instructions if there are outstanding memory operations in the pipeline.
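Because of that jitter, a software measurement has to be repeated many times and summarized rather than read from a single RDTSC pair. A minimal C sketch of one common way to do that, assuming you have already collected raw RDTSC deltas around an MPB access ("timed") and around an empty region ("empty", i.e. the timer overhead): take the median of each and subtract. The function names are hypothetical, the sample collection itself is not shown, and this only damps jitter; it will not recover the single-cycle precision of the simulated numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for uint64_t cycle counts. */
static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Median of n raw RDTSC deltas. Sorts a private copy so the caller's array
 * is untouched; for even n the lower of the two middle values is returned. */
uint64_t median_cycles(const uint64_t *samples, size_t n) {
    assert(n > 0);
    uint64_t *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        abort();
    memcpy(tmp, samples, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_u64);
    uint64_t m = tmp[(n - 1) / 2];
    free(tmp);
    return m;
}

/* Estimated access latency in cycles: median of the timed runs minus the
 * median of the empty (overhead-only) runs, clamped at zero. */
uint64_t estimate_latency(const uint64_t *timed, size_t nt,
                          const uint64_t *empty, size_t ne) {
    uint64_t t = median_cycles(timed, nt);
    uint64_t o = median_cycles(empty, ne);
    return t > o ? t - o : 0;
}
```

The median is used instead of the mean so that occasional outliers (interrupts, outstanding memory operations) do not drag the estimate upward; the minimum is another common choice.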

Regarding your latency-related questions, please note that all times in the latency table are measured from the output of the core, i.e. they imply an L1 miss. For memory accesses with the MPBT attribute bit set, the L2 cache is transparent, and as far as I can see in the implementation it does not matter whether the L2 is enabled or not. So the times for your red and orange scenarios are the ones from the table, i.e. either 15 core or 45 core + 8 mesh clock cycles. I never thought about the impact on access latencies if either paging in the MMU or caching in L1 was disabled. Any such time savings occur inside the P54C core, i.e. they do not affect the latencies listed in the table.
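Since the table mixes core and mesh clock cycles, comparing it against a software measurement requires converting to a common unit. A small C sketch of that arithmetic, assuming an example configuration of a 533 MHz core clock and an 800 MHz mesh clock (these frequencies are an assumption; substitute whatever your SCC is actually booted with) and assuming the time-stamp counter read by RDTSC advances once per core clock. The function names are hypothetical.

```c
#include <assert.h>

/* Example clock frequencies in MHz (assumed, not read from the hardware). */
#define CORE_MHZ 533.0
#define MESH_MHZ 800.0

/* Latency in nanoseconds of an access costing core_cycles at the core clock
 * plus mesh_cycles at the mesh clock. cycles / MHz gives microseconds. */
double latency_ns(double core_cycles, double mesh_cycles,
                  double core_mhz, double mesh_mhz) {
    assert(core_mhz > 0.0 && mesh_mhz > 0.0);
    return 1000.0 * (core_cycles / core_mhz + mesh_cycles / mesh_mhz);
}

/* The same latency expressed in core clock cycles, i.e. what a core-clocked
 * time-stamp counter would ideally report, before any measurement overhead. */
double latency_core_cycles(double core_cycles, double mesh_cycles,
                           double core_mhz, double mesh_mhz) {
    assert(core_mhz > 0.0 && mesh_mhz > 0.0);
    return core_cycles + mesh_cycles * core_mhz / mesh_mhz;
}
```

Under these assumed frequencies, the 45 core + 8 mesh case works out to roughly 94.4 ns, or about 50.3 core cycles (8 mesh cycles at 800 MHz correspond to 5.33 core cycles at 533 MHz), and the 15-core-cycle case to about 28.1 ns.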

Best regards,
Werner

From: Konstantin Zertsekel [mailto:zertsekel at gmail.com]
Sent: Wednesday, March 02, 2011 9:43 AM
To: Haas, Werner
Cc: barrelfish-users at lists.inf.ethz.ch; Dan Tsafrir; Roei; Ido Shamay; Avi Mendelson; Prof. Assaf Schuster
Subject: Re: [Barrelfish-users] Intel SCC latency measurements for MPB operations

Werner, thanks for the answer.
Assuming we won't use bypass mode to access the local MPB, is there any software that tests the latency of accessing the local MPB? Our first-step goal is to measure the latency in this simple case and confirm that it is 45 core clocks + 8 mesh clocks, as stated in the graph, or at least that it can be measured in clock cycles rather than microseconds.
Could you please comment on this picture: [https://docs.google.com/drawings/edit?id=1X-U10YjKvFQ22sdsKcNFnpHKVLyhR8_6YhJv6tiySUo&hl=en]? Has anyone measured the latency of those paths?
Thanks again, KostaZ.
On Tue, Mar 1, 2011 at 5:53 PM, Haas, Werner <werner.haas at intel.com> wrote:
Konstantin,

I work at Intel Labs, so let me try answering the RCCE-related part: the latency table reflects the numbers from looking at the actual hardware, i.e. without taking software operation into account. The RCCE round-trip times, however, were measured by running an actual application, i.e. they reflect the efficiency of one particular communication algorithm rather than hardware properties. I do not know the precise number, but several MPB accesses are actually involved in passing data via RCCE.

Please also note that the bypass mode should _not_ be used, as we have a hardware bug that can lead to reading incorrect data. Unfortunately, this greatly reduces the benefit of using the on-die SRAM vs. off-die DDR3.

Best regards,
Werner
