<div dir="ltr">Werner,<br>Thanks a bunch for the deep explanation.<br>Kosta.<br><br><div class="gmail_quote">On Wed, Mar 2, 2011 at 2:30 PM, Haas, Werner <span dir="ltr">&lt;<a href="mailto:werner.haas@intel.com">werner.haas@intel.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

<div link="blue" vlink="purple" lang="DE">

<div>

<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">Konstantin,</span></p>

<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>

<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);" lang="EN-US">To the best of my knowledge we do not have any SW test routines to

measure these latencies. You may want to ask in the MARC forum (<a href="http://communities.intel.com/community/marc" target="_blank">http://communities.intel.com/community/marc</a>),

though.</span></p>

<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);" lang="EN-US"> </span></p>

<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);" lang="EN-US">I consider the numbers highly trustworthy because we obtained them by

simulating the logic we built. If software is used to derive these results, one

has to take uncertainties in program execution into account. Even in a

BareMetal environment and with this simple in-order core I doubt that you can

measure times with single clock cycle precision because there is jitter among

pairs of RDTSC if there are outstanding memory operations in the pipeline.</span></p>
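<p class="MsoNormal">The jitter concern above can be illustrated with a minimal sketch. It assumes Python's <code>time.perf_counter_ns</code> as a stand-in for a pair of RDTSC reads, and the helper names <code>back_to_back_deltas</code> and <code>summarize</code> are hypothetical. It is not a routine from Intel or RCCE; it only shows how the fixed overhead and the spread of back-to-back timer reads might be estimated before trusting a software measurement.</p>
<pre>
```python
import statistics
import time


def back_to_back_deltas(samples=10000, read_timer=time.perf_counter_ns):
    """Read the timer twice in a row and record the difference each time."""
    deltas = []
    for _ in range(samples):
        start = read_timer()
        stop = read_timer()
        deltas.append(stop - start)
    return deltas


def summarize(deltas):
    """Return the floor (fixed overhead), the median, and the spread (jitter)."""
    floor = min(deltas)
    return {
        "overhead": floor,
        "median": statistics.median(deltas),
        "jitter": max(deltas) - floor,
    }


if __name__ == "__main__":
    # Stand-in timer only; on the SCC the reads would be RDTSC pairs.
    print(summarize(back_to_back_deltas()))
```
</pre>
<p class="MsoNormal">If the reported jitter is comparable to the latency being measured, a single-cycle result from software is unlikely to be meaningful.</p>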

<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);" lang="EN-US"> </span></p>

<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);" lang="EN-US">Regarding your latency-related questions, please note that all

times in the latency table are measured from the output of the core, i.e. this

implies an L1 miss. For memory accesses with the MPBT attribute bit set, the L2

cache is transparent and, as far as I can see in the implementation, it does not

matter whether the L2 is enabled or not. So the times for your red and orange

scenarios are the ones from the table, i.e. either 15 core or 45 core + 8 mesh clock

cycles. I never thought about the impact on access latencies if either paging in

the MMU or caching in L1 was disabled. Such time savings occur inside the P54C

core, i.e. they do not affect the latencies listed in the table.</span></p>
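<p class="MsoNormal">Because the table mixes core and mesh clock cycles, a small conversion helper can make the two figures comparable in wall-clock time. This is a sketch only: the function name <code>cycles_to_ns</code> is hypothetical, and the 533 MHz core and 800 MHz mesh frequencies are assumptions for illustration, not values stated in this thread.</p>
<pre>
```python
def cycles_to_ns(core_cycles, mesh_cycles, core_mhz, mesh_mhz):
    """Convert a latency given in core and mesh clock cycles to nanoseconds."""
    # One cycle at f MHz lasts 1000 / f nanoseconds.
    return core_cycles * 1000.0 / core_mhz + mesh_cycles * 1000.0 / mesh_mhz


# Assumed, illustrative clock settings; substitute the ones your system runs at.
CORE_MHZ = 533.0
MESH_MHZ = 800.0

if __name__ == "__main__":
    # The two table entries quoted above: 15 core cycles, or 45 core + 8 mesh cycles.
    print(cycles_to_ns(15, 0, CORE_MHZ, MESH_MHZ))
    print(cycles_to_ns(45, 8, CORE_MHZ, MESH_MHZ))
```
</pre>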

<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);" lang="EN-US"> </span></p>

<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);" lang="EN-US">Best regards,</span></p>

<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);" lang="EN-US">Werner</span></p>

<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);" lang="EN-US"> </span></p>

<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);" lang="EN-US"> </span></p>

<p class="MsoNormal"><span lang="EN-US"> </span></p>

<div style="border-width: medium medium medium 1.5pt; border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color blue; padding: 0cm 0cm 0cm 4pt;">

<div>

<div style="border-width: 1pt medium medium; border-style: solid none none; border-color: rgb(181, 196, 223) -moz-use-text-color -moz-use-text-color; padding: 3pt 0cm 0cm;">

<p class="MsoNormal"><b><span style="font-size: 10pt;" lang="EN-US">From:</span></b><span style="font-size: 10pt;" lang="EN-US"> Konstantin Zertsekel

[mailto:<a href="mailto:zertsekel@gmail.com" target="_blank">zertsekel@gmail.com</a>] <br>

<b>Sent:</b> Wednesday, March 02, 2011 9:43 AM<br>

<b>To:</b> Haas, Werner<br>

<b>Cc:</b> <a href="mailto:barrelfish-users@lists.inf.ethz.ch" target="_blank">barrelfish-users@lists.inf.ethz.ch</a>; Dan Tsafrir; Roei; Ido Shamay;

Avi Mendelson; Prof. Assaf Schuster<br>

<b>Subject:</b> Re: [Barrelfish-users] Intel SCC latency measurements for MPB

operations</span></p>

</div>

</div><div><div></div><div class="h5">

<p class="MsoNormal"> </p>

<div>

<p class="MsoNormal" style="margin-bottom: 12pt;">Werner, thanks for the answer.<br>

Assuming we won&#39;t use bypass mode to access the local MPB, is there any

software that tests the latency of accessing the local MPB? Our first goal

is to measure the latency in this simple case and confirm that it is 45 core clock

+ 8 mesh clock cycles, as stated in the graph, or at least that it can be measured in

clock cycles rather than microseconds.<br>

Could you please comment on this picture: [<a href="https://docs.google.com/drawings/edit?id=1X-U10YjKvFQ22sdsKcNFnpHKVLyhR8_6YhJv6tiySUo&amp;hl=en" target="_blank">https://docs.google.com/drawings/edit?id=1X-U10YjKvFQ22sdsKcNFnpHKVLyhR8_6YhJv6tiySUo&amp;hl=en</a>]?

Has anyone measured the latency of those paths?<br>

Thanks again, KostaZ.</p>

<div>

<p class="MsoNormal">On Tue, Mar 1, 2011 at 5:53 PM, Haas, Werner &lt;<a href="mailto:werner.haas@intel.com" target="_blank">werner.haas@intel.com</a>&gt; wrote:</p>

<div>

<div>

<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);" lang="EN-US">Konstantin,</span></p>

<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);" lang="EN-US"> </span></p>

<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);" lang="EN-US">I work at Intel Labs so let

me try answering the RCCE-related part: The latency table reflects the numbers

from looking at the actual hardware, i.e. without taking software operation

into account. The RCCE round-trip times, however, were measured by running an

actual application, i.e. they reflect the efficiency of one particular

communication algorithm rather than hardware properties. I do not know the precise

number but there are actually several MPB accesses involved in passing data via

RCCE. </span></p>

<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);" lang="EN-US"> </span></p>

<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);" lang="EN-US">Please also note that the

bypass mode should <i>not</i> be used, as we have a hardware bug that can

lead to reading incorrect data. Unfortunately this greatly reduces the benefit

of using the on-die SRAM vs. off-die DDR3.</span></p>

<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);" lang="EN-US"> </span></p>

<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);" lang="EN-US">Best regards,</span></p>

<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);" lang="EN-US">Werner</span></p>

<p class="MsoNormal"> </p>

</div>

</div>

</div>

<p class="MsoNormal"> </p>

</div>

</div></div></div>

</div>

</div>

</blockquote></div><br></div>