<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:Consolas;
        panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";
        color:black;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
pre
        {mso-style-priority:99;
        mso-style-link:"HTML Preformatted Char";
        margin:0cm;
        margin-bottom:.0001pt;
        font-size:10.0pt;
        font-family:"Courier New";
        color:black;}
span.HTMLPreformattedChar
        {mso-style-name:"HTML Preformatted Char";
        mso-style-priority:99;
        mso-style-link:"HTML Preformatted";
        font-family:"Consolas","serif";
        color:black;}
span.EmailStyle19
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style>
</head>
<body bgcolor="white" lang="EN-AU" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">Hi Matt,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">Answers inline below…<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">Cheers,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">Andrew<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:10.0pt;font-family:'Tahoma','sans-serif';color:windowtext">From:</span></b><span lang="EN-US" style="font-size:10.0pt;font-family:'Tahoma','sans-serif';color:windowtext"> Matt Johnson [mailto:johnso87@crhc.illinois.edu]
<br>
<b>Sent:</b> Wednesday, 14 March 2012 22:45<br>
<b>To:</b> Baumann Andrew<br>
<b>Cc:</b> tharris@microsoft.com; barrelfish-users@lists.inf.ethz.ch<br>
<b>Subject:</b> Re: [Barrelfish-users] Porting Barrelfish to a new architecture?<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks Andrew and Tim! That's very helpful. A couple more questions below.<br>
-Matt<br>
<br>
On 03/14/2012 05:14 PM, Tim Harris (RESEARCH) wrote: <o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre>Essentially we'd like a remote write operation that pushes some data close to a <o:p></o:p></pre>
<pre>remote core, and an asynchronous notification that causes a lightweight <o:p></o:p></pre>
<pre>exception/control transfer on the target. Previous versions of <o:p></o:p></pre>
<pre>hardware-supported message passing have provided some version <o:p></o:p></pre>
<pre>of the former, but neglected the latter.<o:p></o:p></pre>
</blockquote>
<pre>BTW, we have a fairly detailed design for this now, plus implementation in M5. <o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>I'll try to bring it up to date with the current tree, so it doesn't keep bit-rotting.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Hoping to get a write-up ready to submit to ASPLOS 2013 -- either just for use with message passing, or for accelerating a work-stealing system too (along the lines of the Stanford work from a few ASPLOS ago).<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre> --Tim<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre><o:p> </o:p></pre>
<p class="MsoNormal"><br>
On 03/14/2012 05:06 PM, Baumann Andrew wrote: <o:p></o:p></p>
<pre>Hi Matt,<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>I've just been skimming over the Rigel and Cohesion papers -- interesting stuff!<o:p></o:p></pre>
<p class="MsoNormal">Thanks! And thanks for the pointer to the HotOS paper; my groupmates and I often make analogous arguments about different layers of the stack (architecture and software, architecture and compiler, memory model and programming model), so
it's good to know there are kindred spirits out there who also think researchers should do twice as much work!<br>
<br>
<o:p></o:p></p>
<pre><o:p> </o:p></pre>
<pre><o:p> </o:p></pre>
<pre>To answer your questions:<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre>* What primitives does Barrelfish *require* of the underlying CPU <o:p></o:p></pre>
<pre>(atomic instructions,<o:p></o:p></pre>
<pre> interrupts, MMU features, etc.)?<o:p></o:p></pre>
</blockquote>
<pre><o:p> </o:p></pre>
<pre>We don't require atomic instructions. <o:p></o:p></pre>
<p class="MsoNormal">I'm guessing Barrelfish still needs some kind of synchronization primitive (e.g., to construct a lock), right? Some memory consistency models allow you to build some kind of synchronization primitives with normal loads and stores -- are
you saying that whatever you can build under the x86 and ARM consistency models is sufficient? Do you need any kind of memory barriers? For what it's worth, I've found Concurrency Kit's platform independence layer to provide a great language for talking
about exactly what a platform provides and what a piece of software requires when it comes to these kinds of things. Unfortunately, the doc page doesn't tell you what the acronyms stand for, and the implementation files are too laden with preprocessor metaprogramming
to be easily parsed :( See <a href="http://concurrencykit.org/doc/appendixA.html">http://concurrencykit.org/doc/appendixA.html</a> and
<a href="http://concurrencykit.org/cgit/cgit.cgi/ck/tree/include/gcc/x86_64/ck_pr.h">http://concurrencykit.org/cgit/cgit.cgi/ck/tree/include/gcc/x86_64/ck_pr.h</a>.<br>
<br>
<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">Actually the base Barrelfish system (by design) really does not require any synchronisation primitives. We use spinlocks at user level for applications that
span a single address space across multiple cores. In the current codebase these spinlocks are also taken at user level in the single-core case, but that’s a bug: the optimisation that would elide them is missing. Other synchronisation takes place via messages.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<pre>We do require some form of interrupts, for pre-emptive multitasking.<o:p></o:p></pre>
<p class="MsoNormal">What sources of interrupt must an architecture support? Timer? Off-chip? Another core on the same chip? Some combination?<span style="color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">Preferably a per-core countdown timer for scheduling, like the x86 APIC timer. On Beehive we used a kludge based on messages sent from a central “timer” core.</span><br>
<br>
<span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D"><o:p></o:p></span></p>
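<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">To make the two timer options concrete, here is a toy Python model. It is not Barrelfish code, and <code>Scheduler</code>, <code>on_tick</code> and <code>central_timer_broadcast</code> are hypothetical names introduced only for illustration. Each core counts down a timeslice and preempts when it expires; in this sketch the only difference between a per-core timer and the Beehive-style kludge is where the ticks come from.</span></p>
<pre>
```python
from collections import deque


class Scheduler:
    """Round-robin scheduler for one core, driven by timer ticks."""

    def __init__(self, tasks, timeslice):
        self.runq = deque(tasks)      # runq[0] is the task currently running
        self.timeslice = timeslice    # ticks a task may run before preemption
        self.remaining = timeslice

    def on_tick(self):
        """One timer tick: count down and preempt when the slice expires."""
        self.remaining -= 1
        if self.remaining == 0:
            self.runq.rotate(-1)      # move the running task to the back
            self.remaining = self.timeslice
        return self.runq[0]           # task that runs after this tick


def central_timer_broadcast(schedulers):
    """Beehive-style kludge: one 'timer' core sends a tick message to every core."""
    return [s.on_tick() for s in schedulers]
```
</pre>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">With a per-core countdown timer, each core would call its own <code>on_tick()</code> from its local timer interrupt; with the central variant, every tick costs one message per core, which is why it is only a stopgap.</span></p>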
<pre> We have run on a processor (Beehive) that lacked protection or address translation, but it was quite painful in many respects -- if you had some form of translation <o:p></o:p></pre>
<p class="MsoNormal">What is required here? The ability to create many totally disjoint address spaces? The ability to share pages arbitrarily between address spaces as in traditional virtual memory?<br>
<br>
<span style="color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">Each domain runs in its own address space, and most binaries are statically linked/compiled for the same addresses. You might get by without shared pages… at
first thought, the parts of the system that use them are the x86 message-passing mechanism, bulk transport (e.g. network stack/driver communication), and some similar mechanisms. The kernel relies on access to user pages, but I’m assuming that doesn’t count.</span><br>
<br>
<span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D"><o:p></o:p></span></p>
<pre>and a protected kernel mode<o:p></o:p></pre>
<p class="MsoNormal">We don't have one yet either, but probably should. Out of curiosity, what did you do about this on Beehive? (At a high level; I understand the memory is probably too painful to relive fully :)<br>
<span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">There was still a “kernel” on Beehive; it just happened to be in the same address space and protection domain as the applications (i.e. we ignored the problem).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<pre>, it would make a port much easier.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre>* What additional primitives, if any, are desirable for existing versions of<o:p></o:p></pre>
<pre> Barrelfish to run well?<o:p></o:p></pre>
</blockquote>
<pre><o:p> </o:p></pre>
<pre>Efficient inter-core messaging would be great! (more on this below)<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre>* What further primitives would Barrelfish developers like to see on <o:p></o:p></pre>
<pre>future hardware<o:p></o:p></pre>
<pre> platforms?<o:p></o:p></pre>
</blockquote>
<pre><o:p> </o:p></pre>
<pre>Message-passing is the big one. We sketched out some ideas in this HotOS paper:<o:p></o:p></pre>
<pre><a href="http://www.barrelfish.org/gap_hotos11.pdf">http://www.barrelfish.org/gap_hotos11.pdf</a><o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Essentially we'd like a remote write operation that pushes some data close to a remote core, and an asynchronous notification that causes a lightweight exception/control transfer on the target. Previous versions of hardware-supported message passing have provided some version of the former, but neglected the latter.<o:p></o:p></pre>
<p class="MsoNormal">I agree, that sounds extremely useful; implicit message passing by piggy-backing on the coherence hardware isn't a great solution. The closest thing we have is a broadcast update instruction that broadcasts a cache line from the L2 cache
of one cluster of cores into the L2 caches of all other cores. We only needed broadcast when we initially designed the architecture because we were focused on a task-based programming model where arbitrary point-to-point communication simply doesn't occur.
For being able to run more general classes of program, this generalization seems very useful. Being able to pass an IPI along with the data seems to give a similarly large benefit. For the use cases you have in mind, is it important that the interrupt be
serviced immediately, or would it be sufficient to wait until the target thread finishes what it's doing and does the moral equivalent of calling yield()? If these IPIs are frequent, having the servicing be cooperative rather than preemptive may buy you efficiency
by not having to spill the thread context to memory (maybe when the target yield()s, everything in its context is known to be dead).<br>
<br>
<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">Tim and Stefan should really answer this… they have gone into much more depth than I on the notification mechanisms.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">I think it’s useful to decouple the data transfer from the notification (or at least make the notification optional). I don’t think it’s always necessary to
deliver the notification immediately, and coalescing multiple undelivered notifications makes sense, but I don’t see the value in a notification that requires an explicit action on the receiver side… isn’t that just polling? Also, if I may be so rude, the
decision of what to do with thread context and whether it must be spilled is a software problem :) You can imagine a fault handler that knows enough about the actual program being run to decide whether to spill context and switch to a handler, or just set
a flag that a notification arrived and queue it for later processing.<o:p></o:p></span></p>
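<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">A minimal sketch of that receiver-side policy, written as a toy Python model rather than anything from the Barrelfish tree. The names <code>Core</code>, <code>deliver</code>, <code>drain</code> and the <code>can_preempt</code> flag are hypothetical and exist only for illustration. The handler either switches to the notification handler straight away or records a pending flag, and repeated undelivered notifications on the same channel coalesce into one.</span></p>
<pre>
```python
class Core:
    """Toy model of one core receiving asynchronous notifications."""

    def __init__(self):
        self.can_preempt = True   # software-chosen: may the running code be interrupted now?
        self.pending = set()      # coalesced, not-yet-handled channel ids
        self.handled = []         # order in which the handler actually ran
        self.spills = 0           # how many times context was (notionally) saved

    def deliver(self, channel):
        """Called when a notification for `channel` arrives at this core."""
        if self.can_preempt:
            self.spills += 1              # software decided to save context and switch
            self.handled.append(channel)
        else:
            self.pending.add(channel)     # duplicates coalesce into a single entry

    def drain(self):
        """Run the deferred handlers once the current work reaches a safe point."""
        for channel in sorted(self.pending):
            self.handled.append(channel)
        self.pending.clear()
```
</pre>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">In this model, delivering to channel 2 twice while <code>can_preempt</code> is false leaves a single pending entry, and no context is spilled until software chooses to take the notification. The choice lives entirely in <code>deliver</code>, which is the point: the hardware only needs to raise the notification.</span></p>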
<pre><o:p> </o:p></pre>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre>* How would one go about porting Barrelfish to another architecture?<o:p></o:p></pre>
<pre> - Is CPU support code segregated into one part of the Barrelfish <o:p></o:p></pre>
<pre>source tree?<o:p></o:p></pre>
<pre> If not, what general patterns should I grep for?<o:p></o:p></pre>
<pre> I see things like .dev files for amd64 and arm, but don't know what <o:p></o:p></pre>
<pre>else is required.<o:p></o:p></pre>
</blockquote>
<pre><o:p> </o:p></pre>
<pre>Our tree is not as cleanly structured as it should be, but in general architecture-specific code lives in a directory named 'arch', e.g. kernel/arch/x86_64, lib/barrelfish/arch/x86_64, etc. Aside from this, the main things you would need are suitable drivers, and the backends for message-passing.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre>* This is likely a dumb question, but does my architecture need ghc <o:p></o:p></pre>
<pre>codegen support<o:p></o:p></pre>
<pre> to run Barrelfish?<o:p></o:p></pre>
</blockquote>
<pre><o:p> </o:p></pre>
<pre>No. GHC is used to build the tools, not the OS or apps.<o:p></o:p></pre>
<p class="MsoNormal">That's what I figured, just thought I'd check :)<br>
<br>
<o:p></o:p></p>
<pre> We definitely compile with GCC, and have at various times compiled with LLVM and ICC (so it shouldn't be hard to get these working if needed).<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre>* At a high level, is this worth doing? There are two parts to that:<o:p></o:p></pre>
<pre> - If the port is successful, is there any hope of upstreaming the <o:p></o:p></pre>
<pre>changes so<o:p></o:p></pre>
<pre> they can keep pace with upstream API changes? (Remembering that <o:p></o:p></pre>
<pre>the platform<o:p></o:p></pre>
<pre> in question consists of a simulator, and possibly an FPGA in the <o:p></o:p></pre>
<pre>foreseeable future)<o:p></o:p></pre>
</blockquote>
<pre><o:p> </o:p></pre>
<pre>I can't give you a definitive answer (Mothy is perhaps better placed to do so), but there have been outside contributions to the tree. I think the main issue is one of maintenance bandwidth -- whether there are folks around who care about the port and are using/maintaining it. We recently removed the Beehive port from the tree, because it was unused, and we didn't have the cycles to keep it from bit-rotting.<o:p></o:p></pre>
<p class="MsoNormal">That's understandable; we'll cross that bridge when we come to it, but it's helpful to know that you're open to the idea in the abstract.<br>
<br>
<o:p></o:p></p>
<pre><o:p> </o:p></pre>
<pre><o:p> </o:p></pre>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre> - If not, is the upstream API or organization likely to change in <o:p></o:p></pre>
<pre>such a way that the<o:p></o:p></pre>
<pre> port will quickly become non-functional?<o:p></o:p></pre>
</blockquote>
<pre><o:p> </o:p></pre>
<pre>Well, it's a research project... and there is plenty of churn in the tree. Obviously with more ports and a better separation between architecture/platform-specific code this would be less of an issue, but we're not really at that state yet.<o:p></o:p></pre>
<p class="MsoNormal">Also understandable; all the more reason to make the support commitment so that the port can be upstreamed.<br>
<br>
<o:p></o:p></p>
<pre><o:p> </o:p></pre>
<pre><o:p> </o:p></pre>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre> - Would it be a Herculean effort, or something a student or two could <o:p></o:p></pre>
<pre>do in O(weeks)?<o:p></o:p></pre>
</blockquote>
<pre><o:p> </o:p></pre>
<pre>It largely depends on how weird your platform is :) If it's not too strange, then I would say that a competent (i.e. knows how to debug C code running on the metal without a symbolic debugger) student could do it in a matter of weeks for the base system. Our early ports took more time, partly because they had to introduce some of the architectural separation that was missing.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>If you're looking for reference code, the x86_64 port is the most complete, but the ARM port (done by Orion Hodson at MSR) is much cleaner.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Hope this helps,<o:p></o:p></pre>
<pre>Andrew<o:p></o:p></pre>
</div>
</body>
</html>