[Barrelfish-users] Porting Barrelfish to a new architecture?

Thu Mar 15 06:45:26 CET 2012

Thanks Andrew and Tim!  That's very helpful.  A couple more questions below.
-Matt

On 03/14/2012 05:14 PM, Tim Harris (RESEARCH) wrote:
>> Essentially we'd like a remote write operation that pushes some data close to a
>> remote core, and an asynchronous notification that causes a lightweight
>> exception/control transfer on the target. Previous versions of
>> hardware-supported message passing have provided some version
>> of the former, but neglected the latter.
> BTW, we have a fairly detailed design for this now, plus implementation in M5.
>
> I'll try to bring it up to date with the current tree, so it doesn't keep bit-rotting.
>
> Hoping to get a write-up ready to submit to ASPLOS 2013 -- either just for use with message passing, or for accelerating a work-stealing system too (along the lines of the Stanford work from a few ASPLOS ago).
>
>   --Tim
>
>

On 03/14/2012 05:06 PM, Baumann Andrew wrote:
> Hi Matt,
>
> I've just been skimming over the Rigel and Cohesion papers -- interesting stuff!
Thanks!  And thanks for the pointer to the HotOS paper; my groupmates 
and I often make analogous arguments about different layers of the stack 
(architecture and software, architecture and compiler, memory model and 
programming model), so it's good to know there are kindred spirits out 
there who also think researchers should do twice as much work!
>
> To answer your questions:
>
>> * What primitives does Barrelfish *require* of the underlying CPU
>> (atomic instructions,
>>     interrupts, MMU features, etc.)?
> We don't require atomic instructions.
I'm guessing Barrelfish still needs some kind of synchronization 
primitive (e.g., to construct a lock), right?  Some memory consistency 
models allow you to build some kind of synchronization primitives with 
normal loads and stores -- are you saying that whatever you can build 
under the x86 and ARM consistency models is sufficient?  Do you need any 
kind of memory barriers?  For what it's worth, I've found Concurrency 
Kit's platform independence layer to provide a great language for 
talking about exactly what a platform provides and what a piece of 
software requires when it comes to these kinds of things.  
Unfortunately, the doc page doesn't tell you what the acronyms stand 
for, and the implementation files are too laden with preprocessor 
metaprogramming to be easily parsed :(
http://concurrencykit.org/doc/appendixA.html
http://concurrencykit.org/cgit/cgit.cgi/ck/tree/include/gcc/x86_64/ck_pr.h
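To make the question concrete, here is a minimal sketch (an illustration only, not Barrelfish or Concurrency Kit code) of Peterson's two-thread lock in C11, built from atomic loads and stores with no read-modify-write operation in the source. It assumes sequentially consistent atomics; on x86 that implies a store-to-load fence (or an equivalent instruction) after the stores in peterson_lock, so plain loads and stores alone would not be enough there. All names (struct peterson, peterson_demo, and so on) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Peterson's lock for exactly two threads (ids 0 and 1). */
struct peterson {
    atomic_bool want[2]; /* want[i] is true while thread i wants the lock */
    atomic_int turn;     /* id of the thread that goes first on a tie */
};

void peterson_init(struct peterson *l)
{
    atomic_init(&l->want[0], false);
    atomic_init(&l->want[1], false);
    atomic_init(&l->turn, 0);
}

void peterson_lock(struct peterson *l, int self)
{
    assert(self == 0 || self == 1);
    int other = 1 - self;

    /* Announce interest, then give the other thread priority. */
    atomic_store(&l->want[self], true);
    atomic_store(&l->turn, other);

    /* Sequentially consistent loads: they may not be reordered before
     * the stores above. This is where a hardware fence is needed. */
    while (atomic_load(&l->want[other]) && atomic_load(&l->turn) == other) {
        /* spin */
    }
}

void peterson_unlock(struct peterson *l, int self)
{
    atomic_store(&l->want[self], false);
}

struct worker_arg {
    struct peterson *lock;
    long *counter;
    long iters;
    int self;
};

static void *worker(void *p)
{
    struct worker_arg *a = p;
    for (long i = 0; i < a->iters; i++) {
        peterson_lock(a->lock, a->self);
        *a->counter += 1; /* plain increment, protected by the lock */
        peterson_unlock(a->lock, a->self);
    }
    return NULL;
}

/* Runs two threads that each increment a shared counter iters times.
 * Returns the final counter, or -1 if a thread could not be started. */
long peterson_demo(long iters)
{
    struct peterson lock;
    long counter = 0;
    pthread_t t[2];
    struct worker_arg args[2];
    int started = 0;

    peterson_init(&lock);
    for (int i = 0; i < 2; i++) {
        args[i] = (struct worker_arg){ &lock, &counter, iters, i };
        if (pthread_create(&t[i], NULL, worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    return started == 2 ? counter : -1;
}
```

If the lock is correct, peterson_demo(n) returns 2*n. Weakening the atomics to relaxed ordering would be one way to see which barriers a given platform really has to provide.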
> We do require some form of interrupts, for pre-emptive multitasking.
What sources of interrupt must an architecture support?  Timer?  
Off-chip?  Another core on the same chip?  Some combination?
>   We have run on a processor (Beehive) that lacked protection or address translation, but it was quite painful in many respects -- if you had some form of translation
What is required here?  The ability to create many totally disjoint 
address spaces?  The ability to share pages arbitrarily between address 
spaces as in traditional virtual memory?

> and a protected kernel mode
We don't have one yet either, but probably should.  Out of curiosity, 
what did you do about this on Beehive?  (At a high level; I understand the 
memory is probably too painful to relive fully :)
> , it would make a port much easier.
>
>> * What additional primitives, if any, are desirable for existing versions of
>>     Barrelfish to run well?
> Efficient inter-core messaging would be great! (more on this below)
>
>> * What further primitives would Barrelfish developers like to see on
>> future hardware
>>    platforms?
> Message-passing is the big one. We sketched out some ideas in this HotOS paper:
> http://www.barrelfish.org/gap_hotos11.pdf
>
> Essentially we'd like a remote write operation that pushes some data close to a remote core, and an asynchronous notification that causes a lightweight exception/control transfer on the target. Previous versions of hardware-supported message passing have provided some version of the former, but neglected the latter.
I agree, that sounds extremely useful; implicit message passing by 
piggy-backing on the coherence hardware isn't a great solution.  The 
closest thing we have is a broadcast update instruction that broadcasts 
a cache line from the L2 cache of one cluster of cores into the L2 
caches of all other cores.  We only needed broadcast when we initially 
designed the architecture because we were focused on a task-based 
programming model where arbitrary point-to-point communication simply 
doesn't occur.  For running more general classes of programs, a 
point-to-point remote write seems very useful.  Being able to pass an IPI along 
with the data seems to give a similarly large benefit.  For the use 
cases you have in mind, is it important that the interrupt be serviced 
immediately, or would it be sufficient to wait until the target thread 
finishes what it's doing and does the moral equivalent of calling 
yield()?  If these IPIs are frequent, having the servicing be 
cooperative rather than preemptive may buy you efficiency by not having 
to spill the thread context to memory (maybe when the target yield()s, 
everything in its context is known to be dead).
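As a rough software stand-in for the cooperative variant, here is a sketch in C11 (my own illustration, not Barrelfish's messaging code and not the hardware design discussed above). The sender copies a payload into a line-sized mailbox and then sets a flag, which plays the role of the remote write plus notification; the receiver looks at the flag only when it reaches a yield point. The 64-byte line size, the single-slot design, the single-sender/single-receiver restriction, and all names are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_WORDS 7 /* payload words; with the flag this fits a 64-byte line */

/* One-slot mailbox shared by exactly one sender and one receiver. */
struct mailbox {
    _Alignas(64) uint64_t payload[MSG_WORDS];
    atomic_bool full; /* stands in for the "notification pending" bit */
};

typedef void (*msg_handler)(const uint64_t *words, void *ctx);

void mailbox_init(struct mailbox *mb)
{
    memset(mb->payload, 0, sizeof mb->payload);
    atomic_init(&mb->full, false);
}

/* Sender side: the "remote write". Returns false if the previous
 * message has not been consumed yet. */
bool mailbox_send(struct mailbox *mb, const uint64_t *words)
{
    if (atomic_load_explicit(&mb->full, memory_order_acquire))
        return false;
    memcpy(mb->payload, words, sizeof mb->payload);
    /* Release: the payload must be visible before the flag is. */
    atomic_store_explicit(&mb->full, true, memory_order_release);
    return true;
}

/* Receiver side: call this at a yield point. If a message is pending,
 * run the handler on it and free the slot. Returns true if a message
 * was handled. */
bool mailbox_poll(struct mailbox *mb, msg_handler h, void *ctx)
{
    assert(h != NULL);
    if (!atomic_load_explicit(&mb->full, memory_order_acquire))
        return false;
    h(mb->payload, ctx);
    atomic_store_explicit(&mb->full, false, memory_order_release);
    return true;
}

/* Example handler: adds all payload words into *(uint64_t *)ctx. */
void sum_handler(const uint64_t *words, void *ctx)
{
    uint64_t *sum = ctx;
    for (size_t i = 0; i < MSG_WORDS; i++)
        *sum += words[i];
}
```

Because the handler runs only when the receiver calls mailbox_poll, no register state has to be saved on its behalf; the cost is that delivery latency is bounded by how often the receiver yields.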
>> * How would one go about porting Barrelfish to another architecture?
>>     - Is CPU support code segregated into one part of the Barrelfish
>> source tree?
>>       If not, what general patterns should I grep for?
>>       I see things like .dev files for amd64 and arm, but don't know what
>> else is required.
> Our tree is not as cleanly structured as it should be, but in general architecture-specific code lives in a directory named 'arch', e.g. kernel/arch/x86_64, lib/barrelfish/arch/x86_64, etc. Aside from this, the main things you would need are suitable drivers, and the backends for message-passing.
>
>> * This is likely a dumb question, but does my architecture need ghc
>> codegen support
>>     to run Barrelfish?
> No. GHC is used to build the tools, not the OS or apps.
That's what I figured, just thought I'd check :)
>   We definitely compile with GCC, and have at various times compiled with LLVM and ICC (so it shouldn't be hard to get these working if needed).
>
>> * At a high level, is this worth doing?  There are two parts to that:
>>     - If the port is successful, is there any hope of upstreaming the
>> changes so
>>       they can keep pace with upstream API changes?  (Remembering that
>> the platform
>>       in question consists of a simulator, and possibly an FPGA in the
>> foreseeable future)
> I can't give you a definitive answer (Mothy is perhaps better placed to do so), but there have been outside contributions to the tree. I think the main issue is one of maintenance bandwidth -- whether there are folks around who care about the port and are using/maintaining it. We recently removed the Beehive port from the tree, because it was unused, and we didn't have the cycles to keep it from bit-rotting.
That's understandable; we'll cross that bridge when we come to it, but 
it's helpful to know that you're open to the idea in the abstract.
>
>>     - If not, is the upstream API or organization likely to change in
>> such a way that the
>>       port will quickly become non-functional?
> Well, it's a research project... and there is plenty of churn in the tree. Obviously with more ports and a better separation between architecture/platform-specific code this would be less of an issue, but we're not really at that state yet.
Also understandable; all the more reason to make the support commitment 
so that the port can be upstreamed.
>
>>     - Would it be a Herculean effort, or something a student or two could
>> do in O(weeks)?
> It largely depends on how weird your platform is :) If it's not too strange, then I would say that a competent (i.e. knows how to debug C code running on the metal without a symbolic debugger) student could do it in a matter of weeks for the base system. Our early ports took more time, partly because they had to introduce some of the architectural separation that was missing.
>
> If you're looking for reference code, the x86_64 port is the most complete, but the ARM port (done by Orion Hodson at MSR) is much cleaner.
>
> Hope this helps,
> Andrew