[Barrelfish-users] Porting Barrelfish to a new architecture?

Fri Mar 16 04:20:38 CET 2012

Hi Matt,

Answers inline below...

Cheers,
Andrew

From: Matt Johnson [mailto:johnso87 at crhc.illinois.edu]
Sent: Wednesday, 14 March 2012 22:45
To: Baumann Andrew
Cc: tharris at microsoft.com; barrelfish-users at lists.inf.ethz.ch
Subject: Re: [Barrelfish-users] Porting Barrelfish to a new architecture?

Thanks Andrew and Tim!  That's very helpful.  A couple more questions below.
-Matt

On 03/14/2012 05:14 PM, Tim Harris (RESEARCH) wrote:

Essentially we'd like a remote write operation that pushes some data close to a remote core, and an asynchronous notification that causes a lightweight exception/control transfer on the target. Previous versions of hardware-supported message passing have provided some version of the former, but neglected the latter.

BTW, we have a fairly detailed design for this now, plus implementation in M5.

I'll try to bring it up to date with the current tree, so it doesn't keep bit-rotting.

Hoping to get a write-up ready to submit to ASPLOS 2013 -- either just for use with message passing, or for accelerating a work-stealing system too (along the lines of the Stanford work from a few ASPLOS ago).

 --Tim

On 03/14/2012 05:06 PM, Baumann Andrew wrote:

Hi Matt,

I've just been skimming over the Rigel and Cohesion papers -- interesting stuff!
Thanks!  And thanks for the pointer to the HotOS paper; my groupmates and I often make analogous arguments about different layers of the stack (architecture and software, architecture and compiler, memory model and programming model), so it's good to know there are kindred spirits out there who also think researchers should do twice as much work!

To answer your questions:

* What primitives does Barrelfish *require* of the underlying CPU (atomic instructions, interrupts, MMU features, etc.)?

We don't require atomic instructions.
I'm guessing Barrelfish still needs some kind of synchronization primitive (e.g., to construct a lock), right?  Some memory consistency models allow you to build some kind of synchronization primitives with normal loads and stores -- are you saying that whatever you can build under the x86 and ARM consistency models is sufficient?  Do you need any kind of memory barriers?  For what it's worth, I've found Concurrency Kit's platform independence layer to provide a great language for talking about exactly what a platform provides and what a piece of software requires when it comes to these kinds of things.  Unfortunately, the doc page doesn't tell you what the acronyms stand for, and the implementation files are too laden with preprocessor metaprogramming to be easily parsed :(
http://concurrencykit.org/doc/appendixA.html
http://concurrencykit.org/cgit/cgit.cgi/ck/tree/include/gcc/x86_64/ck_pr.h

Actually the base Barrelfish system (by design) really does not require any synchronisation primitives. We use spinlocks at user-level for applications that span a single address space across multiple cores. In the current codebase these spinlocks are also taken at user-level in the single-core case, but that is a bug: a missing performance optimisation that would elide them. Other synchronisation takes place via messages.
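
For readers wondering what a user-level spinlock of this kind asks of the hardware, here is a minimal sketch using C11 atomics. It is not the Barrelfish implementation; every name with a sketch_ prefix is hypothetical. The only primitive it assumes is an atomic test-and-set with acquire/release ordering, which is what a port would need to provide for multi-core applications sharing an address space.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-level spinlock; not the Barrelfish API. */
typedef struct {
    atomic_flag held;
} sketch_spinlock_t;

/* Put the lock into the released state. */
static inline void sketch_spin_init(sketch_spinlock_t *l)
{
    assert(l != NULL);
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Try once. test_and_set returns the previous value, so a previous
 * value of false means this caller acquired the lock. */
static inline bool sketch_spin_trylock(sketch_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Spin until the lock is acquired. */
static inline void sketch_spin_lock(sketch_spinlock_t *l)
{
    while (!sketch_spin_trylock(l)) {
        /* busy-wait */
    }
}

/* Release the lock; the release ordering publishes the critical section. */
static inline void sketch_spin_unlock(sketch_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

On a platform without any atomic read-modify-write instruction, this particular sketch would not be available, which is consistent with keeping such locks out of the base system and synchronising via messages instead.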

We do require some form of interrupts, for pre-emptive multitasking.
What sources of interrupt must an architecture support?  Timer?  Off-chip?  Another core on the same chip?  Some combination?

Preferably a per-core countdown timer for scheduling, like the x86 APIC timer. On Beehive we used a kludge based on messages sent from a central "timer" core.

 We have run on a processor (Beehive) that lacked protection or address translation, but it was quite painful in many respects -- if you had some form of translation
What is required here?  The ability to create many totally disjoint address spaces?  The ability to share pages arbitrarily between address spaces as in traditional virtual memory?

Each domain runs in its own address space, and most binaries are statically linked/compiled for the same addresses. You might get by without shared pages... at first thought, the parts of the system that use them are the x86 message-passing mechanism, bulk transport (e.g. network stack/driver communication), and some similar mechanisms. The kernel relies on access to user pages, but I'm assuming that doesn't count.
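
To illustrate why a shared-memory message-passing mechanism depends on shared pages, here is a minimal sketch of a single-slot, single-sender, single-receiver channel that would live in a page mapped into both domains. It is an illustration only, not the Barrelfish code: the sketch_ names, the 64-byte slot size, and the convention that a sequence number of 0 means "empty" are all assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAYLOAD_WORDS 7

/* One message slot, sized to a 64-byte cache line (an assumption). */
typedef struct {
    uint64_t payload[SKETCH_PAYLOAD_WORDS];
    _Atomic uint64_t seq;  /* written last by the sender; 0 means empty */
} sketch_slot_t;

/* Mark the slot empty before first use. */
static void sketch_slot_init(sketch_slot_t *s)
{
    for (int i = 0; i < SKETCH_PAYLOAD_WORDS; i++) {
        s->payload[i] = 0;
    }
    atomic_init(&s->seq, 0);
}

/* Sender: fill the payload, then publish it by storing a non-zero
 * sequence number with release ordering. Returns false if the
 * receiver has not yet consumed the previous message. */
static bool sketch_send(sketch_slot_t *s, const uint64_t *words, uint64_t seq)
{
    assert(seq != 0);
    if (atomic_load_explicit(&s->seq, memory_order_acquire) != 0) {
        return false;
    }
    for (int i = 0; i < SKETCH_PAYLOAD_WORDS; i++) {
        s->payload[i] = words[i];
    }
    atomic_store_explicit(&s->seq, seq, memory_order_release);
    return true;
}

/* Receiver: poll the sequence word; on a new message, copy it out
 * and mark the slot empty again. */
static bool sketch_poll(sketch_slot_t *s, uint64_t *out, uint64_t *seq)
{
    uint64_t cur = atomic_load_explicit(&s->seq, memory_order_acquire);
    if (cur == 0) {
        return false;
    }
    for (int i = 0; i < SKETCH_PAYLOAD_WORDS; i++) {
        out[i] = s->payload[i];
    }
    *seq = cur;
    atomic_store_explicit(&s->seq, 0, memory_order_release);
    return true;
}
```

Note that the receiver here has to poll, which is exactly the cost that a hardware notification mechanism would aim to remove.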

and a protected kernel mode
We don't have one yet either, but probably should.  Out of curiosity, what did you do about this on Beehive (at a high level; I understand the memory is probably too painful to relive fully :)

There was still a "kernel" on Beehive; it just happened to be in the same address space and protection domain as the applications (i.e. we ignored the problem).

, it would make a port much easier.

* What additional primitives, if any, are desirable for existing versions of Barrelfish to run well?

Efficient inter-core messaging would be great! (more on this below)

* What further primitives would Barrelfish developers like to see on future hardware platforms?

Message-passing is the big one. We sketched out some ideas in this HotOS paper:

http://www.barrelfish.org/gap_hotos11.pdf

Essentially we'd like a remote write operation that pushes some data close to a remote core, and an asynchronous notification that causes a lightweight exception/control transfer on the target. Previous versions of hardware-supported message passing have provided some version of the former, but neglected the latter.
I agree, that sounds extremely useful; implicit message passing by piggy-backing on the coherence hardware isn't a great solution.  The closest thing we have is a broadcast update instruction that broadcasts a cache line from the L2 cache of one cluster of cores into the L2 caches of all other cores.  We only needed broadcast when we initially designed the architecture because we were focused on a task-based programming model where arbitrary point-to-point communication simply doesn't occur.  For being able to run more general classes of program, this generalization seems very useful.  Being able to pass an IPI along with the data seems to give a similarly large benefit.  For the use cases you have in mind, is it important that the interrupt be serviced immediately, or would it be sufficient to wait until the target thread finishes what it's doing and does the moral equivalent of calling yield()?  If these IPIs are frequent, having the servicing be cooperative rather than preemptive may buy you efficiency by not having to spill the thread context to memory (maybe when the target yield()s, everything in its context is known to be dead).

Tim and Stefan should really answer this... they have gone into much more depth than I on the notification mechanisms.

I think it's useful to decouple the data transfer from the notification (or at least make the notification optional). I don't think it's always necessary to deliver the notification immediately, and coalescing multiple undelivered notifications makes sense, but I don't see the value in a notification that requires an explicit action on the receiver side... isn't that just polling? Also, if I may be so rude, the decision of what to do with thread context and whether it must be spilled is a software problem :) You can imagine a fault handler that knows enough about the actual program being run to decide whether to spill context and switch to a handler, or just set a flag that a notification arrived and queue it for later processing.
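
The "set a flag and queue it for later processing" option, together with coalescing of undelivered notifications, could look roughly like the sketch below. This is a software illustration under stated assumptions, not a description of any proposed hardware or of Barrelfish code: the sketch_ names are hypothetical, and a limit of 32 channels is assumed so that pending notifications fit in one word.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_MAX_CHANNELS 32u

/* One bit per channel; a set bit means "at least one notification
 * arrived on this channel since the last drain". */
static _Atomic uint32_t sketch_pending;

/* Called from the (hypothetical) notification handler. It only records
 * the event, so no thread context has to be spilled here. Repeated
 * notifications on the same channel coalesce into a single bit. */
static void sketch_notify_arrived(unsigned chan)
{
    assert(chan < SKETCH_MAX_CHANNELS);
    atomic_fetch_or_explicit(&sketch_pending, UINT32_C(1) << chan,
                             memory_order_release);
}

/* Called by the receiver at a point of its own choosing. Atomically
 * takes and clears the whole set of pending channels. */
static uint32_t sketch_take_pending(void)
{
    return atomic_exchange_explicit(&sketch_pending, 0,
                                    memory_order_acquire);
}
```

A handler with more knowledge of the running program could instead switch to a full handler immediately; the point is only that the choice is made in software.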

* How would one go about porting Barrelfish to another architecture?

   - Is CPU support code segregated into one part of the Barrelfish source tree?
     If not, what general patterns should I grep for?
     I see things like .dev files for amd64 and arm, but don't know what else is required.

Our tree is not as cleanly structured as it should be, but in general architecture-specific code lives in a directory named 'arch', e.g. kernel/arch/x86_64, lib/barrelfish/arch/x86_64, etc. Aside from this, the main things you would need are suitable drivers, and the backends for message-passing.

* This is likely a dumb question, but does my architecture need ghc codegen support to run Barrelfish?

No. GHC is used to build the tools, not the OS or apps.
That's what I figured, just thought I'd check :)

 We definitely compile with GCC, and have at various times compiled with LLVM and ICC (so it shouldn't be hard to get these working if needed).

* At a high level, is this worth doing?  There are two parts to that:

   - If the port is successful, is there any hope of upstreaming the changes so they can keep pace with upstream API changes?  (Remembering that the platform in question consists of a simulator, and possibly an FPGA in the foreseeable future)

I can't give you a definitive answer (Mothy is perhaps better placed to do so), but there have been outside contributions to the tree. I think the main issue is one of maintenance bandwidth -- whether there are folks around who care about the port and are using/maintaining it. We recently removed the Beehive port from the tree, because it was unused, and we didn't have the cycles to keep it from bit-rotting.
That's understandable; we'll cross that bridge when we come to it, but it's helpful to know that you're open to the idea in the abstract.

   - If not, is the upstream API or organization likely to change in such a way that the port will quickly become non-functional?

Well, it's a research project... and there is plenty of churn in the tree. Obviously with more ports and a better separation between architecture/platform-specific code this would be less of an issue, but we're not really at that state yet.
Also understandable; all the more reason to make the support commitment so that the port can be upstreamed.

   - Would it be a Herculean effort, or something a student or two could do in O(weeks)?

It largely depends on how weird your platform is :) If it's not too strange, then I would say that a competent (i.e. knows how to debug C code running on the metal without a symbolic debugger) student could do it in a matter of weeks for the base system. Our early ports took more time, partly because they had to introduce some of the architectural separation that was missing.

If you're looking for reference code, the x86_64 port is the most complete, but the ARM port (done by Orion Hodson at MSR) is much cleaner.

Hope this helps,

Andrew