[Barrelfish-users] Assertion fired when calling "nameserver_register" function

Timothy Roscoe troscoe at inf.ethz.ch
Mon Jan 13 18:29:02 CET 2014


Hi there, 

Just to elaborate a little on this: each communication binding or
channel in Barrelfish is, by itself, completely asynchronous - it can
send a message and can be polled for received messages at any time.
This is much like L4 and other microkernels.

Obviously, if you are implementing a Remote Procedure Call-style model
on top of this, it's important on the client side that the thread
which issues the request message is also the thread which receives the
corresponding reply, otherwise deadlock is quite likely.   

Threads wait on a channel (or set of channels) in Barrelfish by
actually waiting on a 'waitset', in the spirit of Unix select() or
poll().   There is a default waitset which most bindings are put into
when created.  
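
For readers who know the Unix side of that analogy better than the
Barrelfish side, here is a minimal sketch of the poll() pattern the
comparison refers to. It is plain POSIX C, not Barrelfish code: the
helper names wait_readable and pipe_demo are made up for this example,
and a pipe stands in for a channel.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Returns 1 if fd becomes readable within timeout_ms, 0 on timeout,
 * -1 on error. (Hypothetical helper, POSIX only.) */
static int wait_readable(int fd, int timeout_ms)
{
    assert(fd >= 0);
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0) {
        return -1;
    }
    return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Hypothetical demo: a pipe is not readable until something is written
 * to it. Returns 1 if that is what poll() reports, 0 if not, and -1 if
 * a system call fails. */
static int pipe_demo(void)
{
    int fds[2];
    if (pipe(fds) != 0) {
        return -1;
    }
    int before = wait_readable(fds[0], 0);   /* nothing written yet */
    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    int after = wait_readable(fds[0], 0);    /* one byte is pending */
    close(fds[0]);
    close(fds[1]);
    return (before == 0 && after == 1) ? 1 : 0;
}
```

The point of the analogy is only that the waiter blocks on a set of
event sources rather than on one source directly.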

When you do an RPC style call, the client stub puts the channel into a
newly-created, anonymous waitset, sends the request on the channel,
then waits on the waitset for the reply.  

Consequently, if another thread then tries to send on the channel in
the meantime, it will put the channel into another anonymous waitset,
and the original thread will never receive a reply. 
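
A toy, single-threaded model of that sequence may make the failure
easier to see. This is only a sketch of the behaviour as described
above, not the actual stub code: toy_waitset, toy_channel and the
functions below are invented for illustration, and the assumption is
that a reply is delivered to whichever waitset currently holds the
channel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented types for illustration; not Barrelfish structures. */
struct toy_waitset { int pending; };            /* events delivered here */
struct toy_channel { struct toy_waitset *ws; }; /* waitset holding channel */

/* Models the start of an RPC: the caller moves the channel into its own
 * fresh, anonymous waitset and then sends the request. */
static void toy_rpc_begin(struct toy_channel *ch, struct toy_waitset *anon)
{
    anon->pending = 0;
    ch->ws = anon;
    /* the request would be sent on the channel here */
}

/* Models a reply arriving: the event goes to the waitset that holds the
 * channel at that moment. */
static void toy_reply_arrives(struct toy_channel *ch)
{
    assert(ch->ws != NULL);
    ch->ws->pending++;
}

/* Returns true if the first caller's waitset sees the reply. */
static bool first_caller_gets_reply(bool second_caller_interferes)
{
    struct toy_waitset ws_a, ws_b;
    struct toy_channel ch = { .ws = NULL };

    toy_rpc_begin(&ch, &ws_a);          /* thread A starts its RPC */
    if (second_caller_interferes) {
        toy_rpc_begin(&ch, &ws_b);      /* thread B sends "in the meantime" */
    }
    toy_reply_arrives(&ch);
    return ws_a.pending > 0;
}
```

In this model first_caller_gets_reply(false) is true, while
first_caller_gets_reply(true) is false: the first caller would wait on
its waitset forever.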

What we could have done instead is to introduce a lock around each
channel, which is acquired by a thread (in the stub code) before each
RPC and released when the reply is received.   We didn't implement
this, partly to keep locking out of the stubs unless absolutely
necessary (if you simply use one-way messaging, you almost
certainly don't want to mess with locks), and partly to keep the stubs
independent of any particular user-level threading model.  However, in
this case, with a pure-RPC interface, it might have made more sense to
put locks in.

This may be the reason for your deadlock.  The short-term workaround
may then be to surround your RPCs to the nameserver with a user-space
mutex.  In the long term, we may want to revisit the interaction
between locking implementations and the stub code. 
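
As a rough sketch of that workaround, the pattern is simply "take a
mutex, do the blocking RPC, release the mutex". The example below uses
POSIX threads purely as a runnable stand-in for Barrelfish's own
threading library, and fake_rpc_client, fake_rpc_call and the other
names are hypothetical; the rpc_in_progress flag only mirrors the name
that appears in the failed assertion.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an RPC client binding; not a Barrelfish type. */
struct fake_rpc_client {
    bool rpc_in_progress;   /* named after the flag in the failed assertion */
    int  calls_completed;
};

/* Hypothetical stand-in for a blocking RPC stub. Like the real stub, it
 * asserts that no other RPC is in flight on the same binding. */
static void fake_rpc_call(struct fake_rpc_client *c)
{
    assert(!c->rpc_in_progress);
    c->rpc_in_progress = true;
    /* the request would be sent and the reply awaited here */
    c->calls_completed++;
    c->rpc_in_progress = false;
}

static struct fake_rpc_client ns_client;
static pthread_mutex_t ns_lock = PTHREAD_MUTEX_INITIALIZER;

/* The workaround: every RPC on the shared binding goes through the lock. */
static void locked_rpc_call(void)
{
    pthread_mutex_lock(&ns_lock);
    fake_rpc_call(&ns_client);
    pthread_mutex_unlock(&ns_lock);
}

static void *worker(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        locked_rpc_call();
    }
    return NULL;
}

/* Runs two threads that share the binding; returns the total number of
 * RPCs completed. */
static int run_two_threads(int calls_per_thread)
{
    pthread_t a, b;
    ns_client.rpc_in_progress = false;
    ns_client.calls_completed = 0;
    pthread_create(&a, NULL, worker, &calls_per_thread);
    pthread_create(&b, NULL, worker, &calls_per_thread);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return ns_client.calls_completed;
}
```

The lock has to cover the whole call, from sending the request until
the reply has been received, and every thread that uses the binding
has to go through the same lock.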

Not sure if this helps you in any way (and others, please correct me
if I'm wrong about the current stub operations!). 

Best,

 -- Mothy

At Mon, 13 Jan 2014 15:54:40 +0100, Kornilios Kourtis <kornilios.kourtis at inf.ethz.ch> wrote:
> Hi Mark,
> 
> On Fri, Jan 10, 2014 at 05:58:18PM +0000, M Brown wrote:
> > Kornilios,
> >  I guess I'm a bit confused. I'm assuming that when I create a
> >  thread it is "not" creating a new domain. If it is that does not
> >  make sense. Please clarify the rules and API for thread creation.
> 
> You are correct in that creating a thread does not create a new domain.
> A spanning domain is a single domain that spans multiple cores. I
> assumed that this is what you were using, but I was probably wrong.
> Sorry for the confusion.
> 
> The fact remains, however, that the messaging infrastructure is not
> thread safe and cannot be safely used by multiple threads.
> 
> If I understand correctly, you are using the nameservice client from
> multiple threads. Since the nameservice client does not do any
> synchronization, the messaging infrastructure ends up being concurrently
> used by multiple threads which I believe is what triggers the failed
> assertion.
> 
> Hope this helps. Please, let me know if I'm missing something.
> 
> cheers,
> Kornilios.
> 
> >
> > Mark Brown
> > Huawei Technologies Inc.
> > 5340 Legacy Dr., Suite 175
> > Plano, TX 75024
> > Tel: 469-277-5700 x5870
> > Email: m.brown at huawei.com
> >
> >
> > -----Original Message-----
> > From: Kornilios Kourtis [mailto:kornilios.kourtis at inf.ethz.ch]
> > Sent: Friday, January 10, 2014 10:35 AM
> > To: M Brown
> > Cc: Timothy Roscoe; barrelfish-users at lists.inf.ethz.ch; debashis bhattacharya
> > Subject: Re: [Barrelfish-users] Assertion fired when calling "nameserver_register" function
> >
> > Hi Mark,
> >
> > On Thu, Jan 09, 2014 at 05:48:07PM +0000, M Brown wrote:
> > > Kornilios,
> > >
> > >  I believe that I have found a bug in the nameserver. The problem
> > > appears to be in the use of the nameserver functions in multiple
> > > threads running on a single core. In this case the nameserver
> > > functions are non-reentrant. To illustrate this I took your message
> > > example (xmpl-msg) and wrapped both the client and server functions in
> > > separate threads and ran the application on a single core. The same
> > > assertion fired as illustrated below:
> > >
> > > The nameservice blocking lookup is invoked by the client application
> > > near the beginning to get the iref. The server then tries to register
> > > the iref in the nameserver and gets the assertion.
> > >
> > > In general for the nameserver to be useful, it needs to be accessible
> > > by any number of threads running on any number of cores in a
> > > concurrent manner.
> >
> > It seems that you are using multiple threads (spanned domains) to
> > access the messaging infrastructure of Barrelfish. As far as I know,
> > spanned domains were implemented as a quick way to enable running
> > shared-memory applications, and a significant part of the Barrelfish
> > infrastructure (e.g., messages) will not work correctly with them. To
> > the best of my knowledge, the only things that work reliably on a
> > spanned domain are the thread synchronization primitives.
> >
> > I'm afraid this is not an easy problem to fix. My suggestion would be
> > to avoid spanned domains altogether. If this is not possible, the only
> > quick solution I can think of is protecting libbarrelfish invocations
> > with a lock.
> >
> > cheers,
> > Kornilios.
> >
> >
> > >
> > > Thanks,
> > >
> > > Mark Brown
> > > Huawei Technologies Inc.
> > > 5340 Legacy Dr., Suite 175
> > > Plano, TX 75024
> > > Tel: 469-277-5700 x5870
> > > Email: m.brown at huawei.com
> > >
> > > -----Original Message-----
> > > From: Kornilios Kourtis [mailto:kornilios.kourtis at inf.ethz.ch]
> > > Sent: Thursday, December 19, 2013 6:33 AM
> > > To: M Brown
> > > Cc: barrelfish-users at lists.inf.ethz.ch
> > > Subject: Re: [Barrelfish-users] Assertion fired when calling
> > > "nameserver_register" function
> > >
> > > Dear Mark,
> > >
> > > On Mon, Dec 16, 2013 at 05:38:37PM +0000, M Brown wrote:
> > > > Guys,
> > > >
> > > >    I’m getting the following assertion fired when calling the
> > > >    nameserver_register function from within an export callback
> > > >    function:
> > > >
> > > > [cid]
> > > >
> > > > The example message test works fine. The structure of the code I
> > > > have is as follows:
> > > >
> > > > main {
> > > >                 thread_create(myTask, NULL);
> > > >                 .
> > > >                 .
> > > >                 .
> > > > }
> > > >
> > > > int myTask(void* arg) {
> > > >                 .
> > > >                 .
> > > >                 .
> > > >                 <iface>_export(NULL,
> > > >                                export_cb, connect_cb,
> > > >                                get_default_waitset(),
> > > >                                IDC_EXPORT_FLAGS_DEFAULT);
> > > >
> > > > }
> > > >
> > > > void export_cb(void *st, errval_t err, iref_t iref) {
> > > >                 .
> > > >                 .
> > > >                 .
> > > >                 // The assertion fires within this invocation
> > > >                 nameserver_register("iface", iref);
> > > > }
> > > >
> > > > Is there something I’m doing wrong here?
> > >
> > > [I'm guessing you mean nameservice_register() above]
> > >
> > > Judging from the failed assertion (!_rpc->rpc_in_progress), I'm
> > > guessing that it might have something to do with using multiple
> > > threads. What are the other threads doing? Can you reproduce the
> > > problem when using a single thread?
> > >
> > > cheers,
> > > Kornilios.
> > >
> > > --
> > > Kornilios Kourtis
> > >
> > > _______________________________________________
> > > Barrelfish-users mailing list
> > > Barrelfish-users at lists.inf.ethz.ch
> > > https://lists.inf.ethz.ch/mailman/listinfo/barrelfish-users
> > >
> >
> >
> >
> > --
> > Kornilios Kourtis
> 
> --
> Kornilios Kourtis
> 


