[Barrelfish-users] Assertion fired when calling "nameserver_register" function

M Brown M.Brown at huawei.com
Sat Feb 1 00:00:32 CET 2014


Mothy, Kornilios,
   Thanks for the expanded explanation. I understand what is going on. I did what you suggested and wrapped a mutex lock around the nameserver calls. Unfortunately, I couldn't use the blocking form of the nameserver lookup, as that could cause a deadlock, so I had to poll on the non-blocking form. I don't like doing that, but since the name server is only used during initialization to set up connections, it wasn't much of a worry. Anyway, I am still puzzled as to your intended use model of the nameserver. A multi-threaded environment seems like a normal concurrent use model. How did you intend the nameserver to be used in such a situation?
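A minimal sketch of this workaround, and of why a blocking lookup under the same lock can deadlock, using POSIX threads as a stand-in for Barrelfish threads. Everything here is hypothetical: `ns_lock`, `locked_register`, `locked_polling_lookup` and `registrar_thread` are illustrative names, and the one-entry in-memory registry only models a non-thread-safe nameservice client; it is not the Barrelfish API.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t iref_t; /* stand-in for Barrelfish's iref_t (assumption) */

/* Hypothetical one-entry registry modelling the nameservice client:
 * its state may only be touched while ns_lock is held. */
static pthread_mutex_t ns_lock = PTHREAD_MUTEX_INITIALIZER;
static const char *registered_name; /* NULL until something registers */
static iref_t registered_iref;

/* Registration, serialized by ns_lock. */
void locked_register(const char *name, iref_t iref)
{
    pthread_mutex_lock(&ns_lock);
    registered_name = name;
    registered_iref = iref;
    pthread_mutex_unlock(&ns_lock);
}

/* Non-blocking lookup; the caller must hold ns_lock. */
static bool try_lookup_unlocked(const char *name, iref_t *ret)
{
    if (registered_name != NULL && strcmp(registered_name, name) == 0) {
        *ret = registered_iref;
        return true;
    }
    return false;
}

/* Polls the non-blocking lookup, releasing ns_lock between attempts so a
 * registering thread can take it. Blocking while holding ns_lock would
 * deadlock, because the registrar could never acquire the lock. */
iref_t locked_polling_lookup(const char *name)
{
    iref_t iref;
    for (;;) {
        pthread_mutex_lock(&ns_lock);
        bool found = try_lookup_unlocked(name, &iref);
        pthread_mutex_unlock(&ns_lock);
        if (found) {
            return iref;
        }
        sched_yield(); /* give other threads a chance before retrying */
    }
}

/* Example registrar thread: registers "iface" with a made-up iref of 42. */
void *registrar_thread(void *arg)
{
    (void)arg;
    locked_register("iface", 42);
    return NULL;
}
```

The lock is held only for each individual attempt, which is what makes polling safe where a lock-holding blocking lookup is not.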
Thanks again, 

Mark Brown
Huawei Technologies Inc.
5340 Legacy Dr., Suite 175
Plano, TX 75024
Tel: 469-277-5700 x5870
Email: m.brown at huawei.com

-----Original Message-----
From: Timothy Roscoe [mailto:troscoe at inf.ethz.ch] 
Sent: Monday, January 13, 2014 11:29 AM
To: Kornilios Kourtis
Cc: barrelfish-users at lists.inf.ethz.ch; debashis bhattacharya; M Brown; Timothy Roscoe
Subject: Re: [Barrelfish-users] Assertion fired when calling "nameserver_register" function


Hi there, 

Just to elaborate a little on this: each communication binding or channel in Barrelfish is, by itself, completely asynchronous; it can send a message and can be polled for received messages at any time.
This is much like L4 or other microkernels.
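To illustrate "send at any time, poll at any time", here is a hypothetical model of one direction of such a channel as a bounded FIFO. The names `struct chan`, `chan_send` and `chan_poll` are assumptions for this sketch; real Barrelfish interconnect drivers are implemented differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one direction of an asynchronous channel.
 * Sending never waits for the receiver, and the receiver polls whenever
 * it likes; nothing pairs a particular send with a particular receive. */
#define CHAN_SLOTS 8

struct chan {
    int msgs[CHAN_SLOTS];
    size_t head;  /* index of the oldest queued message */
    size_t count; /* number of queued messages */
};

/* Non-blocking send: returns false if the queue is full. */
bool chan_send(struct chan *c, int msg)
{
    if (c->count == CHAN_SLOTS) {
        return false;
    }
    c->msgs[(c->head + c->count) % CHAN_SLOTS] = msg;
    c->count++;
    return true;
}

/* Non-blocking poll: returns false if no message is waiting. */
bool chan_poll(struct chan *c, int *msg)
{
    if (c->count == 0) {
        return false;
    }
    *msg = c->msgs[c->head];
    c->head = (c->head + 1) % CHAN_SLOTS;
    c->count--;
    return true;
}
```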

Obviously, if you are implementing a Remote Procedure Call-style model on top of this, it's important on the client side that the thread which issues the request message is also the thread which receives the corresponding reply; otherwise, deadlock is quite likely.

Threads wait on a channel (or set of channels) in Barrelfish by actually waiting on a 'waitset', in the spirit of Unix select() or poll(). There is a default waitset which most bindings are put into when created.
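The select()/poll() analogy can be made concrete with real POSIX calls: below, two pipes play the role of two channels and a `struct pollfd` array plays the role of the waitset. This is only an analogy under that assumption; `wait_on_set` is a hypothetical helper, not part of the Barrelfish waitset API.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Analogy only: the pollfd array stands in for a waitset holding two
 * channels. The caller blocks until at least one has a pending message.
 * Returns the lowest index (0 or 1) that is readable, or -1 on error. */
int wait_on_set(int fd0, int fd1)
{
    struct pollfd set[2] = {
        { .fd = fd0, .events = POLLIN },
        { .fd = fd1, .events = POLLIN },
    };
    if (poll(set, 2, -1) < 0) { /* -1: wait with no timeout */
        return -1;
    }
    for (int i = 0; i < 2; i++) {
        if (set[i].revents & POLLIN) {
            return i;
        }
    }
    return -1;
}
```

Usage: with a message written only into the second pipe, `wait_on_set` wakes up and reports index 1.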

When you do an RPC style call, the client stub puts the channel into a newly-created, anonymous waitset, sends the request on the channel, then waits on the waitset for the reply.  

Consequently, if another thread then tries to send on the channel in the meantime, it will put the channel into another anonymous waitset, and the original thread will never receive a reply. 
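A single-threaded replay of that interleaving, as a hypothetical model. Only the `rpc_in_progress` flag echoes the assertion quoted later in this thread (`!_rpc->rpc_in_progress`); `struct rpc_binding`, `rpc_begin` and `rpc_end` are illustrative and do not reproduce the generated stub code. Where the real stub asserts, this sketch returns `false` so the overlap can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct waitset { int id; }; /* hypothetical stand-in */

/* Hypothetical model of an RPC client binding. */
struct rpc_binding {
    bool rpc_in_progress;
    struct waitset *ws; /* waitset the channel currently belongs to */
};

/* Start of an RPC: move the channel into the caller's private waitset.
 * Returns false where the real stub's assertion would fire; without the
 * check, a second caller would move the channel away from the first. */
bool rpc_begin(struct rpc_binding *b, struct waitset *private_ws)
{
    if (b->rpc_in_progress) {
        return false;
    }
    b->rpc_in_progress = true;
    b->ws = private_ws;
    return true;
}

/* Called once the reply has been received on the private waitset. */
void rpc_end(struct rpc_binding *b)
{
    b->rpc_in_progress = false;
    b->ws = NULL;
}
```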

What we could have done instead is to introduce a lock around each channel, which is acquired by a thread (in the stub code) before each RPC and released when the reply is received. We didn't implement this, partly to keep locking out of the stubs unless absolutely necessary (if you simply use one-way messaging, you almost certainly don't want to mess with locks), and partly to keep the stubs independent of any particular user-level threading model. However, in this case, with a pure-RPC interface, it might have made more sense to put locks in.
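A sketch of the per-channel lock described above, with POSIX mutexes standing in for Barrelfish thread mutexes. `struct locked_binding`, `locked_rpc_call` and `rpc_worker` are hypothetical, and `fake_send_and_wait` only models the send-then-wait-for-reply step. Taking the same lock in application code around each nameserver call, rather than in the stub, amounts to the user-space mutex workaround.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical binding that carries its own lock; not the generated stub. */
struct locked_binding {
    pthread_mutex_t lock; /* held from request send until reply receipt */
    bool rpc_in_progress; /* models the stub's in-flight flag */
    long calls_completed;
};

/* Models "send request, then wait on a private waitset for the reply". */
static void fake_send_and_wait(struct locked_binding *b)
{
    assert(!b->rpc_in_progress); /* overlapping callers could trip this without the lock */
    b->rpc_in_progress = true;
    /* ... request sent and reply awaited here ... */
    b->rpc_in_progress = false;
    b->calls_completed++;
}

/* Stub-side locking: at most one RPC in flight per binding. */
void locked_rpc_call(struct locked_binding *b)
{
    pthread_mutex_lock(&b->lock);
    fake_send_and_wait(b);
    pthread_mutex_unlock(&b->lock);
}

/* Example worker thread issuing many RPCs on a shared binding. */
void *rpc_worker(void *arg)
{
    struct locked_binding *b = arg;
    for (int i = 0; i < 1000; i++) {
        locked_rpc_call(b);
    }
    return NULL;
}
```

With four workers sharing one binding, every call completes and no overlap is observed, because the lock spans the whole request/reply exchange.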

This may be the reason for your deadlock. The short-term workaround may then be to surround your RPCs to the nameserver with a user-space mutex. In the long term, we may want to revisit the interaction between locking implementations and the stub code. 

Not sure if this helps you in any way (and others, please correct me if I'm wrong about the current stub operations!). 

Best,

 -- Mothy

At Mon, 13 Jan 2014 15:54:40 +0100, Kornilios Kourtis <kornilios.kourtis at inf.ethz.ch> wrote:
> Hi Mark,
> 
> On Fri, Jan 10, 2014 at 05:58:18PM +0000, M Brown wrote:
> > Kornilios,
> >  I guess I'm a bit confused. I'm assuming that when I create a
> > thread it is "not" creating a new domain. If it is, that does not
> > make sense. Please clarify the rules and API for thread creation.
> 
> You are correct in that creating a thread does not create a new domain.
> A spanning domain is a single domain that spans multiple cores. I 
> assumed that this is what you were using, but I was probably wrong.
> Sorry for the confusion.
> 
> The fact remains, however, that the messaging infrastructure is not 
> thread safe and cannot be safely used by multiple threads.
> 
> If I understand correctly, you are using the nameservice client from 
> multiple threads. Since the nameservice client does not do any 
> synchronization, the messaging infrastructure ends up being 
> used concurrently by multiple threads, which I believe is what triggers 
> the failed assertion.
> 
> Hope this helps. Please, let me know if I'm missing something.
> 
> cheers,
> Kornilios.
> 
> >
> > Mark Brown
> >
> > -----Original Message-----
> > From: Kornilios Kourtis [mailto:kornilios.kourtis at inf.ethz.ch]
> > Sent: Friday, January 10, 2014 10:35 AM
> > To: M Brown
> > Cc: Timothy Roscoe; barrelfish-users at lists.inf.ethz.ch; debashis 
> > bhattacharya
> > Subject: Re: [Barrelfish-users] Assertion fired when calling 
> > "nameserver_register" function
> >
> > Hi Mark,
> >
> > On Thu, Jan 09, 2014 at 05:48:07PM +0000, M Brown wrote:
> > > Kornilios,
> > >
> > >  I believe that I have found a bug in the nameserver. The problem 
> > > appears to be in the use of the nameserver functions in multiple 
> > > threads running on a single core. In this case the nameserver 
> > > functions are non-reentrant. To illustrate this I took your 
> > > message example (xmpl-msg) and wrapped both the client and server 
> > > functions in separate threads and ran the application on a single 
> > > core. The same assertion fired as illustrated below:
> > >
> > > The nameservice blocking lookup is invoked by the client 
> > > application near the beginning to get the iref. The server then 
> > > tries to register the iref in the nameserver and gets the assertion.
> > >
> > > In general for the nameserver to be useful, it needs to be 
> > > accessible by any number of threads running on any number of cores 
> > > in a concurrent manner.
> >
> > It seems that you are using multiple threads (spanned domains) to access the messaging infrastructure of Barrelfish. As far as I know, spanned domains were implemented as a quick way to enable running shared-memory applications, and a significant part of the Barrelfish infrastructure (e.g., messaging) will not work correctly with them. To the best of my knowledge, the only things that work reliably on a spanned domain are the thread synchronization primitives.
> >
> > I'm afraid this is not an easy problem to fix. My suggestion would be to avoid spanned domains altogether. If this is not possible the only quick solution I can think of is protecting libbarrelfish invocations with a lock.
> >
> > cheers,
> > Kornilios.
> >
> >
> > >
> > > Thanks,
> > >
> > > Mark Brown
> > >
> > > -----Original Message-----
> > > From: Kornilios Kourtis [mailto:kornilios.kourtis at inf.ethz.ch]
> > > Sent: Thursday, December 19, 2013 6:33 AM
> > > To: M Brown
> > > Cc: barrelfish-users at lists.inf.ethz.ch
> > > Subject: Re: [Barrelfish-users] Assertion fired when calling 
> > > "nameserver_register" function
> > >
> > > Dear Mark,
> > >
> > > On Mon, Dec 16, 2013 at 05:38:37PM +0000, M Brown wrote:
> > > > Guys,
> > > >
> > > >    I’m getting the following assertion fired when calling the
> > > >    nameserver_register function from within an export callback
> > > >    function:
> > > >
> > > > [inline image of the assertion output, not preserved in the archive]
> > > >
> > > > The example message test works fine. The structure of the code I
> > > > have is as follows:
> > > >
> > > > int main(int argc, char *argv[]) {
> > > >     thread_create(myTask, NULL);
> > > >     ...
> > > > }
> > > >
> > > > int myTask(void *arg) {
> > > >     ...
> > > >     <iface>_export(NULL,
> > > >                    export_cb, connect_cb,
> > > >                    get_default_waitset(),
> > > >                    IDC_EXPORT_FLAGS_DEFAULT);
> > > >     return 0;
> > > > }
> > > >
> > > > void export_cb(void *st, errval_t err, iref_t iref) {
> > > >     ...
> > > >     // The assertion fires within this invocation
> > > >     nameserver_register("iface", iref);
> > > > }
> > > >
> > > > Is there something I’m doing wrong here?
> > >
> > > [I'm guessing you mean nameservice_register() above]
> > >
> > > Judging from the failed assertion (!_rpc->rpc_in_progress), I'm
> > > guessing that it might have something to do with using multiple
> > > threads. What are the other threads doing? Can you reproduce the
> > > problem when using a single thread?
> > >
> > > cheers,
> > > Kornilios.
> > >
> > > --
> > > Kornilios Kourtis
> > >
> > > _______________________________________________
> > > Barrelfish-users mailing list
> > > Barrelfish-users at lists.inf.ethz.ch
> > > https://lists.inf.ethz.ch/mailman/listinfo/barrelfish-users
> > >
> >
> >
> >
> > --
> > Kornilios Kourtis
> 
> --
> Kornilios Kourtis
> 

