[Barrelfish-users] Assertion fired when calling "nameserver_register" function

Timothy Roscoe troscoe at inf.ethz.ch
Mon Feb 3 11:19:42 CET 2014


Dear Mark,

You're right that a multithreaded environment is the common case.  With 
Barrelfish, the interface to remote servers started as a pure 
message-passing one, with RPC semantics being added later.  An early 
decision was to avoid adding thread safety to a particular 
message-passing channel for efficiency reasons; for many use-cases 
(including multithreaded ones) this makes sense.  In the case of the 
nameserver, it's not ideal.

For some of our remote services (such as the memory server), all the RPC 
calls are wrapped inside a client-side library, where we can hide the 
locking from the user.   In the case of the nameserver, we don't, but 
clearly we should...
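
As a rough illustration of what hiding the lock in a client-side 
library could look like, here is a minimal sketch.  It uses POSIX 
threads so that it is self-contained; on Barrelfish one would 
presumably use the thread_mutex primitives instead.  The names 
raw_lookup_rpc() and locked_lookup() are hypothetical stand-ins, not 
the actual nameservice client API, and the assertion only mimics the 
rpc_in_progress check in the generated stubs.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-in for a non-thread-safe RPC stub.  Like the
 * rpc_in_progress check in the real stubs, it asserts if two calls
 * overlap on the same binding. */
static int rpc_in_progress = 0;
static int rpc_calls = 0;

static int raw_lookup_rpc(const char *name, unsigned *iref_out)
{
    assert(!rpc_in_progress);            /* fires on concurrent use */
    rpc_in_progress = 1;
    *iref_out = (unsigned)strlen(name);  /* dummy reply for the sketch */
    rpc_calls++;
    rpc_in_progress = 0;
    return 0;
}

/* One lock per binding, kept private to the client library. */
static pthread_mutex_t ns_binding_lock = PTHREAD_MUTEX_INITIALIZER;

/* What callers see: the lock is taken before the request is sent and
 * released only after the reply has been received. */
int locked_lookup(const char *name, unsigned *iref_out)
{
    pthread_mutex_lock(&ns_binding_lock);
    int err = raw_lookup_rpc(name, iref_out);
    pthread_mutex_unlock(&ns_binding_lock);
    return err;
}
```

Callers then use locked_lookup() from any thread and never see the 
binding or its lock directly.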

One alternative implementation (which we don't have right now) would be 
to allow lots of outstanding RPCs over a single channel, and then 
dynamically match replies to waiting threads when they arrive.  This 
would require an additional level of accounting not provided by current 
Barrelfish waitsets (which, like Unix file descriptor sets, allow you 
to wait on channels but not on specific messages), but for RPC-style 
interactions might be a better model.
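
To make the extra accounting concrete, here is a small sketch of a 
table that matches replies to outstanding requests by id.  This is 
not existing Barrelfish code; every name in it (pending_rpc, 
rpc_begin, rpc_deliver, rpc_try_collect) is made up for illustration. 
It leaves out the locking and the blocking/wakeup of waiting threads 
that a real implementation would need, and only shows the 
bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8

/* One slot per RPC currently outstanding on the shared channel. */
struct pending_rpc {
    bool     in_use;
    bool     done;
    uint32_t id;
    int      reply;
};

static struct pending_rpc pending[MAX_PENDING];
static uint32_t next_id = 1;   /* id 0 is reserved for "no free slot" */

/* Called before sending a request: reserve a slot and return the id
 * to put in the request.  Returns 0 if the table is full. */
uint32_t rpc_begin(void)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].in_use) {
            pending[i] = (struct pending_rpc){ .in_use = true, .id = next_id };
            uint32_t id = next_id++;
            if (next_id == 0) {
                next_id = 1;   /* skip the reserved value on wraparound */
            }
            return id;
        }
    }
    return 0;
}

/* Called by whichever thread receives a reply from the channel:
 * route it to the matching slot.  Returns false for unknown ids. */
bool rpc_deliver(uint32_t id, int reply)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use && !pending[i].done && pending[i].id == id) {
            pending[i].reply = reply;
            pending[i].done = true;
            return true;
        }
    }
    return false;
}

/* Called by the thread that issued request `id`: pick up the reply
 * if it has arrived and free the slot. */
bool rpc_try_collect(uint32_t id, int *reply_out)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use && pending[i].done && pending[i].id == id) {
            *reply_out = pending[i].reply;
            pending[i].in_use = false;
            return true;
        }
    }
    return false;
}
```

With something like this, replies may arrive in any order and each 
one still reaches the thread that issued the matching request.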

That said, I'm still not sure why you could not use a blocking call to 
the nameserver.   If the nameserver channel binding is protected by a 
client-side lock, you would only see deadlock if you needed to receive 
(and reply to) an asynchronous message from the nameserver before it 
would return, but I'm not sure I understand your use case in sufficient 
detail.
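
For reference, the polling workaround Mark describes in the quoted 
message below might look roughly like the following sketch.  It is 
only a guess at the structure: it uses POSIX threads rather than 
Barrelfish threads, and try_lookup(), fake_register() and 
poll_lookup() are hypothetical names.  The point it shows is that the 
lock is held only for each individual attempt, so another thread in 
the same domain can use the binding (for example, to register the 
name) between polls.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

static pthread_mutex_t ns_poll_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical nameserver state: the name is either registered or not. */
static bool registered = false;
static int attempts = 0;

/* Hypothetical non-blocking lookup: fails until the name exists. */
static bool try_lookup(const char *name, unsigned *iref_out)
{
    (void)name;
    attempts++;
    if (!registered) {
        return false;
    }
    *iref_out = 42;                 /* dummy iref for the sketch */
    return true;
}

/* What a server thread would do; it needs the same lock. */
void fake_register(void)
{
    pthread_mutex_lock(&ns_poll_lock);
    registered = true;
    pthread_mutex_unlock(&ns_poll_lock);
}

/* Poll the non-blocking lookup, dropping the lock between attempts so
 * that a thread calling fake_register() can make progress. */
bool poll_lookup(const char *name, unsigned *iref_out, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        pthread_mutex_lock(&ns_poll_lock);
        bool found = try_lookup(name, iref_out);
        pthread_mutex_unlock(&ns_poll_lock);
        if (found) {
            return true;
        }
        sched_yield();              /* let other threads run */
    }
    return false;
}
```

A blocking lookup made while holding the same lock would, by 
contrast, keep the registering thread out until the lookup returned.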

  -- Mothy


On 02/01/2014 12:00 AM, M Brown wrote:
> Mothy, Kornilios, Thanks for the expanded explanation. I understand
> what is going on. I did what you suggested and wrapped a mutex lock
> around the nameserver calls. Unfortunately I couldn't use the
> blocking form of the nameserver lookup as that could cause a
> deadlock. I had to poll on the non-blocking form, which I don't like
> to do, but since the nameserver is only used during initialization to
> set up connections, it wasn't much of a worry. Anyway, I am still
> puzzled as to your intended use model of the nameserver. The
> multi-threaded environment seems like a normal concurrent use model.
> How did you intend the nameserver to be used in such a situation?
> Thanks again,
>
> Mark Brown Huawei Technologies Inc. 5340 Legacy Dr., Suite 175 Plano,
> TX 75024 Tel: 469-277-5700 x5870 Email: m.brown at huawei.com
>
> -----Original Message----- From: Timothy Roscoe
> [mailto:troscoe at inf.ethz.ch] Sent: Monday, January 13, 2014 11:29 AM
> To: Kornilios Kourtis Cc: barrelfish-users at lists.inf.ethz.ch;
> debashis bhattacharya; M Brown; Timothy Roscoe Subject: Re:
> [Barrelfish-users] Assertion fired when calling "nameserver_register"
> function
>
>
> Hi there,
>
> Just to elaborate a little on this: each communication binding or
> channel in Barrelfish is, by itself, completely asynchronous - it can
> send a message and can be polled for received messages at any time.
> This is much like L4 or other microkernels.
>
> Obviously, if you are implementing a Remote Procedure Call-style
> model on top of this, it's important on the client side that the
> thread which issues the request message is also the thread which
> receives the corresponding reply, otherwise deadlock is quite
> likely.
>
> Threads wait on a channel (or set of channels) in Barrelfish by
> actually waiting on a 'waitset', in the spirit of Unix select() or
> poll().   There is a default waitset which most bindings are put
> into when created.
>
> When you do an RPC style call, the client stub puts the channel into
> a newly-created, anonymous waitset, sends the request on the channel,
> then waits on the waitset for the reply.
>
> Consequently, if another thread then tries to send on the channel in
> the meantime, it will put the channel into another anonymous waitset,
> and the original thread will never receive a reply.
>
> What we could have done instead is to introduce a lock around each
> channel, which is acquired by a thread (in the stub code) before
> each RPC and released when the reply is received.   We didn't
> implement this in part to keep locking out of the stubs unless
> absolutely necessary (and if you simply use one-way messaging, you
> almost certainly don't want to mess with locks), and partly to keep
> the stubs independent of any particular user-level threading model.
> However, in this case, with a pure-RPC interface, it might have made
> more sense to put locks in.
>
> This may be the reason for your deadlock.  The short-term workaround
> may then be to surround your RPCs to the nameserver with a user-space
> mutex.  In the long-term, we may want to revisit the interaction
> between locking implementations and the stub code.
>
> Not sure if this helps you in any way (and others, please correct me
> if I'm wrong about the current stub operations!).
>
> Best,
>
> -- Mothy
>
> At Mon, 13 Jan 2014 15:54:40 +0100, Kornilios Kourtis
> <kornilios.kourtis at inf.ethz.ch> wrote:
>> Hi Mark,
>>
>> On Fri, Jan 10, 2014 at 05:58:18PM +0000, M Brown wrote:
>>> Kornilios, I guess I'm a bit confused. I'm assuming that when I
>>> create a thread it is "not" creating a new domain. If it is, that
>>> does not make sense. Please clarify the rules and API for thread
>>> creation.
>>
>> You are correct in that creating a thread does not create a new
>> domain. A spanning domain is a single domain that spans multiple
>> cores. I assumed that this is what you were using, but I was
>> probably wrong. Sorry for the confusion.
>>
>> The fact remains, however, that the messaging infrastructure is
>> not thread safe and cannot be safely used by multiple threads.
>>
>> If I understand correctly, you are using the nameservice client
>> from multiple threads. Since the nameservice client does not do
>> any synchronization, the messaging infrastructure ends up being
>> concurrently used by multiple threads, which I believe is what
>> triggers the failed assertion.
>>
>> Hope this helps. Please, let me know if I'm missing something.
>>
>> cheers, Kornilios.
>>
>>>
>>> Mark Brown Huawei Technologies Inc. 5340 Legacy Dr., Suite 175
>>> Plano, TX 75024 Tel: 469-277-5700 x5870 Email:
>>> m.brown at huawei.com
>>>
>>>
>>> -----Original Message----- From: Kornilios Kourtis
>>> [mailto:kornilios.kourtis at inf.ethz.ch] Sent: Friday, January 10,
>>> 2014 10:35 AM To: M Brown Cc: Timothy Roscoe;
>>> barrelfish-users at lists.inf.ethz.ch; debashis bhattacharya
>>> Subject: Re: [Barrelfish-users] Assertion fired when calling
>>> "nameserver_register" function
>>>
>>> Hi Mark,
>>>
>>> On Thu, Jan 09, 2014 at 05:48:07PM +0000, M Brown wrote:
>>>> Kornilios,
>>>>
>>>> I believe that I have found a bug in the nameserver. The
>>>> problem appears to be in the use of the nameserver functions in
>>>> multiple threads running on a single core. In this case the
>>>> nameserver functions are non-reentrant. To illustrate this I
>>>> took your message example (xmpl-msg) and wrapped both the
>>>> client and server functions in separate threads and ran the
>>>> application on a single core. The same assertion fired as
>>>> illustrated below:
>>>>
>>>> The nameservice blocking lookup is invoked by the client
>>>> application near the beginning to get the iref. The server
>>>> then tries to register the iref in the nameserver and gets the
>>>> assertion.
>>>>
>>>> In general for the nameserver to be useful, it needs to be
>>>> accessible by any number of threads running on any number of
>>>> cores in a concurrent manner.
>>>
>>> It seems that you are using multiple threads (spanned domains) to
>>> access the messaging infrastructure of Barrelfish. As far as I
>>> know, spanned domains were implemented as a quick way to enable
>>> running shared-memory applications, and a significant part of the
>>> Barrelfish infrastructure (e.g., messages) will not work
>>> correctly with them. To the best of my knowledge, the only thing
>>> that works reliably on a spanned domain is thread synchronization
>>> primitives.
>>>
>>> I'm afraid this is not an easy problem to fix. My suggestion
>>> would be to avoid spanned domains altogether. If this is not
>>> possible, the only quick solution I can think of is protecting
>>> libbarrelfish invocations with a lock.
>>>
>>> cheers, Kornilios.
>>>
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Mark Brown
>>>> Huawei Technologies Inc.
>>>> 5340 Legacy Dr., Suite 175
>>>> Plano, TX 75024
>>>> Tel: 469-277-5700 x5870
>>>> Email: m.brown at huawei.com
>>>>
>>>> -----Original Message----- From: Kornilios Kourtis
>>>> [mailto:kornilios.kourtis at inf.ethz.ch] Sent: Thursday, December
>>>> 19, 2013 6:33 AM To: M Brown Cc:
>>>> barrelfish-users at lists.inf.ethz.ch Subject: Re:
>>>> [Barrelfish-users] Assertion fired when calling
>>>> "nameserver_register" function
>>>>
>>>> Dear Mark,
>>>>
>>>> On Mon, Dec 16, 2013 at 05:38:37PM +0000, M Brown wrote:
>>>>> Guys,
>>>>>
>>>>> I’m getting the following assertion fired when calling the
>>>>> nameserver_register function from within an export callback
>>>>> function:
>>>>>
>>>>> [cid]
>>>>>
>>>>> The example message test works fine. The structure of the
>>>>> code I have is as follows:
>>>>>
>>>>> main {
>>>>>     thread_create(myTask, NULL);
>>>>>     ...
>>>>> }
>>>>>
>>>>> int myTask(void* arg) {
>>>>>     ...
>>>>>     <iface>_export(NULL,
>>>>>                    export_cb, connect_cb,
>>>>>                    get_default_waitset(),
>>>>>                    IDC_EXPORT_FLAGS_DEFAULT);
>>>>> }
>>>>>
>>>>> void export_cb(void *st, errval_t err, iref_t iref) {
>>>>>     ...
>>>>>     // The assertion fires within this invocation
>>>>>     nameserver_register("iface", iref);
>>>>> }
>>>>>
>>>>> Is there something I’m doing wrong here?
>>>>
>>>> [I'm guessing you mean nameservice_register() above]
>>>>
>>>> Judging from the failed assertion (!_rpc->rpc_in_progress),
>>>> I'm guessing that it might have something to do with using
>>>> multiple threads. What are the other threads doing? Can you
>>>> reproduce the problem when using a single thread?
>>>>
>>>> cheers,
>>>> Kornilios.
>>>>
>>>> --
>>>> Kornilios Kourtis
>>>>
>>>> _______________________________________________
>>>> Barrelfish-users mailing list
>>>> Barrelfish-users at lists.inf.ethz.ch
>>>> https://lists.inf.ethz.ch/mailman/listinfo/barrelfish-users
>>>>
>>>
>>>
>>>
>>> -- Kornilios Kourtis
>>
>> -- Kornilios Kourtis
>>


