[Barrelfish-users] Threads on different cores and timers lead to deadlock

Andrew Baumann Andrew.Baumann at microsoft.com
Thu Dec 20 10:16:54 CET 2012


Hi Lukas,

The first thing to note is that the timer library you're using (<timer/timer.h>) has been superseded by deferred events (<barrelfish/deferred.h>), which should be used for any new code. (We should really get around to updating the existing clients of the timer library and removing it.) The old timer library relies on IDC to a driver to function, which we've realised was a bad idea, whereas the new deferred events code plugs into the waitset logic directly and runs off the kernel timer, so it is more accurate and not prone to the type of deadlock that you're seeing.
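As a rough illustration of why a timer that lives inside the waitset cannot block on a reply from another domain, here is a toy model in plain C. Every name in it (sketch_waitset, sketch_deferred_register, sketch_dispatch and so on) is hypothetical and is not the <barrelfish/deferred.h> API; the clock is passed in as a plain number instead of being read from a kernel timer, and fired events are simply disarmed rather than removed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical names, NOT the <barrelfish/deferred.h> API. */
#define SKETCH_MAX_DEFERRED 4

typedef void (*sketch_callback_fn)(void *arg);

struct sketch_deferred {
    uint64_t deadline_us;        /* absolute time at which to fire */
    sketch_callback_fn callback; /* handler run by the dispatch loop */
    void *arg;
    bool armed;
};

struct sketch_waitset {
    struct sketch_deferred *timers[SKETCH_MAX_DEFERRED];
    size_t ntimers;
};

/* Arm an event directly on the waitset; no message is sent to any driver. */
bool sketch_deferred_register(struct sketch_waitset *ws,
                              struct sketch_deferred *ev,
                              uint64_t now_us, uint64_t delay_us,
                              sketch_callback_fn cb, void *arg)
{
    if (ws->ntimers == SKETCH_MAX_DEFERRED) {
        return false; /* toy model: fixed capacity, slots are never reused */
    }
    ev->deadline_us = now_us + delay_us;
    ev->callback = cb;
    ev->arg = arg;
    ev->armed = true;
    ws->timers[ws->ntimers++] = ev;
    return true;
}

/* Time left is computed from local state, so this call cannot block. */
uint64_t sketch_deferred_remaining(const struct sketch_deferred *ev,
                                   uint64_t now_us)
{
    if (!ev->armed || now_us >= ev->deadline_us) {
        return 0;
    }
    return ev->deadline_us - now_us;
}

/* One dispatch pass: fire every armed event whose deadline has passed.
 * Returns the number of callbacks that ran. */
size_t sketch_dispatch(struct sketch_waitset *ws, uint64_t now_us)
{
    size_t fired = 0;
    for (size_t i = 0; i < ws->ntimers; i++) {
        struct sketch_deferred *ev = ws->timers[i];
        if (ev->armed && now_us >= ev->deadline_us) {
            ev->armed = false;
            ev->callback(ev->arg);
            fired++;
        }
    }
    return fired;
}

/* Example callback: counts how often it has been invoked. */
void sketch_count(void *arg)
{
    (*(int *)arg)++;
}
```

In this model, asking how much time is left only reads local state, whereas a design that has to wait for a reply message from a driver can stall if that reply is never dispatched. Whether the real deferred events code offers a direct counterpart to timer_remaining is not assumed here.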

To try to answer your questions: all the IDC mechanisms are core-local, including messages_wait_and_handle_next(), which exists only for backwards compatibility and shouldn't be used in new code. However, I suspect that by having the second core spin in while (1) you're preventing one of the internal event handlers needed by Flounder from ever being serviced.
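The suspected starvation can be sketched with a toy event queue in plain C. All names here (toy_waitset, toy_post, toy_spin, toy_dispatch_loop and so on) are hypothetical and not Barrelfish or Flounder APIs, and the loops are bounded only so that the sketch terminates. The point it tries to show is that a pending handler runs only if the thread that owns the queue actually dispatches it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, NOT Barrelfish or Flounder APIs. */
#define TOY_MAX_EVENTS 8

typedef void (*toy_handler_fn)(void *arg);

struct toy_event {
    toy_handler_fn handler;
    void *arg;
};

/* A fixed-size ring buffer of pending events, standing in for a waitset. */
struct toy_waitset {
    struct toy_event pending[TOY_MAX_EVENTS];
    size_t head;
    size_t count;
};

/* Queue an event; returns false if the ring buffer is full. */
bool toy_post(struct toy_waitset *ws, toy_handler_fn handler, void *arg)
{
    if (ws->count == TOY_MAX_EVENTS) {
        return false;
    }
    size_t tail = (ws->head + ws->count) % TOY_MAX_EVENTS;
    ws->pending[tail].handler = handler;
    ws->pending[tail].arg = arg;
    ws->count++;
    return true;
}

/* Run the oldest pending handler; returns false if nothing was pending. */
bool toy_dispatch_one(struct toy_waitset *ws)
{
    if (ws->count == 0) {
        return false;
    }
    struct toy_event ev = ws->pending[ws->head];
    ws->head = (ws->head + 1) % TOY_MAX_EVENTS;
    ws->count--;
    ev.handler(ev.arg);
    return true;
}

/* Stand-in for an internal handler that some other party is waiting on. */
void toy_internal_handler(void *arg)
{
    *(bool *)arg = true;
}

/* Stand-in for while (1) {}: burns iterations and never dispatches,
 * so the pending handler cannot run. Returns whether it was serviced. */
bool toy_spin(struct toy_waitset *ws, const bool *serviced, int iterations)
{
    (void)ws; /* deliberately unused: spinning ignores the waitset */
    for (int i = 0; i < iterations; i++) {
        /* busy wait */
    }
    return *serviced;
}

/* Idle by dispatching instead: pending handlers get their turn. */
bool toy_dispatch_loop(struct toy_waitset *ws, const bool *serviced,
                       int iterations)
{
    for (int i = 0; i < iterations && !*serviced; i++) {
        toy_dispatch_one(ws);
    }
    return *serviced;
}
```

With one event posted, toy_spin() leaves it queued no matter how long it runs, while toy_dispatch_loop() services it on its first iteration. Under the assumption that the second thread's waitset holds such an internal event, replacing the bare spin with a loop that keeps dispatching would be the analogous change.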

Andrew

-----Original Message-----
From: Lukas Humbel [mailto:humbell at ethz.ch] 
Sent: Wednesday, 19 December 2012 11:17
To: barrelfish-users at lists.inf.ethz.ch
Subject: [Barrelfish-users] Threads on different cores and timers lead to deadlock

Hi all,

I'm having problems avoiding a deadlock when using the timer code, 
especially the function timer_remaining. It blocks in 
messages_wait_and_handle_next. There is also a comment next to it:

// XXX: shouldn't block on default waitset! if we're in a callback we'll 
deadlock

I'm not calling the function from within a callback. What does seem 
related is that I set up a flounder binding just before: when I remove 
it, the problem vanishes.

Maybe this lock is just a symptom of a more fundamental problem, so let 
me tell you what I'm trying to do:
I have one domain which spans two cores, with two threads (shared 
vspace), one on each core. The first thread exports a flounder 
interface, spins until a connection has been set up, then enters the 
timer test (create, start, check remaining) and locks up. The second 
thread connects to the interface, calls messages_wait_and_handle_next 
until the connection setup callback is called, and then enters a 
while(1){}. I'm using the default waitset everywhere. The flounder 
connection looks fine, as I can perform RPCs.

Is this setup possible with flounder? Or is there something that 
prevents flounder from working when client and server are using the same 
vspace?

Is messages_wait_and_handle_next "core aware"? That is, if I call it on 
core 0, will it only receive messages for core 0? (As far as I can see 
from my debug output, this seems to be true.) And just out of curiosity: 
how about two threads on the same core?

Do you have any ideas why this lock happens and how to avoid it?


Cheers,
Lukas

_______________________________________________
Barrelfish-users mailing list
Barrelfish-users at lists.inf.ethz.ch
https://lists.inf.ethz.ch/mailman/listinfo/barrelfish-users





