[Barrelfish-users] Threads on different cores and timers lead to deadlock

Lukas Humbel humbell at ethz.ch
Thu Dec 20 15:03:38 CET 2012


Hi Andrew,

Thanks. I'll try using deferred events and see if that works. Since 
messages_wait_and_handle_next is deprecated, what should I use 
instead? Just inline the function?
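
To make sure I understand what "inlining" would mean: I assume 
messages_wait_and_handle_next is essentially a wrapper around dispatching 
a single event on the default waitset, so the replacement would be roughly 
this (just my reading of it, not checked against the source):

    #include <barrelfish/barrelfish.h>

    static errval_t handle_one_event(void)
    {
        // handle exactly one pending event on this dispatcher's
        // default waitset, blocking until one arrives
        return event_dispatch(get_default_waitset());
    }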

I already tried replacing the while(1); loop with something that handles 
messages instead, but no luck there either...

Lukas

On 12/20/2012 10:16 AM, Andrew Baumann wrote:
> Hi Lukas,
>
> The first thing to note is that the timer library you're using (<timer/timer.h>) has been superseded by the deferred events API (<barrelfish/deferred.h>), which should be used for any new code. (We should really get around to updating the existing clients of the timer library and removing it.) The old timer library relies on IDC to a driver to function, which we've since realised was a bad idea, whereas the new deferred events code plugs into the waitset logic directly and runs off the kernel timer, so it is more accurate and not prone to the kind of deadlock you're seeing.
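>
> To give you an idea of the pattern, here is a minimal sketch of a one-shot deferred event (the handler name and the 500 ms delay are just placeholders; check <barrelfish/deferred.h> for the exact signatures):
>
>     #include <barrelfish/barrelfish.h>
>     #include <barrelfish/deferred.h>
>
>     static void timeout_handler(void *arg)
>     {
>         // runs on this dispatcher the next time the waitset is
>         // dispatched after the delay has elapsed
>     }
>
>     static errval_t arm_timeout(void)
>     {
>         static struct deferred_event ev;
>         deferred_event_init(&ev);
>         // one-shot event, 500ms from now, delivered via the default waitset
>         return deferred_event_register(&ev, get_default_waitset(),
>                                        500 * 1000 /* delay in microseconds */,
>                                        MKCLOSURE(timeout_handler, NULL));
>     }
>
> The callback only fires when that waitset is actually dispatched, e.g. from an event_dispatch() loop, so nothing here needs to talk to a driver.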
>
> To try to answer your questions: all the IDC mechanisms, including messages_wait_and_handle_next(), which exists only for backwards compatibility and shouldn't be used in new code, are core-local. However, I suspect that by having the second core spin in while (1) you're preventing one of the internal event handlers needed by Flounder from being serviced.
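>
> Concretely, instead of while (1); the second thread could keep dispatching events, e.g. something like this (dispatch_loop is just a placeholder name, and I'm assuming you stay on the default waitset):
>
>     #include <stdbool.h>
>     #include <barrelfish/barrelfish.h>
>
>     static void dispatch_loop(void)
>     {
>         // keep servicing the default waitset so Flounder's internal
>         // event handlers (and any deferred events) get a chance to run
>         while (true) {
>             errval_t err = event_dispatch(get_default_waitset());
>             if (err_is_fail(err)) {
>                 USER_PANIC_ERR(err, "in event_dispatch");
>             }
>         }
>     }
>
> That way the second core keeps making progress on its events instead of starving them.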
>
> Andrew
>
> -----Original Message-----
> From: Lukas Humbel [mailto:humbell at ethz.ch]
> Sent: Wednesday, 19 December 2012 11:17
> To: barrelfish-users at lists.inf.ethz.ch
> Subject: [Barrelfish-users] Threads on different cores and timers lead to deadlock
>
> Hi all,
>
> I'm having problems avoiding a deadlock when using the timer code,
> specifically the function timer_remaining. It blocks in
> messages_wait_and_handle_next. There is also a comment next to it:
>
> // XXX: shouldn't block on default waitset! if we're in a callback we'll deadlock
>
> I'm not calling the function from within a callback. What does seem
> related, though, is that I set up a Flounder binding just before; when
> I remove it, the problem vanishes.
>
> Maybe this deadlock is just a symptom of a more fundamental problem, so
> let me tell you what I'm trying to do:
> I have one domain that spans two cores and has two threads (shared
> vspace), one on each core. The first thread exports a Flounder
> interface, spins until a connection has been set up, then enters the
> timer test (create, start, check remaining) and hangs. The second
> thread connects to the interface, calls messages_wait_and_handle_next
> until the connection-setup callback has fired, and then enters a
> while(1){}. I'm using the default waitset everywhere, and the Flounder
> connection looks fine, as I can perform RPCs.
>
> Is this setup possible with Flounder? Or is there something that
> prevents Flounder from working when client and server are using the
> same vspace?
>
> Is messages_wait_and_handle_next "core aware"? That is, if I call it on
> core 0, will it only receive messages for core 0? (As far as I can tell
> from my debug output, this seems to be true.) And, just out of
> curiosity: what about two threads on the same core?
>
> Do you have any ideas why this deadlock happens and how to avoid it?
>
>
> Cheers,
> Lukas
>
> _______________________________________________
> Barrelfish-users mailing list
> Barrelfish-users at lists.inf.ethz.ch
> https://lists.inf.ethz.ch/mailman/listinfo/barrelfish-users
>
>
>



