[Barrelfish-users] Bindings

Georgios Varisteas yorgos at kth.se
Sat Dec 3 16:50:19 CET 2011


When a client application running on core 1 exits (main() returns the result) I get the following:

ERROR: monitor.1 in error_handler() ../src/lib/barrelfish/monitor_client.c:26
ERROR: asynchronous error in monitor binding
Failure: (         kernel) Capability not found (empty slot encountered) [SYS_ERR_CAP_NOT_FOUND]
Aborted
kernel 1: monitor terminated; expect badness!

I have two guesses:
1) some queued message fails to deliver, which seems unlikely;
2) the client fails to tear down its bindings and/or other state.

If the second case is true, how can I disconnect and unbind two apps? I can't find anything on that.

cheers,
Georgios


________________________________________
From: Georgios Varisteas [yorgos at kth.se]
Sent: Friday, December 02, 2011 18:21
To: Baumann  Andrew; barrelfish-users at lists.inf.ethz.ch
Subject: Re: [Barrelfish-users] Bindings

Thank you Andrew. I thought as much and had already implemented that. By the way, your and Tim's help was invaluable. My system finally executes fully. The cool thing is that you can use existing WOOL apps totally unmodified; you just have to include the library in the Hakefile.

Best regards,
Georgios

________________________________________
From: Baumann  Andrew [andrewb at inf.ethz.ch]
Sent: Friday, December 02, 2011 17:17
To: Georgios Varisteas; barrelfish-users at lists.inf.ethz.ch
Subject: RE: Bindings

You don't get any information about the other end of the connection on a bind request. If you need that, you'll need to set up the binding, and then exchange that information in a handshake message.
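
A minimal sketch of such a handshake, written as plain C rather than against the real binding API. Every name in it (struct peer_state, MSG_HELLO, server_handle) is hypothetical and only illustrates the idea: the server keeps per-binding state, learns the peer's identity from the first message, and refuses other requests until then. The identity is self-reported, so it should not be trusted for anything security-relevant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-binding state kept by the server. */
struct peer_state {
    bool     identified;  /* false until the handshake has arrived */
    uint32_t peer_id;     /* self-reported by the peer, not verified */
};

/* Hypothetical message types for this sketch. */
enum msg_type { MSG_HELLO, MSG_REQUEST };

struct msg {
    enum msg_type type;
    uint32_t      payload;  /* peer id for MSG_HELLO, request data otherwise */
};

/* Server-side receive handler: returns 0 if accepted, -1 if rejected. */
int server_handle(struct peer_state *st, const struct msg *m)
{
    assert(st != NULL && m != NULL);

    switch (m->type) {
    case MSG_HELLO:
        if (st->identified) {
            return -1;              /* a second handshake is an error */
        }
        st->peer_id = m->payload;   /* remember who is on the other end */
        st->identified = true;
        return 0;
    case MSG_REQUEST:
        /* Only serve peers that have completed the handshake. */
        return st->identified ? 0 : -1;
    }
    return -1;                      /* unknown message type */
}
```

In a real service the per-binding state would hang off the binding's state pointer, and the client would send the hello message from its bind continuation.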

I guess we could pass a word of information along with the bind request, but since the remote side could make this up anyway, there's little added value over deferring that to the user. We have discussed (and planned) making IREFs into actual capabilities, so in the future when we implement that we could securely deliver some information in the bind request that was encoded into the capability, but for now you get nothing, sorry :)

Andrew

-----Original Message-----
From: Georgios Varisteas [mailto:yorgos at kth.se]
Sent: Friday, 02 December, 2011 7:06
To: barrelfish-users at lists.inf.ethz.ch
Subject: Re: [Barrelfish-users] Bindings

I can't figure out how to identify who requested to bind in the connect callback function. The st argument is the same argument passed to export and I guess it's meant to identify which service is being bound, right? So, I thought b->st would be the argument passed for the continuation on the requester's side but it was uninitialized. Any hints?

--Georgios

PS: Thank you Tim for your input. I finally have it working. I will also give THC a try really soon; it looks to be a much cleaner implementation.


________________________________________
From: Georgios Varisteas [yorgos at kth.se]
Sent: Friday, December 02, 2011 01:46
To: Baumann  Andrew
Cc: barrelfish-users at lists.inf.ethz.ch
Subject: Re: [Barrelfish-users] Bindings

The server is quite simple. The primary instance starts, if needed spawns other instances of itself and binds to each one. Each of the other instances does the same. After exchanging some configuration messages they all reach the code fragment I sent before and listen for events, or at least that's what they ought to do. Everything is bright and dandy up to this point.

Then I manually run a client app through fish. Now this is complicated. The part of the app that communicates with the server is actually a library. The actual app (one of many) is based on some arbitrary task based programming model so it owns main(). There the lib's init function [is|should be] called, which:
 - successfully finds the primary server's iref through the name service (nameservice_blocking_lookup())
 - and then tries to bind to it. But bind is an asynchronous call, so in order to block the single thread I have at this point until the reply comes, I call messages_wait_and_handle_next().

At that point everything blocks.
If I omit messages_wait_and_handle_next() then the client continues normally but still, the reply to the binding never comes and so there is absolutely no communication with the server.

1) How else can I make bind a synchronous call?
2) How else can I make the server wait and listen?

btw I do use continuations almost everywhere else.

I really do not know what is wrong. I made an assumption before that it's the server that is not replying or not able to receive the bind request. It is the only thing that makes sense cause otherwise the client would have received the reply at some point of its execution (and I've tested inputs that take almost a full minute to finish). I assume that the callback function would actually run even if the client is doing something else, right?

Finally, somewhere between all that each instance of the server starts a timer (I do load the lpc_timer module) which never fires. This is my next obstacle. Is the duration argument in seconds or milliseconds (I do not think that this is specified anywhere in the code)?

This is a brief summary of a few thousand lines of code. The implementation I'm following is nothing different to code I've read in the tree and the docs. No real improvisations on my part.

--Georgios


________________________________________
From: Baumann  Andrew [andrewb at inf.ethz.ch]
Sent: Friday, December 02, 2011 01:03
To: Georgios Varisteas
Subject: RE: Bindings

I'm confused. I thought it was the client that was deadlocking, not the server. Can you sketch the code in the program that deadlocks, and where it gets stuck?

My point was that it is only safe/sane to have the waiting loop in main() or a function called directly from it, and not in any event handler.

Andrew

-----Original Message-----
From: Georgios Varisteas [mailto:yorgos at kth.se]
Sent: Thursday, 01 December, 2011 15:59
To: Baumann Andrew
Subject: RE: Bindings

Now I'm confused. My code in the server is the following (the main loop that waits for the client):

        while (1) {
                messages_wait_and_handle_next();
        }

This part is reached before the client binds to the server, so according to what you wrote before I assume that it has blocked and can't dispatch the binding. However, if I expand the unwanted function I get exactly what you propose, so I do not really see the difference or how it would solve my problem.

--Georgios

PS: fish also delays when I use oncore.
________________________________________
From: Baumann  Andrew [andrewb at inf.ethz.ch]
Sent: Friday, December 02, 2011 00:44
To: Georgios Varisteas; barrelfish-users at lists.inf.ethz.ch
Subject: RE: Bindings

The best way is to have a top-level message loop that never does anything other than run event handlers, i.e.:

int main(int argc, char *argv[])
{
  // set stuff up, maybe initiate some bindings

  // dispatch events forever; all real work happens in event handlers
  while (1) {
    event_dispatch(get_default_waitset());
  }
}

This way everything interesting happens in the context of an event handler, and you just have to hook them up appropriately. You never have to wait for an event to occur -- the code that runs after the event arrives should really be in a separate continuation function.
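
To make the continuation idea concrete, here is a small self-contained sketch in plain C. It does not use the real waitset or binding API: toy_waitset, toy_post, toy_dispatch, client_init and bind_done are all made-up names, and the "bind reply" is simply an event queued by hand. The point is only the shape: the function that starts the bind returns immediately, and everything that depends on the binding lives in the continuation that the top-level loop runs later.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QLEN 16

/* Toy stand-in for a waitset: a fixed-size FIFO of pending events. */
typedef void (*toy_handler_fn)(void *arg);

struct toy_event { toy_handler_fn fn; void *arg; };

struct toy_waitset {
    struct toy_event q[TOY_QLEN];
    size_t head, tail;  /* head: next event to run, tail: next free slot */
};

/* Queue an event; returns 0 on success, -1 if the queue is full. */
int toy_post(struct toy_waitset *ws, toy_handler_fn fn, void *arg)
{
    assert(ws != NULL && fn != NULL);
    if (ws->tail - ws->head == TOY_QLEN) {
        return -1;
    }
    ws->q[ws->tail % TOY_QLEN] = (struct toy_event){ fn, arg };
    ws->tail++;
    return 0;
}

/* Run exactly one pending event; returns 0 if one ran, -1 if none is pending.
 * (A real dispatcher would block instead of returning -1.) */
int toy_dispatch(struct toy_waitset *ws)
{
    if (ws->head == ws->tail) {
        return -1;
    }
    struct toy_event ev = ws->q[ws->head % TOY_QLEN];
    ws->head++;
    ev.fn(ev.arg);
    return 0;
}

struct client_state {
    struct toy_waitset *ws;
    int bound;          /* set by the bind continuation */
    int requests_sent;  /* work that may only happen once bound */
};

/* Second half of the ripped function: everything that needs the binding. */
void bind_done(void *arg)
{
    struct client_state *st = arg;
    st->bound = 1;
    st->requests_sent++;  /* e.g. send the first request from here */
}

/* First half: start the bind and return at once, without waiting.
 * Queuing bind_done by hand stands in for the asynchronous bind reply. */
int client_init(struct client_state *st)
{
    return toy_post(st->ws, bind_done, st);
}

/* Top-level loop: nothing but dispatching. It stops when idle only so
 * that this sketch terminates; returns the number of events handled. */
int run_until_idle(struct toy_waitset *ws)
{
    int handled = 0;
    while (toy_dispatch(ws) == 0) {
        handled++;
    }
    return handled;
}
```

After client_init() returns, st.bound is still 0; it only becomes 1 once the loop has dispatched the queued event, which is exactly why nothing after the bind call may assume the binding exists.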


Regarding the spawn problem, did you check whether fish isn't just as slow when spawning your program on another core? None of those paths are optimised in any significant way, so there is lots of unnecessary work happening here. The remote cores are probably waiting for ramfs file IO and memory access, all of which are bottlenecked on servers on the BSP core in the default setup.

Andrew

-----Original Message-----
From: Georgios Varisteas [mailto:yorgos at kth.se]
Sent: Thursday, 01 December, 2011 15:34
To: Baumann Andrew; barrelfish-users at lists.inf.ethz.ch
Subject: RE: Bindings

Thank you Andrew, your answer clarified many things. However, if I ought to remove "messages_wait_and_handle_next()" what else can I use to have the program loop and listen for messages?

By the way there is one more point that is not so important right now. When I'm spawning other instances (through spawn_program() as fish does) I get a significant delay when spawn_domain() is called (spawn_client.c:239). Any ideas why? I've tried to replicate what fish does and it is weird how it can spawn processes instantly but my program can't. By the way the other processes are always on other cores.


cheers,
Georgios

________________________________________
From: Baumann  Andrew [andrewb at inf.ethz.ch]
Sent: Friday, December 02, 2011 00:09
To: Georgios Varisteas; barrelfish-users at lists.inf.ethz.ch
Subject: RE: Bindings

Hi Georgios,

messages_wait_and_handle_next() is really a kludge for a legacy (and broken) IDC system, and shouldn't be used in any new code, although I'll be the first to admit that there are still too many references to it in the tree. If you look at the implementation, you'll see it's just a wrapper for "event_dispatch(get_default_waitset())". The problem with this is that it blocks indefinitely if there are no events to dispatch on the default waitset, and there's no guarantee on which event it does dispatch when it returns to you. So it's typically used in a loop testing some external condition, as in "while (!callback_has_run()) messages_wait_and_handle_next();" but even this only works when:

 1. There is only one thread dispatching the default waitset. If more than one thread dispatches the waitset, another thread might run the event that triggers callback_has_run(), while your loop blocks waiting for an event that will never arrive.

 2. The messages_wait_and_handle_next() doesn't happen in the context of an event handler. We don't support nested events on the same binding, so you will probably deadlock here.

That said, you do need to dispatch the default waitset for bindings to complete. If you want to go with a completely event-driven model, the clean thing to do is "stack-rip" your function where it blocks, so that all the logic that comes after the binding completes actually runs in the binding completion continuation. If you want to go with a threaded model, you could have one thread dispatching the waitset (and doing nothing else), and have the event handlers unblock the main thread (via mechanisms like semaphores or condition variables).
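
For the threaded variant, a rough sketch using POSIX threads and a condition variable is below. It is not Barrelfish code: bind_waiter, on_bind_reply, dispatcher_thread and bind_blocking are invented for illustration, and the dispatcher thread here just delivers one fake bind reply where a real one would loop dispatching the waitset forever.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state shared by the dispatcher thread and the main thread. */
struct bind_waiter {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            bound;  /* protected by lock */
};

/* Event handler: runs on the dispatcher thread when the bind reply arrives,
 * and unblocks whoever is waiting for the binding. */
void on_bind_reply(struct bind_waiter *w)
{
    pthread_mutex_lock(&w->lock);
    w->bound = true;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

/* Stand-in for the thread that does nothing but dispatch events.
 * In this sketch it delivers a single bind reply and exits. */
void *dispatcher_thread(void *arg)
{
    on_bind_reply(arg);
    return NULL;
}

/* Main-thread side: block until the handler has reported completion.
 * The loop guards against spurious wakeups. */
void wait_for_bind(struct bind_waiter *w)
{
    pthread_mutex_lock(&w->lock);
    while (!w->bound) {
        pthread_cond_wait(&w->cond, &w->lock);
    }
    pthread_mutex_unlock(&w->lock);
}

/* Makes the bind look synchronous to the caller; returns 0 on success. */
int bind_blocking(void)
{
    struct bind_waiter w;
    pthread_t tid;

    if (pthread_mutex_init(&w.lock, NULL) != 0) {
        return -1;
    }
    if (pthread_cond_init(&w.cond, NULL) != 0) {
        pthread_mutex_destroy(&w.lock);
        return -1;
    }
    w.bound = false;

    if (pthread_create(&tid, NULL, dispatcher_thread, &w) != 0) {
        pthread_cond_destroy(&w.cond);
        pthread_mutex_destroy(&w.lock);
        return -1;
    }

    wait_for_bind(&w);        /* main thread sleeps here, not in a handler */
    pthread_join(tid, NULL);
    assert(w.bound);

    pthread_cond_destroy(&w.cond);
    pthread_mutex_destroy(&w.lock);
    return 0;
}
```

The main thread never dispatches events itself, so it cannot end up nested inside an event handler, and only one thread touches the waitset, which sidesteps both caveats listed above.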

BTW, have you looked into using THC for your application? Is there a reason you can't do this? It makes a lot of this event machinery much nicer to program...

Andrew

-----Original Message-----
From: Georgios Varisteas [mailto:yorgos at kth.se]
Sent: Thursday, 01 December, 2011 8:02
To: barrelfish-users at lists.inf.ethz.ch
Subject: [Barrelfish-users] Bindings

Hi,

I'm really stuck on a problem cause I can't figure out its source. The bottom line is that a client freezes while waiting for a reply from the bind operation on the server's iref. Let me elaborate on it...

There is a primary instance of the server which spawns multiple other instances of itself as separate applications. Each of them gets a binding to each other, thus creating an interconnected distributed service. Communication between them provably works.

The client comes up, successfully retrieves the primary server's iref from the nameservice and executes the bind operation on it. Right afterwards it executes the "messages_wait_and_handle_next()" function to wait for the bind's reply. The reply's handler is set as the continuation to the bind call but it never executes. At that point everything freezes. If I omit the "messages_wait_and_handle_next()" then the client will proceed normally without ever having the bind's reply execute.

Any hints or ideas would be most welcome. Thanks.

--Georgios


_______________________________________________
Barrelfish-users mailing list
Barrelfish-users at lists.inf.ethz.ch
https://lists.inf.ethz.ch/mailman/listinfo/barrelfish-users



