[Barrelfish-users] Bidirectional bulk transfer

Baumann Andrew andrewb at inf.ethz.ch
Mon Apr 23 17:50:10 CEST 2012


Hi Zeus,

This doesn't make much sense to me... the part where they bind to each other is at the very end of your log:
bulktest.1: lmp TX monitor.bind_ump_client_request
monitor.1: lmp RX monitor.bind_ump_client_request
monitor.0: lmp TX monitor.bind_ump_service_request
bulktest.0: lmp RX monitor.bind_ump_service_request
bulktest.0: service got a connection!
bulktest.0: lmp TX monitor.bind_ump_reply_monitor
monitor.0: lmp RX monitor.bind_ump_reply_monitor
monitor.1: lmp TX monitor.bind_ump_reply_client
bulktest.0: START
bulktest.1: lmp RX monitor.bind_ump_reply_client
bulktest.1: 0x804e3020 client bound!
bulktest.1: START

... at this point, I would expect at least one of them to ask the monitor for an out-of-band cap transfer, but apparently that never happens?

Since you seem to have a workaround, I will try to reconstruct a test case for this after the OSDI deadline and debug it at this end.

Andrew

From: zeus at aluzina.org [mailto:zeus at aluzina.org] On Behalf Of Zeus Gómez Marmolejo
Sent: Monday, 23 April 2012 07:10
To: Baumann Andrew
Cc: barrelfish-users at lists.inf.ethz.ch
Subject: Re: [Barrelfish-users] Bidirectional bulk transfer

Ok,

I tried to circumvent it by not sending the caps at the same time, but it is still not working, even with NULL_CAP... Here is the output with FLOUNDER_DEBUG set to 1 on the monitor binding and the bulktest:

zeus.

spawnd.1: spawning /x86_64/sbin/bulktest on core 1
bulktest.0: lmp TX monitor.alloc_iref_request
bulktest.1: lmp TX monitor.get_monitor_rpc_iref_request
monitor.0: lmp RX monitor.alloc_iref_request
monitor.1: lmp RX monitor.get_monitor_rpc_iref_request
monitor.0: lmp TX monitor.alloc_iref_reply
monitor.1: lmp TX monitor.get_monitor_rpc_iref_reply
bulktest.0: lmp RX monitor.alloc_iref_reply
bulktest.1: lmp RX monitor.get_monitor_rpc_iref_reply
bulktest.0: service exported at iref 21
bulktest.1: lmp TX monitor.bind_lmp_client_request
monitor.1: lmp RX monitor.bind_lmp_client_request
monitor.1: lmp TX monitor.bind_lmp_reply_client
bulktest.1: lmp RX monitor.bind_lmp_reply_client
bulktest.1: lmp TX monitor.get_mem_iref_request
monitor.1: lmp RX monitor.get_mem_iref_request
monitor.1: lmp TX monitor.get_mem_iref_reply
bulktest.1: lmp RX monitor.get_mem_iref_reply
bulktest.1: lmp TX monitor.bind_lmp_client_request
monitor.1: lmp RX monitor.bind_lmp_client_request
monitor.1: lmp TX monitor.bind_lmp_reply_client
bulktest.1: lmp RX monitor.bind_lmp_reply_client
bulktest.1: lmp TX monitor.new_monitor_binding_request
monitor.1: lmp RX monitor.new_monitor_binding_request
monitor.1: lmp TX monitor.new_monitor_binding_reply
bulktest.1: lmp RX monitor.new_monitor_binding_reply
bulktest.1: lmp TX monitor.bind_ump_client_request
monitor.1: lmp RX monitor.bind_ump_client_request
monitor.0: lmp TX monitor.bind_ump_service_request
mem_serv.0: lmp RX monitor.bind_ump_service_request
mem_serv.0: lmp TX monitor.bind_ump_reply_monitor
monitor.0: lmp RX monitor.bind_ump_reply_monitor
monitor.1: lmp TX monitor.bind_ump_reply_client
bulktest.1: lmp RX monitor.bind_ump_reply_client
bulktest.1: lmp TX monitor.get_name_iref_request
monitor.1: lmp RX monitor.get_name_iref_request
monitor.1: lmp TX monitor.get_name_iref_reply
bulktest.1: lmp RX monitor.get_name_iref_reply
bulktest.1: lmp TX monitor.bind_lmp_client_request
monitor.1: lmp RX monitor.bind_lmp_client_request
monitor.1: lmp TX monitor.bind_lmp_reply_client
bulktest.1: lmp RX monitor.bind_lmp_reply_client
mem_serv.0: lmp TX monitor.cap_send_request
monitor.0: lmp RX monitor.cap_send_request
monitor.1: lmp TX monitor.cap_receive_request
bulktest.1: lmp RX monitor.cap_receive_request
bulktest.1: lmp TX monitor.new_monitor_binding_request
monitor.1: lmp RX monitor.new_monitor_binding_request
monitor.1: lmp TX monitor.new_monitor_binding_reply
bulktest.1: lmp RX monitor.new_monitor_binding_reply
mem_serv.0: lmp TX monitor.cap_send_request
monitor.0: lmp RX monitor.cap_send_request
monitor.1: lmp TX monitor.cap_receive_request
bulktest.1: lmp RX monitor.cap_receive_request
bulktest.1: lmp TX monitor.bind_ump_client_request
monitor.1: lmp RX monitor.bind_ump_client_request
monitor.0: lmp TX monitor.bind_ump_service_request
chips.0: lmp RX monitor.bind_ump_service_request
chips.0: lmp TX monitor.bind_ump_reply_monitor
monitor.0: lmp RX monitor.bind_ump_reply_monitor
monitor.1: lmp TX monitor.bind_ump_reply_client
bulktest.1: lmp RX monitor.bind_ump_reply_client
bulktest.1: lmp TX monitor.bind_lmp_client_request
monitor.1: lmp RX monitor.bind_lmp_client_request
monitor.1: lmp TX monitor.bind_lmp_reply_client
bulktest.1: lmp TX monitor.alloc_iref_request
monitor.1: lmp RX monitor.alloc_iref_request
monitor.1: lmp TX monitor.alloc_iref_reply
bulktest.1: lmp RX monitor.bind_lmp_reply_client
mem_serv.0: lmp TX monitor.cap_send_request
monitor.0: lmp RX monitor.cap_send_request
monitor.1: lmp TX monitor.cap_receive_request
bulktest.1: lmp RX monitor.cap_receive_request
mem_serv.0: lmp TX monitor.cap_send_request
monitor.0: lmp RX monitor.cap_send_request
monitor.1: lmp TX monitor.cap_receive_request
bulktest.1: lmp RX monitor.cap_receive_request
bulktest.1: lmp TX monitor.bind_ump_client_request
monitor.1: lmp RX monitor.bind_ump_client_request
bulktest.1: lmp RX monitor.alloc_iref_reply
monitor.0: lmp TX monitor.bind_ump_service_request
serial.0: lmp RX monitor.bind_ump_service_request
serial.0: lmp TX monitor.bind_ump_reply_monitor
monitor.0: lmp RX monitor.bind_ump_reply_monitor
monitor.1: lmp TX monitor.bind_ump_reply_client
mem_serv.0: lmp TX monitor.cap_send_request
monitor.0: lmp RX monitor.cap_send_request
monitor.1: lmp TX monitor.cap_receive_request
bulktest.1: lmp RX monitor.cap_receive_request
mem_serv.0: lmp TX monitor.cap_send_request
monitor.0: lmp RX monitor.cap_send_request
monitor.1: lmp TX monitor.cap_receive_request
bulktest.1: lmp RX monitor.cap_receive_request
bulktest.1: client looking up 'bulktest' in name service...
bulktest.1: client binding request to 21...
bulktest.1: lmp TX monitor.bind_lmp_client_request
monitor.1: lmp RX monitor.bind_lmp_client_request
monitor.1: lmp TX monitor.bind_lmp_reply_client
bulktest.1: lmp RX monitor.bind_ump_reply_client
bulktest.1: lmp RX monitor.bind_lmp_reply_client
mem_serv.0: lmp TX monitor.cap_send_request
monitor.0: lmp RX monitor.cap_send_request
monitor.1: lmp TX monitor.cap_receive_request
bulktest.1: lmp RX monitor.cap_receive_request
bulktest.1: lmp TX monitor.bind_ump_client_request
monitor.1: lmp RX monitor.bind_ump_client_request
monitor.0: lmp TX monitor.bind_ump_service_request
bulktest.0: lmp RX monitor.bind_ump_service_request
bulktest.0: service got a connection!
bulktest.0: lmp TX monitor.bind_ump_reply_monitor
monitor.0: lmp RX monitor.bind_ump_reply_monitor
monitor.1: lmp TX monitor.bind_ump_reply_client
bulktest.0: START
bulktest.1: lmp RX monitor.bind_ump_reply_client
bulktest.1: 0x804e3020 client bound!
bulktest.1: START

Here the output stops...


On 23 April 2012 06:13, Baumann Andrew <andrewb at inf.ethz.ch> wrote:
Hi Zeus,

Sorry, I've been busy with a deadline and dropped the ball on this one. Did you track down where it hangs? Which messages are sent and received when it hangs? Does it still hang if you just send a NULL_CAP instead of the frame cap?
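For the NULL_CAP test, I mean something like the following on the sending side -- just a sketch: the message name "send_cap" stands in for whatever cap-carrying message your bulktest interface defines, while NULL_CAP, NOP_CONT and the error macros are the usual libbarrelfish ones:

    /* sketch only: pass NULL_CAP where the frame cap would normally go */
    err = b->tx_vtbl.send_cap(b, NOP_CONT, NULL_CAP);
    if (err_is_fail(err)) {
        USER_PANIC_ERR(err, "sending cap");
    }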

If you can make the code as simple as possible, compile the following files with FLOUNDER_DEBUG defined to 1, and send me the output from your program, then I may be able to make some sense of it.

x86_64/lib/barrelfish/_for_lib_barrelfish/monitor_flounder_extra_bindings.c
x86_64/usr/.../bulktest/_for_app_bulktest/bulkbench_flounder_bindings.c
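(One quick way to do that -- a sketch only, the exact build plumbing may differ -- is to force the define at the top of each of those generated files before their includes, or to pass the equivalent -DFLOUNDER_DEBUG=1 on the compiler command line for just those files:)

    /* force flounder debug output on for this file only */
    #define FLOUNDER_DEBUG 1
    /* ... existing #includes of the generated binding code follow ... */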

Thanks,
Andrew

From: zeus at aluzina.org [mailto:zeus at aluzina.org] On Behalf Of Zeus Gómez Marmolejo
Sent: Wednesday, 18 April 2012 04:56

To: Baumann Andrew
Cc: barrelfish-users at lists.inf.ethz.ch
Subject: Re: [Barrelfish-users] Bidirectional bulk transfer

Hmm,

interesting. You are right: if we remove all the bulk transport stuff it also hangs, but only when both sides send the cap. If only the server or only the client sends it, it works fine.

zeus.
On 17 April 2012 20:39, Baumann Andrew <andrewb at inf.ethz.ch> wrote:
Hi,

Let's try to make this as simple as possible. What happens if you remove all the bulk transport stuff and just call frame_alloc() on each side and send the cap? Do both caps arrive? Do the replies make it back? I'd like to understand more of the state of the system when it wedges...
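Roughly something like this on each side -- a sketch only: "send_cap", the handler and the binding type are placeholders for whatever your bulktest interface actually defines; frame_alloc() and the cap argument are the parts that matter:

    /* sending side: allocate a frame and ship its cap to the peer */
    struct capref frame;
    size_t retbytes;
    errval_t err = frame_alloc(&frame, BASE_PAGE_SIZE, &retbytes);
    assert(err_is_ok(err));
    err = b->tx_vtbl.send_cap(b, NOP_CONT, frame);
    assert(err_is_ok(err));

    /* receiving side: handler for the cap arriving from the other core */
    static void send_cap_handler(struct bulkbench_binding *b, struct capref cap)
    {
        debug_printf("received the peer's cap\n");
    }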

Thanks,
Andrew

From: zeus at aluzina.org [mailto:zeus at aluzina.org] On Behalf Of Zeus Gómez Marmolejo
Sent: Tuesday, 17 April 2012 11:36
To: Baumann Andrew
Cc: barrelfish-users at lists.inf.ethz.ch
Subject: Re: [Barrelfish-users] Bidirectional bulk transfer

Hey Andrew,

Yes, sorry, I put the simple example together quite quickly and forgot that. In any case, if you change the line

while(!request_done) {

to

while(1) {

the result is the same. Messages are not being delivered.

zeus.
On 17 April 2012 19:27, Baumann Andrew <andrewb at inf.ethz.ch> wrote:
Hi Zeus,

Perhaps this is a bug arising only from over-simplification in the example, but it looks to me like the problem is here:

    request_done = false;
    while (!request_done) {
        event_dispatch(get_default_waitset());
    }

    debug_printf("DONE\n");

    return 0;

As soon as either the server or the client sees its own request complete, it exits, and there is no guarantee that it has seen and responded to the other side's request by that point. If you instead loop while (1), does the problem still arise?
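In other words, keep dispatching until both conditions hold, e.g. (the peer_request_seen flag is purely illustrative -- it would be set by whatever handler services the other side's request):

    /* keep handling events until our own request has completed AND we
     * have served the request coming from the other side */
    while (!request_done || !peer_request_seen) {
        event_dispatch(get_default_waitset());
    }

    debug_printf("DONE\n");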

Andrew

From: Zeus Gómez Marmolejo [mailto:zeus.gomez at bsc.es]
Sent: Tuesday, 17 April 2012 08:59
To: barrelfish-users at lists.inf.ethz.ch
Subject: [Barrelfish-users] Bidirectional bulk transfer

Hi,

I'm currently testing the bulk transfer implementation and I'm having some issues. Since I want to send large amounts of data both from core 0 to core 1 and from core 1 to core 0, I'm creating two bulk memory regions: one with its master on core 0 and another with its master on core 1. When this link is created, two caprefs are sent, one from core 0 to core 1 and the other in the opposite direction. In this case the test hangs. When only one cap is sent, the example works well.

Do you have any clues about this? I've attached the example.

--
Zeus Gómez Marmolejo
Barcelona Supercomputing Center
PhD student
http://www.bsc.es


