[Barrelfish-users] UMP IPI on x86 (_64)
humbell at ethz.ch
Fri Feb 22 11:53:17 CET 2013
Thanks, it looks like the buffer being too small is actually the cause
of the bug I observed (I should have printed the srccore variable too,
then it would have been more obvious...).
I changed
static char my_notify_page[BASE_PAGE_SIZE];
to
static char my_notify_page[NOTIFY_FIFO_BYTES * MAX_COREID];
and it works. Or let's say, it fails with another bug :-)
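
The sizing problem described above could be caught at compile time. The following is only a sketch: the values of BASE_PAGE_SIZE and MAX_COREID are assumptions chosen for illustration (the real ones come from the Barrelfish headers), and fifos_that_fit() and fifo_for_core() are hypothetical helpers, not existing Barrelfish functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only; the real definitions live in the
 * Barrelfish headers and may differ. */
#define BASE_PAGE_SIZE    4096u
#define MAX_COREID        255u
#define NOTIFY_FIFO_BYTES (8 * sizeof(uint64_t))  /* 64 bytes per source core */

/* One FIFO per possible source core, as in the resized buffer. */
static char my_notify_page[NOTIFY_FIFO_BYTES * MAX_COREID];

/* Breaks the build if the buffer cannot hold one FIFO per core ID. Had the
 * array been declared with BASE_PAGE_SIZE bytes, this check would fail for
 * any MAX_COREID above 64. */
static_assert(sizeof(my_notify_page) >= NOTIFY_FIFO_BYTES * MAX_COREID,
              "notify buffer too small for one FIFO per core");

/* Hypothetical helper: how many per-core FIFOs fit into buf_bytes. */
static size_t fifos_that_fit(size_t buf_bytes)
{
    return buf_bytes / NOTIFY_FIFO_BYTES;
}

/* Hypothetical helper: start of the FIFO for srccore, or NULL when the
 * index would fall outside the buffer. */
static char *fifo_for_core(unsigned srccore)
{
    if (srccore >= MAX_COREID) {
        return NULL;
    }
    return &my_notify_page[NOTIFY_FIFO_BYTES * srccore];
}
```

With these assumed values, a single 4K page holds 64 FIFOs, so any srccore of 64 or more indexes past a page-sized buffer.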
If IPIs are activated globally, it runs into the panic("FULL") on core 0
while booting the 15th core. The error goes away when I put a printf in
ipi_raise_notify, so I guess the FIFO is just not dequeued fast enough.
Doubling the buffer size also resolves the problem.
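
To illustrate how a FIFO that is not drained quickly enough ends up full, here is a toy model. It is not the Barrelfish implementation: struct toy_fifo, toy_enqueue() and toy_drain() are hypothetical names, and the only details taken from this thread are the eight uint64_t slots per FIFO and the convention that a slot value of 0 means "empty".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_SLOTS 8  /* NOTIFY_FIFO_BYTES / sizeof(uint64_t) */

/* Toy single-producer, single-consumer FIFO; a slot value of 0 means empty. */
struct toy_fifo {
    uint64_t slot[FIFO_SLOTS];
    size_t head;  /* next slot the consumer reads */
    size_t tail;  /* next slot the producer writes */
};

/* Producer side. Returns false in the situation where the real code would
 * hit panic("FULL"): the slot to be written has not been consumed yet. */
static bool toy_enqueue(struct toy_fifo *f, uint64_t val)
{
    assert(f != NULL && val != 0);  /* 0 is reserved as the empty marker */
    if (f->slot[f->tail] != 0) {
        return false;
    }
    f->slot[f->tail] = val;
    f->tail = (f->tail + 1) % FIFO_SLOTS;
    return true;
}

/* Consumer side. Clears every pending slot and returns how many it drained. */
static size_t toy_drain(struct toy_fifo *f)
{
    assert(f != NULL);
    size_t drained = 0;
    while (f->slot[f->head] != 0) {
        f->slot[f->head] = 0;
        f->head = (f->head + 1) % FIFO_SLOTS;
        drained++;
    }
    return drained;
}
```

In this model, eight enqueues without an intervening drain succeed and the ninth fails. Slowing the producer down (as a printf does) or adding slots both give the consumer more room, which is consistent with the two workarounds mentioned above.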
When I use my code that only enables IPIs on one channel (or use the
double-sized buffer), messages can be transmitted, but later on, my
process freezes (but I guess there is another bug in my code...).
On 02/22/2013 07:58 AM, Kornilios Kourtis wrote:
> Hi Lukas,
> On Thu, Feb 21, 2013 at 07:01:39PM +0100, Lukas Humbel wrote:
>> I'm trying to use the UMP_IPI flounder backend on x86_64, but it crashes
>> every time I send a message over the channel with a "read page fault due
>> to page not present".
>> I added "ump_ipi" to optInterconnectDrivers in X86_64.hs, copy&pasted
>> the flounder generated code and changed it in such a way that my
>> specific connection uses ump_ipi (I also made sure that the ipi_init
>> routines are called and commented out some other #if
>> defined(CONFIG_FLOUNDER_BACKEND_UMP_IPI)). I'm not sure if my
>> modifications are 100% correct, but the error it produces is the same as
>> if I activate it globally (by adding ump_ipi to optFlounderBackends).
>> The error occurs on the receiving side of the channel after a message
>> has been sent in the function ipi_handle_notify at the line
>> assert(endpoints[val].cap.type != ObjType_Null); . I added a printf to
>> the function inside the while(fifo[slot] != 0) loop, and it seems that
>> val gets really big (133143986182) which then causes the fault (in the
>> second iteration of the loop).
>> ipi_handle_notify line:93 val:1 slot: 0
>> ipi_handle_notify line:93 val:133143986182 slot: 0
>> Any ideas? This really seems strange to me, because the code does
>> fifo[slot] = 0; //first iteration, slot = 0
>> val = fifo[slot]; //second iteration, slot = 0
>> val == 133143986182 ??
> From a quick look:
> - 133143986182 is 0x1f00000006
> - AFAICT, the fifo[slot] is updated in ipi_raise_notify() using chanid
> Maybe the intended value for chanid is 6, and the value is corrupted
> somewhere along the way?
> BTW, ipi_notify seems to be broken for NCPUS > 64 assuming a 4K page:
> 31 : #define NOTIFY_FIFO_BYTES (8 * sizeof(uint64_t))
> 34 : static char my_notify_page[BASE_PAGE_SIZE];
> 81 : volatile uint64_t *fifo = (void *)&my_notify_page[NOTIFY_FIFO_BYTES*srccore];
> Maybe it's a good idea to add a static assertion, or just allocate the
> necessary number of pages based on MAX_COREID.
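
The decoding of 133143986182 quoted above can be checked by splitting the value into its two 32-bit halves. This is a small sketch; low32() and high32() are hypothetical helper names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The decimal value printed by ipi_handle_notify and its hex form. */
static_assert(133143986182ULL == 0x1f00000006ULL,
              "decimal and hexadecimal forms must match");

/* Hypothetical helper: lower 32 bits of v. */
static uint32_t low32(uint64_t v)
{
    return (uint32_t)(v & 0xffffffffu);
}

/* Hypothetical helper: upper 32 bits of v. */
static uint32_t high32(uint64_t v)
{
    return (uint32_t)(v >> 32);
}
```

For this value the lower half is 6, which matches the suggested chanid, and the upper half is 0x1f (31). That pattern fits the idea that a plausible 32-bit value sits next to data that does not belong to it, as happens when a read runs past the end of a buffer.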