[Barrelfish-users] UMP IPI on x86 (_64)

Fri Feb 22 11:53:17 CET 2013

Hi Kornilios,

Thanks, it looks like the buffer being too small is actually the cause 
of the bug I observed (I should have printed the srccore variable too, 
then it would have been more obvious...).

I changed

static char my_notify_page[BASE_PAGE_SIZE];

to

static char my_notify_page[NOTIFY_FIFO_BYTES * MAX_COREID];

and it works. Or let's say, it fails with another bug :-)
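
A minimal sketch of the sizing change together with the static assertion 
Kornilios suggested. The values of BASE_PAGE_SIZE and MAX_COREID below are 
assumptions for illustration (the real ones come from the Barrelfish 
headers), and fifos_that_fit is a hypothetical helper, not part of the tree:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define BASE_PAGE_SIZE    4096
#define MAX_COREID        255   /* hypothetical; check the real definition */

/* As in the original source: 8 slots of 8 bytes = 64 bytes per source core. */
#define NOTIFY_FIFO_BYTES (8 * sizeof(uint64_t))

/* One FIFO per possible source core instead of a single page. */
static char my_notify_page[NOTIFY_FIFO_BYTES * MAX_COREID];

/* Hypothetical helper: how many per-core FIFOs fit in a buffer of this size. */
size_t fifos_that_fit(size_t buffer_bytes)
{
    return buffer_bytes / NOTIFY_FIFO_BYTES;
}

/* Fails at compile time if the buffer cannot hold one FIFO per core,
 * assuming srccore is always smaller than MAX_COREID. */
static_assert(sizeof(my_notify_page) >= NOTIFY_FIFO_BYTES * MAX_COREID,
              "notify buffer too small for MAX_COREID cores");
```

With these assumed values a single 4K page holds 64 FIFOs, which matches 
the "broken for NCPUS > 64" observation.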

If IPIs are activated globally, it runs into the panic("FULL") on core 0 
while booting the 15th core. The error goes away when I put a printf in 
ipi_raise_notify, so I guess the FIFO is just not dequeued fast enough. 
Doubling the buffer size also resolves the problem.
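
To illustrate the "FULL" condition, here is a simplified single-threaded 
model of an 8-slot notify FIFO in which a zero slot means "empty". This is 
only a sketch under my own assumptions: struct notify_fifo, fifo_raise and 
fifo_drain are hypothetical names, the head/tail bookkeeping is invented, 
and the real code uses volatile shared memory and panics instead of 
returning false:

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_SLOTS 8   /* NOTIFY_FIFO_BYTES / sizeof(uint64_t) */

struct notify_fifo {
    uint64_t slots[FIFO_SLOTS]; /* 0 means the slot is empty */
    unsigned head;              /* next slot the sender writes */
    unsigned tail;              /* next slot the receiver reads */
};

/* Sender side: store a non-zero channel id in the next slot.
 * Returns false when that slot has not been consumed yet, which is
 * the situation where the real code would hit panic("FULL"). */
bool fifo_raise(struct notify_fifo *f, uint64_t chanid)
{
    if (chanid == 0 || f->slots[f->head] != 0) {
        return false;
    }
    f->slots[f->head] = chanid;
    f->head = (f->head + 1) % FIFO_SLOTS;
    return true;
}

/* Receiver side: consume entries until an empty slot is found,
 * clearing each slot so the sender can reuse it.
 * Returns the number of notifications handled. */
unsigned fifo_drain(struct notify_fifo *f)
{
    unsigned handled = 0;
    while (f->slots[f->tail] != 0) {
        f->slots[f->tail] = 0;
        f->tail = (f->tail + 1) % FIFO_SLOTS;
        handled++;
    }
    return handled;
}
```

In this model a ninth fifo_raise without an intervening fifo_drain fails, 
which is consistent with a slow receiver (or a too-small FIFO) triggering 
the panic.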

When I use my code that only enables IPIs on one channel (or use the 
double-sized buffer), messages can be transmitted, but later on my 
process freezes (I guess there is another bug in my code...).

Cheers,
Lukas

On 02/22/2013 07:58 AM, Kornilios Kourtis wrote:
> Hi Lukas,
>
> On Thu, Feb 21, 2013 at 07:01:39PM +0100, Lukas Humbel wrote:
>> I'm trying to use the UMP_IPI flounder backend on x86_64, but it crashes
>> every time I send a message over the channel with a "read page fault due
>> to page not present".
>>
>> I added "ump_ipi" to optInterconnectDrivers in X86_64.hs, copy&pasted
>> the flounder generated code and changed it in such a way that my
>> specific connection uses ump_ipi (I also made sure that the ipi_init
>> routines are called and commented out some other #if
>> defined(CONFIG_FLOUNDER_BACKEND_UMP_IPI) ). I'm not sure if my
>> modifications are 100% correct, but the error it produces is the same as
>> if I activate it globally (by adding ump_ipi to optFlounderBackends).
>>
>> The error occurs on the receiving side of the channel after a message
>> has been sent in the function ipi_handle_notify at the line
>> assert(endpoints[val].cap.type != ObjType_Null); . I added a printf to
>> the function inside the while(fifo[slot] != 0) loop, and it seems that
>> val gets really big (133143986182) which then causes the fault (in the
>> second iteration of the loop).
>>
>> ipi_handle_notify
>> ipi_handle_notify line:93  val:1  slot: 0
>> ipi_handle_notify line:93  val:133143986182  slot: 0
>>
>>
>> Any ideas? This really seems strange to me, because the code does
>> basically:
>> ...
>> fifo[slot] = 0;  //first iteration, slot = 0
>> ...
>> val = fifo[slot]; //second iteration, slot = 0
>> val == 133143986182 ??
> From a quick look:
>   - 133143986182 is 0x1f00000006
>   - AFAICT, the fifo[slot] is updated in ipi_raise_notify() using chanid
>     Maybe the intended value for chanid is 6, and the value is corrupted
>     somewhere along the way?
>
> BTW, ipi_notify seems to be broken for NCPUS > 64 assuming a 4K page:
>   31 : #define NOTIFY_FIFO_BYTES       (8 * sizeof(uint64_t))
>   34 : static char my_notify_page[BASE_PAGE_SIZE];
>   81 : volatile uint64_t *fifo = (void *)&my_notify_page[NOTIFY_FIFO_BYTES*srccore];
>
> Maybe it's a good idea to add a static assertion, or just allocate the
> necessary number of pages based on MAX_COREID.
>
> cheers,
> Kornilios.
>