[Barrelfish-users] A Weird Bug about Page Fault

Shi Jinghao jhshi at cs.hku.hk
Tue Dec 4 11:16:21 CET 2012


Hi Simon,

Yes, I think so. But this bug didn't occur on sccLinux running on SCC (see
write_fault.c). So I suspect that some exception-handling code in Barrelfish
doesn't behave correctly, but I really have no idea where to start debugging.

Can someone in the community who has access to SCC test the code? Many
thanks.

Jinghao

On Tue, Dec 4, 2012 at 4:45 PM, Simon Peter <speter at inf.ethz.ch> wrote:

> Hi Jinghao,
>
> It seems this is SCC specific. I just ran your test-case on QEMU on both
> x86-64 and -32 platforms and it seems to work just fine (i.e. I get the
> "all good" output).
>
> Simon
>
>
> On 12/03/2012 12:47 AM, Shi Jinghao wrote:
>
>> Hi Andrew,
>>
>> Thanks for your reply. Your point about the two different exceptions is
>> insightful. I tried your suggestion, but it does not help: the NaN
>> errors still occur. I also tried putting extra dummy floating-point
>> operations in the page fault handler, and that does not help either.
>>
>> Thanks,
>> Jinghao
>>
>> On Sun, Dec 2, 2012 at 2:06 AM, Andrew Baumann
>> <Andrew.Baumann at microsoft.com> wrote:
>>
>>     Hi Jinghao,
>>
>>     I notice that the first time you use floating point in this program
>>     is when writing to the array. There should be two different
>>     exceptions raised and handled here: one for the page fault, and one
>>     for the first use of the floating point hardware (which we lazily
>>     context-switch). My guess is that the page-fault path, which is not
>>     heavily exercised, does not interact well with the floating point
>>     save/restore code.
>>
>>     If you initialise the floating point hardware by doing some other
>>     floating point operations (or writing to a statically allocated
>>     variable) beforehand, does the problem go away?
>>
>>     Andrew
>>
>>     *From:* Shi Jinghao [mailto:jhshi at cs.hku.hk]
>>     *Sent:* Saturday, 1 December 2012 02:20
>>     *To:* barrelfish-users at lists.inf.ethz.ch
>>     *Subject:* [Barrelfish-users] A Weird Bug about Page Fault
>>
>>     Hi,
>>
>>     I've been developing a memory management library on Barrelfish
>>     (SCC). Recently I ran into a very weird bug involving page faults.
>>     I have attached a minimal test case (pgfault_test.tgz) that
>>     reproduces it.
>>
>>     The workflow of the test case is as follows:
>>
>>     1) Allocate an array of doubles and map it read-only, using
>>     frame_alloc and vspace_map_one_frame_attr (or pmap->f.map; which
>>     one does not matter)
>>
>>     2) Initialize the array; this generates a page fault
>>
>>     3) In the page fault handler, remap the faulting page as
>>     read-write, using pmap->f.modify_flags
>>
>>     The weird thing is that the first write to this array does not
>>     store the proper value, but NaN!
>>
>>     I've conducted several runs and found the following:
>>
>>     1) The bug occurs when the array type is double or float.
>>     Everything is fine if it's an integer array.
>>
>>     2) Only the element that caused the page fault ends up as NaN; the
>>     other elements are fine. This holds wherever the faulting element
>>     lies within the page, not just at the page start.
>>
>>     3) If you assign each array element a constant (say 1.0), or an
>>     int/double variable, then all elements end up with the right
>>     value. The bug seems to appear only when a[i] is assigned i (or
>>     any expression containing i).
>>
>>     I tested the attached code in release2012-05-25 (the revision I work
>>     on) and the latest revision (release2012-10-03).
>>
>>     I've also written a minimal test case for sccLinux (write_fault.c).
>>     There everything works: no NaN values.
>>
>>     This bug has bothered me for quite a few days. I'd really appreciate
>>     it if someone could give me a hint.
>>
>>     Thanks,
>>
>>     Jinghao
>>
>> _______________________________________________
>> Barrelfish-users mailing list
>> Barrelfish-users at lists.inf.ethz.ch
>> https://lists.inf.ethz.ch/mailman/listinfo/barrelfish-users
>>
>
>