[Barrelfish-users] A Weird Bug about Page Fault

Shi Jinghao jhshi at cs.hku.hk
Mon Dec 3 09:47:49 CET 2012


Hi Andrew,

Thanks for your reply. Your point about the two different exceptions is
insightful. I tried your suggestion, but it does not help: the NaN
errors still occur. I also tried putting extra dummy floating-point
operations in the page fault handler, and that does not help either.

Thanks,
Jinghao

On Sun, Dec 2, 2012 at 2:06 AM, Andrew Baumann
<Andrew.Baumann at microsoft.com> wrote:

> Hi Jinghao,
>
> I notice that the first time you use floating point in this program is
> when writing to the array. There should be two different exceptions raised
> and handled here: one for the page fault, and one for the first use of the
> floating point hardware (which we lazily context-switch). My guess is that
> the page-fault path, which is not heavily exercised, does not interact well
> with the floating point save/restore code.
>
> If you initialise the floating point hardware by doing some other floating
> point operations (or writing to a statically allocated variable)
> beforehand, does the problem go away?
>
> Andrew
>
> *From:* Shi Jinghao [mailto:jhshi at cs.hku.hk]
> *Sent:* Saturday, 1 December 2012 02:20
> *To:* barrelfish-users at lists.inf.ethz.ch
> *Subject:* [Barrelfish-users] A Weird Bug about Page Fault
>
> Hi,
>
> I've been developing a memory management library on Barrelfish (SCC).
> Recently I ran into a very weird bug involving page faults. I have
> attached a minimal test case (pgfault_test.tgz) that reproduces it.
>
> The workflow of the test case is as simple as the following:
>
> 1) Allocate an array of doubles as read-only, using frame_alloc and
> vspace_map_one_frame_attr (or pmap->f.map; it doesn't matter which)
>
> 2) Initialize the array; this will generate a page fault
>
> 3) In the page fault handler, remap the faulting page as read-write,
> using pmap->f.modify_flags
>
> The weird thing is: the first touch of this array does not result in a
> proper value, but just NaN!
>
> I've conducted several runs and found the following:
>
> 1) This bug occurs when the array type is double or float. Everything
> is fine if it's an integer array.
>
> 2) Only the item that caused the page fault ends up with a NaN value;
> the other items are fine. This applies wherever the faulting address
> falls within the page, not just at the page start.
>
> 3) If you assign each array element a constant value (say 1.0), or an
> int/double variable, then all items end up with the right value. It
> seems that only assigning a[i] from i (or from any expression that
> contains i) produces this bug.
>
> I tested the attached code in release2012-05-25 (the revision I work on)
> and the latest revision (release2012-10-03).
>
> I've also written a minimal test case for sccLinux (write_fault.c). It
> turns out that everything works fine there: no NaN values.
>
> This bug has bothered me for quite a few days. I would really appreciate
> it if someone could give me a hint on this.
>
> Thanks,
> Jinghao
>
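
For readers without a Barrelfish setup, the three steps of the quoted test case (map read-only, write to fault, remap read-write in the handler) can be sketched on a POSIX system roughly as below. This is only an illustrative analogue, not the attached pgfault_test.tgz or write_fault.c, and it does not use the Barrelfish API: mmap/mprotect/sigaction stand in for frame_alloc, pmap->f.modify_flags and the Barrelfish exception handler. The names run_fault_test, on_segv and fault_count are made up for this sketch. It also assumes Linux-like behaviour, where calling mprotect from a SIGSEGV handler works even though POSIX does not list it as async-signal-safe.

```c
#define _DEFAULT_SOURCE            /* assumption: glibc, for MAP_ANONYMOUS */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names for this sketch; none come from the original test. */
static volatile sig_atomic_t fault_count;  /* write faults handled so far */
static size_t page_size;

/* Step 3: remap the faulting page read-write and return, so that the
 * interrupted store is restarted by the hardware. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    uintptr_t page = (uintptr_t)info->si_addr & ~((uintptr_t)page_size - 1);
    if (mprotect((void *)page, page_size, PROT_READ | PROT_WRITE) != 0) {
        _exit(1);                  /* cannot recover from this fault */
    }
    fault_count++;
}

/* Returns the number of elements whose stored value differs from its
 * index (a NaN counts as different), or -1 if the setup fails. */
int run_fault_test(size_t n_pages)
{
    struct sigaction sa = {0}, old;
    size_t bytes, n, i;
    int bad = 0;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    bytes = n_pages * page_size;
    n = bytes / sizeof(double);
    fault_count = 0;

    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        return -1;
    }

    /* Step 1: map an array of doubles read-only. */
    volatile double *a = mmap(NULL, bytes, PROT_READ,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (a == MAP_FAILED) {
        sigaction(SIGSEGV, &old, NULL);
        return -1;
    }

    /* Step 2: a[i] = i; the first store to each page faults. */
    for (i = 0; i < n; i++) {
        a[i] = (double)i;
    }

    /* Check every element, including the ones that took the fault. */
    for (i = 0; i < n; i++) {
        if (a[i] != (double)i) {
            bad++;
        }
    }

    munmap((void *)a, bytes);
    sigaction(SIGSEGV, &old, NULL);
    return bad;
}
```

Calling run_fault_test(4) from a small main and printing the result, together with fault_count, gives a quick check of whether the elements that took the fault hold the expected values on a given system.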