[Barrelfish-users] A Weird Bug about Page Fault

Andrew Baumann Andrew.Baumann at microsoft.com
Tue Dec 4 22:52:23 CET 2012


That's interesting... to my knowledge there's very little code that's unique to SCC and not shared with x86_32.

One likely culprit is fpu_save and fpu_restore (from /include/arch/x86_32/barrelfish_kpi/asm_inlines_arch.h), which do fxsave and fxrstor on x86_32 but fnsave and frstor on SCC. Are we sure the two are equivalent?
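
To make that question concrete, the sketch below spells out the two save-image layouts as documented in Intel's Software Developer's Manual: the 108-byte FNSAVE/FRSTOR image (32-bit protected-mode format) and the 512-byte, 16-byte-aligned FXSAVE/FXRSTOR image. The struct, field and helper names are illustrative only and are not taken from Barrelfish's asm_inlines_arch.h. Two differences stand out: the FNSAVE image has no room for MXCSR or the XMM registers, and FNSAVE, unlike FXSAVE, reinitializes the x87 unit after storing, tagging all eight registers empty. So the pair is only a drop-in replacement if no path keeps using the live FPU registers between a save and the matching restore.

```c
#include <stddef.h>
#include <stdint.h>

/* FNSAVE/FRSTOR image, 32-bit protected-mode format (108 bytes). */
struct fnsave_area {
    uint16_t fcw, rsvd0;   /* control word */
    uint16_t fsw, rsvd1;   /* status word (holds TOP of the register stack) */
    uint16_t ftw, rsvd2;   /* full tag word: 2 bits per register, 11b = empty */
    uint32_t fip;          /* last instruction pointer offset */
    uint16_t fcs;          /* last instruction pointer selector */
    uint16_t fop;          /* last opcode, low 11 bits */
    uint32_t fdp;          /* last operand pointer offset */
    uint16_t fds, rsvd3;   /* last operand pointer selector */
    uint8_t  st[8][10];    /* ST0..ST7 as packed 80-bit values */
};

/* FXSAVE/FXRSTOR image, non-64-bit layout (512 bytes, 16-byte aligned). */
struct fxsave_area {
    _Alignas(16) uint16_t fcw; /* control word */
    uint16_t fsw;              /* status word */
    uint8_t  ftw;              /* abridged tag word: 1 bit per register */
    uint8_t  rsvd0;
    uint16_t fop;              /* last opcode */
    uint32_t fip;
    uint16_t fcs, rsvd1;
    uint32_t fdp;
    uint16_t fds, rsvd2;
    uint32_t mxcsr;            /* SSE control/status: no slot in FNSAVE */
    uint32_t mxcsr_mask;
    uint8_t  st[8][16];        /* ST0..ST7 (MM0..MM7), each padded to 16 bytes */
    uint8_t  xmm[16][16];      /* XMM0..XMM7 in 32-bit mode; XMM8..15 in 64-bit */
    uint8_t  rsvd3[96];
};

_Static_assert(sizeof(struct fnsave_area) == 108, "FNSAVE image is 108 bytes");
_Static_assert(sizeof(struct fxsave_area) == 512, "FXSAVE image is 512 bytes");
_Static_assert(offsetof(struct fnsave_area, st) == 28, "x87 regs at byte 28");
_Static_assert(offsetof(struct fxsave_area, mxcsr) == 24, "MXCSR at byte 24");
_Static_assert(offsetof(struct fxsave_area, st) == 32, "x87 regs at byte 32");

/* True if physical x87 register i (0..7) is tagged empty in an FNSAVE image. */
static int fnsave_reg_empty(const struct fnsave_area *a, unsigned i)
{
    return ((a->ftw >> (2 * i)) & 3u) == 3u;
}
```

An all-empty tag word (0xFFFF) is what the x87 unit holds right after FNSAVE. A store such as FSTP from an empty register is a stack underflow, and with the invalid-operation exception masked (as it is after FNINIT) it writes the QNaN "indefinite" value. Whether any Barrelfish path actually resumes user code in that state is unverified.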

It might also be helpful if someone could test on real x86_32 hardware, just to rule out qemu.

Andrew

From: jhshi89 at gmail.com [mailto:jhshi89 at gmail.com] On Behalf Of Shi Jinghao
Sent: Tuesday, 4 December 2012 02:16
To: Simon Peter
Cc: Andrew Baumann; barrelfish-users at lists.inf.ethz.ch
Subject: Re: [Barrelfish-users] A Weird Bug about Page Fault

Hi Simon,

Yes, I think so. But this bug didn't occur on sccLinux running on SCC (see write_fault.c). So I suspect that some of the Barrelfish code that deals with exceptions doesn't behave correctly. But I really have no idea where to start debugging...

Can someone in the community who has access to SCC test the code? Many thanks.

Jinghao
On Tue, Dec 4, 2012 at 4:45 PM, Simon Peter <speter at inf.ethz.ch> wrote:
Hi Jinghao,

It seems this is SCC-specific. I just ran your test case on QEMU for both the x86-64 and x86-32 platforms, and it seems to work just fine (i.e. I get the "all good" output).

Simon


On 12/03/2012 12:47 AM, Shi Jinghao wrote:
Hi Andrew,

Thanks for your reply. Your point about the two different exceptions
is insightful. I tried your suggestion, but it does not help: the NaN
errors still occur. I also tried putting extra dummy floating-point
operations in the page fault handler, and that does not help either.

Thanks,
Jinghao

On Sun, Dec 2, 2012 at 2:06 AM, Andrew Baumann
<Andrew.Baumann at microsoft.com> wrote:

    Hi Jinghao,

    I notice that the first time you use floating point in this program
    is when writing to the array. There should be two different
    exceptions raised and handled here: one for the page fault, and one
    for the first use of the floating point hardware (which we lazily
    context-switch). My guess is that the page-fault path, which is not
    heavily exercised, does not interact well with the floating point
    save/restore code.

    If you initialise the floating point hardware by doing some other
    floating point operations (or writing to a statically allocated
    variable) beforehand, does the problem go away?

    Andrew

    *From:* Shi Jinghao [mailto:jhshi at cs.hku.hk]
    *Sent:* Saturday, 1 December 2012 02:20
    *To:* barrelfish-users at lists.inf.ethz.ch
    *Subject:* [Barrelfish-users] A Weird Bug about Page Fault

    Hi,

    I've been developing a memory management library on Barrelfish
    (SCC). Recently I ran into a very weird page-fault bug. I have
    attached a minimal test case (pgfault_test.tgz) that reproduces it.

    The workflow of the test case is simple:

    1) Allocate an array of doubles as read-only, using frame_alloc and
    vspace_map_one_frame_attr (or pmap->f.map; this doesn't matter).

    2) Initialize the array; this generates a page fault.

    3) In the page fault handler, remap the faulted page as read-write,
    using pmap->f.modify_flags.
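
For comparison, the same three steps can be approximated on a POSIX system such as Linux: mmap() a read-only region in place of frame_alloc/vspace_map_one_frame_attr, catch the write fault with a SIGSEGV/SIGBUS handler, and mprotect() the faulting page read-write in place of pmap->f.modify_flags, after which the faulting store is restarted. This is only a sketch of the idea, not the attached pgfault_test or write_fault.c; all names are hypothetical. Calling mprotect() from a signal handler is common practice on Linux but is not on POSIX's list of async-signal-safe functions.

```c
#define _DEFAULT_SOURCE /* expose MAP_ANONYMOUS and sigaction on glibc */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void *region_base;  /* start of the read-only mapping */
static size_t region_len;  /* mapping length in bytes */
static size_t page_size;
static volatile sig_atomic_t faults_handled;

/* Step 3 analogue: make the faulting page read-write, then return so the
 * faulting store is re-executed. */
static void write_fault_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)region_base;
    if (addr - base >= region_len) { /* not our region (wraps if below base) */
        signal(sig, SIG_DFL);        /* re-fault and terminate normally */
        return;
    }
    void *page = (void *)(addr & ~(uintptr_t)(page_size - 1));
    if (mprotect(page, page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(2);
    faults_handled++;
}

/* Maps n doubles read-only, fills them with a[i] = i through write faults,
 * and returns how many elements do not read back as i (NaN included),
 * or -1 on setup failure. */
static long run_write_fault_test(size_t n)
{
    struct sigaction sa, old_segv, old_bus;
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    region_len = (n * sizeof(double) + page_size - 1) / page_size * page_size;

    /* Step 1 analogue: a read-only, page-aligned array of doubles. */
    void *p = mmap(NULL, region_len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    region_base = p;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = write_fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old_segv) != 0 ||
        sigaction(SIGBUS, &sa, &old_bus) != 0) {
        munmap(p, region_len);
        return -1;
    }

    /* Step 2: initialize the array; the first store to each page faults. */
    volatile double *a = p;
    for (size_t i = 0; i < n; i++)
        a[i] = i;

    long bad = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] != (double)i) /* also true when a[i] is NaN */
            bad++;

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    munmap(p, region_len);
    return bad;
}
```

On a system that handles the fault correctly, run_write_fault_test(4096) should return 0, consistent with the sccLinux result reported below; the NaN symptom would show up as a nonzero count.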

    The weird thing is: the first touch of this array does not produce
    the proper value, just NaN!

    I've conducted several runs and found the following:

    1) The bug occurs when the array type is double or float.
    Everything is fine with an integer array.

    2) Only the element that caused the page fault ends up as NaN; the
    other elements are fine. This holds wherever the faulting element
    lies within the page, not just at the page start.

    3) If you assign each element a constant value (say 1.0), or an
    int/double variable, then all elements end up with the correct
    value. Only assigning a[i] = i (or any expression containing i)
    seems to produce this bug.

    I tested the attached code on release2012-05-25 (the revision I work
    on) and on the latest revision (release2012-10-03).

    I've also written a minimal test case for sccLinux (write_fault.c).
    There everything works fine: no annoying NaN values.

    This bug has bothered me for quite a few days. I'd really appreciate
    it if someone could give me a hint.

    Thanks,

    Jinghao




_______________________________________________
Barrelfish-users mailing list
Barrelfish-users at lists.inf.ethz.ch
https://lists.inf.ethz.ch/mailman/listinfo/barrelfish-users

