[Barrelfish-users] A Weird Bug about Page Fault

Andrew Baumann Andrew.Baumann at microsoft.com
Sat Dec 8 00:38:17 CET 2012


Hi,

Much as we'd like to help, it's tough when it requires an SCC to reproduce. Can you perhaps restructure your code to avoid the page fault handler (e.g. pre-allocate/pre-fault the memory)?
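
A minimal sketch of the pre-faulting idea in portable C, assuming the page fault handler is already registered and that taking the write fault on an integer store is harmless (the original report notes that integer arrays behave correctly). The name prefault_pages and its parameters are hypothetical and not part of Barrelfish; whether this avoids the NaN on SCC is not confirmed anywhere in the thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Touch the buffer with integer read-modify-write stores, one per
 * page_size stride plus the final byte, so that any write fault is taken
 * here rather than on a later floating-point store.
 * Returns the number of stores performed.
 */
static size_t prefault_pages(void *buf, size_t len, size_t page_size)
{
    assert(buf != NULL && page_size > 0);

    /* volatile keeps the compiler from dropping the no-op stores. */
    volatile unsigned char *p = buf;
    size_t stores = 0;

    for (size_t off = 0; off < len; off += page_size) {
        p[off] = p[off];          /* contents unchanged, page is written */
        stores++;
    }
    if (len > 0) {
        /* Covers the last page when buf is not page-aligned. */
        p[len - 1] = p[len - 1];
        stores++;
    }
    return stores;
}
```

Calling this once on the freshly mapped array, before the loop that stores doubles, would move every fault onto an integer store.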

Andrew

From: Shi Jinghao [mailto:jhshi at cs.hku.hk]
Sent: Thursday, 6 December 2012 01:33
To: barrelfish-users at lists.inf.ethz.ch
Subject: Re: [Barrelfish-users] A Weird Bug about Page Fault

Hi,

Has there been any progress on this issue? I've been tracing the kernel's page fault path for a while but have found no clue yet. It's quite frustrating, since this bug has prevented me from running any benchmark that uses double/float data types...

Jinghao

On Wed, Dec 5, 2012 at 5:52 AM, Andrew Baumann <Andrew.Baumann at microsoft.com> wrote:
That's interesting... to my knowledge there's very little code that's unique to SCC and not shared with x86_32.

One likely culprit is fpu_save and fpu_restore (from /include/arch/x86_32/barrelfish_kpi/asm_inlines_arch.h), which do fxsave and fxrstor on x86_32 but fnsave and frstor on SCC. Are we sure the two are equivalent?
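
One documented difference between the two pairs: FNSAVE re-initialises the x87 FPU after storing its state (as if FNINIT had been executed), while FXSAVE leaves the FPU state untouched. The sketch below, assuming GCC or Clang inline assembly on an x86 host, makes this visible through the x87 control word. It is not Barrelfish's fpu_save code, the helper names are made up for illustration, and whether this difference explains the NaN is not established in the thread.

```c
#include <assert.h>
#include <stdint.h>

#if !defined(__i386__) && !defined(__x86_64__)
#error "This sketch uses x86 inline assembly."
#endif

/* Hypothetical save areas: 108 bytes for FNSAVE, 512 aligned for FXSAVE. */
struct fnsave_area { uint8_t bytes[108]; };
struct fxsave_area { uint8_t bytes[512]; } __attribute__((aligned(16)));
static_assert(sizeof(struct fnsave_area) == 108, "FNSAVE image size");

static uint16_t read_cw(void)
{
    uint16_t cw;
    __asm__ volatile("fnstcw %0" : "=m"(cw));
    return cw;
}

static void write_cw(uint16_t cw)
{
    __asm__ volatile("fldcw %0" : : "m"(cw));
}

/* Load custom_cw, run FNSAVE, and report the control word seen afterwards. */
static uint16_t cw_after_fnsave(uint16_t custom_cw)
{
    struct fnsave_area area;
    uint16_t original = read_cw();

    write_cw(custom_cw);
    __asm__ volatile("fnsave %0" : "=m"(area));
    uint16_t after = read_cw();                  /* FPU was re-initialised */
    __asm__ volatile("frstor %0" : : "m"(area)); /* brings custom_cw back */
    write_cw(original);
    return after;
}

/* Same experiment with FXSAVE, which does not modify the FPU state. */
static uint16_t cw_after_fxsave(uint16_t custom_cw)
{
    struct fxsave_area area;
    uint16_t original = read_cw();

    write_cw(custom_cw);
    __asm__ volatile("fxsave %0" : "=m"(area));
    uint16_t after = read_cw();
    write_cw(original);
    return after;
}
```

With a control word of 0x027F (53-bit precision, all exceptions masked), cw_after_fnsave returns the FNINIT default 0x037F, while cw_after_fxsave returns 0x027F unchanged. Code that keeps using the FPU after an FNSAVE-based save therefore sees a freshly initialised FPU unless the state is restored first.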

It might also be helpful if someone could test on real x86_32 hardware, just to rule out qemu.

Andrew

From: jhshi89 at gmail.com [mailto:jhshi89 at gmail.com] On Behalf Of Shi Jinghao
Sent: Tuesday, 4 December 2012 02:16
To: Simon Peter
Cc: Andrew Baumann; barrelfish-users at lists.inf.ethz.ch
Subject: Re: [Barrelfish-users] A Weird Bug about Page Fault

Hi Simon,

Yes, I think so. But this bug didn't occur on sccLinux running on SCC (see write_fault.c). So I suspect that some code in Barrelfish that deals with exceptions doesn't behave correctly. But I really have no idea where to start debugging...
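
write_fault.c is an attachment that is not reproduced in this archive. A minimal sketch of what such a Linux test could look like, assuming POSIX mmap, mprotect and sigaction stand in for frame_alloc, pmap->f.modify_flags and the Barrelfish page fault handler. All names below are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static uintptr_t g_base;  /* start of the read-only mapping */
static size_t g_len;      /* mapping length in bytes */
static size_t g_page;     /* page size in bytes */

/* Make the faulting page writable and return so the store is retried.
 * mprotect is not on the POSIX async-signal-safe list; calling it here
 * relies on Linux behaviour. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    if (addr < g_base || addr >= g_base + g_len)
        _exit(1);                      /* unrelated fault: give up */
    uintptr_t page = addr & ~(uintptr_t)(g_page - 1);
    if (mprotect((void *)page, g_page, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
}

/* Map n doubles read-only, store a[i] = i, and count wrong elements.
 * Returns -1 if the setup fails. */
static int count_bad_values(size_t n)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    g_page = (size_t)ps;
    g_len = n * sizeof(double);

    double *a = mmap(NULL, g_len, PROT_READ,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (a == MAP_FAILED)
        return -1;
    g_base = (uintptr_t)a;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(a, g_len);
        return -1;
    }

    volatile double *va = a;           /* force real loads and stores */
    for (size_t i = 0; i < n; i++)
        va[i] = (double)i;             /* first store to each page faults */

    int bad = 0;
    for (size_t i = 0; i < n; i++)
        if (va[i] != (double)i)        /* a NaN compares unequal to anything */
            bad++;

    sigaction(SIGSEGV, &old, NULL);
    munmap(a, g_len);
    return bad;
}
```

A result of 0 from count_bad_values corresponds to the "all good" outcome; on the failing Barrelfish/SCC setup the analogous count would be one bad element per faulting page.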

Can someone in the community who has access to SCC test the code? Many thanks.

Jinghao
On Tue, Dec 4, 2012 at 4:45 PM, Simon Peter <speter at inf.ethz.ch> wrote:
Hi Jinghao,

It seems this is SCC specific. I just ran your test-case on QEMU on both x86-64 and -32 platforms and it seems to work just fine (i.e. I get the "all good" output).

Simon


On 12/03/2012 12:47 AM, Shi Jinghao wrote:
Hi Andrew,

Thanks for your reply. The two different exceptions you mentioned are
insightful. I tried your suggestion, but it does not help: the NaN
errors still occur. I also tried putting extra dummy floating-point
operations in the page fault handler, and that does not help either.

Thanks,
Jinghao

On Sun, Dec 2, 2012 at 2:06 AM, Andrew Baumann <Andrew.Baumann at microsoft.com> wrote:

    Hi Jinghao,

    I notice that the first time you use floating point in this program
    is when writing to the array. There should be two different
    exceptions raised and handled here: one for the page fault, and one
    for the first use of the floating point hardware (which we lazily
    context-switch). My guess is that the page-fault path, which is not
    heavily exercised, does not interact well with the floating point
    save/restore code.

    If you initialise the floating point hardware by doing some other
    floating point operations (or writing to a statically allocated
    variable) beforehand, does the problem go away?
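
The suggestion above could be tried with something like the following sketch, which forces one floating-point operation before the array is touched, so that the lazy FPU context-switch trap is taken separately from the page fault. fpu_warm_up and fill_after_warm_up are hypothetical names; a later reply in this thread reports that this approach did not make the NaN go away.

```c
#include <assert.h>
#include <stddef.h>

/* Perform a trivial floating-point multiply. volatile stops the compiler
 * from folding the arithmetic away at build time. */
static double fpu_warm_up(void)
{
    volatile double x = 1.5;
    volatile double y = x * 2.0;
    return y;
}

/* Use the FPU once, then do the stores that may trigger page faults. */
static void fill_after_warm_up(double *a, size_t n)
{
    assert(a != NULL);
    (void)fpu_warm_up();
    for (size_t i = 0; i < n; i++)
        a[i] = (double)i;
}
```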

    Andrew

    From: Shi Jinghao [mailto:jhshi at cs.hku.hk]
    Sent: Saturday, 1 December 2012 02:20
    To: barrelfish-users at lists.inf.ethz.ch
    Subject: [Barrelfish-users] A Weird Bug about Page Fault

    Hi,

    I've been developing a memory management library on Barrelfish
    (SCC). Recently I ran into a very weird bug involving page faults. I
    attached a minimal case (pgfault_test.tgz) that reproduces this
    bug.

    The workflow of the test case is as follows:

    1) Allocate an array of doubles as read-only, using frame_alloc and
    vspace_map_one_frame_attr (or pmap->f.map; it doesn't matter which)

    2) Initialise the array; this generates a page fault

    3) In the page fault handler, remap the faulting page as read-write,
    using pmap->f.modify_flags

    The weird thing is: the first touch of this array does not result in
    a proper value, but just NaN!

    I've conducted several runs and found the following:

    1) The bug occurs when the array type is double or float.
    Everything is fine if it's an integer array.

    2) Only the item that caused the page fault ends up as NaN; the
    other items are fine. This holds wherever the faulting item lies
    within the page, not just at the page start.

    3) If you assign each array item a constant value (say 1.0), or an
    int/double variable, then all items end up with the right value.
    The bug seems to appear only when a[i] is assigned i (or any
    expression containing i).

    I tested the attached code in release2012-05-25 (the revision I work
    on) and the latest revision (release2012-10-03).

    I've also composed a minimal test case in sccLinux (write_fault.c).
    There everything works: no annoying NaN values.

    This bug has bothered me for quite a few days. I'd really appreciate
    it if someone could give a hint on this.

    Thanks,
    Jinghao




_______________________________________________
Barrelfish-users mailing list
Barrelfish-users at lists.inf.ethz.ch
https://lists.inf.ethz.ch/mailman/listinfo/barrelfish-users


