[Barrelfish-users] A Weird Bug about Page Fault

Shi Jinghao jhshi at cs.hku.hk
Thu Dec 6 10:32:43 CET 2012


Hi,

Has there been any progress on this issue? I've been tracing the kernel's
page fault path for a while but have found no clue yet. It's quite
frustrating, since this bug prevents me from running any benchmark that
uses double/float data types...

Jinghao


On Wed, Dec 5, 2012 at 5:52 AM, Andrew Baumann
<Andrew.Baumann at microsoft.com> wrote:

> That’s interesting… to my knowledge there’s very little code that’s
> unique to SCC and not shared with x86_32.
>
> One likely culprit is fpu_save and fpu_restore (from
> /include/arch/x86_32/barrelfish_kpi/asm_inlines_arch.h), which do fxsave
> and fxrstor on x86_32 but fnsave and frstor on SCC. Are we sure the two
> are equivalent?
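
One documented difference between the two instruction pairs is that fnsave reinitialises the x87 unit after writing its image (as if fninit had run), while fxsave leaves the live state untouched. The sketch below is not the Barrelfish fpu_save/fpu_restore code; it is a small user-space check, assuming GCC or Clang inline assembly on an x86 host, and every function and struct name in it is made up for illustration. Whether this difference is related to the NaN values is not established here.

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)

/* Image written by fnsave with 32-bit operand size: 108 bytes. */
struct fnsave_area { uint8_t bytes[108]; };
/* Image written by fxsave: 512 bytes, must be 16-byte aligned. */
struct fxsave_area { uint8_t bytes[512]; } __attribute__((aligned(16)));

/* Read the x87 control word. */
static uint16_t x87_read_cw(void)
{
    uint16_t cw;
    __asm__ volatile ("fnstcw %0" : "=m"(cw));
    return cw;
}

/* Load a new x87 control word. */
static void x87_write_cw(uint16_t cw)
{
    __asm__ volatile ("fldcw %0" : : "m"(cw));
}

#endif

/* Returns 1 if fnsave reset the control word to 0x037F and frstor brought
 * the previous value back, 0 if not, -1 on a non-x86 host. */
static int fnsave_reinitialises_fpu(void)
{
#if defined(__i386__) || defined(__x86_64__)
    struct fnsave_area area;
    uint16_t original = x87_read_cw();

    x87_write_cw(0x027F);                   /* non-default precision control */
    __asm__ volatile ("fnsave %0" : "=m"(area) : : "memory");
    uint16_t after_save = x87_read_cw();    /* FPU was reinitialised */
    __asm__ volatile ("frstor %0" : : "m"(area) : "memory");
    uint16_t after_restore = x87_read_cw(); /* saved state is back */

    x87_write_cw(original);
    return after_save == 0x037F && after_restore == 0x027F;
#else
    return -1;
#endif
}

/* Returns 1 if the control word survived fxsave unchanged, 0 if not,
 * -1 on a non-x86 host. */
static int fxsave_preserves_fpu(void)
{
#if defined(__i386__) || defined(__x86_64__)
    struct fxsave_area area;
    uint16_t original = x87_read_cw();

    x87_write_cw(0x027F);
    __asm__ volatile ("fxsave %0" : "=m"(area) : : "memory");
    uint16_t after_save = x87_read_cw();    /* live state left alone */

    x87_write_cw(original);
    return after_save == 0x027F;
#else
    return -1;
#endif
}
```

Code that calls fnsave and then keeps running in the same context without a matching frstor sees an empty register stack, which is one way the two variants could behave differently.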
>
> It might also be helpful if someone could test on real x86_32 hardware,
> just to rule out qemu.
>
> Andrew
>
> *From:* jhshi89 at gmail.com [mailto:jhshi89 at gmail.com] *On Behalf Of *Shi
> Jinghao
> *Sent:* Tuesday, 4 December 2012 02:16
> *To:* Simon Peter
> *Cc:* Andrew Baumann; barrelfish-users at lists.inf.ethz.ch
> *Subject:* Re: [Barrelfish-users] A Weird Bug about Page Fault
>
> Hi Simon,
>
> Yes, I think so. But this bug doesn't occur on sccLinux running on SCC
> (see write_fault.c). So I suspect that some code in Barrelfish that deals
> with exceptions doesn't behave correctly. But I really have no idea where
> to start debugging...
>
> Can someone in the community who has access to SCC test the code? Many
> thanks.
>
> Jinghao
>
> On Tue, Dec 4, 2012 at 4:45 PM, Simon Peter <speter at inf.ethz.ch> wrote:
>
> Hi Jinghao,
>
> It seems this is SCC specific. I just ran your test-case on QEMU on both
> x86-64 and -32 platforms and it seems to work just fine (i.e. I get the
> "all good" output).
>
> Simon
>
> On 12/03/2012 12:47 AM, Shi Jinghao wrote:
>
> Hi Andrew,
>
> Thanks for your reply. Your point about the two different exceptions is
> insightful. I tried your suggestion, but it does not help: the NaN
> errors still occur. I also tried putting extra dummy floating-point
> operations in the page fault handler, and that does not help either.
>
> Thanks,
> Jinghao
>
> On Sun, Dec 2, 2012 at 2:06 AM, Andrew Baumann
> <Andrew.Baumann at microsoft.com> wrote:
>
>     Hi Jinghao,
>
>     I notice that the first time you use floating point in this program
>     is when writing to the array. There should be two different
>     exceptions raised and handled here: one for the page fault, and one
>     for the first use of the floating point hardware (which we lazily
>     context-switch). My guess is that the page-fault path, which is not
>     heavily exercised, does not interact well with the floating point
>     save/restore code.
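
The two exceptions described above can be pictured with a toy model. This is only a sketch in plain C with invented names (toy_cpu, toy_try_fp_store, toy_run_fp_store); it is not Barrelfish code and it does not touch real hardware. It assumes the usual x86 lazy-switching scheme in which a flag (CR0.TS) is set at context switch, so the first floating point instruction raises a device-not-available trap before any memory access happens. It deliberately leaves out the saving and restoring of register contents, which is the part suspected above.

```c
#include <stdbool.h>

/* Toy model only: none of these names come from the Barrelfish sources. */
enum toy_trap { TOY_TRAP_NONE, TOY_TRAP_NO_FPU, TOY_TRAP_PAGE_FAULT };

struct toy_cpu {
    bool ts;             /* models CR0.TS, set at context switch */
    bool page_writable;  /* protection of the page being stored to */
    int  nm_traps;       /* device-not-available traps taken */
    int  pf_traps;       /* page faults taken */
};

/* One attempt at a floating point store: report which trap, if any,
 * stops the instruction. The FPU check comes before the memory access. */
static enum toy_trap toy_try_fp_store(const struct toy_cpu *cpu)
{
    if (cpu->ts) {
        return TOY_TRAP_NO_FPU;
    }
    if (!cpu->page_writable) {
        return TOY_TRAP_PAGE_FAULT;
    }
    return TOY_TRAP_NONE;
}

/* Retry the store until it completes, handling each trap in turn.
 * Returns the number of attempts that were needed. */
static int toy_run_fp_store(struct toy_cpu *cpu)
{
    int attempts = 0;
    for (;;) {
        attempts++;
        switch (toy_try_fp_store(cpu)) {
        case TOY_TRAP_NO_FPU:
            cpu->ts = false;            /* handler hands the FPU to this context */
            cpu->nm_traps++;
            break;
        case TOY_TRAP_PAGE_FAULT:
            cpu->page_writable = true;  /* handler remaps the page read-write */
            cpu->pf_traps++;
            break;
        case TOY_TRAP_NONE:
            return attempts;
        }
    }
}
```

Starting from a state with ts set and the page read-only, the store in this model needs three attempts: one ends in each trap and the third completes.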
>
>     If you initialise the floating point hardware by doing some other
>     floating point operations (or writing to a statically allocated
>     variable) beforehand, does the problem go away?
>
>     Andrew
>
>     *From:* Shi Jinghao [mailto:jhshi at cs.hku.hk]
>     *Sent:* Saturday, 1 December 2012 02:20
>     *To:* barrelfish-users at lists.inf.ethz.ch
>     *Subject:* [Barrelfish-users] A Weird Bug about Page Fault
>
>     Hi,
>
>     I've been developing a memory management library on Barrelfish
>     (SCC). Recently I bumped into a very weird bug involving page faults.
>     I attached a minimal case (pgfault_test.tgz) that can reproduce this
>     bug.
>
>     The workflow of the test case is as simple as the following:
>
>     1) Allocate an array of doubles as read-only, using frame_alloc and
>     vspace_map_one_frame_attr (or pmap->f.map, this doesn't matter)
>
>     2) Initialise the array; this will generate a page fault
>
>     3) In the page fault handler, remap the faulting page as read-write,
>     using pmap->f.modify_flags
>
>     The weird thing is: the first touch of this array does not result in
>     a proper value, but just NaN!
>
>     I've conducted several runs and found the following:
>
>     1) This bug occurs when the array type is double or float.
>     Everything is fine if it's an integer array.
>
>     2) Only the item that caused the page fault ends up as NaN; the
>     other items are just fine. This holds when the faulting item is
>     anywhere within that page, not just at the page start.
>
>     3) If you assign each array element a constant value (say 1.0),
>     or an int/double variable, then all items end up with the right
>     value. It seems the bug only appears when we assign i (or any
>     expression containing i) to a[i].
>
>     I tested the attached code in release2012-05-25 (the revision I work
>     on) and the latest revision (release2012-10-03).
>
>     I've also composed a minimal test case in sccLinux (write_fault.c).
>     It turns out that everything is all good. No annoying NaN values.
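
The attached write_fault.c is not reproduced in this archive. A Linux test of the same shape might look roughly like the sketch below, assuming that mmap with PROT_READ stands in for the read-only mapping, a SIGSEGV handler stands in for the page fault handler, and mprotect stands in for pmap->f.modify_flags. The function and variable names are invented for illustration and are not taken from the attachment.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t fault_count; /* write faults handled so far */
static uintptr_t region_base;             /* start of the test mapping */
static size_t    region_len;              /* its length in bytes */
static long      page_size;

/* SIGSEGV handler: make the faulting page writable so the store is retried.
 * mprotect is not formally async-signal-safe; this is a test sketch only. */
static void on_write_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    if (addr < region_base || addr >= region_base + region_len) {
        _exit(2); /* a genuine crash, not the fault we are testing */
    }
    uintptr_t page = addr & ~((uintptr_t)page_size - 1);
    if (mprotect((void *)page, (size_t)page_size, PROT_READ | PROT_WRITE) != 0) {
        _exit(3);
    }
    fault_count++;
}

/* Maps n doubles read-only, writes a[i] = i through the fault handler and
 * returns how many elements hold a wrong value (NaN counts as wrong),
 * or -1 if the setup failed. The number of faults goes to *faults_out. */
static long run_write_fault_test(size_t n, long *faults_out)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) {
        return -1;
    }

    page_size = sysconf(_SC_PAGESIZE);
    size_t bytes = n * sizeof(double);
    double *a = mmap(NULL, bytes, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (a == MAP_FAILED) {
        return -1;
    }
    region_base = (uintptr_t)a;
    region_len = bytes;
    fault_count = 0;

    for (size_t i = 0; i < n; i++) {
        a[i] = (double)i; /* first store to each page faults */
    }

    long bad = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != (double)i) {
            bad++;
        }
    }
    *faults_out = fault_count;
    munmap(a, bytes);
    return bad;
}
```

Calling run_write_fault_test(1024, &faults) from main and printing the returned count would show whether any element, in particular the first one on each page, came back with a wrong value.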
>
>     This bug has bothered me for quite a few days. I'd really appreciate
>     it if someone could give a hint on this.
>
>     Thanks,
>     Jinghao
>