[Barrelfish-users] A Weird Bug about Page Fault

Shi Jinghao jhshi at cs.hku.hk
Mon Dec 10 06:52:01 CET 2012


Hi Andrew,

Yes, understood. But unfortunately, page faults are inevitable in my program.
Not only because of demand paging: I also use page faults to monitor the
access status of each page, and to trigger the corresponding library routine
as needed.
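
As a rough illustration of this kind of fault-driven access tracking, the per-page bookkeeping a fault handler might do could look like the C sketch below. It is not the actual library: every name is hypothetical, the page size and page count are assumed constants, and no Barrelfish API is used.

```c
#include <stddef.h>

#define TRACKED_PAGES   8u      /* assumed size of the tracked region, in pages */
#define PAGE_SIZE_BYTES 4096u   /* assumed page size */

/* Strongest access seen so far on a page. */
enum page_access { ACCESS_NONE = 0, ACCESS_READ = 1, ACCESS_WRITE = 2 };

struct tracker {
    enum page_access seen[TRACKED_PAGES];
    unsigned faults;            /* number of faults recorded */
};

/*
 * Record one fault at byte `offset` into the tracked region.
 * Returns the access level now recorded for that page (which a real
 * handler would use to pick the new mapping flags), or -1 if the
 * offset lies outside the region.
 */
static int tracker_on_fault(struct tracker *t, size_t offset, int is_write)
{
    size_t page = offset / PAGE_SIZE_BYTES;
    if (page >= TRACKED_PAGES) {
        return -1;
    }
    t->faults++;
    if (is_write) {
        t->seen[page] = ACCESS_WRITE;
    } else if (t->seen[page] == ACCESS_NONE) {
        t->seen[page] = ACCESS_READ;
    }
    return (int)t->seen[page];
}
```

A real handler would follow the returned level with a remap of the faulting page, which is exactly the step that cannot be removed by pre-faulting.
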

I'll keep working on this issue, and will keep the community updated if I
make any progress.

Thanks,
Jinghao

On Sat, Dec 8, 2012 at 7:38 AM, Andrew Baumann <Andrew.Baumann at microsoft.com
> wrote:

> Hi,
>
> Much as we’d like to help, it’s tough when it requires an SCC to
> reproduce. Can you perhaps restructure your code to avoid the page fault
> handler (e.g. pre-allocate/pre-fault the memory)?
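
A minimal sketch of pre-faulting in plain C, assuming the buffer is already mapped writable and the page size is known. `prefault` is a hypothetical helper, not a Barrelfish routine.

```c
#include <stddef.h>

/*
 * Touch one byte at every page_size step of buf so that any faults are
 * taken up front rather than inside the timed or floating-point code.
 * Each touch rewrites the byte with its own value, so the contents are
 * unchanged. Returns the number of steps touched.
 */
static size_t prefault(void *buf, size_t len, size_t page_size)
{
    volatile unsigned char *p = buf;
    size_t touched = 0;

    if (page_size == 0) {
        return 0;
    }
    for (size_t off = 0; off < len; off += page_size) {
        p[off] = p[off];        /* volatile read + write forces the access */
        touched++;
    }
    if (len > 0) {
        p[len - 1] = p[len - 1];  /* covers a trailing page if buf is unaligned */
    }
    return touched;
}
```

This only helps when the faults are incidental; it does not apply when the fault handler itself is part of the design.
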
>
> Andrew
>
> *From:* Shi Jinghao [mailto:jhshi at cs.hku.hk]
> *Sent:* Thursday, 6 December 2012 01:33
> *To:* barrelfish-users at lists.inf.ethz.ch
> *Subject:* Re: [Barrelfish-users] A Weird Bug about Page Fault
>
> Hi,
>
> Has there been any progress on this issue? I've been tracing the kernel's
> page fault path for a while but have found no clue yet. It's quite
> frustrating, since this bug has prevented me from running any benchmark
> that uses double/float data types...
>
> Jinghao
>
> On Wed, Dec 5, 2012 at 5:52 AM, Andrew Baumann <
> Andrew.Baumann at microsoft.com> wrote:
>
> That’s interesting… to my knowledge there’s very little code that’s
> unique to SCC and not shared with x86_32.
>
> One likely culprit is fpu_save and fpu_restore (from
> /include/arch/x86_32/barrelfish_kpi/asm_inlines_arch.h), which do fxsave
> and fxrstor on x86_32 but fnsave and frstor on SCC. Are we sure the two
> are equivalent?
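
At the architectural level the two pairs are not interchangeable. FNSAVE writes a 108-byte image (in 32-bit protected mode) and then reinitialises the FPU as FNINIT would, whereas FXSAVE writes a 512-byte, 16-byte-aligned image and leaves the live registers untouched. The C structs below sketch the two layouts for comparison. The struct and field names are illustrative and are not taken from the Barrelfish headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Image written by FNSAVE / read by FRSTOR in 32-bit protected mode. */
struct fnsave_area {
    uint16_t fcw, rsvd0;        /* control word */
    uint16_t fsw, rsvd1;        /* status word */
    uint16_t ftw, rsvd2;        /* full 16-bit tag word */
    uint32_t fip;               /* instruction pointer offset */
    uint16_t fcs, fop;          /* IP selector, last opcode */
    uint32_t fdp;               /* operand pointer offset */
    uint16_t fds, rsvd3;        /* operand pointer selector */
    uint8_t  st[8][10];         /* ST0..ST7, 80 bits each, packed */
};

/* Image written by FXSAVE / read by FXRSTOR in 32-bit mode.
 * The memory operand must be 16-byte aligned. */
struct fxsave_area {
    uint16_t fcw, fsw;
    uint8_t  ftw_abridged, rsvd0;  /* abridged 8-bit tag word */
    uint16_t fop;
    uint32_t fip;
    uint16_t fcs, rsvd1;
    uint32_t fdp;
    uint16_t fds, rsvd2;
    uint32_t mxcsr, mxcsr_mask;
    uint8_t  st[8][16];         /* ST0..ST7, each padded to 16 bytes */
    uint8_t  xmm[8][16];        /* XMM0..XMM7 */
    uint8_t  rsvd3[224];
};

/* Sizes of the two save images, for buffer allocation or sanity checks. */
static size_t fnsave_area_size(void) { return sizeof(struct fnsave_area); }
static size_t fxsave_area_size(void) { return sizeof(struct fxsave_area); }
```

Because FNSAVE clears the live state, code that resumes floating point work after it without a matching FRSTOR sees an empty register stack, and a store from an empty register with exceptions masked produces a NaN. Whether that is what happens on this path is only a guess.
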
>
> It might also be helpful if someone could test on real x86_32 hardware,
> just to rule out qemu.
>
> Andrew
>
> *From:* jhshi89 at gmail.com [mailto:jhshi89 at gmail.com] *On Behalf Of *Shi
> Jinghao
> *Sent:* Tuesday, 4 December 2012 02:16
> *To:* Simon Peter
> *Cc:* Andrew Baumann; barrelfish-users at lists.inf.ethz.ch
> *Subject:* Re: [Barrelfish-users] A Weird Bug about Page Fault
>
> Hi Simon,
>
> Yes, I think so. But this bug didn't occur on sccLinux running on SCC (see
> write_fault.c). So I suspect that some code in Barrelfish that deals with
> exceptions doesn't behave correctly. But I really have no idea where to
> start debugging...
>
> Can someone in the community who has access to an SCC test the code? Many
> thanks.
>
> Jinghao
>
> On Tue, Dec 4, 2012 at 4:45 PM, Simon Peter <speter at inf.ethz.ch> wrote:
>
> Hi Jinghao,
>
> It seems this is SCC specific. I just ran your test-case on QEMU on both
> x86-64 and -32 platforms and it seems to work just fine (i.e. I get the
> "all good" output).
>
> Simon
>
> On 12/03/2012 12:47 AM, Shi Jinghao wrote:
>
> Hi Andrew,
>
> Thanks for your reply. Your point about the two different exceptions is
> insightful. I tried your suggestion, but it does not help: the NaN
> errors still occur. I also tried adding extra dummy floating-point
> operations to the page fault handler, and that does not help either.
>
> Thanks,
> Jinghao
>
> On Sun, Dec 2, 2012 at 2:06 AM, Andrew Baumann
> <Andrew.Baumann at microsoft.com <mailto:Andrew.Baumann at microsoft.com>>
> wrote:
>
>     Hi Jinghao,
>
>     I notice that the first time you use floating point in this program
>     is when writing to the array. There should be two different
>     exceptions raised and handled here: one for the page fault, and one
>     for the first use of the floating point hardware (which we lazily
>     context-switch). My guess is that the page-fault path, which is not
>     heavily exercised, does not interact well with the floating point
>     save/restore code.
>
>     If you initialise the floating point hardware by doing some other
>     floating point operations (or writing to a statically allocated
>     variable) beforehand, does the problem go away?
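
The suggested experiment can be sketched in portable C as below: do one floating point operation before the first store to the array, so that the first-use FPU trap and the page fault no longer coincide. Both function names are hypothetical, and whether the compiler actually emits an FPU instruction for the warm-up depends on the target and flags.

```c
#include <stddef.h>

/* One floating point multiply that the compiler cannot fold away,
 * because the operand is read through a volatile object. */
static double fpu_warm_up(void)
{
    volatile double seed = 1.5;
    return seed * 2.0;
}

/*
 * Warm up the FPU, then store i into a[i] for every element, as the
 * test case does. Returns the number of elements that do not read
 * back as expected (a NaN never compares equal, so it is counted).
 */
static size_t fill_with_index(double *a, size_t n)
{
    size_t bad = 0;

    (void)fpu_warm_up();
    for (size_t i = 0; i < n; i++) {
        a[i] = (double)i;
    }
    for (size_t i = 0; i < n; i++) {
        if (a[i] != (double)i) {
            bad++;
        }
    }
    return bad;
}
```

In the real test the array would be the read-only mapping, so the first store in the loop is still the one that faults.
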
>
>     Andrew
>
>     *From:* Shi Jinghao [mailto:jhshi at cs.hku.hk <mailto:jhshi at cs.hku.hk>]
>     *Sent:* Saturday, 1 December 2012 02:20
>     *To:* barrelfish-users at lists.inf.ethz.ch
>     <mailto:barrelfish-users at lists.inf.ethz.ch>
>     *Subject:* [Barrelfish-users] A Weird Bug about Page Fault
>
>     Hi,
>
>     I've been developing a memory management library on Barrelfish
>     (SCC). Recently I ran into a very weird bug involving page faults. I
>     attached a minimal test case (pgfault_test.tgz) that reproduces the
>     bug.
>
>     The workflow of the test case is as follows:
>
>     1) Allocate an array of doubles as read-only, using frame_alloc and
>     vspace_map_one_frame_attr (or pmap->f.map; it doesn't matter which).
>
>     2) Initialise the array; this generates page faults.
>
>     3) In the page fault handler, remap the faulting page as read-write,
>     using pmap->f.modify_flags.
>
>     The weird thing is that the first touch of this array does not result
>     in a proper value, but in NaN!
>
>     I've conducted several runs and found the following:
>
>     1) The bug occurs when the array type is double or float.
>     Everything is fine if it's an integer array.
>
>     2) Only the item that caused the page fault ends up as NaN; the
>     other items are fine. This holds wherever the faulting item is
>     within the page, not just at the page start.
>
>     3) If you assign each array item a constant (say 1.0), or an
>     int/double variable, then all items end up with the right value.
>     The bug seems to appear only when a[i] is assigned i (or any
>     expression containing i).
>
>     I tested the attached code on release2012-05-25 (the revision I work
>     on) and on the latest revision (release2012-10-03).
>
>     I've also written a minimal test case for sccLinux (write_fault.c).
>     There everything works: no NaN values.
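
For readers without the attachment, a Linux analogue of the same workflow might look like the sketch below. It is not the attached write_fault.c, whose contents are not shown in this thread. It assumes Linux with glibc, maps the array read-only with mmap, and upgrades each faulting page from a SIGSEGV handler with mprotect. Calling mprotect from a signal handler is common practice on Linux but is not guaranteed async-signal-safe by POSIX.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t fault_count;  /* faults handled so far */
static size_t page_size;                   /* set before any fault can occur */

/* Remap the faulting page read-write, then return to retry the store. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    uintptr_t page = (uintptr_t)info->si_addr & ~(uintptr_t)(page_size - 1);
    if (mprotect((void *)page, page_size, PROT_READ | PROT_WRITE) != 0) {
        _exit(2);               /* not one of our pages: give up */
    }
    fault_count++;
}

/*
 * Map `pages` pages read-only, store i into a[i] for every element, and
 * count the elements that read back wrong (a NaN is counted, since it
 * never compares equal). Returns that count, or -1 if setup failed.
 */
static int run_write_fault_test(size_t pages)
{
    struct sigaction sa;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = pages * page_size;
    size_t n = len / sizeof(double);

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) {
        return -1;
    }

    double *a = mmap(NULL, len, PROT_READ,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (a == MAP_FAILED) {
        return -1;
    }

    fault_count = 0;
    for (size_t i = 0; i < n; i++) {
        a[i] = (double)i;       /* first store to each page faults */
    }

    int bad = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != (double)i) {
            bad++;
        }
    }
    munmap(a, len);
    return bad;
}
```

On Linux the kernel saves and restores the floating point state around signal delivery, so the value being stored survives the handler.
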
>
>     This bug has bothered me for quite a few days. I'd really appreciate
>     it if someone could give me a hint.
>
>     Thanks,
>     Jinghao
>
>
> _______________________________________________
> Barrelfish-users mailing list
> Barrelfish-users at lists.inf.ethz.ch
> https://lists.inf.ethz.ch/mailman/listinfo/barrelfish-users