[Barrelfish-users] A Weird Bug about Page Fault
Andrew Baumann
Andrew.Baumann at microsoft.com
Tue Dec 4 22:52:23 CET 2012
That's interesting... to my knowledge there's very little code that's unique to SCC and not shared with x86_32.
One likely culprit is fpu_save and fpu_restore (from /include/arch/x86_32/barrelfish_kpi/asm_inlines_arch.h), which use fxsave and fxrstor on x86_32 but fnsave and frstor on SCC. Are we sure the two are equivalent?
It might also be helpful if someone could test on real x86_32 hardware, just to rule out qemu.
Andrew
From: jhshi89 at gmail.com [mailto:jhshi89 at gmail.com] On Behalf Of Shi Jinghao
Sent: Tuesday, 4 December 2012 02:16
To: Simon Peter
Cc: Andrew Baumann; barrelfish-users at lists.inf.ethz.ch
Subject: Re: [Barrelfish-users] A Weird Bug about Page Fault
Hi Simon,
Yes, I think so. But this bug didn't occur on sccLinux running on SCC (see write_fault.c). So I suspect that some code in Barrelfish that deals with exceptions doesn't behave correctly. But I really have no idea where to start debugging...
Can someone in the community who has access to SCC test the code? Many thanks.
Jinghao
On Tue, Dec 4, 2012 at 4:45 PM, Simon Peter <speter at inf.ethz.ch> wrote:
Hi Jinghao,
It seems this is SCC specific. I just ran your test-case on QEMU on both x86-64 and -32 platforms and it seems to work just fine (i.e. I get the "all good" output).
Simon
On 12/03/2012 12:47 AM, Shi Jinghao wrote:
Hi Andrew,
Thanks for your reply. The two different exceptions you mentioned are
insightful. I tried your suggestion, but that does not help: the NaN
errors still occur. I also tried putting extra dummy floating point
operations in the page fault handler, and that does not help either.
Thanks,
Jinghao
On Sun, Dec 2, 2012 at 2:06 AM, Andrew Baumann
<Andrew.Baumann at microsoft.com> wrote:
Hi Jinghao,

I notice that the first time you use floating point in this program
is when writing to the array. There should be two different
exceptions raised and handled here: one for the page fault, and one
for the first use of the floating point hardware (which we lazily
context-switch). My guess is that the page-fault path, which is not
heavily exercised, does not interact well with the floating point
save/restore code.

If you initialise the floating point hardware by doing some other
floating point operations (or writing to a statically allocated
variable) beforehand, does the problem go away?
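One way to try this is sketched below in C. The names fpu_warmup_sink and warm_up_fpu are hypothetical; they are not part of Barrelfish or of the attached test case. The only idea is to force at least one floating point instruction to execute, and a statically allocated variable to be written, before the first store to the read-only array.

```c
/* Hypothetical helper, not taken from the Barrelfish tree. */

/* Statically allocated, so writing it cannot hit the read-only mapping. */
static volatile double fpu_warmup_sink;

/*
 * Perform one floating point multiply before the test touches its array.
 * 'volatile' stops the compiler from folding the multiply away at compile
 * time, so a real floating point instruction is issued at run time.
 */
double warm_up_fpu(void)
{
    volatile double x = 1.5;
    fpu_warmup_sink = x * 2.0;
    return fpu_warmup_sink;
}
```

Calling warm_up_fpu() once at the start of main, before the array is initialised, should be enough to separate the first-use floating point trap from the page fault.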
Andrew

*From:* Shi Jinghao [mailto:jhshi at cs.hku.hk]
*Sent:* Saturday, 1 December 2012 02:20
*To:* barrelfish-users at lists.inf.ethz.ch
*Subject:* [Barrelfish-users] A Weird Bug about Page Fault
Hi,

I've been developing a memory management library on Barrelfish
(SCC). Recently I ran into a very weird bug involving page faults. I
attached a minimal case (pgfault_test.tgz) that can reproduce this
bug.

The workflow of the test case is as simple as follows:

1) Allocate an array of doubles as read-only, using frame_alloc and
vspace_map_one_frame_attr (or pmap->f.map; this doesn't matter)

2) Initialise the array; this will generate a page fault

3) In the page fault handler, remap the faulting page as read-write,
using pmap->f.modify_flags

The weird thing is: the first touch of this array will not result in
a proper value, but just NaN!

I've conducted several runs and found the following:

1) This bug occurs when the array type is double or float.
Everything is fine if it's an integer array.

2) Only the item that caused the page fault ends up with a NaN value;
the other items are just fine. This holds wherever the faulting item
is within the page, not just at the page start.

3) If you assign each array element a constant value (say 1.0),
or an int/double variable, then all items end up with the right
value. It seems the bug occurs only when we assign i (or any
expression containing i) to a[i].

I tested the attached code in release2012-05-25 (the revision I work
on) and the latest revision (release2012-10-03).

I've also composed a minimal test case in sccLinux (write_fault.c).
It turns out that everything is fine there: no annoying NaN values.
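For readers without the attachment, a Linux analogue of the three steps can be sketched roughly as follows. This is not the original write_fault.c: the function name count_nans_after_write_fault and the handler name on_segv are made up for illustration, and the sketch assumes Linux, where a store to a PROT_READ page raises SIGSEGV and calling mprotect from the handler works in practice (POSIX does not list it as async-signal-safe).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static long page_size;

/* Step 3: remap the faulting page read-write; the store is then retried. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    uintptr_t page = (uintptr_t)info->si_addr & ~((uintptr_t)page_size - 1);
    if (mprotect((void *)page, (size_t)page_size,
                 PROT_READ | PROT_WRITE) != 0) {
        _exit(1);              /* fault was outside our mapping: give up */
    }
}

/* Returns the number of NaN elements found, or -1 if setup failed. */
int count_nans_after_write_fault(size_t n)
{
    struct sigaction sa, old;
    size_t bytes = n * sizeof(double);
    size_t i;
    int nans = 0;

    page_size = sysconf(_SC_PAGESIZE);

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        return -1;
    }

    /* Step 1: map the array read-only. */
    volatile double *a = mmap(NULL, bytes, PROT_READ,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (a == MAP_FAILED) {
        sigaction(SIGSEGV, &old, NULL);
        return -1;
    }

    /* Step 2: a[i] = i; the first store to each page faults. */
    for (i = 0; i < n; i++) {
        a[i] = (double)i;
    }

    /* NaN is the only value that compares unequal to itself. */
    for (i = 0; i < n; i++) {
        if (a[i] != a[i]) {
            nans++;
        }
    }

    munmap((void *)a, bytes);
    sigaction(SIGSEGV, &old, NULL);   /* restore the previous handler */
    return nans;
}
```

On a system that handles the fault correctly, calling count_nans_after_write_fault(1024) from main should report zero NaN values; a non-zero count would correspond to the behaviour described above.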
This bug has bothered me for quite a few days. I would really
appreciate it if someone could give a hint on this.

Thanks,
Jinghao
_______________________________________________
Barrelfish-users mailing list
Barrelfish-users at lists.inf.ethz.ch
https://lists.inf.ethz.ch/mailman/listinfo/barrelfish-users