[Barrelfish-users] A Weird Bug about Page Fault
Andrew Baumann
Andrew.Baumann at microsoft.com
Sat Dec 8 00:38:17 CET 2012
Hi,
Much as we'd like to help, it's tough when it requires an SCC to reproduce. Can you perhaps restructure your code to avoid the page fault handler (e.g. pre-allocate/pre-fault the memory)?
Andrew
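The pre-faulting idea suggested above can be sketched in plain C. This is only an illustration: `prefault_pages` is a hypothetical helper, not a Barrelfish API, and it assumes the buffer is page-aligned and already mapped in a way that lets the fault be resolved. The point is that every page is touched with integer accesses only, before any floating-point code runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Barrelfish): touch one byte in every
 * page of buf so that any page fault is taken here, using integer loads
 * and stores only, before floating-point code touches the memory.
 * Assumes buf is page-aligned. Returns the number of pages touched. */
static size_t prefault_pages(void *buf, size_t len, size_t page_size)
{
    assert(page_size > 0);
    volatile unsigned char *p = buf;   /* volatile: keep every access */
    size_t touched = 0;

    for (size_t off = 0; off < len; off += page_size) {
        p[off] = p[off];               /* read and write back, contents unchanged */
        touched++;
    }
    return touched;
}
```

Calling `prefault_pages(array, bytes, page_size)` once after mapping would move all faults ahead of the benchmark loop. It does not help when the page fault handler itself is what needs to be exercised.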
From: Shi Jinghao [mailto:jhshi at cs.hku.hk]
Sent: Thursday, 6 December 2012 01:33
To: barrelfish-users at lists.inf.ethz.ch
Subject: Re: [Barrelfish-users] A Weird Bug about Page Fault
Hi,
Is there any progress on this issue? I've been tracing the kernel's page fault path for a while but have found no clue yet. It's quite frustrating, since this bug has prevented me from running any benchmark that uses double/float data types...
Jinghao
On Wed, Dec 5, 2012 at 5:52 AM, Andrew Baumann <Andrew.Baumann at microsoft.com> wrote:
That's interesting... to my knowledge there's very little code that's unique to SCC and not shared with x86_32.
One likely culprit is fpu_save and fpu_restore (from /include/arch/x86_32/barrelfish_kpi/asm_inlines_arch.h), which use fxsave and fxrstor on x86_32 but fnsave and frstor on SCC. Are we sure the two are equivalent?
It might also be helpful if someone could test on real x86_32 hardware, just to rule out qemu.
Andrew
From: jhshi89 at gmail.com [mailto:jhshi89 at gmail.com] On Behalf Of Shi Jinghao
Sent: Tuesday, 4 December 2012 02:16
To: Simon Peter
Cc: Andrew Baumann; barrelfish-users at lists.inf.ethz.ch
Subject: Re: [Barrelfish-users] A Weird Bug about Page Fault
Hi Simon,
Yes, I think so. But this bug didn't occur on sccLinux running on SCC (see write_fault.c). So I suspect that some code in Barrelfish that deals with exceptions doesn't behave correctly. But I really have no idea where to start debugging...
Can someone in the community who has access to SCC test the code? Many thanks.
Jinghao
On Tue, Dec 4, 2012 at 4:45 PM, Simon Peter <speter at inf.ethz.ch> wrote:
Hi Jinghao,
It seems this is SCC specific. I just ran your test-case on QEMU on both x86-64 and -32 platforms and it seems to work just fine (i.e. I get the "all good" output).
Simon
On 12/03/2012 12:47 AM, Shi Jinghao wrote:
Hi Andrew,
Thanks for your reply. The two different exceptions you mentioned are insightful. I tried your suggestion, but it does not help: the NaN errors still occur. I also tried putting extra dummy floating-point operations in the page fault handler, and that does not help either.
Thanks,
Jinghao
On Sun, Dec 2, 2012 at 2:06 AM, Andrew Baumann <Andrew.Baumann at microsoft.com> wrote:
Hi Jinghao,
I notice that the first time you use floating point in this program is when writing to the array. There should be two different exceptions raised and handled here: one for the page fault, and one for the first use of the floating point hardware (which we lazily context-switch). My guess is that the page-fault path, which is not heavily exercised, does not interact well with the floating point save/restore code.
If you initialise the floating point hardware by doing some other floating point operations (or writing to a statically allocated variable) beforehand, does the problem go away?
Andrew
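A minimal sketch of the warm-up suggested above, in plain C. `fpu_warm_up` is a hypothetical name, not something from the attached test case. The idea is to force one floating-point operation early, so that the first-use FPU trap is taken separately from the write fault on the array.

```c
/* Hypothetical helper: perform one floating-point multiply so that the
 * first-use FPU trap (lazy context switch) is taken here rather than at
 * the store that also triggers the page fault.
 * volatile stops the compiler from folding the arithmetic at compile time. */
static double fpu_warm_up(void)
{
    volatile double x = 1.5;
    volatile double y = x * 2.0;
    return y;
}
```

It would be called once at the start of the test, before the first write to the read-only array. As reported in the reply above, doing this did not make the NaN values go away on SCC.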
From: Shi Jinghao [mailto:jhshi at cs.hku.hk]
Sent: Saturday, 1 December 2012 02:20
To: barrelfish-users at lists.inf.ethz.ch
Subject: [Barrelfish-users] A Weird Bug about Page Fault
Hi,
I've been developing a memory management library on Barrelfish (SCC). Recently I ran into a very weird bug involving page faults. I attached a minimal case (pgfault_test.tgz) that can reproduce this bug.
The workflow of the test case is as simple as the following:
1) Allocate an array of doubles as read-only, using frame_alloc and vspace_map_one_frame_attr (or pmap->f.map, this doesn't matter)
2) Initialise the array; this generates a page fault
3) In the page fault handler, remap the faulted page as read-write, using pmap->f.modify_flags
The weird thing is: the first touch of this array does not produce the proper value, but NaN!
I've conducted several runs and found the following:
1) This bug occurs when the array type is double or float. Everything is fine if it's an integer array.
2) Only the item that caused the page fault ends up as NaN; the other items are fine. This holds wherever the faulting item lies within the page, not just at the page start.
3) If you assign each array element a constant value (say 1.0), or an int/double variable, then all items end up with the right value. It seems the bug only appears when we assign i (or any expression containing i) to a[i].
I tested the attached code on release2012-05-25 (the revision I work on) and on the latest revision (release2012-10-03).
I've also composed a minimal test case for sccLinux (write_fault.c). It turns out that everything is fine there: no annoying NaN values.
This bug has bothered me for quite a few days. I'd really appreciate it if someone could give a hint on this.
Thanks,
Jinghao
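The attachments (pgfault_test.tgz, write_fault.c) are not reproduced in this archive. As a rough illustration of the workflow described above, here is a sketch of how such a test might look on Linux. It is an assumption, not the original write_fault.c: it swaps the Barrelfish calls for POSIX ones (mmap for the read-only mapping, a SIGSEGV handler in place of the page fault handler, mprotect in place of pmap->f.modify_flags), and all function and variable names are made up for the example.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS on glibc */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static uintptr_t region_base;      /* start of the read-only mapping */
static size_t region_len;          /* its length in bytes */
static size_t page_size;
static volatile sig_atomic_t fault_count;

/* Step 3: remap the faulting page read-write, then return so that the
 * faulting store is retried. mprotect in a signal handler is a common
 * Linux idiom, although POSIX does not list it as async-signal-safe. */
static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    if (addr < region_base || addr >= region_base + region_len) {
        _exit(2);                  /* a genuine crash, not the test fault */
    }
    uintptr_t page = addr & ~((uintptr_t)page_size - 1);
    if (mprotect((void *)page, page_size, PROT_READ | PROT_WRITE) != 0) {
        _exit(3);
    }
    fault_count++;
}

/* Returns the number of elements holding a wrong value (NaN included),
 * or -1 if the setup failed. */
static int write_fault_test(size_t pages)
{
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t bytes = pages * page_size;
    size_t n = bytes / sizeof(double);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    if (sigaction(SIGSEGV, &sa, NULL) != 0) {
        return -1;
    }

    /* Step 1: map the array read-only. */
    void *mem = mmap(NULL, bytes, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        return -1;
    }
    region_base = (uintptr_t)mem;
    region_len = bytes;

    /* Step 2: a[i] = i, the pattern reported to produce NaN on Barrelfish. */
    volatile double *a = mem;
    for (size_t i = 0; i < n; i++) {
        a[i] = (double)i;
    }

    int bad = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != (double)i) {   /* NaN compares unequal to everything */
            bad++;
        }
    }
    munmap(mem, bytes);
    return bad;
}
```

A caller would report "all good" when `write_fault_test(pages)` returns 0. The first write to each page should fault once, so `fault_count` ends up equal to the number of pages.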
_______________________________________________
Barrelfish-users mailing list
Barrelfish-users at lists.inf.ethz.ch
https://lists.inf.ethz.ch/mailman/listinfo/barrelfish-users