[Barrelfish-users] A Weird Bug about Page Fault
Simon Peter
speter at inf.ethz.ch
Wed Dec 12 21:01:57 CET 2012
Apparently, FNSAVE also resets the FPU after storing its state to
memory, while FXSAVE does not. I implemented fpu_save() on the
assumption that it just stores the FPU state without modifying it, so
this is likely where the problem is. One quick workaround would be to
reload the FPU state immediately after storing it in fpu_save(). This
is slow, but should work well in your situation.
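To make the difference concrete, here is a minimal user-space sketch
(not Barrelfish code; all function and struct names are made up for
illustration). It assumes an x86 target and a GCC-compatible compiler.
Each function pushes a value onto the x87 stack, saves the FPU state,
and pops the value back. Going by the documented x87 behaviour, the
FNSAVE variant should pop a NaN because the register stack was emptied
by the implicit reset, while the FXSAVE variant should get the value
back unchanged.

```c
#include <assert.h>
#include <stdint.h>

#if !defined(__i386__) && !defined(__x86_64__)
#error "this sketch uses x87 instructions and needs an x86 target"
#endif

/* Hypothetical scratch buffers, sized for the two save formats. */
struct fnsave_image { uint8_t bytes[108]; };
struct fxsave_image { uint8_t bytes[512]; } __attribute__((aligned(16)));

/* Push `in`, FNSAVE, then pop ST(0). FNSAVE reinitialises the FPU after
 * storing, so the pop reads an empty register and the masked
 * invalid-operation response stores a NaN. FNCLEX clears the flags. */
static double pop_after_fnsave(double in)
{
    struct fnsave_image img;
    double out = 0.0;
    __asm__ volatile("fldl %2; fnsave %1; fwait; fstpl %0; fnclex"
                     : "=m"(out), "=m"(img)
                     : "m"(in)
                     : "st", "st(1)", "st(2)", "st(3)",
                       "st(4)", "st(5)", "st(6)", "st(7)");
    return out;
}

/* Same sequence with FXSAVE, which leaves the FPU state untouched. */
static double pop_after_fxsave(double in)
{
    struct fxsave_image img;
    double out = 0.0;
    __asm__ volatile("fldl %2; fxsave %1; fstpl %0"
                     : "=m"(out), "=m"(img)
                     : "m"(in)
                     : "st", "st(1)", "st(2)", "st(3)",
                       "st(4)", "st(5)", "st(6)", "st(7)");
    return out;
}

static int is_nan(double x) { return x != x; }

static void check_fnsave_resets_fpu(void)
{
    assert(is_nan(pop_after_fnsave(1.5)));  /* value lost to the reset */
    assert(pop_after_fxsave(1.5) == 1.5);   /* state left untouched */
}
```

Calling check_fnsave_resets_fpu() from main() exercises both variants.
Everything is kept inside a single asm statement per function because
the compiler gives no guarantee about x87 stack contents between
separate statements.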
In include/arch/x86_32/barrelfish_kpi, replace line 68:
__asm volatile("fnsave %0; fwait" : "=m" (*regs));
with:
__asm volatile("fnsave %0; fwait; frstor %0" : "=m" (*regs));
This is untested, but should work.
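A minimal user-space sketch of how one might check the save-and-reload
sequence outside Barrelfish, assuming an x86 target and a GCC-compatible
compiler. The struct and function names are hypothetical stand-ins, not
the real fpu_save() or its register area. The check sets a non-default
x87 control word, runs the sequence, and confirms the control word is
still in place afterwards; a bare FNSAVE would have reset it to 0x037f.

```c
#include <assert.h>
#include <stdint.h>

#if !defined(__i386__) && !defined(__x86_64__)
#error "this sketch uses x87 instructions and needs an x86 target"
#endif

/* Hypothetical stand-in for the save area that fpu_save() writes to. */
struct fpu_area_sketch { uint8_t bytes[108]; };

/* Same instruction sequence as the proposed workaround: store the state,
 * then reload it so the implicit FNSAVE reset is undone. */
static void fpu_save_reload_sketch(struct fpu_area_sketch *regs)
{
    __asm__ volatile("fnsave %0; fwait; frstor %0" : "=m"(*regs));
}

static uint16_t read_fpu_cw(void)
{
    uint16_t cw;
    __asm__ volatile("fnstcw %0" : "=m"(cw));
    return cw;
}

static void write_fpu_cw(uint16_t cw)
{
    __asm__ volatile("fldcw %0" : : "m"(cw));
}

static void check_control_word_survives(void)
{
    const uint16_t custom = 0x027f; /* 53-bit precision, exceptions masked */
    uint16_t saved = read_fpu_cw();
    struct fpu_area_sketch regs;

    write_fpu_cw(custom);
    fpu_save_reload_sketch(&regs);

    /* The saved image holds the control word in its first two bytes. */
    assert((uint16_t)(regs.bytes[0] | (regs.bytes[1] << 8)) == custom);
    /* The reload put the control word back into the FPU. */
    assert(read_fpu_cw() == custom);

    write_fpu_cw(saved); /* leave the FPU as we found it */
}
```

The control word is used as the probe because it persists across C
statements, unlike values on the x87 register stack.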
Simon
On 12-12-09 09:52 PM, Shi Jinghao wrote:
> Hi Andrew,
>
> Yes, understood. But unfortunately, page faults are inevitable in my
> program. Not only because of demand paging, but also because I use page
> faults to monitor the access status of each page and trigger the
> corresponding library routine as needed.
>
> I'll keep working on this issue and keep the community updated if I
> make any progress.
>
> Thanks,
> Jinghao
>
> On Sat, Dec 8, 2012 at 7:38 AM, Andrew Baumann
> <Andrew.Baumann at microsoft.com> wrote:
>
> Hi,
>
> Much as we'd like to help, it's tough when it requires an SCC to
> reproduce. Can you perhaps restructure your code to avoid the page
> fault handler (e.g. pre-allocate/pre-fault the memory)?
>
> Andrew
>
> *From:* Shi Jinghao [mailto:jhshi at cs.hku.hk]
> *Sent:* Thursday, 6 December 2012 01:33
> *To:* barrelfish-users at lists.inf.ethz.ch
> *Subject:* Re: [Barrelfish-users] A Weird Bug about Page Fault
>
> Hi,
>
> Has there been any progress on this issue? I've been tracing the
> kernel's page fault path for a while but have found no clue yet. It's
> quite frustrating, since this bug has prevented me from running any
> benchmark that uses double/float data types...
>
> Jinghao
>
> On Wed, Dec 5, 2012 at 5:52 AM, Andrew Baumann
> <Andrew.Baumann at microsoft.com> wrote:
>
> That's interesting... to my knowledge there's very little code
> that's unique to SCC and not shared with x86_32.
>
> One likely culprit is fpu_save and fpu_restore (from
> /include/arch/x86_32/barrelfish_kpi/asm_inlines_arch.h) which
> do fxsave and fxrstor on x86_32 but fnsave and frstor on SCC.
> Are we sure the two are equivalent?
>
> It might also be helpful if someone could test on real x86_32
> hardware, just to rule out qemu.
>
> Andrew
>
> *From:* jhshi89 at gmail.com [mailto:jhshi89 at gmail.com] *On
> Behalf Of* Shi Jinghao
> *Sent:* Tuesday, 4 December 2012 02:16
> *To:* Simon Peter
> *Cc:* Andrew Baumann; barrelfish-users at lists.inf.ethz.ch
> *Subject:* Re: [Barrelfish-users] A Weird Bug about Page Fault
>
> Hi Simon,
>
> Yes, I think so. But this bug didn't occur on sccLinux running
> on SCC (see write_fault.c). So I suspect that some code in
> Barrelfish that deals with exceptions doesn't behave correctly. But I
> really have no idea where to start debugging...
>
> Can someone in the community who has access to SCC test the
> code? Many thanks.
>
> Jinghao
>
> On Tue, Dec 4, 2012 at 4:45 PM, Simon Peter
> <speter at inf.ethz.ch> wrote:
>
> Hi Jinghao,
>
> It seems this is SCC specific. I just ran your test-case
> on QEMU on both x86-64 and -32 platforms and it seems to
> work just fine (i.e. I get the "all good" output).
>
> Simon
>
>
>
> On 12/03/2012 12:47 AM, Shi Jinghao wrote:
>
> Hi Andrew,
>
> Thanks for your reply. Your point about the two different
> exceptions is insightful. I tried your suggestion, but it does not
> help: the NaN errors still occur. I also tried putting extra dummy
> floating-point operations in the page fault handler, and that does
> not help either.
>
> Thanks,
> Jinghao
>
> On Sun, Dec 2, 2012 at 2:06 AM, Andrew Baumann
> <Andrew.Baumann at microsoft.com> wrote:
>
> Hi Jinghao,
>
> I notice that the first time you use floating point in this program
> is when writing to the array. There should be two different
> exceptions raised and handled here: one for the page fault, and one
> for the first use of the floating point hardware (which we lazily
> context-switch). My guess is that the page-fault path, which is not
> heavily exercised, does not interact well with the floating point
> save/restore code.
>
> If you initialise the floating point hardware by doing some other
> floating point operations (or writing to a statically allocated
> variable) beforehand, does the problem go away?
>
> Andrew
>
> *From:* Shi Jinghao [mailto:jhshi at cs.hku.hk]
> *Sent:* Saturday, 1 December 2012 02:20
> *To:* barrelfish-users at lists.inf.ethz.ch
> *Subject:* [Barrelfish-users] A Weird Bug about Page Fault
>
> Hi,
>
> I've been developing a memory management library on Barrelfish
> (SCC). Recently I ran into a very weird bug involving page faults. I
> attached a minimal case (pgfault_test.tgz) that can reproduce this
> bug.
>
> The workflow of the test case is as simple as the following:
>
> 1) Allocate an array of doubles as read-only, using frame_alloc and
> vspace_map_one_frame_attr (or pmap->f.map, this doesn't matter)
>
> 2) Initialise the array; this will generate a page fault
>
> 3) In the page fault handler, remap the faulted page as read-write,
> using pmap->f.modify_flags
>
> The weird thing is: the first touch of this array will not result in
> a proper value, but just NaN!
>
> I've conducted several runs and found the following:
>
> 1) This bug occurs when the array type is double or float.
> Everything is fine if it's an integer array.
>
> 2) Only the item that caused the page fault ends up as NaN; the
> other items are just fine. This holds wherever the faulting item
> lies within the page, not just at the page start.
>
> 3) If you assign each array item a constant value (say 1.0), or an
> int/double variable, then all items end up with the right value. It
> seems that only assigning i (or any expression containing i) to a[i]
> produces this bug.
>
> I tested the attached code in release2012-05-25 (the revision I work
> on) and the latest revision (release2012-10-03).
>
> I've also composed a minimal test case in sccLinux (write_fault.c).
> It turns out that everything is all good. No annoying NaN values.
>
> This bug has bothered me for quite a few days. I would really
> appreciate it if someone could give a hint on this.
>
> Thanks,
>
> Jinghao
>
>
>
>
> _______________________________________________
> Barrelfish-users mailing list
> Barrelfish-users at lists.inf.ethz.ch
> <mailto:Barrelfish-users at lists.inf.ethz.ch>
> https://lists.inf.ethz.ch/mailman/listinfo/barrelfish-users
>