[Barrelfish-users] A Weird Bug about Page Fault
Shi Jinghao
jhshi at cs.hku.hk
Mon Dec 24 08:39:06 CET 2012
Hi, guys.
Apologies for the long delay on this problem! The SCC data center has been
down for the past few days...
I tested the patch provided by Simon, and it works! Many thanks.
Jinghao
On Thu, Dec 13, 2012 at 4:01 AM, Simon Peter <speter at inf.ethz.ch> wrote:
> Apparently, FNSAVE also resets the FPU after storing its state to
> memory, while FXSAVE does not. I've implemented fpu_save() on the
> assumption that it just stores the FPU state and does not modify it. So
> this is likely where the problem is. One quick workaround would be to reload the
> FPU state immediately after storing it in fpu_save(). This is slow, but
> should work well in your situation.
>
> In include/arch/x86_32/barrelfish_kpi, replace line 68:
>
> __asm volatile("fnsave %0; fwait" : "=m" (*regs));
>
> with:
>
> __asm volatile("fnsave %0; fwait; frstor %0" : "=m" (*regs));
>
> This is untested, but should work.
>
> Simon
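
The difference Simon describes can be reproduced in user space. Below is a
minimal sketch, assuming an x86 or x86-64 machine and GCC or Clang inline
assembly. The names x87_area and store_after_save are made up for
illustration and are not Barrelfish code. The helper pushes a value onto the
x87 register stack, saves the FPU state with FNSAVE, optionally reloads it
with FRSTOR, and then pops the top of the stack into memory. Without the
reload, FNSAVE has left the register stack empty, so the pending store
writes the default NaN instead of the value, which would be consistent with
the NaN seen after a page fault interrupts a floating point store.

```c
#include <stdint.h>

/* Save area written by FNSAVE (108 bytes with a 32-bit operand size). */
struct x87_area {
    uint8_t bytes[108];
};

/*
 * Hypothetical helper, not Barrelfish code.
 * Pushes `in` onto the x87 stack, saves the FPU state with FNSAVE,
 * optionally reloads it with FRSTOR, then pops st(0) into the result.
 * Everything happens inside one asm statement so the compiler cannot
 * touch the x87 stack between the steps.
 */
static double store_after_save(double in, int reload)
{
    struct x87_area area;
    double out = 0.0;

    if (reload) {
        /* Workaround from the thread: restore right after saving. */
        __asm__ volatile(
            "fldl %2\n\t"
            "fnsave %1\n\t"
            "fwait\n\t"
            "frstor %1\n\t"
            "fstpl %0"
            : "=m" (out), "=m" (area)
            : "m" (in)
            : "st", "st(1)", "st(2)", "st(3)",
              "st(4)", "st(5)", "st(6)", "st(7)");
    } else {
        /* Original behaviour: FNSAVE reinitialises the FPU, so the
         * value pushed by fldl is gone when fstpl runs. */
        __asm__ volatile(
            "fldl %2\n\t"
            "fnsave %1\n\t"
            "fwait\n\t"
            "fstpl %0"
            : "=m" (out), "=m" (area)
            : "m" (in)
            : "st", "st(1)", "st(2)", "st(3)",
              "st(4)", "st(5)", "st(6)", "st(7)");
    }
    return out;
}
```

Calling store_after_save(42.0, 1) should hand back the value unchanged,
while store_after_save(42.0, 0) should return a NaN, because the store
reads from an empty register with the invalid-operation exception masked.
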
>
>
> On 12-12-09 09:52 PM, Shi Jinghao wrote:
>
> Hi Andrew,
>
> Yes, understood. But unfortunately, page faults are inevitable in my
> program. Not only because of demand paging: I also use page faults
> to monitor the access status of each page and trigger the corresponding
> library routine as needed.
>
> I'll keep working on this issue, and will keep the community updated if I
> make any progress.
>
> Thanks,
> Jinghao
>
> On Sat, Dec 8, 2012 at 7:38 AM, Andrew Baumann <
> Andrew.Baumann at microsoft.com> wrote:
>
>> Hi,
>>
>>
>>
>> Much as we’d like to help, it’s tough when it requires an SCC to
>> reproduce. Can you perhaps restructure your code to avoid the page fault
>> handler (e.g. pre-allocate/pre-fault the memory)?
>>
>>
>>
>> Andrew
>>
>>
>>
>> *From:* Shi Jinghao [mailto:jhshi at cs.hku.hk]
>> *Sent:* Thursday, 6 December 2012 01:33
>> *To:* barrelfish-users at lists.inf.ethz.ch
>>
>> *Subject:* Re: [Barrelfish-users] A Weird Bug about Page Fault
>>
>>
>>
>> Hi,
>>
>>
>>
>> Has there been any progress on this issue? I've been tracing the kernel's
>> page fault path for a while but have found no clue yet. It's quite
>> frustrating, since this bug has prevented me from running any benchmark
>> that uses double/float data types...
>>
>>
>>
>> Jinghao
>>
>>
>>
>> On Wed, Dec 5, 2012 at 5:52 AM, Andrew Baumann <
>> Andrew.Baumann at microsoft.com> wrote:
>>
>> That’s interesting… to my knowledge there’s very little code that’s
>> unique to SCC and not shared with x86_32.
>>
>>
>>
>> One likely culprit is fpu_save and fpu_restore (from
>> /include/arch/x86_32/barrelfish_kpi/asm_inlines_arch.h) which do fxsave and
>> fxrstor on x86_32 but fnsave and frstor on SCC. Are we sure the two are
>> equivalent?
>>
>>
>>
>> It might also be helpful if someone could test on real x86_32 hardware,
>> just to rule out qemu.
>>
>>
>>
>> Andrew
>>
>>
>>
>> *From:* jhshi89 at gmail.com [mailto:jhshi89 at gmail.com] *On Behalf Of *Shi
>> Jinghao
>> *Sent:* Tuesday, 4 December 2012 02:16
>> *To:* Simon Peter
>> *Cc:* Andrew Baumann; barrelfish-users at lists.inf.ethz.ch
>> *Subject:* Re: [Barrelfish-users] A Weird Bug about Page Fault
>>
>>
>>
>> Hi Simon,
>>
>>
>>
>> Yes, I think so. But this bug didn't occur on sccLinux running on SCC
>> (see write_fault.c). So I suspect that some code in Barrelfish that deals
>> with exceptions doesn't behave correctly. But I really have no idea where
>> to start debugging...
>>
>>
>>
>> Can someone in the community who has access to SCC test the code? Many
>> thanks.
>>
>>
>>
>> Jinghao
>>
>> On Tue, Dec 4, 2012 at 4:45 PM, Simon Peter <speter at inf.ethz.ch> wrote:
>>
>> Hi Jinghao,
>>
>> It seems this is SCC specific. I just ran your test-case on QEMU on both
>> x86-64 and -32 platforms and it seems to work just fine (i.e. I get the
>> "all good" output).
>>
>> Simon
>>
>>
>>
>> On 12/03/2012 12:47 AM, Shi Jinghao wrote:
>>
>> Hi Andrew,
>>
>> Thanks for your reply. Your point about the two different exceptions is
>> insightful. I tried your suggestion, but it does not help: the NaN
>> errors still occur. I also tried putting extra dummy floating point
>> operations in the page fault handler, and that does not help either.
>>
>> Thanks,
>> Jinghao
>>
>> On Sun, Dec 2, 2012 at 2:06 AM, Andrew Baumann
>> <Andrew.Baumann at microsoft.com> wrote:
>>
>> Hi Jinghao,
>>
>> I notice that the first time you use floating point in this program
>> is when writing to the array. There should be two different
>> exceptions raised and handled here: one for the page fault, and one
>> for the first use of the floating point hardware (which we lazily
>> context-switch). My guess is that the page-fault path, which is not
>> heavily exercised, does not interact well with the floating point
>> save/restore code.
>>
>> If you initialise the floating point hardware by doing some other
>> floating point operations (or writing to a statically allocated
>> variable) beforehand, does the problem go away?
>>
>> Andrew
>>
>> *From:* Shi Jinghao [mailto:jhshi at cs.hku.hk]
>> *Sent:* Saturday, 1 December 2012 02:20
>> *To:* barrelfish-users at lists.inf.ethz.ch
>> *Subject:* [Barrelfish-users] A Weird Bug about Page Fault
>>
>> Hi,
>>
>> I've been developing a memory management library on Barrelfish
>> (SCC). Recently I bumped into a very weird bug involving page faults. I
>> attached a minimal test case (pgfault_test.tgz) that reproduces the
>> bug.
>>
>> The workflow of the test case is as follows:
>>
>> 1) Allocate an array of doubles as read-only, using frame_alloc and
>> vspace_map_one_frame_attr (or pmap->f.map; it doesn't matter which).
>>
>> 2) Initialise the array; this generates page faults.
>>
>> 3) In the page fault handler, remap the faulting page as read-write,
>> using pmap->f.modify_flags.
>>
>> The weird thing is: the first touch of this array does not result in
>> the proper value, just NaN!
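
For readers without the attachments, the three steps above can be sketched
on Linux with standard POSIX calls. This is only an analogue under stated
assumptions: it uses mmap, mprotect and a SIGSEGV handler in place of
frame_alloc, vspace_map_one_frame_attr and pmap->f.modify_flags, and it is
not the content of pgfault_test.tgz or write_fault.c. The names on_fault,
run_write_fault_test and fault_count are hypothetical. Calling mprotect
from a signal handler is not guaranteed to be async-signal-safe by POSIX,
although it is a common pattern on Linux.

```c
#define _GNU_SOURCE             /* for MAP_ANONYMOUS on some libcs */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static size_t page_size;
static volatile sig_atomic_t fault_count;

/* Step 3: make the faulting page read-write, then return so that the
 * interrupted store is retried. */
static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    uintptr_t page = (uintptr_t)info->si_addr & ~((uintptr_t)page_size - 1);
    fault_count++;
    if (mprotect((void *)page, page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);               /* not one of our pages: give up */
}

/* Returns the number of elements that do not hold the expected value
 * (a NaN counts as wrong), or -1 if the setup fails. */
static int run_write_fault_test(size_t pages)
{
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = pages * page_size;
    size_t n = len / sizeof(double);

    struct sigaction sa = {0};
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    /* Some systems report a write to a read-only page as SIGBUS. */
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;

    /* Step 1: map an array of doubles read-only. */
    volatile double *a = mmap(NULL, len, PROT_READ,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (a == MAP_FAILED)
        return -1;

    /* Step 2: initialise the array; the first store to each page faults. */
    for (size_t i = 0; i < n; i++)
        a[i] = (double)i;

    int bad = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] != (double)i)  /* also true when a[i] is NaN */
            bad++;

    munmap((void *)a, len);
    return bad;
}
```

On a system where the fault path preserves the floating point state,
run_write_fault_test(4) should return 0 after taking one fault per page.
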
>>
>> I've conducted several runs and found the following:
>>
>> 1) The bug occurs when the array type is double or float.
>> Everything is fine if it's an integer array.
>>
>> 2) Only the item that caused the page fault ends up as NaN;
>> the other items are fine. This holds wherever the faulting item
>> lies within the page, not just at the page start.
>>
>> 3) If you assign each array element a constant value (say 1.0),
>> or an int/double variable, then all items end up with the right
>> value. It seems that only assigning i (or an expression
>> containing i) to a[i] triggers the bug.
>>
>> I tested the attached code on release2012-05-25 (the revision I work
>> on) and on the latest revision (release2012-10-03).
>>
>> I've also written a minimal test case for sccLinux (write_fault.c).
>> There everything works: no NaN values.
>>
>> This bug has bothered me for quite a few days. I'd really appreciate it
>> if someone could give me a hint.
>>
>> Thanks,
>> Jinghao
>>
>>
>>
>>
>> _______________________________________________
>> Barrelfish-users mailing list
>> Barrelfish-users at lists.inf.ethz.ch
>> https://lists.inf.ethz.ch/mailman/listinfo/barrelfish-users
>>
>>
>>
>>
>>
>>
>>
>
>
>
>
>
>