[Barrelfish-users] A Weird Bug about Page Fault

Andrew Baumann Andrew.Baumann at microsoft.com
Sat Dec 1 19:06:48 CET 2012


Hi Jinghao,

I notice that the first time you use floating point in this program is when writing to the array. There should be two different exceptions raised and handled here: one for the page fault, and one for the first use of the floating point hardware (which we lazily context-switch). My guess is that the page-fault path, which is not heavily exercised, does not interact well with the floating point save/restore code.

If you initialise the floating point hardware by doing some other floating point operations (or writing to a statically allocated variable) beforehand, does the problem go away?
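One way to try this, as a minimal sketch only: the helper name fpu_warmup is hypothetical and not part of Barrelfish. Call it once at the start of main, before the first store to the read-only array, so that the first-use floating point trap should be taken separately from the page fault.

```c
/* Hypothetical helper (not a Barrelfish API): performs one throwaway
 * floating point operation so the first-use FPU trap happens here,
 * before any page fault on the read-only array. */
static double fpu_warmup(void)
{
    /* volatile forces a real load, multiply/add and store at run time,
     * so the compiler cannot fold the arithmetic away. */
    volatile double scratch = 1.0;
    scratch = scratch * 2.0 + 0.5;
    return scratch;
}
```

If the NaN disappears once fpu_warmup() is called before the array is initialised, that would point at the interaction between the page-fault path and the lazy floating point save/restore.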

Andrew

From: Shi Jinghao [mailto:jhshi at cs.hku.hk]
Sent: Saturday, 1 December 2012 02:20
To: barrelfish-users at lists.inf.ethz.ch
Subject: [Barrelfish-users] A Weird Bug about Page Fault

Hi,

I've been developing a memory management library on Barrelfish (SCC). Recently I ran into a very weird page fault bug. I have attached a minimal test case (pgfault_test.tgz) that reproduces it.

The workflow of the test case is simple:

1) Allocate an array of doubles as read-only, using frame_alloc and vspace_map_one_frame_attr (or pmap->f.map, this doesn't matter)

2) Initialise the array; this generates a page fault

3) In the page fault handler, remap the faulting page as read-write, using pmap->f.modify_flags

The weird thing is: the first touch of this array will not result in a proper value, but just NaN!

I've conducted several runs and found the following:

1) The bug occurs when the array type is double or float. Everything is fine if it's an integer array.

2) Only the item that caused the page fault ends up with a NaN value; the other items are fine. This holds wherever the faulting address lies within the page, not just at the page start.

3) If you assign each array element a constant value (say 1.0), or an int/double variable, then all items end up with the right value. The bug seems to appear only when a[i] is assigned i (or any expression containing i).

I tested the attached code in release2012-05-25 (the revision I work on) and the latest revision (release2012-10-03).

I've also written a minimal test case for sccLinux (write_fault.c). There everything works: no NaN values.
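For comparison, the same three-step workflow can be sketched with POSIX calls on Linux. This is not the attached write_fault.c, only an illustration under stated assumptions: the names on_fault and run_write_fault_test are hypothetical, and calling mprotect from a signal handler works on Linux but is not guaranteed by POSIX.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static long page_size;

/* SIGSEGV handler: remap the faulting page read-write so that the
 * interrupted store is retried when the handler returns. */
static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    uintptr_t page = (uintptr_t)info->si_addr & ~((uintptr_t)page_size - 1);
    if (mprotect((void *)page, (size_t)page_size, PROT_READ | PROT_WRITE) != 0) {
        _exit(1); /* fault outside our mapping: give up */
    }
}

/* Maps n doubles read-only, writes a[i] = i (faulting once per page),
 * and returns how many elements did not read back as i (a NaN never
 * compares equal, so it is counted), or -1 if setup failed. */
static int run_write_fault_test(size_t n)
{
    assert(n > 0);
    page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0) {
        return -1;
    }

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) {
        return -1;
    }

    /* Step 1: allocate the array read-only. */
    size_t bytes = n * sizeof(double);
    volatile double *a = mmap(NULL, bytes, PROT_READ,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (a == MAP_FAILED) {
        return -1;
    }

    /* Steps 2 and 3: the first store to each page faults; the handler
     * remaps that page read-write and the store is retried. */
    for (size_t i = 0; i < n; i++) {
        a[i] = (double)i;
    }

    int bad = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != (double)i) {
            bad++;
        }
    }

    munmap((void *)a, bytes);
    return bad;
}
```

A result of 0 from run_write_fault_test(1024) corresponds to the behaviour described for sccLinux: every element, including the ones whose store faulted, holds the expected value.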

This bug has bothered me for quite a few days. I would really appreciate it if someone could give me a hint.

Thanks,
Jinghao