Hi Simon,<div><br></div><div>Yes, I think so. But this bug does not occur on sccLinux running on SCC (see write_fault.c), so I suspect that some of Barrelfish's exception-handling code does not behave correctly. But I really have no idea where to start debugging...</div>
<div><br></div><div>Can someone in the community who has access to SCC test the code? Many thanks.</div><div><br></div><div>Jinghao<br><br><div class="gmail_quote">On Tue, Dec 4, 2012 at 4:45 PM, Simon Peter <span dir="ltr"><<a href="mailto:speter@inf.ethz.ch" target="_blank">speter@inf.ethz.ch</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Jinghao,<br>
<br>
It seems this is SCC-specific. I just ran your test case on QEMU on both the x86-64 and x86-32 platforms, and it seems to work just fine (i.e. I get the "all good" output).<br>
<br>
Simon<div class="im"><br>
<br>
On 12/03/2012 12:47 AM, Shi Jinghao wrote:<br>
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
Hi Andrew,<br>
<br>
Thanks for your reply. Your point about the two different exceptions is<br>
insightful. I tried your suggestion, but it does not help: the NaN<br>
errors still occur. I also tried putting extra dummy floating-point<br>
operations in the page fault handler, and that does not help either.<br>
<br>
Thanks,<br>
Jinghao<br>
<br>
On Sun, Dec 2, 2012 at 2:06 AM, Andrew Baumann<br></div>
<<a href="mailto:Andrew.Baumann@microsoft.com" target="_blank">Andrew.Baumann@microsoft.com</a>> wrote:<br>
<br>
Hi Jinghao,<br>

<div class="im"><br>
<br>
I notice that the first time you use floating point in this program<br>
is when writing to the array. There should be two different<br>
exceptions raised and handled here: one for the page fault, and one<br>
for the first use of the floating point hardware (which we lazily<br>
context-switch). My guess is that the page-fault path, which is not<br>
heavily exercised, does not interact well with the floating point<br></div>
save/restore code.<br>
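The lazy floating-point context switch mentioned above can be illustrated with a toy model. This is a hypothetical sketch, not Barrelfish's code: every name in it (struct cpu, context_switch, fpu_trap, fp_write, fp_read) is invented for illustration, and a single double stands in for the whole FP register file. A context switch only arms a trap; the first FP instruction afterwards takes that trap, and only then is the previous owner's state saved and the current thread's state loaded. On x86 this is typically done by setting CR0.TS, which makes the next FP instruction raise a device-not-available (#NM) exception, so the first FP use is an exception of its own, separate from any page fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of lazy FPU context switching (not Barrelfish
 * code). One double stands in for the whole FP register file. */
struct thread {
    double fp_save;           /* this thread's saved FP state */
};

struct cpu {
    double fp_reg;            /* the "hardware" FP register */
    struct thread *current;   /* thread that is running */
    struct thread *fpu_owner; /* thread whose state is in fp_reg */
    bool fpu_trap_armed;      /* like CR0.TS: the next FP use traps */
    int fpu_traps;            /* number of FPU traps taken */
};

/* Context switch: arm the trap but do not save or load any FP state. */
static void context_switch(struct cpu *c, struct thread *next)
{
    c->current = next;
    c->fpu_trap_armed = true;
}

/* First FP use after a switch: save the previous owner's state and load
 * the current thread's, unless the current thread already owns the FPU. */
static void fpu_trap(struct cpu *c)
{
    assert(c->current != NULL);
    if (c->fpu_owner != c->current) {
        if (c->fpu_owner != NULL)
            c->fpu_owner->fp_save = c->fp_reg;
        c->fp_reg = c->current->fp_save;
        c->fpu_owner = c->current;
    }
    c->fpu_trap_armed = false;
    c->fpu_traps++;
}

/* Every FP instruction in the model goes through one of these helpers. */
static void fp_write(struct cpu *c, double v)
{
    if (c->fpu_trap_armed)
        fpu_trap(c);
    c->fp_reg = v;
}

static double fp_read(struct cpu *c)
{
    if (c->fpu_trap_armed)
        fpu_trap(c);
    return c->fp_reg;
}
```

Two threads that interleave FP use keep their own values, and each first FP use after a switch costs one trap. The model deliberately leaves out what happens when such a trap interleaves with a page fault, which is the interaction suspected above.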
<br>
<div class="im"><br>
<br>
If you initialise the floating point hardware by doing some other<br>
floating point operations (or writing to a statically allocated<br></div>
variable) beforehand, does the problem go away?<br>

<br>

Andrew<br>

<br>
<br>
*From:* Shi Jinghao [mailto:<a href="mailto:jhshi@cs.hku.hk" target="_blank">jhshi@cs.hku.hk</a>]<br>
*Sent:* Saturday, 1 December 2012 02:20<br>
*To:* <a href="mailto:barrelfish-users@lists.inf.ethz.ch" target="_blank">barrelfish-users@lists.inf.ethz.ch</a><br>
*Subject:* [Barrelfish-users] A Weird Bug about Page Fault<br>

<br>

Hi,<br>
<br>
<div class="im"><br>

I've been developing a memory management library on Barrelfish<br>
(SCC). Recently I bumped into a very weird bug involving page faults. I<br>
have attached a minimal test case (pgfault_test.tgz) that reproduces this<br></div>
bug.<br>
<br>
<br>

The workflow of the test case is simple:<br>
<br>
<div class="im"><br>

1) Allocate an array of doubles mapped read-only, using frame_alloc and<br></div>
vspace_map_one_frame_attr (or pmap->f.map; this doesn't matter)<br>

<br>

2) Initialize the array; this generates a page fault<br>
<br>
<div class="im"><br>

3) In the page fault handler, remap the faulting page as read-write,<br></div>
using pmap->f.modify_flags<br>
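For comparison, here is a minimal sketch of the same three steps on Linux, using POSIX mmap(), mprotect() and a SIGSEGV handler installed with sigaction(). It is an assumed analogue, not the contents of pgfault_test.tgz or write_fault.c, and run_write_fault_test() is a hypothetical name. Barrelfish's frame_alloc, vspace_map_one_frame_attr and pmap->f.modify_flags are replaced here by mmap() and mprotect(); after the handler returns, the faulting store a[i] = i is re-executed.

```c
#define _GNU_SOURCE /* MAP_ANONYMOUS, SA_SIGINFO */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static uintptr_t page_size;            /* cached up front: sysconf() is not signal-safe */
static uintptr_t region_lo, region_hi; /* bounds of the test array */
static volatile sig_atomic_t faults_handled;

/* Step 3: make the faulting page read-write and return, so the CPU
 * re-executes the store that faulted. mprotect() is not on the POSIX
 * async-signal-safe list, but this idiom works in practice on Linux. */
static void write_fault_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    if (addr < region_lo || addr >= region_hi) {
        signal(sig, SIG_DFL); /* not our array: re-fault and crash */
        return;
    }
    uintptr_t page = addr & ~(page_size - 1);
    if (mprotect((void *)page, page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
    faults_handled++;
}

/* Runs steps 1-3 over n_pages pages of doubles. Returns the number of
 * elements that do not hold their expected value (or -1 on setup
 * failure) and stores the number of serviced faults in *faults. */
int run_write_fault_test(size_t n_pages, long *faults)
{
    page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    size_t bytes = n_pages * page_size;
    size_t n = bytes / sizeof(double);

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = write_fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);

    /* Step 1: map the array read-only. */
    void *mem = mmap(NULL, bytes, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    volatile double *a = mem; /* volatile: keep every store and load */
    region_lo = (uintptr_t)mem;
    region_hi = region_lo + bytes;
    faults_handled = 0;
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(mem, bytes);
        return -1;
    }

    /* Step 2: assign a[i] = i; the first store to each page faults. */
    for (size_t i = 0; i < n; i++)
        a[i] = (double)i;

    int bad = 0;
    for (size_t i = 0; i < n; i++) {
        double v = a[i];
        if (v != (double)i) /* NaN compares unequal, so it is counted too */
            bad++;
    }

    sigaction(SIGSEGV, &old, NULL);
    munmap(mem, bytes);
    *faults = faults_handled;
    return bad;
}
```

On Linux, calling run_write_fault_test(4, &faults) is expected to report no bad elements and exactly one fault per page, which is the "all good" behaviour the thread reports outside Barrelfish on SCC.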
<br>
<div class="im"><br>

The weird thing is that the first touch of the array does not store<br></div>
the proper value, but just NaN!<br>

<br>
<br>
I've conducted several runs and found the following:<br>

<div class="im"><br>

1) The bug occurs when the array type is double or float.<br></div>
Everything is fine with an integer array.<br>
<br>
<div class="im"><br>

2) Only the element that caused the page fault ends up as NaN; the<br>
other elements are fine. This holds wherever the faulting address lies<br></div>
within the page, not just at the page start.<br>
<br>
<div class="im"><br>

3) If you assign each array element a constant value (say 1.0),<br>
or an int/double variable, then all elements end up with the right<br>
value. The bug seems to appear only when we assign i (or any<br></div>
expression containing i) to a[i].<br>
<br>
<div class="im"><br>

I tested the attached code on release2012-05-25 (the revision I work<br></div>
on) and on the latest revision (release2012-10-03).<br>
<br>
<div class="im"><br>

I've also written a minimal test case for sccLinux (write_fault.c).<br></div>
There, everything works: no annoying NaN values.<br>
<br>
<div class="im"><br>

This bug has bothered me for quite a few days. I would really appreciate it if<br></div>
someone could give me a hint.<br>
<br>
<br>

Thanks,<br>

Jinghao<br>
<br>
<br>
<br>
<br>
_______________________________________________<br>
Barrelfish-users mailing list<br>
<a href="mailto:Barrelfish-users@lists.inf.ethz.ch" target="_blank">Barrelfish-users@lists.inf.ethz.ch</a><br>
<a href="https://lists.inf.ethz.ch/mailman/listinfo/barrelfish-users" target="_blank">https://lists.inf.ethz.ch/mailman/listinfo/barrelfish-users</a><br>
</blockquote>
<br>
</blockquote></div><br></div>