<div dir="ltr">Hi Timothy,<div><br></div><div>I don't know why you didn't see the reply containing the code. Anyway, here it is again: </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="font-size:12.8000001907349px">CPU: Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz</div><div style="font-size:12.8000001907349px"><br></div><div style="font-size:12.8000001907349px">QEMU execution time: 5509153 us</div><div style="font-size:12.8000001907349px">Bare-metal execution time: 10802345 us</div><div style="font-size:12.8000001907349px">w/o hyper-threading: 6065235 us</div><div style="font-size:12.8000001907349px"><br></div><div style="font-size:12.8000001907349px">code:</div><div style="font-size:12.8000001907349px"><blockquote>#include <stdio.h><br>#include <sys/time.h><br>long fib(long a, long b, long depth)<br>{<br> if (depth > 0) {<br> return fib(b, a + b, depth - 1);<br> }<br> return b;<br>}<br>int main(void)<br>{<br> struct timeval start;<br> gettimeofday(&start, NULL);<br> printf("fib: %ld\n", fib(1, 1, 10000000000));<br> struct timeval end;<br> gettimeofday(&end, NULL);<br> printf("time: %ld us\n", end.tv_usec - start.tv_usec + (end.tv_sec - start.tv_sec) * 1000000);<br> return 0;<br>}</blockquote></div></blockquote></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Feb 8, 2015 at 7:10 PM, Timothy Roscoe <span dir="ltr"><<a href="mailto:troscoe@inf.ethz.ch" target="_blank">troscoe@inf.ethz.ch</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Dear Tom,<br>
<br>
Perhaps it would be easier for us to understand your problem if you<br>
posted the code you are running, and also how you are taking your<br>
measurements. Can you share that with us?<br>
<span class="HOEnZb"><font color="#888888"><br>
-- Timothy<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
At Sun, 8 Feb 2015 12:37:09 +0800, "tomsun.0.7" <<a href="mailto:tomsun.0.7@gmail.com">tomsun.0.7@gmail.com</a>> wrote:<br>
> Really sorry for misunderstanding your reply, but I had also considered<br>
> QEMU's imprecise time measurement. I increased the number of additions<br>
> tenfold so that the run takes around one minute to finish. Then I started<br>
> two Barrelfish instances (one in QEMU and one on bare-metal) and spawned the<br>
> same application on both at nearly the same time (no guarantee of<br>
> sub-millisecond accuracy), but I found an obvious gap (about 5 seconds)<br>
> between their finish times.<br>
><br>
> And I have read your SOSP paper, which gives an evaluation with OpenMP.<br>
> Indeed, the performance for insert sort is nearly the same as on<br>
> Linux when running on one core. So I have become more curious about why I<br>
> get different performance.<br>
><br>
> Are there any tools, such as profiling tools, that I can use to find out<br>
> the reason?<br>
><br>
> On Sun, Feb 8, 2015 at 3:51 AM, Simon Peter <<a href="mailto:speter@inf.ethz.ch">speter@inf.ethz.ch</a>> wrote:<br>
><br>
> > What I mean is that in order to measure performance in terms of execution<br>
> > time (as I gather from your previous email that had results in it), you<br>
> > first need a notion of time. Barrelfish's notion of time is off on QEMU.<br>
> > Hence, you might be seeing wrong results.<br>
> ><br>
> > We have compared Barrelfish to Linux performance on bare-metal hardware in<br>
> > various papers, such as our SOSP paper.<br>
> ><br>
> > On 02/06/2015 07:02 PM, tomsun.0.7 wrote:<br>
> ><br>
> >> I don't actually care about measuring CPU speed; I want to know why<br>
> >> Barrelfish performs worse on bare-metal than on QEMU with KVM.<br>
> >><br>
> >> As my second reply demonstrated, I got worse performance while running<br>
> >> applications on bare-metal than both QEMU with KVM and native Linux.<br>
> >><br>
> >> I made sure that it is not caused by CPU frequency: I<br>
> >> accessed the hardware performance registers directly within the kernel and<br>
> >> found that the CPU did run at full speed.<br>
> >><br>
> >> Now I suspect that the performance may be influenced by<br>
> >> other factors, such as device interrupts.<br>
> >> Have you ever measured the performance of Barrelfish and compared it<br>
> >> with Linux or other operating systems?<br>
> >><br>
> >> On Sat, Feb 7, 2015 at 3:47 AM, Simon Peter <<a href="mailto:speter@inf.ethz.ch">speter@inf.ethz.ch</a>> wrote:<br>
> >><br>
> >> I also suspect that it might just be jittery CPU emulation<br>
> >> speed that's getting you different results. Barrelfish's usleep<br>
> >> ultimately uses sys_debug_get_tsc_per_ms, so your two methods might<br>
> >> actually be the same. Barrelfish measures CPU speed at bootup, but<br>
> >> it's very bad at figuring it out correctly on QEMU. I'm not sure<br>
> >> what the best way is to get accurate results on QEMU.<br>
> >><br>
> >><br>
> >> On 15-02-04 09:46 PM, tomsun.0.7 wrote:<br>
> >><br>
> >>> Hi,<br>
> >>><br>
> >>> I started a network application that is dedicated to producing<br>
> >>> packets all the time. However, when I started it on bare-metal, I<br>
> >>> found the throughput is only half of what it achieves in QEMU (of<br>
> >>> course, with KVM enabled).<br>
> >>><br>
> >>> This application is purely CPU-intensive: it just produces a lot of<br>
> >>> packets and then destroys them, so no devices are involved.<br>
> >>> At first, I thought it was caused by a low core<br>
> >>> frequency, so I measured the frequency in two ways: 1. invoking the<br>
> >>> native Barrelfish interface, sys_debug_get_tsc_per_ms, directly; 2.<br>
> >>> reading the TSC and sleeping for 1 second using POSIX sleep (which, as<br>
> >>> far as I know, is implemented by invoking Barrelfish's usleep). However, I<br>
> >>> got full-speed results under both conditions.<br>
> >>><br>
> >>> So I don't know whether it is caused by an incorrect frequency<br>
> >>> measurement or by some other CPU problem, because I even tried booting<br>
> >>> it with PXE in QEMU and got full performance.<br>
> >>><br>
> >>> What can I do to get normal performance on bare-metal? Or, if it<br>
> >>> is caused by a low CPU frequency, what can I do to raise the<br>
> >>> frequency?<br>
> >>><br>
> >>><br>
> >>> Tom<br>
> >>><br>
> >>><br>
> >>><br>
> >><br>
> >><br>
> >><br>
> ><br>
><br>
</div></div></blockquote></div><br></div>
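<div>A minimal sketch of the calibration issue raised in the thread, assuming elapsed time is derived from a TSC tick delta and a calibrated ticks-per-millisecond value such as the one sys_debug_get_tsc_per_ms reports. The helper name ticks_to_us and the numbers in the note below are hypothetical and chosen only for illustration; this is not Barrelfish's actual timer code.</div>

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a TSC tick delta to microseconds using a
 * calibrated ticks-per-millisecond value. If the calibration is wrong,
 * every time reported through this conversion is scaled by the same factor.
 */
static uint64_t ticks_to_us(uint64_t ticks, uint64_t tsc_per_ms)
{
    assert(tsc_per_ms > 0);           /* calibration must have produced a value */
    return ticks * 1000 / tsc_per_ms; /* ticks / (ticks per ms) * 1000 us per ms */
}
```

<div>For example, 3,300,000,000 ticks (one second at a nominal 3.30 GHz, assuming the TSC ticks at that rate) convert to 1,000,000 us with tsc_per_ms = 3300000, but to 500,000 us if the calibration had wrongly reported 6600000. Comparing raw tick counts between runs avoids depending on the calibrated value at all.</div>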