<div dir="ltr">Thanks a lot, these are indeed reasonable suggestions! I'll try them later.</div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Feb 9, 2015 at 3:06 AM, Timothy Roscoe <span dir="ltr"><<a href="mailto:troscoe@inf.ethz.ch" target="_blank">troscoe@inf.ethz.ch</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
So here are a few other ideas:<br>
<br>
- Measure the cost of gettimeofday(). I suspect this is pretty fast<br>
on Linux, but is not highly optimized on Barrelfish and quite<br>
possibly results in a system call, potentially followed by a<br>
reschedule. I have no idea what QEMU would use to implement this<br>
either.<br>
<br>
- Perhaps I'm missing something, but it looks like you have a printf<br>
inside your timing loop. On bare metal, this is going to go to the<br>
UART, which will mean you are limited by the rate at which<br>
Barrelfish can pump characters down a serial line. I suggest you<br>
remove the printf from your timing loop and only print the<br>
values at the end.<br>
<br>
- Use rdtsc to get cycle counts, both for gettimeofday() (see above)<br>
and for iterations of your function.<br>
<br>
- Look at the machine code being generated for Linux and Barrelfish -<br>
it's possible the compiler flags are different and result in<br>
different optimizations being applied.<br>
<br>
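As a rough illustration of the printf and rdtsc points above, here is a<br>
minimal sketch. It assumes x86 and GCC or Clang inline assembly. The names<br>
read_cycles, fib_iter, gettimeofday_cost and report are made up for this<br>
example and are not Barrelfish interfaces. The loop is an iterative variant<br>
of the posted fib(), using unsigned arithmetic so that the result does not<br>
depend on tail-call optimization or on signed overflow:<br>
<br>
```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>
#include <time.h>

/* Read a free-running counter: the TSC on x86, otherwise a nanosecond
 * clock as a stand-in so that the sketch still builds elsewhere. */
uint64_t read_cycles(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;
    /* rdtsc is not a serializing instruction; treat small deltas as noise. */
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#endif
}

/* Hypothetical iterative stand-in for the posted fib(a, b, depth). */
unsigned long fib_iter(unsigned long a, unsigned long b, unsigned long depth)
{
    while (depth > 0) {
        unsigned long next = a + b; /* wraps around on overflow */
        a = b;
        b = next;
        depth--;
    }
    return b;
}

/* Average cost of one gettimeofday() call, in counter ticks. */
uint64_t gettimeofday_cost(unsigned calls)
{
    struct timeval tv;
    assert(calls > 0);

    uint64_t start = read_cycles();
    for (unsigned i = 0; i < calls; i++) {
        gettimeofday(&tv, NULL);
    }
    uint64_t end = read_cycles();
    return (end - start) / calls;
}

/* Time fib_iter() with no I/O inside the measured region, then print. */
void report(unsigned long depth)
{
    uint64_t start = read_cycles();
    unsigned long result = fib_iter(1, 1, depth);
    uint64_t end = read_cycles();

    /* All printing happens after the second counter read. */
    printf("fib: %lu\n", result);
    printf("fib_iter: %llu ticks\n", (unsigned long long)(end - start));
    printf("gettimeofday: ~%llu ticks per call\n",
           (unsigned long long)gettimeofday_cost(100000));
}
```
<br>
Calling report() from main() with the same depth and the same compiler<br>
flags on both systems should show whether the gap comes from the timed<br>
loop itself or from gettimeofday() and the serial output.<br>
<br>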
Hopefully some of this is helpful.<br>
<span class="HOEnZb"><font color="#888888"><br>
-- Timothy Roscoe<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
At Sun, 8 Feb 2015 19:22:58 +0800, "tomsun.0.7" <<a href="mailto:tomsun.0.7@gmail.com">tomsun.0.7@gmail.com</a>> wrote:<br>
> Hi Timothy,<br>
><br>
> I don't know why you didn't see that reply with code. Anyway, it's like<br>
> this:<br>
><br>
> > CPU: Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz<br>
> ><br>
> > QEMU execution time: 5509153 us<br>
> > Bare-metal execution time: 10802345 us<br>
> > w/o hyper-threading: 6065235 us<br>
> ><br>
> > code:<br>
> ><br>
> > #include <stdio.h><br>
> > #include <sys/time.h><br>
> > long fib(long a, long b, long depth)<br>
> > {<br>
> > if (depth > 0) {<br>
> > return fib(b, a + b, depth - 1);<br>
> > }<br>
> > return b;<br>
> > }<br>
> > int main(void)<br>
> > {<br>
> > struct timeval start;<br>
> > gettimeofday(&start, NULL);<br>
> > printf("fib: %ld\n", fib(1, 1, 10000000000));<br>
> > struct timeval end;<br>
> > gettimeofday(&end, NULL);<br>
> > printf("time: %ld us\n", end.tv_usec - start.tv_usec + (end.tv_sec -<br>
> > start.tv_sec) * 1000000);<br>
> > return 0;<br>
> > }<br>
> ><br>
> ><br>
> On Sun, Feb 8, 2015 at 7:10 PM, Timothy Roscoe <<a href="mailto:troscoe@inf.ethz.ch">troscoe@inf.ethz.ch</a>> wrote:<br>
><br>
> ><br>
> > Dear Tom,<br>
> ><br>
> > Perhaps it would be easier for us to understand your problem if you<br>
> > posted the code you are running, and also how you are taking your<br>
> > measurements. Can you share that with us?<br>
> ><br>
> > -- Timothy<br>
> ><br>
> > At Sun, 8 Feb 2015 12:37:09 +0800, "tomsun.0.7" <<a href="mailto:tomsun.0.7@gmail.com">tomsun.0.7@gmail.com</a>><br>
> > wrote:<br>
> > > Really sorry for misunderstanding your reply, but I had also considered<br>
> > > QEMU's imprecise measurement of time. I increased the number of additions<br>
> > > tenfold so that the run takes around one minute to finish. Then I started<br>
> > > two Barrelfish instances (one in QEMU and one on bare metal) and spawned<br>
> > > the same application on both at nearly the same time (no guarantee of<br>
> > > sub-millisecond accuracy), but I found an obvious gap (about 5 seconds)<br>
> > > between their finish times.<br>
> > ><br>
> > > And I have read your SOSP paper, which gives an evaluation with OpenMP.<br>
> > > Indeed, the performance for insert sort is nearly the same as on Linux<br>
> > > when running on one core. So I have become more curious about why I got<br>
> > > different performance.<br>
> > ><br>
> > > Are there any tools, such as profiling tools, that I can use to find out<br>
> > > the reason?<br>
> > ><br>
> > > On Sun, Feb 8, 2015 at 3:51 AM, Simon Peter <<a href="mailto:speter@inf.ethz.ch">speter@inf.ethz.ch</a>> wrote:<br>
> > ><br>
> > > > What I mean is that in order to measure performance in terms of<br>
> > > > execution time (as I gather you did from your previous email, which had<br>
> > > > results in it), you first need a notion of time. Barrelfish's notion of<br>
> > > > time is off on QEMU. Hence, you might be seeing wrong results.<br>
> > > ><br>
> > > > We have compared Barrelfish to Linux performance on bare-metal<br>
> > > > hardware in various papers, such as our SOSP paper.<br>
> > > ><br>
> > > > On 02/06/2015 07:02 PM, tomsun.0.7 wrote:<br>
> > > ><br>
> > > >> I don't actually care about measuring CPU speed; I want to know why<br>
> > > >> Barrelfish performs worse on bare metal than on QEMU with KVM.<br>
> > > >><br>
> > > >> As my second reply demonstrated, I got worse performance while running<br>
> > > >> applications on bare-metal than both QEMU with KVM and native Linux.<br>
> > > >><br>
> > > >> I made sure that it doesn't result from CPU frequency, because I<br>
> > > >> accessed the hardware performance registers directly within the kernel<br>
> > > >> and found that it did run at full speed.<br>
> > > >><br>
> > > >> Now I suspect that the performance may be influenced by other<br>
> > > >> factors, such as device interrupts.<br>
> > > >> Have you ever measured the performance of Barrelfish and compared it<br>
> > > >> with Linux or another operating system?<br>
> > > >><br>
> > > >> On Sat, Feb 7, 2015 at 3:47 AM, Simon Peter <<a href="mailto:speter@inf.ethz.ch">speter@inf.ethz.ch</a><br>
> > > >> <mailto:<a href="mailto:speter@inf.ethz.ch">speter@inf.ethz.ch</a>>> wrote:<br>
> > > >><br>
> > > >> I'm also suspecting that it might just be jittery CPU emulation<br>
> > > >> speed that's getting you different results. Barrelfish's usleep<br>
> > > >> ultimately uses sys_debug_get_tsc_per_ms, so your 2 ways might<br>
> > > >> actually be the same. Barrelfish measures CPU speed at bootup, but<br>
> > > >> it's very bad at figuring it out correctly on QEMU. I'm not sure<br>
> > > >> what the best way is to get accurate results on QEMU.<br>
> > > >><br>
> > > >><br>
> > > >> On 15-02-04 09:46 PM, tomsun.0.7 wrote:<br>
> > > >><br>
> > > >>> Hi,<br>
> > > >>><br>
> > > >>> I started a network application that does nothing but produce<br>
> > > >>> packets. However, when I ran it on bare metal, I found the<br>
> > > >>> throughput was only half of what I get in QEMU (with KVM enabled,<br>
> > > >>> of course).<br>
> > > >>><br>
> > > >>> This application is purely CPU-bound: it just produces a lot of<br>
> > > >>> packets and then destroys them, so no devices are involved. At<br>
> > > >>> first I thought the cause was a low core frequency, so I measured<br>
> > > >>> it in two ways: 1. invoking the native Barrelfish interface,<br>
> > > >>> sys_debug_get_tsc_per_ms, directly; 2. reading the TSC and<br>
> > > >>> sleeping for 1 second using POSIX sleep (which, as far as I know,<br>
> > > >>> is implemented by invoking Barrelfish's usleep). However, I got<br>
> > > >>> full-speed results in both cases.<br>
> > > >>><br>
> > > >>> So I don't know whether this comes from an incorrect frequency<br>
> > > >>> measurement or from some other CPU problem, because I even tried<br>
> > > >>> booting it via PXE in QEMU and got full performance.<br>
> > > >>><br>
> > > >>> What can I do to get normal performance on bare metal? Or, if it<br>
> > > >>> results from a low CPU frequency, what can I do to raise the<br>
> > > >>> frequency?<br>
> > > >>><br>
> > > >>><br>
> > > >>> Tom<br>
> > > >>><br>
> > > >>><br>
> > > >>><br>
> > > >><br>
> > > >><br>
> > > >><br>
> > > ><br>
> > ><br>
> ><br>
><br>
</div></div></blockquote></div><br></div>