[Barrelfish-users] About CPU performance

Timothy Roscoe troscoe at inf.ethz.ch
Sun Feb 8 20:06:41 CET 2015


So here are a few other ideas:

 - Measure the cost of gettimeofday().   I suspect this is pretty fast
   on Linux, but it is not highly optimized on Barrelfish and quite
   possibly results in a system call, potentially followed by a
   reschedule.  I have no idea what QEMU would use to implement this
   either. 

 - Perhaps I'm missing something, but it looks like you have a printf
   inside your timing loop.  On bare metal, this is going to go to the
   UART, which will mean you are limited by the rate at which
   Barrelfish can pump characters down a serial line.  I suggest you
   remove the printf from your timing loop and only print the
   values at the end. 

 - Use rdtsc to get cycle counts, both for gettimeofday() (see above)
   and for iterations of your function. 

 - Look at the machine code being generated for Linux and Barrelfish -
   it's possible the compiler flags are different and result in
   different optimizations being applied. 
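
A minimal sketch of the first three suggestions, assuming an x86 target
and a GCC- or Clang-style compiler that provides the __rdtsc() intrinsic
in <x86intrin.h>.  The helper names read_cycles, gettimeofday_cost and
time_fib are hypothetical, not part of any Barrelfish or Linux API, and
the non-x86 fallback returns nanoseconds rather than cycles.  Nothing is
printed between the two counter reads.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the time-stamp counter (assumes the GCC/Clang intrinsic). */
static inline uint64_t read_cycles(void)
{
    return __rdtsc();
}
#else
#include <time.h>
/* Fallback for non-x86 builds: nanoseconds, not cycles. */
static inline uint64_t read_cycles(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Same tail-recursive function as in the quoted program. */
long fib(long a, long b, long depth)
{
    if (depth > 0) {
        return fib(b, a + b, depth - 1);
    }
    return b;
}

/* Average number of ticks per gettimeofday() call over iters calls. */
uint64_t gettimeofday_cost(unsigned iters)
{
    struct timeval tv;
    assert(iters > 0);
    uint64_t t0 = read_cycles();
    for (unsigned i = 0; i < iters; i++) {
        gettimeofday(&tv, NULL);
    }
    uint64_t t1 = read_cycles();
    return (t1 - t0) / iters;
}

/* Ticks taken by one fib() run; the result is handed back via *result
 * so that it can be printed after the measurement, not during it. */
uint64_t time_fib(long depth, long *result)
{
    uint64_t t0 = read_cycles();
    long r = fib(1, 1, depth);
    uint64_t t1 = read_cycles();
    *result = r;
    return t1 - t0;
}
```

A caller would invoke gettimeofday_cost() and time_fib() first and only
then printf() the returned values, so that the serial line stays out of
the measured interval.  Running the same binary logic on Linux and on
Barrelfish gives two tick counts that can be compared directly.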

Hopefully some of this is helpful.

 -- Timothy Roscoe

At Sun, 8 Feb 2015 19:22:58 +0800, "tomsun.0.7" <tomsun.0.7 at gmail.com> wrote:
> Hi Timothy,
> 
> I don't know why you didn't see that reply with code. Anyway, it's like
> this:
> 
> > CPU: Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz
> >
> > QEMU execution time:           5509153 us
> > Bare-metal execution time: 10802345 us
> > w/o hyper-threading:              6065235 us
> >
> > code:
> >
> > #include <stdio.h>
> > #include <sys/time.h>
> > long fib(long a, long b, long depth)
> > {
> >     if (depth > 0) {
> >         return fib(b, a + b, depth - 1);
> >     }
> >     return b;
> > }
> > int main(void)
> > {
> >     struct timeval start;
> >     gettimeofday(&start, NULL);
> >     printf("fib: %ld\n", fib(1, 1, 10000000000));
> >     struct timeval end;
> >     gettimeofday(&end, NULL);
> >     printf("time: %ld us\n", end.tv_usec - start.tv_usec +
> >            (end.tv_sec - start.tv_sec) * 1000000);
> >     return 0;
> > }
> >
> >
> On Sun, Feb 8, 2015 at 7:10 PM, Timothy Roscoe <troscoe at inf.ethz.ch> wrote:
> 
> >
> > Dear Tom,
> >
> > Perhaps it would be easier for us to understand your problem if you
> > posted the code you are running, and also how you are taking your
> > measurements.   Can you share that with us?
> >
> >  -- Timothy
> >
> > At Sun, 8 Feb 2015 12:37:09 +0800, "tomsun.0.7" <tomsun.0.7 at gmail.com>
> > wrote:
> > > Really sorry for misunderstanding your reply, but I had also considered
> > > QEMU's imprecise measurement of time. I increased the number of additions
> > > by 10 times so that it takes around one minute to finish. Then I started
> > > two Barrelfish instances (one in QEMU and one on bare-metal) and spawned
> > > the same application at the same time (nearly; no guarantee of
> > > submillisecond difference), but I found there is an obvious gap (about
> > > 5 seconds) between their finish times.
> > >
> > > And I have read your SOSP paper, which gives an evaluation with OpenMP.
> > > Indeed, the performance for insert sort is nearly the same as on Linux
> > > when running on one core. So I became more curious about why I got
> > > different performance.
> > >
> > > Are there any tools, like profiling tools, that I can use to find out
> > > the reasons?
> > >
> > > On Sun, Feb 8, 2015 at 3:51 AM, Simon Peter <speter at inf.ethz.ch> wrote:
> > >
> > > > What I mean is that in order to measure performance in terms of
> > > > execution time (as I can glean from your previous email that had
> > > > results in it), you first need a notion of time. Barrelfish's notion
> > > > of time is off on QEMU. Hence, you might be seeing wrong results.
> > > >
> > > > We have compared Barrelfish to Linux performance on bare-metal
> > > > hardware in various papers, such as our SOSP paper.
> > > >
> > > > On 02/06/2015 07:02 PM, tomsun.0.7 wrote:
> > > >
> > > > >> I don't actually care about measuring CPU speed; I want to know why
> > > > >> Barrelfish performs worse on bare-metal than on QEMU with KVM.
> > > >>
> > > >> As my second reply demonstrated, I got worse performance while running
> > > >> applications on bare-metal than both QEMU with KVM and native Linux.
> > > >>
> > > >> I made sure that it doesn't result from CPU frequency, because I
> > > >> accessed the hardware performance registers directly within kernel and
> > > >> found it did run with full speed.
> > > >>
> > > > >> Now I suspect that the performance may be influenced by some other
> > > > >> factors, such as device interrupts.
> > > > >> Have you ever measured the performance of Barrelfish and compared it
> > > > >> with Linux or other operating systems?
> > > >>
> > > >> On Sat, Feb 7, 2015 at 3:47 AM, Simon Peter <speter at inf.ethz.ch
> > > >> <mailto:speter at inf.ethz.ch>> wrote:
> > > >>
> > > >>     I'm also suspecting that it might just be jittery CPU emulation
> > > >>     speed that's getting you different results. Barrelfish's usleep
> > > >>     ultimately uses sys_debug_get_tsc_per_ms, so your 2 ways might
> > > >>     actually be the same. Barrelfish measures CPU speed at bootup, but
> > > >>     it's very bad at figuring it out correctly on QEMU. I'm not sure
> > > >>     what the best way is to get accurate results on QEMU.
> > > >>
> > > >>
> > > >>     On 15-02-04 09:46 PM, tomsun.0.7 wrote:
> > > >>
> > > >>>     Hi,
> > > >>>
> > > > >>>     I started a network application that is dedicated to producing
> > > > >>>     packets all the time. However, when I started it on bare-metal, I
> > > > >>>     found the throughput is only half of what it is in QEMU (of
> > > > >>>     course, with KVM enabled).
> > > >>>
> > > > >>>     This application is purely CPU-intensive; it just produces a lot
> > > > >>>     of packets and then destroys them, so the devices are not
> > > > >>>     involved. At first, I thought it resulted from a low core
> > > > >>>     frequency, so I measured this in two ways: 1. invoking the native
> > > > >>>     Barrelfish interface, sys_debug_get_tsc_per_ms, directly; 2.
> > > > >>>     reading the tsc and sleeping for 1 second using POSIX sleep
> > > > >>>     (which, as far as I know, is implemented by invoking Barrelfish's
> > > > >>>     usleep). However, I got the full-speed results under both
> > > > >>>     conditions.
> > > >>>
> > > > >>>     So, I don't know whether it results from an incorrect measurement
> > > > >>>     of frequency or some other CPU problem, because I even tried to
> > > > >>>     start it with PXE in QEMU, and got full performance.
> > > >>>
> > > >>>     What can I do to get normal performance on bare-metal? Or, if it
> > > >>>     results from low frequency of CPUs, what can I do to tune up the
> > > >>>     frequency?
> > > >>>
> > > >>>
> > > >>>     Tom
> > > >>>
> > > >>>
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >
> > >
> >
> 


