[Barrelfish-users] About CPU performance

Mon Feb 9 04:10:19 CET 2015

Thanks a lot, this is indeed reasonable advice! I'll try these suggestions later.

On Mon, Feb 9, 2015 at 3:06 AM, Timothy Roscoe <troscoe at inf.ethz.ch> wrote:

>
> So here are a few other ideas:
>
>  - Measure the cost of gettimeofday().   I suspect this is pretty fast
>    on Linux, but is not highly optimized on Barrelfish and quite
>    possibly results in a system call, potentially followed by a
>    reschedule.  I have no idea what QEMU would use to implement this
>    either.
>
>  - Perhaps I'm missing something, but it looks like you have a printf
>    inside your timing loop.  On bare metal, this is going to go to the
>    UART, which will mean you are limited by the rate at which
>    Barrelfish can pump characters down a serial line.  I suggest you
>    remove the printf from your timing loop and only print the
>    values at the end.
>
>  - Use rdtsc to get cycle counts, both for gettimeofday() (see above)
>    and for iterations of your function.
>
>  - Look at the machine code being generated for Linux and Barrelfish -
>    it's possible the compiler flags are different and result in
>    different optimizations being applied.
>
> Hopefully some of this is helpful.
>
>  -- Timothy Roscoe
>
> At Sun, 8 Feb 2015 19:22:58 +0800, "tomsun.0.7" <tomsun.0.7 at gmail.com>
> wrote:
> > Hi Timothy,
> >
> > I don't know why you didn't see that reply with code. Anyway, it's like
> > this:
> >
> > > CPU: Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz
> > >
> > > QEMU execution time:        5509153 us
> > > Bare-metal execution time: 10802345 us
> > > w/o hyper-threading:        6065235 us
> > >
> > > code:
> > >
> > > #include <stdio.h>
> > > #include <sys/time.h>
> > > long fib(long a, long b, long depth)
> > > {
> > >     if (depth > 0) {
> > >         return fib(b, a + b, depth - 1);
> > >     }
> > >     return b;
> > > }
> > > int main(void)
> > > {
> > >     struct timeval start;
> > >     gettimeofday(&start, NULL);
> > >     printf("fib: %ld\n", fib(1, 1, 10000000000));
> > >     struct timeval end;
> > >     gettimeofday(&end, NULL);
> > >     printf("time: %ld us\n", end.tv_usec - start.tv_usec +
> > >            (end.tv_sec - start.tv_sec) * 1000000);
> > >     return 0;
> > > }
> > >
> > >
> > On Sun, Feb 8, 2015 at 7:10 PM, Timothy Roscoe <troscoe at inf.ethz.ch> wrote:
> >
> > >
> > > Dear Tom,
> > >
> > > Perhaps it would be easier for us to understand your problem if you
> > > posted the code you are running, and also how you are taking your
> > > measurements.   Can you share that with us?
> > >
> > >  -- Timothy
> > >
> > > At Sun, 8 Feb 2015 12:37:09 +0800, "tomsun.0.7" <tomsun.0.7 at gmail.com>
> > > wrote:
> > > > Really sorry for misunderstanding your reply, but I had also considered
> > > > QEMU's imprecise measurement of time. I increased the number of
> > > > additions tenfold so that the run takes around one minute to finish.
> > > > Then I started two Barrelfish instances (one in QEMU and one on bare
> > > > metal) and spawned the same application on both at nearly the same time
> > > > (no guarantee of sub-millisecond accuracy), but I found an obvious gap
> > > > (about 5 seconds) between their finish times.
> > > >
> > > > I have also read your SOSP paper, which gives an evaluation with OpenMP.
> > > > Indeed, the performance for insert sort is nearly the same as on Linux
> > > > when running on one core. So I have become more curious about why I got
> > > > different performance.
> > > >
> > > > Are there any tools, such as profiling tools, that I can use to find out
> > > > the reasons?
> > > >
> > > > On Sun, Feb 8, 2015 at 3:51 AM, Simon Peter <speter at inf.ethz.ch> wrote:
> > > >
> > > > > What I mean is that in order to measure performance in terms of
> > > > > execution time (as I gather from your previous email that had results
> > > > > in it), you first need a notion of time. Barrelfish's notion of time
> > > > > is off on QEMU. Hence, you might be seeing wrong results.
> > > > >
> > > > > We have compared Barrelfish to Linux performance on bare-metal hardware
> > > > > in various papers, such as our SOSP paper.
> > > > >
> > > > > On 02/06/2015 07:02 PM, tomsun.0.7 wrote:
> > > > >
> > > > >> I don't actually care about the measurement of CPU speed; I want to
> > > > >> know why Barrelfish performs worse on bare metal than on QEMU with KVM.
> > > > >>
> > > > >> As my second reply showed, I got worse performance running
> > > > >> applications on bare metal than on both QEMU with KVM and native Linux.
> > > > >>
> > > > >> I made sure that it doesn't result from CPU frequency: I accessed the
> > > > >> hardware performance registers directly within the kernel and found
> > > > >> that the CPU did run at full speed.
> > > > >>
> > > > >> Now I suspect that the performance may be influenced by other
> > > > >> factors, such as device interrupts.
> > > > >> Have you ever measured the performance of Barrelfish and compared it
> > > > >> with Linux or other operating systems?
> > > > >>
> > > > >> On Sat, Feb 7, 2015 at 3:47 AM, Simon Peter <speter at inf.ethz.ch
> > > > >> <mailto:speter at inf.ethz.ch>> wrote:
> > > > >>
> > > > >>     I also suspect that it might just be jittery CPU emulation
> > > > >>     speed that's getting you different results. Barrelfish's usleep
> > > > >>     ultimately uses sys_debug_get_tsc_per_ms, so your two ways might
> > > > >>     actually be the same. Barrelfish measures CPU speed at bootup, but
> > > > >>     it's very bad at figuring it out correctly on QEMU. I'm not sure
> > > > >>     what the best way is to get accurate results on QEMU.
> > > > >>
> > > > >>
> > > > >>     On 15-02-04 09:46 PM, tomsun.0.7 wrote:
> > > > >>
> > > > >>>     Hi,
> > > > >>>
> > > > >>>     I started a network application that is dedicated to producing
> > > > >>>     packets all the time. However, when I ran it on bare metal, I
> > > > >>>     found the throughput was only half of what I get in QEMU (with
> > > > >>>     KVM enabled, of course).
> > > > >>>
> > > > >>>     This application is purely CPU-intensive: it just produces a lot
> > > > >>>     of packets and then destroys them, so the devices are not
> > > > >>>     involved. At first I thought it resulted from a low core
> > > > >>>     frequency, so I measured the frequency in two ways: 1. invoking
> > > > >>>     the native Barrelfish interface, sys_debug_get_tsc_per_ms,
> > > > >>>     directly; 2. reading the TSC and sleeping for 1 second using
> > > > >>>     POSIX sleep (which, as far as I know, is implemented by invoking
> > > > >>>     Barrelfish's usleep). However, I got full-speed results in both
> > > > >>>     cases.
> > > > >>>
> > > > >>>     So I don't know whether this results from an incorrect frequency
> > > > >>>     measurement or from some other CPU problem, because I even tried
> > > > >>>     starting it with PXE in QEMU and got full performance.
> > > > >>>
> > > > >>>     What can I do to get normal performance on bare metal? Or, if it
> > > > >>>     results from a low CPU frequency, what can I do to raise the
> > > > >>>     frequency?
> > > > >>>
> > > > >>>
> > > > >>>     Tom
> > > > >>>
> > > > >>>