[Barrelfish-users] About CPU performance
tomsun.0.7
tomsun.0.7 at gmail.com
Mon Feb 9 04:10:19 CET 2015
Thanks a lot, these are all reasonable suggestions! I'll try them later.
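
The suggestion quoted below about keeping printf out of the timed region could be sketched roughly as follows. This is a minimal sketch that has not been run on Barrelfish: `elapsed_us` and `timed_fib` are hypothetical helper names that do not appear in the original program, and `fib` is rewritten here as an unsigned loop (an assumption, made so that overflow wraps instead of being undefined and so that the stack does not grow when the compiler skips tail-call optimization).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: microseconds elapsed between two timeval samples. */
long long elapsed_us(struct timeval start, struct timeval end)
{
    return (long long)(end.tv_sec - start.tv_sec) * 1000000LL
         + (long long)(end.tv_usec - start.tv_usec);
}

/* Iterative, unsigned stand-in for the fib() posted in the thread. */
unsigned long fib(unsigned long a, unsigned long b, unsigned long depth)
{
    while (depth > 0) {
        unsigned long next = a + b;   /* wraps on overflow, by design */
        a = b;
        b = next;
        depth--;
    }
    return b;
}

/*
 * Time only the computation. Nothing is printed here; the caller prints
 * the result and the elapsed time after this function has returned.
 */
long long timed_fib(unsigned long depth, unsigned long *result)
{
    struct timeval start, end;

    assert(result != NULL);
    gettimeofday(&start, NULL);
    *result = fib(1, 1, depth);
    gettimeofday(&end, NULL);
    return elapsed_us(start, end);
}
```

A caller would invoke `timed_fib(depth, &value)` and print both numbers only after it returns, so that console output never falls between the two gettimeofday() samples. An optimizing compiler may still move or shorten the computation, which is one more reason to inspect the generated machine code.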
On Mon, Feb 9, 2015 at 3:06 AM, Timothy Roscoe <troscoe at inf.ethz.ch> wrote:
>
> So here's a few other ideas:
>
> - Measure the cost of gettimeofday(). I suspect this is pretty fast
> on Linux, but is not highly optimized on Barrelfish and quite
> possibly results in a system call, potentially followed by a
> reschedule. I have no idea what QEMU would use to implement this
> either.
>
> - Perhaps I'm missing something, but it looks like you have a printf
> inside your timing loop. On bare metal, this is going to go to the
> UART, which will mean you are limited by the rate at which
> Barrelfish can pump characters down a serial line. I suggest you
> remove the printf from your timing loop and only print the
> values at the end.
>
> - Use rdtsc to get cycle counts, both for gettimeofday() (see above)
> and for iterations of your function.
>
> - Look at the machine code being generated for Linux and Barrelfish -
> it's possible the compiler flags are different and result in
> different optimizations being applied.
>
> Hopefully some of this is helpful.
>
> -- Timothy Roscoe
>
> At Sun, 8 Feb 2015 19:22:58 +0800, "tomsun.0.7" <tomsun.0.7 at gmail.com>
> wrote:
> > Hi Timothy,
> >
> > I don't know why you didn't see that reply with code. Anyway, it's like
> > this:
> >
> > > CPU: Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz
> > >
> > > QEMU execution time: 5509153 us
> > > Bare-metal execution time: 10802345 us
> > > w/o hyper-threading: 6065235 us
> > >
> > > code:
> > >
> > > #include <stdio.h>
> > > #include <sys/time.h>
> > > long fib(long a, long b, long depth)
> > > {
> > >     if (depth > 0) {
> > >         return fib(b, a + b, depth - 1);
> > >     }
> > >     return b;
> > > }
> > > int main(void)
> > > {
> > >     struct timeval start;
> > >     gettimeofday(&start, NULL);
> > >     printf("fib: %ld\n", fib(1, 1, 10000000000));
> > >     struct timeval end;
> > >     gettimeofday(&end, NULL);
> > >     printf("time: %ld us\n", end.tv_usec - start.tv_usec +
> > >            (end.tv_sec - start.tv_sec) * 1000000);
> > >     return 0;
> > > }
> > >
> > >
> > On Sun, Feb 8, 2015 at 7:10 PM, Timothy Roscoe <troscoe at inf.ethz.ch> wrote:
> >
> > >
> > > Dear Tom,
> > >
> > > Perhaps it would be easier for us to understand your problem if you
> > > posted the code you are running, and also how you are taking your
> > > measurements. Can you share that with us?
> > >
> > > -- Timothy
> > >
> > > At Sun, 8 Feb 2015 12:37:09 +0800, "tomsun.0.7" <tomsun.0.7 at gmail.com>
> > > wrote:
> > > > Really sorry for misunderstanding your reply, but I had also considered
> > > > QEMU's imprecise measurement of time. I increased the number of
> > > > additions tenfold so that the run takes around one minute to finish.
> > > > Then I started two Barrelfish instances (one in QEMU and one on bare
> > > > metal) and spawned the same application at nearly the same time (no
> > > > guarantee of sub-millisecond agreement), but I found an obvious gap
> > > > (about 5 seconds) between their finishing times.
> > > >
> > > > And I have read your SOSP paper, which gives an evaluation with OpenMP.
> > > > Indeed, the performance for insert sort is nearly the same as on Linux
> > > > when running on one core. So I have become more curious about why I
> > > > got different performance.
> > > >
> > > > Are there any tools, such as profiling tools, that I can use to find
> > > > out the reasons?
> > > >
> > > > On Sun, Feb 8, 2015 at 3:51 AM, Simon Peter <speter at inf.ethz.ch> wrote:
> > > >
> > > > > What I mean is that in order to measure performance in terms of
> > > > > execution time (as I gather from your previous email, which had
> > > > > results in it), you first need a notion of time. Barrelfish's notion
> > > > > of time is off on QEMU. Hence, you might be seeing wrong results.
> > > > >
> > > > > We have compared Barrelfish to Linux performance on bare-metal
> > > > > hardware in various papers, such as our SOSP paper.
> > > > >
> > > > > On 02/06/2015 07:02 PM, tomsun.0.7 wrote:
> > > > >
> > > > >> I don't actually care about measuring CPU speed; I want to know why
> > > > >> Barrelfish performs worse on bare metal than on QEMU with KVM.
> > > > >>
> > > > >> As my second reply demonstrated, I got worse performance when
> > > > >> running applications on bare metal than on either QEMU with KVM or
> > > > >> native Linux.
> > > > >>
> > > > >> I made sure that it doesn't result from CPU frequency, because I
> > > > >> accessed the hardware performance registers directly within the
> > > > >> kernel and found that it did run at full speed.
> > > > >>
> > > > >> Now I suspect that the performance might be influenced by other
> > > > >> factors, such as device interrupts.
> > > > >> Have you ever measured the performance of Barrelfish and compared
> > > > >> it with Linux or other operating systems?
> > > > >>
> > > > >> On Sat, Feb 7, 2015 at 3:47 AM, Simon Peter <speter at inf.ethz.ch> wrote:
> > > > >>
> > > > >> I'm also suspecting that it might just be jittery CPU emulation
> > > > >> speed that's getting you different results. Barrelfish's usleep
> > > > >> ultimately uses sys_debug_get_tsc_per_ms, so your 2 ways might
> > > > >> actually be the same. Barrelfish measures CPU speed at bootup, but
> > > > >> it's very bad at figuring it out correctly on QEMU. I'm not sure
> > > > >> what the best way is to get accurate results on QEMU.
> > > > >>
> > > > >>
> > > > >> On 15-02-04 09:46 PM, tomsun.0.7 wrote:
> > > > >>
> > > > >>> Hi,
> > > > >>>
> > > > >>> I started a network application that is dedicated to producing
> > > > >>> packets all the time. However, when I started it on bare metal, I
> > > > >>> found the throughput is only half of what it is in QEMU (with KVM
> > > > >>> enabled, of course).
> > > > >>>
> > > > >>> This application is purely CPU-intensive: it just produces a lot
> > > > >>> of packets and then destroys them, so no devices are involved. At
> > > > >>> first I thought it resulted from a low core frequency, so I
> > > > >>> measured this in two ways: 1. invoking the native Barrelfish
> > > > >>> interface, sys_debug_get_tsc_per_ms, directly; 2. reading the tsc
> > > > >>> and sleeping for 1 second using POSIX sleep (which, as far as I
> > > > >>> know, is implemented by invoking Barrelfish's usleep). However, I
> > > > >>> got full-speed results in both cases.
> > > > >>>
> > > > >>> So I don't know whether it results from an incorrect frequency
> > > > >>> measurement or from some other CPU problem, because I even tried
> > > > >>> starting it with PXE in QEMU and got full performance.
> > > > >>>
> > > > >>> What can I do to get normal performance on bare metal? Or, if it
> > > > >>> results from a low CPU frequency, what can I do to raise the
> > > > >>> frequency?
> > > > >>>
> > > > >>>
> > > > >>> Tom
> > > > >>>
> > > > >>>
> > > > >>> _______________________________________________
> > > > >>> Barrelfish-users mailing list
> > > > >>> Barrelfish-users at lists.inf.ethz.ch
> > > > >>> https://lists.inf.ethz.ch/mailman/listinfo/barrelfish-users
> > > > >>>
> > > > >>
> > > > >>
> > > > >>
> > > > >
> > > >
> > >
> >
>
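
The rdtsc suggestions quoted above (cycle counts for gettimeofday() and for the function under test) could be sketched as follows. This is a minimal sketch assuming an x86 CPU and a GCC- or Clang-style inline assembler; `read_tsc`, `gettimeofday_cycles`, and `fib_cycles` are hypothetical names, and `fib` is an unsigned, iterative stand-in for the function in the thread. Plain rdtsc is not a serializing instruction, so very short intervals are only approximate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Read the x86 time-stamp counter (assumption: x86, GCC/Clang inline asm).
 * The "memory" clobber keeps the compiler from moving loads and stores
 * across the read. */
uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi) : : "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Average number of cycles per gettimeofday() call over n calls. */
uint64_t gettimeofday_cycles(unsigned n)
{
    struct timeval tv;

    assert(n > 0);
    uint64_t begin = read_tsc();
    for (unsigned i = 0; i < n; i++) {
        gettimeofday(&tv, NULL);
    }
    uint64_t end = read_tsc();
    return (end - begin) / n;
}

/* Iterative, unsigned stand-in for the fib() posted in the thread. */
unsigned long fib(unsigned long a, unsigned long b, unsigned long depth)
{
    while (depth > 0) {
        unsigned long next = a + b;   /* wraps on overflow, by design */
        a = b;
        b = next;
        depth--;
    }
    return b;
}

/* Cycles spent computing fib(1, 1, depth); no output inside the timed region. */
uint64_t fib_cycles(unsigned long depth, unsigned long *result)
{
    assert(result != NULL);
    uint64_t begin = read_tsc();
    *result = fib(1, 1, depth);
    uint64_t end = read_tsc();
    return end - begin;
}
```

Comparing `gettimeofday_cycles(n)` on Linux, on QEMU, and on bare-metal Barrelfish would show how much of a measured interval is the timing call itself. The counts are only comparable if the thread stays on one core and the TSC runs at a constant rate, which is an assumption here.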