[Oberon] Fwd: Re: New Oberon to Lua Transpiler
rochus.keller at bluewin.ch
Wed Dec 18 01:30:56 CET 2019
@ Luca Boasso:
Meanwhile I managed to implement a new translator which rewrites the AST so that call-by-reference is replaced by call-by-value combined with multiple return values (instead of thunk functions). Here are the new figures:
Test OBNLC 2019-12-17
Perm 17
Towers 13
Queens 20
Intmm 14
Mm 15
Quick 28
Bubble 16
Tree 39
FFT 17
NFP 472.76
FP 591.24
The code now runs within a factor of two compared to native (OBNC) performance.
Comparing with https://benchmarksgame-team.pages.debian.net/benchmarksgame/which-programs-are-fastest.html and http://luajit.org/performance_x86.html, I conclude that the performance is in the range of the JVM, which is faster than I expected.
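A minimal sketch of what the rewrite described above amounts to, written in Python as a stand-in for the generated Lua. The DivMod procedure and all function names are hypothetical and chosen only for illustration; this is not the actual output of the translator. The first variant passes setter closures (thunks) for the VAR parameters, the second passes values and hands the changed values back as multiple results.

```python
# Illustration only: Python stand-in for generated Lua; all names are hypothetical.
#
# Oberon procedure being modelled:
#   PROCEDURE DivMod(x, y: INTEGER; VAR q, r: INTEGER);
#   BEGIN q := x DIV y; r := x MOD y END DivMod;


def divmod_thunks(x, y, set_q, set_r):
    """VAR parameters emulated with setter closures (thunks)."""
    set_q(x // y)
    set_r(x % y)


def divmod_returns(x, y, q, r):
    """Same procedure after the rewrite: values in, changed values out."""
    q = x // y
    r = x % y
    return q, r


def call_with_thunks():
    q, r = 0, 0

    def set_q(v):
        nonlocal q
        q = v

    def set_r(v):
        nonlocal r
        r = v

    divmod_thunks(17, 5, set_q, set_r)
    return q, r


def call_with_returns():
    q, r = 0, 0
    # The caller assigns the returned values back to its own locals.
    q, r = divmod_returns(17, 5, q, r)
    return q, r
```

In the second form the call site needs no closures at all, which is presumably why it suits a tracing JIT better than the thunk variant.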
Best
R.
----Original Message----
From : rochus.keller at bluewin.ch
Date : 14/10/2019 - 11:55 (CEST)
To : oberon at lists.inf.ethz.ch
Subject : Re: [Oberon] New Oberon to Lua Transpiler
@ Luca Boasso:
Thanks for the data.
It has been a long time since I had to deal with JVM bytecode, but I seem to remember that there is a way to get the address of local variables; maybe I am mixing it up with CIL/CLR. Allocating a new array for each local variable looks like a rather expensive operation, and I'm not sure the LuaJIT optimizer would get rid of it. I already use this concept for structured thunks (i.e. call-by-reference to structure/array elements), but currently it looks like this was one of the bottlenecks. A much cheaper operation would be to use multiple return values for the changed values, but as far as I remember the JVM doesn't support that (in contrast to LuaJIT).
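To make the two options in this paragraph concrete, here is a small sketch in Python (used as a stand-in, since the thread contains no Lua code). The function names are hypothetical. The first variant boxes a by-reference local in a one-element array; the second returns the changed value to the caller instead.

```python
# Illustration only; all names are hypothetical.
#
# Oberon procedure being modelled:
#   PROCEDURE Inc1(VAR n: INTEGER);
#   BEGIN n := n + 1 END Inc1;


def inc_boxed(box):
    """VAR parameter passed as a one-element array (a 'box')."""
    box[0] = box[0] + 1


def inc_returned(n):
    """VAR parameter passed by value; the changed value is returned."""
    return n + 1


def count_boxed(limit):
    i = [0]  # one extra heap object for every local passed by reference
    while i[0] < limit:
        inc_boxed(i)
    return i[0]


def count_returned(limit):
    i = 0  # plain local, no allocation needed
    while i < limit:
        i = inc_returned(i)
    return i
```

Both loops compute the same result; the difference is only whether the local has to live in an allocated container.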
But anyway, I already had a short-term success: I was able to move the allocation of thunk functions away from the call and use local variables to reference them instead. I found a quite decent way to do that in my current generator without a full redesign. The JIT is now able to find the relevant traces without hitting FNEW and aborting. The speedup is significant. Here are some numbers for comparison (bear with my figures being higher than yours, since I run this on a ten-year-old 32-bit Linux laptop):
Test OBNC
Perm 11
Towers 10
Queens 9
Intmm 8
Mm 14
Quick 10
Bubble 14
Tree 11
FFT 24
NFP 152.19
FP 302.55
Test OBNLC 2019-10-08
Perm 237
Towers 11
Queens 278
Intmm 51
Mm 53
Quick 1030
Bubble 15
Tree 39
FFT 22
NFP 3123
FP 3376
Test OBNLC 2019-10-13
Perm 239
Towers 11
Queens 25
Intmm 55
Mm 58
Quick 32
Bubble 16
Tree 39
FFT 22
NFP 705.93
FP 975.11
As you can see, my most recent implementation is now pretty close to the expected performance of LuaJIT compared to a native implementation. I will still try other optimizations, especially for Perm.
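The thunk change mentioned above (creating the accessor closures once and referencing them through local variables, instead of creating them at each call) can be sketched roughly as follows. This is Python used as a stand-in for the generated Lua, with hypothetical names; in LuaJIT the closure creation corresponds to the FNEW bytecode named in the message.

```python
# Illustration only; all names are hypothetical.


def bump(get, set_):
    """Stands in for a procedure with one VAR parameter accessed via thunks."""
    set_(get() + 1)


def loop_allocating(n):
    total = 0
    for _ in range(n):
        # New getter/setter closures are created on every iteration,
        # i.e. at every execution of the call site.
        def get():
            return total

        def set_(v):
            nonlocal total
            total = v

        bump(get, set_)
    return total


def loop_hoisted(n):
    total = 0

    # Closures are created once and kept in locals; the loop only calls them.
    def get():
        return total

    def set_(v):
        nonlocal total
        total = v

    for _ in range(n):
        bump(get, set_)
    return total
```

Both functions return the same value; only the hoisted variant keeps closure creation out of the hot loop.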
Best
R.