[Oberon] Fwd: Re: New Oberon to Lua Transpiler

Wed Dec 18 01:30:56 CET 2019

@ Luca Boasso:

Meanwhile I managed to implement a new translator which can rewrite the AST so that call-by-reference is replaced by call-by-value combined with multiple returns (instead of thunk functions). Here are the new figures:

Test	OBNLC 2019-12-17
Perm     	17
Towers       	13
Queens       	20
Intmm      	14
Mm      	15
Quick     	28
Bubble      	16
Tree      	39
FFT       	17
NFP	472.76
FP	591.24

The code now runs within a factor of two compared to native (OBNC) performance. 
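
The rewrite mentioned above can be illustrated with a minimal sketch. Python is used here only for illustration (the actual generator emits Lua, whose multiple return values play the role of the tuple below), and the name `swap_by_value` is hypothetical, not taken from OBNLC output. An Oberon procedure with VAR parameters, such as `PROCEDURE Swap(VAR a, b: INTEGER)`, becomes a function that takes the values and returns the updated ones; the caller assigns them back:

```python
# Hypothetical translation of: PROCEDURE Swap(VAR a, b: INTEGER)
# The VAR parameters are passed by value; their final values are returned.
def swap_by_value(a, b):
    t = a
    a = b
    b = t
    return a, b  # updated VAR parameters handed back to the caller


def demo():
    x, y = 1, 2
    # Call site: Swap(x, y) becomes an assignment of the returned values.
    x, y = swap_by_value(x, y)
    return x, y
```

Note that this sketch ignores aliasing (e.g. passing the same variable for both parameters), which a real translator would have to consider.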

Comparing with https://benchmarksgame-team.pages.debian.net/benchmarksgame/which-programs-are-fastest.html and http://luajit.org/performance_x86.html, I conclude that the performance is in the range of the JVM, which is faster than I expected.

Best
R.

----Original Message----
From : rochus.keller at bluewin.ch
Date : 14/10/2019 - 11:55 (CEST)
To : oberon at lists.inf.ethz.ch
Subject : Re: [Oberon] New Oberon to Lua Transpiler

@ Luca Boasso:

Thanks for the data. 

It has been a long time since I had to deal with JVM bytecode, but I seem to remember that there is a way to get the address of local variables; maybe I am mixing it up with CIL/CLR. Allocating a new array for each local variable looks like a rather expensive operation, and I'm not sure the LuaJIT optimizer would get rid of it. I already use this concept for structured thunks (i.e. call-by-reference to structure/array elements), but currently it looks like this is one of the bottlenecks. A much cheaper operation would be to use multiple return values for the changed values, but as far as I remember the JVM doesn't support that (in contrast to LuaJIT).
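
For comparison, the two options discussed above (boxing a local variable in a freshly allocated one-element array versus returning the changed value) might look like the following. This is only a sketch in Python for illustration; `inc_boxed` and `inc_returned` are hypothetical names, and the real cost depends on how the target VM optimizes the allocation.

```python
# Hypothetical translations of: PROCEDURE Inc(VAR n: INTEGER)

def inc_boxed(box):
    # The callee mutates the single cell of the box.
    box[0] = box[0] + 1


def inc_returned(n):
    # Alternative: pass by value and return the changed value.
    return n + 1


def demo():
    n = 41
    box = [n]        # one allocation per boxed local variable
    inc_boxed(box)
    n = box[0]       # copy the result back into the local

    m = 41
    m = inc_returned(m)  # no allocation, just a returned value
    return n, m
```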

But anyway, I already had a short-term success: I was able to move the allocation of thunk functions away from the call and use local variables to reference them instead. I found a quite decent way to do that in my current generator without a full redesign. The JIT is now able to find the relevant traces without hitting FNEW and aborting. The speedup is significant. Here are some numbers for comparison (bear with my figures being higher than yours, since I run this on a ten-year-old 32-bit Linux laptop):

Test	OBNC
Perm     	11
Towers       	10
Queens       	9
Intmm      	8
Mm      	14
Quick     	10
Bubble      	14
Tree      	11
FFT       	24
NFP	152.19

Test	OBNLC 2019-10-08
Perm     	237
Towers       	11
Queens       	278
Intmm      	51
Mm      	53
Quick     	1030
Bubble      	15
Tree      	39
FFT       	22
NFP	3123
FP	3376
FP	302.55

Test	OBNLC 2019-10-13
Perm     	239
Towers       	11
Queens       	25
Intmm      	55
Mm      	58
Quick     	32
Bubble      	16
Tree      	39
FFT       	22
NFP	705.93
FP	975.11

As you can see, my most recent implementation is now pretty close to the expected performance of LuaJIT compared to a native implementation. I will still try other optimizations, especially for Perm.
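
The short-term fix described earlier, moving the allocation of thunk functions away from the call, can be sketched as follows. Python is used for illustration only; the actual generator emits Lua, where creating a closure in a hot loop corresponds to the FNEW bytecode mentioned above. All names here are hypothetical, and a thunk is modelled as a getter/setter pair of closures.

```python
# Hypothetical translation of a loop calling: PROCEDURE Inc(VAR n: INTEGER)

def inc(get, set_):
    # The callee accesses the caller's variable only through the thunk pair.
    set_(get() + 1)


def count_alloc_per_call(limit):
    n = 0
    for _ in range(limit):
        # A new pair of closures is allocated for every call.
        def get():
            return n

        def set_(v):
            nonlocal n
            n = v

        inc(get, set_)
    return n


def count_hoisted(limit):
    n = 0

    # The closures are allocated once and referenced through locals.
    def get():
        return n

    def set_(v):
        nonlocal n
        n = v

    for _ in range(limit):
        inc(get, set_)
    return n
```

Both variants compute the same result; the difference is only where the closures are created.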

Best
R.