[Oberon] Oberon performance (long)

Thu Apr 17 23:37:56 CEST 2003

Hi

> This is correct. A few instructions could be removed.
> There are basically two approaches: finding a small local optimization
> which achieves this (e.g. the register allocator could remember the
> last value loaded in a register and propose the same register instead
> of loading it again), or implementing CSE (common subexpression
> elimination) in the compiler.
> 
> The compiler first emits the code in an intermediate representation (LIR)
> (defined in PCLIR). Then the installed backend (PCG386 + PCO for i386) will
> translate the LIR to i386 after a small optimization step to reconstruct the
> complex addressing modes instead (and at the same time remove a few
> instructions).
> 

Well, that was the first thing I thought about, and maybe I'll go this way (or both ways). But just now
a small modification in PCC.Field compiled my toy example rather well, and I'm interested to see whether
it could be done at the level of intermediate code generation. Here are the relevant bits.

PROCEDURE Field*(code: Code;  VAR x: Item;  fld: PCT.Field);
BEGIN
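	(* new code: if x has the same level and offset as the item handled by
	   the previous LoadAdr, reuse that item instead of loading again *)
	IF (lastLoad.level = x.level) & (lastLoad.offs = x.offs) THEN
		x := lastLoad
	ELSE
		LoadAdr(code, x);
		lastLoad := x
	END; (* if *)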

	(* old code
	LoadAdr(code, x);
	*)

	x.mode := RegRel;  x.offs := fld.adr(PCBT.Variable).offset;  x.type := fld.type;
	x.deref := FALSE
END Field;

  Of course, ``lastLoad'' is a new variable of type ``Item'', and it is initialized like this:
  lastLoad.level := -1;

  As for the results:

MODULE Simple;

TYPE
	Simple	= OBJECT
	VAR 
		x, y, z: LONGINT;

		PROCEDURE &Init;
		BEGIN
			x := 1; y := 2; z := 3;
		END Init;

		PROCEDURE Sum () : LONGINT;
		BEGIN
			RETURN x + y + z
		END Sum;

	END Simple;
END Simple.

Decoder.Decode Simple.Obx ~

PROCEDURE Simple.Init
0007H: 55                                  PUSH    EBP
0008H: 8B EC                               MOV     EBP,ESP
000AH: 8B 5D 08                            MOV     EBX,8[EBP]
000DH: C7 03 01 00 00 00                   MOV     0[EBX],1
0013H: C7 43 04 02 00 00 00                MOV     4[EBX],2
001AH: C7 43 08 03 00 00 00                MOV     8[EBX],3
0021H: 8B E5                               MOV     ESP,EBP
0023H: 5D                                  POP     EBP
0024H: C2 04 00                            RET     4

PROCEDURE Simple.Sum
0027H: 55                                  PUSH    EBP
0028H: 8B EC                               MOV     EBP,ESP
002AH: 8B 5D 08                            MOV     EBX,8[EBP]
002DH: 8B 03                               MOV     EAX,0[EBX]
002FH: 03 43 04                            ADD     EAX,4[EBX]
0032H: 03 43 08                            ADD     EAX,8[EBX]
0035H: 8B E5                               MOV     ESP,EBP
0037H: 5D                                  POP     EBP
0038H: C2 04 00                            RET     4
003BH: 6A 03                               PUSH    3
003DH: CC                                  INT     3

  (Of course, it fails on any real-world program.)
  I have tried to generalize this scheme, but without much success. In fact, with none at all.
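
  To make the generalization problem concrete, here is a minimal sketch of a
load cache that remembers more than one entry and forgets entries when they may
be stale. It is a toy model written in Python so that it can be run outside the
Oberon system; it is not PCC code. The names LoadCache, load_adr, store and
barrier are hypothetical, and it assumes an unlimited supply of virtual
registers (a real i386 backend would also have to drop an entry whenever its
register is reused for something else).

```python
# Toy model, NOT the PC compiler: every name here is hypothetical.
class LoadCache:
    def __init__(self):
        self.regs = {}      # (level, offs) -> virtual register holding that load
        self.code = []      # emitted pseudo-instructions
        self.next_reg = 0   # assumption: virtual registers never run out

    def load_adr(self, level, offs):
        """Return a register holding the variable at (level, offs)."""
        key = (level, offs)
        if key in self.regs:
            return self.regs[key]           # reuse: nothing is emitted
        reg = "r%d" % self.next_reg
        self.next_reg += 1
        self.code.append("load %s, %d[level %d]" % (reg, offs, level))
        self.regs[key] = reg
        return reg

    def store(self, level, offs, src):
        """The variable is overwritten, so its cached copy is stale: drop it."""
        self.code.append("store %d[level %d], %s" % (offs, level, src))
        self.regs.pop((level, offs), None)

    def barrier(self):
        """Call, label or procedure boundary: forget everything."""
        self.regs.clear()


def demo():
    c = LoadCache()
    a = c.load_adr(0, 8)    # emits a load
    b = c.load_adr(0, 8)    # reused, emits nothing
    c.barrier()             # e.g. a call or a label in between
    d = c.load_adr(0, 8)    # has to be loaded again
    return a, b, d, c.code
```

  The one-entry ``lastLoad'' shown earlier corresponds to this cache without
store and barrier, which is presumably why it only survives straight-line toy
examples.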

  There is one more question. Is the code generation done concurrently?
And how can I know which data belongs to which process? By using AosActive.ActiveObject()?
Could this be a problem if I choose to modify the register allocator in the x86 backend?
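
  I do not know the answer, but if code generation can run concurrently, a
module-level variable such as ``lastLoad'' would be shared between the
processes. One way to avoid having to ask "whose data is this?" is to keep the
state in a per-compilation context. The following is only a sketch of that
idea in Python, under that assumption; FieldGen, compile_proc and
compile_concurrently are hypothetical names and do not exist in the compiler.

```python
import threading

# Toy model, NOT the PC compiler: every name here is hypothetical.
class FieldGen:
    """Per-compilation context: each instance owns its own last_load."""
    def __init__(self):
        self.last_load = None   # plays the role of lastLoad.level := -1
        self.code = []

    def field(self, level, offs, fld_offs):
        key = (level, offs)
        if self.last_load != key:
            # first access to this base: emit the load and remember it
            self.code.append("load base, %d[level %d]" % (offs, level))
            self.last_load = key
        self.code.append("use %d[base]" % fld_offs)


def compile_proc(accesses):
    """Generate pseudo-code for one procedure with a fresh context."""
    gen = FieldGen()
    for level, offs, fld_offs in accesses:
        gen.field(level, offs, fld_offs)
    return gen.code


def compile_concurrently(procs):
    """Compile every procedure in its own thread; no state is shared."""
    results = [None] * len(procs)

    def worker(i):
        results[i] = compile_proc(procs[i])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(procs))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

  Because every thread works on its own FieldGen, the result is the same as
compiling the procedures one after another, whatever the scheduling.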

  Anyway, I will have to look more deeply into the code. But this is a nice intellectual challenge and I like it.

Regards

--
How apt the poor are to be proud.
		-- William Shakespeare, "Twelfth-Night"