[Oberon] Item technique.

Mon Sep 4 11:33:14 CEST 2017

NOTE ON THE “ITEM TECHNIQUE” USED IN THE OBERON COMPILER

One key idea and motivation behind the “Item” technique used in the Oberon compiler is to represent *anonymous* entities that are generated during compilation of an Oberon program in an *efficient* way. This is achieved by using the *stack* for them instead of the *heap*.

Recall that when *declarations* (i.e. identifiers such as constants, types, variables or procedures) are processed by the compiler, objects (of type ORB.Object in Oberon on RISC) in a dynamic structure are created, so that they can be referred to later during the compilation, for example during code generation. There is one object per declared identifier. Naturally, such objects are allocated in the *heap*. They are rooted in a global variable (variable ORB.topScope in Oberon on RISC), as they need to be available throughout compilation. For example, an instruction at the end of an Oberon program might refer to a variable declared at the beginning of it.

However, *in addition* to these permanently (i.e, for the duration of the compilation") available objects, there are also objects that are only of *transitory* nature, for example when an expression b*c is parsed in an assignment a := b*c.

It would simply be too expensive - both in terms of memory requirements and execution speed - to also allocate such objects in the heap during compilation. A much better way is to use the *stack* instead. Why the stack? Recall that the stack is used during procedure activations and holds parameters and local variables, plus a few other data fields such as the return address. The important observation here is that “allocation” and “deallocation” of parameters and local variables is both automatic and super-fast - as one only needs to adjust the stack pointer when calling a procedure, and re-adjust it again when exiting the procedure.

So, if you parse the expression b*c in the above example, the idea is to simply pass two variables of type Item (ORG.Item in Oberon on RISC) to the corresponding code generation procedure. And indeed, the code generation procedure for a multiplication (procedure ORG.MulOp) has the signature

  PROCEDURE MulOp*(VAR x, y: Item);   (* x := x * y *)

Note that x and y are variable (!) parameters, which means that the items are passed *by reference* and not by value (i.e. they are not copied). In practice, the caller typically uses this arrangement as follows. The first parameter, say x, is itself passed as a (variable) parameter to the caller (the “incoming item” so to speak), while the second parameter, say y, is simply declared locally to the callee. Both are first parsed, and *then* passed together to the code generation procedure. As an example, procedure ‘term' in the Oberon parser (procedure ORP.term in Oberon on RISC), has the signature

  PROCEDURE term(VAR x: ORG.Item);

Here, the parameter x is simply the “incoming item”. It is itself typically a local variable in some other procedure of the parser. But procedure ’term’ itself also declares a *local* (which simply means that it will automatically go away then moment procedure ’term’ exits) variable ‘y' to hold the second argument in the expression x*y. There is is:

  VAR y: ORG.Item

Then, in the code section of ORG.term, you can simply write something along the lines of

  factor(x);
  ORS.GetSym;
  IF sym = times THEN …  factor(y) … ;  ORG.MulOp(x, y)  END

The first instruction factor(x) simply parses the “incoming item” x. As mentioned above, x may the local variable of another procedure in the parser (here procedure ‘expression’), or it may itself have been passed as a parameter to the caller. This allows for unlimited recursion of course. The instruction factor(y) then parses the expression ‘y,’ which uses a *local* a variable ‘y’ as its item, since this item is not used outside procedure ’term’. Once both x (represented by an item passed as a variable parameter) and y (represented by an item which is simply a locally declared record) have been parsed, both items are together passed (again by reference) to the code generating procedure, here procedure ORG.MulOp.

As this example nicely illustrates, the two *transitory* items x and y are simply "passed around” by reference on the stack by virtue of repeated calls to other procedures in the parser. Of course, all these items are simply local variables themselves, i.e. are also allocated in the stack. In the above example, the local variable y in procedure ORP.term *only* exists as long as procedure term is executed, while procedure ORG.MulOp can access x and y in an efficient way, as these parameters are passed by reference.

And this simply means that once the procedures are done compiling a particular expression (x*y in the above example), these temporary entities also automatically “disappear” with near-zero overhead. No allocation or deallocation on the heap for x and y is necessary, and it is highly efficient. And once compilation of an Oberon program is finished, *all* items "magically disappear”. But there is no magic to it. This is simply because all procedures in the compiler have been executed. Of course, it all started with one local variable of type Item somewhere at the top level in the compiler, for example procedure ORP.StatSequence - which itself declares items as local (record) variables.

That’s the basic idea - you use the *heap* for *declared identifiers* as you must be able refer to them throughout compilation, and you use the (super-efficient) *stack* for objects which are only of *transitory* nature.

Note that items have a few additional properties, for example as related to addressing modes of the underlying computer (there is one item mode per addressing mode), but these are of technical nature and are not related to the main idea described above. These are described in the documentation of the Oberon system and its compiler, all available in the web site https://www.inf.ethz.ch/personal/wirth .

AP

[Oberon] Item technique.

Srinivas Nayak sinu.nayak2001 at gmail.com  <mailto:oberon%40lists.inf.ethz.ch?Subject=Re:%20Re%3A%20%5BOberon%5D%20Item%20technique.&In-Reply-To=%3C5985DDBE.6080305%40gmail.com%3E>
Sat Aug 5 17:01:18 CEST 2017

Previous message: [Oberon] RISC5 Oberon now on Spartan 7 <http://lists.inf.ethz.ch/pipermail/oberon/2017/010815.html>
Messages sorted by: [ date ] <http://lists.inf.ethz.ch/pipermail/oberon/2017/date.html#10817> [ thread ] <http://lists.inf.ethz.ch/pipermail/oberon/2017/thread.html#10817> [ subject ] <http://lists.inf.ethz.ch/pipermail/oberon/2017/subject.html#10817> [ author ] <http://lists.inf.ethz.ch/pipermail/oberon/2017/author.html#10817>
Dear Friends,

In his paper "Compiler Construction The Art of Niklaus Wirth",
ftp://ftp.ssw.uni-linz.ac.at/pub/Papers/Moe00b.pdf, <ftp://ftp.ssw.uni-linz.ac.at/pub/Papers/Moe00b.pdf,>
Hanspeter Mossenbock writes, "The spine of Wirth's
code generators is what I call the Item technique."
He also says, it is described in
Wirth's "Compiler Construction" book.

Do we have any other thesis or paper or book,
where this idea is more simply presented,
than Wirth's book?

I loved the technique very much,
from whatever little I understood
from my exercise of porting Wirth's compiler to Linux.
I wish to learn it deeper.

By the way, do we have source code files of
Oberon-0 compiler for RISC5 anywhere?

With thanks and best regards,

Yours sincerely,
Srinivas Nayak

Home: http://www.mathmeth.com/sn/ <http://www.mathmeth.com/sn/>
Blog: http://srinivas-nayak.blogspot.in/ <http://srinivas-nayak.blogspot.in/>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.inf.ethz.ch/pipermail/oberon/attachments/20170904/e932858b/attachment.html>