[Oberon] Bit-fiddling: SETs and type casts in Oberon-07

joerg.straube at iaeth.ch joerg.straube at iaeth.ch
Mon Aug 8 07:24:04 CEST 2022


Hans

As Chris wrote a few mails ago, in Oberon-07 you normally write:
   PROCEDURE AsciiCAP(ch: CHAR): CHAR; RETURN CHR(ORD(ch) - 32) END AsciiCAP;
The compiler generates a SUB.

However, if you want the compiler to generate an AND-type instruction (ANN, i.e. AND NOT, on RISC5) you write:
   PROCEDURE AsciiCAP(ch: CHAR): CHAR; RETURN CHR(ORD(SYSTEM.VAL(SET, ORD(ch)) - {5})) END AsciiCAP;

The above assumes your compiler has the same size for INTEGER and SET.

On RISC5 the operation compiles to a single instruction in both functions, so the code has the same length, and the execution time is the same because SUB and ANN each need one cycle.
Because of this, I prefer the first form over the last.
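
A minimal sketch comparing the two forms, assuming an ASCII-compatible character set and SIZE(SET) = SIZE(INTEGER). The module name CapSketch and the procedure names CapSub and CapMask are made up for illustration, and whether a given compiler provides SYSTEM.SIZE or accepts an expression as the second argument of SYSTEM.VAL is an assumption. The assertions illustrate that the two forms agree on lowercase letters but differ on input that is already uppercase:

```oberon
MODULE CapSketch;  (* hypothetical module name, for illustration only *)
  IMPORT SYSTEM;

  (* Arithmetic form: subtracts 32; only meaningful for "a".."z" in ASCII. *)
  PROCEDURE CapSub(ch: CHAR): CHAR;
    RETURN CHR(ORD(ch) - 32)
  END CapSub;

  (* Set form: clears bit 5; assumes SIZE(SET) = SIZE(INTEGER). *)
  PROCEDURE CapMask(ch: CHAR): CHAR;
    RETURN CHR(ORD(SYSTEM.VAL(SET, ORD(ch)) - {5}))
  END CapMask;

BEGIN
  ASSERT(SYSTEM.SIZE(SET) = SYSTEM.SIZE(INTEGER));  (* the size assumption *)
  ASSERT(CapSub("m") = "M");
  ASSERT(CapMask("m") = "M");
  ASSERT(CapMask("M") = "M");  (* bit 5 already clear: unchanged *)
  ASSERT(CapSub("M") = "-")    (* 4DH - 20H = 2DH, no longer a letter *)
END CapSketch.
```

Neither form checks its input, so both still need a range test around them if arbitrary characters can occur.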

br
Jörg

From: Oberon <oberon-bounces at lists.inf.ethz.ch> on behalf of Hans Klaver <hklaver at dds.nl>
Date: Sunday, 7 August 2022 at 21:53
To: ETH Oberon and related systems <oberon at lists.inf.ethz.ch>
Subject: Re: [Oberon] Bit-fiddling: SETs and type casts in Oberon-07
Chris, Florian and Jörg, thanks for your answers so far!

Maybe I was not clear enough.
My questions are not so much about capitalization algorithms (that was just an illustration of a use case) but more about how several Oberon-07 compilers differ in their implementation of the procedures SYSTEM.VAL and EXCL.

I am aware that Oberon-07 does not define a particular character set. But even then a function along the lines of the classic CAP function of Pascal, Modula-2 and Oberon-90 (which just clears a bit of a CHAR, without checking its input) will be useful as part of more elaborate capitalization algorithms. This is because in several frequently used parts of many character encodings (like ASCII, ISO-8859-1 (8-bit single-byte encoded Latin-1) and the least significant byte of UTF-8 characters) whole ranges of characters can be capitalized by flipping one or a few bits.

Given that fact I tried to write a simple Oberon-07 CAP function which just clears bit 5 of a byte, and stumbled upon the different implementations of SYSTEM.VAL and EXCL in the two compilers I use regularly.

Florian wrote:

EXCL(SYSTEM.VAL(SET, ch), 5)

All compilers I know access the variable ch as if it were declared as a SET in this case. This means that EXCL actually overwrites SIZE(SET) bytes rather than SIZE(CHAR), which may well lead to undefined behaviour if these sizes don't match.

That's OK; I could still do useful things, though of course in an implementation-dependent way. As a programmer I am warned about this because the VAL function comes from the SYSTEM pseudomodule; that's one feature of Oberon I like very much.

My problem, though, is that neither of the two Oberon-07 compilers I checked accepts the above statement, which was a bit of a surprise to me. After all it is a SYSTEM function, so it is the responsibility of the programmer to check whether the desired result is reached.

In one of the two compilers SYSTEM.VAL(SET, ...) only allows an INTEGER variable as its second parameter, so a CHAR variable holding an ASCII character first has to make a detour via BYTE and INTEGER:

  VAR ch: CHAR;  b: BYTE;  i: INTEGER;
  ...
  EXCL(SYSTEM.VAL(SET, ch), 5)   (* error: "casting not allowed" *)
  ...
  b := SYSTEM.VAL(BYTE, ch);
  EXCL(SYSTEM.VAL(SET, b), 5)    (* error: "casting not allowed" *)
  ...

  i := SYSTEM.VAL(BYTE, ch);     (* cast CHAR to BYTE and assign to INTEGER *)
  EXCL(SYSTEM.VAL(SET, i), 5)    (* clear bit 5 *)
  RETURN SYSTEM.VAL(CHAR, i)     (* cast INTEGER directly back to CHAR *)

Only the last three lines are accepted by this compiler, and return the desired result (bit 5 of ch cleared).
Unexpectedly (for me) the casting back from INTEGER to CHAR *is* allowed by this compiler!

The other compiler I checked does allow the direct casting of CHAR to SET, but (in contrast to the above compiler) requires a variable as first parameter of procedure EXCL:

  VAR ch: CHAR;  b: BYTE;  i: INTEGER;  s: SET;
  ...
  EXCL(SYSTEM.VAL(SET, ch), 5) (* error: "variable expected in substitution of parameter 1: EXCL" *)
  ...
  s := SYSTEM.VAL(SET, ch);    (* cast CHAR to SET *)
  EXCL(s, 5)                   (* clear bit 5 *)
  RETURN SYSTEM.VAL(CHAR, s)   (* cast SET back to CHAR *)

I still find this code a bit convoluted.
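
For comparison, here is a sketch that stays within what both reports above describe: SYSTEM.VAL is applied only to an INTEGER variable, EXCL gets a real SET variable, and the way back uses plain ORD and CHR instead of a cast. The names Bit5Sketch and ClearBit5 are hypothetical, and it has not been checked against either of the two compilers; it also assumes SIZE(SET) = SIZE(INTEGER).

```oberon
MODULE Bit5Sketch;  (* hypothetical module name, for illustration only *)
  IMPORT SYSTEM;

  (* Clears bit 5 of ch without checking its input. *)
  PROCEDURE ClearBit5*(ch: CHAR): CHAR;
    VAR i: INTEGER; s: SET;
  BEGIN
    i := ORD(ch);             (* CHAR to INTEGER, no cast needed *)
    s := SYSTEM.VAL(SET, i);  (* the only cast: INTEGER variable to SET *)
    EXCL(s, 5);               (* s is a variable, as EXCL requires *)
    RETURN CHR(ORD(s))        (* ORD accepts a SET in Oberon-07 *)
  END ClearBit5;

END Bit5Sketch.
```

This is no shorter than the versions above, but the implementation-dependent part is confined to a single line.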

I would rather have my Oberon-07 compiler accept the following (with the responsibility for the correct result being mine):

 VAR ch: CHAR;
 ...
 EXCL(SYSTEM.VAL(SET, ch), 5)   (* cast CHAR to SET and clear bit 5 *)
 RETURN ch
 ...

My questions now are:
- which of the two above-mentioned compilers' behaviour regarding SYSTEM.VAL and EXCL do you like best?
- do you know an Oberon-07 compiler that can compile the last lines of code, and what do you think of it?

In other words: what is your opinion about the desired behaviour of procedures SYSTEM.VAL and EXCL in an ideal Oberon-07 compiler?

Regards,

Hans




On 7 Aug. 2022, at 07:48 joerg.straube at iaeth.ch wrote:

We are deep in a grey zone here 😊
First, Oberon-07 does not define any character set. Whether the compiler internally uses US-ASCII, EBCDIC or ISO-5428 (Greek) is not defined by Oberon-07.
We only know that CHR and ORD are inverse functions and that ORD returns an INTEGER. Oberon-07 does not define the size of an INTEGER, so we don't know how many characters we have. It's up to the compiler.
US-ASCII: ORD("M") = 04DH
EBCDIC: ORD("M") = 0D4H
ISO-5428: ORD("M") = 04FH

The only thing in Oberon-07 we know for sure:   ch = CHR(ORD(ch))

Second, what is the definition of CAP?
Does CAP() only convert (assuming the input is a lowercase letter) or does it check as well?
Let's assume you want to program a procedure Strings.Cap(s). Do you write:
i := 0; WHILE s[i] # 0X DO s[i] := CAP(s[i]); INC(i) END;
or do you write
i := 0; WHILE s[i] # 0X DO IF ("a" <= s[i]) & (s[i] <= "z") THEN s[i] := CAP(s[i]) END; INC(i) END;
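
As a sketch, the second (checking) loop could be wrapped up as below. The module name StringsSketch is made up for illustration, and the code assumes an ASCII-compatible encoding in which "a".."z" are contiguous and lie 32 above their capitals. The extra i < LEN(s) test is an addition, guarding against an array without a terminating 0X.

```oberon
MODULE StringsSketch;  (* hypothetical module name, for illustration only *)

  (* Capitalizes "a".."z" in place; all other characters are left alone. *)
  PROCEDURE Cap*(VAR s: ARRAY OF CHAR);
    VAR i: INTEGER;
  BEGIN
    i := 0;
    WHILE (i < LEN(s)) & (s[i] # 0X) DO
      IF ("a" <= s[i]) & (s[i] <= "z") THEN
        s[i] := CHR(ORD(s[i]) - 32)  (* ASCII assumption: "a" - "A" = 32 *)
      END;
      INC(i)
    END
  END Cap;

END StringsSketch.
```

With the check inside the loop, the conversion step itself can stay unchecked, which is the case where a bare bit-clearing or subtracting CAP is sufficient.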

br
Jörg


From: Oberon <oberon-bounces at lists.inf.ethz.ch> on behalf of Chris Burrows <cfbsoftware at gmail.com>
Date: Sunday, 7 August 2022 at 03:51
To: ETH Oberon and related systems <oberon at lists.inf.ethz.ch>
Subject: Re: [Oberon] Bit-fiddling: SETs and type casts in Oberon-07



On Sun, Aug 7, 2022 at 11:14 AM Chris Burrows <cfbsoftware at gmail.com> wrote:

Oops!

 IF (ch >= "a") & (ch <= "z") THEN cap := CHR(ORD(ch) - 32) ELSE cap := ch END;

Third time lucky? ;-)

  PROCEDURE CAP(ch: CHAR): CHAR;
  BEGIN
    IF (ch >= "a") & (ch <= "z") THEN ch := CHR(ORD(ch) - 32) END;
    RETURN ch
  END CAP;
