[Oberon] Bit-fiddling: SETs and type casts in Oberon-07
hklaver at dds.nl
Sun Aug 7 21:53:03 CEST 2022
Chris, Florian and Jörg, thanks for your answers so far!
Maybe I was not clear enough.
My questions are not so much about capitalization algorithms (that was just an illustration of a use case) but about how various Oberon-07 compilers differ in their implementation of the (function) procedures SYSTEM.VAL and INCL/EXCL.
I am aware that Oberon-07 does not define a particular character set. Even so, a function along the lines of the classic CAP function of Pascal, Modula-2 and Oberon-90 (which just clears one bit of a CHAR without checking its input) is useful as a building block of more elaborate capitalization algorithms. This is because frequently used ranges of many character encodings (such as ASCII, ISO-8859-1 (8-bit single-byte Latin-1) and the least significant byte of UTF-8 encoded characters) can be capitalized by flipping one or a few bits.
Given that, I tried to write a simple Oberon-07 CAP function that just clears bit 5 of a byte, and stumbled upon the different implementations of SYSTEM.VAL and INCL/EXCL in the two compilers I use regularly.
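For comparison, a minimal sketch of the same unchecked bit-5 clear without SYSTEM at all. Because ORD(ch) is a non-negative INTEGER, bit 5 (value 32) is set exactly when ODD(ORD(ch) DIV 32) holds, and subtracting 32 then clears it. Like the classic CAP it does not check its input, and it only makes sense for an encoding (such as ASCII or Latin-1) in which clearing bit 5 capitalizes; the module and procedure names are hypothetical.

```oberon
MODULE CapBit; (* hypothetical module name *)

  (* Clear bit 5 (value 32) of ch using only ORD/CHR arithmetic.
     Like the classic CAP, the input is not checked. *)
  PROCEDURE CapBit5*(ch: CHAR): CHAR;
    VAR n: INTEGER;
  BEGIN
    n := ORD(ch);
    IF ODD(n DIV 32) THEN n := n - 32 END; (* bit 5 set? then clear it *)
    RETURN CHR(n)
  END CapBit5;

END CapBit.
```

Since this uses only standard procedures, it should be accepted by any Oberon-07 compiler; only the meaning of the result depends on the character set.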
>> EXCL(SYSTEM.VAL(SET, ch), 5)
> All compilers I know access the variable ch as if it was declared as a SET in this case. This means that EXCL actually overwrites SIZE(SET) bytes rather than SIZE(CHAR) which may definitely lead to undefined behaviour if these sizes don't match.
That's OK; I could still do useful things, though of course implementation-dependent ones. As a programmer I am warned about this because VAL comes from the SYSTEM pseudo-module; that's one feature of Oberon I like very much.
My problem, though, is that neither of the two Oberon-07 compilers I checked accepts the above statement, which surprised me a bit. After all, it is a SYSTEM function, so it is the programmer's responsibility to check that the desired result is achieved.
In one of the two compilers, SYSTEM.VAL(SET, ...) only accepts INTEGER variables as its second parameter, so a CHAR variable holding an ASCII character first has to make a detour via BYTE and INTEGER:
VAR ch: CHAR; b: BYTE; i: INTEGER;
EXCL(SYSTEM.VAL(SET, ch), 5); (* error: "casting not allowed" *)
b := SYSTEM.VAL(BYTE, ch);
EXCL(SYSTEM.VAL(SET, b), 5); (* error: "casting not allowed" *)
i := SYSTEM.VAL(BYTE, ch); (* cast CHAR to BYTE and assign to INTEGER *)
EXCL(SYSTEM.VAL(SET, i), 5) (* clear bit 5 *)
RETURN SYSTEM.VAL(CHAR, i) (* cast INTEGER directly back to CHAR *)
Only the last three lines are accepted by this compiler, and they produce the desired result (bit 5 of ch cleared).
Unexpectedly (for me) the casting back from INTEGER to CHAR *is* allowed by this compiler!
The other compiler I checked does allow casting CHAR directly to SET, but (in contrast to the compiler above) requires a variable as the first parameter of EXCL:
VAR ch: CHAR; b: BYTE; i: INTEGER; s: SET;
EXCL(SYSTEM.VAL(SET, ch), 5); (* error: "variable expected in substitution of parameter 1: EXCL" *)
s := SYSTEM.VAL(SET, ch); (* cast CHAR to SET *)
EXCL(s, 5) (* clear bit 5 *)
RETURN SYSTEM.VAL(CHAR, s) (* cast SET back to CHAR *)
I still find this code a bit convoluted.
I would rather have my Oberon-07 compiler accept the following (the correctness of the result being my responsibility):
VAR ch: CHAR;
EXCL(SYSTEM.VAL(SET, ch), 5) (* cast CHAR to SET and clear bit 5 *)
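A possible middle ground, sketched under stated assumptions and not tested against either compiler: since EXCL needs a variable, the bit can instead be removed with a set difference on the cast value, which needs no SET variable at all. This assumes that the compiler accepts SYSTEM.VAL on an expression such as ORD(ch) (the first compiler above accepted an INTEGER variable), that SET element 5 corresponds to bit 5 (value 32), and that casting a SET back to CHAR keeps the low byte; all of this is implementation-dependent. The module and procedure names are hypothetical.

```oberon
MODULE CapSys; (* hypothetical module name *)
  IMPORT SYSTEM;

  (* Clear bit 5 of ch with a set difference instead of EXCL, so no
     SET variable is needed. Assumes (implementation-dependent):
     - SYSTEM.VAL accepts an expression as its second parameter,
     - set element 5 maps to bit 5 (value 32),
     - casting the SET back to CHAR keeps the low byte. *)
  PROCEDURE Cap*(ch: CHAR): CHAR;
  BEGIN
    RETURN SYSTEM.VAL(CHAR, SYSTEM.VAL(SET, ORD(ch)) - {5})
  END Cap;

END CapSys.
```

Going through ORD(ch) first means SYSTEM.VAL(SET, ...) only ever sees an INTEGER, which is the case the first compiler accepted.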
My questions now are:
- which of the two compilers mentioned above do you prefer regarding the behaviour of SYSTEM.VAL and EXCL?
- do you know an Oberon-07 compiler that accepts the last two lines of code, and what do you think of it?
In other words: what is your opinion about the desired behaviour of procedures SYSTEM.VAL and EXCL in an ideal Oberon-07 compiler?
> On 7 Aug. 2022, at 07:48 joerg.straube at iaeth.ch wrote:
> We are in a heavy grey zone here 😊
> First, Oberon-07 does not define any character set. Whether the compiler internally uses US-ASCII, EBCDIC or ISO-5428 (Greek) is not defined by Oberon-07.
> We only know that CHR and ORD are inverse functions and that ORD returns an INTEGER. Oberon-07 does not define the size of an INTEGER, so we don't know how many characters we have. It's up to the compiler.
> US-ASCII: ORD("M") = 04DH
> EBCDIC: ORD("M") = 0D4H
> ISO-5428: ORD("M") = 04FH
> The only thing in Oberon-07 we know for sure: ch = CHR(ORD(ch))
> Second, what is the definition of CAP?
> Does CAP() only convert (assuming the input is a lowercase letter), or does it check as well?
> Let's assume you want to write a procedure Strings.Cap(s). Do you write:
> i := 0; WHILE s[i] # 0X DO s[i] := CAP(s[i]); INC(i) END;
> or do you write
> i := 0; WHILE s[i] # 0X DO IF ("a" <= s[i]) & (s[i] <= "z") THEN s[i] := CAP(s[i]) END; INC(i) END;
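Following up on both points, a minimal sketch of a CAP that checks its input and assumes nothing about the numeric codes of the letters: it finds the character by position in two strings, which the compiler encodes in whatever character set it uses. It only covers the 26 basic Latin letters, and the module and procedure names are hypothetical.

```oberon
MODULE PortCap; (* hypothetical module name *)

  VAR lower, upper: ARRAY 27 OF CHAR; (* 26 letters + terminating 0X *)

  (* Return the uppercase counterpart of a basic Latin lowercase letter;
     every other character is returned unchanged. No assumption is made
     about the numeric codes of the letters. *)
  PROCEDURE Cap*(ch: CHAR): CHAR;
    VAR i: INTEGER;
  BEGIN
    i := 0;
    WHILE (i < 26) & (lower[i] # ch) DO INC(i) END;
    IF i < 26 THEN ch := upper[i] END;
    RETURN ch
  END Cap;

BEGIN
  lower := "abcdefghijklmnopqrstuvwxyz";
  upper := "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
END PortCap.
```

Because this Cap already checks its input, the first loop above behaves like the second one when it calls this version.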
> From: Oberon <oberon-bounces at lists.inf.ethz.ch> on behalf of Chris Burrows <cfbsoftware at gmail.com>
> Date: Sunday, 7 August 2022 at 03:51
> To: ETH Oberon and related systems <oberon at lists.inf.ethz.ch>
> Subject: Re: [Oberon] Bit-fiddling: SETs and type casts in Oberon-07
> On Sun, Aug 7, 2022 at 11:14 AM Chris Burrows <cfbsoftware at gmail.com> wrote:
> IF (ch >= "a") & (ch <= "z") THEN cap := CHR(ORD(ch) - 32) ELSE cap := ch END;
> Third time lucky? ;-)
> PROCEDURE CAP(ch: CHAR): CHAR;
> BEGIN
> IF (ch >= "a") & (ch <= "z") THEN ch := CHR(ORD(ch) - 32) END;
> RETURN ch
> END CAP;
> Oberon at lists.inf.ethz.ch mailing list for ETH Oberon and related systems