[Oberon] CHAR to/from BYTE conversion

Jörg Straube joerg.straube at iaeth.ch
Sat Jul 9 08:34:51 CEST 2016


Chris

Indeed, you're right: the Oberon-07 report allows any character encoding, as the formulation "ordinal value" (of type INTEGER) is quite open. Even EBCDIC would do.
One small remark though: the Oberon EBNF is not complete, as the rule "character" (used in the definition of "string") is not defined.

The Project Oberon implementation supports CHAR in the range 0X to 7FX.

If the implementation opts for Unicode, you should probably take UTF-32 as the internal representation of CHAR, and INTEGER has to have at least 32 bits. If you took UTF-16 or UTF-8, the rather simple ch := s[5] gets quite complicated to implement, because a character then occupies a variable number of array elements.
If you restrict yourself to the Unicode BMP (no emoji), UTF-16 will do and INTEGERs can have 16 bits.
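
To illustrate why ch := s[5] gets complicated with UTF-8, here is a minimal sketch in Oberon-07. It assumes the string is held as a 0-terminated ARRAY OF BYTE containing well-formed UTF-8. The module name Utf8Index and the procedure Offset are made up for this example and are not part of any Oberon system. Finding the n-th character needs a scan from the start of the array, whereas with a fixed-width CHAR it is a plain array access.

  MODULE Utf8Index;  (* hypothetical module, for illustration only *)

    (* Return the byte offset at which the n-th character (0-based) of the
       0-terminated UTF-8 sequence s starts, or -1 if s has fewer characters.
       Assumes s holds well-formed UTF-8. *)
    PROCEDURE Offset*(s: ARRAY OF BYTE; n: INTEGER): INTEGER;
      VAR i: INTEGER;
    BEGIN i := 0;
      WHILE (i < LEN(s)) & (s[i] # 0) & (n > 0) DO
        INC(i);  (* step over the lead byte *)
        (* step over continuation bytes, which have the form 10xxxxxx *)
        WHILE (i < LEN(s)) & (s[i] DIV 40H = 2) DO INC(i) END;
        DEC(n)
      END;
      IF (i >= LEN(s)) OR (s[i] = 0) THEN i := -1 END;
      RETURN i
    END Offset;

  END Utf8Index.

So the sixth character starts at s[Utf8Index.Offset(s, 5)], provided the result is not -1, and the bytes from that offset still have to be decoded into one ordinal value. The cost grows with n, which is the price of the compact representation.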

Jörg

On 09.07.2016 at 04:03, Chris Burrows <chris at cfbsoftware.com> wrote:

>> -----Original Message-----
>> From: Oberon [mailto:oberon-bounces at lists.inf.ethz.ch] On Behalf Of
>> Jörg Straube
>> Sent: Saturday, 9 July 2016 5:06 AM
>> To: ETH Oberon and related systems
>> Cc: paulreed at paddedcell.com
>> Subject: Re: [Oberon] CHAR to/from BYTE conversion
>> 
>> Don't open that can of worms :-)
>> - Should LEN(s) return the number of characters in s (logical length)
>> or the number of bytes of s (physical length)?
> 
> Neither.
> 
> LEN is the number of elements of an array. An array declared as 
> 
>  VAR s: ARRAY 12 OF CHAR 
> 
> will always have 12 elements no matter how long the string is that is currently stored in the array. 
> 
> LEN("abc") = 4  (* including the terminating NULL character *)
> 
> s := "abc";
> LEN(s) = 12
> 
> Similarly for an array declared as
> 
>  VAR a: ARRAY 12 OF INTEGER
> 
> LEN(a) = 12.
> 
> SYSTEM.SIZE will give you the number of bytes allocated to a variable. SYSTEM indicates that the value that SIZE returns is implementation-dependent.
> 
> The length of a string is the number of characters up to but not including the null terminating character. The longest string that can be stored in the array s where LEN(s) = 12 is 11 characters.
> 
>> - Shouldn't the type CHAR better be called "ASCII"?
> 
> No. Since 2013 the Oberon Report has not required it be Latin-1 / ASCII. CHAR is now defined as 'the characters of a standard character set'. 
> 
>> - As Unicode characters need up to 32 bit, should CHAR be 32 bit?
>> 
> 
> There is nothing stopping an implementer of Oberon from defining characters as 32-bit (or 16-bit for that matter) items if they wanted it to be used to develop applications that required a character set other than ASCII.
> 
>> Strings are quite tricky, especially a compact internal
>> representation of them.
>> 
> 
> True.
> 
> Regards,
> 
> Chris Burrows
> CFB Software
> http://www.astrobe.com
> 
> 
>>> On 08.07.2016 at 20:50, Skulski, Wojciech <skulski at pas.rochester.edu> wrote:
>>> 
>>> 
>>>> That means both CHAR and BYTE hold values 0 to 255.
>>> 
>>>> No, CHAR holds characters, 0X to 0FFX, including "A", "B" etc.; and
>>>> BYTE holds 0 to 255, a sub-range of the INTEGER range.
>>> 
>>>> Your statement is indicative of how much damage C has done to the
>>>> world :)
>>> 
>>> Characters should not be just English characters.
>>> 
>>> Your statement is indicative of how much damage ASCII has done to the
>>> world :)
>>> 
>>> W.
>>> --
>>> Oberon at lists.inf.ethz.ch mailing list for ETH Oberon and related
>>> systems https://lists.inf.ethz.ch/mailman/listinfo/oberon