[Oberon] Files.Write and 2-byte CHARs
artur.efimov at gmail.com
Wed Jul 21 17:31:22 CEST 2021
In some Oberon versions, type CHAR has a size of 2 or 4 bytes.
(For example, BlackBox has a 2-byte CHAR and Active Oberon a 4-byte CHAR.)
In the latest Oberon language report (2016), the size of type CHAR
is not defined; instead, CHAR is said to hold "the characters of a
standard character set". A new type BYTE is added that is said to
hold "the integers between 0 and 255". Now BYTE is used instead of
CHAR where it is necessary to work with binary data (or files).
Thus, in Project Oberon, module Files now has the following procedures:
PROCEDURE ReadByte*(VAR r: Rider; VAR x: BYTE);
PROCEDURE ReadBytes*(VAR r: Rider; VAR x: ARRAY OF BYTE; n: INTEGER);
PROCEDURE Read*(VAR r: Rider; VAR ch: CHAR);
PROCEDURE ReadString*(VAR R: Rider; VAR x: ARRAY OF CHAR);
PROCEDURE WriteByte*(VAR r: Rider; x: BYTE);
PROCEDURE WriteBytes*(VAR r: Rider; x: ARRAY OF BYTE; n: INTEGER);
PROCEDURE Write*(VAR r: Rider; ch: CHAR);
PROCEDURE WriteString*(VAR R: Rider; x: ARRAY OF CHAR);
The procedure Write is internally the same as WriteByte, and likewise
procedure Read is the same as ReadByte; only the signatures differ.
The only difference in the body is that
    r.buf.data[r.bpos] := ORD(ch)
is used in Write instead of
    r.buf.data[r.bpos] := x
(as in WriteByte).
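
To make the equivalence concrete, here is a small sketch. It assumes
Project Oberon's Files module with a 1-byte CHAR; the module name WriteDemo
and the file name are made up for illustration. It writes the same value once
through Write and once through WriteByte, then reads both bytes back.

```oberon
MODULE WriteDemo;  (* hypothetical module name, for illustration only *)
  IMPORT Files;

  PROCEDURE Run*;
    VAR f: Files.File; r: Files.Rider; a, b: BYTE;
  BEGIN
    f := Files.New("WriteDemo.tmp");  (* hypothetical name; file is never registered *)
    Files.Set(r, f, 0);
    Files.Write(r, "A");       (* stores ORD("A") = 41H *)
    Files.WriteByte(r, 41H);   (* stores 41H directly *)
    Files.Set(r, f, 0);        (* rewind and read both bytes back *)
    Files.ReadByte(r, a); Files.ReadByte(r, b);
    ASSERT(a = b)              (* both calls should have stored the same byte *)
  END Run;

END WriteDemo.
```

Since the file is never passed to Files.Register, it should stay anonymous
and not appear in the directory.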
In Project Oberon, the size of CHAR is 1 byte.
But if CHAR were 2 bytes, module Files would have to read and write
characters in a form that is convenient for later use of the file. If CHARs
were written to a file raw, as 2-byte integers, the file would effectively be
encoded as UTF-16 or UCS-2 (without a BOM), and many text viewers would
probably not display it properly.
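
A minimal sketch of what such a raw write could look like. Assumptions on my
part: the character value is passed as an INTEGER (since CHAR is 1 byte in
Project Oberon), the low byte goes first, and the names RawChars and
WriteRaw16 are hypothetical.

```oberon
MODULE RawChars;  (* hypothetical module, not part of Project Oberon *)
  IMPORT Files;

  (* Write the low 16 bits of c as two raw bytes, low byte first. *)
  PROCEDURE WriteRaw16*(VAR r: Files.Rider; c: INTEGER);
  BEGIN
    Files.WriteByte(r, c MOD 100H);
    Files.WriteByte(r, c DIV 100H MOD 100H)
  END WriteRaw16;

END RawChars.
```

With this, WriteRaw16(r, 41H) puts the bytes 41H 00H into the file. The zero
byte after every ASCII letter is what a viewer expecting 8-bit text or UTF-8
stumbles over.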
My proposal (in case of 2-byte or 4-byte CHARs) is to make Files.Read
and Files.Write work with CHARs in the following manner:
1. The file is assumed to be UTF-8 encoded.
2. Files.Read reads one or more bytes from the file and constructs
a value of type CHAR.
3. Files.Write converts the given CHAR to UTF-8 and writes one
or more bytes to the file.
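The number of bytes read or written for a 2-byte CHAR can be 1, 2 or 3,
because UTF-8 reserves some bits in every byte for its own framing: values
up to 7FH take one byte, values up to 7FFH take two, and values up to 0FFFFH
take three.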
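
The three steps above could be sketched roughly as follows. This is only an
illustration under stated assumptions, not a finished implementation: the
module Utf8Files and the procedures WriteCode and ReadCode are hypothetical
names, and they work on INTEGER code points (0..0FFFFH) because CHAR is 1 byte
in Project Oberon. With a 2-byte CHAR, Files.Write would presumably call
WriteCode(r, ORD(ch)), and Files.Read would end with ch := CHR(c). There is no
validation here (surrogate values, malformed continuation bytes, and end of
file are not checked).

```oberon
MODULE Utf8Files;  (* hypothetical module, not part of Project Oberon *)
  IMPORT Files;

  (* Write code point c (0..0FFFFH) as 1, 2 or 3 UTF-8 bytes. *)
  PROCEDURE WriteCode*(VAR r: Files.Rider; c: INTEGER);
  BEGIN
    IF c < 80H THEN                      (* 0xxxxxxx *)
      Files.WriteByte(r, c)
    ELSIF c < 800H THEN                  (* 110xxxxx 10xxxxxx *)
      Files.WriteByte(r, 0C0H + c DIV 40H);
      Files.WriteByte(r, 80H + c MOD 40H)
    ELSE                                 (* 1110xxxx 10xxxxxx 10xxxxxx *)
      Files.WriteByte(r, 0E0H + c DIV 1000H);
      Files.WriteByte(r, 80H + c DIV 40H MOD 40H);
      Files.WriteByte(r, 80H + c MOD 40H)
    END
  END WriteCode;

  (* Read 1, 2 or 3 UTF-8 bytes and return the code point in c. *)
  PROCEDURE ReadCode*(VAR r: Files.Rider; VAR c: INTEGER);
    VAR b0, b1, b2: BYTE;
  BEGIN
    Files.ReadByte(r, b0);
    IF b0 < 80H THEN                     (* single byte *)
      c := b0
    ELSIF b0 < 0E0H THEN                 (* lead byte of a 2-byte sequence *)
      Files.ReadByte(r, b1);
      c := b0 MOD 20H * 40H + b1 MOD 40H
    ELSE                                 (* lead byte of a 3-byte sequence *)
      Files.ReadByte(r, b1); Files.ReadByte(r, b2);
      c := (b0 MOD 10H * 40H + b1 MOD 40H) * 40H + b2 MOD 40H
    END
  END ReadCode;

END Utf8Files.
```

For example, 41H ("A") is written as the single byte 41H, 416H (Cyrillic
Zhe) as 0D0H 96H, and 20ACH (the euro sign) as 0E2H 82H 0ACH.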
For your information:
The 2-byte range of Unicode (the Basic Multilingual Plane) covers nearly all
modern languages of the world, including Chinese, Japanese, Korean and Thai.
The code points beyond it are used to encode ancient writings, emoji, and some
strange things like playing cards, the tiles of the game Mah Jongg, and even
dominoes.
Additionally, two procedures, WriteChar and ReadChar, could be added that
write the values of CHARs directly (for fast local non-portable data