[Oberon] RISC emulator

Claudio Nieder private at claudio.ch
Tue Mar 18 20:16:10 CET 2014


Hi,

> After some thought I'm not so sure if the "international Project Oberon" should really be able to hold ALL Unicode characters, so ALL 17 planes. I think we can limit to plane 0, namely BMP.

Sounds a bit like what Java has done. They started by supporting only plane 0, so char is a 16-bit type. Then they changed their mind and now support all planes, so now:

- String is still a sequence of char (16-bit values), and non-plane-0 Unicode characters are encoded by two char values: a so-called high surrogate (D800H - DBFFH) followed by a low surrogate (DC00H - DFFFH). This is UTF-16.

- String.charAt(i), which used to return the i-th character, now just returns the i-th char, which can be a proper Unicode character or a high or low surrogate value. And even when it returns a proper character, that character may have a number <i in the string if a character formed by a surrogate pair came before it.

- String.codePointAt(i) was added to return the integer Unicode value of the character, provided the programmer has chosen the proper i. Here i still refers to the i-th char value in the string and not to the i-th Unicode character within the string. As long as one does not supply an index which corresponds to a low-surrogate char value, the "decoded" Unicode character value is returned, but of course as type int, because type char does not allow values beyond FFFFH.
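
To make the three points above concrete, here is a small sketch. The class name SurrogateDemo and the helpers highSurrogate/lowSurrogate are made up for illustration; they spell out the standard UTF-16 arithmetic by hand. The String and Character methods are the regular Java ones. The sample character is U+1D11E (MUSICAL SYMBOL G CLEF), which lies in plane 1.

```java
class SurrogateDemo {
    // Illustrative helper: high surrogate for a code point above FFFFH.
    static char highSurrogate(int codePoint) {
        return (char) (0xD800 + ((codePoint - 0x10000) >> 10));
    }

    // Illustrative helper: low surrogate for a code point above FFFFH.
    static char lowSurrogate(int codePoint) {
        return (char) (0xDC00 + ((codePoint - 0x10000) & 0x3FF));
    }

    public static void main(String[] args) {
        int clef = 0x1D11E; // a plane 1 character
        // Three Unicode characters, but four char values.
        String s = "a" + new String(Character.toChars(clef)) + "b";

        System.out.println(s.length());                            // 4
        System.out.println(Integer.toHexString(s.charAt(1)));      // d834 (high surrogate)
        System.out.println(s.charAt(3));                           // b (third character, index 3)
        System.out.println(Integer.toHexString(s.codePointAt(1))); // 1d11e
        System.out.println(Integer.toHexString(s.codePointAt(2))); // dd1e (lone low surrogate)
        System.out.println(s.codePointCount(0, s.length()));       // 3
    }
}
```

Note how 'b' is the third character of the string but sits at char index 3, and how codePointAt(2) hands back the bare low surrogate because the index points into the middle of a pair.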

Thus as soon as you go beyond plane 0, String is no longer a clean abstraction of a string from which you can easily get the i-th Unicode character. So I feel this would be a bad solution. Either do it the way you originally proposed, i.e. just use UTF-8 and leave it to the applications to handle it, or go for a CHAR type which really represents all Unicode characters.
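
As a rough illustration of why this is awkward in Java: getting the i-th Unicode character means first translating a character count into a char index. The sketch below uses the standard String.offsetByCodePoints, which has to walk the string from the start; the class and method names (CodePointIndex, codePointAtIndex) are hypothetical.

```java
class CodePointIndex {
    // Returns the i-th Unicode character (0-based) of s as an int.
    // offsetByCodePoints scans from char index 0, so the cost grows with i.
    static int codePointAtIndex(String s, int i) {
        int charIndex = s.offsetByCodePoints(0, i);
        return s.codePointAt(charIndex);
    }

    public static void main(String[] args) {
        String s = "a\uD834\uDD1Eb"; // 'a', U+1D11E as a surrogate pair, 'b'
        System.out.println(Integer.toHexString(codePointAtIndex(s, 1))); // 1d11e
        System.out.println((char) codePointAtIndex(s, 2));               // b
    }
}
```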

claudio
-- 
Claudio Nieder, Talweg 6, CH-8610 Uster, Tel +4179 357 6743, www.claudio.ch



