[Oberon] Project Oberon for Mac

Douglas G. Danforth danforth at greenwoodfarm.com
Fri Jan 22 13:57:40 CET 2016


On 1/22/2016 3:38 AM, eas lab wrote:
> Who started this absurdity of replacing single-quote/apostrophe by  3 bytes.
UTF-8 is a character encoding capable of encoding all possible 
characters, or code points, in Unicode.

The encoding is variable-length and uses 8-bit code units. It was 
designed for backward compatibility with ASCII, and to avoid the 
complications of endianness and byte order marks in the alternative 
UTF-16 and UTF-32 encodings. The name is derived from: Universal Coded 
Character Set + Transformation Format—8-bit.[1]

[Figure caption: UTF-8 (light blue) exceeded the other main encodings of 
text on the Web, and by 2010 it was nearing 50% prevalence. Encodings 
were detected by examining the text, not from the encoding tag in the 
header,[2] and were sorted to the least inclusive set;[3] thus, ASCII 
text tagged as UTF-8 or ISO-8859-1 is identified as ASCII. By January 
2016 the declared usage was up to 86%.[4]]
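
To make the earlier point about ASCII compatibility and endianness 
concrete, here is a minimal Python sketch. The sample string is an 
arbitrary choice for illustration; the codec names are the standard 
ones shipped with Python.

```python
# Sketch: one ASCII-only string under several encodings.
# The sample text is an arbitrary assumption, not taken from the thread.
text = "Oberon"

ascii_bytes = text.encode("ascii")
utf8 = text.encode("utf-8")          # byte-for-byte the same as ASCII
utf16_le = text.encode("utf-16-le")  # little-endian 16-bit code units
utf16_be = text.encode("utf-16-be")  # big-endian 16-bit code units
utf16_bom = text.encode("utf-16")    # native order, prefixed with a byte order mark

print("ascii    :", ascii_bytes.hex())
print("utf-8    :", utf8.hex())
print("utf-16-le:", utf16_le.hex())
print("utf-16-be:", utf16_be.hex())
print("utf-16   :", utf16_bom.hex())
```

The UTF-8 output matches the ASCII output exactly, while the two UTF-16 
forms differ from each other, which is why UTF-16 data needs either an 
agreed byte order or a byte order mark.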

UTF-8 is the dominant character encoding for the World Wide Web, 
accounting for 86.1% of all Web pages in January 2016 (with the most 
popular East Asian encoding, GB 2312, at 0.9%).[4][2][5] The Internet 
Mail Consortium (IMC) recommends that all e-mail programs be able to 
display and create mail using UTF-8,[6] and the W3C recommends UTF-8 as 
the default encoding in XML and HTML.[7]

UTF-8 encodes each of the 1,112,064 valid code points in the Unicode 
code space (1,114,112 code points minus 2,048 surrogate code points) 
using *one to four* 8-bit bytes (a group of 8 bits is known as an octet 
in the Unicode Standard). Code points with lower numerical values (i.e., 
earlier code positions in the Unicode character set, which tend to occur 
more frequently) are encoded using fewer bytes. The first 128 characters 
of Unicode, which correspond one-to-one with ASCII, are encoded using a 
single octet with the same binary value as ASCII, making valid ASCII 
text valid UTF-8-encoded Unicode as well. Conversely, ASCII byte values 
never occur in the encoding of a non-ASCII code point, which makes UTF-8 
safe to use within most programming and document languages that 
interpret certain ASCII characters in a special way, e.g. as end of 
string.
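
The one-to-four-byte rule can be sketched in a few lines of Python. The 
helper name utf8_encode is made up for this illustration (Python's own 
str.encode does the real work and is used here only as a cross-check). 
The sample characters are also assumptions; in particular, the "3 byte 
apostrophe" from the quoted question is assumed to be U+2019 RIGHT 
SINGLE QUOTATION MARK, which the original message does not state.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 (illustrative helper, not a library API)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if cp < 0x80:        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

# Code space size minus the surrogate range gives the count of valid code points.
valid_code_points = 0x110000 - (0xDFFF - 0xD800 + 1)

# Sample characters (assumed for illustration): ASCII apostrophe,
# e with acute accent, right single quotation mark, an emoji.
samples = ["'", "\u00e9", "\u2019", "\U0001F600"]
for ch in samples:
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # cross-check against the built-in codec
    hex_bytes = " ".join(f"{b:02x}" for b in encoded)
    print(f"U+{ord(ch):04X} -> {hex_bytes} ({len(encoded)} byte(s))")
```

Every byte of a multi-byte sequence has its high bit set, so none of 
them can be mistaken for an ASCII character such as NUL or a quote.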