<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    On 1/22/2016 3:38 AM, eas lab wrote:<br>

    <blockquote

cite="mid:CAN3-DLEemRQ0mNaJrBFWf6Z6t2f4P5=2O7mc3q6tb5ixe8RZ-A@mail.gmail.com"

      type="cite">

      <pre wrap="">Who started this absurdity of replacing single-quote/apostrophe by  3 bytes.</pre>

    </blockquote>

    UTF-8 is a character encoding capable of encoding all possible

    characters, or code points, in Unicode.<br>

    <br>

    The encoding is variable-length and uses 8-bit code units. It was

    designed for backward compatibility with ASCII, and to avoid the

    complications of endianness and byte order marks in the alternative

    UTF-16 and UTF-32 encodings. The name is derived from: Universal

    Coded Character Set + Transformation Format—8-bit.[1]<br>
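    A minimal sketch of the two design points above, assuming Python 3 and its built-in codecs (the sample strings are illustrative only): ASCII text produces identical bytes in ASCII and UTF-8, and UTF-16 output depends on byte order while UTF-8 output does not.<br>

```python
# Illustrative sample: a plain ASCII apostrophe (U+0027) inside ASCII text.
text = "it's"

# ASCII text is byte-for-byte identical when encoded as ASCII or as UTF-8.
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
same_as_ascii = ascii_bytes == utf8_bytes

# UTF-16 comes in little-endian and big-endian flavours, so the same
# character yields two different byte sequences.
utf16_le = "A".encode("utf-16-le")  # low byte first
utf16_be = "A".encode("utf-16-be")  # high byte first

# UTF-8 works in single bytes, so there is only one possible byte sequence.
utf8_a = "A".encode("utf-8")

print(same_as_ascii, utf16_le, utf16_be, utf8_a)
```

    The explicit "utf-16-le" and "utf-16-be" codec names are used here so that no byte order mark is written; the plain "utf-16" codec would prepend one.<br>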

    Figure caption: UTF-8 (light blue in the original graph) overtook the

    other main text encodings on the Web, and by 2010 it was nearing 50%

    prevalence. Encodings were detected by examining the text, not from

    the encoding tag in the header,[2] and were sorted to the least

    inclusive set;[3] thus, ASCII text tagged as UTF-8 or ISO-8859-1 is

    identified as ASCII. By January 2016 the declared usage was up to

    86%.[4]<br>

    <br>

    UTF-8 is the dominant character encoding for the World Wide Web,

    accounting for 86.1% of all Web pages in January 2016 (with the most

    popular East Asian encoding, GB 2312, at 0.9%).[4][2][5] The

    Internet Mail Consortium (IMC) recommends that all e-mail programs

    be able to display and create mail using UTF-8,[6] and the W3C

    recommends UTF-8 as the default encoding in XML and HTML.[7]<br>

    <br>

    UTF-8 encodes each of the 1,112,064 valid code points in the Unicode

    code space (1,114,112 code points minus 2,048 surrogate code points)

    using <b>one to four</b> 8-bit bytes (a group of 8 bits is known

    as an octet in the Unicode Standard). Code points with lower

    numerical values (i.e., earlier code positions in the Unicode

    character set, which tend to occur more frequently) are encoded

    using fewer bytes. The first 128 characters of Unicode, which

    correspond one-to-one with ASCII, are encoded using a single octet

    with the same binary value as ASCII, making valid ASCII text valid

    UTF-8-encoded Unicode as well. Conversely, ASCII byte values never

    occur in the encoding of a non-ASCII code point, which makes UTF-8

    safe to use within most programming and document languages that

    interpret certain ASCII characters in a special way, e.g. as an

    end-of-string marker.<br>
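    <br>
    The byte counts above can be checked with a short Python 3 sketch. It assumes that the "3 bytes" in the original question are the typographic apostrophe U+2019, which the quoted message does not actually show; the sample characters and the helper name has_no_ascii_bytes are made up for illustration.<br>

```python
# Hypothetical sample characters, one for each UTF-8 sequence length.
samples = {
    "ascii_apostrophe": "\u0027",   # plain ASCII apostrophe
    "e_acute": "\u00e9",            # Latin small letter e with acute
    "right_quote": "\u2019",        # typographic apostrophe (assumed culprit)
    "emoji": "\U0001F600",          # a code point outside the BMP
}

# Number of UTF-8 bytes needed for each sample character.
lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

# The typographic apostrophe becomes three bytes in UTF-8.
right_quote_bytes = samples["right_quote"].encode("utf-8")


def has_no_ascii_bytes(ch):
    """Return True if every UTF-8 byte of ch has its high bit set."""
    return all(b >= 0x80 for b in ch.encode("utf-8"))


print(lengths)
print(right_quote_bytes.hex(" "))
```

    If the assumption holds, the three bytes are simply the UTF-8 form of a different character from the ASCII apostrophe, and software that decodes the text as UTF-8 displays them as a single quotation mark.<br>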

  </body>

</html>