[Oberon] Why is RSC string data word-aligned?

Sun Jan 31 01:50:14 CET 2021

Colby

Strings in memory are word-aligned because a string assignment can then copy a whole word at a time, which is up to 4 times faster than byte-wise access. The drawback of this decision is that we may end up with some superfluous consecutive 0X bytes.
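To put rough numbers on this, here is a small Python model (not Oberon, and not the code the compiler generates). It assumes a copy loop that moves one 4-byte word per step and stops once the last byte of the word just moved is 0X, which only works because of the padding. The names pad_to_word, word_copy_steps and byte_copy_steps are made up for this sketch.

```python
def pad_to_word(text: bytes) -> bytes:
    """Append the terminating 0X, then pad with 0X up to a multiple of 4 bytes."""
    data = text + b"\0"
    while len(data) % 4 != 0:
        data += b"\0"
    return data


def word_copy_steps(padded: bytes) -> int:
    """Count the steps of a copy loop that moves one 4-byte word per step
    and stops after a word whose last byte is 0X (assumed loop shape)."""
    steps = 0
    for i in range(0, len(padded), 4):
        steps += 1
        if padded[i + 3] == 0:
            break
    return steps


def byte_copy_steps(padded: bytes) -> int:
    """Count the steps of a byte-wise copy up to and including the first 0X."""
    return padded.index(0) + 1


hello = pad_to_word(b"Hello")      # 5 characters + 0X, padded to 8 bytes
project = pad_to_word(b"Project")  # 7 characters + 0X, exactly 8 bytes
print(word_copy_steps(hello), byte_copy_steps(hello))      # 2 6
print(word_copy_steps(project), byte_copy_steps(project))  # 2 8
```

In this model the factor of 4 is only reached when the string plus its terminator fills whole words; shorter strings gain less.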

Now, do I understand you correctly that, while writing the RSC file, you want to compress consecutive 0X into a single 0X? Do you really want to compress the string area while writing to a (huge) disk, where space is normally not an issue?

To optimize the file I/O you would turn this
  Files.WriteInt(R, strx);
  FOR i := 0 TO strx-1 DO
    Files.Write(R, str[i])
  END;

into something like this
  eos := FALSE;
  FOR i := 0 TO strx-1 DO
    IF str[i] = 0X THEN eos := TRUE
    ELSE
      IF eos THEN Files.Write(R, 0X) END;
      Files.Write(R, str[i]); eos := FALSE
    END
  END;
  Files.Write(R, 0X);
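
To see what this loop would put into the file, here is the same logic transliterated to Python and run on a made-up string area holding "Hello" and "Hi", each padded to a word boundary. It is only a model of the loop above. The function name squeeze_nuls and the sample data are assumptions for illustration.

```python
def squeeze_nuls(area: bytes) -> bytes:
    """Model of the loop above: every run of 0X bytes collapses to a single 0X."""
    out = bytearray()
    eos = False
    for b in area:
        if b == 0:
            eos = True             # remember that a terminator is pending
        else:
            if eos:
                out.append(0)      # emit one 0X for the whole run
            out.append(b)
            eos = False
    out.append(0)                  # terminator of the last string
    return bytes(out)


padded = b"Hello\0\0\0Hi\0\0"      # "Hello" and "Hi", word-aligned
squeezed = squeeze_nuls(padded)
print(len(padded), len(squeezed))  # 12 9
print(squeezed)                    # b'Hello\x00Hi\x00'
```

Two things show up in the model: the squeezed strings no longer start on word boundaries, so a reader would have to restore the padding while loading, and an empty string (a word of nothing but 0X) disappears entirely, because its terminator merges with the one before it.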

BTW: There is a reason why strings are written byte-wise although they are word-aligned. It has to do with endianness. The endianness is not defined in the Oberon system, as it doesn't matter as long as you don't leave the system, e.g. by writing to files.
If the compiler wrote strings as words (it could, since strings are word-aligned), the writing and reading systems would have to have the same endianness. Because strings are written byte by byte, any difference in endianness between the reading and writing systems does not matter.
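
A small Python model of that effect, under the assumption that the file carries the numeric value of each word faithfully: the writer loads a word from memory in its own byte order, and the reader stores the same value in its own byte order. The function name transfer_word_wise is made up for this sketch.

```python
def transfer_word_wise(area: bytes, writer_order: str, reader_order: str) -> bytes:
    """Move a word-aligned string area through word-sized values.

    writer_order and reader_order are "little" or "big". The file is assumed
    to preserve each word's numeric value, whatever its on-disk layout.
    """
    assert len(area) % 4 == 0, "string area must be word-aligned"
    out = bytearray()
    for i in range(0, len(area), 4):
        value = int.from_bytes(area[i:i + 4], writer_order)  # writer loads a word
        out += value.to_bytes(4, reader_order)               # reader stores that value
    return bytes(out)


area = b"Hello\0\0\0"
print(transfer_word_wise(area, "little", "little"))  # b'Hello\x00\x00\x00'
print(transfer_word_wise(area, "little", "big"))     # b'lleH\x00\x00\x00o'
```

With matching byte orders the string survives; with different ones the characters inside each word come out reversed. A byte-by-byte transfer never groups bytes into words, so it has no such dependency.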

br
Jörg

> On 31.01.2021 at 00:46, Chris Burrows <chris at cfbsoftware.com> wrote:
> 
> 
>> 
>> -----Original Message-----
>> From: Oberon [mailto:oberon-bounces at lists.inf.ethz.ch] On Behalf Of
>> Colby Russell
>> Sent: Sunday, 31 January 2021 8:22 AM
>> To: ETH Oberon and related systems; Jörg
>> Subject: Re: [Oberon] Why is RSC string data word-aligned?
>> 
>> Copying my original response to Charles here, because I forgot to
>> reply to the list.
>> 
>>> On 1/30/21 2:23 PM, Charles Perkins wrote:
>>> if string constants and string variables start on a word boundary
>>> and are padded with nul to a word boundary then a number of string
>>> operations only require word access
>> 
>> That's my intuition as well, but that's what I'm trying to verify.
>> I'm trying to track down some specific, concrete examples where we
>> can see that the cost of not having padding is higher than the cost
>> of dealing with a string with padding.  Operations like printing a
>> string, for example, are what I have in mind.  If you have a 5-
>> character string, word alignment means you can get the full contents
>> in 2 memory accesses, but the operations are still going to be
>> occurring at the byte level...
>> So where are the savings?
>> 
>>> On 1/30/21 3:16 PM, Jörg wrote:
>>> Right. Look at ORG.CopyString then you know why
>> 
>> ORG.CopyString operates on ORG `Item` objects representing in-memory
>> strings, not the binary data contained within an RSC file.  By the
>> time the generated code is executing, the module's string section has
>> already been read from disk by the module loader--which happens a
>> single character at a time.  Viz:
>> 
>> <https://hypothes.is/a/Wx_nmmNEEeuHEi-6X3lPnQ>
>> 
>> It appears this is an (unnoticed?) opportunity for further
>> optimization of the Oberon system.  The caveat being that it
>> constitutes a breaking change to the RSC file format.  So perhaps
>> Wirth has noticed and just decided that it wasn't worth the trouble?
>> 
> 
> I checked a couple of the compiler modules (ORP and ORS) and the number of nulls in strings in the object file is about 1% of the total file size so I would agree that it is not worth the trouble.
> 
> Regards,
> Chris Burrows
> CFB Software
> https://www.astrobe.com/RISC5
> 
> 
> --
> Oberon at lists.inf.ethz.ch mailing list for ETH Oberon and related systems
> https://lists.inf.ethz.ch/mailman/listinfo/oberon