[Oberon] FPGA - Bit reversal

Fri Sep 22 13:35:20 CEST 2017

Hi, Tomas!

22.09.2017, 12:53, "Tomas Kral" <thomas.kral at email.cz>:
> Hi Alex (or prefer Sasha?),

  Alex is better, thanks. : ))

> How would you recommend to further optimise word bit reversal?
>
>   PROCEDURE ReverseWord(wd: INTEGER): INTEGER;
>     VAR i: INTEGER; rwd, u: SET;
>   BEGIN
>     u := SYSTEM.VAL(SET, wd);
>     rwd := {}; FOR i := 0 TO 31 DO
>       IF i IN u THEN INCL(rwd, 31-i) END
>     END
>   RETURN SYSTEM.VAL(INTEGER, rwd) END ReverseWord;
>
> I was also thinking to use WHILE loop and let i run from 0 to 15 and j
> from 31 to 16. Not sure if faster with only 16 iterations.

  Partially unrolling the loop like that seems the way to go. It all depends on where you want your memory-codesize-speed trade-off to be. If we are talking about embedded development, I would create several implementations and experiment.

  A combination approach may be worth trying: make lookup table of 16 bytes (or even 4 bytes), then unroll the loop as much as it makes sense performance-wise.

  There is one optimization I could suggest for the code above:

PROCEDURE ReverseWord(wd: INTEGER): INTEGER;
  VAR i: INTEGER; rwd, u: SET;
BEGIN
  u := SYSTEM.VAL(SET, wd);
  rwd := {}; i := 0;
  WHLE u # {} DO
    IF i IN u THEN INCL(rwd, 31-i) END;
    EXCL(u, i);
    INC(i)
  END
RETURN SYSTEM.VAL(INTEGER, rwd) END ReverseWord;

  This will terminate the loop as soon as there are no more bits in `u` to move to a new place. If your typical input data has MSB bits cleared, this may outweigh the cost of the additional EXCL operation in the loop body.

> `C' pseudo code gives this, if could be modified for array lookup, as
> per Joerg's suggestion on BYTE type reversal.
>
> unsigned int
> reverse(register unsigned int x)
> {
>     x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1));
>     x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2));
>     x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4));
>     x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8));
>     return((x >> 16) | (x << 16));
>
> }

  This code seems to have many benefits, I would give it a try if only to see the binary size compared to the looped versions.

---=====--- 
 Александр