[Oberon] Wrong results of selected REAL multiplications

Mon Oct 16 06:58:41 CEST 2023

Hi

Here the corrected version. I splitted “round and normalize” into two steps.
Introduced a new variable z1 for the rounding phase.

PROCEDURE FML*(x, y: REAL; VAR z: REAL); (* floating multiply *)
                VAR
                               S: INTEGER; (* state *)
                               P0, P24: INTEGER; (* 48 bit product: 0..23 and 24..47 *)
                               sign: BOOLEAN;
                               xe, ye: INTEGER; (* 8 bits *)
                               e0, e1: INTEGER; (* 9 bits *)
                               z0: INTEGER; (* 23 bits *)
                               w0: INTEGER; (* 24 bits *)
                               w1: INTEGER; (* 25 bits *)
                               z1: INTEGER; (* 26 bits *)
                BEGIN
                               sign := Bit(ORD(x), 31, 31) # Bit(ORD(y), 31, 31);
                               xe := Bit(ORD(x), 30, 23);
                               ye := Bit(ORD(y), 30, 23);
                               Out.printf("xe = %d, ye = %d\n", xe, ye, "");

                               e0 := xe + ye;
                               P24 := 0; P0 := 800000H + Bit(ORD(x), 22, 0);
                               FOR S := 0 TO 23 DO (* look at all bits in p0 aka x *)
                                               IF ODD(P0) THEN w0 := 800000H + Bit(ORD(y), 22, 0) ELSE w0 := 0 END;
                                               w1 := P24 + w0; (* 25 bit result *)
                                               (* P <= {w1, P[23:1]} *)
                                               P0 := Bit(P0, 23, 1); (* this is P0 := P0 DIV 2 *)
                                               IF ODD(w1) THEN P0 := P0 + 800000H END;
                                               P24 := w1 DIV 2
                               END;
                               Out.printf("product P =%x%x\n", P24, P0, "");
                               z1 := P24*4 + Bit(P0, 23, 22) + 1 + Bit(P24, 23, 23); (* round *)
                               IF Bit(z1, 25, 25) = 1 THEN (* normalize *)
                                               z0 := Bit(z1, 24, 2)
                               ELSE
                                               z0 := Bit(z1, 23, 1)
                               END;
                               e1 := e0 - 127 + Bit(z1, 25, 25); (* P[47],  should be z1[25]*)
                               Out.printf("z0 = %x, e1 = %d\n", z0, e1, "");
                               IF (xe = 0) OR (ye = 0) THEN z := 0R
                               ELSIF Bit(e1, 8, 8) = 0 THEN z := SYSTEM.VAL(REAL, Bit(e1, 7, 0)* 800000H + z0)
                               ELSIF Bit(e1, 7, 7) = 0 THEN z := SYSTEM.VAL(REAL, 7F800000H+ z0)
                               ELSE z := 0R
                               END;
                               IF sign THEN z := -z END
                END FML;

This is in Verlog:
`timescale 1ns / 1ps  // NW 15.9.2015  8.8.2016 jr/16.10.2023
module FPMultiplier(
  input clk, run,
  input [31:0] x, y,
  output stall,
  output [31:0] z);

reg [4:0] S;  // state
reg [47:0] P; // product

wire sign;
wire [7:0] xe, ye;
wire [8:0] e0, e1;
wire [22:0] z0;
wire [23:0] w0;
wire [24:0] w1;
wire [25:0] z1;

assign sign = x[31] ^ y[31];
assign xe = x[30:23];
assign ye = y[30:23];
assign e0 = xe + ye;
assign e1 = e0 - 127 + z1[25];

assign stall = run & ~(S == 25);
assign w0 = P[0] ? {1'b1, y[22:0]} : 0;
assign w1 = {1'b0, P[47:24]} + {1'b0, w0};
assign z1 = P[47:22]+1+P[47];  // round
assign z0 = z1[25] ? z1[24:2] : z1[23:1];  // normalize
assign z = (xe == 0) | (ye == 0) ? 0 :
   (~e1[8]) ? {sign, e1[7:0], z0} :
   (~e1[7]) ? {sign, 8'b11111111, z0} : 0;
always @ (posedge(clk)) begin
    P <= (S == 0) ? {24'b0, 1'b1, x[22:0]} : {w1, P[23:1]};
    S <= run ? S+1 : 0;
end
endmodule

Jörg

Von: Oberon <oberon-bounces at lists.inf.ethz.ch> im Auftrag von joerg.straube at iaeth.ch <joerg.straube at iaeth.ch>
Datum: Montag, 16. Oktober 2023 um 00:21
An: ETH Oberon and related systems <oberon at lists.inf.ethz.ch>
Betreff: Re: [Oberon] Wrong results of selected REAL multiplications
Hi

I ported the Verilog code of the floating multiplication to Oberon-07.
Some of the Oberon code could be written more elegantly in Oberon, but for the sake of comparing 1:1, I let the code as it is in Verlog.

PROCEDURE Bit(x: INTEGER; hi, lo: INTEGER): INTEGER;
                RETURN ROR(x, lo) MOD LSL(2, hi-lo)
                END Bit;

PROCEDURE FML*(x, y: REAL; VAR z: REAL); (* floating multiply *)
                VAR
                               S: INTEGER; (* state *)
                               P0, P24: INTEGER; (* 48 bit product: 0..23 and 24..47 *)
                               sign: BOOLEAN;
                               xe, ye: INTEGER; (* 8 bits *)
                               e0, e1: INTEGER; (* 9 bits *)
                               w0: INTEGER; (* 24 bits *)
                               w1, z0: INTEGER; (* 25 bits *)
                BEGIN
                               sign := Bit(ORD(x), 31, 31) # Bit(ORD(y), 31, 31);
                               xe := Bit(ORD(x), 30, 23);
                               ye := Bit(ORD(y), 30, 23);
                               Out.printf("xe = %d, ye = %d\n", xe, ye, "");

                               e0 := xe + ye;
                               P24 := 0; P0 := 800000H + Bit(ORD(x), 22, 0);
                               FOR S := 0 TO 23 DO (* look at all bits in p0 aka x *)
                                               IF ODD(P0) THEN w0 := 800000H + Bit(ORD(y), 22, 0) ELSE w0 := 0 END;
                                               w1 := P24 + w0; (* 25 bit result *)
                                               (* P <= {w1, P[23:1]} *)
                                               P0 := Bit(P0, 23, 1); (* this is P0 := P0 DIV 2 *)
                                               IF ODD(w1) THEN P0 := P0 + 800000H END;
                                               P24 := w1 DIV 2
                               END;
                               Out.printf("product P =%x%x\n", P24, P0, "");
                               IF Bit(P24, 23, 23) = 1 THEN
                                               z0 := P24*2 + Bit(P0, 23, 23) + 1
                               ELSE
                                               z0 := P24*4 + Bit(P0, 23, 22) + 1 (* e1 is not adjusted if rounding sets the highest bit!! *)
                               END;
                               e1 := e0 - 127 + Bit(P24, 23, 23); (* P[47] *)
                               Out.printf("z0 = %x, e1 = %d\n", z0, e1, "");
                               IF (xe = 0) OR (ye = 0) THEN z := 0R
                               ELSIF Bit(e1, 8, 8) = 0 THEN z := SYSTEM.VAL(REAL, Bit(e1, 7, 0)* 800000H + Bit(z0, 23, 1))
                               ELSIF Bit(e1, 7, 7) = 0 THEN z := SYSTEM.VAL(REAL, 7F800000H+ Bit(z0, 23, 1))
                               ELSE z := 0R
                               END;
                               IF sign THEN z := -z END
                END FML;

If the highest bit of the 48 bit product is set, we have to adjust the final exponent e1 by one.
There are TWO possibilities how the highest bit of the product P is set

  1.  Either the values of x and y are big enough, so their products sets the highest bit.

This case is taken care of by assign e1 = e0 - 127 + P[47];

Or in Oberon e1 := e0 - 127 + Bit(P24, 23, 23);

  1.  During the final rounding step from P to z0 a carry might flow over to the highest bit
If this is the case e1 is not adjusted. So basically this step is wrong

        assign z0 = P[47] ? P[47:23]+1 : P[46:22]+1;  // round and normalize

So far the analysis. I’ll try to come up with a solution.

br
Jörg

Von: Oberon <oberon-bounces at lists.inf.ethz.ch> im Auftrag von Michael Schierl <schierlm at gmx.de>
Datum: Freitag, 13. Oktober 2023 um 23:55
An: oberon at lists.inf.ethz.ch <oberon at lists.inf.ethz.ch>
Betreff: Re: [Oberon] Wrong results of selected REAL multiplications
Hello,

Am 13.10.2023 um 18:39 schrieb Hans Klaver:

>> The problem does not occur on a April 2016 verison of Peter's
>> emulator. This leads me to suspect that the problem may have been
>> introduced during the changes to the Floating Point-related Verilog
>> files that were made between 8 Aug 2016 and 3 Oct 2016 as
>> summarised here:
>>
>> https://people.inf.ethz.ch/wirth/news.txt

For the record, the changes talked about are in this diff:
<https://github.com/Spirit-of-Oberon/wirth-personal/commit/359240b9d1c63d23aff6c90aa90fdd46917c743b>.

> Thanks for checking this.
>
> As I don't know much about Verilog programming I am afraid that I
> can't be of much help to pinpoint this bug and propose a solution.

I don't think having experience about Verilog programming really helps
me much in understanding this very dense code of parallel bit-twiddling
(and a significant part of the twiddling and its state machine is for
doing the "integer" multiplication of the mantissae).

> Hoping others in the community are able and willing to look into
> this.

Just starting from the facts you stated (sometimes, when multiplying a
value with mantissa # 1.0 with a value with mantissa # 1.0 and the
result has a mantissa = 1.0, the exponent is off by one), and when
looking at the changes to FPMultiplier in the commit above, I have a
suspicion ...

Just picking a few lines of the assignments from the new version (I
slightly reordered them for the argument, but that does not matter as
all those assignments happen in parallel anyway):

reg [47:0] P; // product
wire [24:0] z0;

assign e1 = e0 - 127 + P[47];

assign z0 = P[47] ? P[47:23]+1 : P[46:22]+1;  // round and normalize

assign z = (xe == 0) | (ye == 0) ? 0 :
    (~e1[8]) ? {sign, e1[7:0], z0[23:1]} :
    (~e1[7]) ? {sign, 8'b11111111, z0[23:1]} : 0;

Or condensed to the part I'd like to point out

assign e1 = <something> + P[47];

assign z0 = P[47] ? P[47:23]+1 : P[46:22]+1;  // round and normalize

assign z = <some corner cases omitted, but the main case is>
            {sign, e1[7:0], z0[23:1]};

z is our final output, and in the final step of the state machine it
gets set by using e1 as exponent and the "middle" of z0 as mantissa
(while z0 is a 25-bit value, we omit the first and last bit of it. First
bit has to be 1 and is omitted in IEEE float representation, and last
bit was there for better rounding).

Both exponent and mantissa have special logic for handing the most
significant bit of the intermediate product, P[47], to be 1. In that
case we increase the exponent by 1, while at the same time shifting the
mantissa one more bit. That way, even if P[47] is not 1, P[46] is 1, so
we end up with a normalized mantissa.

Or at least, that is what you might think at first point.

Now consider the case, where (the relevant part of) P has the value of

0111111111111111111111111

The most significant bit is 0. Yet for the actual computation of the
mantissa, we add 1 after shifting (this bit is cut off afterwards and as
the comment suggests, this is to improve rounding of results), resulting
in a mantissa of all zeros (as the top bit is cut off as well), instead
of a mantissa of 1000000000000000000000000. The leading bit does not
matter, though, as we will discard it anyway, assuming it to be a 1.

But what does matter: For the computation of the exponent we don't take
care of the overflow of the addition and do not add an extra 1.

An easy fix would be to remove the rounding (the two +1), but then your
result would be 437FFFFFH instead of 43800000H (still a lot better than
43000000H, though). But a correct fix that keeps the rounding does not
seem to be trivial.

In the emulator source, I could easily fix this by adding an additional
if statement and just do the addition again before comparing the most
significant bit for incrementing the exponent, but I don't feel able to
do this in Verilog (while still keeping the whole circuit being fast),
especially since I don't have a way to test it on real hardware. Maybe
somebody else wants to give it a stab?

So either, change the computation of P in the previous states to add 1
there (while for the best rounding you would have to add 2 in case the
value is shifted one bit later). Or account for the extra 1 exponent in
case of overflow.

As a compromise, one could also just omit the rounding in case the value
before rounding but after normalization (or at least a significant part
of its digits) happens to be all ones; in that case, the rounding would
still happen in most cases but not in the one that currently triggers
the bug.

Or did I miss anything obvious? :)

Regards,

Michael
--
Oberon at lists.inf.ethz.ch mailing list for ETH Oberon and related systems
https://lists.inf.ethz.ch/mailman/listinfo/oberon
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.inf.ethz.ch/pipermail/oberon/attachments/20231016/fc3872c3/attachment-0001.html>