[Oberon] ALU 2015 and 2018

Tue May 10 11:40:11 CEST 2022

Wojtek,

On Mon, 2022-05-09 at 22:23 +0000, Skulski, Wojciech wrote:
> 
>   trying to somewhat understand RISC5, I looked at two implementations dated 25.9.2015 and
> 31.8.2018. Four questions:
> 
> 1. Is the latter implementation the most recent one? 
> 

Yes, as far as I know, although the site
https://people.inf.ethz.ch/wirth/news.txt
mentions "15.5.2019 - Floating-point rounding corrected" as the
latest change to the hardware.

> 2. The ALU implementation in the latter one looks very significantly different from the former.
> The former uses the instruction mnemonics like MOV, LSL, etc., as explained in RISC.pdf on page 9.
> The latter one does not use any mnemonics. It is a cascaded case statement (or a priority decoder,
> if you will) whose coding style looks mind boggling to me. Why is it so that the coding style was
> changed so significantly?
> 

Only NW can tell for sure, I suspect ;-)
It is perhaps dependent on the way Lola does its work, but
I have never looked into that.

> 3. Almost all the RISC5.v (either version) is coded combinatorially with "assign" statements.
> There is only one clocked "always block" at the end, where it is not even clear how these
> combinatorial paths get executed. For example, the ALU result "aluRes" is never assigned to any
> register, as one would expect in a register-based FPGA design. In the RISC5 design, instruction
> execution looks almost like a byproduct. How was this design motivated, when it is generally
> believed in the FPGA literature that registers are the most fundamental building blocks of any FPGA
> firmware?
> 

There are many ways to describe a synchronous circuit in Verilog.
In the end, all "always" blocks with identical clock specifications
(e.g., posedge clk) are combined into a single block, and the
non-blocking assignments within that block are carried out concurrently
on exactly this clock edge. So you have the freedom to group these
blocks as you see fit. NW's style is simply at one extreme end of the
spectrum.
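
To make the point about grouping concrete, here is a small toy model in
Python (not Verilog, and not taken from RISC5.v). It assumes two
hypothetical registers "a" and "b" and models each "always @(posedge clk)"
block as a function that reads the old register values and returns the
updates it schedules. All updates are committed together, which is the
essence of non-blocking assignment:

```python
# Toy model of Verilog non-blocking assignments on one clock edge.
# Register names "a" and "b" are hypothetical, chosen for illustration.

def block_a(regs):
    # models: always @(posedge clk) a <= b;
    return {"a": regs["b"]}

def block_b(regs):
    # models: always @(posedge clk) b <= a;
    return {"b": regs["a"]}

def combined_block(regs):
    # models: always @(posedge clk) begin a <= b; b <= a; end
    return {"a": regs["b"], "b": regs["a"]}

def clock_edge(regs, blocks):
    """Evaluate every block on the OLD values, then commit all updates at once."""
    scheduled = {}
    for block in blocks:
        scheduled.update(block(regs))  # right-hand sides only see old values
    new_regs = dict(regs)
    new_regs.update(scheduled)
    return new_regs

start = {"a": 1, "b": 2}
split = clock_edge(start, [block_a, block_b])   # two separate blocks
merged = clock_edge(start, [combined_block])    # one combined block
print(split, merged)
```

Both variants swap the two registers in the same way, so how the
assignments are distributed over blocks is a matter of style only.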

Regarding "aluRes": In RISC5.v you can see that aluRes is one input of
the multiplexer "regmux" (the other inputs being "inbus" in case of a
load, and the return address {8'b0, nxpc, 2'b0} in case of a call).
This is the data multiplexer for the register file. "regmux" is
connected to the "din" input of the register array (described in file
"Registers.v"), and is clocked into the destination register there.

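The data path just described can be sketched as a behavioral model in
Python. This is only an illustration under assumptions, not a
transcription of RISC5.v: the select flags "is_load" and "is_call", the
priority between them, the 22-bit width of nxpc (so that the
concatenation is 32 bits wide), and the register numbers used are all
assumed here.

```python
MASK32 = 0xFFFFFFFF

def return_address(nxpc):
    """Model of {8'b0, nxpc, 2'b0}, assuming nxpc is a 22-bit word address."""
    return (nxpc & 0x3FFFFF) << 2

def regmux(is_load, is_call, inbus, nxpc, alu_res):
    """Data multiplexer in front of the register file (select flags are assumed)."""
    if is_load:
        return inbus & MASK32
    if is_call:
        return return_address(nxpc)
    return alu_res & MASK32      # default: the ALU result

class RegisterFile:
    """The only place where the multiplexer output is actually stored."""
    def __init__(self, count=16):
        self.regs = [0] * count

    def clock_edge(self, write_enable, dst, din):
        # "din" is latched into the destination register on the clock edge
        if write_enable:
            self.regs[dst] = din

rf = RegisterFile()
rf.clock_edge(True, 3, regmux(False, False, inbus=0, nxpc=0, alu_res=42))
rf.clock_edge(True, 15, regmux(False, True, inbus=0, nxpc=5, alu_res=42))
print(rf.regs[3], rf.regs[15])
```

In this model the ALU result never gets a register of its own. It is
purely combinatorial until the register file latches it, which matches
the observation in the question.
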
> 4. Looking at the "assign" equations (especially the ALU) I suspect that they created long
> combinatorial paths which slow down the CPU operation. Is it true? Can the CPU run faster if it
> was explicitly coded with registers? Or is the Xilinx compiler smart enough to automatically infer the
> registers by itself? 
> 

Looking at the equations is not enough to decide whether long combinatorial
paths will be synthesized: modern hardware synthesizers do an impressive
job of optimizing combinatorial circuits. But you can inspect the result
after synthesis, either as a circuit diagram or, more easily, as a timing report.
The maximum attainable frequency is determined by the longest combinatorial
path between (clocked) registers.
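
As a rough illustration of that relation, the following Python sketch
computes the frequency limit from a set of register-to-register path
delays. The path names and delay values are invented for the example
and do not come from any RISC5 timing report. Each delay is assumed to
already include clock-to-output and setup time, and clock skew is
ignored.

```python
def fmax_mhz(path_delays_ns):
    """Maximum clock frequency in MHz, limited by the slowest path."""
    critical_ns = max(path_delays_ns.values())
    return 1000.0 / critical_ns   # a period of 1 ns corresponds to 1000 MHz

# Hypothetical register-to-register delays in nanoseconds.
paths = {
    "decode_to_pc": 12.5,
    "regfile_to_alu_to_regfile": 25.0,
    "adr_to_memory": 20.0,
}

critical = max(paths, key=paths.get)
print(critical, fmax_mhz(paths))
```

Shortening any path other than the critical one does not raise the
limit, which is why the timing report is the place to look.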

Normally, synthesizers don't insert registers by themselves (caveat: I haven't
checked Vivado for a long time). They may move combinatorial logic across registers
(retiming), if allowed to do so, without changing the behavior of the circuit (other
than improving the timing). But inserting registers freely would destroy the behavior
of, e.g., a balanced pipeline.
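
Why an extra register changes the behavior can be seen in a small
Python model. It assumes a hypothetical circuit that adds two input
streams "a" and "b" which arrive in the same cycle. The flag
"extra_reg_on_a" puts one additional register (reset value 0) into the
"a" path only:

```python
def run(a_stream, b_stream, extra_reg_on_a=False):
    """Cycle-by-cycle model of an adder fed by two aligned input streams."""
    a_reg = 0                      # the extra register, reset to 0
    outputs = []
    for a, b in zip(a_stream, b_stream):
        a_seen = a_reg if extra_reg_on_a else a
        outputs.append(a_seen + b)
        a_reg = a                  # the register picks up "a" on the clock edge
    return outputs

balanced = run([1, 2, 3], [10, 20, 30])
unbalanced = run([1, 2, 3], [10, 20, 30], extra_reg_on_a=True)
print(balanced, unbalanced)
```

With the extra register, each "a" value meets the "b" value of the
following cycle, so the sums are wrong even though the adder itself is
unchanged. A tool could only do this safely by delaying all
converging paths equally, which changes the latency of the design.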

NW's design is a two-stage pipeline, as he explains in his documents (at least
indirectly: "In the first cycle the address is computed and the data are fetched
or stored. In the second cycle, the next instruction is fetched"). It exhibits
rather long delays indeed, and could possibly be sped up by dividing the pipeline
into more stages. I once thought about doing that, but then one would want to
integrate caches too.

Hellwig