[Oberon] ALU 2015 and 2018

Wed May 11 17:52:08 CEST 2022

Hi Jörg,

On Mi, 2022-05-11 at 16:27 +0200, Jörg wrote:
> 
> My understanding of the RISC5 architecture is that the „handshake“ of CPU and memory is done via
> „adr“ and „codebus“
> During one cycle IR is stable as it is in a register.
> When the cycle starts, the decoding of IR needs some combinational delay to get the „adr“ lines
> stable (typically the first half of the cycle). They got stable and are fed (through RISC5Top) to
> the memory and because SRAM is fast enough the „codebus“ (output from memory and input to RISC5)
> got stable BEFORE the next cycle starts and clocks the next IR in.
> So, decoding and fetching are in the same cycle. I mean for „normal“ instruction not being LD/ST.
> 

Yes and no - you discuss the address-forming part correctly, but
forgot the data path for "normal" arithmetic instructions (Fig 16.8
of PO.Computer.pdf has an arrow pointing from IR to "decode" in
the data path). So what you essentially say is: in every clock cycle
a new instruction is fetched. I agree completely; the throughput is
one instruction per clock cycle. But this is not the latency - the
data path uses up another clock cycle to compute the result and write
it into the destination register.
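
To illustrate the difference between throughput and latency, here is a
small Python sketch of an idealized two-stage pipeline. It is only a model
of the argument above, not of the actual RISC5 Verilog; the function names
and the cycle numbering are my own assumptions.

```python
def schedule(n_instructions):
    """Return (fetch_cycle, writeback_cycle) for each instruction.

    Assumed model: one instruction is fetched per clock cycle, and its
    result is written to the destination register in the following cycle.
    """
    return [(i, i + 1) for i in range(n_instructions)]


def latency_in_cycles(fetch_cycle, writeback_cycle):
    # Cycles occupied by one instruction, counting both the fetch cycle
    # and the write-back cycle.
    return writeback_cycle - fetch_cycle + 1


def total_cycles(n_instructions):
    # Cycles until the last result has been written.
    if n_instructions == 0:
        return 0
    return schedule(n_instructions)[-1][1] + 1


if __name__ == "__main__":
    for fetch, writeback in schedule(4):
        print(f"fetched in cycle {fetch}, result written in cycle {writeback}")
    print("cycles for 4 instructions:", total_cycles(4))
```

In this model each instruction occupies two cycles (the latency), yet n
instructions finish in n + 1 cycles, so the throughput approaches one
instruction per clock cycle.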

I stand by my statement that we have a two-stage pipeline for "normal"
arithmetic instructions. Things get much fuzzier when we turn our attention
to branches. Indeed, after writing my last mail in the discussion with Paul,
a nagging question remained: why isn't there a pipeline flush when branching
occurs? A proper two-stage pipeline would have fetched an instruction which
should not be executed; it must be nulled in the pipeline ("flush" the pipe).
A quick look into the sources reveals why (and you describe it in your
statements above): the current instruction is allowed to modify the "next"
address of the control unit. The pipeline is effectively shortened by one
stage. This trick is normally abhorred in pipeline design, as it lengthens
the cycle time noticeably: the address for the next instruction is changed
only near the end of the cycle, so the time needed for decoding the branch
instruction is added to the memory access time for the next instruction
(sum instead of maximum). And this price is paid for every instruction - not
only for branches.
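
The "sum instead of maximum" argument can be sketched in a few lines of
Python. The two delay parameters and the example values are purely
hypothetical; they are not measurements of RISC5 or of any particular FPGA.

```python
def cycle_time_separate_stages(t_decode_ns, t_memory_ns):
    """Minimum cycle time when decode and memory access sit in separate
    pipeline stages: the slower of the two determines the clock."""
    return max(t_decode_ns, t_memory_ns)


def cycle_time_same_cycle_redirect(t_decode_ns, t_memory_ns):
    """Minimum cycle time when the current instruction may still change
    the next address: decode and memory access lie on one path, so their
    delays add up."""
    return t_decode_ns + t_memory_ns


if __name__ == "__main__":
    # Invented example delays in nanoseconds.
    t_decode, t_memory = 10, 12
    print("separate stages:    ", cycle_time_separate_stages(t_decode, t_memory), "ns")
    print("same-cycle redirect:", cycle_time_same_cycle_redirect(t_decode, t_memory), "ns")
```

With these made-up numbers the clock period grows from 12 ns to 22 ns, and
the longer period applies to every instruction, whether it branches or not.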

I think these findings (an instruction-dependent number of pipeline stages)
explain the different standpoints in the foregoing discussion very well.

All: Thanks for your "food for thought"!

Best regards,
Hellwig