[Oberon] ALU 2015 and 2018

Wed May 11 19:51:09 CEST 2022

Hellwig

I think you are not right. Registers.v uses the primitive RAM16X1D.
This Xilinx primitive does something different in the first half of the clock cycle than in the second half.
It is a dual-port RAM: the first port is used to read and write, the second port only to read.
In the first half it writes the result of the previous instruction, and in the second half it reads the registers for the current instruction.
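
To make the ordering concrete, here is a small Python sketch of that write-then-read behaviour. It is not the Verilog from Registers.v: the names DualPortRegFile and run_adds are made up for illustration, and it assumes the write takes effect at the clock edge while the reads are combinational during the rest of the cycle.

```python
class DualPortRegFile:
    """Toy model of a 16-entry register file (hypothetical, for illustration)."""

    def __init__(self, width=32):
        self.mask = (1 << width) - 1
        self.regs = [0] * 16

    def clock_edge(self, we, waddr, wdata):
        # Synchronous write: only happens at the clock edge when enabled.
        if we:
            self.regs[waddr & 0xF] = wdata & self.mask

    def read(self, addr):
        # Asynchronous read: returns whatever is stored right now.
        return self.regs[addr & 0xF]


def run_adds(program, rf):
    """Run a list of (dest, src_a, src_b) ADD instructions, one per cycle.

    Assumption: the result of instruction i is written back at the clock
    edge that starts cycle i+1, before instruction i+1 reads its operands.
    """
    pending = None  # (dest, value) produced in the previous cycle
    trace = []
    for dest, a, b in program:
        # Start of cycle: write back the previous instruction's result.
        if pending is not None:
            rf.clock_edge(True, *pending)
        # Rest of cycle: read operands for the current instruction and add.
        result = (rf.read(a) + rf.read(b)) & rf.mask
        trace.append(result)
        pending = (dest, result)
    # Final clock edge: write back the last result.
    if pending is not None:
        rf.clock_edge(True, *pending)
    return trace


rf = DualPortRegFile()
rf.regs[1] = 5
rf.regs[2] = 7
# r3 := r1 + r2, then r4 := r3 + r3 (the second instruction already sees the new r3)
print(run_adds([(3, 1, 2), (4, 3, 3)], rf))  # prints [12, 24]
```

In this model the second instruction needs no stall or forwarding, because the write of r3 happens before its operands are read.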

Br, Jörg

> On 11.05.2022 at 17:52, Hellwig Geisse <hellwig.geisse at mni.thm.de> wrote:
> 
> Hi Jörg,
> 
>> On Wed, 2022-05-11 at 16:27 +0200, Jörg wrote:
>> 
>> My understanding of the RISC5 architecture is that the „handshake“ between CPU and memory is done via
>> „adr“ and „codebus“.
>> During one cycle IR is stable, as it is held in a register.
>> When the cycle starts, decoding IR needs some combinational delay before the „adr“ lines
>> are stable (typically the first half of the cycle). Once stable, they are fed (through RISC5Top) to
>> the memory, and because the SRAM is fast enough, the „codebus“ (output from memory and input to RISC5)
>> is stable BEFORE the next cycle starts and clocks the next IR in.
>> So decoding and fetching happen in the same cycle, at least for „normal“ instructions other than LD/ST.
>> 
> 
> yes and no - you discuss the address-forming part correctly, but
> forgot the data path for "normal" arithmetic instructions (Fig 16.8
> of PO.Computer.pdf has an arrow pointing from IR to "decode" in
> the data path). So what you essentially say is: in every clock cycle
> a new instruction is fetched. I agree completely; the throughput is
> one instruction per clock cycle. But this is not the latency - the
> data path uses up another clock cycle to compute the result and write
> it into the destination register.
> 
> I stand by my statement that we have a two-stage pipeline for "normal"
> arithmetic instructions. Things get much fuzzier when we turn our attention
> to branches. Indeed, after writing my last mail in the discussion with Paul,
> a nagging question remained: why isn't there a pipeline flush when branching
> occurs? A proper two-stage pipeline would have fetched an instruction which
> should not be executed; it must be nulled in the pipeline ("flush" the pipe).
> A quick look into the sources reveals why (and you describe it in your
> statements above): the current instruction is allowed to modify the "next"
> address of the control unit. The pipeline is effectively shortened by one
> stage. This trick is normally abhorred in pipeline design, as it lengthens
> the cycle time noticeably (the address for the next instruction is changed
> only near the end of the cycle, so the time needed to decode the branch
> instruction is added to the memory access time for the next instruction:
> sum instead of maximum). And this price is paid for every instruction, not
> only for branches.
> 
> I think these findings (an instruction-dependent number of pipeline stages)
> explain the different standpoints in the foregoing discussion very well.
> 
> All: Thanks for your "food for thought"!
> 
> Best regards,
> Hellwig
> --
> Oberon at lists.inf.ethz.ch mailing list for ETH Oberon and related systems
> https://lists.inf.ethz.ch/mailman/listinfo/oberon