[Oberon] Fast version of Oberon RISC5 for Pepino

Magnus Karlsson magnus at saanlima.com
Tue Mar 1 01:00:44 CET 2016

The current RISC5 Verilog code does not take advantage of the fact that 
both Spartan3 and Spartan6 have hardware multipliers. Instead it does 
the multiply with 16 additions, which causes the CPU to stall for 
16 clocks on each multiplication.

As a test I rewrote the multiplier code so that the built-in hardware 
multipliers are used instead, and with this code the CPU is only stalled 
for 1 clock instead of 16 clocks.  The code uses Verilog-2001 syntax 
to specify a signed and an unsigned multiplier, and the u input signal 
determines whether the unsigned or the signed result is used.  In my 
opinion this makes the code easier to understand than the current 
adder-based version.

Here is the code:

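`timescale 1ns / 1ps   // MK 29.2.2016

module Multiplier(
   input clk, run, u,
   output stall,
   input [31:0] x, y,
   output [63:0] z);

wire [63:0] z_signed, z_unsigned;
reg [63:0] P;   // registered product
reg S;          // run delayed by one clock

assign z = P;
assign stall = run & ~S;   // high only during the first clock of a multiply

// Both products are computed in parallel; u selects which one is registered.
// The instance names ms and mu are arbitrary.
mult_signed ms (.x(x), .y(y), .z(z_signed));
mult_unsigned mu (.x(x), .y(y), .z(z_unsigned));

always @ (posedge(clk)) begin
   P <= u ? z_unsigned : z_signed;
   S <= run;
end

endmodule


module mult_signed (
   input signed [31:0] x,
   input signed [31:0] y,
   output signed [63:0] z);

assign z = x * y;   // signed 32 x 32 -> 64 bit product

endmodule


module mult_unsigned (
   input [31:0] x,
   input [31:0] y,
   output [63:0] z);

assign z = x * y;   // unsigned 32 x 32 -> 64 bit product

endmodule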

This version of the code has been successfully tested on the Pepino board.
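
For anyone who wants to check the module in a simulator before loading 
it on a board, here is a minimal testbench sketch.  It is not part of 
the Pepino project: the names Multiplier_tb and check are made up for 
this example, the 100 MHz clock is arbitrary, and it assumes the three 
modules above are compiled together with it.  It only checks that stall 
is high for one clock and that the signed and unsigned products come out 
as expected for a few hand-picked operands.

```verilog
`timescale 1ns / 1ps

// Simulation-only sketch. Multiplier_tb and check are hypothetical names,
// not part of the Pepino project files.
module Multiplier_tb;

reg clk = 0, run = 0, u = 0;
reg [31:0] x = 0, y = 0;
wire stall;
wire [63:0] z;
integer errors = 0;

Multiplier dut (.clk(clk), .run(run), .u(u), .stall(stall),
                .x(x), .y(y), .z(z));

always #5 clk = ~clk;   // arbitrary 100 MHz simulation clock

// Start one multiplication, check that stall is high during the first
// clock, then check that the result is valid and stall is low one clock later.
task check(input uu, input [31:0] xx, input [31:0] yy,
           input [63:0] expected);
   begin
      @(negedge clk);
      u = uu; x = xx; y = yy; run = 1;
      #1;
      if (stall !== 1'b1) begin
         errors = errors + 1;
         $display("FAIL: stall not asserted in the first clock");
      end
      @(negedge clk);   // exactly one rising edge has passed since run went high
      if (stall !== 1'b0 || z !== expected) begin
         errors = errors + 1;
         $display("FAIL: u=%b x=%h y=%h z=%h expected=%h stall=%b",
                  uu, xx, yy, z, expected, stall);
      end
      run = 0;
   end
endtask

initial begin
   repeat (2) @(posedge clk);   // let S settle to 0 while run is low
   check(1'b1, 32'd7,        32'd6,        64'd42);                // unsigned 7 * 6
   check(1'b1, 32'hFFFFFFFF, 32'd2,        64'h00000001_FFFFFFFE); // unsigned (2^32 - 1) * 2
   check(1'b0, 32'hFFFFFFFF, 32'd2,        64'hFFFFFFFF_FFFFFFFE); // signed -1 * 2 = -2
   check(1'b0, 32'hFFFFFFFD, 32'hFFFFFFFB, 64'd15);                // signed -3 * -5 = 15
   if (errors == 0) $display("all checks passed");
   else $display("%0d check(s) failed", errors);
   $finish;
end

endmodule
```

The second and third cases use the same bit patterns for x and y and 
differ only in u, which is the case where the signed and unsigned 
products disagree in the upper 32 bits.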


On 2/26/2016 9:59 AM, Magnus Karlsson wrote:
> One outcome of the discussion about the Oberon RISC5 verilog code is 
> that I did a deeper study about the clock limits for the project and 
> found that the RISC5 CPU in itself can be clocked at up to about 66 
> MHz but the external SRAM path is too slow for that speed (read is the 
> problem).  The asynchronous nature of the SRAM interface makes it hard 
> to constrain the ISE compiler to work hard on this path.
> I did trace the SRAM read data path and found that it takes about 10 
> ns from the SRAM data input pins to the Z register bit (this is the 
> longest path).  The address output path is about 5 ns and with a 10 ns 
> SRAM access time the fastest system clock cycle should be around 25 ns.
> To test this out I created a version of the code that runs the CPU at 
> 37.5 MHz (26.666 ns) instead of 25 MHz, i.e. the CPU is running at 1/2 
> the video clock rate instead of 1/3, and it seems to run fine on both 
> the LX9 and the LX25 version of Pepino.  All timing constants (UART 
> Rx, UART Tx, SPI and millisecond timer) have been changed to reflect 
> the 50% faster clock rate.
> If anyone wants to try it, the project (including bit files) is 
> available here:
> https://github.com/Saanlima/Pepino/tree/master/Projects/RISC5Verilog_Pepino_fast 
> Cheers,
> Magnus
> -- 
> Oberon at lists.inf.ethz.ch mailing list for ETH Oberon and related systems
> https://lists.inf.ethz.ch/mailman/listinfo/oberon
