[Oberon] Fast version of Oberon RISC5 for Pepino
Magnus Karlsson
magnus at saanlima.com
Tue Mar 1 01:16:57 CET 2016
On 2/29/2016 4:00 PM, Magnus Karlsson wrote:
Sorry, I meant 32 clocks for the multiply, not 16 clocks.
Magnus
> The current RISC5 Verilog code does not take advantage of the fact that
> both Spartan3 and Spartan6 have hardware multipliers but instead does
> the multiply by doing 32 additions, which causes the CPU to stall
> for 32 clocks for each multiplication.
>
> As a test I rewrote the multiplier code so that the built-in hardware
> multipliers are used instead, and with this code the CPU is only
> stalled for 1 clock instead of 32 clocks. The code uses the
> Verilog-2001 syntax to specify a signed and an unsigned multiplier, and
> the u input signal determines whether the unsigned or signed result is
> used. In my opinion this makes the code easier to understand than
> the current adder-based code.
>
> Here is the code:
>
> `timescale 1ns / 1ps // MK 29.2.2016
>
> module Multiplier(
> input clk, run, u,
> output stall,
> input [31:0] x, y,
> output [63:0] z);
>
> wire [63:0] z_signed, z_unsigned;
> reg [63:0] P;
> reg S;
>
> assign z = P;
> assign stall = run & ~S;
>
> mult_signed mult_s (.x(x), .y(y), .z(z_signed)); // instance name added; the Verilog standard requires one
> mult_unsigned mult_u (.x(x), .y(y), .z(z_unsigned));
>
> always @ (posedge(clk)) begin
> P <= u ? z_unsigned : z_signed;
> S <= run;
> end
>
> endmodule
>
> module mult_signed (
> input signed [31:0] x,
> input signed [31:0] y,
> output signed [63:0] z);
>
> assign z = x * y;
>
> endmodule
>
> module mult_unsigned (
> input [31:0] x,
> input [31:0] y,
> output [63:0] z);
>
> assign z = x * y;
>
> endmodule
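
The two helper modules above differ only in the `signed` keyword on their ports, so it may help to see what that keyword changes. The following is a minimal, self-contained testbench sketch, not part of the Pepino project; the module name `mul_sign_demo` and the test operands are made up for illustration. It multiplies the same 32-bit pattern once as unsigned and once as signed.

```verilog
`timescale 1ns / 1ps

// Illustrative testbench (hypothetical, not from the Pepino project):
// the same 32-bit operands give different 64-bit products depending on
// whether they are treated as signed or unsigned.
module mul_sign_demo;
  reg [31:0] x, y;

  // Unsigned operands are zero-extended to 64 bits before the multiply.
  wire [63:0] z_unsigned = x * y;
  // Signed operands are sign-extended to 64 bits before the multiply.
  wire signed [63:0] z_signed = $signed(x) * $signed(y);

  initial begin
    x = 32'hFFFF_FFFF;  // 4294967295 as unsigned, -1 as signed
    y = 32'd2;
    #1;                 // let the continuous assignments settle
    $display("unsigned: %h", z_unsigned);  // 00000001fffffffe
    $display("signed:   %h", z_signed);    // fffffffffffffffe
    $finish;
  end
endmodule
```

The upper 32 bits of the two results differ, which is why the `u` input has to select between two separate products rather than reinterpreting a single one.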
>
>
> This version of the code has been successfully tested on the Pepino board.
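
The one-clock stall comes from the `stall = run & ~S` and `S <= run` pair in the Multiplier module. The sketch below reproduces only that handshake in a standalone testbench and counts the stalled clock edges. The module name `stall_demo`, the 10 ns clock period and the way `run` is driven are assumptions made for illustration; the real CPU drives `run` from its instruction decoder.

```verilog
`timescale 1ns / 1ps

// Illustrative testbench (hypothetical): models only the stall handshake
// of the Multiplier module, i.e. stall = run & ~S with S <= run.
module stall_demo;
  reg clk = 0, run = 0, S = 0;
  wire stall = run & ~S;
  integer stall_cycles = 0;

  always #5 clk = ~clk;           // 10 ns clock period, chosen arbitrarily

  always @(posedge clk) S <= run; // same register update as in Multiplier

  // Count clock edges at which the CPU would be held by stall.
  always @(posedge clk)
    if (stall) stall_cycles = stall_cycles + 1;

  initial begin
    @(negedge clk) run = 1;       // a multiply instruction starts
    @(negedge clk);               // first cycle: stall was high at the clock edge
    @(negedge clk) run = 0;       // second cycle: S is set, so stall was low
    @(negedge clk);
    $display("stalled for %0d clock(s)", stall_cycles);  // stalled for 1 clock(s)
    $finish;
  end
endmodule
```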
>
> Cheers,
> Magnus
>
>
>
> On 2/26/2016 9:59 AM, Magnus Karlsson wrote:
>> One outcome of the discussion about the Oberon RISC5 Verilog code is
>> that I took a closer look at the clock limits for the project and
>> found that the RISC5 CPU in itself can be clocked at up to about 66
>> MHz but the external SRAM path is too slow for that speed (read is
>> the problem). The asynchronous nature of the SRAM interface makes it
>> hard to constrain the ISE compiler to work hard on this path.
>>
>> I traced the SRAM read data path and found that it takes about 10
>> ns from the SRAM data input pins to the Z register bit (this is the
>> longest path). The address output path is about 5 ns, and with a 10
>> ns SRAM access time the fastest system clock cycle should be around
>> 25 ns.
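
Written out as a sum, the read-cycle budget in the paragraph above looks roughly as follows. The symbol names are made up for illustration; the three delays are the approximate figures quoted in the text, and the 40 MHz figure is simply the reciprocal of the resulting cycle time.

```latex
\begin{align*}
t_{\text{cycle,min}} &\approx t_{\text{addr out}} + t_{\text{SRAM access}} + t_{\text{data in}} \\
                     &\approx 5\,\text{ns} + 10\,\text{ns} + 10\,\text{ns} = 25\,\text{ns}, \\
f_{\max}             &\approx \frac{1}{25\,\text{ns}} = 40\,\text{MHz}
\end{align*}
```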
>>
>> To test this out I created a version of the code that runs the CPU at
>> 37.5 MHz (26.667 ns) instead of 25 MHz, i.e. the CPU is running at 1/2
>> the video clock rate instead of 1/3, and it seems to run fine on
>> both the LX9 and the LX25 version of Pepino. All timing constants
>> (UART Rx, UART Tx, SPI and millisecond timer) have been changed to
>> reflect the 50% faster clock rate.
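
To illustrate how such timing constants scale with the clock, here is a small standalone sketch that derives them from the clock frequency. It is not taken from the project: the module and parameter names are hypothetical, and the 115200 baud rate is an assumption chosen only to make the arithmetic concrete.

```verilog
`timescale 1ns / 1ps

// Illustrative sketch (hypothetical names): derive clock-dependent
// constants from the CPU clock frequency instead of hard-coding them.
module clk_constants_demo;
  parameter CLK_HZ = 37_500_000;         // CPU clock after the change
  parameter BAUD   = 115200;             // assumed baud rate, illustration only

  localparam MS_TICKS  = CLK_HZ / 1000;  // clocks per millisecond
  localparam BIT_TICKS = CLK_HZ / BAUD;  // clocks per UART bit (integer division truncates)

  initial begin
    $display("clocks per ms:       %0d", MS_TICKS);   // 37500
    $display("clocks per UART bit: %0d", BIT_TICKS);  // 325
    $finish;
  end
endmodule
```

With `CLK_HZ` set to 25_000_000 the same expressions give 25000 and 217, so each constant grows by the same 50% as the clock rate (up to integer truncation).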
>>
>> If anyone wants to try it, the project (including bit files) is
>> available here:
>> https://github.com/Saanlima/Pepino/tree/master/Projects/RISC5Verilog_Pepino_fast
>>
>>
>> Cheers,
>> Magnus
>> --
>> Oberon at lists.inf.ethz.ch mailing list for ETH Oberon and related systems
>> https://lists.inf.ethz.ch/mailman/listinfo/oberon
>>
>