<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">The current RISC5 Verilog code does not
take advantage of the fact that both Spartan3 and Spartan6 have
hardware multipliers. Instead it performs each multiply as 16
additions, which causes the CPU to stall for 16 clocks for each
multiplication.<br>
<br>
As a test I rewrote the multiplier code so that the built-in
hardware multipliers are used instead, and with this code the CPU
is only stalled for 1 clock instead of 16 clocks. The code uses
Verilog-2001 syntax to specify a signed and an unsigned
multiplier, and the u input signal determines whether the unsigned or
the signed result is used. This makes the code, in my opinion, easier to
understand than the current adder-based code. <br>
<br>
Here is the code:<br>
<br>
<font face="Courier New, Courier, monospace">`timescale 1ns /
1ps // MK 29.2.2016<br>
<br>
module Multiplier(<br>
input clk, run, u,<br>
output stall,<br>
input [31:0] x, y,<br>
output [63:0] z);<br>
<br>
wire [63:0] z_signed, z_unsigned;<br>
reg [63:0] P;<br>
reg S;<br>
<br>
assign z = P;<br>
assign stall = run & ~S;<br>
<br>
mult_signed ms (.x(x), .y(y), .z(z_signed));<br>
mult_unsigned mu (.x(x), .y(y), .z(z_unsigned));<br>
<br>
always @ (posedge(clk)) begin<br>
P <= u ? z_unsigned : z_signed;<br>
S <= run;<br>
end<br>
<br>
endmodule<br>
<br>
module mult_signed (<br>
input signed [31:0] x,<br>
input signed [31:0] y,<br>
output signed [63:0] z);<br>
<br>
assign z = x * y;<br>
<br>
endmodule<br>
<br>
module mult_unsigned (<br>
input [31:0] x,<br>
input [31:0] y,<br>
output [63:0] z);<br>
<br>
assign z = x * y;<br>
<br>
endmodule</font><br>
<br>
<br>
This version of the code has been successfully tested on the Pepino
board.<br>
<br>
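For anyone who wants to check the behaviour in simulation first, here
is a minimal testbench sketch. It is not part of the project: the
module name Multiplier_tb, the 100 MHz simulation clock and the test
values are my own assumptions. It drives one signed and one unsigned
multiply and checks that stall is asserted for exactly one clock.<br>
<br>
<font face="Courier New, Courier, monospace">`timescale 1ns / 1ps<br>
<br>
// Hypothetical testbench, names and values are assumptions<br>
module Multiplier_tb;<br>
reg clk = 0, run = 0, u = 0;<br>
reg [31:0] x = 0, y = 0;<br>
wire stall;<br>
wire [63:0] z;<br>
<br>
Multiplier dut (.clk(clk), .run(run), .u(u), .stall(stall), .x(x), .y(y), .z(z));<br>
<br>
always #5 clk = ~clk; // arbitrary 100 MHz simulation clock<br>
<br>
task check(input unsigned_op, input [31:0] a, input [31:0] b, input [63:0] expected);<br>
begin<br>
@(negedge clk);<br>
u = unsigned_op; x = a; y = b; run = 1;<br>
// first cycle: S is still 0, so stall must be high<br>
#1 if (!stall) $display("FAIL: no stall in first cycle");<br>
@(negedge clk);<br>
// the clock edge in between has loaded P and set S, so stall must be low<br>
if (stall) $display("FAIL: stall lasted more than one clock");<br>
if (z !== expected) $display("FAIL: z = %h, expected %h", z, expected);<br>
else $display("ok: z = %h", z);<br>
run = 0;<br>
@(negedge clk); // let S return to 0 before the next operation<br>
end<br>
endtask<br>
<br>
initial begin<br>
// signed: -1 * 3 = -3<br>
check(1'b0, 32'hFFFFFFFF, 32'd3, 64'hFFFFFFFFFFFFFFFD);<br>
// unsigned: 4294967295 * 3 = 12884901885<br>
check(1'b1, 32'hFFFFFFFF, 32'd3, 64'h00000002FFFFFFFD);<br>
$finish;<br>
end<br>
<br>
endmodule</font><br>
<br>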
Cheers,<br>
Magnus<br>
<br>
<br>
<br>
On 2/26/2016 9:59 AM, Magnus Karlsson wrote:<br>
</div>
<blockquote cite="mid:56D0928E.8070509@saanlima.com" type="cite">One
outcome of the discussion about the Oberon RISC5 verilog code is
that I did a deeper study about the clock limits for the project
and found that the RISC5 CPU in itself can be clocked at up to
about 66 MHz but the external SRAM path is too slow for that speed
(read is the problem). The asynchronous nature of the SRAM
interface makes it hard to constrain the ISE compiler to work hard
on this path.
<br>
<br>
I traced the SRAM read data path and found that it takes about
10 ns from the SRAM data input pins to the Z register bit (this is
the longest path). The address output path is about 5 ns, and with
a 10 ns SRAM access time the fastest system clock cycle should be
around 25 ns (5 + 10 + 10).
<br>
<br>
To test this out I created a version of the code that runs the CPU
at 37.5 MHz (26.67 ns) instead of 25 MHz, i.e. the CPU is running at
1/2 the video clock rate instead of 1/3, and it seems to run fine
on both the LX9 and the LX25 versions of Pepino. All timing
constants (UART Rx, UART Tx, SPI and millisecond timer) have been
changed to reflect the 50% faster clock rate.
<br>
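<br>
As a rough illustration of that rescaling, here is a sketch of a
millisecond tick derived from the clock frequency. This is not the
project's actual source: the module name ms_tick and the parameter
CLK_HZ are hypothetical. The count per millisecond goes from 25000 at
25 MHz to 37500 at 37.5 MHz.
<br>
<br>
<font face="Courier New, Courier, monospace">// Hypothetical sketch, not taken from the project sources<br>
module ms_tick #(parameter CLK_HZ = 37500000) (<br>
input clk,<br>
output tick);<br>
<br>
localparam TICKS_PER_MS = CLK_HZ / 1000; // 37500 at 37.5 MHz, 25000 at 25 MHz<br>
<br>
reg [15:0] cnt = 0; // 16 bits is enough for counts up to 65535<br>
<br>
assign tick = (cnt == TICKS_PER_MS - 1); // high for one clock every millisecond<br>
<br>
always @(posedge clk)<br>
cnt &lt;= tick ? 16'd0 : cnt + 16'd1;<br>
<br>
endmodule</font>
<br>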
<br>
If anyone wants to try it, the project (including bit files) is
available here:
<br>
<a class="moz-txt-link-freetext" href="https://github.com/Saanlima/Pepino/tree/master/Projects/RISC5Verilog_Pepino_fast">https://github.com/Saanlima/Pepino/tree/master/Projects/RISC5Verilog_Pepino_fast</a>
<br>
<br>
Cheers,
<br>
Magnus
<br>
<br>
</blockquote>
<br>
</body>
</html>