<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 2/29/2016 4:00 PM, Magnus Karlsson

      wrote:<br>

      <br>

      Sorry, I meant 32 clocks for the multiply, not 16 clocks.<br>

      <br>

      Magnus<br>

      <br>

      <br>

    </div>

    <blockquote cite="mid:56D4DBAC.80908@saanlima.com" type="cite">


      <div class="moz-cite-prefix">The current RISC5 Verilog code does
        not take advantage of the fact that both Spartan-3 and Spartan-6
        have hardware multipliers. Instead it performs each multiply as
        32 sequential additions, which causes the CPU to stall for 32
        clocks per multiplication.<br>

        <br>
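For comparison, an adder-based multiplier of the kind described above can be sketched as follows. This is a simplified, unsigned-only illustration and not the RISC5 source; the module and signal names (shift_add_mul, start, busy) are hypothetical. Each clock it adds the multiplicand to an accumulator when the current multiplier bit is 1 and then shifts right, so one 32-bit product takes 32 clocks, which mirrors the 32-clock stall described above.<br>

```verilog
// Hypothetical sketch (not the RISC5 source): unsigned 32x32 shift-and-add
// multiplier that handles one multiplier bit per clock.
module shift_add_mul (
  input clk, start,
  input [31:0] x, y,
  output busy,
  output [63:0] z);

  reg [63:0] P;         // {accumulator, remaining multiplier bits}
  reg [31:0] X;         // latched multiplicand
  reg [5:0]  n = 6'd0;  // clocks left; 0 means idle

  // 33-bit sum keeps the carry out of the accumulator
  wire [32:0] sum = P[63:32] + (P[0] ? X : 32'd0);

  assign busy = (n != 0);
  assign z = P;         // valid once busy has dropped

  always @(posedge clk)
    if (start && !busy) begin
      X <= x;
      P <= {32'd0, y};
      n <= 6'd32;
    end else if (busy) begin
      // conditional add, then shift {carry, accumulator, multiplier} right
      P <= {sum, P[31:1]};
      n <= n - 6'd1;
    end
endmodule
```

With a hardware multiplier the whole product is available after a single registered clock, which is what the rewrite below exploits.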

        As a test I rewrote the multiplier code to use the built-in
        hardware multipliers instead, and with this code the CPU is
        stalled for only 1 clock instead of 32 clocks.  The code uses
        Verilog-2001 syntax to declare a signed and an unsigned
        multiplier, and the u input signal selects whether the unsigned
        or the signed result is used.  In my opinion this makes the code
        easier to understand than the current adder-based code.  <br>

        <br>

        Here is the code:<br>

        <br>

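        <font face="Courier New, Courier, monospace">`timescale 1ns /
          1ps   // MK 29.2.2016<br>
          <br>
          // Single-cycle multiplier using the FPGA hardware multipliers.<br>
          // u = 1 selects the unsigned product, u = 0 the signed product.<br>
          module Multiplier(<br>
            input clk, run, u,<br>
            output stall,<br>
            input [31:0] x, y,<br>
            output [63:0] z);<br>
          <br>
          wire [63:0] z_signed, z_unsigned;<br>
          reg [63:0] P;  // registered product<br>
          reg S;         // run delayed by one clock<br>
          <br>
          assign z = P;<br>
          // stall only in the first clock of run, while P is being loaded<br>
          assign stall = run & ~S;<br>
          <br>
          // named instances of the signed and unsigned multipliers<br>
          mult_signed   m_signed   (.x(x), .y(y), .z(z_signed));<br>
          mult_unsigned m_unsigned (.x(x), .y(y), .z(z_unsigned));<br>
          <br>
          always @ (posedge clk) begin<br>
            P <= u ? z_unsigned : z_signed;<br>
            S <= run;<br>
          end<br>
          <br>
          endmodule<br>
          <br>
          // both operands declared signed, so x * y is a signed product<br>
          module mult_signed (<br>
            input signed [31:0] x,<br>
            input signed [31:0] y,<br>
            output signed [63:0] z);<br>
          <br>
          assign z = x * y;<br>
          <br>
          endmodule<br>
          <br>
          // unsigned operands, so x * y is an unsigned product<br>
          module mult_unsigned (<br>
            input [31:0] x,<br>
            input [31:0] y,<br>
            output [63:0] z);<br>
          <br>
          assign z = x * y;<br>
          <br>
          endmodule</font><br>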

        <br>

        <br>

        This version of the code has been successfully tested on the
        Pepino board.<br>

        <br>
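The u input matters because the signed and unsigned products of the same bit patterns differ: with x = 32'hFFFFFFFF and y = 2 the unsigned product is 64'h1_FFFFFFFE, while the signed product (-1 * 2) is 64'hFFFFFFFF_FFFFFFFE. As a hypothetical alternative to instantiating two multipliers (not something tested on Pepino), a single signed 33x33 multiply can cover both cases by extending each operand with one extra top bit: a copy of bit 31 for signed operation, or a zero for unsigned operation. The module name mult_su is made up for this sketch.<br>

```verilog
// Hypothetical sketch: one signed multiplier serving both signed (u = 0)
// and unsigned (u = 1) operation via a 33-bit operand extension.
module mult_su (
  input u,
  input [31:0] x, y,
  output [63:0] z);

  // extra top bit: sign bit when signed, zero when unsigned
  wire signed [32:0] xe = {~u & x[31], x};
  wire signed [32:0] ye = {~u & y[31], y};

  // full-width signed product; 66 bits always hold a 33x33 result
  wire signed [65:0] p = xe * ye;

  // the low 64 bits are the signed or unsigned 32x32 product
  assign z = p[63:0];
endmodule
```

In a design like the one above, the register stage (P &lt;= ...) would simply take z from this module instead of choosing between two products.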

        Cheers,<br>

        Magnus<br>

        <br>

        <br>

        <br>

        On 2/26/2016 9:59 AM, Magnus Karlsson wrote:<br>

      </div>

      <blockquote cite="mid:56D0928E.8070509@saanlima.com" type="cite">One
        outcome of the discussion about the Oberon RISC5 Verilog code is
        that I took a closer look at the clock limits for the project
        and found that the RISC5 CPU itself can be clocked at up to
        about 66 MHz, but the external SRAM path is too slow for that
        speed (reads are the problem).  The asynchronous nature of the
        SRAM interface makes it hard to constrain the ISE tools to
        work hard on this path. <br>

        <br>

        I traced the SRAM read data path and found that it takes
        about 10 ns from the SRAM data input pins to the Z register bit
        (this is the longest path).  The address output path is about
        5 ns, so with a 10 ns SRAM access time the shortest system clock
        cycle should be around 25 ns (5 + 10 + 10 ns), i.e. a maximum of
        about 40 MHz. <br>

        <br>

        To test this I created a version of the code that runs the
        CPU at 37.5 MHz (26.67 ns, just above the 25 ns estimate)
        instead of 25 MHz, i.e. the CPU runs at 1/2 of the 75 MHz video
        clock instead of 1/3, and it seems to run fine on both the LX9
        and the LX25 versions of Pepino.  All timing constants (UART Rx,
        UART Tx, SPI and millisecond timer) have been changed to reflect
        the 50% faster clock rate. <br>

        <br>
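One way to keep such constants in step with the clock is to derive them from a clock-frequency parameter rather than hard-coding them. The sketch below does this for a millisecond tick counter; the module name ms_timer and the CLK_HZ parameter are hypothetical and not taken from the project. At 37.5 MHz a millisecond is 37500 clocks instead of 25000 at 25 MHz, and the same pattern applies to the UART and SPI dividers (for example, at an assumed 19200 baud a bit lasts about 1953 clocks at 37.5 MHz versus about 1302 at 25 MHz).<br>

```verilog
// Hypothetical sketch: millisecond counter whose tick count is derived
// from the CPU clock frequency (not the RISC5 timer source).
module ms_timer #(
  parameter CLK_HZ = 37_500_000)    // CPU clock in Hz
 (input clk,
  output reg [31:0] ms = 0);        // milliseconds since power-up

  localparam TICKS = CLK_HZ / 1000; // 37500 at 37.5 MHz, 25000 at 25 MHz

  reg [31:0] cnt = 0;               // clocks within the current millisecond

  always @(posedge clk)
    if (cnt == TICKS - 1) begin
      cnt <= 0;
      ms  <= ms + 1;
    end else
      cnt <= cnt + 1;
endmodule
```

Changing the clock then means changing one parameter value at instantiation, e.g. ms_timer #(.CLK_HZ(25_000_000)) for the original 25 MHz build.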

        If anyone wants to try it, the project (including bit files) is
        available here: <br>

        <a moz-do-not-send="true" class="moz-txt-link-freetext"

href="https://github.com/Saanlima/Pepino/tree/master/Projects/RISC5Verilog_Pepino_fast">https://github.com/Saanlima/Pepino/tree/master/Projects/RISC5Verilog_Pepino_fast</a>

        <br>

        <br>

        Cheers, <br>

        Magnus <br>

        -- <br>

        <a moz-do-not-send="true" class="moz-txt-link-abbreviated"

          href="mailto:Oberon@lists.inf.ethz.ch">Oberon@lists.inf.ethz.ch</a>

        mailing list for ETH Oberon and related systems <br>

        <a moz-do-not-send="true" class="moz-txt-link-freetext"

          href="https://lists.inf.ethz.ch/mailman/listinfo/oberon">https://lists.inf.ethz.ch/mailman/listinfo/oberon</a>

        <br>

        <br>

      </blockquote>

      <br>

    </blockquote>

    <br>

  </body>

</html>