[Oberon] RISC5 implementation issues.

Wed Feb 17 18:41:22 CET 2016

Walter,

I agree with you that the "pure" way of doing this is as you stated, 
with a DCM to directly generate both clk and pclk.  So how come Paul 
didn't do that?  It's not like he doesn't know how to use the DCM; after 
all, the current code generates pclk from clk using a DCM, and there 
would probably be less code to do it like you suggest.  No, the reason 
for this is very subtle and easy to miss if you just take a quick look 
at the code: it has to do with the asynchronous SRAM interface.
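For reference, here is a minimal sketch of that "pure" alternative. It assumes a Spartan-3 with a 50 MHz board clock and uses the Xilinx UNISIM `DCM` and `BUFG` primitives. CLKDV (divide by 2) gives the 25 MHz CPU clock, and CLKFX (x3/2) gives the 75 MHz pixel clock. The module name `ClockGen` and its ports are hypothetical; this is not the RISC5 source.

```verilog
// Hypothetical sketch: one Spartan-3 DCM producing both the 25 MHz CPU
// clock (CLKDV) and the 75 MHz pixel clock (CLKFX) from a 50 MHz input.
// Needs the Xilinx UNISIM library for the DCM and BUFG primitives.
module ClockGen (
  input  wire clk50,    // 50 MHz board oscillator
  output wire clk,      // 25 MHz CPU clock on a global net
  output wire pclk,     // 75 MHz pixel clock on a global net
  output wire locked);  // high once the DCM has locked

  wire clk0, clk0_bufg, clkdv, clkfx;

  DCM #(
    .CLKIN_PERIOD(20.0),   // 50 MHz input, period in ns
    .CLKDV_DIVIDE(2.0),    // CLKDV = 50 / 2 = 25 MHz
    .CLKFX_MULTIPLY(3),    // CLKFX = 50 * 3 / 2 = 75 MHz
    .CLKFX_DIVIDE(2),
    .CLK_FEEDBACK("1X")    // feedback from the buffered CLK0 output
  ) dcm (
    .CLKIN(clk50),
    .CLKFB(clk0_bufg),
    .RST(1'b0),
    .DSSEN(1'b0),          // unused dynamic/phase-shift inputs tied off
    .PSCLK(1'b0),
    .PSEN(1'b0),
    .PSINCDEC(1'b0),
    .CLK0(clk0),
    .CLKDV(clkdv),
    .CLKFX(clkfx),
    .LOCKED(locked));

  // Dedicated clock buffers drive the global clock trees.
  BUFG fb_buf  (.I(clk0),  .O(clk0_bufg));
  BUFG clk_buf (.I(clkdv), .O(clk));
  BUFG pix_buf (.I(clkfx), .O(pclk));
endmodule
```

With both clocks coming straight out of the DCM, every register sees a properly buffered clock. As I understand it, though, no "early" copy of clk is left in the logic fabric that could be used for other purposes.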

One of the most critical aspects of using SRAM is to control the write 
signal: ideally the write signal should be asserted after all other 
control signals (like address, data, byte-enable, read, oe) are valid, 
and should be de-asserted well before any of the other control signals 
go invalid, to avoid spurious writes. However, this is not that easy to 
do in a synchronous system where all signals change at the clock edge.  
The most common way to do this is to have a state machine that is 
clocked at, say, 4x the CPU clock so that you can divide the SRAM access 
cycle into several phases and assert the write signal on only some of 
those phases.
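A minimal sketch of that conventional approach follows. It assumes a hypothetical `sram_clk` running at 4x the CPU clock and phase-aligned with it, a `wr` request that is stable for the whole CPU cycle, and an active-low SRAM write enable. The names are illustrative only. The access cycle is split into four phases, and `we_n` is low only in the two middle ones.

```verilog
// Hypothetical sketch: a 2-bit phase counter clocked at 4x the CPU clock
// splits each SRAM access cycle into phases 0..3. The write strobe is low
// only while the counter holds 1 or 2, so address/data can settle during
// phase 0 and are still valid in phase 3 after the strobe has ended.
module SramWriteStrobe (
  input  wire sram_clk,  // 4x the CPU clock, assumed phase-aligned with it
  input  wire rst,       // synchronous reset, starts the counter at phase 0
  input  wire wr,        // write request, stable for the whole CPU cycle
  output reg  we_n);     // registered active-low SRAM write enable

  reg [1:0] phase;       // which quarter of the CPU cycle we are in

  always @(posedge sram_clk) begin
    if (rst) begin
      phase <= 2'd0;
      we_n  <= 1'b1;
    end else begin
      phase <= phase + 2'd1;
      // Decoded from the current phase, so the new we_n value appears
      // together with the next phase: low while phase is 1 or 2.
      we_n <= ~(wr && (phase == 2'd0 || phase == 2'd1));
    end
  end
endmodule
```

Registering `we_n` instead of decoding the counter combinationally keeps the strobe free of decode glitches. The cost is an extra fast clock domain and its timing constraints.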

However, this is not the way Paul chose to do it; instead, he chose a 
less "pure" clock generation, producing clk from a flip-flop rather than 
from a DCM.  By doing so, he actually generates an early version of the 
clock signal, called clk, that leads the global clock signal clk_BUFG 
by the delay of the BUFG buffer.  Since this early version of the clock 
signal is generated like any other logic signal, he can use it to gate 
the write signal to the SRAM, so that the write signal is de-asserted 
well before the other control signals (clocked by clk_BUFG) change.  
This avoids the need for a state machine controlling the write signal.  
The price for this is that the clock signal is now generated in a less 
"pure" way, but it is still a valid way as long as you know what you are 
doing.  The BUFG clock driver can be driven from a PLL, a DCM or the 
logic fabric.  The first two are speed-optimized paths going directly 
from the PLL or DCM to the BUFG and can be clocked at a much higher 
rate, while the logic fabric path is limited by the maximum clock rate 
of the logic fabric.  However, at the clock rate we use (25 MHz) this is 
not an issue.  When you do this, ISE generates no warnings that it is a 
bad idea, and I have not read anywhere in the Xilinx clocking resources 
guide that you should avoid it.  Basically, the BUFG clock driver is 
designed to support this, the tools allow it, and at our clock rate it 
has no performance implications.  As I see it, this is another place 
where the goal of simplification has driven the implementation of the 
system, at the expense of a slightly less "pure" clock generation.
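A minimal sketch of the idea follows. It is not quoted from the RISC5 sources. `clk` is toggled by a flip-flop, and the synthesis tools put it on a BUFG (clk_BUFG) because it drives clock pins. The write gate is meant to see the early, pre-BUFG `clk`. The names `wr_req`, `wr` and `SRwe_n` are hypothetical. Simulation does not model the BUFG delay, so this only shows the logic. The timing margin comes from the real buffer delay in the device.

```verilog
// Hypothetical sketch: gating the SRAM write strobe with the early clock.
// clk comes from a flip-flop; wherever it clocks registers, the tools
// route it through a BUFG (clk_BUFG), so those registers change slightly
// after the fabric copy of clk does.
module EarlyClockWrite (
  input  wire CLK50M,   // 50 MHz board clock
  input  wire wr_req,   // hypothetical write request from the CPU side
  output reg  clk,      // early 25 MHz clock (flip-flop output)
  output wire SRwe_n);  // active-low SRAM write enable

  reg wr;               // write flag, registered on clk (via the BUFG)

  initial clk = 1'b0;
  always @(posedge CLK50M) clk <= ~clk;   // divide 50 MHz down to 25 MHz
  always @(posedge clk) wr <= wr_req;     // clock pin: driven by clk_BUFG

  // Low only while wr is set and clk is low (second half of the cycle).
  // It returns high as soon as the early clk rises, i.e. just before the
  // BUFG-clocked registers (address, data, wr) can change.
  assign SRwe_n = ~wr | clk;
endmodule
```

In this sketch the strobe also goes low only at mid-cycle, which gives the address and data half a cycle to settle before the write starts.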

Just my 2c

Magnus

On 2/17/2016 4:18 AM, Walter Gallegos wrote:
> Hi Paul,
>
> My apologies for this off-topic post; Project Oberon is basically a 
> learning tool, so some hardware comments should not hurt.
>
> Yes, the tools add the clock buffers because the FF clock inputs 
> must be connected to the clock distribution tree, but this does not 
> correct the issue.
> The problem is that to connect a FF output, as the clk signal is, to 
> the clock buffer input, the signal needs to be routed through general 
> purpose lines and the interconnection matrix. This introduces an 
> uncontrolled delay.
> This issue has only a minor effect in RISC5 because it is a very 
> special case where the whole project is self-contained.
>
> A correct technique could be this: RISC5 already uses a DCM to 
> generate 75 MHz, so use the same DCM to generate both 25 MHz (CLKDV) 
> and 75 MHz (CLKFX) from 50 MHz (CLKIN).
>
> Regards,
> Walter
>
>
> On 2016-02-17 at 06:39, Paul Reed wrote:
>> Hi Walter,
>>
>>> So, RISC5 uses general purpose resources to route a clock signal.
>> I agree with Magnus, the tools add the relevant clock buffer as part of
>> their job, and the source code is kept simple and clear.
>>
>> FPGAs are a little off-topic for many Oberoners, but hopefully the below
>> simple hardware LED counter for the Spartan 3 board (easily adapted to
>> almost any other board!) might be indulged, and interesting for enough
>> people :)
>>
>> If you create a project in Xilinx ISE for the xc3s200-4ft256, add these
>> source files and run "Generate Programming File", then as far as I can see
>> from the reports, the tools add the appropriate clock buffers and global
>> resources - correct me if I'm wrong!
>>
>> Cheers,
>> Paul
>>
>>
>> (test.v)
>>
>> `timescale 1ns / 1ps
>>
>> module TestTop(
>>      input CLK50M,   //50MHz
>>      output [7:0] leds);
>>
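>> reg clk = 1'b0;      // 25MHz clock from a flip-flop; initialized so simulation does not stay at X
>> reg [31:0] cnt = 0;  // free-running counter; top byte drives the LEDs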
>>
>> assign leds = cnt[31:24];
>>
>> always @(posedge clk) //25MHz
>>    cnt <= cnt + 1;
>>
>> always @(posedge CLK50M) clk <= ~clk;
>>
>> endmodule
>>
>> (test.ucf)
>>
>> NET "CLK50M" LOC = "T9" ;
>> NET "leds[0]" LOC = "K12";
>> NET "leds[1]" LOC = "P14";
>> NET "leds[2]" LOC = "L12";
>> NET "leds[3]" LOC = "N14";
>> NET "leds[4]" LOC = "P13";
>> NET "leds[5]" LOC = "N12";
>> NET "leds[6]" LOC = "P12";
>> NET "leds[7]" LOC = "P11";
>>
>>
>> -- 
>> Oberon at lists.inf.ethz.ch mailing list for ETH Oberon and related systems
>> https://lists.inf.ethz.ch/mailman/listinfo/oberon
>>
>