[Oberon] RISC5 implementation issues.

Magnus Karlsson magnus at saanlima.com
Tue Feb 23 17:49:05 CET 2016


Walter,

I tried your proposed solution and couldn't make it work.  It "almost" 
works in that it starts the boot process and reads from the SD-card, but 
then hangs with the LEDs showing 0x84.
Maybe I didn't do it correctly: I instantiated the ODDR2 buffer directly 
in the top module and modified the DCM instantiation to create the clk90 
clock.

Here is the code. Can you look it over and see whether this is what you 
had in mind, or whether something is wrong?

DCM #(.CLKFX_MULTIPLY(3), .CLKIN_DIVIDE_BY_2("TRUE"), .CLKIN_PERIOD(20.000))
   dcm(.CLKIN(CLK50M), .CLKFB(clk), .RST(1'b0), .PSEN(1'b0),
       .PSINCDEC(1'b0), .PSCLK(1'b0), .DSSEN(1'b0), .CLKFX(pclk),
       .CLK0(clk), .CLK90(clk90));

ODDR2 #(.INIT(1'b1))
   oddr2(.Q(wr_enable), .C0(clk90), .C1(~clk90), .CE(1'b1), .D0(1'b1),
       .D1(~wr), .R(1'b0), .S(1'b0));

assign SRwe = wr_enable;


wr is the active-high write signal from the RISC5 CPU, and SRwe is the 
short active-low write-enable signal to the SRAM.
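
For reference, a minimal timing sketch of the ODDR2 version above. It is a
plain Python model rather than Verilog, and it rests on assumptions that are
not stated in this message: ideal clock edges, a 40 ns clk period, clk90
rising 10 ns after clk, wr stable at both sampling edges, and Q taking D0 on
the rising edge of C0 and D1 on the rising edge of C1. The function name and
the constants are hypothetical.

```python
CLK_PERIOD_NS = 40   # 25 MHz clk (assumed ideal)
C0_EDGE_NS = 10      # rising edge of clk90, a quarter period after clk
C1_EDGE_NS = 30      # rising edge of ~clk90, i.e. falling edge of clk90


def oddr2_write_pulses(wr_per_cycle):
    """Return (fall_ns, rise_ns) intervals during which Q (SRwe) is low.

    wr_per_cycle[i] is the value of wr (0 or 1) during clk cycle i.
    A pulse that is still low after the last cycle is not reported.
    """
    q = 1            # matches .INIT(1'b1)
    low_since = None
    pulses = []
    for cycle, wr in enumerate(wr_per_cycle):
        base = cycle * CLK_PERIOD_NS
        # C0 edge loads D0 = 1, C1 edge loads D1 = ~wr
        for t, d in ((base + C0_EDGE_NS, 1), (base + C1_EDGE_NS, 1 - wr)):
            if q == 1 and d == 0:
                low_since = t                  # Q falls
            elif q == 0 and d == 1:
                pulses.append((low_since, t))  # Q rises again
            q = d
    return pulses


print(oddr2_write_pulses([0, 1, 0]))
```

Under these assumptions a single write in the cycle from 40 ns to 80 ns
gives [(70, 90)], so the model places the low phase of SRwe across the clk
edge at 80 ns. Real hardware adds clock-to-output and routing delays that
the sketch ignores.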

Just to recap, this is the code I used to generate the short write 
pulse, using a 2x clk instead.  This version seems to run fine:

reg wr_enable;

DCM #(.CLKFX_MULTIPLY(3), .CLKFX_DIVIDE(2), .CLKDV_DIVIDE(2),
      .CLKIN_PERIOD(20.000))
   dcm(.CLKIN(CLK50M), .CLKFB(clk2x), .RST(1'b0), .PSEN(1'b0),
       .PSINCDEC(1'b0), .PSCLK(1'b0), .DSSEN(1'b0), .CLKFX(pclk),
       .CLKDV(clk), .CLK0(clk2x));

always @(negedge clk2x)
   wr_enable <= wr & ~wr_enable;

assign SRwe = ~wr_enable;
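
A minimal timing sketch of this 2x-clock version, written as a plain Python
model rather than Verilog. It assumes ideal edges, a 40 ns clk and a 20 ns
clk2x with aligned rising edges, wr changing only at rising edges of clk,
and wr_enable starting at 0. The function name and the constants are
hypothetical.

```python
CLK_PERIOD_NS = 40            # 25 MHz clk from CLKDV (assumed ideal)
CLK2X_FALL_NS = (10, 30)      # clk2x falling edges inside one clk cycle


def clk2x_write_pulses(wr_per_cycle):
    """Return (fall_ns, rise_ns) intervals during which SRwe is low.

    wr_per_cycle[i] is the value of wr (0 or 1) during clk cycle i.
    SRwe = ~wr_enable, so SRwe is low while wr_enable is 1.
    """
    wr_enable = 0             # assumed initial value of the register
    low_since = None
    pulses = []
    for cycle, wr in enumerate(wr_per_cycle):
        base = cycle * CLK_PERIOD_NS
        for offset in CLK2X_FALL_NS:
            t = base + offset
            nxt = wr & (1 - wr_enable)         # wr_enable <= wr & ~wr_enable
            if nxt and not wr_enable:
                low_since = t                  # SRwe falls
            elif wr_enable and not nxt:
                pulses.append((low_since, t))  # SRwe rises again
            wr_enable = nxt
    return pulses


print(clk2x_write_pulses([0, 1, 0]))
```

For a single write in the cycle from 40 ns to 80 ns the model returns
[(50, 70)]: a 20 ns low pulse that starts 10 ns after the clk edge and ends
10 ns before the next one.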


Any idea what's wrong?
Magnus


On 2/19/2016 9:58 AM, Walter Gallegos wrote:
> Let us concentrate on the problem...
>
> How to generate an output pulse shorter than the clock period?
>
> You can make a shorter pulse with the help of a DCM and DDR (dual data 
> rate output registers); see the simulation capture.
>
>
>
> clk25f90 is the DCM clock output with 90° phase shift.
>
> The VHDL code (without the DCM instantiation) is:
>
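> LIBRARY IEEE;
> USE IEEE.STD_LOGIC_1164.ALL;
> LIBRARY UNISIM;                -- Xilinx primitives (ODDR2)
> USE UNISIM.VCOMPONENTS.ALL;
>
> ENTITY PulseShaper IS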
>     PORT  (CLK25F90 :  IN STD_LOGIC;
>              RESET :  IN STD_LOGIC;
>                 WE : IN STD_LOGIC;
>             WESRAM : OUT STD_LOGIC
>          );
> END PulseShaper;
>
> ARCHITECTURE RTL OF PulseShaper IS
>
>    CONSTANT zero : STD_LOGIC := '0';
>    CONSTANT one : STD_LOGIC := '1';
>
>    SIGNAL clk90n: STD_LOGIC;
>
> BEGIN
>
>    -- This is valid because the tool uses the clock inverter
>    -- available in the IOB.
>    clk90n <= NOT(CLK25F90);
>
>    Dly : ODDR2
>    PORT MAP (
>       Q => WESRAM,
>       C0 => CLK25F90,
>       C1 => clk90n,
>       CE => one,
>       D0 => one,
>       D1 => WE,
>       R => zero,
>       S => zero
>    );
>
> END RTL;
>
> Does anyone know whether an FPGA testbench for RISC5 exists?
>
> Regards,
> Walter
>
> On 2016-02-19 at 11:33, Magnus Karlsson wrote:
>> In all fairness, since I have generated and tested code for one way 
>> to solve the problem, can you then give us your code proposal for 
>> generating the SRAM write signal, with 25MHz and 75MHz generated by a 
>> DCM?  The write signal must be asserted after all other SRAM control 
>> signals are valid, and be de-asserted before any of the control 
>> signals go invalid, and last for at least 5 ns.  The SRAM control 
>> signals are generated by the 25MHz clock and last for one clock cycle.
>>
>> I will be more than happy to try it out on a board and report the 
>> result.
>>
>> Magnus
>>
>>
>> On 2/19/2016 4:57 AM, Walter Gallegos wrote:
>>> A solid and coherent clock distribution is basic to FPGA design; my 
>>> proposal was to keep both 25 MHz and 75 MHz, generated by the DCM.
>>>
>>> Running everything at 75 MHz is unnecessary; also, the clock-enable 
>>> approach adds unnecessary complexity to the design. Making a design 
>>> synchronous does not necessarily mean using the same clock for the 
>>> whole design.
>>>
>>> Continue using 25 MHz for the core and peripherals; the only concern 
>>> is the CDC (clock domain crossing). As both clocks are generated by 
>>> the same DCM, this is a minor problem: corresponding rising edges are 
>>> aligned by design in DCMs, so metastability is not an issue. When 
>>> using DCMs and constraining the input clock (50 MHz), the constraint 
>>> propagation rules constrain both clocks, 25 MHz and 75 MHz. If there 
>>> are no methodology errors, the whole design is constrained from the 
>>> first to the last register element in the chain. Beware of 
>>> combinational logic outside these elements.
>>>
>>> In my opinion, keeping both clocks while taking care of the CDC is 
>>> the appropriate solution.
>>>
>>> Best regards,
>>> Walter
>>>
>>> On 2016-02-18 at 19:58, Magnus Karlsson wrote:
>>>> So I have been thinking about this some more and decided to 
>>>> modify/update the design to remove all the concerns raised by 
>>>> Walter and Wojtek.
>>>>
>>>> Just to recap, Walter's concern is that the clocks are generated 
>>>> using flip-flops and use logic fabric interconnect instead of 
>>>> dedicated clocking elements and pathways, and that all clocks 
>>>> should be generated by a DCM module instead (DCM = Digital Clock 
>>>> Manager).  Wojtek's concern is that there are unspecified timing 
>>>> relations between the 25MHz and the 75MHz clock domains.
>>>>
>>>> Both concerns are valid and in my opinion the correct way to fix 
>>>> both issues is to make the design completely synchronous.  This 
>>>> means that all clocked elements in the design (like flip-flops, 
>>>> memories etc.) should be clocked with a single clock signal, which 
>>>> in this case is the 75MHz clock.  The CPU and I/O subsystem, which 
>>>> before was clocked by a separate 25MHz clock, are now also clocked 
>>>> by the 75MHz clock but are only enabled to be clocked on every 
>>>> third clock cycle.  This means that all "always @ (posedge clk)" 
>>>> statements have been changed to include "if (enable) ...", where 
>>>> "enable" is a signal that is true on every third clock cycle.  The 
>>>> asynchronous SRAM interface is also changed so that the write 
>>>> signal is asserted on the middle-third clock phase of the three 
>>>> clock CPU cycle.
>>>>
>>>> While the Verilog changes to do this are very straightforward, one 
>>>> complication here is that the Xilinx ISE tool used to create the 
>>>> bit file for the FPGA does not understand that the CPU and I/O 
>>>> subsystem are only clocked on every third clock and will basically 
>>>> try to make the CPU run at 75MHz, and will fail since this is too 
>>>> fast for the FPGA.  The solution to this problem is to tell the tool 
>>>> that all clock paths in the CPU and I/O subsystem can actually take 
>>>> three clocks to complete (this is called multi-cycle paths). With 
>>>> the multi-cycle paths added to the .ucf file the design compiles 
>>>> with no timing violations.
>>>>
>>>> With those changes the 75MHz clock is now generated by a DCM and 
>>>> the unspecified timing relations that Wojtek brought up are now 
>>>> gone since everything is clocked with a single clock.   The 
>>>> modified design has been tested on Pepino and seems to run fine.
>>>>
>>>> The complete ISE project with those changes is available at the 
>>>> Pepino GitHub repository: 
>>>> https://github.com/Saanlima/Pepino/tree/master/Projects/RISC5Verilog_Pepino
>>>>
>>>> Any comments or critique are welcome.
>>>>
>>>> Cheers,
>>>> Magnus
>>>>
>>>>
>>>>
>>>> On 2/17/2016 2:16 PM, Walter Gallegos wrote:
>>>>> Hi Magnus,
>>>>>
>>>>> You are welcome to continue with FPGA specific topics by private 
>>>>> e-mail if you want.
>>>>>
>>>>> Regards
>>>>> Walter
>>>>>
>>>>> On 2016-02-17 at 18:30, Magnus Karlsson wrote:
>>>>>> Hi Walter,
>>>>>>
>>>>>> Since this is really Paul's design, I guess it would be more 
>>>>>> appropriate to discuss it with him, I was just trying to explain 
>>>>>> why it looks like it does.
>>>>>>
>>>>>> Cheers,
>>>>>> Magnus
>>>>>>
>>>>>>
>>>>>> On 2/17/2016 1:15 PM, Walter Gallegos wrote:
>>>>>>> Magnus,
>>>>>>>
>>>>>>> Some of the messages were delayed, so I continue from here so 
>>>>>>> as not to overload the list.
>>>>>>>
>>>>>>> If I understand you correctly, you justify an uncontrolled delay 
>>>>>>> because it simplifies the SRAM handling.
>>>>>>> Sorry, but that is like using the old circuit with an AND gate 
>>>>>>> and an inverter to generate a pulse. If you need a delayed signal 
>>>>>>> you should use the DCM 90°, 180° or 270° clock outputs and keep 
>>>>>>> everything under control; I think you don't need a state machine 
>>>>>>> in this case.
>>>>>>>
>>>>>>> About ISE warnings: be careful, the absence of warnings does not 
>>>>>>> mean good methodology.
>>>>>>>
>>>>>>> About the Xilinx docs: really, I don't remember. Doing training, 
>>>>>>> first as a Xilinx ATP and now as an independent consultant, I 
>>>>>>> cover this problem in my courses. Having an uncontrolled delay in 
>>>>>>> the clock opens a big door to random problems. An FPGA design 
>>>>>>> must be synchronous at all times; no exceptions.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Walter
>>>>>>>
>>>>>>>
>>>>>>> On 2016-02-17 at 14:41, Magnus Karlsson wrote:
>>>>>>>> Walter,
>>>>>>>>
>>>>>>>> I agree with you that the "pure" way of doing this is as you 
>>>>>>>> stated, with a DCM to directly generate both clk and pclk. So 
>>>>>>>> how come Paul didn't do that? It's not like he doesn't know how 
>>>>>>>> to use the DCM, after all the current code generates pclk from 
>>>>>>>> clk using a DCM, and there would probably be less code to do it 
>>>>>>>> like you suggest. No, the reason for this is very subtle and is 
>>>>>>>> easy to miss if you just take a quick look at the code, and it 
>>>>>>>> has to do with the asynchronous SRAM interface.
>>>>>>>>
>>>>>>>> One of the most critical aspects of using SRAM is to control 
>>>>>>>> the write signal - ideally the write signal should be asserted 
>>>>>>>> after all other control signals (like address, data, 
>>>>>>>> byte-enable, read, oe) are valid, and should be de-asserted 
>>>>>>>> well before any of the other control signals go invalid, to 
>>>>>>>> avoid spurious writes. However, this is not that easy to do in 
>>>>>>>> a synchronous system where all signals change at the clock 
>>>>>>>> edge.  The most common way to do this is to have a state 
>>>>>>>> machine that is clocked at say 4x the CPU clock so that you can 
>>>>>>>> divide the SRAM access cycle into several phases and assert 
>>>>>>>> the write signal on some of those phases.
>>>>>>>>
>>>>>>>> However, this is not the way Paul chose to do it, instead he 
>>>>>>>> chose to do a less "pure" clock generation by generating clk 
>>>>>>>> from a flip-flop rather than from a DCM. By doing so, he 
>>>>>>>> actually generates an early version of the clock signal called 
>>>>>>>> clk that is leading the global clock signal clk_BUFG by the 
>>>>>>>> delay of the BUFG buffer.  Since this early version of the 
>>>>>>>> clock signal is generated like any other logic signal, he could 
>>>>>>>> use this signal to gate the write signal to the SRAM such that 
>>>>>>>> the write signal will be de-asserted well before the other control 
>>>>>>>> signals (clocked by clk_BUFG) will change, and thus avoiding 
>>>>>>>> the need to have a state machine controlling the write signal.  
>>>>>>>> The price for this is that the clock signal is now generated in 
>>>>>>>> a less "pure" way, but still a valid way as long as you know 
>>>>>>>> what you are doing. The BUFG clock driver can be driven from a 
>>>>>>>> PLL, a DCM or from the logic fabric.  The first two are speed 
>>>>>>>> optimized paths going directly from the PLL or DCM to the BUFG 
>>>>>>>> and can be clocked at a much higher clock rate, while the logic 
>>>>>>>> fabric path is limited by the maximum clock rate of the logic 
>>>>>>>> fabric. However, at the clock rate we use (25 MHz) this is not 
>>>>>>>> an issue.  When you do this there are no warnings generated by 
>>>>>>>> ISE that this is not a good idea, and I have not read anywhere 
>>>>>>>> in the Xilinx clocking resource guide that you should avoid 
>>>>>>>> doing this. Basically, the BUFG clock driver is designed to do 
>>>>>>>> this, the tool will allow you to do it and at the clock rate we 
>>>>>>>> use it has no performance implications. As I see it, this is 
>>>>>>>> another place where the goal of simplification has driven the 
>>>>>>>> implementation of the system at the expense of a slightly less 
>>>>>>>> "pure" clock generation.
>>>>>>>>
>>>>>>>> Just my 2c
>>>>>>>>
>>>>>>>> Magnus
>>>>
>>>> -- 
>>>> Oberon at lists.inf.ethz.ch mailing list for ETH Oberon and related 
>>>> systems
>>>> https://lists.inf.ethz.ch/mailman/listinfo/oberon
>>>>
>>>
>>
>
> -- 
>
> Walter Daniel Gallegos
> Programmable Logic & Software
> Consulting, Design, Training.
> Montevideo, Uruguay
> EMAIL walter at waltergallegos.com
> Tel +598 26 23 44 60 | Cel +598 99 18 58 88
>
>
> This e-mail and any attachment is confidential and is intended solely for the
> addressee(s). If you are not intended recipient please inform the sender
> immediately, answering this e-mail and delete it as well as the attached files.
> Any use, circulation or copy of this e-mail by any person or entity that is not
> the specific addressee(s) is prohibited.
>
>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 6951 bytes
Desc: not available
URL: <http://lists.inf.ethz.ch/pipermail/oberon/attachments/20160223/7c4a806d/attachment.png>

