[Oberon] RISC5 implementation issues.

Fri Feb 19 18:58:06 CET 2016

Let us concentrate on the problem...

How can we generate an output pulse shorter than the clock period?

You can make a shorter pulse with the help of a DCM and DDR (dual data rate)
output registers; see the attached simulation capture.

CLK25F90 is the DCM clock output with a 90° phase shift.

The VHDL code (without the DCM instantiation) is:

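LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

LIBRARY UNISIM;                -- Xilinx primitive library, provides ODDR2
USE UNISIM.VCOMPONENTS.ALL;

ENTITY PulseShaper IS
     PORT  (CLK25F90 :  IN STD_LOGIC;   -- 25 MHz DCM output, 90° phase shift
              RESET :  IN STD_LOGIC;   -- not used in this fragment
                 WE : IN STD_LOGIC;
             WESRAM : OUT STD_LOGIC    -- write strobe to the SRAM pin
          );
END PulseShaper;

ARCHITECTURE RTL OF PulseShaper IS

    CONSTANT zero : STD_LOGIC := '0';
    CONSTANT one : STD_LOGIC := '1';

    SIGNAL clk90n: STD_LOGIC;

BEGIN

    -- This is valid because the tools use the clock inverter
    -- available in the IOB.
    clk90n <= NOT(CLK25F90);

    -- With the default DDR_ALIGNMENT ("NONE"), Q takes D0 on each rising
    -- edge of C0 and D1 on each rising edge of C1 (half a period later),
    -- so WE reaches the pin only during the second half of each period.
    Dly : ODDR2
    PORT MAP (
       Q => WESRAM,
       C0 => CLK25F90,
       C1 => clk90n,
       CE => one,
       D0 => one,
       D1 => WE,
       R => zero,
       S => zero
    );

END RTL;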
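To see where the pulse lands in time, here is a rough behavioral model of the ODDR2 above. It is an idealized sketch, not a simulation of the Xilinx primitive. It assumes the un-shifted 25 MHz clock rises at t = 0, so CLK25F90 rises at t = 10 ns. It also assumes WE is active low, which is the only way D0 = '1' produces a pulse, and it ignores IOB and DCM delays. The function names are hypothetical.

```python
# Idealized timing model of the ODDR2 in PulseShaper (not the Xilinx primitive).
# Assumptions: the un-shifted 25 MHz clock rises at t = 0 ns, CLK25F90 lags it
# by 90 degrees, C1 = NOT C0, WE is active low, and IOB/DCM delays are ignored.

PERIOD_NS = 40.0   # 25 MHz
PHASE_DEG = 90.0   # DCM CLK90 output


def oddr2_q(t_ns, d0, d1, period=PERIOD_NS, phase_deg=PHASE_DEG):
    """Q follows D0 after a rising edge of C0 and D1 after a rising edge of C1."""
    c0_offset = period * phase_deg / 360.0      # first rising edge of C0
    since_c0 = (t_ns - c0_offset) % period      # time since the last C0 edge
    return d0 if since_c0 < period / 2 else d1  # C1 rises half a period later


def write_low_window(period=PERIOD_NS, phase_deg=PHASE_DEG):
    """(start, end) in ns of the low pulse when D0 = '1' and D1 = WE = '0'."""
    start = period * phase_deg / 360.0 + period / 2   # rising edge of C1
    return start, start + period / 2                  # next rising edge of C0


if __name__ == "__main__":
    print(write_low_window())                            # (30.0, 50.0)
    print([oddr2_q(t, 1, 0) for t in (15, 35, 45, 55)])  # [1, 0, 0, 1]
```

Under these assumptions the strobe is half a clock period (20 ns) wide and runs from 30 ns to 50 ns. It therefore extends 10 ns past the next un-shifted 25 MHz edge. Whether that matters depends on which clock edge updates the SRAM address and data, and the message does not state this.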

Does anyone know whether an FPGA testbench exists for RISC5?

Regards,
Walter

On 2016-02-19 at 11:33, Magnus Karlsson wrote:
> In all fairness, since I have generated and tested code for one way to 
> solve the problem, can you then give us your code proposal for 
> generating the SRAM write signal, with 25MHz and 75MHz generated by a 
> DCM?  The write signal must be asserted after all other SRAM control 
> signals are valid, and be de-asserted before any of the control 
> signals go invalid, and last for at least 5 ns.  The SRAM control 
> signals are generated by the 25MHz clock and last for one clock cycle.
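The three requirements above can be written down as a small checker. The following is a hypothetical Python sketch, not code from the thread. Times are in ns relative to the 25 MHz edge that launches the control signals. Clock-to-output delays are ignored, and "after"/"before" are treated as non-strict. A real design would add margins from the SRAM datasheet. As one candidate, it tests a strobe asserted during the middle third of the 40 ns cycle, one 75 MHz period long.

```python
from fractions import Fraction

# Hypothetical checker for the SRAM write-strobe rules stated above.
# Times are in ns relative to the 25 MHz clock edge that launches the
# control signals; clock-to-output delays are ignored (an assumption).

CPU_PERIOD = Fraction(1000, 25)   # 40 ns
FAST_PERIOD = Fraction(1000, 75)  # 40/3 ns, about 13.33 ns


def write_window_ok(we_start, we_end, ctrl_valid=0, ctrl_invalid=CPU_PERIOD,
                    min_width=5):
    """True if WE is asserted once the controls are valid, released before
    they change again, and stays asserted for at least min_width ns."""
    return (we_start >= ctrl_valid
            and we_end <= ctrl_invalid
            and we_end - we_start >= min_width)


# Candidate: assert WE only during the middle third of the 40 ns CPU cycle.
middle_third = (FAST_PERIOD, 2 * FAST_PERIOD)

if __name__ == "__main__":
    print(write_window_ok(*middle_third))   # True: 13.33 ns to 26.67 ns
    print(write_window_ok(35, 42))          # False: runs past 40 ns
    print(write_window_ok(20, 23))          # False: only 3 ns wide
```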
>
> I will be more than happy to try it out on a board and report the result.
>
> Magnus
>
>
> On 2/19/2016 4:57 AM, Walter Gallegos wrote:
>> Having a solid and coherent clock distribution is fundamental to FPGA 
>> design; my proposal was to keep both 25 MHz and 75 MHz, generated by a DCM.
>>
>> Running everything at 75 MHz is unnecessary; also, the clock-enable 
>> approach adds unnecessary complexity to the design. Making a design 
>> synchronous doesn't necessarily mean using the same clock for the 
>> whole design.
>>
>> Keep using 25 MHz for the core and peripherals; the only concern is 
>> the CDC (clock domain crossing). Since both clocks are generated by 
>> the same DCM, this is a minor problem: corresponding rising edges are 
>> aligned by design in DCMs, so metastability is not an issue. When the 
>> clocks come from DCMs and the input clock (50 MHz) is constrained, the 
>> constraint propagation rules constrain both derived clocks, 25 MHz and 
>> 75 MHz. Barring methodology errors, the whole design is then 
>> constrained from the first to the last register element in each chain. 
>> Beware of combinational logic outside these elements.
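Here is a small, idealized Python check of the edge-alignment argument. With exact fractions, 25 MHz and 75 MHz clocks that start in phase at t = 0 share a rising edge on every third 75 MHz edge. The sketch assumes a common starting phase and ignores DCM output skew and jitter. rising_edges is a hypothetical helper.

```python
from fractions import Fraction

# Idealized check that DCM-derived 25 MHz and 75 MHz clocks have aligned
# rising edges. Exact fractions avoid floating-point rounding; both clocks
# are assumed to rise at t = 0, and DCM skew and jitter are ignored.


def rising_edges(freq_mhz, t_end_ns):
    """Rising-edge times in ns, starting at 0, strictly before t_end_ns."""
    period = Fraction(1000, freq_mhz)
    edges, t = [], Fraction(0)
    while t < t_end_ns:
        edges.append(t)
        t += period
    return edges


slow = rising_edges(25, 120)   # 0, 40, 80
fast = rising_edges(75, 120)   # 0, 40/3, 80/3, 40, ...
slow_set = set(slow)

# Indices of the 75 MHz edges that coincide with a 25 MHz edge.
shared = [i for i, t in enumerate(fast) if t in slow_set]

if __name__ == "__main__":
    print(shared)   # [0, 3, 6]: every third 75 MHz edge
```

Even with aligned edges, the shortest launch-to-capture separation for a 25 MHz to 75 MHz crossing is one 75 MHz period (about 13.33 ns). That is typically the relationship the timing analyzer applies to those paths.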
>>
>> In my opinion, keeping both clocks and taking care of the CDC is the 
>> appropriate solution.
>>
>> Best regards,
>> Walter
>>
>> On 2016-02-18 at 19:58, Magnus Karlsson wrote:
>>> So I have been thinking about this some more and decided to 
>>> modify/update the design to remove all the concerns raised by Walter 
>>> and Wojtek.
>>>
>>> Just to recap, Walter's concern is that the clocks are generated 
>>> using flip-flops and use logic fabric interconnect instead of 
>>> dedicated clocking elements and pathways, and that all clocks should 
>>> be generated by a DCM module instead (DCM = Digital Clock Manager).  
>>> Wojtek's concern is that there are unspecified timing relations 
>>> between the 25MHz and the 75MHz clock domains.
>>>
>>> Both concerns are valid and in my opinion the correct way to fix 
>>> both issues is to make the design completely synchronous. This means 
>>> that all clocked elements in the design (like flip-flops, memories 
>>> etc.) should be clocked with a single clock signal, which in this 
>>> case is the 75MHz clock.  The CPU and I/O subsystem, which before 
>>> was clocked by a separate 25MHz clock, are now also clocked by the 
>>> 75MHz clock but are only enabled to be clocked on every third clock 
>>> cycle.  This means that all "always @ (posedge clk)" statements have 
>>> been changed to include "if (enable) ...", where "enable" is a 
>>> signal that is true on every third clock cycle.  The asynchronous 
>>> SRAM interface is also changed so that the write signal is asserted 
>>> on the middle-third clock phase of the three clock CPU cycle.
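The clock-enable scheme described above can be sketched as a per-cycle schedule. The Python below is an illustration only, not taken from the Pepino sources. The phase numbering, the choice of phase 2 for enable, and the active-low we_n are assumptions. The writes list marks which CPU cycles perform an SRAM write.

```python
# Hypothetical 3-phase schedule for a 75 MHz design in which the CPU only
# advances on every third clock (the "enable" described above). Phase
# numbering, enable on phase 2, and the active-low we_n are assumptions.


def three_phase_schedule(writes):
    """One row per 75 MHz cycle: (cycle, phase, enable, we_n).

    writes[k] says whether CPU cycle k performs an SRAM write."""
    rows = []
    for cycle in range(3 * len(writes)):
        cpu_cycle, phase = divmod(cycle, 3)
        # CPU registers load on the 75 MHz edge that ends phase 2.
        enable = phase == 2
        # Write strobe is low only in the middle third of a write cycle.
        we_n = 0 if (phase == 1 and writes[cpu_cycle]) else 1
        rows.append((cycle, phase, enable, we_n))
    return rows


if __name__ == "__main__":
    for row in three_phase_schedule([True, False]):
        print(row)
```

In hardware, we_n would normally be driven directly by a flip-flop, ideally one packed into the IOB, so the pin cannot glitch while the phase counter changes.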
>>>
>>> While the Verilog changes to do this are very straightforward, one 
>>> complication is that the Xilinx ISE tool used to create the bit 
>>> file for the FPGA does not understand that the CPU and I/O subsystem 
>>> are only clocked on every third clock, and will basically try to make 
>>> the CPU run at 75MHz, which fails since this is too fast for the 
>>> FPGA.  The solution to this problem is to tell the tool that all 
>>> paths in the CPU and I/O subsystem can actually take three 
>>> clocks to complete (these are called multi-cycle paths). With the 
>>> multi-cycle paths added to the .ucf file, the design compiles with no 
>>> timing violations.
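The arithmetic behind the multi-cycle exception is shown in this short Python sketch. A path allowed three 75 MHz periods gets the same 40 ns as a single-cycle path in the original 25 MHz design. Setup time, skew and jitter are ignored, and path_budget_ns is a hypothetical helper. In the .ucf this is usually expressed as a FROM:TO timing specification relaxed to a multiple of the clock's PERIOD constraint. The constraints actually used are in the Pepino repository.

```python
from fractions import Fraction

# Time budget for a register-to-register path, with and without a
# multi-cycle exception. Setup time, clock skew and jitter are ignored.


def path_budget_ns(clock_mhz, cycles=1):
    """Time available to a path that is allowed `cycles` clock periods."""
    return cycles * Fraction(1000, clock_mhz)


if __name__ == "__main__":
    print(f"{float(path_budget_ns(75)):.2f}")  # 13.33: single-cycle at 75 MHz
    print(path_budget_ns(75, cycles=3))        # 40: three-cycle path at 75 MHz
    print(path_budget_ns(25))                  # 40: original 25 MHz design
```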
>>>
>>> With those changes the 75MHz clock is now generated by a DCM and the 
>>> unspecified timing relations that Wojtek brought up are now gone 
>>> since everything is clocked with a single clock.   The modified 
>>> design has been tested on Pepino and seems to run fine.
>>>
>>> The complete ISE project with those changes is available at the 
>>> Pepino GitHub repository: 
>>> https://github.com/Saanlima/Pepino/tree/master/Projects/RISC5Verilog_Pepino
>>>
>>> Any comments or critique are welcome.
>>>
>>> Cheers,
>>> Magnus
>>>
>>>
>>>
>>> On 2/17/2016 2:16 PM, Walter Gallegos wrote:
>>>> Hi Magnus,
>>>>
>>>> You are welcome to continue with FPGA-specific topics by private 
>>>> e-mail if you want.
>>>>
>>>> Regards
>>>> Walter
>>>>
>>>> On 2016-02-17 at 18:30, Magnus Karlsson wrote:
>>>>> Hi Walter,
>>>>>
>>>>> Since this is really Paul's design, I guess it would be more 
>>>>> appropriate to discuss it with him, I was just trying to explain 
>>>>> why it looks like it does.
>>>>>
>>>>> Cheers,
>>>>> Magnus
>>>>>
>>>>>
>>>>> On 2/17/2016 1:15 PM, Walter Gallegos wrote:
>>>>>> Magnus,
>>>>>>
>>>>>> Some messages were delayed, so I am continuing from here so as not 
>>>>>> to overload the list.
>>>>>>
>>>>>> If I understand you correctly, you justify an uncontrolled delay 
>>>>>> because it simplifies the SRAM handling.
>>>>>> Sorry, but that is like using the old circuit with an AND gate and an 
>>>>>> inverter to generate a pulse. If you need a delayed signal, you should 
>>>>>> use the DCM's 90°, 180°, or 270° clock outputs and keep everything 
>>>>>> under control; I don't think you need a state machine in this case.
>>>>>>
>>>>>> About ISE warnings, be careful: the absence of warnings does not mean 
>>>>>> good methodology.
>>>>>>
>>>>>> About the Xilinx docs: honestly, I don't remember. I cover this problem 
>>>>>> in my training courses, first as a Xilinx ATP and now as an independent 
>>>>>> consultant. Having an uncontrolled delay in a clock opens a big door to 
>>>>>> random problems. FPGA design must be synchronous at all times, no 
>>>>>> exceptions.
>>>>>>
>>>>>> Regards,
>>>>>> Walter
>>>>>>
>>>>>>
>>>>>> On 2016-02-17 at 14:41, Magnus Karlsson wrote:
>>>>>>> Walter,
>>>>>>>
>>>>>>> I agree with you that the "pure" way of doing this is as you 
>>>>>>> stated, with a DCM to directly generate both clk and pclk. So 
>>>>>>> how come Paul didn't do that?  It's not like he doesn't know how 
>>>>>>> to use the DCM, after all the current code generates pclk from 
>>>>>>> clk using a DCM, and there would probably be less code to do it 
>>>>>>> like you suggest. No, the reason for this is very subtle and is 
>>>>>>> easy to miss if you just take a quick look at the code, and it 
>>>>>>> has to do with the asynchronous SRAM interface.
>>>>>>>
>>>>>>> One of the most critical aspects of using SRAM is to control the 
>>>>>>> write signal - ideally the write signal should be asserted after 
>>>>>>> all other control signals (like address, data, byte-enable, 
>>>>>>> read, oe) are valid, and should be de-asserted well before any 
>>>>>>> of the other control signals go invalid, to avoid spurious 
>>>>>>> writes. However, this is not that easy to do in a synchronous 
>>>>>>> system where all signals change at the clock edge. The most 
>>>>>>> common way to do this is to have a state machine that is clocked 
>>>>>>> at say 4x the CPU clock so that you can divide the SRAM access 
>>>>>>> cycle into several phases and assert the write signal on some of 
>>>>>>> those phases.
>>>>>>>
>>>>>>> However, this is not the way Paul chose to do it; instead, he 
>>>>>>> chose to do a less "pure" clock generation by generating clk 
>>>>>>> from a flip-flop rather than from a DCM. By doing so, he 
>>>>>>> actually generates an early version of the clock signal called 
>>>>>>> clk that is leading the global clock signal clk_BUFG by the 
>>>>>>> delay of the BUFG buffer.  Since this early version of the clock 
>>>>>>> signal is generated like any other logic signal, he could use 
>>>>>>> this signal to gate the write signal to the SRAM such that write 
>>>>>>> signal will be de-asserted well before the other control signals 
>>>>>>> (clocked by clk_BUFG) will change, and thus avoiding the need to 
>>>>>>> have a state machine controlling the write signal.  The price 
>>>>>>> for this is that the clock signal is now generated in a less 
>>>>>>> "pure" way, but still a valid way as long as you know what you 
>>>>>>> are doing. The BUFG clock driver can be driven from a PLL, a DCM 
>>>>>>> or from the logic fabric. The first two are speed-optimized 
>>>>>>> paths going directly from the PLL or DCM to the BUFG and can be 
>>>>>>> clocked at a much higher rate, while the logic fabric path is 
>>>>>>> limited by the maximum clock rate of the logic fabric. However, 
>>>>>>> at the clock rate we use (25 MHz) this is not an issue.  When 
>>>>>>> you do this there are no warnings generated by ISE that this is 
>>>>>>> not a good idea, and I have not read anywhere in the Xilinx 
>>>>>>> clocking resource guide that you should avoid doing this. 
>>>>>>> Basically, the BUFG clock driver is designed to do this, the 
>>>>>>> tool will allow you to do it and at the clock rate we use it has 
>>>>>>> no performance implications. As I see it, this is another place 
>>>>>>> where the goal of simplification has driven the implementation 
>>>>>>> of the system at the expense of a slightly less "pure" clock 
>>>>>>> generation.
>>>>>>>
>>>>>>> Just my 2c
>>>>>>>
>>>>>>> Magnus
>>>
>>> -- 
>>> Oberon at lists.inf.ethz.ch mailing list for ETH Oberon and related 
>>> systems
>>> https://lists.inf.ethz.ch/mailman/listinfo/oberon
>>>
>>
>

-- 

Walter Daniel Gallegos
Programmable Logic & Software
Consulting, Design, Training.
Montevideo, Uruguay
EMAIL walter at waltergallegos.com
Tel +598 26 23 44 60 | Mobile +598 99 18 58 88


Attachment (simulation capture): ffgabeje.png, image/png, 6951 bytes
URL: <http://lists.inf.ethz.ch/pipermail/oberon/attachments/20160219/7a146b1b/attachment.png>