<div dir="ltr">Skulski, good day to you.<div><br><div>Before I address some of the issues you raise, I want to ask you a question:</div><div>It seems to me that you very much need some kind of real-time response from the software.</div><div>Where does the Oberon garbage collection fit into your scheme of things? It takes quite some time, you know (relatively speaking).</div><div><br></div><div>Otherwise, let me think a bit; I am sure things can be arranged. The runtime will perhaps need to be chopped, but that is reasonably normal in this type of setup. I would, however, definitely balk at hacking the compiler, although it has been done in the past.</div><div><br></div><div>&gt; These &quot;toys&quot; are not toys for an electrical engineer<br></div><div><br></div><div>Well, I have been around; nothing human is a big surprise these days. And I just spent a few months arranging Ada runtimes that _did_ affect the way the language behaves, since that is the only way to do certain things in that language.</div><div>For the time being, unless convinced otherwise, I do not think we need to do that here. There will be a much more elegant solution.</div><div><br></div><div>j.</div><div><br></div><div>&quot;No, no, you&#39;re not thinking; you&#39;re just being logical.&quot; - Niels Bohr.<br></div><div><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Aug 27, 2015 at 4:52 PM,  <span dir="ltr">&lt;<a href="mailto:skulski@pas.rochester.edu" target="_blank">skulski@pas.rochester.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Juerg:<br>

<span class="">&gt; 1) sometimes it is necessary to put data in fast memory<br>

&gt; (I think you mean BRAM)<br>

</span>&gt; 2) you would like to process your ADC data (stored in BRAM)<br>

&gt;     in Oberon.<br>

<br>

Let me make it clear for those in the audience who are new to FPGA<br>

design. BRAM is a dual-port memory. It has many uses, which I can discuss<br>

if there is interest. Here we are talking about data acquisition (DAQ) with a<br>

few nanoseconds per data sample. A typical high-performance DAQ design<br>

uses BRAM port A to write data from the converter, and port B to<br>

access the data when the DAQ stops. (Think of a digital oscilloscope.) The<br>

data needs to be accessed in situ. Copying it to slow memory would render<br>

the instrument slow and unusable.<br>

<br>

The bottom line: there is a need to map a BRAM into the processor&#39;s memory<br>

space. This has been discussed by Jan (thank you, Jan!). His prescription<br>

is low level. OK, so be it. But let me make it clear that moving the data<br>

is not permitted. Accessing the data without moving it is necessary.<br>

<br>

Furthermore, the code that implements the data processing needs to be<br>

executed from BRAM as well (from a different block of BRAM). I<br>

think that RISC5 can run at some 100+ MHz when it executes from BRAM. We<br>

have looked at the RISC5 Verilog and we see that it needs to be reworked<br>

in this direction; hopefully it can then reach 100+ MHz from BRAM. The<br>

slow code can then be put into off-chip SRAM, and we can use the stall signal to<br>

slow the processor down to match the SRAM.<br>

<br>

From this discussion it follows that we need to map both the variables and<br>

the code to particular addresses. The mechanism for doing this needs to be<br>

specified somehow. Of course we can take the Oberon System and hack it,<br>

but this would be ugly. I am hoping that some sort of agreed-upon approach<br>

will emerge from this discussion.<br>

<span class=""><br>

<br>

&gt; Oberon works with what you call slow memory. So if you want to process<br>

&gt; your fast data with the slow Oberon, why not put the data in the slow<br>

&gt; memory in the first place?<br>

<br>

</span>Performance will be unacceptable. In our design we will use &quot;ping-pong<br>

buffers&quot; implemented in BRAM. While one BRAM buffer acquires new data, the<br>

previous data in the other buffer is processed in situ.<br>

<span class=""><br>

&gt; - Jan proposes a copy approach BRAM to SRAM<br>

<br>

</span>Unacceptable for performance reasons.<br>

<span class=""><br>

&gt; and then process by Oberon - Chris proposes a mapped memory approach.<br>

<br>

</span>This is the way.<br>

<span class=""><br>

<br>

&gt; To understand the Oberon memory layout, have a look at figure 8.1 in<br>

&gt; chapter 8 of Project Oberon.<br>

&gt; You see basically that the Oberon system splits memory into four blocks:<br>

&gt; A memory for module code<br>

&gt; B memory for procedure variables (called stack)<br>

&gt; C memory for dynamic variables allocated with NEW (called heap)<br>

&gt; D memory for IO (display frame buffer and IO registers)<br>

<br>

</span>We need to statically allocate variables to HW addresses. It would be really<br>

nice to have this facility both for scalar variables (a.k.a. &quot;registers&quot;)<br>

and for arrays that will be overlaid on BRAM blocks.<br>

<br>

Furthermore, we need a facility to specify that certain code is executed<br>

from BRAM to gain performance. This can be specified per module, in keeping<br>

with the modular spirit of the language. A finer granularity would be<br>

complicated and unnecessary.<br>

<br>

Here we are talking about a factor of 5 in performance between BRAM and<br>

external SRAM. This is not a small optimization; it is crucial.<br>

<br>

At present there is a single MODULE* which is locked into BRAM, so we<br>

are almost there. I hope that we can use the MODULE* as a &quot;system<br>

library&quot; of sorts, where the performance-critical code will be placed and<br>

called by the &quot;slow code&quot; that lives in SRAM. However, it is a hack. It<br>

would be really nice if the memory allocation could be specified for<br>

regular modules as well, on a per-module basis.<br>

<span class=""><br>

<br>

&gt; This layout is flexibly established during booting by two constants in<br>

&gt; the boot record, called &quot;heapOrg&quot; and &quot;heapLimit&quot;.<br>

&gt; A starts at 0 and grows upwards<br>

&gt; B starts at heapOrg and grows downwards<br>

&gt; C starts at heapOrg and grows upwards until heapLimit<br>

&gt; D starts at heapOrg+heapLimit<br>

&gt; Now, I think you don&#39;t like Jan&#39;s copy approach. So you could take the<br>

&gt; following memory-map approach: reduce &quot;heapLimit&quot; and map the memory<br>

&gt; range just below the VGA framebuffer to your BRAM (this mapping has to be<br>

&gt; done in Verilog).<br>

&gt; You can declare in Oberon a safe TYPE for your ADC data (e.g. POINTER TO<br>

&gt; ARRAY 256 OF BYTE) and assign the start address of your mapped BRAM<br>

&gt; memory to a pointer variable.<br>

<br>

</span>I think this is all good. It is a bit of a hack put on top of the classic<br>

design, but it looks workable to me. Thank you for the suggestions.<br>

<span class=""><br>

Jan wrote:<br>

<br>

&gt;&gt; But we can do without a version for every toy about town<br>

<br>

</span>I am discussing features that may seem exotic to a computer scientist.<br>

These &quot;toys&quot; are not toys for an electrical engineer working with FPGAs.<br>

They are in fact fundamental in the realm of FPGA design. My point is that<br>

the FPGA Oberon System is running in the FPGA. It would be good to know<br>

how FPGA Oberon can help to use the FPGA to the fullest extent. If it<br>

does, it will become a much more attractive tool.<br>

<br>

Thank you,<br>

<div class="HOEnZb"><div class="h5">Wojtek<br>

<br>

<br>

--<br>

<a href="mailto:Oberon@lists.inf.ethz.ch">Oberon@lists.inf.ethz.ch</a> mailing list for ETH Oberon and related systems<br>

<a href="https://lists.inf.ethz.ch/mailman/listinfo/oberon" rel="noreferrer" target="_blank">https://lists.inf.ethz.ch/mailman/listinfo/oberon</a><br>

</div></div></blockquote></div><br></div>