<div dir="ltr">Skulski, good day to you.<div><br><div>before I address some issues you raise I want to ask you a question:</div><div>I seem to think that you very much need some kind of a real time response out of the software.</div><div>Where does the Oberon garbage collection fit into your scheme of things? I takes quite some time you know (relatively speaking).</div><div><br></div><div>Otherwise let me think a bit, I am sure things can be arranged. The runtime will need to be chopped perhaps but that is reasonably normal in this type of setup. But I would definitely balk at hacking the compiler, although it has been done in the past.</div><div><br></div><div>> These "toys" are not toys for an electrical engineer<br></div><div><br></div><div>Well I have been around, nothing human is a big surprise these days. And I just spend a few months arranging Ada runtimes that _did_ affect the way the language behaves. since that is the only way to do certain things in that language. </div><div>For the time being, unless convinced otherwise, I do not think we need to do that here. there will be a much more elegant solution.</div><div><br></div><div>j.</div><div><br></div><div>"No, no, you're not thinking; you're just being logical." - Niels Bohr.<br></div><div><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Aug 27, 2015 at 4:52 PM, <span dir="ltr"><<a href="mailto:skulski@pas.rochester.edu" target="_blank">skulski@pas.rochester.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Juerg:<br>
<span class="">> 1) sometimes it is necessary to put data in fast memory<br>
> (I think you mean BRAM)<br>
</span>> 2) you would like to process your ADC data (stored in BRAM)<br>
> in Oberon.<br>
<br>
Let me make it clear for those in the audience who are new to FPGA design.
BRAM is a dual-port memory. It has many uses, which I can discuss if there
is interest. Here we are talking of data acquisition (DAQ) with a few
nanoseconds per data sample. A typical high-performance DAQ design uses
BRAM port A to write data from the converter, and it uses port B to access
the data when the DAQ stops. (Think of a digital oscilloscope.) The data
needs to be accessed in situ. Copying it to slow memory would render the
instrument slow and unusable.

The bottom line: there is a need to map a BRAM into the processor's memory
space. This has been discussed by Jan (thank you, Jan!). His prescription is
low level. OK, so be it. But let me make it clear that moving the data is
not permitted. Accessing the data without moving it is what is needed.

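To make this concrete: a minimal Oberon-07 sketch of such in-situ access,
with a purely hypothetical BRAM base address (the real address is whatever
the Verilog address decoder assigns). SYSTEM.GET reads the words in place;
nothing is copied.

  MODULE BRAMView;  (* sketch only; BRAMBase is an assumed, hypothetical mapping *)
    IMPORT SYSTEM;
    CONST BRAMBase = 0D0000H;  (* assumption: BRAM mapped here by the Verilog decoder *)
      Words = 256;             (* assumption: buffer length in 32-bit words *)

    (* sum the captured samples directly in the mapped BRAM, no copy to SRAM *)
    PROCEDURE Sum*(): INTEGER;
      VAR i, x, s: INTEGER;
    BEGIN s := 0;
      FOR i := 0 TO Words-1 DO
        SYSTEM.GET(BRAMBase + 4*i, x);  (* read one word in situ *)
        s := s + x
      END;
      RETURN s
    END Sum;

  END BRAMView.
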
Furthermore, the code which implements the data processing needs to be
executed from BRAM as well. (It will be a different block of BRAM.) I think
that RISC5 can run at some 100+ MHz when it executes from BRAM. We have
looked at the RISC5 Verilog and we see that it needs to be reworked in this
direction. Hopefully it can reach 100+ MHz from BRAM. Then the slow code can
be put into off-chip SRAM and we can use the stall signal to slow the
processor down to match the SRAM.

From this discussion it follows that we need to map both the variables and
the code to particular addresses. The mechanism for doing this needs to be
specified somehow. Of course we can take the Oberon System and hack it, but
this would be ugly. I am hoping that a sort of agreed-upon approach will
emerge from this discussion.

> Oberon works with what you call slow memory. So if you want to process
> your fast data with the slow Oberon, why not put the data in the slow
> memory in the first place?

Performance will be unacceptable. In our design we will use "ping-pong
buffers" implemented in BRAM. While one BRAM buffer acquires data, the
previous data in the other buffer is processed in situ.

> - Jan proposes a copy approach BRAM to SRAM

Unacceptable, for performance reasons.

> and then process by Oberon - Chris proposes a mapped memory approach.

This is the way.

> To understand the Oberon memory layout have a look at figure 8.1 in
> chapter 8 of Project Oberon.
> You see basically that the Oberon system splits memory in four blocks:
> A memory for module code
> B memory for procedure variables (called stack)
> C memory for dynamic variables allocated with NEW (called heap)
> D memory for IO (display frame buffer and IO registers)

We need to statically allocate variables to HW addresses. It would be really
nice to have this facility both for scalar variables (aka "registers") and
for arrays which will be overlaid on BRAM blocks.

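Until such a facility exists, the usual workaround is named address
constants plus SYSTEM.GET/PUT, which at least keeps the hardware addresses
in one place. A sketch, with a purely hypothetical control register address:

  MODULE HWReg;  (* sketch only; CtrlReg is a hypothetical address *)
    IMPORT SYSTEM;
    CONST CtrlReg = 0D0C00H;  (* assumption: control register mapped here in Verilog *)

    PROCEDURE SetGain*(g: INTEGER);
    BEGIN SYSTEM.PUT(CtrlReg, g)  (* write the scalar "register" directly *)
    END SetGain;

    PROCEDURE Gain*(): INTEGER;
      VAR g: INTEGER;
    BEGIN SYSTEM.GET(CtrlReg, g);
      RETURN g
    END Gain;

  END HWReg.
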
Furthermore, we need the facility to specify that certain code is executed
from BRAM to gain performance. This can be specified per module, in keeping
with the modular spirit of the language. A finer granularity would be
complicated and is not necessary.

Here we are talking of a factor of 5x in performance between BRAM and
external SRAM. This is not a small optimization. It is quite crucial.

At present there is the single MODULE* which is locked into BRAM, so we are
almost there. I hope that we can use the MODULE* as a "system library" of
sorts, where the performance-critical code will be put and called by the
"slow code" that lives in SRAM. However, this is a hack. It would be really
nice if the memory allocation could be specified for regular modules as
well, on a per-module basis.

> This layout is flexibly established during booting by two constants in the
> boot record, called "heapOrg" and "heapLimit".
> A starts at 0 and grows upwards
> B starts at heapOrg and grows downwards
> C starts at heapOrg and grows upward til heapLimit
> D starts at heapOrg+heapLimit
> Now, I think you don't like Jan's copy approach. So you could do the
> following memory map approach: Reduce "heapLimit" and map the memory range
> just below the VGA framebuffer to your BRAM (this mapping has to be done
> in Verilog).
> You can declare in Oberon a safe TYPE for your ADC data (e.g. POINTER TO
> ARRAY 256 OF BYTE) and assign the start address of your mapped BRAM
> memory to a pointer variable.

I think this is all good. It is a bit of a hack put on top of the classic
design, but it looks workable to me. Thank you for the suggestions.

Jan wrote:

>> But we can do without a version for every toy about town

I am discussing features that may seem exotic to a computer scientist. These
"toys" are not toys for an electrical engineer working with FPGAs. They are
in fact fundamental in the realm of FPGA design. My point is that the FPGA
Oberon System is running in the FPGA. It would be good to know how FPGA
Oberon can help use the FPGA to the fullest extent. If it does, then it will
become a much more attractive tool.

Thank you,
<div class="HOEnZb"><div class="h5">Wojtek<br>
<br>
<br>
--<br>
<a href="mailto:Oberon@lists.inf.ethz.ch">Oberon@lists.inf.ethz.ch</a> mailing list for ETH Oberon and related systems<br>
<a href="https://lists.inf.ethz.ch/mailman/listinfo/oberon" rel="noreferrer" target="_blank">https://lists.inf.ethz.ch/mailman/listinfo/oberon</a><br>
</div></div></blockquote></div><br></div>