[Oberon] Wojtek's comment

Thu Aug 27 16:52:29 CEST 2015

Juerg:
> 1) sometimes it is necessary to put data in fast memory
> (I think you mean BRAM)
> 2) you would like to process your ADC data (stored in BRAM)
>     in Oberon.

Let me make it clear for those in the audience who are new to FPGA
design. BRAM is a dual-port memory. It has many uses which I can discuss
if there is interest. Here we are talking of data acquisition (DAQ) with a
few nanoseconds per data sample. A typical high performance DAQ design
uses BRAM port A to write data from the converter, and it uses port B to
access the data when the DAQ stops. (Think of a digital oscilloscope.) The
data needs to be accessed in situ. Copying it to slow memory would render
the instrument slow and unusable.

The bottom line: there is a need to map a BRAM into the processor's memory
space. This has been discussed by Jan (thank you, Jan!). His prescription
is low level. OK, so be it. But let me make it clear that moving the data
is not permitted. The data must be accessed without being moved.

Furthermore, the code which implements the data processing needs to be
executed from BRAM as well. (It will be a different block of BRAM.) I
think that RISC5 can run at some 100+ MHz when it executes from BRAM. We
have looked at the RISC5 Verilog and we see that it needs to be reworked
in this direction. Hopefully it can reach 100+ MHz from BRAM. Then the
slow code can be put into off-chip SRAM and we can use the stall signal to
slow the processor down to match the SRAM.

From this discussion it follows that we need to map both the variables and
the code to particular addresses. The mechanism for doing this needs to be
specified somehow. Of course we can take the Oberon System and hack it,
but this would be ugly. I am hoping that a sort of agreed-upon approach
will emerge from this discussion.

> Oberon works with what you call slow memory. So if you want to process
> your fast data with the slow Oberon, why not put the data in the slow
> memory in the first place?

Performance will be unacceptable. In our design we will use "ping-pong
buffers" implemented in BRAM. While one BRAM buffer acquires data, the
previous data in the other buffer is processed in situ.
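
To illustrate the ping-pong idea, here is a minimal host-side sketch in C.
It is only a model: the names (PingPong, acq_buffer, proc_buffer,
swap_buffers, peak) and the buffer depth are made up for illustration, and
on the FPGA the two buffers would be BRAM blocks mapped into the address
space rather than C arrays. The point it shows is that switching buffers
changes an index and moves no samples.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN 8  /* hypothetical depth; a real BRAM block is much larger */

/* Host-side stand-in for two BRAM blocks (assumption: on the FPGA these
   would be mapped memory, not ordinary C arrays). */
typedef struct {
    uint16_t buf[2][BUF_LEN];
    int acq;  /* index of the buffer currently being filled by the ADC */
} PingPong;

/* Buffer that the acquisition side (port A) writes into. */
uint16_t *acq_buffer(PingPong *pp) { return pp->buf[pp->acq]; }

/* Buffer that the processor side (port B) reads in place. */
const uint16_t *proc_buffer(const PingPong *pp) { return pp->buf[1 - pp->acq]; }

/* Exchange the roles after an acquisition completes. Only the index
   changes; no sample is copied. */
void swap_buffers(PingPong *pp) { pp->acq = 1 - pp->acq; }

/* Example of in-situ processing: peak value of a completed buffer. */
uint16_t peak(const uint16_t *data, size_t n) {
    uint16_t max = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] > max) max = data[i];
    }
    return max;
}
```

After swap_buffers, the buffer that was just filled is the one returned by
proc_buffer, and the other one becomes the acquisition target.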

> - Jan proposes a copy approach BRAM to SRAM

Unacceptable for performance reasons.

> and then process by Oberon - Chris proposes a mapped memory approach.

This is the way.

> To understand the Oberon memory layout have a look at figure 8.1 in
> chapter 8 of project oberon.
> You see basically that the Oberon system splits memory in four blocks
> A memory for module code
> B memory for procedure variables (called stack)
> C memory for dynamic variables allocated with NEW (called heap)
> D memory for IO (display frame buffer and IO registers)

We need to statically allocate variables to HW addresses. It would be
really nice to have this facility both for scalar variables (aka
"registers") and for arrays which will be overlaid on BRAM blocks.
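
As an illustration of what I mean by overlaying variables on HW addresses,
here is a minimal sketch in C. Everything in it is an assumption made for
the example: the DaqBlock layout, the names daq_at and sum_samples, and
the sample count. On the FPGA the base address would be a constant fixed
by the address decoder in Verilog; on a host the sketch can be tried by
passing the address of an ordinary variable.

```c
#include <stdint.h>

#define ADC_SAMPLES 256  /* hypothetical sample count */

/* Hypothetical layout of a mapped DAQ block: one status register
   followed by the sample memory. */
typedef struct {
    volatile uint32_t status;               /* scalar "register" */
    volatile uint8_t  samples[ADC_SAMPLES]; /* array overlaid on a BRAM block */
} DaqBlock;

/* Overlay the structure on a given base address. Nothing is allocated
   or copied; the pointer simply names memory that already exists. */
DaqBlock *daq_at(uintptr_t base) { return (DaqBlock *)base; }

/* Sum the samples in place; the data never leaves the mapped block. */
uint32_t sum_samples(const DaqBlock *d) {
    uint32_t s = 0;
    for (int i = 0; i < ADC_SAMPLES; i++) {
        s += d->samples[i];
    }
    return s;
}
```

The volatile qualifiers tell the compiler that the hardware can change
these locations behind its back, so every read goes to the memory.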

Furthermore, we need the facility to specify that certain code is executed
from BRAM to gain performance. This can be specified per module, in keeping
with the modular spirit of the language. A finer granularity would be
complicated and not necessary.

Here we are talking of a factor of 5 in performance between BRAM and
external SRAM. This is not a small optimization. It is quite crucial.

At present there is the single MODULE* which is locked into BRAM, so we
are almost there. I hope that we can use the MODULE* as the "system
library" of sorts, where the performance-critical code will be put and
called by the "slow code" that lives in SRAM. However, it is a hack. It
would be really nice if the memory allocation could be specified for
regular modules as well, on a per-module basis.

> This layout is flexibly established during booting by two constants in the
> boot record, called "heapOrg" and "heapLimit".
> A starts at 0 and grows upwards
> B starts at heapOrg and grows downwards
> C starts at heapOrg and grows upward til heapLimit
> D starts at heapOrg+heapLimit
> Now, I think you don't like Jan's copy approach. So you could do the
> following memory map approach: Reduce "heapLimit" and map the memory range
> just below the VGA framebuffer to your BRAM (this mapping has to be done
> in Verilog)
> You can declare in Oberon a safe TYPE to your ADC data (e.g. POINTER TO
> ARRAY 256 OF BYTE) and allocate the start address of your mapped BRAM
> memory to a pointer variable.

I think this is all good. It is a bit of a hack put on top of the classic
design, but it looks workable to me. Thank you for the suggestions.

Jan wrote:

>> But we can do without a version for every toy about town

I am discussing features that may seem exotic to a computer scientist.
These "toys" are not toys for an electrical engineer working with FPGAs.
They are in fact fundamental in the realm of FPGA design. My point is that
the FPGA Oberon System is running in the FPGA. It would be good to know
how FPGA Oberon can help in using the FPGA to the fullest extent. If it
does, then it will become a much more attractive tool.

Thank you,
Wojtek