[Oberon] RISC-5 and memory
magnus at saanlima.com
Thu Oct 5 19:04:57 CEST 2017
There are two different needs discussed here: more memory and color.
A limited amount of memory can be added without any change to the RISC-5
architecture. I used to sell a 2MB Pepino board well suited for Project
Oberon for $25 more than the 1MB version, and in principle an 8MB
SRAM-based RISC-5 board should be straightforward to design. However,
in my mind SRAM makes no sense from a cost standpoint beyond 8MB. Note
that using non-SRAM memory will have a huge impact on the RISC-5
architecture on many levels: a fairly complex memory controller needs to
be implemented, both instruction and data caches need to be added to
avoid performance loss, and the current deterministic behavior of the
RISC-5 system will be lost once you have a cache-based system, which
might have a big impact on realtime systems.
Adding a color display is a different story since you need both more
memory to store the pixels, and more memory bandwidth to pump out the
pixels to the screen. Today the RISC-5 system runs at 25 MHz so the
single cycle 32-bit SRAM memory can provide up to 100 MB/s of
bandwidth. The display pixel clock is 75 MHz and with 1 bit/pixel the
memory bandwidth requirement is 75 MHz/8 = 9.375 MB/s. This memory
bandwidth is "stolen" from the CPU by stalling it for one clock about
10% of the time. Going to color with say 16 bits/pixel would increase the
memory bandwidth requirement to 150 MB/s (75 MHz * 2 bytes/pixel), which
is way more than the RISC-5 memory architecture can provide and will
force a radical architecture change to a memory system that can provide
much higher bandwidth. And then you will have the performance side
effect of having to render 16 bits/pixel vs. rendering 1 bit/pixel, so
there might be a need to increase the CPU clock rate from 25 MHz to
compensate for the performance loss. All this points to a completely
different architecture instead of RISC-5 if you truly want color. If
you want to stay in a Xilinx FPGA-based system that would mean a switch
to either a Microblaze or a MIPS-based soft-CPU architecture with a
high-performance memory system, which is readily available to use at
a fairly low effort level, but comes at the expense of porting the
Oberon-07 system code to a different CPU architecture. See this forum
thread about running DOOM on Pipistrello, which shows the Pipistrello
board with 64 MB of LPDDR memory that implements a Microblaze 100 MHz
CPU with instruction and data caches and 16 bit/pixel video system that
could be a starting point for an Oberon-07 system implemented on an FPGA
board (sorry about the gun violence in the videos):
Then of course you could decide to ditch the FPGA requirement
altogether and do it on one of the many ARM-based boards out there for
very little money.
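
The bandwidth arithmetic above can be checked with a short Python
sketch. It is only a back-of-the-envelope model: the function and
constant names are made up for illustration and do not come from the
RISC-5 sources, and it treats the pixel clock as running continuously
(blanking intervals are ignored), which is the same simplification used
in the figures above.

```python
# Rough memory-bandwidth model for the figures quoted in this message.
# All names are illustrative assumptions, not part of the RISC-5 design.

CPU_CLOCK_HZ = 25_000_000      # RISC-5 system clock (25 MHz)
BUS_WIDTH_BYTES = 4            # 32-bit SRAM, one word per clock
PIXEL_CLOCK_HZ = 75_000_000    # display pixel clock (75 MHz)


def sram_bandwidth(clock_hz=CPU_CLOCK_HZ, bus_bytes=BUS_WIDTH_BYTES):
    """Peak SRAM bandwidth in bytes/s: one bus-wide word per clock."""
    return clock_hz * bus_bytes


def video_bandwidth(bits_per_pixel, pixel_clock_hz=PIXEL_CLOCK_HZ):
    """Bytes/s the display needs, assuming one pixel per pixel clock."""
    return pixel_clock_hz * bits_per_pixel / 8


def bus_share(bits_per_pixel):
    """Fraction of the peak SRAM bandwidth taken by display refresh."""
    return video_bandwidth(bits_per_pixel) / sram_bandwidth()


if __name__ == "__main__":
    for bpp in (1, 8, 16):
        mb_per_s = video_bandwidth(bpp) / 1e6
        print(f"{bpp:2d} bits/pixel: {mb_per_s:8.3f} MB/s, "
              f"{bus_share(bpp):.1%} of the SRAM bus")
```

At 1 bit/pixel the share comes out a little under 10%, which matches the
stall rate mentioned above; at 16 bits/pixel it exceeds 100%, so the
display alone would need more than the SRAM can deliver.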
On 10/4/2017 7:12 PM, Skulski, Wojciech wrote:
> I am not a HW guru. I just happen to design and build some of the most advanced FPGA-based boards in my corner of the research community, to the extent that some national labs are buying these. But it does not mean that my words should be taken as reference.
> My position is simple. I need color. So I assume that others (or some others) also need color. I need lots of RAM for buffering or histogramming my data. So I assume that others also need some more RAM, sometimes. I heard that more advanced Oberon Systems need more RAM than 1 meg to be implemented. The reason that we do not have System-3 or V4 or Component Pascal on RISC5 is not only strong opinions by some (this too), but simply lack of RAM. So I believe that, while your points are well taken, we need more RAM. Note that some newer commercial boards, like Arty from Digilent, provide lots of RAM, albeit the Arty design leaves a bit to be desired. (I am studying it right now.) Nevertheless, this board exists and is reasonably priced, especially for educational customers. So designing a board is not a necessary precondition for moving forward. This much said, a dedicated Oberon FPGA board would be of some advantage for the community.
> Now concerning the memories. I would consider the following technologies: (1) SRAM, (2) ZBT SRAM, (3) SDRAM, (4) DDR and LPDDR, and (5) DDRx SDRAM. These are listed in the order of increasing performance and also difficulty.
> It is sort of easy to interface SRAM. It is discussed in textbooks. (Either Pedroni or Chu, I forgot which one, devoted an entire chapter to SRAM.) ZBT SRAM is similar. It offers 2x performance. The ZBT interfaces are available from both Xilinx and Altera, as well as Open Cores. Then we enter SDRAM and here things get hairy because of the refresh cycles. I would not venture into this territory on my own. It is wiser to read about SDRAM interfacing. For example:
> If you wish I can also send you a Master Thesis by Mohammad Talal Bonny (131 pages) on SDRAM interfacing (Technische Universität Braunschweig 2002). The author is now an EE professor at the Sharjah University. I am sure he would be happy to offer some advice.
> Finally, we enter the DDR3 territory. An interesting story is here: opencores.org/project,wbddr3. A simple message seems to be "don't". Don't do it yourself. Use Xilinx Core Generator which will interface the hard silicon logic built into Xilinx FPGA banks. This is what Arty is doing. The performance will depend on the fine details of clocking inside the FPGA, as well as board layout. (Arty performance is sort of low because they did not provide the recommended reference voltage despite the Xilinx recommendation. I wonder why.)
> The bottom line: if you want to use high speed DDRx memories at close to their ratings, you cannot achieve it with LOLA-generated Verilog. You need to follow the FPGA manufacturer's hard silicon solution, which is provided with their design tools. Do I need to say it is a total mess? Yes it is. But there is no other route.
> Note that a well designed DDRx controller may offer a sort of a cache which will make it look a bit like SRAM from the CPU perspective. I also think that the RISC5 "stall" can remedy the short hiccups due to refresh cycles, but I have not studied this topic.
> Finally, the video controller which can suck the bitmaps from the SDRAM of any kind. I would start from here: opencores.org/project,vga_lcd. This project by Richard Herveille provides a 46-page manual which seems very well written. I have not studied the actual HDL yet.
> My conclusion is that LOLA-generated Verilog is a great proof of principle and very educational. But if we want to see V4, S3, or CP running on the FPGA (which are worthy goals!), then we need to tackle the subject matter with full repository of available tools and IP cores. Otherwise we will stay where we are now. We can consider building a board with lots of SRAM or ZBT RAM instead of SDRAM or DDRx, but this will get both expensive and unwieldy at more than a few megs.
> Just my two groszes. (A "grosz" is a Polish penny.)
> From: Oberon [oberon-bounces at lists.inf.ethz.ch] on behalf of Jörg [joerg.straube at iaeth.ch]
> Sent: Wednesday, October 4, 2017 5:25 PM
> To: ETH Oberon and related systems
> Subject: [Oberon] RISC-5 and memory
> some remarks on memory.
>> On the other hand, the one megabyte FPGA card just happened to be available. It is not the only possible solution even in the FPGA world. Let me compare memory prices to make it clear. Two 0.5 MB chips type IS61LV25616AL-10TL comprising the 1 megabyte cost $9.26 at DigiKey. A single 512 megabyte chip type AS4C256M16D3LB-12BCN costs $10.59. So we are looking at the cost effectiveness differing by orders of magnitude.
>> These are different technologies. One is simple to use, while the other is much harder. On the other hand, boards using the latter have been built. (For example, Arty from Digilent.) It is not that clear to me that the language definition and compiler technology should stay at the level of asynchronous SRAM rather than advance into the era of DDR3L.
> First let me start with the statement: I’m not the HW guru. So, I can be completely wrong.
> I took your arguments and started to investigate on the reasons why NW decided for SRAM instead of the much larger and cheaper SDRAM.
> The key thing in ProjectOberon is not the language or the compiler, the key topic is his own CPU, the "RISC-5".
> As with the Oberon language, the Oberon OS and the Oberon compiler, NW seems to follow the same principle for the Oberon CPU: make it simple but not simpler.
> I studied the RISC-5 Verilog code and googled a bit because I was wondering how today’s CPUs tackle the fact that SDRAM is MUCH slower than SRAM. I found that today's CPUs implement several optimization techniques, e.g. pipelining. But I think the fundamental way they overcome the low speed of SDRAM is by using caches: either only L1 or a combination of L1 and L2 caches in front of the slow SDRAM.
> To keep the CPU design simple, the RISC-5 implements neither a pipeline nor a two-stage cache for accessing RAM.
> I come to the conclusion that the whole SRAM in ProjectOberon can be seen as one big cache in today’s CPU terminology.
> Or in other words: We would have to add a cache strategy to the RISC-5 environment to use SDRAM.
> If we did that, I have no clue whether we would then be forced to introduce special video RAM as well.
> I’d like to get feedback.
> Oberon at lists.inf.ethz.ch mailing list for ETH Oberon and related systems