[Oberon] oberonnet of things

Wed May 27 06:09:04 CEST 2015

> As I don't want to modify the existing behavior of the RISC5, my
> intention is to extend the IO addresses by one additional address,
> namely -24. Simply said: By writing to this address with
> "SYSTEM.PUT(-24, x)" I will put one byte on the Ethernet interface.
> When I read from this address with "SYSTEM.GET(-24, x)",
> I will get one byte from the Ethernet interface.
> The actual Oberon code to drive the wiz550io is a little more complex
> than that, but this gives you an idea.
>
> Now to make this happen I need some "HW glue logic" to map the Oberon
> address -24 to the underlying HW mechanism of the Pipistrello board
> (FPGA) and the wiz550io. This HW behavior is defined in Verilog; I hence
> adapt "RISC5Top.v" accordingly.

It is interesting to recall how Cypress achieved near-full USB 2.0 speed in
their EZ-USB-FX devices, which combined an 8-bit 8051 with some
programmable logic. (They did not open up the programmable logic to
users, and perhaps they implemented it in ASIC silicon, but it looks very
much like a small FPGA-on-chip.) The 8051 ran at 48 MHz with 4 clocks
per instruction cycle, which translates into 12 MIPS. The transfer rate was
about 30 MB/s despite this low CPU power.
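A quick back-of-envelope check of these figures shows why the CPU could not have moved the data itself: at 12 MIPS, 30 MB/s would mean 2.5 bytes per instruction, whereas even a tight copy loop needs at least one read and one write instruction per byte. The C sketch below only redoes that arithmetic; the function names are illustrative, and the 30 MB/s figure is the one quoted above, not a new measurement.

```c
#include <assert.h>

/* Instruction rate in MIPS for a CPU clocked at clock_hz that needs
 * clocks_per_instr clocks per instruction cycle. */
static double mips(double clock_hz, double clocks_per_instr) {
    assert(clock_hz > 0 && clocks_per_instr > 0);
    return clock_hz / clocks_per_instr / 1e6;
}

/* Bytes the CPU would have to move per instruction if it alone had to
 * sustain bytes_per_s. */
static double bytes_per_instruction(double bytes_per_s, double clock_hz,
                                    double clocks_per_instr) {
    return bytes_per_s / (mips(clock_hz, clocks_per_instr) * 1e6);
}
```

With the numbers above, mips(48e6, 4) gives 12 and bytes_per_instruction(30e6, 48e6, 4) gives 2.5, which no instruction-by-instruction copy can reach.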

The architecture employed zero-copy "endpoint buffers" serviced by the
hardware. Once a buffer had been filled with data for USB, it was
switched to the "endpoint domain" and became inaccessible to the CPU.
The buffer content was pumped through the USB channel, and then the buffer
was returned to the CPU domain. The CPU could either poll a status bit (not
recommended) or receive an interrupt when the buffer was returned for
reuse. There were multiple such buffers, four if I remember correctly.
The CPU could work on filling some buffers while others were being
transmitted by the programmable HW domain.
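The ownership handoff described above can be modelled in a few lines of single-threaded C, assuming a ring of four buffers, each carrying one status field that says whether the CPU or the transfer engine owns it. All names and the 512-byte size are hypothetical, not Cypress's actual registers; on real hardware the status bit would be a volatile hardware register and the engine would run concurrently with the CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBUF    4    /* "four if I remember correctly" */
#define BUFSIZE 512  /* hypothetical buffer size */

/* Who may touch a buffer: models the per-buffer status bit. */
typedef enum { OWNER_CPU = 0, OWNER_ENGINE } owner_t;

typedef struct {
    unsigned char data[BUFSIZE];
    size_t len;
    owner_t owner;
} ep_buffer;

static ep_buffer bufs[NBUF];   /* zero-initialised: all start in the CPU domain */
static int cpu_next;           /* next buffer the CPU will fill (ring order) */
static int engine_next;        /* next buffer the engine will drain */
static size_t bytes_sent;      /* bytes the engine has "transmitted" so far */

/* CPU: poll the status of the next buffer. Returns a pointer the CPU can
 * write into directly (no intermediate copy), or NULL if it is in flight. */
static unsigned char *cpu_acquire(void) {
    ep_buffer *b = &bufs[cpu_next];
    return b->owner == OWNER_CPU ? b->data : NULL;
}

/* CPU: hand the filled buffer to the engine; the CPU must not touch it
 * again until the engine gives it back. */
static void cpu_arm(size_t len) {
    ep_buffer *b = &bufs[cpu_next];
    assert(b->owner == OWNER_CPU && len <= BUFSIZE);
    b->len = len;
    b->owner = OWNER_ENGINE;
    cpu_next = (cpu_next + 1) % NBUF;
}

/* Engine: one step of the transmit state machine. Sends one armed buffer
 * and returns it to the CPU domain; returns false if nothing was armed. */
static bool engine_step(void) {
    ep_buffer *b = &bufs[engine_next];
    if (b->owner != OWNER_ENGINE)
        return false;
    bytes_sent += b->len;          /* stand-in for pumping the bytes out */
    b->owner = OWNER_CPU;
    engine_next = (engine_next + 1) % NBUF;
    return true;
}
```

The CPU loop calls cpu_acquire(), writes into the returned buffer, and calls cpu_arm(); once all four buffers are armed, cpu_acquire() returns NULL until engine_step() hands one back.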

Importantly, these buffers were NOT allocated from general-purpose RAM.
They were preassigned somewhere in silicon, at fixed addresses in the
memory space. The documentation made it clear that the USB transfer was
handled by a state machine in the USB domain.

In the RISC5 setting, such buffers would be built from BRAM, filled by
the RISC5, and then passed into the custody of a state machine that would
pump the data to the Wiznet chip. BTW, the Wiznet W5100 has both an SPI and
a byte-wide interface. The latter is faster than SPI, since it moves a whole
byte per bus cycle rather than one bit per clock.
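The handoff to such a pump, and the reason the byte-wide interface wins, can be illustrated with a small clock-stepped C model. It assumes, purely for illustration, that the byte-wide bus completes one byte per clock while SPI shifts one bit per clock, and it ignores the command and address overhead of real W5100 transactions; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PUMP_IDLE, PUMP_BUSY } pump_state;

/* Hypothetical pump state machine, stepped once per clock by pump_tick(). */
typedef struct {
    pump_state state;
    const unsigned char *buf;  /* BRAM buffer handed over by the CPU */
    size_t len, pos;           /* buffer length, index of current byte */
    unsigned bits_per_clock;   /* 8: byte-wide bus, 1: SPI */
    unsigned bit;              /* bits of buf[pos] already sent */
} pump;

static void pump_start(pump *p, const unsigned char *buf, size_t len,
                       unsigned bits_per_clock) {
    assert(bits_per_clock == 1 || bits_per_clock == 8);
    p->buf = buf;
    p->len = len;
    p->pos = 0;
    p->bit = 0;
    p->bits_per_clock = bits_per_clock;
    p->state = len > 0 ? PUMP_BUSY : PUMP_IDLE;
}

/* One clock: a real pump would drive buf[pos] (or one bit of it) onto the pins. */
static void pump_tick(pump *p) {
    if (p->state != PUMP_BUSY)
        return;
    p->bit += p->bits_per_clock;
    if (p->bit == 8) {             /* current byte fully sent */
        p->bit = 0;
        if (++p->pos == p->len)
            p->state = PUMP_IDLE;  /* buffer drained: return it to the CPU */
    }
}

/* Clocks needed to drain a buffer of len bytes. */
static unsigned long clocks_to_send(const unsigned char *buf, size_t len,
                                    unsigned bits_per_clock) {
    pump p;
    unsigned long clocks = 0;
    pump_start(&p, buf, len, bits_per_clock);
    while (p.state == PUMP_BUSY) {
        pump_tick(&p);
        clocks++;
    }
    return clocks;
}
```

Under these assumptions a 512-byte buffer takes 512 clocks over the byte-wide bus and 4096 over SPI, and in both cases the CPU is free to fill the next buffer while the pump runs.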

Cypress distributed a very neat embedded operating system for their EZ-USB
chips. They called it "Frameworks" rather than an OS, but in fact it was an
OS, with a task scheduler, interrupt handlers, and all the usual stuff. It
was small, very well written in C, and well documented. In a way, it was a
gem of software design.

The lesson from the Cypress design was that to achieve good performance
in hardware one needs to think in hardware. Transferring data between a
buffer and a communication channel is a task better suited to a state
machine than to a CPU: software can never match the speed at which a state
machine runs. The keys to performance are: (1) working with buffers rather
than individual bytes; (2) delegating the transfers to state machines;
(3) having multiple buffers, so that the CPU and the state machines can
work on different tasks in parallel.
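Key (3) can be quantified with a toy tick-level model, assuming fixed times: the CPU needs F ticks to fill a buffer and the state machine D ticks to send one. With a single buffer the two take turns, so n buffers cost n(F + D) ticks; with two or more buffers they overlap and the cost drops to F + D + (n - 1) max(F, D). The function below and its timing model are illustrative assumptions, not a description of the Cypress hardware.

```c
#include <assert.h>

/* Tick-level model of one CPU and one transmit state machine sharing k
 * buffers. Returns the ticks needed to send n buffers. */
static unsigned long simulate(unsigned n, unsigned fill_ticks,
                              unsigned drain_ticks, unsigned k) {
    assert(k >= 1 && fill_ticks >= 1 && drain_ticks >= 1);
    unsigned free_bufs = k, ready = 0, produced = 0, sent = 0;
    unsigned cpu_left = 0, eng_left = 0;   /* remaining ticks; 0 means idle */
    unsigned long t = 0;

    while (sent < n) {
        /* At the start of a tick, each idle unit picks up work if it can. */
        if (cpu_left == 0 && produced < n && free_bufs > 0) {
            free_bufs--;                   /* CPU takes a free buffer */
            cpu_left = fill_ticks;
        }
        if (eng_left == 0 && ready > 0) {
            ready--;                       /* engine takes a filled buffer */
            eng_left = drain_ticks;
        }
        /* Advance both units by one tick; finished buffers change hands. */
        if (cpu_left > 0 && --cpu_left == 0) { ready++; produced++; }
        if (eng_left > 0 && --eng_left == 0) { free_bufs++; sent++; }
        t++;
    }
    return t;
}
```

For example, with F = 3, D = 2 and n = 4, one buffer takes 20 ticks and two buffers take 14. More buffers do not help further in this fixed-time case because the slower side is already busy all the time; they would matter when fill and drain times vary.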

You can also say that Cypress implemented a multicore system with one
slow 8051 core and one or several fast cores (who knows how they did it)
dedicated to sending and receiving buffers. Inter-core synchronization was
done either with status bits polled by the slow CPU or with interrupts
serviced by the slow CPU when buffers became available.
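The interrupt variant of this synchronization can be sketched in C too, assuming a hypothetical completion handler that the fast side invokes whenever it returns a buffer. The handler does the minimum (it queues the buffer index) and the slow CPU collects the index from its main loop. The shared indices are declared volatile, which suits this single-threaded model; a real system would also have to make sure each index update is atomic with respect to the interrupt.

```c
#include <assert.h>

#define NBUF 4  /* power of two, so the modulo stays correct when indices wrap */

/* Hypothetical completion "interrupt" signature. */
typedef void (*buf_done_isr)(int idx);

static int free_q[NBUF];                 /* indices of buffers returned to the CPU */
static volatile unsigned q_head, q_tail; /* single-producer/single-consumer ring */

/* Runs in "interrupt" context: just record which buffer came back. */
static void on_buffer_done(int idx) {
    free_q[q_tail % NBUF] = idx;
    q_tail = q_tail + 1;
}

/* Runs in the slow CPU's main loop: returns a free buffer index or -1. */
static int cpu_take_free(void) {
    if (q_head == q_tail)
        return -1;                       /* nothing returned yet */
    int idx = free_q[q_head % NBUF];
    q_head = q_head + 1;
    return idx;
}

/* Stand-in for the state machine finishing a transfer and raising the IRQ. */
static void engine_finish(int idx, buf_done_isr isr) {
    isr(idx);
}
```

Buffers come back to the CPU in the order the engine finished them, and the CPU never spins on a status bit while nothing is available.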

It would not hurt to study the Cypress design a bit more, because it was
an example of good engineering and excellent documentation. The basic
principles can be employed in RISC5 systems. Working along these lines,
one can also implement multicore RISC5 systems-on-chip.

I think the above discussion points towards an area where RISC5 can offer
unique advantages, namely hardware-software codesign. To get there, one has
to divide the tasks between the hardware and CPU domains, just like Cypress
did. A skilled engineer can design embedded systems running in low-capacity
FPGAs, much like the Cypress SoC, which achieved sufficient performance
despite using a low-end CPU.

To achieve efficient codesign, one has to abandon the illusion that the HW
part can be implemented with just a few lines of portable LOLA or Verilog.
Efficient hardware cannot be casually implemented as "glue register logic".
In modern FPGAs, efficiency can only be achieved when one knows the
underlying hardware resources one is using.

Proof-of-principle HW can be implemented with generic HDL, but efficient
hardware cannot be built this way.

W.