[Oberon] oberonnet of things

Fri May 29 06:47:57 CEST 2015

Joerg:

W5500 uses a common approach of endpoint buffers internal to the chip.
Another such Ethernet chip is the AX88180 from ASIX (Taiwan), which is a
gigabit MAC. This one has a 32-bit memory bus interface. To my knowledge it
is the only GbE MAC chip that has a RAM-like interface. We use it in our
designs.

Wiznet is better than ASIX for the as-simple-as-possible approach because
it encapsulates the TCP/IP software, while ASIX requires the CPU to run the
SW stack.

Since both ASIX and Wiznet are external to the FPGA, zero-copy would
only mean that within the FPGA you pass the buffers to a state machine,
which then transfers the content to the external chip without further CPU
attention. This relieves the CPU from spending its own cycles on moving the
data. It is the most efficient solution possible in this case.
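
A minimal Python model of that handoff, assuming one ownership bit per buffer. The names `Owner`, `Buffer`, `TransferEngine` and `cpu_submit` are hypothetical, and in the real design the engine would be a state machine in the FPGA, not software. The point it shows: the CPU writes the buffer once and then passes it on by flipping a bit; every further byte movement is done by the engine.

```python
from dataclasses import dataclass, field
from enum import Enum


class Owner(Enum):
    CPU = 0      # the RISC5 may write the buffer
    ENGINE = 1   # the transfer state machine owns the buffer


@dataclass
class Buffer:
    data: bytearray          # stands in for a BRAM block at a fixed address
    length: int = 0
    owner: Owner = Owner.CPU


@dataclass
class TransferEngine:
    """Software stand-in for the FPGA state machine that streams a
    buffer to the external Ethernet chip (hypothetical model)."""
    wire: bytearray = field(default_factory=bytearray)

    def step(self, buf: Buffer) -> None:
        if buf.owner is Owner.ENGINE:
            self.wire += buf.data[:buf.length]   # engine moves the bytes
            buf.length = 0
            buf.owner = Owner.CPU                # hand the buffer back


def cpu_submit(buf: Buffer, payload: bytes) -> bool:
    """Fill a CPU-owned buffer and pass it on by flipping the owner bit."""
    if buf.owner is not Owner.CPU or len(payload) > len(buf.data):
        return False
    buf.data[:len(payload)] = payload
    buf.length = len(payload)
    buf.owner = Owner.ENGINE
    return True


buf = Buffer(bytearray(64))
engine = TransferEngine()
ok = cpu_submit(buf, b"hello")       # CPU hands the buffer over
again = cpu_submit(buf, b"again")    # refused: the engine owns it now
engine.step(buf)                     # state machine drains it
print(ok, again, bytes(engine.wire)) # True False b'hello'
```

While the engine owns a buffer, the CPU is simply refused access, so no locking beyond the single bit is needed.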

Implementing either Wiznet or ASIX will consume quite a few pins because
both chips use parallel interfaces. Either one could be a candidate chip
for the Oberon-specific board. I would prefer Wiznet because it runs its
own TCP/IP inside.

Wojtek

> I agree, this zero-copy design is nice.
> Your approach would be doable if I decided on another Ethernet chip.
> If I see it correctly, the W5500 in the wiz550io comes with its own memory
> and its own state machine.
> So, unfortunately, I cannot take the zero-copy approach, as the RISC cannot
> access this memory directly :-(
>
> Br, Joerg
>
> On 27.05.2015 at 06:09, <skulski at pas.rochester.edu> wrote:
>
>>> As I don't want to modify the existing behavior of the RISC5, my
>>> intention is to extend the IO addresses by one additional address,
>>> namely -24. Simply said: By writing to this address with
>>> "SYSTEM.PUT(-24, x)" I will put one byte on the Ethernet interface.
>>> When I read from this address with "SYSTEM.GET(-24, x)",
>>> I will get one byte from the Ethernet interface.
>>> The actual Oberon code to drive the wiz550io is a little more complex
>>> than that, but this gives you the idea.
>>>
>>> Now to make this happen I need some "HW glue logic" to map the Oberon
>>> address -24 to the underlying HW mechanism of the Pipistrello board
>>> (FPGA) and the wiz550io. This HW behavior is defined in Verilog, so I
>>> adapt "RISC5Top.v" accordingly.
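
A Python sketch of the address decoding this glue logic has to do. It assumes, as in Project Oberon's RISC5Top.v as I read it, that the I/O space is the top 64 bytes of the address space and is selected by word (adr[5:2]), which puts -24 at word index 10. `IOBus` and `EthernetPort` are hypothetical names; the empty-read behavior is also an assumption.

```python
from collections import deque


# Assumption: I/O space is the top 64 bytes, decoded by word (adr[5:2]).
def io_index(adr: int) -> int:
    """Word index inside the I/O block; -64 -> 0, -24 -> 10."""
    return (adr & 0x3F) >> 2


class EthernetPort:
    """Hypothetical byte-wide port for the wiz550io at address -24."""

    def __init__(self) -> None:
        self.tx = deque()   # bytes on their way to the chip
        self.rx = deque()   # bytes received from the chip

    def put(self, x: int) -> None:
        self.tx.append(x & 0xFF)   # only the low byte is used

    def get(self) -> int:
        # Assumption: reading with nothing pending yields 0; real glue
        # logic would need a status bit for this case.
        return self.rx.popleft() if self.rx else 0


class IOBus:
    """Routes SYSTEM.PUT / SYSTEM.GET on I/O addresses to devices."""

    def __init__(self) -> None:
        self.devices = {}

    def attach(self, adr: int, dev: EthernetPort) -> None:
        self.devices[io_index(adr)] = dev

    def put(self, adr: int, x: int) -> None:   # models SYSTEM.PUT(adr, x)
        self.devices[io_index(adr)].put(x)

    def get(self, adr: int) -> int:            # models SYSTEM.GET(adr, x)
        return self.devices[io_index(adr)].get()


bus = IOBus()
eth = EthernetPort()
bus.attach(-24, eth)
for b in b"OK":
    bus.put(-24, b)        # one CPU store per byte
eth.rx.append(0x41)
print(list(eth.tx), bus.get(-24))   # [79, 75] 65
```

Every byte still costs one CPU load or store, which is the per-byte overhead that the endpoint-buffer approach is meant to remove.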
>>
>> It is interesting to recall how Cypress achieved near full USB-2 speed in
>> their EZ-USB-FX devices, which combined an 8-bit 8051 and some
>> programmable logic. (They did not open up the programmable logic to the
>> users, and perhaps they implemented it with ASIC silicon, but it obviously
>> looks like a small FPGA-on-chip.) The 8051 runs at 48 MHz with 4 clocks
>> per instruction, which translates into 12 MIPS. The transfer rate was
>> about 30 MB/s despite this low CPU power.
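
A quick check of these figures (taking MB as 10^6 bytes) shows why the 8051 cannot have been moving the data itself: it would have had less than half an instruction available per byte.

```latex
\frac{48\ \text{MHz}}{4\ \text{clocks/instruction}} = 12\ \text{MIPS},
\qquad
\frac{12 \times 10^{6}\ \text{instructions/s}}{30 \times 10^{6}\ \text{bytes/s}}
  = 0.4\ \text{instructions per byte}
```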
>>
>> The architecture employed zero-copy "endpoint buffers", which were filled
>> by the hardware. The buffer, after it was filled with USB data, was
>> switched to the "endpoint domain" and became inaccessible to the CPU.
>> The buffer content was pumped via the USB channel, and then the buffer was
>> returned to the CPU domain. The CPU could either poll the status bit (not
>> recommended) or receive an interrupt when the buffer was returned for
>> reuse by the CPU. There were multiple such buffers, four if I remember
>> correctly. The CPU could work on filling some buffers while the others
>> were being transmitted by the programmable HW domain.
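
A Python model of such a set of endpoint buffers, assuming four fixed buffers, one status bit each, and a callback standing in for the "buffer returned" interrupt. `EndpointRing`, `cpu_fill` and `engine_step` are hypothetical names, not Cypress APIs.

```python
from collections import deque
from typing import Callable, List, Optional

NUM_BUFFERS = 4   # "four if I remember correctly"; treat as an assumption


class EndpointRing:
    """Fixed buffers shared by a CPU and a transfer engine.
    owned_by_cpu[i] plays the role of the per-buffer status bit."""

    def __init__(self, size: int,
                 on_return: Optional[Callable[[int], None]] = None) -> None:
        self.bufs: List[bytes] = [b""] * NUM_BUFFERS
        self.owned_by_cpu = [True] * NUM_BUFFERS
        self.pending = deque()        # buffers queued for the engine, in order
        self.size = size
        self.on_return = on_return    # "interrupt" handler; None means polling
        self.sent: List[bytes] = []

    def cpu_fill(self, payload: bytes) -> bool:
        """CPU side: fill any buffer it owns and hand it over."""
        for i, mine in enumerate(self.owned_by_cpu):
            if mine:
                self.bufs[i] = payload[:self.size]   # truncated to buffer size
                self.owned_by_cpu[i] = False         # now in the endpoint domain
                self.pending.append(i)
                return True
        return False                                 # all buffers busy

    def engine_step(self) -> None:
        """Engine side: transmit one buffer and return it to the CPU."""
        if self.pending:
            i = self.pending.popleft()
            self.sent.append(self.bufs[i])
            self.owned_by_cpu[i] = True
            if self.on_return:
                self.on_return(i)                    # interrupt instead of polling


returned = []
ring = EndpointRing(size=512, on_return=returned.append)
results = [ring.cpu_fill(bytes([k])) for k in range(5)]
print(results)                 # [True, True, True, True, False]
ring.engine_step()             # engine sends buffer 0 and hands it back
print(returned)                # [0]
print(ring.cpu_fill(b"\x04"))  # True: the CPU reuses buffer 0
```

The fifth fill fails because all four buffers are in the endpoint domain; as soon as the engine returns one, the CPU can refill it while the other three are still being sent.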
>>
>> It is important that these buffers were NOT allocated from the
>> general-purpose RAM. Rather, they were preassigned somewhere in silicon.
>> Their addresses were fixed in the memory space. It was obvious from the
>> documentation that the USB transfer was handled by a state machine in the
>> USB domain.
>>
>> In the current picture such buffers would be crafted from BRAM, filled by
>> the RISC5, and then passed to the custody of a state machine that would
>> pump the data to the Wiznet chip. BTW, the Wiznet W5100 has both an SPI
>> and a byte-wide interface. The latter is obviously faster than SPI.
>>
>> Cypress distributed a very neat embedded operating system for their
>> EZ-USB chips. They named it "Frameworks" rather than OS, but in fact it
>> was an OS, with a task scheduler, interrupt handlers, and all the usual
>> stuff. It was small, very well written in C, and well documented. In a
>> certain way it was a gem of software design.
>>
>> The lesson from the Cypress design was that in order to achieve good
>> performance in hardware one needs to think in hardware. Transferring data
>> in and out of a buffer to a communication channel is a better task for a
>> state machine than for a CPU. Software can never achieve the speed at
>> which a state machine can run. The keys to performance are: (1) working
>> with buffers rather than individual bytes; (2) delegating the transfers
>> to state machines; (3) having multiple parallel buffers so that the CPU
>> and the state machines can execute various tasks in parallel.
>>
>> You can also say that Cypress implemented a multicore system with one
>> slower 8051 core, and one (or several, who knows how they did it) fast
>> cores dedicated to just sending and receiving the buffers. Inter-core
>> synchronization was done either with status bits polled by the slow CPU,
>> or with the interrupts executed by the slow CPU when buffers became
>> available.
>>
>> It would not hurt if the Cypress design were studied a bit more, because
>> it was an example of good engineering and excellent documentation. The
>> basic principles can be employed in RISC5 systems. Working along these
>> lines, one can also implement multicore RISC5 systems-on-chip.
>>
>> I think that the above discussion points towards the areas where RISC5
>> can offer unique advantages, namely joint hardware-software codesign. In
>> order to reach those areas one has to divide the tasks between the
>> hardware and CPU domains, just like Cypress did. A skilled engineer can
>> design embedded systems running in low-capacity FPGAs, somewhat similar
>> to the Cypress SoC, which achieved sufficient performance despite using a
>> low-end CPU.
>>
>> In order to reach efficient codesign one has to part with the illusion
>> that the HW part can be implemented with just a few lines of portable
>> LOLA or Verilog. Efficient hardware cannot be casually implemented as
>> "glue register logic". In modern FPGAs efficiency can only be achieved
>> when one knows the underlying hardware resources one is using.
>>
>> Proof-of-principle HW can be implemented with generic HDL, but efficient
>> hardware cannot be built this way.
>>
>> W.
>>
>>
>> --
>> Oberon at lists.inf.ethz.ch mailing list for ETH Oberon and related systems
>> https://lists.inf.ethz.ch/mailman/listinfo/oberon