skulski at pas.rochester.edu
Fri Feb 15 22:42:31 CET 2019
Paul Reed [paulreed at paddedcell.com] wrote:
> But surely this all misses the point - two cycles are needed for a load
> and store because these operations require both an instruction fetch and a
> memory read/write in the same instruction, to/from the same memory. So
> that's why it seems to me that putting a cache in the way would just slow
> things down, not speed them up.
Harvard architecture? Imagine a system core consisting of the most commonly used code. Put the code into BRAM. Now imagine the most commonly used data in its own BRAM. This is Harvard. Now you are doing two things in one cycle.
Imagine that the "less commonly used" code and data are in external storage where you need two cycles. Do not cache them: do not move them into BRAM, because then BRAM would become a cache. Just leave the "less commonly used" stuff outside the core BRAM memories. A smart linker/loader would help, if it can identify the "more commonly used" and "less commonly used" code and data.
Am I not reinventing the L1 / L2 / L3 architecture? Blackfin works this way. It has some small and fast L1 RAM operating at the CPU speed. It has a much larger L2 operating at half the CPU speed. Both L1 and L2 are on chip. The external DRAM works as the L3 memory.