Reading from ROM, LUP and trampolines

marcelk · Post by **marcelk** » 20 Mar 2020, 07:11

pdr0663 wrote: ↑19 Mar 2020, 23:29 Can you perhaps explain trampolining a little...?

Let's take some steps back first before jumping to the trampolines. Originally in the breadboard computer the [Y,X++] mode was supposed to serve two purposes:

Purpose 1. Streaming pixels to OUT

Code:

ld [Y,X++],OUT
ld [Y,X++],OUT
ld [Y,X++],OUT
...

Or, for robustness, it is better to force the sync bits (bits 6 and 7) to one with:

Code:

ld $c0
ora [Y,X++],OUT
ora [Y,X++],OUT
ora [Y,X++],OUT
...

We still do the pixel burst in this way.
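
As a rough illustration of what the second burst computes, here is a minimal Python sketch. It is a model, not Gigatron code: `pixel_burst` and `SYNC_BITS` are hypothetical names, and the 8-bit wraparound of X is an assumption of the model.

```python
SYNC_BITS = 0xC0  # bits 6 and 7, as loaded by `ld $c0`

def pixel_burst(ram_page, x, count):
    """Model `ld $c0` followed by `count` times `ora [Y,X++],OUT`.

    ram_page is the 256-byte RAM page selected by Y (hypothetical model).
    Returns the bytes sent to OUT and the final value of X.
    """
    ac = SYNC_BITS
    out = []
    for _ in range(count):
        out.append(ac | ram_page[x])  # result goes to OUT, AC is not written
        x = (x + 1) & 0xFF            # assumed: X is an 8-bit counter
    return out, x

page = [0x3F, 0x00, 0x15] + [0] * 253
sent, x = pixel_burst(page, 0, 3)
print([hex(b) for b in sent], x)
```

Whatever the pixel byte contains, bits 6 and 7 of every value reaching OUT are set, which is the robustness gain mentioned above.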

[
BTW, it helps to think of the st, ld, etc operations as being data-bus centric instead of AC-centric. Many other processors have AC-centric mnemonics, but it doesn't help to think that way here because the Gigatron really isn't like those architectures.

ld pulls data from the data bus through the ALU without further operation such as +, -, ...
ora pulls data from the data bus and logical-ORs it with AC (hence the 'a' in the mnemonic!)
st writes data from the data bus into RAM.

]

Purpose 2. Transfer data from ROM to RAM

Since we have a Harvard architecture, we ran the risk that storing embedded data would become horribly inefficient. In a naive design you need two instructions to transfer a single byte:

Code:

st  $ff,[register]
inc register

That's 32 bits of ROM space for a payload of 8 bits, or 25% packing efficiency. That's fine for a small text-based computer such as the Nibbler, but it explodes in your face if you need to embed graphics data. So we needed something better. In very early HaD logs we still had a `STIX' operation in mind that would use the ALU to increment X after a store. But STIX didn't work out, and we arrived at the current situation where X itself is a counter:

Code:

st $de,[Y,X++]
st $ad,[Y,X++]
st $be,[Y,X++]
st $ef,[Y,X++]
...

This gives 50% packing efficiency and is very fast. You can't expect much better than that without complicating things a lot further. We use this in several places. For example, it is a great way to put a small vCPU routine in memory before executing it.
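
The 25% and 50% figures can be checked with a few lines of Python. This is only a sketch: `emit_store_burst` and `packing_efficiency` are hypothetical helpers, not part of any Gigatron tool, and the 16-bit instruction width is taken from the "32 bits for two instructions" figure above.

```python
INSTRUCTION_BITS = 16  # one ROM word per instruction (32 bits for two)

def emit_store_burst(data):
    """Return one `st $xx,[Y,X++]` line per payload byte."""
    return [f"st ${byte:02x},[Y,X++]" for byte in data]

def packing_efficiency(payload_bytes, instructions):
    """Payload bits divided by the ROM bits spent on them."""
    return (payload_bytes * 8) / (instructions * INSTRUCTION_BITS)

payload = bytes.fromhex("deadbeef")
lines = emit_store_burst(payload)
print("\n".join(lines))

# One instruction per byte with the auto-incrementing X
print(packing_efficiency(len(payload), len(lines)))
# Naive scheme: a store plus an increment for every byte
print(packing_efficiency(len(payload), 2 * len(payload)))
```

The first four generated lines match the listing above; the two ratios printed are 0.5 and 0.25.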

The LUP instruction

We discovered the trampolines in a much later phase, when writing ROM v1 for the kit edition and implementing vCPU. LUP (`lookup') is a vCPU instruction for reading arbitrary bytes from ROM. You put a ROM address in vAC (we're in 16-bit world now), and LUP loads the value of ROM[vAC] into the low byte of vAC and clears the high byte.

It was a complete surprise to us that anything like this was possible at all in our Harvard design. But it works by using a property of the pipelining and the branch delay slot: the single-instruction subroutine for making compact data tables in ROM. The basics of this are explained in Docs/Pipelining.txt and also in the HaD log. I won't repeat it here. Frans Faase also wrote about it in his online diary.

LUP isn't 100.0% random access: it can only read data bytes at offsets 0..250, and only in ROM pages that have a piece of trampoline code at offset 251. LUP jumps to this offset with the desired page offset 0..250 in AC. The trampoline sequence itself is a bit tricky, as it relies on pipelining effects for jumping to the right byte in the page, hopping back into the trampoline and then jumping back into vCPU.

Code:

              0afb fe00  bra  ac          ;+-----------------------------------+
              0afc fcfd  bra  $0afd       ;|                                   |
              0afd 1404  ld   $04,y       ;| Trampoline for page $0a00 lookups |
              0afe e065  jmp  y,$65       ;|                                   |
              0aff c218  st   [$18]       ;+-----------------------------------+
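
To make the hop sequence concrete, here is a toy Python model of the branch delay slot. It is a sketch under simplifying assumptions, not the real ROM source or emulator: instructions are symbolic tuples rather than real opcodes, `build_page` and `run_trampoline` are hypothetical names, and each data byte is assumed to sit in the page as a single `ld $xx` instruction (the single-instruction subroutine mentioned above).

```python
# Toy model of the branch delay slot: after a branch, the instruction that
# directly follows it still executes before control reaches the target.
# Instructions are symbolic tuples, not real Gigatron opcodes.

PAGE = 0x0A00  # the ROM page from the listing above

def build_page(data):
    """Hypothetical helper: data bytes as `ld $xx` at offsets 0..250,
    followed by the five trampoline instructions at offsets 251..255."""
    assert len(data) <= 251
    rom = {PAGE + i: ("ld", b) for i, b in enumerate(data)}
    rom[PAGE + 0xFB] = ("bra_ac",)      # bra  ac
    rom[PAGE + 0xFC] = ("bra", 0xFD)    # bra  $0afd
    rom[PAGE + 0xFD] = ("ld_y", 0x04)   # ld   $04,y
    rom[PAGE + 0xFE] = ("jmp_y", 0x65)  # jmp  y,$65
    rom[PAGE + 0xFF] = ("st", 0x18)     # st   [$18]
    return rom

def run_trampoline(rom, offset):
    """Enter at offset 251 with the wanted page offset in AC."""
    assert 0 <= offset <= 250
    ac, y, ram, trace = offset, 0, {}, []
    pc, next_pc = PAGE + 0xFB, PAGE + 0xFC
    while pc >> 8 == PAGE >> 8:                 # stop once we leave the page
        op, *args = rom[pc]
        trace.append(pc)
        after_next = (next_pc + 1) & 0xFFFF     # default: sequential fetch
        if op == "ld":
            ac = args[0]
        elif op == "ld_y":
            y = args[0]
        elif op == "st":
            ram[args[0]] = ac
        elif op == "bra_ac":
            after_next = (pc & 0xFF00) | ac     # same page, offset from AC
        elif op == "bra":
            after_next = (pc & 0xFF00) | args[0]
        elif op == "jmp_y":
            after_next = (y << 8) | args[0]     # far jump, page taken from Y
        pc, next_pc = next_pc, after_next
    return ram, pc, trace

data = [(7 * i) & 0xFF for i in range(251)]
ram, pc, trace = run_trampoline(build_page(data), 3)
print(hex(ram[0x18]), hex(pc), [hex(a) for a in trace])
```

For offset 3 the model visits $0afb, $0afc, $0a03, $0afd, $0afe and $0aff, then leaves for $0465 with the data byte stored at [$18]. The `ld $xx` at $0a03 runs in the delay slot of the second `bra`, which is what pulls execution straight back into the trampoline.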

Overall, at 26 cycles, LUP is slower than the 'Purpose 2' method above, and we still have 50% packing efficiency. But it allows random ROM access from vCPU-land. And because its runtime is predictable, most (not all) ROM files are stored in this format. With this we can happily load arbitrary data into RAM without disturbing any audio or video signals. Our ROM-to-RAM loader `SYS_Exec_88' is built around LUP. (And incidentally, to confuse things more: SYS_Exec_88 starts by putting a bit of vCPU code on the stack using the 'Purpose 2' method... SYS_Exec_88 is really a vCPU routine disguised as a SYS extension.)

pdr0663 · Post by **pdr0663** » 20 Mar 2020, 08:26

Marcel,

Thanks for the comprehensive and informative reply, much appreciated.

I might leave trampolining alone for the time being, as STORE D, [Y:X++] serves my needs for ROM to RAM transfer, and I'll have room in RAM for LUTs I need.

The trampolining is a truly insightful process, a credit to some very clever people who've embraced the Gigatron.

Paul

tocksin · Post by **tocksin** » 31 Mar 2020, 12:21

Even though I understood the post-branch instruction execution, I never quite understood the purpose of the trampoline function in the assembler. Thanks for the explanation.
