Expanding ROM space
Posted: 14 May 2021, 23:38
I am not sure if this is a new idea on this forum, but it might be possible extend the ROM storage space by 50% or more. That means, according to my estimates, more than 30k bytes of extra room for applications in ROM.
As far as I understand, data is stored in ROM and LUP is the vCPU instruction responsible for fetching the data from ROM. It uses a trampoline mechanism (see the trampoline() definition in asm.py), which roughly works like this:
The double branch (the second one in the branch delay slot of the first) allows the instruction on address [pch,ac] to be executed and then immediately return to "continue_trampoline". Typically, the instruction on target address [pch, ac] will be of the form
where imm is an immediate constant. These instructions can encode 256 possible values (the immediate value is stored on the ac register). Note that we used 2 bytes in ROM (one for the instruction and one for the immediate) to store only 1 byte of information for the LUP.
However, it is possible to encode more information if one allows the use of different instructions. For example, one could use one of the following instructions instead
To detect whether it is ld imm,ac, ld imm,x, and ld imm,y, one could preload ac, x, and y with some sentinel values and detect if any of these registers changed. If no register changed, try with different sentinel values for x and y (one cannot chose the value of ac because ac holds the address of the load instruction at this point). That way we encode 256 * 3 = 768 possible values for a single LUP. In number of bits, it is log(768) which is approximately 9.58 bits of information. Note that the penalty here is a slower LUP, which might take twice as long since we might need to use the trampoline twice.
Approximately 59k entries/addresses of the ROM are dedicated for this type of storage. In other words, 118k bytes of ROM are used for LUP storage. If we can get 1.58 extra bits per byte stored, we would have 11k bytes of extra storage already!
But we can do better than this. If we also allow the following instructions in our encoding:
It is possible to encode 7 * 256 = 1792 possible values, which amounts to log(3840) ~ 10.8 bits of information. That way we can extend the storage space by 2.8 / 8 ~ 35% which amounts to 20k extra bytes.
One thing that I did not yet explain, is how to distinguish between these types of instructions (to completely recover the imm byte and the instruction itself). For that purpose, we need to be able to control the data on page 0 and on another page, say page 1, (in that case we need to preload the register y with this value).
So the algorithm to recover the instruction is as follows:
If we push this idea to its extreme, we can still add some more instructions to our encoding scheme:
In this case, we would have 25 * 256 = 6400 possible values, which amounts to log(6400) ~ 12.64 bits of information. That way we can extend the storage space by 4.64 / 8 ~ 58% which amounts to 34k extra bytes!
Note that I did not include suba imm, because it is impossible to distinguish it from adda imm (since for instance the effect of suba 2 is identical to the effect of adda 0xFE). In addition, I did not include the anda and ora operations, because depending on the value of ac (which we do not control), it might not be possible to extract the value of imm. Figuring out and recovering which instruction was stored in ROM might be a lot of work, since we have to analyze several possible cases and maybe execute the trampoline multiple times, but in theory, it is possible to do.
Maybe the sensible thing to do here is to to keep a fast version of LUP which returns only 8 bits of information per address, but also add a slow version LUP2, which might return more bits of information at the cost of being possibly much slower.
As far as I understand, data is stored in ROM and LUP is the vCPU instruction responsible for fetching the data from ROM. It uses a trampoline mechanism (see the trampoline() definition in asm.py), which roughly works like this:
Code: Select all
label("trampoline")
bra(ac)
bra("continue_trampoline")
label("continue_trampoline")
ld(hi("lupReturn#19"), Y)
jmp(Y, lo("lupReturn#19"))
st(lo("vAC"))
Code: Select all
ld imm
However, it is possible to encode more information if one allows the use of different instructions. For example, one could use one of the following instructions instead
Code: Select all
ld imm, x
ld imm,y
Approximately 59k entries/addresses of the ROM are dedicated for this type of storage. In other words, 118k bytes of ROM are used for LUP storage. If we can get 1.58 extra bits per byte stored, we would have 11k bytes of extra storage already!
But we can do better than this. If we also allow the following instructions in our encoding:
Code: Select all
ld [imm]
ld [y, imm]
ld [imm], x
ld [imm], y
One thing that I did not yet explain, is how to distinguish between these types of instructions (to completely recover the imm byte and the instruction itself). For that purpose, we need to be able to control the data on page 0 and on another page, say page 1, (in that case we need to preload the register y with this value).
So the algorithm to recover the instruction is as follows:
- First write bytes 0, 1, 2, ... , 255 in that order on page 0;
- Execute a trampoline, setting y = 0 and x = 0;
- If the value of ac changed, we know the instruction must be of one the following:
We also know the value of imm (it is the current value of ac). To distinguish between these 3 cases we first modify the byte at imm on page 0 to an arbitrary value imm' different than imm, and execute the trampoline again. If the value in ac changed to imm' we know the instruction is of the form ld imm'. Otherwise, we change the value at [1, imm] to some arbitrary value imm'' (different than imm), and set y = 1 and execute the trampoline again. Depending on the value of ac, we will be able to distinguish between ld imm, and ld [y, imm]. If the value of ac is imm, then we know that the instruction is of the form ld imm, otherwise ac will hold imm'' and the instruction is ld [y, imm]. In fact, instead of executing 2 more trampolines, we can do in only one by modifying the value at [imm] to imm' and the value at [1, imm] to imm'' simultaneously and executing the trampoline with y = 1 only once;
Code: Select all
ld imm ld [imm], ld [y, imm]
- If the value of x (or y) changed, we can do a simpler analysis: we know that the instruction is either:
We just set the value at imm on page 0 to any value imm' different than imm, and execute the trampoline again;
Code: Select all
ld imm, x ld [imm], x
- If none of these values (ac, x, and y) have changed, then we know the instruction must be of the following forms:
where addr is the address of the LUP. We can then try to modify the value at [0], or at [addr], or at [1, addr], to rule out some cases, or set x = 2 and y = 1 before the trampoline, to rule out the other cases. In fact, we can do all of this at the same time: set byte at [0] to some value, byte [addr] to some other value, and byte [1, addr] to yet another value, set x = 2 and set y = 1 and do a trampoline.
Code: Select all
ld addr ld [addr] ld [y, addr] ld 0, x ld [0], x ld 0, y ld [0], y
- If the value of ac changed, we know the instruction must be of one the following:
If we push this idea to its extreme, we can still add some more instructions to our encoding scheme:
Code: Select all
adda imm
xora imm
adda [imm]
suba [imm]
xora [imm]
adda [y, imm]
suba [y, imm]
xora [y, imm]
adda [imm], x
suba [imm], x
xora [imm], x
adda [imm], y
suba [imm], y
xora [imm], y
st [imm]
st [y, imm]
st [imm], x
st [imm], y
Note that I did not include suba imm, because it is impossible to distinguish it from adda imm (since for instance the effect of suba 2 is identical to the effect of adda 0xFE). In addition, I did not include the anda and ora operations, because depending on the value of ac (which we do not control), it might not be possible to extract the value of imm. Figuring out and recovering which instruction was stored in ROM might be a lot of work, since we have to analyze several possible cases and maybe execute the trampoline multiple times, but in theory, it is possible to do.
Maybe the sensible thing to do here is to to keep a fast version of LUP which returns only 8 bits of information per address, but also add a slow version LUP2, which might return more bits of information at the cost of being possibly much slower.