Re: New vCPU instructions 2.0
Posted: 07 Jul 2021, 02:51
I just ran some experiments with a batch of indirect-indexed instructions.
The encoding is as follows:
PREFIX VAR OPCODE OFFSET
where PREFIX = $B1 (which is at67's PREFX1) and OPCODE is one of LD/LDW/ST/STW/ADDW/SUBW/ANDW/ORW/XORW. Instead of accessing a 16-bit variable at address OFFSET in page zero, these instructions now use [ [VAR] + OFFSET ]. This is useful in the C compiler to access local variables allocated on the stack, e.g., LDW([SP,offset]), and also to access fields in a structure pointed to by a register variable, e.g., ANDW([StructPtr,FieldOffset]).
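To make the [ [VAR] + OFFSET ] addressing concrete, here is a minimal Python model of it. This is only a sketch, not the ROM implementation: the helper names, the page-zero address chosen for SP, and the treatment of OFFSET as an unsigned byte are all assumptions for illustration, and the model ignores any page-boundary behaviour of the real hardware.

```python
# Hypothetical model of indirect-indexed addressing; not the actual ROM code.
MEM_SIZE = 0x10000

def deek(mem, addr):
    """Read a little-endian 16-bit word from memory."""
    return mem[addr & 0xFFFF] | (mem[(addr + 1) & 0xFFFF] << 8)

def doke(mem, addr, value):
    """Write a little-endian 16-bit word to memory."""
    mem[addr & 0xFFFF] = value & 0xFF
    mem[(addr + 1) & 0xFFFF] = (value >> 8) & 0xFF

def effective_address(mem, var, offset):
    """Compute [VAR] + OFFSET, where var is the page-zero address of a pointer.

    Assumption: offset is an unsigned byte and the sum wraps at 16 bits.
    """
    return (deek(mem, var) + offset) & 0xFFFF

def ldw_indexed(mem, var, offset):
    """Model of LDW([var,offset]): returns the word loaded into vAC."""
    return deek(mem, effective_address(mem, var, offset))

def stw_indexed(mem, var, offset, vac):
    """Model of STW([var,offset]): stores vAC at the computed address."""
    doke(mem, effective_address(mem, var, offset), vac)

mem = bytearray(MEM_SIZE)
SP = 0x30                        # hypothetical page-zero slot holding the C stack pointer
doke(mem, SP, 0x7F00)            # stack frame starts at $7F00
stw_indexed(mem, SP, 6, 0x1234)  # store a local variable at frame offset 6
local = ldw_indexed(mem, SP, 6)  # read it back
```

The same two helpers cover the structure-field case: `var` is then the register variable holding the structure pointer and `offset` is the field offset.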
This comes at a cost of an additional 42-44 cycles, which can be split in various ways: the PREFIX instruction does the full address calculation if it has enough time; otherwise it delegates the addition to a restart. Once the address is computed (and stored in vLR), a final restart runs the actual instruction. This overhead is quite good because it is the same as computing the address with LDI(offset);ADDW(var). The code size benefit is quite small with LD/LDW, because one could do LDI(offset);ADDW(var);PEEK/DEEK(), but much more significant with STW or ADDW, because one replaces things like LDI(offset);ADDW(var);STW(tmpvar); <compute-something-in-vAC> ; DOKE(tmpvar) with a simple <compute-something-in-vAC> ; STW([var,offset]).
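The code size comparison can be tallied roughly as follows. This is a back-of-the-envelope sketch that assumes the usual vCPU instruction sizes (two bytes each for LDI, ADDW, STW and DOKE, one byte for PEEK/DEEK) and four bytes for the prefixed form, i.e. one byte per field of PREFIX VAR OPCODE OFFSET. The <compute-something-in-vAC> part is the same in both variants and is left out.

```python
# Assumed instruction sizes in bytes (opcode plus operand, if any).
SIZES = {"LDI": 2, "ADDW": 2, "STW": 2, "DOKE": 2, "DEEK": 1, "PEEK": 1}

# Assumption: PREFIX, VAR, OPCODE and OFFSET take one byte each.
INDEXED_SIZE = 4

def size(sequence):
    """Total size in bytes of a sequence of instruction mnemonics."""
    return sum(SIZES[op] for op in sequence)

# Load: LDI(offset);ADDW(var);DEEK()  versus  LDW([var,offset])
load_classic = ["LDI", "ADDW", "DEEK"]

# Store: LDI(offset);ADDW(var);STW(tmpvar); ... ;DOKE(tmpvar)  versus  STW([var,offset])
store_classic = ["LDI", "ADDW", "STW", "DOKE"]

load_saving = size(load_classic) - INDEXED_SIZE
store_saving = size(store_classic) - INDEXED_SIZE

print("load :", size(load_classic), "bytes vs", INDEXED_SIZE, "saves", load_saving)
print("store:", size(store_classic), "bytes vs", INDEXED_SIZE, "saves", store_saving)
```

Under these assumptions the load case saves a single byte while the store case halves the sequence, which matches the observation that the benefit shows up mostly with STW and the read-modify instructions.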
The total gain with the C compiler is about 3-5% extra code size reduction with respect to at67's new instruction set. This is smaller than I expected because the C compiler often finds a way to use DEEKA/DOKEA/DEEKV/DOKE relatively efficiently and aggressively promotes local variables to registers. When it fails to promote, it resorts to using stack variables in a manner that costs a lot of opcodes, so indirect-indexed addressing helps a lot there. But when the compiler works well, or when the programmer uses the keyword 'register' smartly, the gain is more limited.
Another question is the potential gain with respect to the v5a instruction set. Without the competition of DOKEA/DEEKA/DEEKV, the benefits of indirect-indexed addressing are a lot more obvious.
Overall I believe this is a good idea. The implementation might have to be refined. In particular I am not sure at67 would like the idea of completely taking over the PREFX1 instruction page for just 8 instructions. I need to sleep on this...
After mulling over these results, I concluded that this would be a nice improvement over rom v5a, but a much less compelling one over at67's rom, once released.