I started experimenting with a 16-bit vCPU stack pointer in the native code firmware about a week ago, because the BASIC compiler needs a proper stack for local variables, proc parameters and recursion.
Currently the BASIC compiler has only 8 bytes of general purpose stack space. That space lives in the all-important zero page RAM and must be shared by nested procs' locals, parameters and recursion. As you can imagine, this is quite limiting, and it can make code hard to write and debug when nesting procs that use any of those features.
I found an unused location in zero page (the reserved byte at 0x04), renamed it vSPH and added the following code to all stack-aware instructions (PUSH, POP, LDLW, etc.):
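ld([vSPH],Y)                    # Y = stack page, read from vSPH
ld([X])   --> ld([Y,X])         # stack reads now use page Y instead of page 0
st([X])   --> st([Y,X])         # stack writes likewise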
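vSPH is also cleared at the point where vSP is initialised to zero (the #17..#19 comments are the cycle numbers within that routine):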
ld(0) #17
st([vSP]) #18 vSP
st([vSPH]) #19 vSPH <-- new instruction
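To make the addressing change concrete, here is a small Python model of PUSH and POP with the extra high byte. It is a sketch, not the ROM code: the class name `StackModel`, the flat 64 KiB `bytearray` and the little-endian layout of vLR on the stack are assumptions for illustration. It also assumes, as the `[Y,X]` addressing above suggests, that only the low byte vSP is adjusted, so the stack wraps within the 256-byte page selected by vSPH. With vSPH = 0 it behaves like the original zero page stack.

```python
# Hypothetical model of the vCPU stack with a high byte (vSPH).
# Assumptions: 64 KiB flat memory, vLR stored little-endian at [vSP],
# and 8-bit vSP arithmetic, so the stack wraps inside page vSPH.


class StackModel:
    def __init__(self, vSPH: int = 0, vSP: int = 0) -> None:
        self.mem = bytearray(0x10000)  # whole 64 KiB address space
        self.vSPH = vSPH & 0xFF        # stack page (the byte at 0x04)
        self.vSP = vSP & 0xFF          # low byte, as in the original vCPU
        self.vLR = 0                   # link register

    def addr(self, offset: int = 0) -> int:
        # Equivalent of [Y,X] with Y = vSPH and X = vSP + offset (8-bit).
        return (self.vSPH << 8) | ((self.vSP + offset) & 0xFF)

    def push(self) -> None:
        # Reserve two bytes, then store vLR with the low byte first.
        self.vSP = (self.vSP - 2) & 0xFF
        self.mem[self.addr(0)] = self.vLR & 0xFF
        self.mem[self.addr(1)] = (self.vLR >> 8) & 0xFF

    def pop(self) -> None:
        # Reload vLR from the top of the stack and release two bytes.
        self.vLR = self.mem[self.addr(0)] | (self.mem[self.addr(1)] << 8)
        self.vSP = (self.vSP + 2) & 0xFF


if __name__ == "__main__":
    cpu = StackModel(vSPH=0x7F)    # hypothetical stack page 0x7F00..0x7FFF
    cpu.vLR = 0x1234
    cpu.push()
    print(hex(cpu.addr(0)))        # 0x7ffe: vSP wrapped from 0x00 to 0xFE
    cpu.vLR = 0
    cpu.pop()
    print(hex(cpu.vLR), cpu.vSP)   # 0x1234 0
```

Because X still comes from vSP, the only cost in this model is selecting the page; the stack itself stays 256 bytes deep but no longer competes for zero page.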
This is the list of instructions I have created, tested and added to an experimental ROM as well as to the assembler and BASIC compiler:
DEC ;(22 cycles), decrements a zero page variable's lower byte, borrow is ignored in the same way that INC ignores carry
DECW ;(28 cycles), decrements a zero page variable's 16-bit value
INCW ;(26 cycles), increments a zero page variable's 16-bit value
MOVQW ;(30 cycles), loads a literal, (0..255), as a 16-bit value into a zero page variable
MOVQ ;(28 cycles), loads a literal, (0..255), as an 8-bit value into a zero page variable
DBNZ ;(28 cycles), decrements and checks for zero on a zero page variable, branching if not zero
XCHG ;(30 cycles), exchanges the bytes of two zero page variables
MOV ;(28 cycles), copies a byte from src to dst, where src and dst are zero page variables
MOVV ;(22 cycles), copies a byte from var to [vAC], where var is a zero page variable
MOVVW ;(30 cycles), copies a word from var to [vAC], where var is a zero page variable
MOVA ;(24 cycles), copies a byte from [vAC] to var, where var is a zero page variable
MOVAW ;(30 cycles), copies a word from [vAC] to var, where var is a zero page variable
NOTW ;(26 cycles), bitwise (one's complement) inversion of any zero page variable
NEGW ;(28 cycles), arithmetic negate of any zero page variable
LSRB ;(28 cycles), logical shift right on any zero page byte
LSLV ;(26 cycles), logical shift left on any zero page word variable
PEEKV ;(28 cycles), reads a byte from the address held in any zero page variable
DEEKV ;(28 cycles), reads a word from the address held in any zero page variable
ADDB ;(28 cycles), adds a literal, (0..255), to a zero page byte variable
SUBB ;(28 cycles), subtracts a literal, (0..255), from a zero page byte variable
PEEKX ;(30 cycles), reads a byte from the address held in any zero page variable, then increments that variable
POKEX ;(28 cycles), writes a byte to the address held in any zero page variable, then increments that variable
POKEI ;(20 cycles), writes an immediate byte, (0..255), to the address contained in [vAC]
DOKEI ;(28 cycles), writes an immediate word, (-32768..32767), to the address contained in [vAC]
TEQ ;(28 cycles), tests a zero page variable for EQ
TNE ;(28 cycles), tests a zero page variable for NE
TGE ;(26 cycles), tests a zero page variable for GE
TLT ;(26 cycles), tests a zero page variable for LT
TGT ;(28 cycles), tests a zero page variable for GT
TLE ;(28 cycles), tests a zero page variable for LE
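For anyone writing an emulator or checking compiler output, a reference model of some of these instructions can be handy. This Python sketch is inferred only from the one-line descriptions above, not from the ROM source. The function names, the 64 KiB `bytearray`, little-endian words, 8-bit wrap-around, DBNZ operating on a byte and NOTW being a bitwise complement are all assumptions. The TEQ..TLE tests are left out because the list does not say what result they produce.

```python
# Hypothetical reference model of some of the new vCPU instructions.
# mem stands in for RAM; zero page variables are addresses 0..255.


def rd16(mem: bytearray, addr: int) -> int:
    # Little-endian word read (assumed, as elsewhere in the vCPU).
    return mem[addr] | (mem[(addr + 1) & 0xFFFF] << 8)


def wr16(mem: bytearray, addr: int, value: int) -> None:
    # Little-endian word write; masking gives 16-bit wrap-around.
    mem[addr] = value & 0xFF
    mem[(addr + 1) & 0xFFFF] = (value >> 8) & 0xFF


def DEC(mem, var):          # low byte only, borrow ignored
    mem[var] = (mem[var] - 1) & 0xFF


def DECW(mem, var):
    wr16(mem, var, rd16(mem, var) - 1)


def INCW(mem, var):
    wr16(mem, var, rd16(mem, var) + 1)


def MOVQ(mem, var, imm):    # 8-bit literal
    mem[var] = imm & 0xFF


def MOVQW(mem, var, imm):   # 0..255 zero-extended to 16 bits
    wr16(mem, var, imm & 0xFF)


def XCHG(mem, a, b):
    mem[a], mem[b] = mem[b], mem[a]


def NOTW(mem, var):         # assumed bitwise (one's complement)
    wr16(mem, var, ~rd16(mem, var))


def NEGW(mem, var):         # two's complement negate
    wr16(mem, var, -rd16(mem, var))


def LSRB(mem, var):         # logical shift right of one byte
    mem[var] >>= 1


def LSLV(mem, var):         # logical shift left of a word
    wr16(mem, var, rd16(mem, var) << 1)


def ADDB(mem, var, imm):
    mem[var] = (mem[var] + imm) & 0xFF


def SUBB(mem, var, imm):
    mem[var] = (mem[var] - imm) & 0xFF


def DBNZ(mem, var) -> bool:  # True means "take the branch"
    mem[var] = (mem[var] - 1) & 0xFF
    return mem[var] != 0


if __name__ == "__main__":
    mem = bytearray(0x10000)   # hypothetical RAM image
    w, i = 0x30, 0x32          # hypothetical zero page variable addresses
    MOVQW(mem, w, 0)
    DEC(mem, w)                # only the low byte wraps: 0x0000 -> 0x00FF
    print(hex(rd16(mem, w)))   # 0xff
    MOVQ(mem, i, 3)
    passes = 1
    while DBNZ(mem, i):        # loop body runs 3 times in total
        passes += 1
    print(passes)              # 3
```

The DEC example shows the difference the list points out between DEC and DECW: DEC leaves the high byte alone, so a word variable at zero becomes 0x00FF rather than 0xFFFF.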
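These existing instructions changed cycle count along the way (the values are in the order the changes happened; the last one is the current count):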
LDW ;(20 cycles) --> (24 cycles)
STW ;(20 cycles) --> (24 cycles)
ADDW ;(28 cycles) --> (32 cycles) --> (28 cycles)
CALL ;(26 cycles) --> (30 cycles)
POP ;(26 cycles) --> (30 cycles)
PUSH ;(26 cycles) --> (30 cycles)
LDWI ;(20 cycles) --> (24 cycles)
ST ;(16 cycles) --> (20 cycles)
LDI ;(16 cycles) --> (20 cycles)
ANDW ;(28 cycles) --> (26 cycles)
ORW ;(28 cycles) --> (26 cycles)
ADDI ;(28 cycles) --> (26 cycles)
SUBI ;(28 cycles) --> (26 cycles)
POKE ;(28 cycles) --> (26 cycles)
ANDI ;(22 cycles) --> (20 cycles)
INC ;(20 cycles) --> (16 cycles) --> (20 cycles)
LD ;(22 cycles) --> (18 cycles) --> (22 cycles) --> (18 cycles)
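Since some counts changed more than once, the net effect is just the first value against the last one on each line. A small Python sketch that parses the list as written and reports the net change per instruction (the `net_changes` helper and the parsing approach are only for illustration):

```python
import re

# The cycle-count history as listed above (first value = original ROM).
HISTORY = """\
LDW ;(20 cycles) --> (24 cycles)
STW ;(20 cycles) --> (24 cycles)
ADDW ;(28 cycles) --> (32 cycles) --> (28 cycles)
CALL ;(26 cycles) --> (30 cycles)
POP ;(26 cycles) --> (30 cycles)
PUSH ;(26 cycles) --> (30 cycles)
LDWI ;(20 cycles) --> (24 cycles)
ST ;(16 cycles) --> (20 cycles)
LDI ;(16 cycles) --> (20 cycles)
ANDW ;(28 cycles) --> (26 cycles)
ORW ;(28 cycles) --> (26 cycles)
ADDI ;(28 cycles) --> (26 cycles)
SUBI ;(28 cycles) --> (26 cycles)
POKE ;(28 cycles) --> (26 cycles)
ANDI ;(22 cycles) --> (20 cycles)
INC ;(20 cycles) --> (16 cycles) --> (20 cycles)
LD ;(22 cycles) --> (18 cycles) --> (22 cycles) --> (18 cycles)
"""


def net_changes(text: str) -> dict[str, tuple[int, int]]:
    """Map each instruction to (original cycles, final cycles)."""
    result = {}
    for line in text.splitlines():
        name = line.split(";", 1)[0].strip()
        counts = [int(n) for n in re.findall(r"\((\d+) cycles\)", line)]
        result[name] = (counts[0], counts[-1])
    return result


if __name__ == "__main__":
    for name, (old, new) in net_changes(HISTORY).items():
        print(f"{name:5} {old:2} -> {new:2} ({new - old:+d})")
```

Read this way, LDW, STW, CALL, POP, PUSH, LDWI, ST and LDI end up 4 cycles slower, ANDW, ORW, ADDI, SUBI, POKE and ANDI 2 cycles faster, LD 4 cycles faster, and ADDW and INC finish where they started.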
It was a lot of work unravelling and re-organising the vCPU interpreter, optimising the old instructions (the ones I could) and creating the new ones. Marcel had prioritised speed and ROM space when coding the original part of this firmware; I prioritised extra instruction slots over everything else, and rather surprisingly that allowed some free optimisations in the old code as well. There are currently between 6 and 9 free instruction slots, so if anyone has suggestions for new vCPU instructions, please add them to this thread.
P.S. You may have noticed that I also changed the vCPU maxTicks count from 28 to 30; this makes room for many instructions that wouldn't have been possible at all otherwise. On all the code I have thrown at it, maxTicks=30 has had no noticeable effect on execution compared to maxTicks=28. Higher values start to have a more dominant effect: 32 reduces code speed by around 10%, 34 by about 15%.
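Why a larger maxTicks can cost speed is easiest to see with a deliberately simplified model: the interpreter only dispatches another instruction while at least maxTicks cycles remain in the current time slice, so whatever is left at the end of the slice goes idle. In this Python sketch, `run_slices`, the slice length and the instruction costs are hypothetical, maxTicks is counted in cycles as in the post, and dispatch overhead is ignored. It shows the mechanism only and is not meant to reproduce the percentages above.

```python
from itertools import cycle


def run_slices(costs, slice_cycles, max_ticks, n_slices):
    """Count instructions completed in n_slices time slices (toy model).

    Assumption: a new instruction is dispatched only while at least
    max_ticks cycles remain in the slice; the leftover cycles are idle.
    Dispatch overhead and the real ROM's slice lengths are ignored.
    """
    # Every instruction must fit within max_ticks; that is what the limit is for.
    assert max(costs) <= max_ticks, "instruction longer than max_ticks"
    stream = cycle(costs)  # repeat the instruction mix indefinitely
    done = 0
    for _ in range(n_slices):
        remaining = slice_cycles
        while remaining >= max_ticks:
            remaining -= next(stream)
            done += 1
    return done


if __name__ == "__main__":
    mix = [16, 20, 22, 24, 26, 28, 30]  # hypothetical instruction costs
    base = run_slices(mix, slice_cycles=148, max_ticks=30, n_slices=1000)
    for m in (30, 32, 34):
        n = run_slices(mix, slice_cycles=148, max_ticks=m, n_slices=1000)
        print(f"max_ticks={m}: {n} instructions ({n / base:.0%} of max_ticks=30)")
```

With a constant 20-cycle instruction and a 100-cycle slice, for example, raising the limit from 28 to 30 changes nothing (four instructions fit either way), while a limit of 44 drops it to three per slice. Which values hurt in practice depends on the real slice lengths and instruction mix, which this toy model does not capture.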