About a week ago I started experimenting with the idea of a 16-bit vCPU stack pointer in the native code firmware. The reason is that the BASIC compiler needs a proper stack for local variables, proc parameters and recursion.

Currently the BASIC compiler has 8 bytes of general purpose stack space in the all-important zero page RAM, and these must be shared by nested procs' locals, params and recursion. As you can imagine, this is somewhat limiting and can make your code quite hard to write and debug when nesting procs that use any of those features.

I found an unused location within zero page (the reserved byte at 0x04), renamed it to vSPH and added the following code to all stack-aware instructions: PUSH, POP, LDLW, etc.


```
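ld([vSPH],Y)            # new: load the stack high byte (page) into Y
ld([X]) --> ld([Y,X])   # stack reads now go through the full 16-bit address
st([X]) --> st([Y,X])   # stack writes likewise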
```


```
ld(0) #17
st([vSP]) #18 vSP
st([vSPH]) #19 vSPH <-- new instruction
```
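To make the addressing change concrete, here is a minimal Python model of a stack that is reached through a page byte plus an offset byte. It is only a sketch under assumptions: the `StackModel` class and its method names are made up for illustration, PUSH/POP are simplified to "move vSP by 2 and transfer a little-endian word", and only the low byte moves, so the stack wraps inside the 256-byte page selected by vSPH.

```python
# Simplified, hypothetical model of a vCPU stack addressed via vSPH:vSP.
# Not the ROM's actual code; names and behaviour are illustrative assumptions.

class StackModel:
    def __init__(self):
        self.ram = bytearray(65536)
        self.vSP = 0    # existing low byte of the stack pointer
        self.vSPH = 0   # new high byte; 0 keeps the stack in zero page

    def _addr(self, offset=0):
        # Mirrors ld([Y,X]) / st([Y,X]): Y supplies the page, X the offset.
        return (self.vSPH << 8) | ((self.vSP + offset) & 0xFF)

    def push(self, word):
        # Assumption: only the low byte changes, so there is no carry into vSPH.
        self.vSP = (self.vSP - 2) & 0xFF
        self.ram[self._addr(0)] = word & 0xFF
        self.ram[self._addr(1)] = (word >> 8) & 0xFF

    def pop(self):
        word = self.ram[self._addr(0)] | (self.ram[self._addr(1)] << 8)
        self.vSP = (self.vSP + 2) & 0xFF
        return word
```

With vSPH left at 0 the model behaves like the old zero page stack; setting it to any other page (0x06 below is an arbitrary choice) moves the whole stack out of zero page:

```python
s = StackModel()
s.vSPH = 0x06          # arbitrary example page
s.push(0x1234)
s.push(0xBEEF)
print(hex(s.pop()))    # 0xbeef
print(hex(s.pop()))    # 0x1234
```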

This is the list of instructions I have created, tested and added to an experimental ROM as well as to the assembler and BASIC compiler:


```
DEC ;(22 cycles), decrements a zero page variable's lower byte, borrow is ignored in the same way that INC ignores carry
DECW ;(26 cycles), decrements a zero page variable's 16-bit value
INCW ;(26 cycles), increments a zero page variable's 16-bit value
LDWQ ;(30 cycles), loads a literal 0..255 as a 16-bit value into a zero page variable
LDQ ;(26 cycles), loads a literal 0..255 as an 8-bit value into a zero page variable
DBNZ ;(30 cycles), decrements and checks for zero on a zero page variable, branching if not zero
XCHG ;(28 cycles), exchanges bytes of any zero page variables
MOVB ;(28 cycles), copies a byte from [src] to [dst], where src and dst are zero page variables
MOVBA ;(30 cycles), copies a byte from [[src]] to [[vAC]], where src is a zero page variable containing a src pointer and vAC contains a dst pointer
NOTW ;(26 cycles), boolean inversion of any zero page variable
NEGW ;(30 cycles), arithmetic negate of any zero page variable
LSRB ;(28 cycles), logical shift right on any zero page byte
LSLV ;(30 cycles), logical shift left on any zero page word variable
PEEKV ;(28 cycles), read byte from any zero page variable
ADDB ;(28 cycles), adds a literal 0..255 to a zero page byte variable
SUBB ;(28 cycles), subtracts a literal 0..255 from a zero page byte variable
TEQ ;(28 cycles), tests a zero page variable for EQ
TNE ;(28 cycles), tests a zero page variable for NE
TGE ;(26 cycles), tests a zero page variable for GE
TLT ;(26 cycles), tests a zero page variable for LT
TGT ;(28 cycles), tests a zero page variable for GT
TLE ;(28 cycles), tests a zero page variable for LE
```
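For anyone unsure how a few of these behave, the following Python sketch models DEC, DECW, INCW and DBNZ on a stand-in zero page. It follows the descriptions in the list and is not the ROM implementation. The function names are illustrative, and treating DBNZ as operating on a single byte is an assumption, since the list does not say whether it works on a byte or a word.

```python
# Hypothetical model of a few of the new instructions; zp stands in for zero page.
zp = bytearray(256)

def read_word(addr):
    # 16-bit little-endian read of a zero page variable
    return zp[addr] | (zp[(addr + 1) & 0xFF] << 8)

def write_word(addr, value):
    zp[addr] = value & 0xFF
    zp[(addr + 1) & 0xFF] = (value >> 8) & 0xFF

def dec(addr):
    # DEC: low byte only, the borrow is discarded (like INC discards carry)
    zp[addr] = (zp[addr] - 1) & 0xFF

def decw(addr):
    # DECW: full 16-bit decrement
    write_word(addr, (read_word(addr) - 1) & 0xFFFF)

def incw(addr):
    # INCW: full 16-bit increment
    write_word(addr, (read_word(addr) + 1) & 0xFFFF)

def dbnz(addr):
    # DBNZ (assumed byte-wide): decrement, return True if the branch is taken
    zp[addr] = (zp[addr] - 1) & 0xFF
    return zp[addr] != 0
```

The difference between DEC and DECW shows up when the low byte is zero. Starting from 0x0100, DEC gives 0x01FF because the borrow never reaches the high byte, while DECW gives 0x00FF.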


```
ADDW ;(28 cycles) --> (32 cycles) --> (28 cycles)
CALL ;(26 cycles) --> (30 cycles)
POP ;(26 cycles) --> (30 cycles)
PUSH ;(26 cycles) --> (30 cycles)
LDWI ;(20 cycles) --> (24 cycles)
ST ;(16 cycles) --> (20 cycles)
LDI ;(16 cycles) --> (20 cycles)
ANDW ;(28 cycles) --> (26 cycles)
ORW ;(28 cycles) --> (26 cycles)
ADDI ;(28 cycles) --> (26 cycles)
SUBI ;(28 cycles) --> (26 cycles)
POKE ;(28 cycles) --> (26 cycles)
ANDI ;(22 cycles) --> (20 cycles)
INC ;(20 cycles) --> (16 cycles) --> (20 cycles)
LD ;(22 cycles) --> (18 cycles) --> (22 cycles) --> (18 cycles)
```

It was a lot of work unravelling and reorganising the vCPU interpreter, then optimising the old instructions (the ones that I could) and creating the new ones. Marcel had prioritised speed and ROM space when coding the original part of this firmware; I prioritised increased instruction slots over all else, and rather surprisingly this allowed for some free optimisations in the old code as well. Currently there are between 6 and 9 instruction slots free, so if anyone has suggestions for new vCPU instructions, please feel free to add them to this thread.

P.S. You may note that I have also raised the vCPU maxTicks count from 28 to 30, which allows a lot of instructions to be created that wouldn't have been possible at all otherwise. On all the code I have thrown at it, maxTicks=30 has had zero effect on execution speed compared to maxTicks=28. Higher values start to have a more dominant effect, e.g. 32 reduces code speed by around 10% and 34 by about 15%.
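A rough way to picture why maxTicks matters is the toy model below. It assumes the interpreter only starts another instruction when at least maxTicks cycles remain in its time slice, and it ignores all dispatch overhead. The function name, the slice length and the instruction costs are made-up numbers for illustration, not measurements from the ROM, so it will not reproduce the percentages above.

```python
# Toy model only: slice length and instruction costs are invented examples.
def run_slice(costs, slice_cycles, max_ticks):
    """Return (instructions executed, cycles left over) for one time slice.

    An instruction is started only if at least max_ticks cycles remain,
    regardless of how cheap that particular instruction is.
    """
    remaining = slice_cycles
    executed = 0
    for cost in costs:
        if remaining < max_ticks:
            break
        remaining -= cost
        executed += 1
    return executed, remaining

program = [20] * 10   # ten hypothetical 20-cycle instructions
for max_ticks in (28, 30, 44):
    print(max_ticks, run_slice(program, 100, max_ticks))
```

In this toy case 28 and 30 both fit four instructions into the slice, while a much larger threshold leaves more cycles unused at the end of the slice and fits only three.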