I understand the part about the prefix opcode; what I don't understand is why you can't do this:
Now that we are in a new page (after running PREFIX/SETIP), we have a duplicate of the dispatch code, i.e. 'NEXTY', 'NEXT' and 'EXIT'. The labels are not redefined, as they are already defined in page3 and we only access their low bytes, so we can branch correctly within any page:
Unless I am missing something, there is no state to be saved or restored (apart from vCpuSelect for vertical blank interrupts). You could in fact have two versions of SETIP: SETIPI at 16 cycles for when interrupts are enabled, and SETIP at 14 cycles for when interrupts are disabled, which the compiler/programmer could choose between depending on their use case.
I haven't tested this code and haven't 100% thought through the entire process, so I may have missed something crucial; please let me know if I did.
I don't follow how you can change maxTicks in any meaningful way for longer instructions. 'maxTicks' is a global definition that the runVcpu macro (and a bunch of other code and macros) uses to define a maximum slot size limit. Once defined within the source code, it can't be changed again after compile time.
I experimented with values of 28, 30 and 32. 28 was the original value and didn't allow for crucial instructions such as 'DEEKX'. 30 allows these crucial instructions to exist and incurs about a 5% overall performance penalty when using 28 as a baseline. But because I was able to move most of the instructions out of page3 into other pages and re-code them, taking advantage of the copious amounts of ROM space available, some of the original instructions decreased in cycle count, and therefore 30-cycle execution is statistically within +/- 2% of 28-cycle execution on all the applications I tested.
32 cycles, on the other hand, incurs about a 15% performance penalty across the board, and even though 32 allows even more complex instructions to exist, I deemed the extra functionality not worth the performance hit.
So I am interested in exactly how you would go about increasing 'maxTicks' for some instructions given the above.
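To make the trade-off concrete, here is a deliberately simplified model (plain Python, not ROM code) of why a larger 'maxTicks' costs performance: as I understand it, another instruction is only started while at least the worst-case time remains in the slice, so a bigger worst case wastes more of each slice. The function name is made up, the slice length and instruction cost in the usage note are hypothetical, and I work in cycles rather than ticks for readability.

```python
def instructions_per_slice(slice_cycles, max_cycles, cost_cycles):
    """Count the instructions started in one time slice (simplified model).

    slice_cycles: cycles available to the interpreter in this slice (hypothetical).
    max_cycles:   worst-case instruction length, i.e. the 28/30/32 value.
    cost_cycles:  cost of every instruction actually executed (assumed constant).
    """
    remaining = slice_cycles
    count = 0
    # Only start an instruction if the worst case would still fit.
    while remaining >= max_cycles:
        remaining -= cost_cycles
        count += 1
    return count


if __name__ == "__main__":
    for limit in (28, 30, 32):
        print(limit, instructions_per_slice(90, limit, 20))
```

With a hypothetical 90-cycle slice and 20-cycle instructions, the limit of 32 starts one fewer instruction than the limit of 28. Real slices and instruction mixes vary, so this will not reproduce the 5% and 15% figures above; it only shows the mechanism.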
I did implement an indirection table in page 3 as my first attempt. It wasn't video cycle error free, but it did perform the task required, and what I found was that instruction cycle times ballooned out by an extra 16-20 cycles. It would be interesting to revisit this at some stage and see if it could be done more efficiently.
I actually didn't have to perform any gymnastics in page3 to move instructions to other pages; what I had to do was unravel Marcel's magnificent gymnastics and then rewrite all the instructions knowing I had vast amounts of ROM space to play with.
i.e. Marcel originally wrote the code balancing these 3 constraints: page3 byte usage, instruction execution time and number of instructions. This resulted in him producing some amazing code that satisfied all 3 constraints about as well as any Earthly programmer could have probably achieved IMHO.
I, on the other hand, decided to only prioritise the number of instruction slots, so I had to painstakingly unravel Marcel's code, provide simple launch pads into other pages of memory for each old and new instruction, and then re-code the old instructions using the advantage of massive amounts of ROM space. As you know, this led to some old instructions actually executing more quickly, but my implementations of the old instructions are usually 50% to 100% bigger in byte size compared to Marcel's versions.
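As a rough illustration of the launch-pad idea, here is a small stand-alone Python sketch (not part of the ROM source) that takes the first few opcodes from my page3 listing, treats each launch pad as 2 or 3 ROM words, and checks that no pad runs into the next opcode's entry point. The helper name and the table format are made up for this example; the opcodes and sizes are read off the listing in this post.

```python
PAGE3 = 0x0300  # vCPU dispatch page: entry address = PAGE3 + opcode

# (name, opcode, launch-pad size in ROM words), read off the listing.
# A pad is ld(hi(...),Y) + jmp(Y,...) plus an optional third instruction
# that does useful work in the jump's delay slot.
STUBS = [
    ("LDWI", 0x11, 3), ("DEC", 0x14, 2), ("MOVQ", 0x16, 2),
    ("LSRB", 0x18, 2), ("LD", 0x1A, 2), ("SEXT", 0x1C, 3),
    ("CMPHS", 0x1F, 2), ("LDW", 0x21, 2),
]


def check_layout(stubs):
    """Return {name: entry address}; raise ValueError if two pads collide."""
    addresses = {}
    next_free = 0
    for name, opcode, size in sorted(stubs, key=lambda s: s[1]):
        if opcode < next_free:
            raise ValueError(f"{name} at 0x{opcode:02x} overlaps the previous pad")
        addresses[name] = PAGE3 + opcode
        next_free = opcode + size
    return addresses


if __name__ == "__main__":
    for name, addr in check_layout(STUBS).items():
        print(f"{name:6s} 0x{addr:04x}")
```

Because each pad is only 2 or 3 words, the opcode values can sit very close together, which is where the extra instruction slots come from.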
Here is what my page3 now looks like. You'll notice there are only 6 instructions that still fully exist within page3: ADDW, SUBW, LUP, SYS, XORI and BRA (RET doesn't count as it starts at 0x03FF and spills into page 4).
P.S. Native bugs are now trivial to fix, as the implementation code for each vCPU instruction is simpler to understand (it's usually just sequential code with multiple paths for each branch) and has no start or size constraints.
# pc = 0x0311, Opcode = 0x11
# Instruction LDWI: Load immediate word constant (vAC=D), 24 cycles
label('LDWI')
ld(hi('ldwi#13'),Y) #10
jmp(Y,'ldwi#13') #11
ld([vPC+1],Y) #12
# pc = 0x0314, Opcode = 0x14
# Instruction DEC: Decrement byte var ([D]--), 22 cycles
label('DEC')
ld(hi('dec#13'),Y) #10
jmp(Y,'dec#13') #11
#dummy #12 Overlap
#
# pc = 0x0316, Opcode = 0x16
# Instruction MOVQ: Load a byte var with a small constant 0..255, 28 cycles
label('MOVQ')
ld(hi('movq#13'),Y) #10 #12
jmp(Y,'movq#13') #11
#dummy #12 Overlap
#
# pc = 0x0318, Opcode = 0x18
# Instruction LSRB: Logical shift right on a byte var, 28 cycles
label('LSRB')
ld(hi('lsrb#13'),Y) #10 #12
jmp(Y,'lsrb#13') #11
#dummy #12 Overlap
# pc = 0x031a, Opcode = 0x1a
# Instruction LD: Load byte from zero page (vAC=[D]), 22 cycles
label('LD')
ld(hi('ld#13'),Y) #10 #12
jmp(Y,'ld#13') #11
#dummy #12 Overlap
# pc = 0x031c, Opcode = 0x1c
# Instruction SEXT: Sign extend vAC based on a variable mask, 28 cycles
label('SEXT')
ld(hi('sext#13'),Y) #10, #12
jmp(Y,'sext#13') #11
st([vTmp]) #12 sign mask
# pc = 0x031f, Opcode = 0x1f
# Instruction CMPHS: Adjust high byte for signed compare (vACH=XXX), 28 cycles
label('CMPHS_v5')
ld(hi('cmphs#13'),Y) #10
jmp(Y,'cmphs#13') #11
#dummy #12 Overlap, not dependent on ld(AC,X) anymore
# pc = 0x0321, Opcode = 0x21
# Instruction LDW: Load word from zero page (vAC=[D]+256*[D+1]), 24 cycles
label('LDW')
ld(hi('ldw#13'),Y) #10
jmp(Y,'ldw#13') #11
#dummy #12 Overlap
#
# pc = 0x0323, Opcode = 0x23
# Instruction PEEKX: Peek byte at address contained in var, inc var, 30 cycles
label('PEEKX')
ld(hi('peekx#13'),Y) #10 #12
jmp(Y,'peekx#13') #11
#dummy #12 Overlap
#
# pc = 0x0325, Opcode = 0x25
# Instruction POKEI: Poke immediate byte into address contained in [vAC], 20 cycles
label('POKEI')
ld(hi('pokei#13'),Y) #10 #12
jmp(Y,'pokei#13') #11
#dummy #12 Overlap
#
# pc = 0x0327, Opcode = 0x27
# Instruction LSLV: Logical shift left word var, 28 cycles
label('LSLV')
ld(hi('lslv#13'),Y) #10 #12
jmp(Y,'lslv#13') #11
#dummy #12 Overlap
#
# pc = 0x0329, Opcode = 0x29
# Instruction ADDBA: vAC += var.lo, 28 cycles
label('ADDBA')
ld(hi('addba#13'),Y) #10 #12
jmp(Y,'addba#13') #11
#dummy #12 Overlap
#
# pc = 0x032b, Opcode = 0x2b
# Instruction STW: Store word in zero page ([D],[D+1]=vAC&255,vAC>>8), 24 cycles
label('STW')
ld(hi('stw#13'),Y) #10 #12
jmp(Y,'stw#13') #11
#dummy #12 Overlap
#
# pc = 0x032d, Opcode = 0x2d
# Instruction ADDBI: Add a constant 0..255 to byte var, 28 cycles
label('ADDBI')
ld(hi('addbi#13'),Y) #10 #12
jmp(Y,'addbi#13') #11
#dummy #12 Overlap
#
# pc = 0x032f, Opcode = 0x2f
# Instruction XCHG: Exchange byte of [vAC] and [var], 28 cycles
label('XCHG')
ld(hi('xchg#13'),Y) #10 #12
jmp(Y,'xchg#13') #11
ld([vPC+1],Y) #12
#
# pc = 0x0332, Opcode = 0x32
# Instruction DBNZ: Decrement byte var and branch if not zero then 26 cycles, 28 cycles on zero
label('DBNZ')
ld(hi('dbnz#13'),Y) #10
jmp(Y,'dbnz#13') #11
ld([vPC+1],Y) #12 vPC.hi
#
# pc = 0x0335, Opcode = 0x35
# Instruction BCC: Test AC sign and branch conditionally, variable, (24-26), cycles
label('BCC')
bra(AC) #10 AC is the conditional operand
st([Y,Xpp]) #11 X++
# pc = 0x0337, Opcode = 0x37
# Instruction DOKEI: Doke immediate word into address contained in [vAC], 30 cycles
label('DOKEI')
ld(hi('dokei#13'),Y) #10
jmp(Y,'dokei#13') #11
#dummy #12 Overlap
# pc = 0x0339, Opcode = 0x39
# Instruction PEEKV: Read byte from address contained in var, 30 cycles
label('PEEKV')
ld(hi('peekv#13'),Y) #10
jmp(Y,'peekv#13') #11
#dummy #12 Overlap
# pc = 0x033b, Opcode = 0x3b
# Instruction DEEKV: Read word from address contained in var, 28 cycles
label('DEEKV')
ld(hi('deekv#13'),Y) #10 #12
jmp(Y,'deekv#13') #11
#dummy #12 Overlap
# pc = 0x033d, Opcode = 0x3d
# Instruction XORBI: var.lo ^= imm, 28 cycles
label('XORBI')
ld(hi('xorbi#13'),Y) #10 #12
jmp(Y,'xorbi#13') #11
#dummy #12 Overlap
# pc = 0x033f, Opcode = 0x3f
# Conditional EQ: Branch if zero (if(vACL==0)vPCL=D)
ld(hi('beq#15'),Y) #12 #12
jmp(Y,'beq#15') #13
ld([vPC+1],Y) #14 vPC.hi
# pc = 0x0342, Opcode = 0x42
# Instruction ANDBA: vAC &= var.lo, 24 cycles
label('ANDBA')
ld(hi('andba#13'),Y) #10 #12
jmp(Y,'andba#13') #11
#dummy #12 Overlap
# pc = 0x0344, Opcode = 0x44
# Instruction ORBA: vAC |= var.lo, 22 cycles
label('ORBA')
ld(hi('orba#13'),Y) #10 #12
jmp(Y,'orba#13') #11
#dummy #12 Overlap
# pc = 0x0346, Opcode = 0x46
# Instruction XORBA: vAC ^= var.lo, 22 cycles
label('XORBA')
ld(hi('xorba#13'),Y) #10 #12
jmp(Y,'xorba#13') #11
#dummy #12 Overlap
# pc = 0x0348, Opcode = 0x48
# Instruction NOTB: var.lo = ~var.lo, 22 cycles
label('NOTB')
ld(hi('notb#13'),Y) #10 #12
jmp(Y,'notb#13') #11
#dummy #12 Overlap
# pc = 0x034a, Opcode = 0x4a
# Instruction DOKEX: doke word in vAC to address contained in var, var += 2, 30 cycles
label('DOKEX')
ld(hi('dokex#13'),Y) #10 #12
jmp(Y,'dokex#13') #11
ld(AC,X) #12
# pc = 0x034d, Opcode = 0x4d
# Conditional GT: Branch if positive (if(vACL>0)vPCL=D)
ld(hi('bgt#15'),Y) #12
jmp(Y,'bgt#15') #13
ld([vPC+1],Y) #14 vPC.hi
# pc = 0x0350, Opcode = 0x50
# Conditional LT: Branch if negative (if(vACL<0)vPCL=D)
ld(hi('blt#15'),Y) #12
jmp(Y,'blt#15') #13
ld([vPC+1],Y) #14 vPC.hi
# pc = 0x0353, Opcode = 0x53
# Conditional GE: Branch if positive or zero (if(vACL>=0)vPCL=D)
ld(hi('bge#15'),Y) #12
jmp(Y,'bge#15') #13
ld([vPC+1],Y) #14 vPC.hi
# pc = 0x0356, Opcode = 0x56
# Conditional LE: Branch if negative or zero (if(vACL<=0)vPCL=D)
ld(hi('ble#15'),Y) #12
jmp(Y,'ble#15') #13
ld([vPC+1],Y) #14 vPC.hi
# pc = 0x0359, Opcode = 0x59
# Instruction LDI: Load immediate small positive constant (vAC=D), 20 cycles
label('LDI')
ld(hi('ldi#13'),Y) #10
jmp(Y,'ldi#13') #11
#dummy #12 Overlap
#
# pc = 0x035b, Opcode = 0x5b
# Instruction MOVQW: Load a word var with a small constant 0..255, 30 cycles
label('MOVQW')
ld(hi('movqw#13'),Y) #10 #12
jmp(Y,'movqw#13') #11
ld([vPC+1],Y) #12 vPC.hi
# pc = 0x035e, Opcode = 0x5e
# Instruction ST: Store byte in zero page ([D]=vAC&255), 20 cycles
label('ST')
ld(hi('st#13'),Y) #10
jmp(Y,'st#13') #11
#dummy #12 Overlap
#
# pc = 0x0360, Opcode = 0x60
# Instruction DEEKX: Deek word at address contained in var, var += 2, 30 cycles
label('DEEKX')
ld(hi('deekx#13'),Y) #10 #12
jmp(Y,'deekx#13') #11
ld(0,Y) #12
# pc = 0x0363, Opcode = 0x63
# Instruction POP: Pop address from stack (vLR,vSP=[vSP]+256*[vSP+1],vSP+2), 30 cycles
label('POP')
ld(hi('pop#13'),Y) #10
jmp(Y,'pop#13') #11
#dummy #12 Overlap
#
# pc = 0x0365, Opcode = 0x65
# Instruction MOV: Moves a byte from src var to dst var, 28 cycles
label('MOV')
ld(hi('mov#13'),Y) #10
jmp(Y,'mov#13') #11
#dummy #12 Overlap
#
# pc = 0x0367, Opcode = 0x67
# Instruction PEEKA: Peek a byte from [AC] to var, 24 cycles
label('PEEKA')
ld(hi('peeka#13'),Y) #10 #12
jmp(Y,'peeka#13') #11
#dummy #12 Overlap
#
# pc = 0x0369, Opcode = 0x69
# Instruction POKEA: Poke a byte from var to [vAC], 22 cycles
label('POKEA')
ld(hi('pokea#13'),Y) #10 #12
jmp(Y,'pokea#13') #11
#dummy #12 Overlap
# pc = 0x036b, Opcode = 0x6b
# Instruction TEQ: Test for EQ, returns 0x0000 or 0x0101 in vAC, 28 cycles
label('TEQ')
ld(hi('teq#13'),Y) #10 #12
jmp(Y,'teq#13') #11
#dummy #12 Overlap
#
# pc = 0x036d, Opcode = 0x6d
# Instruction TNE: Test for NE, returns 0x0000 or 0x0101 in vAC, 28 cycles
label('TNE')
ld(hi('tne#13'),Y) #10 #12
jmp(Y,'tne#13') #11
#dummy #12 Overlap
#
# pc = 0x036f, Opcode = 0x6f
# Instruction DEEKA: Move a word from [AC] to var, 30 cycles
label('DEEKA')
ld(hi('deeka#13'),Y) #10, #12
jmp(Y,'deeka#13') #11
st([vTmp]) #12 mask
# pc = 0x0372, Opcode = 0x72
# Conditional NE: Branch if not zero (if(vACL!=0)vPCL=D)
ld(hi('bne#15'),Y) #12
jmp(Y,'bne#15') #13
ld([vPC+1],Y) #14 vPC.hi
# pc = 0x0375, Opcode = 0x75
# Instruction PUSH: Push vLR on stack ([vSP-2],[vSP-1],vSP=vLR&255,vLR>>8,vSP-2), 30 cycles
label('PUSH')
ld(hi('push#13'),Y) #10
jmp(Y,'push#13') #11
#dummy #12 Overlap
#
# pc = 0x0377, Opcode = 0x77
# Instruction SUBBA: vAC -= var.lo, 28 cycles
label('SUBBA')
ld(hi('subba#13'),Y) #10 #12
jmp(Y,'subba#13') #11
#dummy #12 Overlap
#
# pc = 0x0379, Opcode = 0x79
# Instruction INCW: Increment word var, 26 cycles
label('INCW')
ld(hi('incw#13'),Y) #10
jmp(Y,'incw#13') #11
#dummy #12 Overlap
#
# pc = 0x037b, Opcode = 0x7b
# Instruction DECW: Decrement word var, 26 cycles
label('DECW')
ld(hi('decw#13'),Y) #10 #12
jmp(Y,'decw#13') #11
#dummy #12 Overlap
#
# pc = 0x037d, Opcode = 0x7d
# Instruction DOKEA: Doke a word from var to [vAC], 30 cycles
label('DOKEA')
ld(hi('dokea#13'),Y) #10 #12
jmp(Y,'dokea#13') #11
#dummy #12 Overlap
# pc = 0x037f, Opcode = 0x7f
# Instruction LUP: ROM lookup (vAC=ROM[vAC+D]), 26 cycles
label('LUP')
ld([vAC+1],Y) #10
jmp(Y,251) #11 Trampoline offset
adda([vAC]) #12
# pc = 0x0382, Opcode = 0x82
# Instruction ANDI: Logical-AND with small constant (vAC&=D), 20 cycles
label('ANDI')
ld(hi('andi#13'),Y) #10
jmp(Y,'andi#13') #11
anda([vAC]) #12
# pc = 0x0385, Opcode = 0x85
# Instruction CALLI: Goto immediate address and remember vPC (vLR,vPC=vPC+3,$HHLL-2), 28 cycles
label('CALLI_v5')
ld(hi('calli#13'),Y) #10
jmp(Y,'calli#13') #11
ld([vPC]) #12
# pc = 0x0388, Opcode = 0x88
# Instruction ORI: Logical-OR with small constant (vAC|=D), 20 cycles
label('ORI')
ld(hi('ori#13'),Y) #10
jmp(Y,'ori#13') #11
#dummy #12 Overlap
#
# pc = 0x038a, Opcode = 0x8a
# Instruction NOTW: Boolean invert var
label('NOTW')
ld(hi('notw#13'),Y) #10
jmp(Y,'notw#13') #11
#dummy #12 Overlap
#
# pc = 0x038c, Opcode = 0x8c
# Instruction XORI: Logical-XOR with small constant (vAC^=D), 14 cycles
label('XORI')
xora([vAC]) #10 #12
st([vAC]) #11
bra('NEXT') #12
ld(-14/2) #13
# pc = 0x0390, Opcode = 0x90
# Instruction BRA: Branch unconditionally (vPC=(vPC&0xff00)+D), 14 cycles
label('BRA')
st([vPC]) #10 #12
bra('NEXTY') #11
ld(-14/2) #12
# pc = 0x0393, Opcode = 0x93
# Instruction INC: Increment zero page byte ([D]++), 20 cycles
label('INC')
ld(hi('inc#13'),Y) #10
jmp(Y,'inc#13') #11
#dummy #12 Overlap
#
# pc = 0x0395, Opcode = 0x95
# Instruction ORBI: OR immediate byte with byte var, result in byte var, 28 cycles
label('ORBI')
ld(hi('orbi#13'),Y) #10 #12
jmp(Y,'orbi#13') #11
#dummy #12 Overlap
#
# pc = 0x0397, Opcode = 0x97
# Instruction CMPHU: Adjust high byte for unsigned compare (vACH=XXX), 28 cycles
label('CMPHU_v5')
ld(hi('cmphu#13'),Y) #10
jmp(Y,'cmphu#13') #11
#dummy #12 Overlap, not dependent on ld(AC,X) anymore
#
# pc = 0x0399, Opcode = 0x99
# Instruction ADDW: Word addition with zero page (vAC+=[D]+256*[D+1]), 28 cycles
label('ADDW')
# The non-carry paths could be 26 cycles at the expense of (much) more code.
# But a smaller size is better so more instructions fit in this code page.
# 28 cycles is still 4.5 usec. The 6502 equivalent takes 20 cycles or 20 usec.
ld(AC,X) #10,12 Address of low byte to be added
adda(1) #11
st([vTmp]) #12 Address of high byte to be added
ld([vAC]) #13 Add the low bytes
adda([X]) #14
st([vAC]) #15 Store low result
bmi('.addw#18') #16 Now figure out if there was a carry
suba([X]) #17 Gets back the initial value of vAC
bra('.addw#20') #18
ora([X]) #19 Carry in bit 7
label('.addw#18')
anda([X]) #18 Carry in bit 7
nop() #19
label('.addw#20')
anda(0x80,X) #20 Move carry to bit 0
ld([X]) #21
adda([vAC+1]) #22 Add the high bytes with carry
ld([vTmp],X) #23
adda([X]) #24
st([vAC+1]) #25 Store high result
bra('NEXT') #26
ld(-28/2) #27
# pc = 0x0399, Opcode = 0x99
# Instruction ADDW: Word addition with zero page (vAC+=[D]+256*[D+1]), 30 cycles
#label('ADDW')
#ld(hi('addw#13'),Y) #10 #12
#jmp(Y,'addw#13') #11
#ld(0,Y) #12
#
#fillers(until=0xad)
# pc = 0x03ad, Opcode = 0xad
# Instruction PEEK: Read byte from memory (vAC=[vAC]), 26 cycles
label('PEEK')
ld(hi('peek#13'),Y) #10
jmp(Y,'peek#13') #11
#ld([vPC]) #12 Overlap
#
# pc = 0x03b4, Opcode = 0xb4
# Instruction SYS: Native call, <=256 cycles (<=128 ticks, in reality less)
#
# The 'SYS' vCPU instruction first checks the number of desired ticks given by
# the operand. As long as there are insufficient ticks available in the current
# time slice, the instruction will be retried. This will effectively wait for
# the next scan line if the current slice is almost out of time. Then a jump to
# native code is made. This code can do whatever it wants, but it must return
# to the 'REENTER' label when done. When returning, AC must hold (the negative
# of) the actual consumed number of whole ticks for the entire virtual
# instruction cycle (from NEXT to NEXT). This duration may not exceed the prior
# declared duration in the operand + 28 (or maxTicks). The operand specifies the
# (negative) of the maximum number of *extra* ticks that the native call will
# need. The GCL compiler automatically makes this calculation from gross number
# of cycles to excess number of ticks.
# SYS functions can modify vPC to implement repetition. For example to split
# up work into multiple chunks.
label('.sys#13')
ld([vPC]) #13,12 Retry until sufficient time
suba(2) #14
st([vPC]) #15
bra('REENTER') #16
ld(-20/2) #17
label('SYS')
adda([vTicks]) #10
blt('.sys#13') #11
ld([sysFn+1],Y) #12
jmp(Y,[sysFn]) #13
#dummy() #14 Overlap
#
# pc = 0x03b8, Opcode = 0xb8
# Instruction SUBW: Word subtract with zero page (AC-=[D]+256*[D+1]), 28 cycles
# All cases can be done in 26 cycles, but the code will become much larger
label('SUBW')
ld(AC,X) #10,14 Address of low byte to be subtracted
adda(1) #11
st([vTmp]) #12 Address of high byte to be subtracted
ld([vAC]) #13
bmi('.subw#16') #14
suba([X]) #15
st([vAC]) #16 Store low result
bra('.subw#19') #17
ora([X]) #18 Carry in bit 7
label('.subw#16')
st([vAC]) #16 Store low result
anda([X]) #17 Carry in bit 7
nop() #18
label('.subw#19')
anda(0x80,X) #19 Move carry to bit 0
ld([vAC+1]) #20
suba([X]) #21
ld([vTmp],X) #22
suba([X]) #23
st([vAC+1]) #24
label('REENTER_28')
ld(-28/2) #25
label('REENTER')
bra('NEXT') #26 Return from SYS calls
ld([vPC+1],Y) #27
#
# The instructions below are all implemented in the second code page. Jumping
# back and forth makes each 6 cycles slower, but it also saves space in the
# primary page for the instructions above. Most of them are in fact not very
# critical, as evidenced by the fact that they weren't needed for the first
# Gigatron applications (Snake, Racer, Mandelbrot, Loader). By providing them
# in this way, at least they don't need to be implemented as a SYS extension.
#
# pc = 0x03cd, Opcode = 0xcd
# Instruction DEF: Define data or code (vAC,vPC=vPC+2,(vPC&0xff00)+D), 26 cycles
label('DEF')
ld(hi('def#13'),Y) #10
jmp(Y,'def#13') #11
#dummy #12 Overlap
#
# pc = 0x03cf, Opcode = 0xcf
# Instruction CALL: Goto address and remember vPC (vLR,vPC=vPC+2,[D]+256*[D+1]-2), 30 cycles
label('CALL')
ld(hi('call#13'),Y) #10, #12
jmp(Y,'call#13') #11
#dummy #12 Overlap
#
# pc = 0x03d1, Opcode = 0xd1
# Instruction POKEX: Poke byte in vAC to address contained in var, inc var, 30 cycles
label('POKEX')
ld(hi('pokex#13'),Y) #10 #12
jmp(Y,'pokex#13') #11
#dummy #12 Overlap
#
# pc = 0x03d3, Opcode = 0xd3
# Instruction NEGW: Arithmetic negate var
label('NEGW')
ld(hi('negw#13'),Y) #10, #12
jmp(Y,'negw#13') #11
#dummy #12 Overlap
#
# pc = 0x03d5, Opcode = 0xd5
# Instruction TGE: Test for GE, returns 0x0000 or 0x0101 in vAC, 26 cycles
label('TGE')
ld(hi('tge#13'),Y) #10 #12
jmp(Y,'tge#13') #11
#dummy #12 Overlap
#
# pc = 0x03d7, Opcode = 0xd7
# Instruction TLT: Test for LT, returns 0x0000 or 0x0101 in vAC, 26 cycles
label('TLT')
ld(hi('tlt#13'),Y) #10 #12
jmp(Y,'tlt#13') #11
#dummy #12 Overlap
#
# pc = 0x03d9, Opcode = 0xd9
# Instruction TGT: Test for GT, returns 0x0000 or 0x0101 in vAC, 28 cycles
label('TGT')
ld(hi('tgt#13'),Y) #10 #12
jmp(Y,'tgt#13') #11
#dummy #12 Overlap
#
# pc = 0x03db, Opcode = 0xdb
# Instruction TLE: Test for LE, returns 0x0000 or 0x0101 in vAC
label('TLE')
ld(hi('tle#13'),Y) #10 #12
jmp(Y,'tle#13') #11
#dummy #12 Overlap
#
# pc = 0x03dd, Opcode = 0xdd
# Instruction ANDBI: And immediate byte with byte var, result in byte var, 28 cycles
label('ANDBI')
ld(hi('andbi#13'),Y) #10 #12
jmp(Y,'andbi#13') #11
#dummy #12 Overlap
#
# pc = 0x03df, Opcode = 0xdf
# Instruction ALLOC: Create or destroy stack frame (vSP+=D), 20 cycles
label('ALLOC')
ld(hi('alloc#13'),Y) #10
jmp(Y,'alloc#13') #11
#dummy #12 Overlap
#
# pc = 0x03e1, Opcode = 0xe1
# Instruction SUBBI: Subtract a constant 0..255 from a byte var, 28 cycles
label('SUBBI')
ld(hi('subbi#13'),Y) #10 #12
jmp(Y,'subbi#13') #11
#dummy #12 Overlap
#
# pc = 0x03e3, Opcode = 0xe3
# Instruction ADDI: Add small positive constant (vAC+=D), 26 cycles
label('ADDI')
ld(hi('addi#13'),Y) #10 #12
jmp(Y,'addi#13') #11
st([vTmp]) #12
# pc = 0x03e6, Opcode = 0xe6
# Instruction SUBI: Subtract small positive constant (vAC-=D), 26 cycles
label('SUBI')
ld(hi('subi#13'),Y) #10
jmp(Y,'subi#13') #11
st([vTmp]) #12
# pc = 0x03e9, Opcode = 0xe9
# Instruction LSLW: Logical shift left (vAC<<=1), 28 cycles
# Useful, because ADDW can't add vAC to itself. Also more compact.
label('LSLW')
ld(hi('lslw#13'),Y) #10
jmp(Y,'lslw#13') #11
ld([vAC]) #12
# pc = 0x03ec, Opcode = 0xec
# Instruction STLW: Store word in stack frame ([vSP+D],[vSP+D+1]=vAC&255,vAC>>8), 24 cycles
label('STLW')
ld(hi('stlw#13'),Y) #10
jmp(Y,'stlw#13') #11
#dummy() #12 Overlap
#
# pc = 0x03ee, Opcode = 0xee
# Instruction LDLW: Load word from stack frame (vAC=[vSP+D]+256*[vSP+D+1]), 24 cycles
label('LDLW')
ld(hi('ldlw#13'),Y) #10,12
jmp(Y,'ldlw#13') #11
#dummy() #12 Overlap
#
# pc = 0x03f0, Opcode = 0xf0
# Instruction POKE: Write byte in memory ([[D+1],[D]]=vAC&255), 26 cycles
label('POKE')
ld(hi('poke#13'),Y) #10,12
jmp(Y,'poke#13') #11
st([vTmp]) #12
# pc = 0x03f3, Opcode = 0xf3
# Instruction DOKE: Write word in memory ([[D+1],[D]],[[D+1],[D]+1]=vAC&255,vAC>>8), 28 cycles
label('DOKE')
ld(hi('doke#13'),Y) #10
jmp(Y,'doke#13') #11
st([vTmp]) #12
# pc = 0x03f6, Opcode = 0xf6
# Instruction DEEK: Read word from memory (vAC=[vAC]+256*[vAC+1]), 28 cycles
label('DEEK')
ld(hi('deek#13'),Y) #10
jmp(Y,'deek#13') #11
#dummy() #12 Overlap
#
# pc = 0x03f8, Opcode = 0xf8
# Instruction ANDW: Word logical-AND with zero page (vAC&=[D]+256*[D+1]), 28 cycles
label('ANDW')
ld(hi('andw#13'),Y) #10,12
jmp(Y,'andw#13') #11
#dummy() #12 Overlap
#
# pc = 0x03fa, Opcode = 0xfa
# Instruction ORW: Word logical-OR with zero page (vAC|=[D]+256*[D+1]), 28 cycles
label('ORW')
ld(hi('orw#13'),Y) #10,12
jmp(Y,'orw#13') #11
#dummy() #12 Overlap
#
# pc = 0x03fc, Opcode = 0xfc
# Instruction XORW: Word logical-XOR with zero page (vAC^=[D]+256*[D+1]), 28 cycles
label('XORW')
ld(hi('xorw#13'),Y) #10,12
jmp(Y,'xorw#13') #11
ld(AC,X) #12
# pc = 0x03ff, Opcode = 0xff
# Instruction RET: Function return (vPC=vLR-2), 16 cycles
label('RET')
ld([vLR]) #10
assert pc()&255 == 0
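The carry handling in the ADDW and SUBW code above is probably the least obvious part of what remains in page3, so here is a small plain-Python model of it (a sketch for checking the logic only, not ROM code; the function names are made up). The idea, as I read the listing, is that the sign of one byte selects whether bit 7 of an OR or of an AND holds the carry or borrow. The model simply shifts bit 7 down instead of mimicking the anda(0x80,X) / ld([X]) pair.

```python
def addw_carry(a, b):
    """Carry out of the 8-bit addition a + b, derived as in ADDW."""
    s = (a + b) & 0xFF
    # bmi on the low result: negative takes the anda path, otherwise ora.
    probe = (a & b) if s & 0x80 else (a | b)
    return probe >> 7  # carry sits in bit 7


def subw_borrow(a, b):
    """Borrow out of the 8-bit subtraction a - b, derived as in SUBW."""
    d = (a - b) & 0xFF
    # bmi on the original low byte: negative takes the anda path, otherwise ora.
    probe = (d & b) if a & 0x80 else (d | b)
    return probe >> 7  # borrow sits in bit 7


if __name__ == "__main__":
    # Exhaustive check over all byte pairs.
    for a in range(256):
        for b in range(256):
            assert addw_carry(a, b) == (a + b) >> 8
            assert subw_borrow(a, b) == (1 if a < b else 0)
    print("carry/borrow model agrees for all 65536 byte pairs")
```

The exhaustive loop is cheap and covers every combination of low bytes, which is all the carry logic ever sees.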