New vCPU instructions 2.0
Re: New vCPU instructions 2.0
I just experimented with a batch of indirect-indexed instructions.
The encoding is as follows:
PREFIX VAR OPCODE OFFSET
where PREFIX = $B1 (which is at67's PREFX1) and OPCODE is one of LD/LDW/ST/STW/ADDW/SUBW/ANDW/ORW/XORW. Instead of accessing a 16-bit variable at address OFFSET in page zero, these instructions now access [ [VAR] + OFFSET ]. This is useful in the C compiler to access local variables allocated on the stack, e.g., LDW([SP,offset]), and also to access fields in a structure pointed to by a register variable, e.g., ANDW([StructPtr,FieldOffset]).
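To make the addressing mode concrete, here is a minimal Python sketch of how [ [VAR] + OFFSET ] resolves. It is only a model of the description above, not the ROM implementation. The helper names, the zero-page address 0x30 and the sample values are all hypothetical, and OFFSET is assumed to be an unsigned 8-bit value (as with LDI).

```python
# Model of indirect-indexed addressing: effective address = [VAR] + OFFSET.
# All names and sample values here are illustrative assumptions.

def read_word(mem, addr):
    """Read a little-endian 16-bit word from a 64K memory image."""
    return mem[addr & 0xFFFF] | (mem[(addr + 1) & 0xFFFF] << 8)

def write_word(mem, addr, value):
    """Write a little-endian 16-bit word into a 64K memory image."""
    mem[addr & 0xFFFF] = value & 0xFF
    mem[(addr + 1) & 0xFFFF] = (value >> 8) & 0xFF

def effective_address(mem, var, offset):
    """[VAR] + OFFSET, wrapped to 16 bits (OFFSET assumed unsigned 8-bit)."""
    return (read_word(mem, var) + (offset & 0xFF)) & 0xFFFF

def ldw_indexed(mem, var, offset):
    """Model of LDW([var, offset]): load the word at [[var] + offset]."""
    return read_word(mem, effective_address(mem, var, offset))

mem = bytearray(65536)
STRUCT_PTR = 0x30                    # hypothetical zero-page pointer variable
write_word(mem, STRUCT_PTR, 0x2000)  # pointer holds 0x2000
write_word(mem, 0x2004, 0x1234)      # field at offset 4 holds 0x1234

field_address = effective_address(mem, STRUCT_PTR, 4)
field_value = ldw_indexed(mem, STRUCT_PTR, 4)
```

With these sample values the field address resolves to 0x2004 and the loaded word is 0x1234.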
This comes at a cost of an additional 42-44 cycles, which can be split in various ways: the PREFIX instruction does the full address calculation if it has enough time; otherwise it delegates the addition to a restart. Once the address is computed (and stored in vLR), a final restart runs the actual instruction. This overhead is quite good because it is the same as computing the address with LDI(offset);ADDW(var). The code size benefit is quite small with LD/LDW because one could do LDI(offset);ADDW(var);PEEK/DEEK(), but it is much more significant with STW or ADDW because one replaces things like LDI(offset);ADDW(var);STW(tmpvar); <compute-something-in-vAC> ; DOKE(tmpvar) with a simple <compute-something-in-vAC> STW([var,offset]).
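The code size argument can be checked with a quick byte count. This is a rough sketch that assumes the standard vCPU encodings as I understand them (LDI, ADDW, STW and DOKE take two bytes each, PEEK and DEEK one byte each) and the four-byte PREFIX VAR OPCODE OFFSET form described above.

```python
# Byte counts per instruction; sizes are assumptions based on the usual
# vCPU encoding (opcode plus one operand byte, or opcode only).
SIZE = {"LDI": 2, "ADDW": 2, "STW": 2, "DOKE": 2, "DEEK": 1, "PEEK": 1}
INDEXED_SIZE = 4  # PREFIX VAR OPCODE OFFSET

def sequence_size(ops):
    """Total size in bytes of a sequence of classic vCPU instructions."""
    return sum(SIZE[op] for op in ops)

# Load: LDI(offset);ADDW(var);DEEK() versus LDW([var,offset])
load_saving = sequence_size(["LDI", "ADDW", "DEEK"]) - INDEXED_SIZE

# Store: LDI(offset);ADDW(var);STW(tmpvar); ... ;DOKE(tmpvar)
# versus ... ;STW([var,offset])
store_saving = sequence_size(["LDI", "ADDW", "STW", "DOKE"]) - INDEXED_SIZE
```

Under these assumptions a load saves one byte and a store saves four, which matches the observation that the benefit is small for LD/LDW and more significant for STW.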
The total gain with the C compiler is about 3-5% extra reduction with respect to at67's new instruction set. This is smaller than I expected because the C compiler often finds a way to use DEEKA/DOKEA/DEEKV/DOKE relatively efficiently and aggressively promotes local variables to registers. When it fails to promote, it resorts to using stack variables in a manner that costs a lot of opcodes. So indirect-indexed addressing helps a lot there. But when the compiler works well, or when the programmer uses the keyword 'register' smartly, the gain is more limited.
Another question is the potential gain with respect to the v5a instruction set. Without the competition of DOKEA/DEEKA/DEEKV, the benefit of indirect-indexed addressing is a lot more obvious.
Overall I believe this is a good idea. The implementation might have to be refined. In particular I am not sure at67 would like the idea of completely taking over the PREFX1 instruction page for just 8 instructions. I need to sleep on this...
After mulling over these results, I concluded that this would be a nice improvement over ROM v5a, but a much less compelling one over at67's ROM, once released.
Re: New vCPU instructions 2.0
I'm going to use this format, (as you suggested), for PREFX3 to save a few cycles.
We could just move one of the page3 instructions that is infrequently used (like I did with SEXT) and create a new PREFX instruction page that supports this format and performs the offset calculation as part of PREFX (if possible). If this is not possible, or if the page wastage is too great, then using the modified PREFX3 (as above) may be an alternative.
lb3361 wrote (07 Jul 2021, 02:51):
Overall I believe this is a good idea. The implementation might have to be refined. In particular I am not sure at67 would like the idea of completely taking over the PREFX1 instruction page for just 8 instructions. I need to sleep over this...
After mulling these results, I concluded that this would be a nice improvement over rom v5a, but a much less compelling one over at67's rom, once released.
P.S. I predict a lot of potential new instructions that could use a signed 8 bit offset, so I think there would eventually be a lot more than 8, making the page wastage a moot point hopefully.
Re: New vCPU instructions 2.0
Update:
I've updated and/or added the following instructions to ROMvX0. I've set myself a deadline of releasing the ROM by next weekend.
PAGE3
- LSRB <var>, logical shift right on a zero page byte var, 28 cycles.
- LSRV <var>, logical shift right on a zero page word var, 52 cycles.
- LSLV <var>, logical shift left on a zero page word var, 28 cycles.
- ADDVI <var>, <imm>, add 8bit immediate to 16bit zero page var, var += imm, vAC = var, 50 cycles.
- SUBVI <var>, <imm>, subtract 8bit immediate from 16bit zero page var, var -= imm, vAC = var, 50 cycles.
- ADDVW <var dst>, <var src>, add 16bit zero page vars, dst += src, vAC = dst, 54 cycles.
- SUBVW <var dst>, <var src>, subtract 16bit zero page vars, dst -=src, vAC = dst, 54 cycles.
- DJNE <var>, <16bit imm>, decrement word var and jump if not equal to zero, 46 cycles
- DJGE <var>, <16bit imm>, decrement word var and jump if greater than or equal to zero, 42 cycles
- NOTE, vAC = ROM:[NotesTable + vAC.lo*2], 22 + 28 cycles.
- MIDI, vAC = ROM:[NotesTable + (vAC.lo - 11)*2], 22 + 30 cycles.
- LSLN <imm n>, vAC <<= n, (16bit shift), 22 + 30*n + 20 cycles.
- FREQM <var chan>, [(((chan & 3) + 1) <<8) | 0x00FC] = vAC, chan = [0..3], 22 + 26 cycles.
- FREQA <var chan>, [((((chan - 1) & 3) + 1) <<8) | 0x00FC] = vAC, chan = [1..4], 22 + 26 cycles.
- FREQZ <imm chan>, [(((chan & 3) + 1) <<8) | 0x00FC] = 0, chan = [0..3], 22 + 22 cycles.
- VOLM <var chan>, [(((chan & 3) + 1) <<8) | 0x00FA] = vAC.low, chan = [0..3], 22 + 24 cycles.
- VOLA <var chan>, [((((chan - 1) & 3) + 1) <<8) | 0x00FA] = 63 - vAC.low + 64, chan = [1..4], 22 + 26 cycles.
- MODA <var chan>, [((((chan - 1) & 3) + 1) <<8) | 0x00FB] = vAC.low, chan = [1..4], 22 + 24 cycles.
- MODZ <imm chan>, [(((imm & 3) + 1) <<8) | 0x00FA] = 0x0200, imm = [0..3], 22 + 24 cycles.
- SMPCPY <var addr>, copies 64 packed 4bit samples from [vAC] to the interlaced address in addr, vAC += 32, 22 + 31*58 + 52 cycles, (if vAC overflows a 256 byte boundary then 22 + 30*58 + 60 + 52 cycles).
- CMPWS <var>, vAC = vAC CMPWS var, combines CMPHS and SUBW into one instruction, 22 + 46 cycles.
- CMPWU <var>, vAC = vAC CMPWU var, combines CMPHU and SUBW into one instruction, 22 + 46 cycles.
- LEEKA <var>, var[0..3] = PEEK([vAC+0...vAC+3]), peeks a long from [vAC] to [var], 22 + 44 cycles.
- LOKEA <var>, POKE vAC[0..3], var[0..3], pokes a long from [var] to [vAC], 22 + 44 cycles.
- FEEKA <var>, var[0..4] = PEEK([vAC+0...vAC+4]), peeks a float, (5 bytes), from [vAC] to [var], 22 + 48 cycles.
- FOKEA <var>, POKE vAC[0..4], var[0..4], pokes a float, (5 bytes), from [var] to [vAC], 22 + 48 cycles.
- MEEKA <var>, var[0..7] = PEEK([vAC+0...vAC+7]), peeks 8 bytes from [vAC] to [var], 22 + 64 cycles.
- MOKEA <var>, POKE vAC[0..7], var[0..7], pokes 8 bytes from [var] to [vAC], 22 + 64 cycles.
- STB2 <16bit imm>, store vAC.lo into 16bit immediate address, 22 + 20 cycles.
- STW2 <16bit imm>, store vAC into 16bit immediate address, 22 + 22 cycles.
- XCHGB <var0>, <var1>, exchange two zero page byte variables, 22 + 28 cycles.
- ADDWI <16bit imm>, vAC += immediate 16bit value, 22 + 28 cycles.
- SUBWI <16bit imm>, vAC -= immediate 16bit value, 22 + 28 cycles.
- ANDWI <16bit imm>, vAC &= immediate 16bit value, 22 + 22 cycles.
- ORWI <16bit imm>, vAC |= immediate 16bit value, 22 + 22 cycles.
- XORWI <16bit imm>, vAC ^= immediate 16bit value, 22 + 22 cycles.
- LDPX <var addr>, <colour var>, load pixel, <addr>, <colour>, 22 + 30 cycles, (respects VTable).
- STPX <var addr>, <colour var>, store pixel, <addr>, <colour>, 22 + 30 cycles, (respects VTable).
- CONDI <imm0>, <imm1>, chooses immediate operand based on condition, (vAC == 0), 22 + 26 cycles.
- CONDB <var0 byte>, <var1 byte>, chooses byte variable based on condition, (vAC == 0), 22 + 26 cycles.
- CONDIB <imm0>, <var byte>, chooses between immediate operand and byte variable based on condition, (vAC == 0), 22 + 26 cycles.
- CONDBI <var byte>, <imm0>, chooses between byte variable and immediate operand based on condition, (vAC == 0), 22 + 26 cycles.
- XCHGW <var0>, <var1>, exchanges two zero page word variables, 22 + 46 cycles, (destroys vAC).
- SWAPB <var0 addr>, <var1 addr>, swaps two bytes in memory, 22 + 46 cycles.
- SWAPW <var0 addr>, <var1 addr>, swaps two words in memory, 22 + 58 cycles.
- NEEKA <var addr>, <imm n>, var[0..n] = PEEK([vAC+0...vAC+n]), peeks n bytes from [vAC] to [var], 22 + 34*n + 24 cycles.
- NOKEA <var addr>, <imm n>, POKE vAC[0..n], var[0..n], pokes n bytes from [var] to [vAC], 22 + 34*n + 24 cycles.
- OSCPX <var wave addr>, <var index>, read sample from wave-table address and format it into a screen pixel at address in [vAC], 22 + 42 cycles.
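The address expressions and cycle formulas in the list above can be evaluated directly. The following Python sketch simply transcribes the expressions given for FREQM, FREQA, LSLN and NEEKA/NOKEA. The function names are hypothetical and nothing here is taken from the ROM source.

```python
# Transcription of formulas from the instruction list; names are illustrative.

def freqm_address(chan):
    """FREQM target: (((chan & 3) + 1) << 8) | 0x00FC, chan in 0..3."""
    return (((chan & 3) + 1) << 8) | 0x00FC

def freqa_address(chan):
    """FREQA target: ((((chan - 1) & 3) + 1) << 8) | 0x00FC, chan in 1..4."""
    return ((((chan - 1) & 3) + 1) << 8) | 0x00FC

def lsln_cycles(n):
    """LSLN cost for an n-bit shift: 22 + 30*n + 20 cycles."""
    return 22 + 30 * n + 20

def neeka_cycles(n):
    """NEEKA/NOKEA cost for n bytes: 22 + 34*n + 24 cycles."""
    return 22 + 34 * n + 24

freqm_targets = [freqm_address(c) for c in range(4)]     # 0-based channels
freqa_targets = [freqa_address(c) for c in range(1, 5)]  # 1-based channels
```

Both channel numbering schemes land on the same four addresses (0x01FC, 0x02FC, 0x03FC, 0x04FC). The only difference is whether the channel index starts at 0 or at 1.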
Last edited by at67 on 17 Oct 2021, 11:27, edited 4 times in total.
Re: New vCPU instructions 2.0
great work, thanks!
Re: New vCPU instructions 2.0
Nice! Looking forward to it!
Re: New vCPU instructions 2.0
Update:
I've updated/added the following instructions to ROMvX0.
PAGE3
- CMPHS: Reinstated.
- CMPHU: Reinstated.
- LOKEI: Loke immediate long into address contained in [vAC], 42 cycles, (5 byte instruction).
- LSLVL: Logical shift left var long, 22 + 56 cycles
- LSRVL: Logical shift right var long, 22 + 104 cycles
- ADDVL: Add two 32bit zero page vars, dst += src, 22 + 78 cycles
- SUBVL: Subtract two 32bit zero page vars, dst -= src, 22 + 74 cycles
- ANDVL: And two 32bit zero page vars, dst &= src, 22 + 46 cycles
- ORVL: Or two 32bit zero page vars, dst |= src, 22 + 46 cycles
- XORVL: Xor two 32bit zero page vars, dst ^= src, 22 + 46 cycles
- JCCL: Jump to address based on long CC, (address of long in vAC), 22 + (40 to 44) cycles
Re: New vCPU instructions 2.0
Hello
I am trying to take my first steps in ROMvX0, but I have some problems at the start.
My first program is easy:
Code: Select all
MOVQ #63, #$42
LDWI #$0800
POKEA #$42
And I got what I wanted: one white pixel in the top-left corner.
I modified the program by adding ADDBA:
Code: Select all
MOVQ #63, #$42
LDWI #$0800
ADDBA #$42
POKEA #$42
Now the program crashes. I expected the pixel to be shifted to the right by 63 pixels.
What is wrong?
Code: Select all
$16, $3F, $42 MOVQ #63, #$42
$11, $00, $08 LDWI #$0800
$29, $42 ADDBA #$42
$69, $42 POKEA #$42
Attachment: asm_test.gt1 (21 bytes)
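For reference, the result the poster expected can be worked out in a few lines of Python. This sketch assumes that ADDBA adds the byte variable to vAC (which is not confirmed anywhere in this thread) and that screen row 0 starts at $0800, as the first program suggests. The names are hypothetical.

```python
# Expected effect of the second program, under the assumptions stated above.
SCREEN_BASE = 0x0800   # address written by the first program (top-left pixel)

def expected_pixel_address(base, byte_var):
    """vAC after LDWI base; ADDBA var, assuming ADDBA does vAC += byte var."""
    return (base + (byte_var & 0xFF)) & 0xFFFF

address = expected_pixel_address(SCREEN_BASE, 63)
column = address & 0xFF                     # x position within the row
row = (address >> 8) - (SCREEN_BASE >> 8)   # rows are one page apart
```

Under these assumptions the pixel would land at $083F, which is column 63 of row 0. A crash therefore points at the opcode encoding rather than at the arithmetic.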
Re: New vCPU instructions 2.0
I did a few tests on random instructions. Some work correctly, but some do not.
For example, DEEK+ (opcode 0x60) works like DOKE+. It is probably defined incorrectly in the instructions.txt file on GitHub.
Re: New vCPU instructions 2.0
It probably is out of date, (though it shouldn't have been).
The best way to verify the opcodes is to use the actual ROM source at ROMvX0/Core/ROMvX0.asm.py
Search for "$0300" to get to the intrinsic instructions.
Search for "0x2200" to get to the PREFX3 instructions.
Search for "0x2300" to get to the PREFX2 instructions.
Search for "0x2400" to get to the PREFX1 instructions.
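If you prefer to script the lookup, a small Python helper can report the line numbers where each search string occurs. This is just a sketch: the function name is hypothetical, and the sample text below is synthetic placeholder content, not an excerpt from ROMvX0.asm.py.

```python
# Locate the page markers mentioned above in a block of source text.
MARKERS = ["$0300", "0x2200", "0x2300", "0x2400"]

def find_markers(source_text, markers=MARKERS):
    """Return {marker: [1-based line numbers containing that marker]}."""
    hits = {marker: [] for marker in markers}
    for lineno, line in enumerate(source_text.splitlines(), start=1):
        for marker in markers:
            if marker in line:
                hits[marker].append(lineno)
    return hits

# Synthetic stand-in for the ROM source, for demonstration only.
sample = "\n".join([
    "# placeholder: page $0300 starts here",
    "# placeholder: unrelated line",
    "# placeholder: page 0x2200 starts here",
    "# placeholder: page 0x2300 starts here",
    "# placeholder: page 0x2400 starts here",
])
sample_hits = find_markers(sample)
```

To use it on the real file, read your local copy of ROMvX0/Core/ROMvX0.asm.py into a string and pass that string to find_markers.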