Indirect Addressing?

monsonite
Posts: 101
Joined: 17 May 2018, 07:17

Indirect Addressing?

Post by monsonite »

Hi All,

I'm starting to think about implementing a Tiny Forth on the Gigatron vCPU.

Forth requires two stacks: the data stack and the return stack. If we use the usual vCPU stack, with its stack pointer at 0x1C, as the data stack, is there a way of creating a second stack for the return stack, with its own return stack pointer at a different zero-page location?

For this to work, we will need indirect addressing with one location acting as a pointer to the word we want to access.

With DEEK we can read the word whose address is in vAC, and DOKE writes vAC to the address held in a zero-page word. For reads, is it a case of first priming vAC with the contents of the required pointer?

Is there a neater way of doing this with the vCPU?

Any help appreciated.


Ken
marcelk
Posts: 488
Joined: 13 May 2018, 08:26

Re: Indirect Addressing?

Post by marcelk »

I believe that's the primary way. If you need to improve code density, it can be hidden in subroutines for the primitives. Another approach could be to swap vSP's contents before using the instructions that operate on it. I haven't thought that through.
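As a rough sketch of that primary approach, the C model below keeps a return stack pointer in a zero-page word and mimics a possible vCPU sequence one instruction at a time: the pop primes vAC with the pointer (LDW) and then reads through it with DEEK, while the push writes through the pointer with DOKE. The zero-page addresses RSP (0x22) and TMP (0x24), the downward growth direction and the helper names are all assumptions made for illustration, not part of any existing Gigatron code.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t ram[0x10000];       /* 64 KiB address space model */
static uint16_t vAC;               /* vCPU accumulator */

enum { RSP = 0x22, TMP = 0x24 };   /* hypothetical zero-page words */

static uint16_t ldw(uint8_t d)     /* LDW $DD: vAC = [D] + 256*[D+1] */
{
    return (uint16_t)(ram[d] | ram[(uint8_t)(d + 1)] << 8);
}

static void stw(uint8_t d)         /* STW $DD: [D], [D+1] = vAC */
{
    ram[d] = vAC & 0xFF;
    ram[(uint8_t)(d + 1)] = vAC >> 8;
}

static void deek(void)             /* DEEK: vAC = [vAC] + 256*[vAC+1] */
{
    vAC = (uint16_t)(ram[vAC] | ram[(uint16_t)(vAC + 1)] << 8);
}

static void doke(uint8_t d)        /* DOKE $DD: word at the address held in [D],[D+1] = vAC */
{
    uint16_t a = ldw(d);
    ram[a] = vAC & 0xFF;
    ram[(uint16_t)(a + 1)] = vAC >> 8;
}

static void rsp_init(uint16_t top)
{
    assert((top & 1) == 0);        /* keep cells word-aligned (a choice of this sketch) */
    vAC = top;
    stw(RSP);
}

/* >R : STW TMP; LDW RSP; SUBI 2; STW RSP; LDW TMP; DOKE RSP */
static void rpush(uint16_t value)
{
    vAC = value;
    stw(TMP);                      /* park the value */
    vAC = ldw(RSP);
    vAC -= 2;                      /* this return stack grows downwards */
    stw(RSP);
    vAC = ldw(TMP);
    doke(RSP);                     /* write through the pointer */
}

/* R> : LDW RSP; DEEK; STW TMP; LDW RSP; ADDI 2; STW RSP; LDW TMP */
static uint16_t rpop(void)
{
    vAC = ldw(RSP);                /* prime vAC with the pointer... */
    deek();                        /* ...then read through it */
    stw(TMP);
    vAC = ldw(RSP);
    vAC += 2;
    stw(RSP);
    vAC = ldw(TMP);
    return vAC;
}
```

Each primitive comes to six or seven vCPU instructions, which is why hiding them in subroutines helps code density.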

A completely different approach is to replace/augment vCPU with one optimised for FORTH operations. Maybe in a distant future though... (Perhaps there's a middle road: put some primitives in SYS functions that can be called from vCPU.)
monsonite

Re: Indirect Addressing?

Post by monsonite »

Hi All,

Now that we have a CPU capable of 12.5 MHz (at least) and the addition of 200 vCPU cycles on each line of video, I decided it was time to start looking again at some sort of Forth-like interpreted language.

First off, I need to be able to simulate the vCPU behaviour, so I have started on a simple instruction simulator written in Arduino code. The main reason for choosing the Arduino is that the IDE is widely available and supports a whole range of processors, not just AVR. Another reason is that, using the millis() and micros() functions, blocks of simulated instructions can be timed accurately.

The simulator is a work in progress, and there are still half a dozen instructions to code (LUP, SYS, DEF, ALLOC, CALL, RET), but it is already useful enough to test snippets of vCPU assembly and make sure they implement the correct stack behaviour needed for a stack-based language like Forth.

Forth is often implemented as a small set of primitive instructions - coded up in the assembly language of the target processor. These primitives perform stack manipulation, arithmetic and logic functions, memory access, I/O, and program flow structures - and usually consist of short, self contained snippets of assembly language. This simulator is intended as a test bed for these isolated code snippets - so that they can be individually tested for correct operation, before being assembled into a larger program.

Initially I am concerned solely with stack operations, arithmetic, logic and memory access. The only I/O for the moment is via a serial terminal. Once the basics are in place, the language can be extended to include the Gigatron video generation.

The other idea I wish to implement is a set of pseudo registers in zero-page RAM. If vAC is register R0, then I see no reason why further 16-bit registers cannot be implemented in RAM - and given names (eg R1-R15) to make working in assembly language easier. This was inspired by Steve Wozniak's "Sweet16" and the regular register structure of the PDP-11 and MSP430 etc. They won't be the fastest access registers - but at least will be memorable.
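A minimal sketch of such pseudo registers, modelled in C: R0 is vAC itself, and R1 to R15 are 16-bit little-endian words in zero page, so `LDW Rn` and `STW Rn` become the register moves. The base address 0x40 and the function names are hypothetical; 0x40 was picked only to stay clear of the 0x20 to 0x33 area used by the stack example in this post.

```c
#include <assert.h>
#include <stdint.h>

#define REG_BASE 0x40u             /* hypothetical zero-page address of R1 */
static uint8_t zp[256];            /* zero-page RAM */

/* Zero-page address of register Rn (1..15); R0 is vAC and has no address */
static uint8_t reg_addr(unsigned n)
{
    assert(n >= 1 && n <= 15);
    return (uint8_t)(REG_BASE + 2u * (n - 1));
}

/* What LDW Rn loads into vAC: [Rn] + 256*[Rn+1] */
static uint16_t reg_read(unsigned n)
{
    uint8_t a = reg_addr(n);
    return (uint16_t)(zp[a] | zp[a + 1] << 8);
}

/* What STW Rn stores from vAC: low byte first */
static void reg_write(unsigned n, uint16_t v)
{
    uint8_t a = reg_addr(n);
    zp[a] = (uint8_t)(v & 0xFF);
    zp[a + 1] = (uint8_t)(v >> 8);
}
```

With this layout R15 occupies 0x5C and 0x5D, and `LDW R3` would assemble to `21 44`.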

As mentioned earlier in this thread, the vCPU instruction set is not ideal for stack manipulation: the code to take two numbers off the stack, add them together and return the sum to the stack takes 9 instructions (16 bytes), see the code below:

Code:

 
        0x1A,       // LD   DSP       vAC = [DSP], the address of TOS (DSP = 0x20)  1A 20
        0x20,
        0xF6,       // DEEK           vAC = [vAC] + 256*[vAC+1], i.e. TOS           F6
        0x2B,       // STW  TOS       Save TOS in zero page at 0x30                 2B 30
        0x30,
        0x1A,       // LD   DSP       Get DSP                                       1A 20
        0x20,
        0xE6,       // SUBI 2         Subtract 2 to point at NOS                    E6 02
        0x02,
        0x5E,       // ST   DSP       Store back                                    5E 20
        0x20,
        0xF6,       // DEEK           vAC = second value on stack (NOS)             F6
        0x99,       // ADDW TOS       Add TOS and NOS                               99 30
        0x30,
        0xF3,       // DOKE DSP       Store sum at [DSP], the new stack top         F3 20
        0x20,
        0x00,       // addr 0x10
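As a cross-check of what the listing does, here is a small C model that performs the same nine steps, one statement group per vCPU instruction, on the example values the simulator below preloads (DSP = 0x30, NOS = 0x0220 at 0x2E, TOS = 0x0110 at 0x30). It is an illustration only, not part of the simulator, and it models just zero page. Because DOKE treats DSP as a 16-bit pointer, the byte at 0x21 must be zero.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t M[256];             /* zero page is enough for this example */
static uint16_t vAC;

enum { DSP = 0x20, TOS = 0x30 };   /* zero-page addresses used in the listing */

static uint16_t word_at(uint16_t a) /* [a] + 256*[a+1] */
{
    return (uint16_t)(M[a & 0xFF] | M[(a + 1) & 0xFF] << 8);
}

static void forth_add(void)
{
    assert(M[DSP + 1] == 0);                    /* DOKE DSP needs the high byte to be 0 */
    vAC = M[DSP];                               /* LD   DSP : vAC = address of TOS */
    vAC = word_at(vAC);                         /* DEEK     : vAC = TOS */
    M[TOS] = vAC & 0xFF; M[TOS + 1] = vAC >> 8; /* STW  TOS */
    vAC = M[DSP];                               /* LD   DSP */
    vAC -= 2;                                   /* SUBI 2   : address of NOS */
    M[DSP] = vAC & 0xFF;                        /* ST   DSP */
    vAC = word_at(vAC);                         /* DEEK     : vAC = NOS */
    vAC += word_at(TOS);                        /* ADDW TOS : vAC = NOS + TOS */
    uint16_t a = word_at(DSP);                  /* DOKE DSP : [[DSP]] = vAC */
    M[a & 0xFF] = vAC & 0xFF;
    M[(a + 1) & 0xFF] = vAC >> 8;
}
```

Afterwards DSP holds 0x2E and the cell it points to holds 0x0330, so the net effect is one cell popped with the sum left on top.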
 
If anyone would like to play about with simulating the vCPU, I have attached the draft code below, which simulates the stack addition above and prints out the address and data values of RAM. The Data Stack Pointer is implemented at address 0x20; the top of stack is at 0x30 and deeper entries sit at descending addresses (NOS at 0x2E). When it is a bit more polished, the code will appear on GitHub.

Code:


// Gigatron vCPU Simulator

// Ken Boak April 28th 2019

// This attempts to simulate the vCPU instructions - so that short snippets of VCPU code may
// be written and tested - updating the vCPU accumulator vAC and any of the zero-page and other 
// memory locations.

// Memory will be defined as an 8-bit array with the opcode in byte 1 and data in byte 2

// The following registers will be defined:


// 0016-0017 vPC           Interpreter program counter, points into RAM
// 0018-0019 vAC           Interpreter accumulator, 16-bits
// 001a-001b vLR           Return address, for returning after CALL
// 001c      vSP           Stack pointer
// 001d      (vTmp)        Scratch storage location for vCPU
// 001e      (vReturn)     Return address (L) from vCPU into the loop (H is fixed)
// 

/*        List of Gigatron vCPU opcodes
  
 "LDWI"         0x11  LDWI  $DDDD    Load immediate arbitrary constant (vAC=D)
 
 "LD"           0x1A  LD    $DD      Load byte from zero page (vAC=[D])
 "ST"           0x5E  ST    $DD      Store byte in zero page ([D]=vAC)
 
 "LDW"          0x21  LDW   $DD      Word load from zero page (vAC=[D]+256*[D+1])
 "STW"          0x2B  STW   $DD      Store word into zero page ([D]=vAC&255,[D+1]=vAC>>8)
 
 "STLW"         0xEC  STLW  $DD      Store word in stack frame (vSP[D],vSP[D+1]=vAC&255,vAC>>8)
 "LDLW"         0xEE  LDLW  $DD      Load word from stack frame (vAC=vSP[D]+256*vSP[D+1])
 
 "PEEK"         0xAD  PEEK  -        Read byte from memory (vAC=[vAC])
 "POKE"         0xF0  POKE  $DD      Write byte in memory ([[D+1],[D]]=vAC&255)
 "DEEK"         0xF6  DEEK  -        Read word from memory (vAC=[vAC]+256*[vAC+1])
 "DOKE"         0xF3  DOKE  $DD      Write word in memory ([[D+1],[D]],[[D+1],[D]+1]=vAC&255,vAC>>8)
 
 "INC"          0x93  INC   $DD      Increment zero page byte ([D]++)
 
 "BRA"          0x90  BRA   $DD      Branch unconditionally (vPC=(vPC&0xff00)+D)
 "BCC"          0x35  BCC   $CC $DD  Test vAC and branch conditionally. CC can be EQ,NE,LT,GT,LE,GE
 "EQ"           0x3F
 "GT"           0x4D
 "LT"           0x50
 "GE"           0x53
 "LE"           0x56
 "NE"           0x72
 
 "LDI"          0x59  LDI   $DD      Load immediate small positive constant (vAC=D)
 "ADDI"         0xE3  ADDI  $DD      Add small positive constant (vAC+=D)
 "SUBI"         0xE6  SUBI  $DD      Subtract small positive constant (vAC-=D)
 "ANDI"         0x82  ANDI  $DD      Logical-AND with constant (vAC&=D)
 "ORI"          0x88  ORI   $DD      Logical-OR with constant (vAC|=D)
 "XORI"         0x8C  XORI  $DD      Logical-XOR with constant (vAC^=D)
 
 "ADDW"         0x99  ADDW  $DD      Word addition with zero page (vAC+=[D]+256*[D+1])
 "SUBW"         0xB8  SUBW  $DD      Word subtraction with zero page (vAC-=[D]+256*[D+1])
 "ANDW"         0xF8  ANDW  $DD      Word logical-AND with zero page (vAC&=[D]+256*[D+1])
 "ORW"          0xFA  ORW   $DD      Word logical-OR with zero page (vAC|=[D]+256*[D+1])
 "XORW"         0xFC  XORW  $DD      Word logical-XOR with zero page (vAC^=[D]+256*[D+1])
 
 "POP"          0x63  POP   -        Pop address from stack into vLR (vLR=[vSP]+256*[vSP+1],vSP+=2)
 "PUSH"         0x75  PUSH  -        Push vLR on stack ([--vSP]=vLR>>8,[--vSP]=vLR&255)
 "LUP"          0x7F  LUP   $DD      ROM lookup (vAC=ROM[D,AC])

 "SYS"          0xB4  SYS   $DD      Native function call using at most 2*T cycles, D=270-max(14,T)
 
 "DEF"          0xCD  DEF   $DD      Define data or code (vAC,vPC=vPC+2,D+256*(vPC>>8))

 "ALLOC"        0xDF  ALLOC $DD      Create or destroy stack frame (vSP+=D)
 
 "LSLW"         0xE9  LSLW  -        Shift left (because 'ADDW vAC' will not work!) (vAC+=vAC)
 
 "CALL"         0xCF  CALL  $DD      Goto address but remember vPC (vLR,vPC=vPC+2,[D]+256*[D+1]-2)
 "RET"          0xFF  RET   -        Leaf return (vPC=vLR-2)

 With the simulation model in place it would be good to be able to evaluate small snippets of code, such as:

// LD   DSP       // vAC = [DSP], the address of TOS  1A 20
// DEEK           // vAC = [vAC] + 256*[vAC+1] = TOS  F6
// STW  TOS       // TOS = Top of Stack               2B 30
// LD   DSP       // Get DSP                          1A 20
// SUBI 2         // Subtract 2                       E6 02
// ST   DSP       // Store back                       5E 20
// DEEK           // vAC = second value on stack NOS  F6
// ADDW TOS       // Add TOS and NOS                  99 30
// DOKE DSP       // Store sum to new stack top       F3 20

0x1A,
0x20,
0xF6,
0x2B,
0x30,
0x1A,
0x20,
0xE6,
0x02,
0x5E,
0x20,
0xF6,
0x99,
0x30,
0xF3,
0x20,
 */

#define MEMSIZE         1024      // RAM sized for smallest Arduino 
byte M[MEMSIZE] = {
        
        0x1A,       // LD   DSP       vAC = [DSP], the address of TOS (DSP = 0x20)  1A 20
        0x20,
        0xF6,       // DEEK           vAC = [vAC] + 256*[vAC+1], i.e. TOS           F6
        0x2B,       // STW  TOS       Save TOS in zero page at 0x30                 2B 30
        0x30,
        0x1A,       // LD   DSP       Get DSP                                       1A 20
        0x20,
        0xE6,       // SUBI 2         Subtract 2 to point at NOS                    E6 02
        0x02,
        0x5E,       // ST   DSP       Store back                                    5E 20
        0x20,
        0xF6,       // DEEK           vAC = second value on stack (NOS)             F6
        0x99,       // ADDW TOS       Add TOS and NOS                               99 30
        0x30,
        0xF3,       // DOKE DSP       Store sum at [DSP], the new stack top         F3 20
        0x20,
        0x00,       // addr 0x10      HALT (simulator-only opcode, not a vCPU instruction)
        0x00,
        0x00,
        0x00,
        0x00,
        0x00,
        0x00,     // addr 0x16  vPC L
        0x00,     // addr 0x17  vPC H
        0x00,     // addr 0x18  vAC L
        0x00,     // addr 0x19  vAC H
        0x00,     // addr 0x1a  vLR L
        0x00,     // addr 0x1b  vLR H
        0x00,     // addr 0x1c  vSP
        0x00,
        0x00,
        0x00,
        0x30,     // addr 0x20 DSP points to top of stack
        0x00,
        0x00,
        0x00,
        0x00,
        0x00,
        0x00,
        0x00,
        0x00,
        0x00,
        0x00,
        0x00,
        0x00,
        0x00,
        0x20,     // addr 0x2e NOS = 0x0220
        0x02,
        0x10,     // addr 0x30 TOS = 0x0110
        0x01,
        0x40,
        0x04,
        0x00,
        0x00,
        0x00,
        0x00,
        0x00,
        0x00,
        0x00,
        0x00,
        0x00,
        0x00,
        0x00,
        0x00,
};

// Simulator state. vSP is not cached: stack instructions read and write it
// directly at zero-page address 0x1c, so changes show up in the RAM dump.
uint16_t vPC;               // Program counter (index into M[])
uint16_t vAC;               // 16-bit accumulator
uint16_t vLR;               // Link register (will be set by CALL once implemented)
uint16_t addr;              // Effective address for POKE and DOKE
byte IR;                    // Instruction register (opcode)
byte D;                     // First operand byte
byte DD;                    // Second operand byte
const byte DSP = 0x20;      // Zero-page address of the data stack pointer
const byte vSP_ADDR = 0x1c; // Zero-page address of vSP

// Access RAM with the address wrapped to MEMSIZE (which must be a power of 2)
byte &mem(uint16_t a)
{
  return M[a & (MEMSIZE - 1)];
}

// Read a little-endian word: [a] + 256*[a+1]
uint16_t peekWord(uint16_t a)
{
  return mem(a) + 256 * mem(a + 1);
}

// Branch within the current page. Unlike the real vCPU, this simulator does not
// advance vPC by 2 before each fetch, so a branch goes to the target directly.
void branchTo(byte target)
{
  vPC = (vPC & 0xff00) + target;
}

void fetch()
{
  IR = mem(vPC);            // opcode
  D  = mem(vPC + 1);        // first operand byte (if any)
  DD = mem(vPC + 2);        // second operand byte (if any)
  vPC = (vPC + 1) & (MEMSIZE - 1);   // step past the opcode; execute() skips the operands
}

void execute()
{
  switch (IR) {
  case 0x00:  // HALT (simulator only): print the word on top of the data stack
    Serial.print("TOS="); Serial.println(peekWord(mem(DSP)), HEX);      break;
  case 0x11: vAC = D + 256 * DD;                  vPC += 2; break;  // LDWI  $DDDD
  case 0x1A: vAC = mem(D);                        vPC++;    break;  // LD    $DD   vAC=[D]
  case 0x5E: mem(D) = vAC & 255;                  vPC++;    break;  // ST    $DD   [D]=vAC&255
  case 0x21: vAC = peekWord(D);                   vPC++;    break;  // LDW   $DD   vAC=[D]+256*[D+1]
  case 0x2B: mem(D) = vAC & 255; mem(D + 1) = vAC >> 8; vPC++; break;  // STW $DD
  case 0xEC: {                                                      // STLW  $DD   vSP[D],vSP[D+1]=vAC
    byte a = mem(vSP_ADDR) + D;             // the stack frame wraps inside zero page
    mem(a) = vAC & 255; mem((byte)(a + 1)) = vAC >> 8; vPC++; break;
  }
  case 0xEE: {                                                      // LDLW  $DD   vAC=vSP[D]+256*vSP[D+1]
    byte a = mem(vSP_ADDR) + D;
    vAC = mem(a) + 256 * mem((byte)(a + 1)); vPC++; break;
  }
  case 0xAD: vAC = mem(vAC);                                break;  // PEEK  -     vAC=[vAC]
  case 0xF6: vAC = peekWord(vAC);                           break;  // DEEK  -     vAC=[vAC]+256*[vAC+1]
  case 0xF0: addr = peekWord(D); mem(addr) = vAC & 255; vPC++; break;  // POKE $DD  [[D+1],[D]]=vAC&255
  case 0xF3: addr = peekWord(D);                                    // DOKE  $DD   word at [[D+1],[D]] = vAC
             mem(addr) = vAC & 255; mem(addr + 1) = vAC >> 8; vPC++; break;
  case 0x93: mem(D)++;                            vPC++;    break;  // INC   $DD   [D]++
  case 0x59: vAC = D;                             vPC++;    break;  // LDI   $DD
  case 0xE3: vAC += D;                            vPC++;    break;  // ADDI  $DD
  case 0xE6: vAC -= D;                            vPC++;    break;  // SUBI  $DD
  case 0x82: vAC &= D;                            vPC++;    break;  // ANDI  $DD
  case 0x88: vAC |= D;                            vPC++;    break;  // ORI   $DD
  case 0x8C: vAC ^= D;                            vPC++;    break;  // XORI  $DD
  case 0x99: vAC += peekWord(D);                  vPC++;    break;  // ADDW  $DD
  case 0xB8: vAC -= peekWord(D);                  vPC++;    break;  // SUBW  $DD
  case 0xF8: vAC &= peekWord(D);                  vPC++;    break;  // ANDW  $DD
  case 0xFA: vAC |= peekWord(D);                  vPC++;    break;  // ORW   $DD
  case 0xFC: vAC ^= peekWord(D);                  vPC++;    break;  // XORW  $DD
  case 0xE9: vAC <<= 1;                                     break;  // LSLW  -     vAC+=vAC
  case 0x90: branchTo(D);                                   break;  // BRA   $DD
  case 0x35: {                                                      // BCC   $CC $DD: D = condition, DD = target
    int16_t s = (int16_t)vAC;               // conditions compare signed vAC with zero
    bool taken;
    switch (D) {
      case 0x3F: taken = (s == 0); break;   // EQ
      case 0x72: taken = (s != 0); break;   // NE
      case 0x50: taken = (s <  0); break;   // LT
      case 0x4D: taken = (s >  0); break;   // GT
      case 0x56: taken = (s <= 0); break;   // LE
      case 0x53: taken = (s >= 0); break;   // GE
      default:   taken = false;    break;   // unknown condition code
    }
    if (taken) branchTo(DD); else vPC += 2;
    break;
  }
  case 0x63:                                                        // POP   -     vLR=[vSP]+256*[vSP+1], vSP+=2
    vLR = peekWord(mem(vSP_ADDR)); mem(vSP_ADDR) += 2;      break;
  case 0x75:                                                        // PUSH  -     push vLR, low byte ends up at [vSP]
    mem(vSP_ADDR) -= 1; mem(mem(vSP_ADDR)) = vLR >> 8;
    mem(vSP_ADDR) -= 1; mem(mem(vSP_ADDR)) = vLR & 255;     break;
  case 0xFF: vPC = vLR;                                     break;  // RET   -     no -2 here, see branchTo()
  // Not simulated yet: skip the operand byte so the next opcode is fetched correctly
  case 0x7F:                                                        // LUP   $DD
  case 0xB4:                                                        // SYS   $DD
  case 0xCD:                                                        // DEF   $DD
  case 0xDF:                                                        // ALLOC $DD
  case 0xCF:                                      vPC++;    break;  // CALL  $DD
  default:
    Serial.print("Unknown opcode "); Serial.println(IR, HEX); break;
  }
}

void setup() {
  Serial.begin(115200);
  vPC = 0;
  vAC = 0;
  M[vSP_ADDR] = 0x80;       // vSP lives in zero page, so initialise it in simulated RAM
}

void loop() {
  // Run until the HALT opcode (0x00) has executed, so that its TOS print also happens
  do {
    fetch();
    execute();
    Serial.print("  vPC="); Serial.print(vPC, HEX);
    Serial.print("  IR=");  Serial.print(IR, HEX);
    Serial.print("  vAC="); Serial.println(vAC, HEX);
  } while (IR != 0x00);

  // Dump RAM from 0x00 to 0x40
  for (int i = 0; i <= 0x40; i++) {
    Serial.print("ADDR="); Serial.print(i, HEX);
    Serial.print(" DATA="); Serial.println(M[i], HEX);
  }

  while (1) {}              // Stop here
}

marcelk

Re: Indirect Addressing?

Post by marcelk »

I was pondering: perhaps it's more efficient to use the vCPU stack as the data stack instead of as the call stack. We have ALLOC to grow/shrink it by any offset, and we have LDLW and STLW for R/W access at offsets in the stack. The call stack doesn't need these operations; it can be simulated by explicit vCPU sequences for pushing and popping vLR.
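A sketch of what Forth's `+` might look like under that scheme, again as a C model of the instructions: the data stack lives at vSP with TOS at offset 0 and NOS at offset 2, so `+` becomes LDLW 0, STW T, LDLW 2, ADDW T, STLW 2, ALLOC 2. The temporary T at 0x24, the cell layout and the exact sequence are assumptions; the stack is modelled in zero page because vSP is a single byte.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t zp[256];            /* zero page: vSP and the stack both live here */
static uint16_t vAC;               /* vCPU accumulator */
enum { vSP = 0x1C, T = 0x24 };     /* vSP address; T is a hypothetical temporary */

/* vSP + D wraps inside zero page because vSP is a single byte */
static uint8_t sp_addr(uint8_t d) { return (uint8_t)(zp[vSP] + d); }

static void ldlw(uint8_t d)        /* LDLW $DD: vAC = vSP[D] + 256*vSP[D+1] */
{
    vAC = (uint16_t)(zp[sp_addr(d)] | zp[sp_addr((uint8_t)(d + 1))] << 8);
}

static void stlw(uint8_t d)        /* STLW $DD: vSP[D], vSP[D+1] = vAC */
{
    zp[sp_addr(d)] = vAC & 0xFF;
    zp[sp_addr((uint8_t)(d + 1))] = vAC >> 8;
}

static void alloc(uint8_t d) { zp[vSP] = (uint8_t)(zp[vSP] + d); }   /* ALLOC $DD */

static void stw(uint8_t d)         /* STW $DD */
{
    zp[d] = vAC & 0xFF;
    zp[(uint8_t)(d + 1)] = vAC >> 8;
}

static void addw(uint8_t d)        /* ADDW $DD: operand must be in zero page */
{
    vAC += (uint16_t)(zp[d] | zp[(uint8_t)(d + 1)] << 8);
}

/* Forth-style words with TOS at vSP+0 and NOS at vSP+2 */
static void push(uint16_t v)       /* value in vAC; ALLOC -2; STLW 0 */
{
    vAC = v;
    alloc(0xFE);
    stlw(0);
}

static void plus(void)             /* LDLW 0; STW T; LDLW 2; ADDW T; STLW 2; ALLOC 2 */
{
    ldlw(0);
    stw(T);
    ldlw(2);
    addw(T);
    stlw(2);
    alloc(2);
}
```

That is 6 instructions and 12 bytes, against 9 instructions and 16 bytes for the DSP-based version earlier in the thread. The STW T is still needed because ADDW only takes a zero-page operand.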
monsonite

Re: Indirect Addressing?

Post by monsonite »

Marcel,

This sounds like an interesting alternative.

I'll have to look more carefully at ALLOC, LDLW and STLW and get them coded into the simulator.

BTW, I have ordered a 13 MHz crystal, 10 ns 128Kx8 RAMs and SOP32 to DIP32 adaptor boards. I hope to get the expansion board built up and include a fast RAM upgrade.
pgavlin
Posts: 8
Joined: 22 Apr 2019, 19:38

Re: Indirect Addressing?

Post by pgavlin »

A few thoughts:

> The other idea I wish to implement is a set of pseudo registers in zero-page RAM. If vAC is register R0, then I see no reason why further 16-bit registers cannot be implemented in RAM - and given names (eg R1-R15) to make working in assembly language easier.

This is the approach taken by the C compiler. ZP locations 0x30-0x4e are reserved for 15 virtual registers named r1 through r15. Fifteen virtual registers may turn out to be too many depending on where things land w.r.t. a calling convention. As it stands, I am treating all registers as callee-save in order to save on space (callee-saves only require handling in function prologs/epilogs; caller-saves typically require handling at each call site). This means that each function has to either save registers ad hoc (which is fast, but takes extra space) or call through a generic helper (which can be slow, but saves space if more than one register is used by the function).

> I was pondering, perhaps it's more efficient to use the vCPU stack as the data stack instead of call stack. We have ALLOC to grow/shrink it by any offset, and we have LDLW and STLW for R/W access at offsets in the stack. The call stack doesn't need these operations. It can be simulated by explicit vCPU sequences for push and pop of vLR.

IMO we would be better off adding a parameter to ALLOC, LDLW, and STLW that refers to a 2-byte ZP location that is used as the base address for the stack. This would allow e.g. the C compiler to save some time and space when manipulating the stack by using these instructions rather than helper calls. With this approach, ALLOC [D] would add to the stack pointer stored at [D], LDLW [D] would add to the stack pointer stored at [D] and then load the word stored at calculated address into vAC, and STLW [D] would add to the stack pointer stored at [D] and then store the word in vAC at the calculated address.
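One possible reading of this proposal, sketched as C functions: each instruction gains an operand B naming a 2-byte zero-page pointer that takes the place of vSP, while D keeps its current role. Because the pointer is 16 bits wide, the stack could live anywhere in memory. The operand order, the signed ALLOC offset and the function names are all assumptions; no such instructions exist in the ROM.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t ram[0x10000];       /* full 64 KiB address space */
static uint16_t vAC;               /* vCPU accumulator */

static uint16_t peek16(uint16_t a)
{
    return (uint16_t)(ram[a] | ram[(uint16_t)(a + 1)] << 8);
}

static void poke16(uint16_t a, uint16_t v)
{
    ram[a] = v & 0xFF;
    ram[(uint16_t)(a + 1)] = v >> 8;
}

/* Hypothetical ALLOC [B],D: add a signed offset to the pointer stored at zero-page B */
static void alloc_b(uint8_t b, int8_t d)
{
    assert(b != 0xFF);             /* both pointer bytes must be in zero page */
    poke16(b, (uint16_t)(peek16(b) + d));
}

/* Hypothetical LDLW [B],D: vAC = word at (pointer in B) + D */
static void ldlw_b(uint8_t b, uint8_t d)
{
    vAC = peek16((uint16_t)(peek16(b) + d));
}

/* Hypothetical STLW [B],D: word at (pointer in B) + D = vAC */
static void stlw_b(uint8_t b, uint8_t d)
{
    poke16((uint16_t)(peek16(b) + d), vAC);
}
```

Under this reading each instruction needs one extra operand byte and one extra pointer fetch, which has to be weighed against the vCPU cycle budget.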
monsonite

Re: Indirect Addressing?

Post by monsonite »

> IMO we would be better off adding a parameter to ALLOC, LDLW, and STLW that refers to a 2-byte ZP location that is used as the base address for the stack. This would allow e.g. the C compiler to save some time and space when manipulating the stack by using these instructions rather than helper calls. With this approach, ALLOC [D] would add to the stack pointer stored at [D], LDLW [D] would add to the stack pointer stored at [D] and then load the word stored at calculated address into vAC, and STLW [D] would add to the stack pointer stored at [D] and then store the word in vAC at the calculated address.

Is this not what is currently being done? ALLOC, LDLW and STLW already have a parameter that offsets into the zero page from the value held in vSP (address 0x1C).

Are you suggesting a more general case, where we are not tied to a fixed vSP but can index off any zero-page location?

Question for Marcel - when using ALLOC, is $DD a signed integer so that $DD=0x02 increments the stack pointer by 2 and $DD=0xFE decrements it by 2?
marcelk

Re: Indirect Addressing?

Post by marcelk »

monsonite wrote: 29 Apr 2019, 16:54 Question for Marcel - when using ALLOC, is $DD a signed integer so that $DD=0x02 increments the stack pointer by 2 and $DD=0xFE decrements it by 2?
Correct, because vSP is a single-byte register.
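In other words, ALLOC is plain 8-bit addition on vSP, so a negative offset is just its two's-complement byte. A tiny C illustration (the function names are made up for the example):

```c
#include <assert.h>
#include <stdint.h>

/* ALLOC $DD: vSP += D, modulo 256 because vSP is a single byte */
static uint8_t alloc_vsp(uint8_t vsp, uint8_t dd)
{
    return (uint8_t)(vsp + dd);
}

/* Operand byte that reserves n bytes of stack, i.e. ALLOC -n */
static uint8_t alloc_operand(int n)
{
    assert(n >= 0 && n <= 128);    /* D acts as a signed byte: -128..127 */
    return (uint8_t)-n;
}
```

So `ALLOC $FE` moves vSP from 0x80 to 0x7E, and `ALLOC $02` moves it back.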

Programs typically park values on the stack with these instructions. They were originally squeezed in to make the recursive Search() function in Queens.gcl possible. This function doesn't manipulate the stack variables directly, but uses the stack to save and restore zero page variables. These zero page variables are then in turn used as if they are locals.

While ALLOC has some wiggle room for patching, both LDLW and STLW are at 26 cycles already. The limit for vCPU instructions is 28 cycles before they must become SYS extensions. So I fear this vCPU instruction set is pretty much what it is. But there should be possibilities to add new vCPU architectures (and perhaps even do cooperative multithreading between them).

I briefly checked if it's possible to make the address of vSP itself a variable (not really sure if that helps). But I don't immediately see how to do that in 2 cycles.
marcelk

Re: Indirect Addressing?

Post by marcelk »

Staring at LCC's generated code, and looking at vCPU again, I think we could squeeze in one or two (or three) new vCPU instructions in a new ROM, if we really want, without breaking compatibility with existing vCPU programs.

At first glance, the existing ANDI and INC can each be patched to provide the landing space for new opcodes. This at the expense of slowing down the originals by 6 cycles (0.96 µs) because they must be rerouted to another ROM page and back. (ALLOC looks a bit too short for patching BTW, but maybe...).
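For reference, the 0.96 µs figure follows from the standard 6.25 MHz clock (160 ns per cycle); on a 12.5 MHz machine like the one mentioned earlier in the thread the same 6 cycles would take half as long:

```latex
t = \frac{6\ \text{cycles}}{6.25\ \text{MHz}} = 6 \times 160\ \text{ns} = 0.96\ \mu\text{s},
\qquad
\frac{6\ \text{cycles}}{12.5\ \text{MHz}} = 0.48\ \mu\text{s}
```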

As one new candidate instruction “THUNK $DD” comes to mind: basically a “BRA” into the next code page, replacing Pat's thunk functions at the end of each segment. That makes page hopping much faster and frees up some zero page real estate. It can also replace “LDWI $DDDD / CALL vAC” in many cases, saving 3 bytes (and not clobbering vAC).
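To make the comparison concrete, here is a C sketch of the two target computations: BRA, per the formula in the opcode list earlier in the thread, keeps the high byte of vPC, while a THUNK as described would presumably use the following page. THUNK's exact semantics are not given in the post, so `thunk_target` is a guess, and both functions ignore the real interpreter's advance of vPC by 2 before each fetch.

```c
#include <assert.h>
#include <stdint.h>

/* BRA $DD: vPC = (vPC & 0xff00) + D, i.e. stay in the current page */
static uint16_t bra_target(uint16_t vpc, uint8_t d)
{
    return (uint16_t)((vpc & 0xFF00) + d);
}

/* Hypothetical THUNK $DD: like BRA, but into the following page */
static uint16_t thunk_target(uint16_t vpc, uint8_t d)
{
    return (uint16_t)(((vpc & 0xFF00) + 0x100) + d);
}
```

Compared with `LDWI $DDDD` (3 bytes) plus `CALL vAC` (2 bytes), a 2-byte THUNK saves the 3 bytes mentioned above and leaves vAC untouched.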