Interrupts:

Post by **at67** » 18 Nov 2019, 03:59

Whilst finishing the BASIC compiler I am working on, one of the things I have been implementing is a time-sliced architecture that allows a chained set of routines to run in real time. For example, I want my audio routines (sound and MIDI) to play back uninterrupted and at the correct tick rate no matter what is going on in the user's code or being displayed on the screen, or even what scanline mode the screen is in.

This is not a trivial task: either I leave it up to the programmer to insert tickMidi() and tickAudio() calls in every processor-intensive routine (which is what I did in Tetronis, not fun), or I build in a seamless system that does it for you automatically (which I have done).

So currently every intensive tight loop in the runtime, (mostly graphics stuff, busy waits, etc), has a call to a stub routine which by default is just a RET. Calling functions like PLAY MIDI <address> will insert the appropriate runtime function's address into this stub and voila you have seamless time-slicing of real time critical routines.
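The stub idea can be sketched in a few lines. This is a model in Python, not vCPU code, and the names `tick_stub`, `play_midi`, `tick_midi` and `draw_line` are hypothetical stand-ins rather than gtBASIC runtime symbols. The only point is that every tight loop calls through one patchable slot that defaults to a no-op.

```python
# Hypothetical model of the patchable stub; none of these names exist in the real runtime.

ticks = []  # records every time the "MIDI" routine gets a time slice


def _ret():
    """Default stub: does nothing, like a lone RET."""


tick_stub = _ret  # the single slot every tight loop calls through


def tick_midi():
    ticks.append("midi")


def play_midi():
    """Model of PLAY MIDI <address>: patch the stub with the real routine."""
    global tick_stub
    tick_stub = tick_midi


def draw_line(length):
    """A processor-intensive runtime loop that yields a slice on each pass."""
    for _ in range(length):
        tick_stub()  # costs a call even when the stub is just a RET


draw_line(3)          # stub is still the no-op
before = len(ticks)
play_midi()
draw_line(3)          # now every pass services the MIDI routine
after = len(ticks)
```

Even in this toy form the first drawback is visible: `draw_line` pays for the call on every pass whether or not anything is playing.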

Ok, that's all well and good, but there's no free lunch and there are some drawbacks:

- It costs cycles even when the stub routine is just a RET.

- If the user's code itself is processor intensive without calling any runtime functions, then they need to insert appropriate TICK MIDI/AUDIO calls in their loops.

- Every new non-trivial runtime routine needs to have this architecture built into it.

So what else can we do?

Interrupts...

Well, specifically, one vertical blank interrupt for vCPU code built into the ROM; how would it be done and what would be required?

- 2 bytes in zero page allocated for user VBLANK address.

- Native code that runs at vertical blank that saves vPC into vLR and saves vAC, (maybe even vTMP?), on the stack.

- Native code that copies VBLANK address to vPC and calls dispatch.

- Native code that restores vPC from vLR, etc etc

I've had a quick play with my native code assembler and so far have not been very successful. Re-organising the current code (without fully understanding it like Marcel does) is not trivial when you have substantial native code changes to make.

Does anyone have thoughts, ideas, additions? Obviously we could rely on Marcel to do everything when it comes to Native code, or we could help push it in a community driven direction.

Post by **marcelk** » 18 Nov 2019, 08:48

Indeed, the only interrupt we have now is the reset interrupt vReset. That one is easy, because it doesn't have to return.

Recently, when I enabled the TIME$ variable in MS BASIC, I thought a lot about interrupts and context switching. I haven't arrived at a satisfactory solution yet. Common wisdom dictates that the easiest mechanism should be a form of cooperative multitasking. But the issue I encountered is that within vCPU there's no way to get back into an arbitrary register state: you always clobber vAC, vLR or sysFn when jumping back into the interrupted program from vCPU itself.

On the other hand, preemptive multitasking appears remarkably easy to add: just swap out all registers 60 times per second. But the context is about 18 bytes in that case, because you really can't ignore sysFn, sysArgs[0..7] and vCPUselect. That makes it a bit heavy to use as an interrupt mechanism as well: 72 cycles just to swap the context data. Worse, the time slices will be at least a full frame and that's too coarse for many simple tasks you need interrupts for.
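As a rough check of those numbers, here is the arithmetic in Python. The per-register breakdown is an assumption chosen to match the "about 18 bytes" figure, not something spelled out in the thread, and the 60 Hz frame rate is the nominal value quoted above.

```python
# Assumed breakdown of the ~18-byte context; sizes are in bytes.
CONTEXT = {
    "vPC": 2,
    "vAC": 2,
    "vLR": 2,
    "vSP": 1,
    "sysFn": 2,
    "sysArgs[0..7]": 8,
    "vCPUselect": 1,
}

context_bytes = sum(CONTEXT.values())

CYCLES_QUOTED = 72                      # figure quoted in the post
cycles_per_byte = CYCLES_QUOTED / context_bytes

FRAME_RATE_HZ = 60                      # nominal
slice_ms = 1000 / FRAME_RATE_HZ         # shortest possible time slice
```

Under that assumption the quoted 72 cycles works out to 4 cycles per swapped byte, and a full-frame time slice is a little under 17 ms, which is the coarseness being objected to.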

The nicest would be to start with a lightweight interrupt hook attached to the first or last line of vertical blank. You can even have it trigger when frameCount reaches zero. It should have only minimal context saving, preferably not even vLR and vSP. But it needs to be able to resume the interrupted program in a clean way when it's done (and that's where I got stuck).

There will be a solution, we just have to discover it. It would be cleanest if we can add a direct vCPU instruction for this, but I feel we maxed out the expansion room there. I was already looking to give an existing vCPU instruction a back door just for this. (Ouch! Although 'RET' is a natural candidate for such modification.)

An advantage of triggering on the first vBlank line is that you can schedule the maximum amount of work outside the visible screen area, for smooth graphical updates. An advantage of triggering on the last line of vBlank is that serialRaw and buttonState have just been updated. That avoids a 16 ms latency in input processing. For simplicity, we should pick one and stick to it.

Once we have such a minimal interrupt mechanism, we can program clocks and have a keyboard buffer. Or take it a step further and perform full context switching from there.

For many games, a main loop approach works fine of course. But I also want something more dynamic, if only because that's much cooler.

Post by **marcelk** » 18 Nov 2019, 20:41

at67 wrote: ↑18 Nov 2019, 03:59
Well, specifically, one vertical blank interrupt for vCPU code built into the ROM; how would it be done and what would be required?
2 bytes in zero page allocated for user VBLANK address.

The vector can also come from outside zero page. We still have 3 unused bytes in page 1 at $1f6..$1f8. Alternatively, we can simply jump to a fixed address. Except I like to have a mechanism that switches this off completely (I think...), so some variable will be needed. A single bit will do.

at67 wrote: ↑18 Nov 2019, 03:59 Native code that runs at vertical blank that saves vPC into vLR and saves vAC, (maybe even vTMP?), on the stack.

If we save vPC into vLR, we must save vLR itself. So vPC can just as well all go directly onto the stack. Let the interrupt code save vLR if it decides it needs it. You can do a lot without using it.

The only issue I can think of is that some of my functions park values in the stack area just below where vSP points. But typically no more than 8 bytes (queens.gt1). The C runtime also does this (@div and @divu). It's a natural thing to do if you're a leaf function.

We can also park the state at a fixed ZP location. That is also easier for the native dispatch code. For that we can require the program that enables interrupts to make $fc..$ff available for this purpose (a single ALLOC -4).

(BTW: vTmp doesn't need saving.)

at67 wrote: ↑18 Nov 2019, 03:59 Native code that copies VBLANK address to vPC and calls dispatch.

Native code that restores vPC from vLR, etc etc

The last point is where I was stuck. We can do a SYS call for this. But that needs sysFn to be set and therefore saved. That makes the effective context at least 6 bytes: vPC, vAC and sysFn. Including that last one feels wrong to me, it itches.

In the case of returning through SYS the interrupt dispatcher could just as well preset sysFn as a courtesy. Just like saving vAC is really a courtesy: the interrupt handler can do it itself if we force it to. But for both we know the use of these registers is inevitable.

But adding a SYS call is a ROM change. And if we need a ROM change anyway, perhaps we can modify RET or some other instruction and give it more power. If we succeed, we don't need to clobber sysFn all the time. And a 4 byte context would be really neat and minimal.

Post by **at67** » 19 Nov 2019, 07:46

marcelk wrote: ↑18 Nov 2019, 20:41 The vector can also come from outside zero page. We still have 3 unused bytes in page 1 at $1f6..$1f8. Alternatively, we can simply jump to a fixed address. Except I like to have a mechanism that switches this off completely (I think...), so some variable will be needed. A single bit will do.

Of course, I get fixated with zero page; those 3 spare bytes sound like they were pre-ordained for this destiny.

marcelk wrote: ↑18 Nov 2019, 20:41 If we save vPC into vLR, we must save vLR itself. So vPC can just as well all go directly onto the stack. Let the interrupt code save vLR if it decides it needs it. You can do a lot without using it.

I also like the idea of saving/restoring minimal state and forcing the interrupt to do all the housekeeping, (not that there is much), as it actually forces the interrupt programmer to fully understand what is going on and how it all works.

marcelk wrote: ↑18 Nov 2019, 20:41 The only issue I can think of is that some of my functions park values in the stack area just below where vSP points. But typically no more than 8 bytes (queens.gt1). The C runtime also does this (@div and @divu). It's a natural thing to do if you're a leaf function.

We can also park the state at a fixed ZP location. That is also easier for the native dispatch code. For that we can require the program that enables interrupts to make $fc..$ff available for this purpose (a single ALLOC -4).

Forcing the interrupt routine to manage state would let the coder decide where and how much state he/she saves.

marcelk wrote: ↑18 Nov 2019, 20:41 The last point is where I was stuck. We can do a SYS call for this. But that needs sysFn to be set and therefore saved. That makes the effective context at least 6 bytes: vPC, vAC and sysFn. Including that last one feels wrong to me, it itches.

In the case of returning through SYS the interrupt dispatcher could just as well preset sysFn as a courtesy. Just like saving vAC is really a courtesy: the interrupt handler can do it itself if we force it to. But for both we know the use of these registers is inevitable.

But adding a SYS call is a ROM change. And if we need a ROM change anyway, perhaps we can modify RET or some other instruction and give it more power. If we succeed, we don't need to clobber sysFn all the time. And a 4 byte context would be really neat and minimal.

This is the part where I am a little confuzzld, I assume you mean a SYS call to get back to exactly where you started, (in terms of vCPU code). Why doesn't restoring whatever state you decided to mess with and then a RET do exactly that? Assuming you saved PC into LR in the first place.

Post by **marcelk** » 19 Nov 2019, 08:49

at67 wrote: ↑19 Nov 2019, 07:46 This is the part where I am a little confuzzld, I assume you mean a SYS call to get back to exactly where you started, (in terms of vCPU code). Why doesn't restoring whatever state you decided to mess with and then a RET do exactly that? Assuming you saved PC into LR in the first place.

RET jumps to the value in vLR. An interrupt can trigger at any point in the main code, and the original vLR value may be live there: the interruptee can be running a leaf function and invoke RET itself in the near future. Or it may have been interrupted just between its POP and RET.
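A toy model makes the problem concrete. This is Python, not vCPU code: the registers are dictionary entries, the addresses are invented, and `resume_address` is a hypothetical helper. It compares parking the interrupted vPC in vLR against keeping it in a separate save area, for the case where the interrupt lands inside a leaf function that still needs its own vLR.

```python
# Toy model only: registers are dict entries and the addresses are made up.
CALLER = 0x0250      # where the leaf function should eventually return
LEAF_PC = 0x0300     # where the leaf function is when the interrupt fires


def resume_address(park_vpc_in_vlr):
    regs = {"vPC": LEAF_PC, "vLR": CALLER}

    # Interrupt entry: the interrupted vPC must be kept somewhere.
    if park_vpc_in_vlr:
        regs["vLR"] = regs["vPC"]        # overwrites the live return address
        saved_pc = None
    else:
        saved_pc = regs["vPC"]           # separate save area, vLR untouched

    regs["vPC"] = 0x0400                 # ... handler runs ...

    # Interrupt exit.
    if park_vpc_in_vlr:
        regs["vPC"] = regs["vLR"]        # plain RET back into the leaf function
    else:
        regs["vPC"] = saved_pc           # dedicated restore path

    # The leaf function now finishes with its own RET.
    regs["vPC"] = regs["vLR"]
    return regs["vPC"]


with_vlr = resume_address(True)
with_save_area = resume_address(False)
```

With vLR used as the parking spot, the leaf function's final RET lands back inside the leaf function instead of in its caller. With a separate save area it returns to the caller as intended, which is why the return path needs something other than a plain RET.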

For an RTI mechanism, I'm now also staring at vCPUselect. It can be used to divert vCPU to some native context restore code with the next time slice. At the end of the interrupt code, you do something like this:

LDI >$12ff
ST vCPUselect
SYS 284 ---> Large value forces a resync. There will be no dispatch through sysFn

Then at the next time slice, we don't enter vCPU through ENTER at $2ff, but we divert to native code at $12ff. There we can restore the context, restore vCPUselect, and dispatch vCPU as if nothing happened. The cost will be these 3 instructions and a wait for the next time slice.

Post by **marcelk** » 22 Nov 2019, 08:46

I opened a GitHub issue, because there are plenty of uses for this. Personally I want the PIA addresses ($D010-$D013) in the Apple-1 emulator to work as on the original.

There are 4 aspects to think about:

1. Triggering, vector and dispatch.
2. Context restore and return mechanism. This can be tricky.
3. Enable/disable method.
4. The interaction with v6502.

I stumbled upon a possible solution direction for item 2, by using the LUP instruction in a new way.

https://github.com/kervinck/gigatron-rom/issues/125

Post by **marcelk** » 08 Feb 2020, 16:45

Consider it done and tested: vertical blank interrupts now work fine in dev.rom.

They trigger at the top of vertical blank whenever frameCount overflows to 0 and the vIRQ vector is non-zero. This vector must point to vCPU code. vPC and vAC will already be saved in the top of the zero page. The "vRTI" return sequence goes by the LUP instruction.

You can set the pace by updating frameCount prior to "vRTI". But be careful when setting it to 255, because there is a race condition if the vertical blank arrives before you've returned!
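The pacing rule is easy to get wrong by one, so here is a small Python model of it. It assumes frameCount is an 8-bit counter incremented once per frame and a nominal 60 frames per second. The function names are hypothetical, and `simulate` only models the trigger condition described above, not the handler itself.

```python
FRAME_RATE_HZ = 60  # nominal; the thread speaks of "60 times per second"


def frames_until_irq(frame_count):
    """Frames until an 8-bit frameCount, incremented once per frame, wraps to 0."""
    return 256 - (frame_count & 0xFF)   # a value of 0 needs a full 256 frames


def reload_for(seconds):
    """frameCount value to store so the next interrupt is `seconds` away."""
    frames = round(seconds * FRAME_RATE_HZ)
    if not 1 <= frames <= 256:
        raise ValueError("interval must be 1..256 frames")
    return (256 - frames) & 0xFF


def simulate(frame_count, virq_vector, frames):
    """Return the frame numbers at which the handler would be dispatched."""
    fired = []
    for n in range(1, frames + 1):
        frame_count = (frame_count + 1) & 0xFF   # top of vertical blank
        if frame_count == 0 and virq_vector != 0:
            fired.append(n)
    return fired


half_second = reload_for(0.5)
```

Storing 255 leaves exactly one frame before the next trigger, so a handler that has not returned by then is where the race comes from. A zero vector disables dispatch entirely in this model, matching the "vIRQ vector is non-zero" condition.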

vIRQ is very fast when staying within vCPU, both on dispatch and return. Switching back to v6502 (or any other arbitrary processor, such as Forth) takes a bit of additional effort and awareness, but nothing crazy. Documentation in Docs/Interrupts.txt.

The simplest example is in Contrib/kervinck/IrqTest.gcl:

gcl0x

4--                             {Reserve bottom of stack for saved context}
\vIRQ_DEVROM p=
[def                            {Interrupt handler}
  $901 p= peek 23+ p.           {Flash second pixel}
  226 \frameCount.              {256-30: next interrupt after .5 second}
  \vIRQ_Return 0??              {vRTI sequence}
] p:

$800 q=                         {Main loop}
[do 1+ q. loop]                 {Flash first pixel}

[Attachment: IrqTest.png]

Or, for those who don't grok the GCL notation:

* file: Contrib/kervinck/IrqTest.gt1x

0200  df fc                    ALLOC $fc                |..|
0202  11 f6 01                 LDWI  $01f6              |...|
0205  2b 30                    STW   $30                |+0|
0207  cd 1a                    DEF   $021c              |..|
0209  11 01 09                 LDWI  $0901              |...|
020c  2b 30                    STW   $30                |+0|
020e  ad                       PEEK                     |.|
020f  e3 01                    ADDI  1                  |..|
0211  f0 30                    POKE  $30                |.0|
0213  59 e2                    LDI   $e2                |Y.|
0215  5e 0e                    ST    frameCount         |^.|
0217  11 00 04                 LDWI  $0400              |...|
021a  7f 00                    LUP   0                  |..|
021c  f3 30                    DOKE  $30                |.0|
021e  11 00 08                 LDWI  $0800              |...|
0221  2b 32                    STW   $32                |+2|
0223  e3 01                    ADDI  1                  |..|
0225  f0 32                    POKE  $32                |.2|
0227  90 21                    BRA   $0223              |.!|
* 41 bytes

* start at $0200
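A few of the constants in the listing as posted can be cross-checked against the GCL source with plain arithmetic. The sketch below is Python; the assumption that vSP starts at $00 is mine, and the helper name `as_signed8` is hypothetical.

```python
def as_signed8(byte):
    """Interpret an 8-bit operand as a two's complement value."""
    return byte - 256 if byte >= 0x80 else byte


# ALLOC $fc is the "4--" in the GCL source: it moves vSP down by 4.
alloc_operand = as_signed8(0xFC)
new_vsp = (0x00 + 0xFC) & 0xFF          # assuming vSP starts at $00

# LDI $e2 / ST frameCount is "226 \frameCount."
frame_reload = 0xE2
frames_to_next = 256 - frame_reload     # frames until frameCount wraps to 0

# The last instruction (BRA at $0227) is two bytes long.
program_bytes = (0x0227 + 2) - 0x0200
```

Under that assumption the stack now sits below $fc, which leaves $fc..$ff free for the saved vPC and vAC in this version of the example, and the reload value gives 30 frames between interrupts.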

The prime example for mixing vIRQ with v6502 code is now in Apps/Apple-1/Apple-1.gcl. Here vIRQ emulates the PIA chip.

P.S.: I can't say that using interrupts makes vCPU programming any easier... I spent many days banging my head against race conditions in the PIA emulation. That's time I wish I could have spent on other things. The "main loop approach" to multitasking on small systems may seem lame, but it has clear advantages as well.

Post by **marcelk** » 18 Feb 2020, 11:34

After experiencing the impact on Apple-1 Integer BASIC, I'm strongly thinking to move the $00FC-$00FF vIRQ locations to $0030-$0033. Any objections, please shoot...

Post by **marcelk** » 24 Mar 2020, 09:01

marcelk wrote: ↑18 Feb 2020, 11:34 After experiencing the impact on Apple-1 Integer BASIC, I'm strongly thinking to move the $00FC-$00FF vIRQ locations to $0030-$0033. Any objections, please shoot...

Relocation is now done, including documentation and example code. This also fixed `AUTO`.

Post by **at67** » 28 May 2020, 23:05

I am cursing myself for not responding to this when I could have, (rest in peace Marcel).

marcelk wrote: ↑08 Feb 2020, 16:45 Consider it done and tested: vertical blank interrupts now work fine in dev.rom.

Vertical blank interrupts on the Gigatron work better than I could ever have imagined. I conjured up so many scenarios where they would fall over and break the vCPU interpreter, sending it off into lala land, for example:

- VBI routine being swamped by the SYS_SetMemory_v2_54 and SYS_Sprite6x_v3_64 SYS functions. Nope, not only do they not get in each other's way, they harmoniously co-exist no matter how much work the SYS calls have to perform.

- VBI routine taking too much vCPU time, causing the vCPU interpreter to fall over. Nope, all that happens is that you miss the next VBI. You did mention a race condition, but it must be exceedingly rare: after many hours of trying to break it with frameCount = 255, and using scanline mode 0 with a complex daisy chain of VBI handlers, all that happens is that VBIs are missed. No lockups, no crashes, no lala land.

- Efficiency and ease of use. gtBASIC allows for pseudo interrupts on ROMv1 to ROMv4, but they are a nightmare to maintain and can cause unneeded inefficiencies in your code and the gtBASIC runtime. VBI handlers allow your code and the runtime to be drastically more efficient and compact: instead of having to call pseudo interrupt handlers within ALL tight-loop runtime functions (and user tight loops using TICK), it can all be taken care of automatically by a simple daisy-chained interrupt handler.

Here is an example of what I am talking about. The video shows a number of graphics primitives being rendered and the TIME variables being printed within a main loop, whilst the MIDI playback, the calculation of those time variables and the flashing pixel in the bottom left-hand corner are all handled by the daisy-chained interrupt handlers (runtime MIDI proc, runtime TIME proc and user interrupt proc). The user interrupt proc that flashes the pixel is a tiny self-contained gtBASIC subroutine.
https://streamable.com/8akz9z
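The daisy chain described here can be modelled as one entry point that calls each handler in turn. This is a Python sketch with hypothetical names (`make_chain`, `midi_proc`, `time_proc`, `user_proc`), and it assumes the chain runs once per frame at a nominal 60 Hz. It is not how the gtBASIC runtime is actually written.

```python
# Hypothetical model of a daisy chain; handler names are illustrative only.

def make_chain(*handlers):
    """Return one entry point that runs each handler in order, once per VBI."""
    def entry(state):
        for handler in handlers:
            handler(state)
    return entry


def midi_proc(state):
    state["midi_ticks"] += 1


def time_proc(state):
    state["jiffies"] += 1
    if state["jiffies"] == 60:          # nominal 60 interrupts per second
        state["jiffies"] = 0
        state["seconds"] += 1


def user_proc(state):
    if state["jiffies"] % 30 == 0:      # toggle the pixel twice a second
        state["pixel_on"] = not state["pixel_on"]


state = {"midi_ticks": 0, "jiffies": 0, "seconds": 0, "pixel_on": False}
vbi_entry = make_chain(midi_proc, time_proc, user_proc)

for _ in range(120):                    # two seconds' worth of vertical blanks
    vbi_entry(state)
```

The main loop never appears in this model, which is the point: it can render graphics or print the time without containing a single tick call.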

marcelk wrote: ↑08 Feb 2020, 16:45 P.S.: I can't say that using interrupts makes vCPU programming any easier... I spent many days banging my head against race conditions in the PIA emulation. That's time I wish I could have spent on other things. The "main loop approach" to multitasking on small systems may seem lame, but it has clear advantages as well.

I agree. Even though VBI handlers are not a panacea for handling all realtime/low latency/deterministic code on the Gigatron, there is a small niche of scenarios where they work so well they are a godsend (see the video link above for an example).
