Unified return from vCPU interrupt (vRTI)

Post by **lb3361** » 10 Nov 2022, 04:31

The Gigatron virtual interrupt system (see https://github.com/kervinck/gigatron-ro ... rrupts.txt) is a bit complicated because there are two ways to return from an interrupt. There is a fast way (LDWI $400 LUP 0), which does not save the current interpreter and therefore cannot work with v6502, and there is another way (LDWI $400 LUP $xx), which restores the vCPU selection variable vCpuSelect from the value saved at address [xx+1]. The vIRQ routine is expected to save this value when it starts. This more complete method is much slower because it only resumes the interrupted code at the next scanline. This complicated setup is heavily used by the Apple1 emulator.

I always thought that the problem was that there was not enough time to save vCpuSelect within the maximum 28 cycles. It is in fact more complicated than that, because the convenient vCPU entry points do not work with v6502. The only common one, ENTER, is designed to restart the CPU interpreter at the beginning of an empty scanline. Using it also requires more code, which is even less likely to fit in the 28 cycles.

After several attempts, I found a way that uses the fast path if there is enough time left in the current scanline, or resynchronizes otherwise (but this is fast because one is already at the end of the scanline). So it seems that I now have the best of both worlds. Following at67's experimental rom, the address range 0x30-0x35 is reserved to save vPC, vAC, and vCpuSelect. This in turn requires minor changes to the Apple1 emulator because its monitor (wozmon) also wants to use 0x34-0x35.

Changes are at https://github.com/lb3361/gigatron-rom/ ... 5b8df7bf17

Should this simplification make it into the DEVROM, at67?

Post by **at67** » 10 Nov 2022, 15:00

It looks good to me, as long as it passes compatibility tests I think it's a definite improvement.

Post by **lb3361** » 10 Nov 2022, 15:42

at67 wrote: ↑10 Nov 2022, 15:00 It looks good to me, as long as it passes compatibility tests I think it's a definite improvement.

Thanks.

The only program that seriously exercises vIRQ+v6502 is the Apple1 emulator, and it works perfectly fine. I also successfully tried some of my programs that use vIRQ with vCPU only. At67, can you suggest more programs to try, e.g. gtBasic programs that rely on vIRQ? Do you have more compatibility tests in mind? I believe one should test seriously...

In fact the main incompatibility results from the fact that a vIRQ also clobbers $34 to save vCpuSelect. So one could imagine an old program that uses vIRQ and also uses $34 for other purposes. The unmodified Apple-1_v2.gcl is such a program. I had to reallocate some zero page variables used by Wozmon. In fact this problem also exists with ROMvX0, where the Apple-1 emulator sometimes enters nasty loops because it wants to use $34. In the case of the DEVROM, one could instead save vCpuSelect in $04, which is currently unused and reserved.

Meanwhile I changed the code already to make sure we take margins before triggering an immediate return. v6502 is much more complex than vCPU. The main change is on line 2588 in https://github.com/lb3361/gigatron-rom/ ... 9558cc76e8, and reshuffling native code around. Note also line 2604 which is an attempt to make this work for different values of maxTicks as, for instance, in ROMvX0.

Post by **lb3361** » 11 Nov 2022, 22:19

Turns out that gtBasic programs compiled for ROMv5a and using "INIT TIME" or "INIT MIDI" are only avoiding the vIRQ locations $30 to $33. This means that these programs can use $34 as the low byte of an integer variable. I would want these programs to work unchanged on the DEVROM. Therefore I have no other solution than moving the saved 'vCpuSelect' into the reserved variable at location 0x4.
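The conflict described here can be checked mechanically. The sketch below is only an illustration: the helper name `zero_page_conflicts` and the example variable map are hypothetical, and the save-area ranges are the ones quoted in this thread.

```python
def zero_page_conflicts(program_vars, reserved):
    """Return the names of program variables that overlap reserved bytes.

    program_vars maps a name to (address, size in bytes);
    reserved is an iterable of zero-page addresses the ROM may clobber.
    """
    reserved = set(reserved)
    return sorted(
        name
        for name, (addr, size) in program_vars.items()
        if reserved.intersection(range(addr, addr + size))
    )


# Hypothetical gtBasic-style layout: a 16-bit integer whose low byte sits at $34.
example_vars = {"counter": (0x34, 2), "flags": (0x36, 1)}

# Save area that ROMv5a-era programs avoid: $30-$33 only.
print(zero_page_conflicts(example_vars, range(0x30, 0x34)))

# Save area that additionally stores vCpuSelect at $34.
print(zero_page_conflicts(example_vars, range(0x30, 0x35)))
```

With the first range the example layout is clean; with the second, `counter` is reported because its low byte lands on $34.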

This also means these same programs (gtBasic programs compiled for ROMv5a and using vIRQ) break under ROMvX0 (verified).

Updated patch https://github.com/lb3361/gigatron-rom/ ... ee9e941122 is much simpler since Apple-1_v2.gcl now works unmodified. It could now be simplified but this is not necessary.

Post by **lb3361** » 19 Nov 2022, 04:04

Turns out that there was a solution that does not involve tapping the reserved variable at address 0x04. This new version saves vPC in $30-$31, vCpuSelect in $32, and pushes vAC on the stack. Location $33 is reserved for now. This code is tricky because restoring the context takes 10 more cycles. If there is enough time in the current slice to both restore the context and execute the next vCPU or v6502 instruction, we can call ENTER and continue without delay. If there is only time to restore the context, we can call RESYNC to resume the interrupted code at the next time slice. And when there is no time to do either, we need to restart the LUP instruction. Despite this complexity, everything seems to work well.
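The three-way choice can be written as a small decision function. This is a model of the reasoning only, not the native ROM code: the function name, the parameter names, and the cycle counts in the example are assumptions. The example numbers were picked only so that the thresholds land on the T-38 and T-82 boundaries quoted later in this thread; the real split between restore cost and instruction budget is not stated here.

```python
def vrti_action(ticks_left, restore_cost, max_instr_cost):
    """Pick how the return from interrupt proceeds, given the remaining time.

    ticks_left     -- cycles left in the current time slice (hypothetical name)
    restore_cost   -- cycles needed to restore the saved context (assumed value)
    max_instr_cost -- worst-case cost of the next vCPU/v6502 instruction (assumed value)
    """
    if ticks_left >= restore_cost + max_instr_cost:
        return "ENTER"        # restore and keep running in this slice
    if ticks_left >= restore_cost:
        return "RESYNC"       # restore now, resume at the next time slice
    return "RESTART_LUP"      # not even time to restore: retry LUP later


# Illustrative numbers only (not taken from the ROM source).
for ticks in (100, 60, 20):
    print(ticks, vrti_action(ticks, restore_cost=38, max_instr_cost=44))
```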

https://github.com/lb3361/gigatron-rom/ ... 9fd9dfb203

It might feel a bit strange to save the context in both a fixed location $30-33 and the stack. For instance it is tempting to save all the context on the stack because this fixes the problems that occur when a second vIRQ occurs before the end of a vIRQ routine. Alas, stack space on the Gigatron is a very limited resource.
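The nested-interrupt problem mentioned above can be shown with a toy model. Everything here is hypothetical (the addresses, the helper names, and the idea of saving only vPC); it just contrasts a single fixed save slot with a stack when a second vIRQ arrives before the first handler returns.

```python
def nested_return_addresses(save, restore):
    """Simulate two nested interrupts and report where each return lands."""
    save(0x0200)       # first vIRQ interrupts the main program at $0200
    save(0x0410)       # second vIRQ interrupts the handler itself at $0410
    inner = restore()  # return from the second interrupt
    outer = restore()  # return from the first interrupt
    return inner, outer


# Fixed save location: one slot that every interrupt overwrites.
slot = [None]

def slot_save(pc):
    slot[0] = pc

def slot_restore():
    return slot[0]

# Stack: every interrupt pushes its own copy.
stack = []

fixed = nested_return_addresses(slot_save, slot_restore)
stacked = nested_return_addresses(stack.append, stack.pop)

print("fixed slot:", [hex(a) for a in fixed])
print("stack:     ", [hex(a) for a in stacked])
```

In this model the fixed slot loses the outer return address ($0200), while the stack keeps both, at the cost of extra stack bytes per nesting level.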

Comparison of vRTI overheads

Assume that the time slice (minus vCPU overhead) ends at time T. With the old fast path (LUP 0, vCPU only), we have the following cases:

* After time T-28, there is no time to execute the LUP instruction and one must wait for a resync to restart LUP.
* Between T-56 and T-28, one can execute LUP, but one will have to wait for a resync to resume the next instruction.
* Between T-84 and T-56, one can execute LUP and resume one instruction in the current time slice.
* Before T-84, one can execute LUP and resume at least two instructions in the current time slice.

Unlike vCPU instructions, the v6502 instructions can last 38 cycles (v6502_maxTicks) and there are 2 additional cycles of overhead. Since the new code works for both vCPU and v6502, it has to make headroom for these longer instructions. There is also the extra cost of the stack manipulation. Therefore we have the following picture:

* After time T-38, there is no time to execute LUP and one must wait for a resync to restart LUP.
* Between T-82 and T-38, one can execute LUP, but one will have to wait for a resync to resume the next instruction.
* Between T-98 and T-82, one can execute LUP and resume one instruction in the current time slice.
* Before T-98, one can execute LUP and resume at least two instructions in the current time slice.

This means that in the case where there is no v6502 in the picture, there is a time window (T-98 to T-56) where the new code lags about one instruction behind the old code. This is not ideal but maybe not significant. Versions of the unified vRTI that use a page zero location to save vCpuSelect are substantially better because the boundaries are T-90, T-72 and T-28 instead of T-98, T-82 and T-38. The window is shorter, and during this window they only lag by half an instruction instead of a full one.
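The boundaries quoted above can be tabulated to see where each variant falls behind the old fast path. The sketch below is a simplified model under stated assumptions: the variant keys and function names are made up for illustration, a point exactly on a boundary is counted in the more favourable case, and "at least two instructions" is capped at 2.

```python
from bisect import bisect_right

# Boundaries in cycles before T, as quoted in this thread:
# (LUP no longer fits, LUP fits but needs a resync, only one instruction resumes)
BOUNDARIES = {
    "old": (28, 56, 84),        # old fast path, LUP 0, vCPU only
    "stack": (38, 82, 98),      # unified vRTI, vAC pushed on the stack
    "zero_page": (28, 72, 90),  # unified vRTI, vCpuSelect saved in page zero
}


def resumed_instructions(variant, cycles_left):
    """Instructions resumed in the current slice: 0, 1, or 2 (meaning 'two or more')."""
    case = bisect_right(BOUNDARIES[variant], cycles_left)
    return max(case - 1, 0)


def lag_window(old, new, horizon=128):
    """All values of cycles_left where `new` resumes fewer instructions than `old`."""
    return [
        c for c in range(horizon)
        if resumed_instructions(new, c) < resumed_instructions(old, c)
    ]


for variant in ("stack", "zero_page"):
    lag = lag_window("old", variant)
    print(f"{variant}: behind at {len(lag)} cycle positions, "
          f"from T-{max(lag)} to T-{min(lag)}")
```

In this model the zero-page variant trails the old path at fewer cycle positions than the stack variant, which is consistent with the shorter window described above.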

Post by **lb3361** » 21 Nov 2022, 01:25

Conclusion. I tend to believe one should bite the bullet and reserve two more bytes for saving the vIRQ state. One should do this even if we do not implement the unified vRTI, because future developments will need space in page zero and one needs to define clear rules about which zero page locations can be used and when...

Scanning the repo to find out all instances where vIRQ is used gives me the following programs:

Apps/Interrupts
Apps/Apple-1

and the following gtBasic programs, which merely need to be recompiled after updating Compiler::moveVlblankVars in Contrib/at67/compiler.cpp.

Contrib/at67/gbas/games/PucMon/PucMon_ROMv5a.gbas
Contrib/at67/gbas/test/vblank_ROMv5a.gbas
Contrib/at67/gbas/apps/Clock3_ROMv5a_64k.gbas
Contrib/at67/gbas/apps/Clock2_ROMv5a.gbas
Contrib/at67/gbas/demos/Xmas2020_ROMv5a.gbas
Contrib/at67/gbas/audio/Music64k_ROMv5a.gbas

Not that many at this point...

Post by **lb3361** » 21 Nov 2022, 23:32

Created pull request https://github.com/kervinck/gigatron-rom/pull/242. Let's see what at67 thinks!

This includes a comment and new interface.json variables 'userVars_vX' clarifying the boundary above which programs can use zero page bytes. Here is the full comment:

# Management of free space in page zero (userVars)
# * Programs that only use the features of ROMvx can
#   safely use all bytes above userVars_vx except 0x80.
# * Programs that use some but not all features of ROMvx
#   may exceptionally use bytes between userVars
#   and userVars_vx if they avoid using ROM features
#   that need them. This is considerably riskier.
userVars        = zpByte(0)
userVars_v4     = zpByte(0)

# Saved vCPU context during vIRQ
# Code that uses vCPU interrupts should not use these locations.
vIrqSave        = zpByte(6)

# Start of safely usable bytes under ROMv5 and derivatives
userVars_v5     = zpByte(0)

# [0x80]
# Constant 0x01. 
zpReset(0x80)
oneConst        = zpByte(1)
userVars2       = zpByte(0)

# Warning: One should avoid using SYS_ExpanderControl
# under ROMv4 because it overwrites 0x81.
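
To see where these symbols land, the sketch below re-creates the allocator helpers in a few lines. It is a stand-in, not the ROM's own code: it assumes that `zpByte(n)` returns the current address and advances by `n`, that `zpReset(a)` moves the allocation pointer to `a`, and that the pointer is at 0x30 when `userVars` is defined (the start of the vIRQ save area given earlier in this thread).

```python
# Allocation pointer kept in a list so the helpers can update it in place.
_zp_next = [0x30]  # assumed position when userVars is defined


def zpByte(size=1):
    """Reserve `size` zero-page bytes and return the address of the first one."""
    addr = _zp_next[0]
    _zp_next[0] += size
    return addr


def zpReset(start):
    """Move the allocation pointer to `start`."""
    _zp_next[0] = start


# Same sequence of definitions as in the comment above.
userVars = zpByte(0)
userVars_v4 = zpByte(0)
vIrqSave = zpByte(6)
userVars_v5 = zpByte(0)
zpReset(0x80)
oneConst = zpByte(1)
userVars2 = zpByte(0)

layout = [
    ("userVars", userVars),
    ("userVars_v4", userVars_v4),
    ("vIrqSave", vIrqSave),
    ("userVars_v5", userVars_v5),
    ("oneConst", oneConst),
    ("userVars2", userVars2),
]
for name, addr in layout:
    print(f"{name:12s} = 0x{addr:02x}")
```

Under these assumptions a `zpByte(0)` call only labels the current address, so `userVars` and `userVars_v4` share the first byte of `vIrqSave`, and `userVars_v5` starts six bytes later.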
