ROM adventures (dev7rom)

Hans61
Posts: 102
Joined: 29 Dec 2020, 16:15
Location: Saxonia

Re: ROM adventures (dev7rom)

Post by Hans61 »

Thanks for your work
lb3361
Posts: 367
Joined: 17 Feb 2021, 23:07

Re: ROM adventures (dev7rom)

Post by lb3361 »

Shaving cycles.

Maybe inspired by Hans61 who solders for relaxation, I spend time here and there shaving cycles in the dev7 rom. Sometimes this means taking advantage of more rom code to speed up instructions (e.g. Bcc/ADDI/SUBI), sometimes tweaking the contents of the vCPU page, page 3, to recover some of the cycles lost between ROMv4 and ROMv5a (e.g. LD/ANDI/INC), sometimes recoding the new dev7 instructions (e.g. MACX, ADDL, LSLVL, LSLXA, LSRXA). However some instructions necessarily remain slower, either because they have added features (e.g. 16 bit stack POP, PUSH, LDLW, STLW) or because they have been moved out of page 3 (CALL/SUBW), which makes space for other instructions and also provides opportunities to speed up some old instructions. A complicated landscape.

Here are the results for ascbrot.gt1

+-----------------------+---------+---------+---------+------------+-------------+
| ascbrot.gt1(mode 3)   |  ROMv4  |  ROMv5a |  ROMv6  | DEV7(2/23) | DEV7(11/23) |
+-----------------------+---------+---------+---------+------------+-------------+
| compiled for ROMv4    |  104.6s |  108.7s |  108.7s |   110.9s   |    106.4s   |
| compiled for ROMv5a   |         |  101.1s |  101.1s |   100.4s   |     97.9s   |
| compiled for ROMv6    |         |         |  101.1s |   100.4s   |     97.9s   |
| compiled for DEV7(*)  |         |         |         |    25.5s   |     23.6s   |
+-----------------------+---------+---------+---------+------------+-------------+
This is a floating point heavy program that runs much faster when compiled for rom dev7. Yet the cycle shaving changes yield a 10% speedup which is not insignificant. I was also quite happy that dev7 rom now runs programs compiled for roms v5a/v6 about 3% faster than roms v5a/v6 themselves. The execution times of the program compiled for ROMv4 are very telling as well.
veekoo
Posts: 123
Joined: 07 Jun 2021, 07:07

Re: ROM adventures (dev7rom)

Post by veekoo »

Very interesting results. Thanks for the job you have done.

Are the DEV7 ROM and the DEV ROM the same?
lb3361
Posts: 367
Joined: 17 Feb 2021, 23:07

Re: ROM adventures (dev7rom)

Post by lb3361 »

veekoo wrote: 06 Nov 2023, 14:32 Very interesting results. Thanks for the job you have done.
Are the DEV7 ROM and the DEV ROM the same?
They're different: The main version, named dev7.rom, is highly compatible with ROMv6, with speed improvements and added opcodes enabled by the GLCC option -rom=dev7. Both longbrot and fpbrot should benefit considerably. Two customized versions of the DEV7ROM target the 128K or 512K Gigatrons. They can move the video buffer into banked memory, allowing larger programs such as Marcel's Chess program MSCP. This is achieved with the additional GLCC option -map=128k or -map=512k.

Whether code or ideas from DEV7ROM will make it into the official repository is up in the air for lack of consensus.
lb3361
Posts: 367
Joined: 17 Feb 2021, 23:07

Re: ROM adventures (dev7rom)

Post by lb3361 »

One of the key enablers for new vCPU instructions is the increase of MaxTicks pioneered by at67 (https://forum.gigatron.io/viewtopic.php?p=1995#p1995). Although this change has pervasive effects in the operation of the Gigatron, increasing MaxTicks from 14 to 15 ticks has been amazingly free of backward compatibility nightmares. Until last week, that is.

Background on MaxTicks --- The Gigatron ROM is essentially a loop that generates VGA and sound signals with precise timings. However, at many points of this loop, there is nothing to do for a known duration. These time slices are used to interpret vCPU opcodes. For instance, at the beginning of a blank scanline, there are about 148 cycles for vCPU opcodes, or 74 ticks, with each tick equal to 2 cycles. Of course, when the ROM branches to the native code that implements a vCPU opcode, it must be certain that this code will "return" before the end of the time slice. This is why vCPU opcodes are only dispatched when at least MaxTicks ticks remain available in the time slice. With MaxTicks=14 as in ROMv5a, all vCPU opcodes must return in at most 28 cycles. This is not much because 10 of these cycles are already taken by the dispatching code, and 3-4 more are necessary if the vCPU opcode implementation is outside ROM page 3. Increasing MaxTicks really helps because it provides the elbow room to move vCPU opcode implementations around and add new ones. But increasing MaxTicks also means that there are more unused cycles at the end of each time slice. At67 found that increasing MaxTicks to 15 had practically no impact on the vCPU speed, but increasing to 16 would slow it by about 10%. This is why both ROMvX0 and DEV7ROM use MaxTicks=15.
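To put numbers on this, here is a small Python sketch of the cycle budget left for the body of one vCPU opcode. It only uses the figures quoted above (2 cycles per tick, 10 cycles of dispatch, 3-4 extra cycles outside page 3). The function and constant names are made up for illustration and do not come from the ROM source.

```python
# Rough cycle budget for one vCPU opcode, using only the figures quoted above.
# All names here are hypothetical; this is not code from the ROM.
CYCLES_PER_TICK = 2
DISPATCH_CYCLES = 10     # taken by the dispatching code
FAR_JUMP_CYCLES = 4      # worst case of the "3-4 more" needed outside page 3


def opcode_budget(max_ticks, in_page3=True):
    """Cycles left for the opcode body once the overhead is paid."""
    budget = max_ticks * CYCLES_PER_TICK - DISPATCH_CYCLES
    if not in_page3:
        budget -= FAR_JUMP_CYCLES
    return budget


for max_ticks in (14, 15):
    print(max_ticks,
          opcode_budget(max_ticks),
          opcode_budget(max_ticks, in_page3=False))
```

Under these assumptions, going from MaxTicks=14 to MaxTicks=15 gives every opcode two more cycles, which is a lot when an opcode outside page 3 starts with only 14.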

Background on the SYS opcode --- The vCPU SYS opcode provides a way to execute native code that requires more than MaxTicks*2 cycles. For instance, the routine SYS_VDrawBits_134, which is used to draw characters on the screen, must be called with vCPU instruction SYS(134), which checks whether there are 134 remaining cycles in the current time slice. If not, it arranges to be called again by tweaking the vCPU program counter and returns immediately. The result is that the SYS(134) instruction is called again and again, until it finds a long enough time slice.

SYS with MaxTicks=15 --- The argument of SYS(134) is not encoded as a cycle count, but as excess ticks required beyond MaxTicks. This means that SYS(134) with MaxTicks=14 is encoded as B4 CB, and SYS(134) with MaxTicks=15 is encoded as B4 CC. So when a ROM with MaxTicks=15 executes a program compiled for a ROM with MaxTicks=14, these SYS(134) encoded as B4 CB are executed as SYS(136). This does not seem too problematic because ensuring that there are 136 remaining cycles is enough to run a routine that takes at most 134 cycles.
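The encoding can be sketched in Python as follows. The formula is an assumption reconstructed from the two encodings quoted above (operand = 256 minus the excess ticks), and the function names are hypothetical; the ROM assembler may express it differently.

```python
def sys_operand(cycles, max_ticks):
    """Operand byte of SYS(cycles) when assembling for a given MaxTicks."""
    excess = cycles // 2 - max_ticks          # ticks needed beyond MaxTicks
    return (256 - excess) & 0xFF if excess > 0 else 0


def sys_cycles(operand, max_ticks):
    """Cycle count a ROM with the given MaxTicks reads from an operand byte."""
    excess = (256 - operand) & 0xFF
    return (max_ticks + excess) * 2


print(hex(sys_operand(134, 14)))   # program compiled for MaxTicks=14
print(hex(sys_operand(134, 15)))   # program compiled for MaxTicks=15
print(sys_cycles(0xCB, 15))        # old encoding run on a MaxTicks=15 ROM
```

The same byte CB therefore asks for 134 cycles on a MaxTicks=14 ROM and for 136 cycles on a MaxTicks=15 ROM.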

Until last week.

When we run the Gigatron in video mode 0, the slowest mode that displays all scanlines, the only remaining time slices are those occurring during the video vertical blanking interval. It turns out that the longest of these time slices is 134 cycles. So these SYS(134) opcodes compiled for MaxTicks=14 and interpreted as SYS(136) never find a time slice long enough to run. The Gigatron just waits. For instance, TinyBasic_v4.gt1, which was compiled for ROMv5a, works slowly but correctly with ROMv5a in video mode 0. However, on a MaxTicks=15 ROM operating in mode 0, it will simply hang until one changes the video mode.
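A toy model of the retry behaviour makes the hang easy to see. The only figure taken from the text is the longest mode 0 time slice, 134 cycles; the other slice lengths and all the names are invented for illustration.

```python
from itertools import cycle, islice

# Hypothetical slice lengths for mode 0; only the maximum (134) is from the text.
MODE0_SLICES = [100, 120, 134]


def slices_until_sys_runs(required_cycles, slices, max_slices=1000):
    """Index of the first slice long enough for the SYS call, or None if the
    call is still being retried after max_slices time slices."""
    for index, length in enumerate(islice(cycle(slices), max_slices)):
        if length >= required_cycles:
            return index
    return None


print(slices_until_sys_runs(134, MODE0_SLICES))   # SYS(134) as ROMv5a sees it
print(slices_until_sys_runs(136, MODE0_SLICES))   # same opcode read as SYS(136)
```

The first call eventually finds its slice. The second never does, which is what the user sees as a hang.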

If you care about backward compatibility, this is a problem, and not one that is easy to fix. There is simply no cycle left in the code of the SYS instruction to correct its argument and normalize the way the instruction is encoded regardless of MaxTicks. After looking at this problem, I concluded that the only viable solution is to find a way to increase the length of at least some of the vertical blanking time slices. When your only option is to find two free cycles in Marcel's incredibly tight code, you know you're in trouble.

I got lucky. Just before these 134 cycles time slices, there is code that tests variable videoY and decides whether to read the input to store in variable SerialRaw (this happens when videoY=207) and whether one needs to collect the audio samples (this happens when videoY&6 is zero). Instead of testing both, we can notice that 207&6 is not zero. So if we need to read the input, we don't need to test the audio condition. With some code reorganization, this gives the two cycles we need. However Marcel's code also used to write zero in memory location zero with instruction st(0,[0]) when not reading the input. Fortunately the previous bit of code contained a nop() and therefore gave another free cycle to do this as well. In the end, all seems to work.
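The observation can be checked exhaustively. The Python sketch below models only the two decisions described above, not the native code or its cycle counts, and the function names are hypothetical.

```python
def original_tests(video_y):
    """Both conditions tested independently, as described above."""
    read_input = video_y == 207
    collect_audio = (video_y & 6) == 0
    return read_input, collect_audio


def reorganized_tests(video_y):
    """Skip the audio test whenever the input has to be read."""
    if video_y == 207:
        return True, False        # 207 & 6 == 6, so the audio test cannot pass
    return False, (video_y & 6) == 0


# The two versions agree for every possible byte value of videoY.
print(all(original_tests(y) == reorganized_tests(y) for y in range(256)))
```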

With this patch (https://github.com/lb3361/gigatron-rom/ ... b18527ed8f), dev7rom offers slightly longer time slices during vertical blanking and runs old programs that print characters in mode zero without hanging. This was a close one...
lb3361
Posts: 367
Joined: 17 Feb 2021, 23:07

Re: ROM adventures (dev7rom)

Post by lb3361 »

 

Virtual interrupts

Virtual interrupts (vIRQ) were introduced in ROMv5a. Variable "frameCount" (0x0e) is incremented sixty times per second, at the beginning of each vertical blank interval. When it reaches zero, the Gigatron checks the contents of "vIRQ" (0x1f6-0x1f7). If this is a non zero pointer, the Gigatron firmware saves a restricted context (vAC, vPC, vCpuSelect) at locations 0x30-0x35, and sets the vCPU to execute the code located at address vIRQ regardless of what was running before. When this interrupt handling code executes a certain LUP instruction (quite a hack), the Gigatron firmware restores the saved context (vAC, vPC, vCpuSelect) and theoretically resumes what was running before the interrupt. Interrupts can occur between any two vCPU instructions, but also in the middle of SYS calls, in the middle of v6502 opcodes, or in the middle of long vCPU7 opcodes.
  • Maybe the most accomplished example is at67's music sequencer, which sets the audio channels according to the timings specified by a MIDI-like instruction sequence. Since ROMv5a, this is achieved by a vIRQ routine without need for the main program to do anything to keep the music playing.
  • The real Apple-1 contains a PIA chip which is used to read the keyboard and display textual output. The Apple-1 emulator simulates this chip in a vIRQ routine that is executed 60 times per second by the vCPU between or even in the middle of v6502 instructions.
  • The GLCC function "clock()" also uses a vIRQ routine to increment a word counter whenever frameCount reaches zero. This provides an easy way to count the elapsed time beyond the 255 frames supported by the byte-wide frameCount variable.
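The idea behind the last example can be sketched in Python. This is not the GLCC implementation: it assumes the handler leaves frameCount free-running and only increments a 16-bit word each time the byte wraps to zero. The class and attribute names are hypothetical.

```python
class FrameClock:
    """Toy model of a byte-wide frame counter extended by a vIRQ handler."""

    def __init__(self):
        self.frame_count = 0      # the byte at 0x0e
        self.wraps = 0            # 16-bit word kept by the handler (assumed)

    def vertical_blank(self):
        """Called once per frame, like the firmware increment of frameCount."""
        self.frame_count = (self.frame_count + 1) & 0xFF
        if self.frame_count == 0:
            # This is where the vIRQ handler would run.
            self.wraps = (self.wraps + 1) & 0xFFFF

    def elapsed_frames(self):
        return (self.wraps << 8) | self.frame_count


clock = FrameClock()
for _ in range(1000):
    clock.vertical_blank()
print(clock.frame_count, clock.wraps, clock.elapsed_frames())
```

In this model the byte alone has wrapped three times after 1000 frames, while the combined value still counts all of them.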
Yet many things can go wrong:
  • Many programs use locations 0x30-0x35 to store their own variables. This works as long as these programs do not use virtual interrupts. But for instance, trying to set up a virtual interrupt handler in WozMon is doomed to crash.
  • If a virtual interrupt occurs before the completion of the previous interrupt handling code, the firmware overwrites the context saved in 0x30-0x35. When the second virtual interrupt handler returns, the execution of the first one resumes. But when the first interrupt handler tries to return, it restores the same context and loops forever.
  • Bad things also happen if a virtual interrupt handler changes something that is used by the interrupted program. For instance, using opcode CALL changes vLR which is not part of the saved context. Therefore any interrupt handler that uses opcode CALL must make sure to save vLR before and to restore its original value before returning. If an interrupt handler calls a SYS routine, then it must save and restore the sysFn variable and the entire sysArgs array. If an interrupt handler calls a vCPU7 opcode that uses sysArgs or uses new registers, then it must save and restore them as well. The problem is that saving and restoring all this extended context increases the risk of a double virtual interrupt.
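The double interrupt scenario from the second item can be reproduced with a toy model in Python. It tracks only a program counter and a single save slot standing in for 0x30-0x35. The class, the method names and the symbolic addresses are invented for illustration.

```python
class VCpu:
    """Toy model: one program counter and a single context save slot."""

    def __init__(self, pc):
        self.pc = pc
        self.saved_pc = None          # stands in for the slot at 0x30-0x35

    def interrupt(self, handler_pc):
        self.saved_pc = self.pc       # overwrites whatever was saved before
        self.pc = handler_pc

    def return_from_interrupt(self):
        self.pc = self.saved_pc       # the slot still holds the last save


def double_interrupt_trace():
    cpu = VCpu("main+5")
    cpu.interrupt("handler+0")        # first vIRQ saves main+5
    cpu.pc = "handler+3"              # the handler is partway through...
    cpu.interrupt("handler+0")        # ...when a second vIRQ saves handler+3
    trace = []
    for _ in range(3):                # every return lands on the same address
        cpu.return_from_interrupt()
        trace.append(cpu.pc)
    return trace


print(double_interrupt_trace())
```

The address main+5 never comes back: once the slot has been overwritten, every return resumes the handler itself.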
((The rest of this post used to present a solution implemented in dev7rom. The more I look at this solution, the more I find it ugly. I am planning to change it all.))
Phibrizzo
Posts: 69
Joined: 09 Nov 2022, 22:46

Re: ROM adventures (dev7rom)

Post by Phibrizzo »

Hello :)

I have a few questions.

1. In the documentation, RTI (Return from interrupt) is a sequence of asm codes. The last one is LUP.
What is the parameter for LUP? Any?

2. The same documentation says that LUP changes vAC. Then why is vAC saved when an interrupt occurs?
lb3361
Posts: 367
Joined: 17 Feb 2021, 23:07

Re: ROM adventures (dev7rom)

Post by lb3361 »

Phibrizzo wrote: 23 Jan 2024, 13:58 Hello :)
I have a few questions.
1. In the documentation, RTI (Return from interrupt) is a sequence of asm codes. The last one is LUP.
What is the parameter for LUP? Any?
2. The same documentation says that LUP changes vAC. Then why is vAC saved when an interrupt occurs?
Are you speaking of the file in directory "Docs"? Here is the most up-to-date one https://github.com/lb3361/gigatron-rom ... rrupts.txt. It describes the differences between the ROMv5a version and the ROMv6 version, which is simpler. The parameter of LUP matters for ROMv5a but not for ROMv6 and beyond.

The LUP sequence to return from interrupt is a huge hack. In principle LUP reads a byte from the ROM, but it does so by jumping to a piece of code located at offset 251 in the ROM page containing the target byte. All ROM pages containing "LUP-able" data must have this trampoline installed at offset 251. However the code at offset 251 of page $400 is completely different. Instead of returning a rom byte, it restores the saved vAC, vPC, and vCpuSelect (saved by the interrupt code in 0x30-0x35), and resumes the execution of the interrupted program. So the RTI sequence makes the LUP instruction do something completely different from its stated purpose.
Last edited by lb3361 on 23 Jan 2024, 23:14, edited 1 time in total.
Phibrizzo
Posts: 69
Joined: 09 Nov 2022, 22:46

Re: ROM adventures (dev7rom)

Post by Phibrizzo »

Are you speaking of the file in directory "Docs"?
I was going by https://github.com/lb3361/gigatron-rom/ ... ummary.txt
lb3361
Posts: 367
Joined: 17 Feb 2021, 23:07

Re: ROM adventures (dev7rom)

Post by lb3361 »

LUP is an instruction whose meaning has been hideously perverted by the return-from-interrupt mechanism :|