ROM adventures (dev7rom)

Hans61
Posts: 98
Joined: 29 Dec 2020, 16:15
Location: Saxonia

Re: ROM adventures (dev7rom)

Post by Hans61 »

Thanks for your work
lb3361
Posts: 325
Joined: 17 Feb 2021, 23:07

Re: ROM adventures (dev7rom)

Post by lb3361 »

Shaving cycles.

Maybe inspired by Hans61, who solders for relaxation, I spend time here and there shaving cycles in the dev7 ROM: sometimes taking advantage of more ROM code to speed up instructions (e.g. Bcc/ADDI/SUBI), sometimes tweaking the contents of the vCPU page, page 3, to recover some of the cycles lost between ROMv4 and ROMv5a (e.g. LD/ANDI/INC), sometimes recoding the new dev7 instructions (e.g. MACX, ADDL, LSLVL, LSLXA, LSRXA). However, some instructions remain necessarily slower, either because they have added features (e.g. 16-bit stack POP, PUSH, LDLW, STLW) or because they have been moved out of page 3 (CALL/SUBW), making space for other instructions and also providing opportunities to speed up some old instructions. A complicated landscape.

Here are the results for ascbrot.gt1

+-----------------------+---------+---------+---------+------------+-------------+
| ascbrot.gt1(mode 3)   |  ROMv4  |  ROMv5a |  ROMv6  | DEV7(2/23) | DEV7(11/23) |
+-----------------------+---------+---------+---------+------------+-------------+
| compiled for ROMv4    |  104.6s |  108.7s |  108.7s |   110.9s   |    106.4s   |
| compiled for ROMv5a   |         |  101.1s |  101.1s |   100.4s   |     97.9s   |
| compiled for ROMv6    |         |         |  101.1s |   100.4s   |     97.9s   |
| compiled for DEV7(*)  |         |         |         |    25.5s   |     23.6s   |
+-----------------------+---------+---------+---------+------------+-------------+
This is a floating point heavy program that runs much faster when compiled for rom dev7. Yet the cycle shaving changes yield a 10% speedup which is not insignificant. I was also quite happy that dev7 rom now runs programs compiled for roms v5a/v6 about 3% faster than roms v5a/v6 themselves. The execution times of the program compiled for ROMv4 are very telling as well.
veekoo
Posts: 119
Joined: 07 Jun 2021, 07:07

Re: ROM adventures (dev7rom)

Post by veekoo »

Very interesting results. Thanks for the job you have done.

Are the DEV7 ROM and the DEV ROM the same?
lb3361
Posts: 325
Joined: 17 Feb 2021, 23:07

Re: ROM adventures (dev7rom)

Post by lb3361 »

veekoo wrote: 06 Nov 2023, 14:32 Very interesting results. Thanks for the job you have done.
Are the DEV7 ROM and the DEV ROM the same?
They're different. The main version, named dev7.rom, is highly compatible with ROMv6, with speed improvements and added opcodes enabled by the GLCC option -rom=dev7. Both longbrot and fpbrot should benefit considerably. Two customized versions of the DEV7ROM target Gigatrons with 128K or 512K of memory. They can displace the video buffer into banked memory, allowing larger programs such as Marcel's chess program MSCP. This is achieved with the additional GLCC option -map=128k or -map=512k.

Whether code or ideas from DEV7ROM will make it into the official repository is up in the air for lack of consensus.
lb3361
Posts: 325
Joined: 17 Feb 2021, 23:07

Re: ROM adventures (dev7rom)

Post by lb3361 »

One of the key enablers for new vCPU instructions is the increase of MaxTicks pioneered by at67 (viewtopic.php?p=1995#p1995). Although this change has pervasive effects in the operation of the Gigatron, increasing MaxTicks from 14 to 15 ticks has been amazingly free of backward compatibility nightmares. Until last week, that is.

Background on MaxTicks --- The Gigatron ROM is essentially a loop that generates VGA and sound signals with precise timings. However, at many points of this loop, there is nothing to do for a known duration. These time slices are used to interpret vCPU opcodes. For instance, at the beginning of a blank scanline, there are about 148 cycles for vCPU opcodes, or 74 ticks, with each tick equal to 2 cycles. Of course, when the ROM branches to the native code that implements a vCPU opcode, it must be certain that this code will "return" before the end of the time slice. This is why vCPU opcodes are only dispatched when at least MaxTicks ticks remain available in the time slice. With MaxTicks=14 as in ROMv5a, all vCPU opcodes must return in at most 28 cycles. This is not much because 10 of these cycles are already taken by the dispatching code, and 3-4 more are necessary if the vCPU opcode implementation is outside ROM page 3. Increasing MaxTicks really helps because it provides the elbow room to move vCPU opcode implementations around and add new ones. But increasing MaxTicks also means that there are more unused cycles at the end of each time slice. At67 found that increasing MaxTicks to 15 had practically no impact on the vCPU speed, but increasing to 16 would slow it by about 10%. This is why both ROMvX0 and DEV7ROM use MaxTicks=15.
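
To make the budget concrete, here is a rough Python sketch of the arithmetic in the paragraph above. The function and constant names are made up for illustration and do not come from the ROM source; only the numbers (2 cycles per tick, 10 dispatch cycles, 3-4 extra cycles outside page 3) are the ones quoted in this post.

```python
CYCLES_PER_TICK = 2
DISPATCH_CYCLES = 10   # cycles already taken by the dispatching code
FAR_CYCLES = 4         # worst case of the 3-4 extra cycles outside page 3


def opcode_budget(max_ticks, outside_page3=False):
    """Cycles left for the body of a vCPU opcode implementation."""
    budget = max_ticks * CYCLES_PER_TICK - DISPATCH_CYCLES
    if outside_page3:
        budget -= FAR_CYCLES
    return budget


def can_dispatch(ticks_left, max_ticks):
    """An opcode is only dispatched if at least MaxTicks ticks remain."""
    return ticks_left >= max_ticks


for max_ticks in (14, 15):
    print(max_ticks, opcode_budget(max_ticks), opcode_budget(max_ticks, True))
```

Under these assumptions, going from MaxTicks=14 to MaxTicks=15 only adds two cycles per opcode, but for an implementation outside page 3 that is the difference between 14 and 16 usable cycles.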

Background on the SYS opcode --- The vCPU SYS opcode provides a way to execute native code that requires more than MaxTicks*2 cycles. For instance, the routine SYS_VDrawBits_134, which is used to draw characters on the screen, must be called with vCPU instruction SYS(134) which checks whether there are 134 remaining cycles in the current time slice. If not, it arranges to be called again by tweaking the vCPU program counter and returns immediately. The result is that the SYS(134) instruction is called again and again, until finding a long enough time slice.

SYS with MaxTicks=15 --- The argument of SYS(134) is not encoded as a cycle count, but as excess ticks required beyond MaxTicks. This means that SYS(134) with MaxTicks=14 is encoded as B4 CB, and SYS(134) with MaxTicks=15 is encoded as B4 CC. So when a ROM with MaxTicks=15 executes a program compiled for a ROM with MaxTicks=14, these SYS(134) encoded as B4 CB are executed as SYS(136). This does not seem too problematic because ensuring that there are 136 remaining cycles is enough to run a routine that takes at most 134 cycles.
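
The encoding can be sketched in a few lines of Python. This is a reconstruction inferred from the two encodings quoted above (CB and CC), not a copy of the assembler code; the function names are hypothetical, and clamping short routines to an operand of 0 is an assumption.

```python
def encode_sys_operand(cycles, max_ticks):
    """Operand byte of SYS(cycles) when assembled for a given MaxTicks."""
    excess_ticks = cycles // 2 - max_ticks
    if excess_ticks <= 0:
        return 0  # assumption: routines that fit in MaxTicks need no excess
    return (256 - excess_ticks) & 0xFF


def decode_sys_cycles(operand, max_ticks):
    """Cycles that a ROM with the given MaxTicks will wait for."""
    excess_ticks = (256 - operand) & 0xFF
    return (excess_ticks + max_ticks) * 2


print(hex(encode_sys_operand(134, 14)))   # operand as assembled for ROMv5a
print(hex(encode_sys_operand(134, 15)))   # operand as assembled for MaxTicks=15
print(decode_sys_cycles(0xCB, 15))        # old operand seen by a MaxTicks=15 ROM
```

The last line is the interesting case: the operand assembled for MaxTicks=14 is decoded by a MaxTicks=15 ROM as a request for two more cycles than the routine actually needs.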

Until last week.

When we run the Gigatron in video mode 0, the slowest mode that displays all scanlines, the only remaining time slices are those occurring during the video vertical blanking interval. It turns out that the longest of these time slices is 134 cycles. So these SYS(134) opcodes compiled for MaxTicks=14 and interpreted as SYS(136) never find a time slice long enough to run. The Gigatron just waits. For instance, TinyBasic_v4.gt1, which was compiled for ROMv5a, works slowly but correctly with ROMv5a in video mode 0. However, on a MaxTicks=15 ROM operating in mode 0, it will simply hang until one changes the video mode.
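
A toy model of the retry behaviour shows why this hangs. It is a simplification: it assumes the SYS opcode is reached at the very start of each time slice, and the slice list is hypothetical except for the 134-cycle maximum mentioned above.

```python
def first_slice_that_fits(required_cycles, slice_lengths):
    """Index of the first time slice long enough for the SYS call, or None."""
    for index, available in enumerate(slice_lengths):
        if available >= required_cycles:
            return index
        # Otherwise SYS rewinds the vCPU program counter and is retried later.
    return None


# Hypothetical mode 0 frame: only vertical blanking slices, longest is 134 cycles.
MODE0_SLICES = [40, 134, 134, 60]

print(first_slice_that_fits(134, MODE0_SLICES))  # SYS(134) as intended
print(first_slice_that_fits(136, MODE0_SLICES))  # same opcode read as SYS(136)
```

Since the same slices repeat every frame, a request that does not fit in one frame never fits in any later one, which is the hang described above.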

If you care about backward compatibility, this is a problem, and not one that is easy to fix. There is simply no cycle left in the code of the SYS instruction to correct its argument and normalize the way the instruction is encoded regardless of MaxTicks. After looking at this problem, I concluded that the only viable solution is to find a way to increase the length of at least some of the vertical blanking time slices. When your only option is to find two free cycles in Marcel's incredibly tight code, you know you're in trouble.

I got lucky. Just before these 134-cycle time slices, there is code that tests variable videoY and decides whether to read the input to store in variable SerialRaw (this happens when videoY=207) and whether one needs to collect the audio samples (this happens when videoY&6 is zero). Instead of testing both, we can notice that 207&6 is not zero. So if we need to read the input, we don't need to test the audio condition. With some code reorganization, this gives the two cycles we need. However, Marcel's code also used to write zero in memory location zero with instruction st(0,[0]) when not reading the input. Fortunately the previous bit of code contained a nop() and therefore gave another free cycle to do this as well. In the end, all seems to work.
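
The observation is easy to check in Python. The two functions below are only a logical model of the decision, with hypothetical names; the real code is native Gigatron assembly with cycle-exact branches.

```python
def original_tests(video_y):
    """Both conditions tested independently: (read_input, sample_audio)."""
    return (video_y == 207, (video_y & 6) == 0)


def reorganized_tests(video_y):
    """Skip the audio test whenever the input is read."""
    if video_y == 207:
        return (True, False)   # 207 & 6 is 6, so no audio sample is due here
    return (False, (video_y & 6) == 0)


print(207 & 6)
print(all(original_tests(y) == reorganized_tests(y) for y in range(256)))
```

Both versions agree for every 8-bit value of videoY, so skipping the audio test on the input-reading path does not change behaviour.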

With this patch (https://github.com/lb3361/gigatron-rom/ ... b18527ed8f), dev7rom offers slightly longer time slices during vertical blanking and runs old programs that print characters in mode zero without hanging. This was a close one...