ROM adventures (dev7rom)

axelb
Posts: 41
Joined: 07 Jan 2021, 06:27

Re: ROM adventures

Post by axelb »

The v6-- option worked fine for me, thank you!
lb3361
Posts: 367
Joined: 17 Feb 2021, 23:07

Re: ROM adventures

Post by lb3361 »

Just bragging about something that worked right away.

Yesterday I unboxed my Gigatron, which is equipped with one of the glorious 512k boards manufactured by hans61 (https://forum.gigatron.io/viewtopic.php?p=3014#p3014). I burned a recent rom512k7.rom and updated my SD card with both system7.gt1 and system.gt1. As expected, the Gigatron runs system7.gt1, which feels substantially faster because directories display more quickly. sieve1.gt1 ran in about 13 seconds. All good.

Then I wanted to have the latest version of mscp, with the full opening book, and also with a double-resolution display as in https://forum.gigatron.io/viewtopic.php?p=2831#p2831. I copied the latest mscp from https://github.com/lb3361/gigatron-lb/t ... progs/mscp and changed the Makefile. Instead of compiling with


% glcc -rom=dev7 -map=128k,./mscp.ovl mscp.o core.o onload.s -o mscp_128k.gt1
% python3 addbook.py mscp_128k.gt1 book.bin
I replaced the -map=128k option with -map=512k,hr:


% glcc -rom=dev7 -map=512k,hr,./mscp.ovl mscp.o core.o onload.s -o mscp_512k_hr.gt1
% python3 addbook.py mscp_512k_hr.gt1 book.bin
And it all worked the first time. Yet this is highly nontrivial. The 128k version runs in bank0+bank2, selected after displacing the framebuffer into bank1. The 512k version runs in bank0+bank1 after setting up a double-resolution framebuffer in banks 12, 13, 14, and 15. Both must copy the opening book into bank3 using custom code (onload.s) that must run before the execution environment is set up. And all this happened flawlessly by just changing the -map option. I felt the vain satisfaction of a job well done!

Note 1 --
Attached are the 128k and 512k versions of mscp. The 128k version here can be loaded from an SD card (or with the loader) and has the full opening book, unlike the one burned into the ROM (space problems in the ROM; maybe I should revert to the normal app selection). If you have a Gigatron 128k with dev128k7.rom, use mscp_128k.gt1. If you have a Gigatron 512k, use mscp_512k_hr.gt1.
mscp_128k.gt1 (41.66 KiB)
mscp_512k_hr.gt1 (41.66 KiB)
Note 2 --
The job is not so well done because the ROM implementation of LDFAC/LDFARG/STFAC is three times slower than it should be. I was trying to make all my ROM code fit before address $2000, which may or may not be a good idea. Not sure. Anyway, it is important to notice that I added 12 pages of code over the old devrom, and 8 of them (14, 15, 18, 1A, 1B, 1C, 1D, 1E) start with an FSM header. In fact all this ROM work is merely a demonstration of the value of this idea. There is a lot of untapped potential there. But very little time.
lb3361
Posts: 367
Joined: 17 Feb 2021, 23:07

Re: ROM adventures (dev7rom)

Post by lb3361 »

7 - A closer look at the FSM framework

I am finally posting a more comprehensive explanation of the FSM framework and why it matters.

There is a substantial need for complex native operations that can be freely called from the vCPU. Not only floating point multiplications, but also graphic operations that manipulate sprites, saving and restoring the background pixels as needed, avoiding useless flicker, etc. Such operations were usually implemented with self-restarting SYS calls (e.g. https://forum.gigatron.io/viewtopic.php?p=492#p492) that manipulate the program counter to cause the vCPU to call them repeatedly until the task is finished. This is a bit clumsy because such code must emulate potentially complex program flows using only this self-restart capability. It is also slow because it incurs about 15 cycles of overhead per iteration. Techniques that shave the vCPU dispatch overhead (e.g. https://forum.gigatron.io/viewtopic.php?p=2292#p2292) tend to make the code hard to understand and debug.

The vCPU dispatcher works as follows:


0301 NEXT: adda([vTicks])     # Count elapsed time
0302       blt('EXIT')        # Branch away when there is not enough time left for the next opcode
0303       st([vTicks])       # Delay slot: store updated time
     ...                      # Advance program counter
0307       ld([Y,X])          # Fetch opcode
     ---
0309       bra(AC)            # Branch to opcode implementation
030a       ld([Y,X])          # Delay slot: fetch opcode argument
030b EXIT: ...                # Resynchronize and return to video code
Each vCPU opcode or SYS call counts the number K of "ticks" it has consumed (one tick represents two cycles) and returns by branching to 'NEXT' with -K in the accumulator. The first part of the vCPU dispatcher computes the time left in the current slice by adjusting variable vTicks. This variable is smartly biased to become negative when there is not enough time left to execute another opcode. When this is the case, the dispatcher branches to EXIT which resynchronizes the Gigatron with the video timings and jumps into the video code. Otherwise the second part of the dispatcher increments the program counter, loads the next vCPU opcode and its arguments, then jumps to the opcode implementation. This takes about 10 cycles.

Let us now replace program counters, opcodes, and arguments with a single variable fsmState that tells what piece of native code should be executed next. Getting rid of all this complication makes it possible to merge the two branches at 0302 and 0309 into a single branch at 1402:


1401 NEXT: adda([vTicks])     # Count elapsed time
1402       bge([fsmState])    # Branch to native code block if enough time left
1403       st([vTicks])       # Delay slot: store updated time
1404 EXIT: ...                # Resynchronize and return to video code

This reduces the minimal overhead between two separately schedulable fragments of native code to only 3 cycles!
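
To make the scheduling idea concrete, here is a toy Python model of such a dispatcher. It is only a sketch under stated assumptions: the fragment names (`init`, `step`), the tick costs, and the helper functions are invented for illustration and are not taken from the ROM. The `budget` argument stands for the biased vTicks value at the start of a time slice.

```python
# Toy model of an FSM dispatcher. Names and tick costs are illustrative
# assumptions, not values from the ROM.

def run_slice(state, fragments, budget):
    """Run fragments until the (biased) tick budget of one slice goes negative.

    `fragments` maps a state name to a function returning
    (ticks_consumed, next_state); a next_state of None means the task is done.
    Returns the state to resume from in the next slice and the fragments run.
    """
    v_ticks = budget
    trace = []
    while state is not None and v_ticks >= 0:   # models bge([fsmState])
        ticks, next_state = fragments[state]()
        trace.append(state)
        v_ticks -= ticks                        # models adda([vTicks]) with AC = -ticks
        state = next_state                      # the fragment chose its successor
    return state, trace


def make_counter(n):
    """Hypothetical task: one setup fragment, then n loop fragments."""
    remaining = n

    def init():
        return 5, "step"

    def step():
        nonlocal remaining
        remaining -= 1
        return 7, ("step" if remaining > 0 else None)

    return {"init": init, "step": step}


def run_to_completion(fragments, budget):
    """Count how many time slices the task needs."""
    state, slices = "init", 0
    while state is not None:
        state, _ = run_slice(state, fragments, budget)
        slices += 1
    return slices
```

The point of the model is that a fragment never has to know where the slice boundary falls: it reports its cost, names its successor, and the dispatcher either continues or returns to the video code and resumes from the saved state later.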

When I realized the meaning of this, I wrote a message to Hans61 and at67 that summarizes the situation quite well:
lb3361 wrote: 01 Dec 2022, 16:12 Hello Ari and Hans,

I tried to estimate the hidden cost of dispatching long instructions and found that it increases very quickly after 64 cycles. Splitting them into independently scheduled pieces is penalized by the dispatching cost (15 to 20 cycles, depending on how you count) and is also very complex. At some point I stumbled on a very fast way to write a dispatcher as a finite state machine. It only has the minimal set of entry points, ENTER and NEXT. Everything has to be in one page. But the overhead is only 3 cycles (or 5 if you count the jump to NEXT).
...
I tried to apply this to AT67's SYS_Multiply_s16 and SYS_Divide_s16. You can see the code at https://github.com/lb3361/gigatron-rom/ ... m.py#L6061. Instead of executing the multiplication as 16 fragments of 56-66 cycles, it does so as 34 fragments which are all less than 28 cycles. The division is even more extreme: 49 fragments. Other than that, this is basically AT67's code. Here are the timing results for 3000 16x16 multiplications with semi-random arguments:


                             +----------+----------+----------+
                             |   vCPU   |  Native  |    FSM   |
+----------------------------+----------+----------+----------+
| 3000 multiplications 16x16 |   12.6   |   4.15   |   3.42   |
| 3000 divisions       16/16 |   15.4   |   7.45   |   4.93   | 
+----------------------------+----------+----------+----------+
Not only does this save time, but it is also far easier to program. Since each fragment starts at cycle 3, one can write a surprising amount in 28 cycles. In addition, there is little need to make complicated adjustments when one merges two control paths with different execution times. One can always branch to NEXT and be done with it. This of course has a lot of consequences on how to program long opcodes.
This code is now substantially faster in ways that would have been very complicated to program as a self-restarting SYS call.


                             +----------+----------+----------+----------+
                             |   vCPU   |  Native  |    FSM   |  Newest  |
+----------------------------+----------+----------+----------+----------+
| 3000 multiplications 16x16 |   12.6   |   4.15   |   3.42   |   2.75   |
| 3000 divisions       16/16 |   15.4   |   7.45   |   4.93   |   4.93   |
+----------------------------+----------+----------+----------+----------+
The normal multiplication code loops 16 times, each time testing one bit of the multiplier and potentially adding the shifted multiplicand to the 16-bit product. The new code takes advantage of the fact that the last 8 iterations only affect the most significant byte of the product and can therefore be carried out much faster as 8-bit additions instead of 16-bit additions. This updated code is at https://github.com/lb3361/gigatron-rom/ ... m.py#L6527 where you can see the various fragments that chain to each other by calling the FSM entry point. What would have been very hard to do with just self-restarting code became not only easy but also faster.
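
The arithmetic behind this shortcut can be checked with a small Python sketch. This is not the ROM code: the function names are made up, and the bit order (least significant multiplier bit first) is an assumption chosen to match the description above.

```python
def mul16_plain(a, b):
    """16 iterations, each a potential 16-bit add of the shifted multiplicand."""
    product = 0
    for i in range(16):
        if (b >> i) & 1:
            product = (product + (a << i)) & 0xFFFF
    return product


def mul16_split(a, b):
    """Same 16-bit result; the last 8 iterations only touch the high byte."""
    product = 0
    for i in range(8):                  # low multiplier bits: 16-bit adds
        if (b >> i) & 1:
            product = (product + (a << i)) & 0xFFFF
    lo, hi = product & 0xFF, product >> 8
    for i in range(8):                  # high multiplier bits: 8-bit adds
        if (b >> (i + 8)) & 1:
            # (a << (i + 8)) has a zero low byte modulo 2**16,
            # so only the high byte of the product can change.
            hi = (hi + (a << i)) & 0xFF
    return (hi << 8) | lo
```

For example, `mul16_split(0x1234, 0xABCD)` and `mul16_plain(0x1234, 0xABCD)` both equal `(0x1234 * 0xABCD) & 0xFFFF`.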

DEV7ROM uses the same idea again and again. For instance, a key factor in the floating point performance (four times faster) is the MACX opcode, which multiplies a 32-bit number by an 8-bit number and accumulates the product on 40 bits. About 75% of the code added between ROMv6 (proposed) and DEV7ROM is code that runs under the control of an FSM dispatcher.
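
As a rough illustration of what a multiply-accumulate of this shape computes, here is a Python sketch. The function name, the argument order, and the wrap-around on overflow are assumptions for illustration; the actual operand locations and overflow behavior of MACX are described in the vCPU7 documentation, not here.

```python
MASK40 = (1 << 40) - 1


def macx(acc40, x32, m8):
    """Return (acc40 + x32 * m8) truncated to 40 bits, using shift-and-add."""
    assert 0 <= x32 < (1 << 32) and 0 <= m8 < (1 << 8)
    for i in range(8):                  # one step per bit of the 8-bit factor
        if (m8 >> i) & 1:
            acc40 = (acc40 + (x32 << i)) & MASK40
    return acc40
```

A 40-bit accumulator is the natural width here because the product of a 32-bit number and an 8-bit number always fits in 40 bits.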

In some cases it is possible to use the FSM dispatcher to execute more complex programs where we use fsmState as a program counter. Here is how I explained this to Hans61 and at67:
lb3361 wrote: 01 Dec 2022, 16:12 But there is more. One of my goals was to rewrite sys_Exec in purely native code because the v5a one uses memory on the stack, and I have other plans for the stack. The idea was to use the same scheduler but use fsmState as a program counter. Therefore I write a bunch of microinstructions and a program that uses them. You can see that at https://github.com/lb3361/gigatron-rom/ ... m.py#L6263. Then I realized that I could also write a native Loader that shares most of its code with the native Exec. I did not test this thoroughly because I do not have the right setup (my Arduino has long been repurposed). But I find this appealing.
The FSM in page 1500 implements micro-opcodes and programs both SYS_Exec and SYS_Loader in terms of these micro-opcodes. You can see the micro-programs at https://github.com/lb3361/gigatron-rom/ ... m.py#L6996 and https://github.com/lb3361/gigatron-rom/ ... m.py#L7082. Most of the micro-opcodes resemble vCPU operations and are shared by both SYS calls; some, such as uLUP, uSERIN, and uDAT, perform tasks that are very specific to SYS_Exec or SYS_Loader. Anyway, something that was rather difficult to code turned into something considerably easier to program and debug.
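
The idea of using fsmState as a program counter over a small set of micro-opcodes can be sketched in Python as follows. Everything here is hypothetical: the micro-op names (`SET_ADDR`, `COPY_BYTE`, and so on) and the tiny program are invented for illustration and do not correspond to the ROM's actual micro-opcodes.

```python
def run_microprogram(program, memory, payload):
    """Interpret `program`, a list of (micro_op, argument) pairs.

    `pc` plays the role of fsmState: it names the fragment to run next.
    Returns the number of fragments executed.
    """
    regs = {"addr": 0, "count": 0}
    src = iter(payload)
    pc, executed = 0, 0
    while pc is not None:
        op, arg = program[pc]
        executed += 1
        if op == "SET_ADDR":            # load the destination address
            regs["addr"] = arg
            pc += 1
        elif op == "SET_COUNT":         # load the number of bytes to copy
            regs["count"] = arg
            pc += 1
        elif op == "COPY_BYTE":         # store one incoming byte and advance
            memory[regs["addr"]] = next(src)
            regs["addr"] += 1
            regs["count"] -= 1
            pc += 1
        elif op == "LOOP_IF_MORE":      # branch back while bytes remain
            pc = arg if regs["count"] > 0 else pc + 1
        elif op == "DONE":
            pc = None
        else:
            raise ValueError(f"unknown micro-op {op!r}")
    return executed


# Hypothetical micro-program: copy a 3-byte segment to address 0x200.
LOAD_SEGMENT = [
    ("SET_ADDR", 0x200),
    ("SET_COUNT", 3),
    ("COPY_BYTE", None),
    ("LOOP_IF_MORE", 2),
    ("DONE", None),
]
```

Two programs that share most of their micro-ops, as SYS_Exec and SYS_Loader reportedly do, would differ only in their program tables and in a few specialized micro-ops.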

As explained in the beginning of this thread, I am mostly interested in making the Gigatron more C friendly. When I stumbled on that idea last December, I got very excited and decided to share it immediately with Hans61 and at67 because I see that it offers a lot of potential for constructing native support for rich graphics capabilities, something of great interest to at67. I believe he saw competition rather than contribution.

Anyway, the FSM framework makes it much easier to program and debug complex operations in native code. This is what allowed me to put together DEV7ROM in only a couple of weeks. For instance, the MACX opcode worked right away despite being rather complex. The entire DEV7ROM development took less time and effort than writing SYS_CopyMemory eighteen months ago. But I am glad that I did it because it helps make the point.

That's pretty much the end of this story, for now.
at67
Site Admin
Posts: 647
Joined: 14 May 2018, 08:29

Re: ROM adventures (dev7rom)

Post by at67 »

lb3361 wrote: 02 Mar 2023, 22:14 I got very excited and decided to share it immediately with Hans61 and at67 because I see that it offers a lot of potential for constructing native support for rich graphics capabilities, something of great interest to at67. I believe he saw competition rather than contribution.
Come on dude, I've asked multiple times now, just let it go. I'm not interested in your passive aggressive nonsense.

And for the record after you showed me this new stuff, (which is great btw), you said you would not create a competing ROM, but within weeks you had. The reason "I saw competition" is because you created it and then started flaunting it, remember this comment "well what are you going to do with ROMvX0 now?"

I am happy for your accomplishments and your new ROM, but this is all so immature, just get over yourself and get back to coding.
lb3361
Posts: 367
Joined: 17 Feb 2021, 23:07

Re: ROM adventures (dev7rom)

Post by lb3361 »

at67 wrote: 03 Mar 2023, 09:12 And for the record after you showed me this new stuff, (which is great btw), you said you would not create a competing ROM, but within weeks you had. The reason "I saw competition" is because you created it and then started flaunting it, remember this comment "well what are you going to do with ROMvX0 now?"
It is true that I had no intention to produce a ROM. In fact I was totally disgusted. Then I got Covid over Xmas, got bored, started to code something to play mscp on a 128k Gigatron, then added opcodes, then revived my floating point ideas, etc. All things I would have been glad to do for ROMvX0, in fact, if you had not made clear that you do not want it. Competition is a self-fulfilling prophecy. In the end I don't regret it because one should never yield to the bully.

So the ROM is here and shows what can be done. I do not have great plans going forward. So feel free to copy the good ideas and make your dream graphic machine. Or feel free to try to ignore it while knowing that it shows how things can be done better.
at67
Site Admin
Posts: 647
Joined: 14 May 2018, 08:29

Re: ROM adventures (dev7rom)

Post by at67 »

lb3361 wrote: 04 Mar 2023, 00:17
at67 wrote: 03 Mar 2023, 09:12 And for the record after you showed me this new stuff, (which is great btw), you said you would not create a competing ROM, but within weeks you had. The reason "I saw competition" is because you created it and then started flaunting it, remember this comment "well what are you going to do with ROMvX0 now?"
It is true that I had no intention to produce a ROM. In fact I was totally disgusted. Then I got Covid over Xmas, got bored, started to code something to play mscp on a 128k Gigatron, then added opcodes, then revived my floating point ideas, etc. All things I would have been glad to do for ROMvX0, in fact, if you had not made clear that you do not want it. Competition is a self-fulfilling prophecy. In the end I don't regret it because one should never yield to the bully.

So the ROM is here and shows what can be done. I do not have great plans going forward. So feel free to copy the good ideas and make your dream graphic machine. Or feel free to try to ignore it while knowing that it shows how things can be done better.
I'm the bully and you're the victim genius, let's just leave it at that then.
lb3361
Posts: 367
Joined: 17 Feb 2021, 23:07

Re: ROM adventures (dev7rom)

Post by lb3361 »

I don't fully remember the sentence "well what are you going to do with ROMvX0 now?" but I remember trying clumsily to explain that there were reasons to change a lot of things in the vCPU side of ROMvX0 to exploit the new idea (dev7.rom is just a demonstration of this in fact). I would have gladly helped but I was not sure my help was wanted to that extent. Then I turned to trying to advance an official v5 or v6 rom, doing a lot of janitorial work that is not super-interesting in fact. I simply thought that this was a good way to maintain interest in the platform. I was totally disgusted because of this post in the ROMv6 thread:
at67 wrote: 11 Dec 2022, 21:00
lb3361 wrote: 11 Dec 2022, 11:57 I would like this project to make a happy at67.
Ok, wow, this changes everything, I didn't realise my happiness was such a high priority.

Given the above, I now revert all previous permissions I have given you, this applies to all code I have written, (but not limited to), Native, (ROM), vCPU, gtBASIC, GCL, gtBASIC runtime, applications, games, tests...I think you get the idea. So that means no SDCARD Browser, Tetronis, Invader, ROM routines lifted from ROMvX0, etc, should find their way into ROMv5, ROMv6, or any ROMs that you work on.

For bonus happiness points, it would be awesome if you gave me permission to remove what little code you contributed to ROMvX0, so that I could have a clean slate while we have two diverging ROM paths. Obviously I don't need your permission to remove your code from my ROM, but I guess respect is something I strive to give and receive.
And don't get me wrong: I actually hoped to find an outcome you would have liked. Maybe I touched something painful without knowing. Anyway, now the dev7 rom exists and does what it advertises. It is more a demonstration than a finished product. So feel free to copy (and most likely improve) the good bits while making your dream graphic machine. Btw the way you implement sprites in romvx0 is very smart (I actually have a lot of respect for your work, believe it or not.)
at67
Site Admin
Posts: 647
Joined: 14 May 2018, 08:29

Re: ROM adventures (dev7rom)

Post by at67 »

Ok, whatever, you don't want to let it go, here's some facts:

1) You were mad that ROMvX0 was taking too long.
2) Either you or Hans leaked it, (probably multiple times), when it was a private repo for about a year.
3) You wanted to control ROMvX0 by having me "give" you all my code so you could start your own repo.

There's more but I couldn't be bothered, none of 1) to 3) really bothered me, what bothered me is your disingenuous claim that you offered your new code to ROMvX0; but you did nothing of the sort. You never made any pull requests, you never spoke to me about integrating your new ideas into ROMvX0, all you did was show me a code snippet and then over the next few weeks started forum posts and new repos for your new ROMs.

Then you had the audacity to ask me what I was going to do with ROMvX0...hilarious, what did you think I was going to do, curl up into the fetal position and go hide under my bed because you came up with one tiny new feature.

Cheers.
lb3361
Posts: 367
Joined: 17 Feb 2021, 23:07

Re: ROM adventures (dev7rom)

Post by lb3361 »

1) I would have liked ROMvX0 to be out faster (but not mad) and become the official rom. I lobbied Walter in that direction.

2) I did not leak ROMvX0. I do not know who did. I never had a copy of the exact version that was leaked.

3) I politely asked for the multiply/divide code because slow multiplications were an obstacle to reading FAT filesystems and because it would have been silly to write a new one. That was not a plan to take control of your code.

4) I would have liked to discuss but was met with a wall. After your post in the ROMv6 thread (above), what was left to discuss? It meant "go away" with more than a hint of bullying. I was left with the choice of either going away or competing. I first chose to go away out of disgust, then got Covid, got bored, and built a rom almost by accident. This is how this competing rom nonsense became a self-fulfilling prophecy.

If you take the good bits from dev7.rom and make them your own, I would consider it an acceptable resolution of the situation. I can even point out which specific pieces of code are easy to lift.
lb3361
Posts: 367
Joined: 17 Feb 2021, 23:07

Re: ROM adventures (dev7rom)

Post by lb3361 »

Dev7rom update

In the past few weeks, I reorganized the dev7 rom code to consolidate some free space without eating new pages. In the process, I sped up a couple of frequently used opcodes such as ADDI and SUBI. I also deprecated the ADDX opcode because of its limited use, and displaced the SUBW instruction implementation to make room in page 3 for future opcodes. Some of this space was used for a 3-byte MOVW instruction that can make programs more compact. There is still space for up to 4 new opcodes, one of which could be a new prefix opcode. Displacing SUBW makes it 2 cycles longer. This is compensated by the 4 cycles gained on opcodes ADDI and SUBI. All this is described in the updated doc file https://github.com/lb3361/gigatron-rom/ ... s/vCPU7.md. Everything remains backward compatible, seamlessly running Gigatron programs written for ROMv4, ROMv5a, or ROMv6.

I also worked on the C compiler so that it uses the new MOVW opcode wisely to reduce program sizes. For instance, the cumulative size of the gigatron/tst programs went from 95KB to 90KB, which is not an insignificant gain for a simple opcode change. For the time being, this is only available in the compiler's experimental branch because it generates code that is not compatible with the old dev7 rom (without the MOVW opcode).