ROM adventures (dev7rom)

lb3361
Posts: 360
Joined: 17 Feb 2021, 23:07

Post by lb3361 »

I had pretty much decided to give up on the Gigatron after the ROMv6 drama. Then I caught Covid in December and stayed in my room while my family was having fun in the city. I was so bored that I decided to hack just one last thing. Then I went overboard and did many things I had wanted to do for months, recreating things I had written for ROMvX0, and more. So this is going to be a multi-part thread.

1- Marcel's Simple Chess Program (MSCP) on a 128k Gigatron

I had long wanted to run MSCP on a Gigatron. The main issue is that MSCP needs a C compiler (done) and between 60 and 80 KB of memory depending on how one sets it up. I had succeeded in running it in a simulator, without a frame buffer (https://forum.gigatron.io/viewtopic.php?p=2379#p2379), and on the Gigatron 512k, by moving the frame buffer into page 15 (https://forum.gigatron.io/viewtopic.php?p=2828#p2828). But neither of those is a real Gigatron.

The solution came from a discussion with Hans61. He was considering a hardware patch that locks the framebuffer in banks 0 and 1, regardless of which bank is selected. In fact the Gigatron 512k already works like this. Anyway, I realized that we can also do this in software using a video loop that switches banks on the fly: use bank 1 when displaying the screen, and use the bank selected by SYS_ExpanderControl when executing vCPU code. Making this change was challenging: I had to find at least two bytes in page zero to store data and a couple of spare cycles in Marcel's notoriously tight video loops. In the end I had to steal a couple of cycles from the vCPU, costing 2% in speed.

The ROM itself works only on 128k+ Gigatrons and offers very few programs:
  • MSCP takes almost all the free space
  • MSCP and Racer are the only programs that depend on ROM components. Racer loads the cityscape bitmap from the ROM and needs two private SYS calls. MSCP loads its opening book from the ROM. All other programs can be loaded easily from an SPI SD card without messing with the slowish Loader protocol.
[Screenshot: dev128k7-mainmenu.png]
Then select Chess and play by entering moves using the standard notation.
[Screenshot: dev128k7-mscp.png]
You can also use command "both" to make MSCP play against itself. Because the Gigatron is not a very fast machine, the default search depth is now 2. To recover the full performance of MSCP, you need to type "sd 4" and be ready to wait. On the other hand, most of the opening book is there.

This program uses all memory banks. The code mostly resides in bank0 and its data in bank2; the screen buffer is in bank1; the transposition table and the opening book are in bank3.

2- Using the DEV7ROM and GLCC2

As usual everything is under the BSD license or the LCC license. No tricks, no revocable rights, no as-long-as-I-like-you clause.

The ROM binaries and their source code are available at https://github.com/lb3361/gigatron-rom in the master branch, which used to contain the staged ROMv6. There are in fact three ROMs with different video loops.
  • If you have one of the few Gigatron 512k, use "dev512k7.rom". This ROM only works on a Gigatron 512k.
  • If you have a Gigatron 128k (any kind, including the Novatron), use "dev128k7.rom". This ROM only works on an expanded Gigatron.
  • If you have an unexpanded Gigatron, you can use "dev7.rom". Since MSCP cannot run on your Gigatron, you'll get a normal video loop and the application selection of the defunct ROMv6. But you'll still get all the other speed improvements. This would also be the ROM to use on a 128k Gigatron with the kind of hardware patch Hans61 has in mind.
All three ROMs are able to run all regular ROMv5a programs, without loss of performance, except for a 2% penalty for "dev128k7.rom" because it eats a couple of vCPU cycles to switch banks on the fly. I expect near perfect compatibility except for programs that mess with the LED tempo. You might notice that the LEDs are going a bit faster and that the ROM starts in mode 2 instead of mode 1.

The ROM source is in https://github.com/lb3361/gigatron-rom/ ... dev.asm.py. All three ROMs use the same source file and use conditional compilation constructs to change the video loop. You can use GLCC-2.0 (https://github.com/lb3361/gigatron-lcc/) to compile programs that use the new ROM goodies. Just use option "-rom=dev7" and observe a substantial speedup. There is also an option "-map=128k" which copies bank1 into bank2, sets up the frame buffer in bank1, then switches to bank2 to run the program. This gives about 62KB of contiguous memory for code and data.

The adapted MSCP source is in https://github.com/lb3361/gigatron-rom/ ... s/MSCP/src. File "mscp.c" is 99% Marcel's code. The makefile calls the compiler with -map=128k and with an overlay that ensures that all code and data defined in file "core.c" lives in bank 0. I use it to swap banks and access the opening library or the transposition table. There is ample room to add new things such as a nice board display. I just did not have time. And also this is not what motivates me...

How final is the ROM? It could stay like this for a long time. I would like to reconsider two rarely used opcodes and replace them with something different, but nothing guarantees that this is going to happen. Anyway, I will write about opcodes in a forthcoming addition to this thread. I did things quite differently from ROMvX0 and that paid off nicely (which was not obvious in the beginning).
Last edited by lb3361 on 02 Mar 2023, 16:32, edited 3 times in total.
Hans61
Posts: 102
Joined: 29 Dec 2020, 16:15
Location: Saxonia
Contact:

Re: ROM adventures

Post by Hans61 »

Many thanks to lb3361. I have tested all ROMs on my hardware and could not find any errors.

dev7.rom on Gigatron 32K and 64K.
dev128k7.rom to 128K extension by Marcel, extension-retro by lb3361 and the Novatron
dev512k7.rom on my two extension-crazy

I made a new PCB of Marcel's expansion incorporating the changes. When it's there and I've tested it, I'll post it.
walter
Site Admin
Posts: 160
Joined: 13 May 2018, 08:00

Re: ROM adventures

Post by walter »

That is fantastic. I just ran it on my hardware. Marcel would be chuffed to bits. MSCP was a big part of his life.
lb3361
Posts: 360
Joined: 17 Feb 2021, 23:07

Re: ROM adventures

Post by lb3361 »

3- vCPU extensions I. - integer operations

Working on MSCP increased my desire to have additional vCPU instructions, such as the instructions discussed in at67’s thread (e.g. https://forum.gigatron.io/viewtopic.php?p=2053#p2053) and the instructions I had coded for ROMvX0. Since having to redo things is an opportunity to try to do them better, I did them quite differently from ROMvX0.

In ROMv5a, the most important vCPU opcodes are implemented in ROM page 3. The opcode is simply the address of the implementation. In ROMvX0, all vCPU opcodes have been displaced into other pages in order to make room for a large number of new instructions. Displacing instructions costs a couple of additional jumps. This extra overhead explains why ROMvX0 is about 10% slower when running ROMv5a programs. The idea was to catch up by providing a wealth of new efficient opcodes. A few of these new opcodes work around cumbersome aspects of the vCPU. All the other ones are simply “compressed” opcodes that are equivalent to common sequences of traditional opcodes. For instance, ROMvX0 provides a three-argument opcode called ADDVI that amounts to the sequence LDI(x);ADDW(y);STW(z). This opcode runs in 54 cycles, whereas the three-opcode sequence requires 20+30+24=74 cycles. Things look less favorable when one recalls that these same three opcodes only cost 16+28+20=64 cycles in ROMv5a and when one considers that dispatching a 54-cycle instruction incurs a hidden cost because, on average, 54/2=27 cycles are lost when there is not enough time to complete this instruction before time-critical video tasks.
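To make the hidden-cost argument above easier to play with, here is a minimal sketch. It only uses the ROMv5a cycle counts and the 54-cycle ADDVI figure quoted above. The helper names are mine, and the stall model (leftover budget uniformly distributed below the instruction length) is an assumption that reproduces the "half the instruction length" rule of thumb, not a measurement.

```python
# Cycle counts quoted in the post (ROMv5a) and the fused ROMvX0 opcode.
ROMV5A_CYCLES = {"LDI": 16, "ADDW": 28, "STW": 20}
ADDVI_CYCLES = 54


def sequence_cycles(table, ops):
    """Total cycles for a straight sequence of opcodes."""
    return sum(table[op] for op in ops)


def average_stall(cycles):
    """Mean number of cycles wasted when an instruction of this length
    no longer fits before the video deadline.

    Assumption: the leftover budget is uniformly distributed over
    0..cycles-1, which gives roughly cycles/2.
    """
    return sum(range(cycles)) / cycles


seq = sequence_cycles(ROMV5A_CYCLES, ["LDI", "ADDW", "STW"])
print("LDI;ADDW;STW on ROMv5a:", seq, "cycles")
print("ADDVI:", ADDVI_CYCLES, "cycles, saving", seq - ADDVI_CYCLES)
print("average stall for ADDVI:", average_stall(ADDVI_CYCLES))
print("average stall for ADDW :", average_stall(ROMV5A_CYCLES["ADDW"]))
```

The point of the comparison is that a long fused opcode saves dispatch cycles but wastes more of the time slice whenever it does not fit, whereas short opcodes waste less.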

Therefore I tried the opposite approach! I looked for ways to create new instruction slots without creating additional overhead for the traditional vCPU opcodes. For instance I found that displacing the conditional branch implementation into a separate page gave me the opportunity to make them 2 cycles faster while opening space in page 3 for a dozen new opcodes. This gain gave me the right to displace the less frequently used stack instructions, creating space for another few new opcodes, and giving the breathing room necessary for implementing a true 16 bits stack. The key principle was that none of these additions came at the cost of decreasing the average performance of the traditional vCPU opcodes. In fact the most frequently used traditional vCPU opcodes, STW, ADDW, and LDW, remain the most frequently used opcodes in DEV7ROM.

Creating a special page for the conditional branch instructions also allows me to use the conditional branch opcode 0x35 as a prefix for several new opcodes performing long operations such as multiplications. I’ll describe these instructions in a following post. For now I am focusing solely on the integer instructions that form the bulk of vCPU code. Meanwhile, the full set of new DEV7ROM opcodes is described in https://github.com/lb3361/gigatron-rom/ ... s/vCPU7.md.

Whether this approach would give better performance than ROMvX0 was not immediately obvious to me. The main reason to do so was to avoid creating a large API with scores of ultra-specialized opcodes that will be difficult to maintain in the future. But in the end, this turned out to be quite efficient. Integer-only programs compiled for DEV7ROM tend to run 25% to 30% faster than the same program compiled for ROMv5a.

To compare with ROMvX0, one needs to resort to at67’s sieve benchmark (https://forum.gigatron.io/viewtopic.php?p=3066#p3066) and in particular the champion, which runs in 11.1 seconds using the very special video mode 1975. The message says
at67 wrote: 03 May 2022, 10:52 Messing around with Mode 1975 and new instructions recently added to ROMvX0, we have a new leader board champ!
For a fair comparison, it is also necessary to see how the sieve program initializes its main array. I contend that a C program is allowed to use the ANSI C standard function memset() which can initialize 24 to 32 bytes per scanline. But what we want here is to compare the vCPU implementations, not the languages. On the other hand, ROMvX0 provides two specialized instructions, POKE+ and DJNE that can be used for a tight loop and initialize 2 or 3 bytes per scanline. I believe GTBASIC uses this. DEV7ROM does not have such instructions because one is expected to use memset() for this. Anyway, best is to provide multiple points of comparison.
  • The first one, sieve0.c, runs the untouched C code from the 1981 Byte article.
  • The second one, sieve1.c (ptrloop), is a slightly reshuffled version whose main difference is initializing the array with the following loop. This loop clears about 1 byte per scanline and is therefore slower than the likely ROMvX0 initialization. I cannot be sure here.

    Code: Select all

            { char *p = flags; while(p != flags+sizepl) *p++ = true; }
    
  • The last one, sieve1.c (memset), uses the memset library function and therefore is expected to be faster.
The following table gives the exact cycle counts and the corresponding runtimes assuming a 6.25 MHz crystal.

Code: Select all

Version               |  Cycles     | Time (seconds)
----------------------+-------------+----------------
sieve0.c (untouched)  |  70146220   |   11.22
sieve1.c (ptrloop)    |  67275080   |   10.76
sieve1.c (memset)     |  55624460   |    8.90
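For readers who want to redo the conversion in the table, here is a small sketch that turns the cycle counts into seconds. The 6.25 MHz clock and the cycle counts come from the post; the function name is mine.

```python
CLOCK_HZ = 6_250_000  # 6.25 MHz crystal, as stated in the post


def cycles_to_seconds(cycles, clock_hz=CLOCK_HZ):
    """Convert a gtsim cycle count into wall-clock seconds."""
    return cycles / clock_hz


# Cycle counts copied from the table.
results = {
    "sieve0.c (untouched)": 70146220,
    "sieve1.c (ptrloop)": 67275080,
    "sieve1.c (memset)": 55624460,
}

for name, cycles in results.items():
    print(f"{name:22s} {cycles:10d} {cycles_to_seconds(cycles):7.2f}")
```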
All these tests were run by going to https://github.com/lb3361/gigatron-lcc/ ... tuff/sieve and using the gtsim program to count the cycles using dev7.rom (not dev128k7.rom which is 2% slower). For instance the sieve1.c (ptrloop) result is obtained with

Code: Select all

% make clean sieve1-prof.txt DEFS=-DMEMSET=0
make: [clean] Error 1 (ignored)
../../build/glcc -rom=dev7 -map=sim,hionly -DMEMSET=0 sieve1.c -o sieve1-sim.gt1 --frags > sieve1-sim.frg
sieve1.c:59: warning: missing return value
gtsim -rom ../../gigatron/roms/dev7.rom -vmode 1975 -prof sieve1-sim.prf sieve1-sim.gt1
10 iterations

1899 primes
0 0/60 seconds
total 67275080 cycles (with video & overhead)
Anyway, the sieve0.c version (untouched code from Byte) runs 0.1 seconds behind the optimized ROMvX0 champ. The slightly optimized sieve1.c (ptrloop) version runs clearly faster despite having slower array initialization code. What is happening here is not that DEV7ROM offers a magic instruction that makes sieve fast, but that each instruction in the body of the calculation is slightly faster. The last result, sieve1.c (memset), is not a good comparison for this purpose but is worth having to show how much this array initialization matters.

It therefore seems that the DEV7ROM approach paid off nicely: a smaller API and more efficient code.

For reference the mode 3 times are given in the table below

Code: Select all

Version               |   ROMv5a    | DEV7 (2/2023) | DEV7 (12/2023) |
----------------------+-------------+---------------+----------------+
sieve0.c (untouched)  |  28s 38/60  |  20s 20/60    |    19s 40/60   |
sieve1.c (ptrloop)    |  26s  4/60  |  19s 34/60    |    18s 49/60   |
sieve1.c (memset)     |  21s  4/60  |  16s 15/60    |    15s 30/60   |
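As a rough cross-check of the mode 3 table, the sketch below parses readings such as "28s 38/60" and reports how much time the 12/2023 DEV7 build saves over ROMv5a. The two rows are copied from the table; the parsing helper and its name are mine.

```python
import re


def parse_time(text):
    """Convert a reading such as '28s 38/60' into seconds."""
    match = re.fullmatch(r"\s*(\d+)s\s+(\d+)/60\s*", text)
    if match is None:
        raise ValueError(f"unrecognized time: {text!r}")
    seconds, sixtieths = map(int, match.groups())
    return seconds + sixtieths / 60


def time_saved_percent(before, after):
    """Percentage of run time saved when going from `before` to `after`."""
    return 100 * (1 - parse_time(after) / parse_time(before))


# (ROMv5a, DEV7 12/2023) readings copied from the table above.
rows = {
    "sieve0.c (untouched)": ("28s 38/60", "19s 40/60"),
    "sieve1.c (memset)": ("21s  4/60", "15s 30/60"),
}

for name, (v5a, dev7) in rows.items():
    print(f"{name:22s} {time_saved_percent(v5a, dev7):5.1f}% less time")
```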
The total code size has not changed much (same runtime for printf and for the clocking code). The main function is only 196 bytes long in fact.

Code: Select all

Version               |   ROMv5a    | DEV7 (2/2023) | DEV7 (12/2023) |
----------------------+-------------+---------------+----------------+
sieve0.c (untouched)  |     2531    |     2136      |      1793      |    
sieve1.c (ptrloop)    |     2520    |     2133      |      1791      |
sieve1.c (memset)     |     2511    |     2131      |      1785      |   
Last edited by lb3361 on 24 Dec 2023, 14:15, edited 5 times in total.
at67
Site Admin
Posts: 647
Joined: 14 May 2018, 08:29

Re: ROM adventures

Post by at67 »

Sigh...

There is so much mis-information and outright erroneous hyperbole in this post, (with respect to ROMvX0), it's honestly cringe worthy and embarrassing to read.

Lets just keep this simple, leave me and my code out of your posts when talking about your ROM.

I am super happy for you that you have been able to build on the work of others, congratulations. But I want literally nothing to do with you or anything you work on and I would appreciate not being dragged into anymore drama.

Cheers.
lb3361
Posts: 360
Joined: 17 Feb 2021, 23:07

Re: ROM adventures

Post by lb3361 »

Mmhh. Let's just say that the proof of the pudding is in the eating!


4- vCPU extension II - heavy opcodes

I mentioned earlier that the prefix 0x35 of the conditional branches is now used as a prefix for a number of new opcodes. A few of these opcodes, such as the conditional branches, are implemented directly in the prefix 0x35 page in order to minimize the overhead. The majority, however, is composed of opcodes whose execution takes a long time. The overhead is then acceptable because it represents a small fraction of the execution time. These opcodes include integer multiplications and divisions, long arithmetic, plus a number of opcodes that are useful to accelerate floating point computation such as bit shifts of the extended long accumulator, etc.

These heavy opcodes are implemented using a framework (the FSM framework) that allows for splitting them into small and easy-to-schedule pieces with minimal overhead. A full explanation of the FSM framework will come later; for now there is a brief presentation in https://forum.gigatron.io/viewtopic.php?t=403 where I explain that it uses instructions that may be problematic on Gigatrons with slow RAM chips. Nothing of the sort has been observed yet, but who knows. What is certain is that this does not happen with expanded Gigatrons because they use a 45ns static RAM that provides ample margin.

These opcodes have a drastic effect on long and floating point computations. An interesting benchmark is veekoo's ascbrot.c program (https://forum.gigatron.io/viewtopic.php?t=323&start=10). I slightly modified it to display its execution time (https://github.com/lb3361/gigatron-lcc/ ... /ascbrot.c). Since this benchmark works on all Gigatrons (except maybe those with a very slow SRAM), I attach the ROM and the GT1 files below.

Here is the ROMv5a version running in mode3:
[Screenshot: ascbrot_v5a.png]

And the DEV7.ROM version, which is about four times faster:
[Screenshot: ascbrot_dev7.png]
Veekoo has recently changed his fractal programs to use long arithmetic instead of floating point (https://github.com/kervinck/gigatron-ro ... actals/src). These versions also benefit heavily from the new opcodes supporting long and floating point arithmetic.


Since all these heavy opcodes are described in https://github.com/lb3361/gigatron-rom/ ... d-division, I am just going to outline two of them below:
  • Opcode MACX (multiply-accumulate extended) takes as input a 32-bit number in sysArgs[0..3] and an 8-bit number in vACL. It computes their product on 40 bits and adds it to a 40-bit extended accumulator called LAX. Four calls to this instruction are used to implement both the long and the floating point multiplication. Its implementation splits the work into up to 34 fragments taking between 20 and 30 cycles. These fragments are chained with very little overhead by the FSM machinery, allowing them to perform the MAC operation much faster than any sequence of vCPU instructions.
  • Opcode MULQ (multiply quick), also a multiplication, computes the product of vAC and a small constant. The argument of the instruction however is not the small constant, but a bit pattern that describes a sequence of operations to perform to compute the product. In fact what this operation does is best described by the following little program:

    Code: Select all

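    def mulq(vAC, KK):
        # vAC is a 16-bit register and KK an 8-bit pattern, hence the masks
        tmp = vAC
        vAC = (vAC << 1) & 0xffff
        KK &= 0xff
        while KK != 0:
            if KK & 0x80 == 0x80:      # 1xxxxxxx: shift left
                vAC <<= 1; KK <<= 1
            elif KK & 0xc0 == 0x40:    # 01xxxxxx: add the original value
                vAC += tmp; KK <<= 2
            elif KK & 0xe0 == 0x20:    # 001xxxxx: subtract the original value
                vAC -= tmp; KK <<= 3
            else:
                break                  # 000xxxxx leftover: not covered here, assumed unused
            vAC &= 0xffff; KK &= 0xff
        return vAC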
    
    Each code (KK above) corresponds to a particular multiplication which could have been implemented with explicit LSLW and ADDW. However in this case, they are sequenced by the low overhead FSM machinery instead of the vCPU dispatcher, making them about twice as fast. Alas there is a catch because there is a price to pay to enter and leave the FSM sequencer. As a result, MULQ is a good solution for small multiplications except the multiplications by 2 and 4, which are best implemented with LSLW. So in the end, MULQ is just a compromise between versatility and speed. The GLCC assembler (inside glink.py) offers pseudo-instructions _MULI() and _SHLI() that know how to best multiply or shift vAC with an immediate argument using various strategies: LSLW, MULQ, MULW, SYS_LSLW4, etc.
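To get a feel for which constants a MULQ pattern can encode, here is a minimal sketch that enumerates all 8-bit patterns and keeps, for each multiplier, a pattern with the fewest operations. It follows the decoding rules of the little program above. The 16-bit wraparound, the rejection of patterns with a 000 leftover, and all function names are my assumptions for illustration; this is not how _MULI() in glink.py necessarily picks its codes.

```python
def decode(kk):
    """Split an 8-bit MULQ pattern into operations, or None if bits are left over."""
    ops = []
    kk &= 0xFF
    while kk:
        if kk & 0x80:
            ops.append("shl"); kk <<= 1      # 1xxxxxxx
        elif kk & 0x40:
            ops.append("add"); kk <<= 2      # 01xxxxxx
        elif kk & 0x20:
            ops.append("sub"); kk <<= 3      # 001xxxxx
        else:
            return None                      # assumption: 000xxxxx is not a valid code
        kk &= 0xFF
    return ops


def mulq(vac, kk):
    """Model of MULQ on a 16-bit vAC."""
    ops = decode(kk)
    if ops is None:
        raise ValueError(f"invalid MULQ pattern {kk:#04x}")
    tmp = vac
    vac = (vac << 1) & 0xFFFF                # the implicit initial doubling
    for op in ops:
        if op == "shl":
            vac <<= 1
        elif op == "add":
            vac += tmp
        else:
            vac -= tmp
        vac &= 0xFFFF
    return vac


def best_codes():
    """Map each reachable multiplier to a pattern with the fewest operations."""
    best = {}
    for kk in range(1, 256):
        ops = decode(kk)
        if ops is None:
            continue
        n = mulq(1, kk)
        if n not in best or len(ops) < len(decode(best[n])):
            best[n] = kk
    return best


codes = best_codes()
for n in (3, 5, 10):
    print(f"x{n}: KK={codes[n]:#04x} ops={decode(codes[n])}")
```

For example, 0x40 decodes to a single add after the implicit doubling (times 3), and 0xa0 decodes to a shift followed by an add (times 5).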
Attachments: dev7.rom (128 KiB), ascbrot_v5a.gt1 (4.68 KiB), ascbrot_dev7.gt1 (3.37 KiB)
lb3361
Posts: 360
Joined: 17 Feb 2021, 23:07

Re: ROM adventures

Post by lb3361 »

5- GLCC-2

A brief update about GLCC.

5.1 - The new stack

The goal of this major GLCC release was to take advantage of the vCPU7 true 16-bit stack.

Earlier versions of GLCC had to manage their own stack pointer (SP) by hand instead of using the vCPU stack pointer (vSP). Local variables that could not be located in registers had to be accessed with explicit address computation such as LDI(offset);ADDW(SP);DEEK(). The stack frame associated with function calls was optimized for this state of affairs. Functions that did not call other functions and did not have stack-based local variables were recognized as "frameless" and could use the normal vCPU stack instead.

In principle, things are much easier when the vCPU stack is a true 16-bit stack. Local variables can be accessed with a simple two-byte instruction, LDLW(offset). Frameless functions are more similar to normal functions. The main difficulty was to take advantage of these possibilities without breaking the compiler's ability to work with earlier ROM versions. This was helped by defining pseudo-opcodes _LDLW and _STLW in glink that test whether SP == vSP and either use the native LDLW/STLW opcodes (using vSP) or construct a DEEK/DOKE sequence to do their job (using SP this time).
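The selection made by the _LDLW pseudo-opcode can be pictured with the following sketch. It is only an illustration: the function name, the tuple representation of instructions and the boolean flag are hypothetical and do not reflect the actual glink.py code. The fallback sequence is the LDI;ADDW;DEEK idiom quoted above; _STLW would be handled the same way with a DOKE-based sequence, which I leave out.

```python
def emit_ldlw(offset, sp_is_vsp):
    """Return the instructions that load the 16-bit local at `offset`.

    sp_is_vsp is True when the compiler stack pointer SP is the vCPU
    stack pointer vSP (true 16-bit stack available).
    """
    if sp_is_vsp:
        # Native opcode: addresses the local relative to vSP.
        return [("LDLW", offset)]
    # Older ROMs: compute the address from the software SP, then DEEK.
    return [("LDI", offset), ("ADDW", "SP"), ("DEEK",)]


print(emit_ldlw(4, sp_is_vsp=True))
print(emit_ldlw(4, sp_is_vsp=False))
```

Keeping the choice inside a pseudo-opcode means the code generator can emit the same template for every ROM and let the linker decide.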

Anyway the effect of the true stack can be seen on the size of the mscp program (without the opening library):

Code: Select all

MSCP without opening library   | Size      | Comments
-------------------------------+-----------+------------
for ROMv5a                     | 31846     |
for an old ROMvx0              | 29570     | Because of DEEK/DOKE variants
for DEV7, without true stack   | 27732     | Because of smaller FP runtime
for DEV7, with true stack      | 26171     | Because LDLW/STLW
5.2 - Onload functions

The goal here was to produce a self-contained version of MSCP that can be loaded from an SD card or using the Loader (https://github.com/lb3361/gigatron-lcc/ ... stuff/mscp).

This is an improvement over the version that came with dev128k7.rom (https://github.com/lb3361/gigatron-rom/ ... s/MSCP/src) because this earlier version must find its opening library as a secondary file inside the ROM. Not only does this mean that it can only be loaded from the rom, but also that size constraints forced me to prune the opening library in questionable ways.

The trick was to add support in GLCC for "onload" functions whose goal is to prepare the execution environment. In the case of MSCP, three such functions are called in sequence. The first one, defined by mscp, copies the opening library into bank 3. The second one, defined by map128k, copies excess code to bank2, moves the framebuffer into bank1, and sets up a contiguous 62KB address space spanning banks 0 and 2. The last one, defined by the C library, clears the uninitialized variables as specified by the C standard.
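The idea of onload functions can be modelled as a list of hooks that run in order before the program proper starts. The sketch below is only a model of that sequencing: the decorator, the hook names and the log are hypothetical, and the real mechanism lives in the GLCC linker and runtime rather than in Python.

```python
onload_hooks = []   # hooks run in registration order
log = []            # records what each hook would do


def onload(func):
    """Register a function to run before main (hypothetical helper)."""
    onload_hooks.append(func)
    return func


@onload
def mscp_copy_book():
    log.append("mscp: copy opening library into bank 3")


@onload
def map128k_setup():
    log.append("map128k: copy excess code to bank 2, framebuffer in bank 1")


@onload
def libc_clear_bss():
    log.append("libc: clear uninitialized variables")


def start(main):
    """Run every onload hook in order, then the program itself."""
    for hook in onload_hooks:
        hook()
    return main()


result = start(lambda: "main running")
print("\n".join(log))
print(result)
```

The order matters here: the opening library has to be copied out before the memory map is rearranged, and the variables are cleared last.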

5.3 - Improved profiling support

Profiling (see https://forum.gigatron.io/viewtopic.php ... ling#p3007) has been made much more precise. Besides gtprof which gives cycle counts for all routines, the new version of gt1dump.py (https://github.com/lb3361/gigatron-rom/ ... gt1dump.py) can take a profiler file and display cycle counts for every disassembled instruction.

Example:

Code: Select all

$ cd stuff/veekoo/
$ glcc -rom=dev7 -map=sim ascbrot.c -o ascbrot.gt1 --frags > ascbrot.frg
$ gtsim -rom ../../gigatron/roms/dev7.rom -prof ascbrot.prf ascbrot.gt1 
total 162250841 cycles (with video & overhead)
$ gtprof ascbrot.prf ascbrot.frg | sort -nr | head -8
    98854964	TOTAL
    33956060	__@fmulm                      libc.a(rt_fmul.s)
    14197846	__@fdivloop                   libc.a(rt_fdivloop.s)
     8596472	_@_fmul                       libc.a(rt_fmul.s)
     8090322	__@fadd_t3                    libc.a(rt_faddt3.s)
     6610104	__@fnorm                      libc.a(rt_fnorm.s)
     5917386	mandelbrot                    ascbrot.c
     4432784	_@_clrfac                     libc.a(rt_rndfac.s)
$ ../../../gigatron-rom/Utils/gt1dump.py -d -p ascbrot.prf ascbrot.gt1  | head -30
* file: ascbrot.gt1

0036  20 3f 00 00 01 01 00 00  00 00                    | ?........|
* 10 bytes

08a0  21 1a             [vCPU] LDW    vLR                #7800        |!.|
08a2  2b 8c                    STW    $8c                #9814        |+.|
08a4  df e4                    ALLOC  $e4                #11178       |_d|
08a6  59 00                    LDI    0                  #6396        |Y.|
08a8  99 1c                    ADDW   vSP                #11138       |..|
08aa  85 a4 26                 CALLI  $26a4              #16960       |.$&|
08ad  4a 88 ba                 MOVQW  $ba,vT2            #11146       |J.:|
08b0  b1 8a 17 f6              MOVIW  $17f6,vT3          #12388       |1..v|
08b4  35 cf 05                 COPYN  5                  #53590       |5O.|

$ ../../../gigatron-rom/Utils/gt1dump.py -d -p ascbrot.prf ascbrot.gt1 | grep MACX
21ba  35 1c                    MACX                      #6727538     |5.|
21c2  35 1c                    MACX                      #6795372     |5.|
21ca  35 1c                    MACX                      #6841890     |5.|
21d2  35 1c                    MACX                      #7147708     |5.|

5.4 - Other
  • Glcc2 can compile for ROMv4 (-rom=v4), ROMv5a (-rom=v5a), proposed ROMv6 (-rom=v6), and DEV7ROM (-rom=dev7).
    GLCC also has best effort support for ROMvX0 (-rom=vx0) but it does not exploit it very well and might break if the opcodes change as they have in the past. The main reason to include ROMvX0 support was to broaden the coverage of the GLCC test sequence. This test sequence is very useful to expose ROM bugs as well as GLCC bugs. For instance (bug report here), in ROMvX0, the delay slot of jump instruction https://github.com/at67/ROMvX0/blob/ece ... m.py#L2862 tends to write stuff into random page zero locations.
  • New: It is also possible to compile with option "-rom=v6--" to compile for the proposed ROMv6 but without using the CMPHU/CMPHS opcodes that were displaced in ROMvX0. This is the only way to be compatible with all recent ROMs, at the expense of performance.
  • Glcc2 produces improved code by tracking the register state. The code emitter uses annotations in the lcc templates to maintain assertions about register equality, which can then be used in conditional constructs. For instance, after STW(R3), we know that vAC and R3 contain the same value. If a following template attempts to reload R3 into vAC, a conditional construct prevents the emission of LDW(R3) because we know that vAC already holds the right value. All this becomes more powerful when assertions can track constant values and memory locations as well, leading to non trivial optimizations...
  • Glink now tracks the usage of all locations in page zero: ROM requirements, registers, temporaries, "near" variables. This provides means to deal with the various ROMs as well as having the runtime allocate temporaries if and only if they are needed.
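The STW/LDW example given for register tracking can be sketched as a tiny pass over straight-line code. This is a simplified model under my own assumptions: instructions are (opcode, operand) tuples, there are no branch targets, and any opcode other than STW and LDW is conservatively assumed to invalidate what we know about vAC. The real emitter works with annotations in the lcc templates and tracks much more.

```python
def elide_redundant_loads(code):
    """Drop LDW(r) when vAC is already known to equal register r."""
    out = []
    same_as_vac = set()          # registers known to hold the same value as vAC
    for op, arg in code:
        if op == "STW":
            same_as_vac.add(arg)         # after STW(r), vAC == r
        elif op == "LDW":
            if arg in same_as_vac:
                continue                 # vAC already holds this value
            same_as_vac = {arg}          # vAC now equals arg only
        else:
            same_as_vac.clear()          # conservative: forget everything
        out.append((op, arg))
    return out


program = [
    ("STW", "R3"),
    ("LDW", "R3"),    # redundant, removed
    ("ADDW", "R4"),
    ("LDW", "R3"),    # needed again, kept
]
print(elide_redundant_loads(program))
```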
Last edited by lb3361 on 22 Sep 2023, 21:42, edited 3 times in total.
axelb
Posts: 41
Joined: 07 Jan 2021, 06:27

Re: ROM adventures

Post by axelb »

When trying to compile your system.gt1 with glcc 2.1 the linker complains about
glink: loaderasm.s:126: fatal error: name '_BMOV' is not defined

Do you suggest to change the loaderasm.s or to insert

Code: Select all

module_dict['_BMOV'] = _MOVM
into glink.py as it was done previously ?
lb3361
Posts: 360
Joined: 17 Feb 2021, 23:07

Re: ROM adventures

Post by lb3361 »

axelb wrote: 11 Feb 2023, 22:17 When trying to compile your system.gt1 with glcc 2.1 the linker complains about
glink: loaderasm.s:126: fatal error: name '_BMOV' is not defined
Do you suggest to change the loaderasm.s or to insert

Code: Select all

module_dict['_BMOV'] = _MOVM
into glink.py as it was done previously ?
Many thanks!

Turns out I believe one should do both.
  • Adding the compatibility definitions cannot hurt.
  • There are two good reasons to make changes in system.gt1. The first one is that it was trying to replicate an old loader bug which was in fact eliminated from ROMv5a (I just did not notice). No need to do this anymore. The second reason is that system.gt1 compiled for dev7 uses the 16-bit stack, and that one has to reinitialize the stack to its normal page zero location before launching programs that may not expect this. I only realized that when reviewing the SD card loader code :-(
Both changes are in now.

Many thanks again.


p.s.1 -- I do not know anymore which rom the system.gt1 makefile should target. I re-added the rom checking code, meaning that one should not see uncontrolled crashes (just the "smallest.gt1" pattern) when running the wrong version. Should I aim for a system.gt1 that works as widely as possible at the expense of performance? If you try compiling with make ROM=dev7, you'll clearly see how much faster the resulting system.gt1 goes. Any suggestion?

p.s.2 -- I changed the idiom "-rom=vx0 -cpu5" to simply be "-rom=v6--". This is simpler because it identifies the executable as v6 and because this is exactly what it produces: a v6 executable that does not use the v5a instructions CMPHU/CMPHS.
lb3361
Posts: 360
Joined: 17 Feb 2021, 23:07

Re: ROM adventures

Post by lb3361 »

6- Cardboot adjustments

Following axelb's post above, I changed the DEV7ROM Cardboot to first search for a file named SYSTEM7.GT1 and then fall back to SYSTEM.GT1. I also changed the Gigatron OS https://github.com/lb3361/gigatron-os/tree/master/sys1 to generate both versions:
  • SYSTEM7.GT1 is optimized for DEV7ROM but ignored by other ROMs.
  • SYSTEM.GT1 is compiled for the pseudo-rom "-rom=v6--" which works like v6 but avoids the CMPHU/CMPHS opcodes that have been displaced in ROMvX0. It is expected that this version works on a maximum number of machines but at the expense of a bit of performance displaying directories.
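The lookup order can be summarized with the following sketch. It is only a model of the behaviour described above: the function name and the set of file names standing in for the SD card directory are hypothetical, and the actual Cardboot is vCPU code in the ROM.

```python
def pick_system_file(files_on_card, rom_is_dev7):
    """Return the system file Cardboot would load, or None if none is present."""
    if rom_is_dev7:
        candidates = ["SYSTEM7.GT1", "SYSTEM.GT1"]   # DEV7ROM tries the optimized file first
    else:
        candidates = ["SYSTEM.GT1"]                  # other ROMs ignore SYSTEM7.GT1
    for name in candidates:
        if name in files_on_card:
            return name
    return None


card = {"SYSTEM7.GT1", "SYSTEM.GT1", "MSCP.GT1"}
print(pick_system_file(card, rom_is_dev7=True))
print(pick_system_file(card, rom_is_dev7=False))
```

With both files on the card, the same SD card works on every ROM and DEV7ROM machines still get the faster system file.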
Note that the long term plan was to have a system file that resides in bank3 and offers various services (syscalls) to running programs. The beginning of this effort can be seen in https://github.com/lb3361/gigatron-os/t ... ter/oscall. However it quickly appeared that having a more C-friendly vCPU would be necessary. I always thought this would be ROMvX0 at some point, only to be disappointed by the outcome of the ROMv6 thread. vCPU7 certainly fits the bill, but the overall situation is regrettable.
Last edited by lb3361 on 18 Feb 2023, 15:46, edited 1 time in total.