Programming in native assembler

HGMuller · Post by **HGMuller** » 22 May 2018, 05:02

As I plan to write a chess program for the Gigatron, speed is very important to me. I don't want to lose a speed factor to the interpretation of an intermediate language like GCL, so I want to write native code directly and put it in the EPROM.

This of course burdens me with the task of video generation in parallel with the thing I actually want to program. This task can be minimized by making most of the lines in the image black. With squares of 9x9 pixels a chess board measures only 72 pixels vertically. Resorting to 1-out-of-4 rendering while the program is thinking would then require rendering only 72 scan lines out of 520 (including the retrace), so that 86% of the lines can be used for calculation. An alternative would be to stop generating video altogether during calculations, and let the monitor re-sync when the program is done thinking and wants to display its move.
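The 86% figure can be checked with a few lines of Python. This is only a sketch of the arithmetic; the numbers come from the paragraph above and the variable names are made up for illustration.

```python
# Figures taken from the post; the names are illustrative only.
SQUARE_PIXELS = 9        # pixel rows per chess-board square
BOARD_SQUARES = 8        # squares along one side of the board
LINES_PER_FRAME = 520    # scan lines per frame, retrace included (as stated above)

board_pixel_rows = SQUARE_PIXELS * BOARD_SQUARES   # 72
# With 1-out-of-4 rendering each pixel row costs one rendered scan line.
rendered_lines = board_pixel_rows
free_fraction = (LINES_PER_FRAME - rendered_lines) / LINES_PER_FRAME

print(f"{rendered_lines} rendered lines, {free_fraction:.0%} of lines free")
```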

The nasty thing about the video generation is that the program has to count the cycles of its own execution to be able to generate the video sync pulses at the right time. For this the program has to be broken up into code sections of fixed execution time (i.e. without branches, or branching in such a way that all possible paths from beginning to end take an equal number of instructions). For each such code section it has to advance a cycle counter by the corresponding number of cycles. And it will also have to test frequently whether there is enough time left to execute the upcoming code section before the next sync pulse is due.

To facilitate this, I equipped the assembler with a pseudo-instruction SYNC <label>. This instruction will be expanded into the code:

L1:    LD L1 - <label>
       ADD [cycleBudget]
       ST [cycleBudget]
       BLT VideoHandler
       LD L2
L2:

Preceding every code section with such a SYNC instruction, referring to a <label> directly after the code section, deducts the number of cycles required for the code section (plus this testing overhead) from the cycleBudget. When the cycleBudget underflows, the code jumps to the video handler, passing the address at which to resume in the accumulator. (Remember that the instruction after a branch is always executed.)
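To make the bookkeeping concrete, here is a small Python model of what one SYNC does to the cycleBudget. It is a sketch under assumptions, not the assembler itself: it assumes straight-line sections at one cycle per instruction (so that `<label> - L1` equals the cycle count), it ignores 8-bit wraparound, and the starting budget and section lengths are invented for the example.

```python
SYNC_OVERHEAD = 5  # LD, ADD, ST, BLT, plus the LD L2 in the branch delay slot


def sync(cycle_budget, section_cycles):
    """Model one SYNC: charge section plus overhead, report whether BLT is taken."""
    cost = SYNC_OVERHEAD + section_cycles   # this is <label> - L1
    cycle_budget += -cost                   # LD L1 - <label>; ADD [cycleBudget]; ST
    return cycle_budget, cycle_budget < 0   # BLT VideoHandler fires on underflow


budget = 40                    # hypothetical cycles left before the next sync pulse
for section in (10, 12, 8):    # hypothetical section lengths in cycles
    budget, to_video = sync(budget, section)
    print(section, budget, to_video)
```

With these made-up numbers the first two sections fit, and the third SYNC drives the budget negative, which is the point where the real code would branch to the VideoHandler.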

The VideoHandler can then add the duration of the aborted code section back to the cycleBudget, and 'delay away' any remaining budget before generating the video sync pulse. As it got L2 passed to it, it also knows the address of L1 (namely L2 - 5), and can use the single-cycle subroutine trick for executing the LD instruction there. For example:

VideoHandler:
       ST [returnAddress]
       SUB 5
       BRA A
       BRA V1
V1:  SUB [cycleBudget]
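As far as I can read the snippet, the arithmetic at V1 works out as follows. The Python below is only a model of that reading: `delta` stands for the negative constant `L1 - <label>` that the borrowed LD puts in the accumulator, and all numbers are hypothetical. Cycles spent in the test and in the handler itself are not modeled.

```python
def handler_accumulator(stored_budget, delta):
    """Accumulator after 'V1: SUB [cycleBudget]'.

    delta is the negative constant L1 - <label> loaded by the aborted SYNC's LD.
    """
    return delta - stored_budget


old_budget = 8                 # hypothetical budget before the aborted SYNC
delta = -(5 + 8)               # hypothetical 8-cycle section plus 5 cycles of overhead
stored = old_budget + delta    # what the SYNC wrote to cycleBudget (negative)
acc = handler_accumulator(stored, delta)
print(stored, acc)
```

In this model the accumulator ends up holding the negated budget from before the aborted section was charged, so the handler has recovered how many cycles were still available.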

This strategy aims at minimizing the overhead for testing, assuming that the tests will usually fail (i.e. the branch to the VideoHandler is usually not taken), as the code sections mostly are short compared to a scan line. Under such conditions as much code as possible should be moved to the (rarely executed) VideoHandler.

Note that the generated testing code spoils the accumulator contents, and that the VideoHandler is likely to spoil X and Y (as occasionally it will have to render a scan line, in addition to emitting a sync pulse). The programmer should be aware of this, and not assume any register will be the same after a SYNC directive.