I don't understand Ruby much at all, but I am well enough versed in vASM and GCL that I can see your code structure and follow its low-level flow; and so far your code looks great.
- You already seem to have mastered the vASM low-level coding tricks, e.g. using LDI where possible, incrementing the high bytes of zero-page variables, etc.
- I'm a little confuzzled by XORWI: is it a macro that saves/restores vAC and performs the XOR with an immediate 16-bit value? It obviously can't be an extended vCPU instruction or a wrapped SYS function, as your code runs on current real hardware.
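If it is a macro, I'd guess at an expansion along these lines, (tmpA/tmpB here are hypothetical zero-page scratch words, since XORW only accepts a zero-page operand):
Code: Select all
; one plausible XORWI expansion: XOR vAC with a 16-bit immediate
STW   tmpA        ; save vAC, the value being XOR'd
LDWI  $1234       ; load the 16-bit immediate
STW   tmpB        ; XORW needs its operand in zero page
LDW   tmpA        ; restore vAC
XORW  tmpB        ; vAC = vAC ^ $1234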
- I assume your Ruby macros are emitting the DEF and RET instructions appropriately?
- I personally would try to batch SYS function calls to reduce the amount of SYS preamble, which is effectively dead code, e.g.
Code: Select all
;batch non flipped sprites
LDWI SYS_Sprite6_v3_64
STW sysFn
CALL spriteNoFlip0
...
CALL spriteNoFlipN
- Zero-page usage: zero-page RAM is one of the most contested resources on the Gigatron. Your code's global vars, function pointers and stack have to share it with system vars, system constants and system scratch, (e.g. the VBlank temps in ROMv5a and above). With the model you are currently using, everything is awesome until it is not; I generally find that I run out of zero-page RAM before I run out of main RAM, (or before main RAM becomes too fragmented), because of global var and function pointer growth. Letting less frequently used functions avoid zero-page pointers can be a real life-saver, e.g. the LDWI/CALL pattern or CALLI.
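For anyone following along, the two zero-page-friendly call patterns look like this, (scratchW is any zero-page word you can spare; CALLI needs ROMv5a or above):
Code: Select all
; pre-ROMv5a: call a function without a dedicated zero-page pointer
LDWI  myFunc      ; 16-bit address of the function
STW   scratchW    ; CALL takes its target from a zero-page word
CALL  scratchW

; ROMv5a and above: no zero-page word required at all
CALLI myFunc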
- Your code seems to follow the GCL convention of DEF'd functions first and main code later. There are other ways of organising your memory map, (obviously there is no right or wrong in this discussion, only what works), so I am not trying to dissuade you from the methodology you have chosen, just offering alternatives.
In my projects I manage the Gigatron's resources, (RAM size, RAM fragmentation, vCPU cycles and zero-page RAM usage), in roughly the following priority order, (especially as a project gets bigger and starts to approach the boundaries of the default memory map):
Code: Select all
- allocate the largest RAM fragments to the largest contiguous RAM data structures >96 bytes,
  (e.g. arrays, LUT's, etc).
- allocate largest RAM fragments to largest most frequently accessed code blocks, e.g. main loop,
graphics loops, etc.
- allocate offscreen video memory, (96 byte fragments), to initialisation code, functions <=96 bytes
and contiguous data <= 96 bytes, e.g. most functions, small data structures, (strings and small
arrays/LUT's, etc).
- organise code to reduce the number of page jumps required.
- organise code and data to reduce RAM fragmentation.
- write code to save vCPU size and cycles wherever possible, e.g. self-modifying code instead of
  multiple functions.
- dead code elimination, e.g. SYS function preambles, page jumps, etc.
- move function pointers out of zero-page RAM as I require more global vars, relying on LDWI/CALL
  and CALLI to call functions.
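As a sketch of the self-modifying code point above, (blitSrcOp, spriteData and scratchW are hypothetical labels; the routine must live in RAM to be patchable):
Code: Select all
; patch the 16-bit operand of an LDWI inside a blit routine,
; so one routine serves every sprite instead of one routine per sprite
LDWI  blitSrcOp   ; address of the operand bytes inside the routine
STW   scratchW
LDWI  spriteData  ; new source address for the blit
DOKE  scratchW    ; overwrite the operand: the routine now reads spriteData
CALLI blitLoop    ; (or the LDWI/CALL pattern on ROMs before v5a)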
Some final thoughts for expanded/upgraded Gigatrons. Ideally every Gigatron owner would upgrade their ROM and RAM and we could write code without the current default limitations, but this is an unlikely scenario, so this is mostly just spitballing on my part, (although I do already provide some of these features as pragmas in my compiler):
- Multiple ROM versions of the code, (if possible), to take advantage of newer features, e.g. CALLI, (which can be a massive help in reducing dead code, RAM usage and RAM fragmentation). CALLI support would probably require wrapping/macros around your page jumps/function calls, which could be completely non-trivial, as it could cause major changes to your overall memory map, (i.e. code and data relocation).
- A 64K RAM memory model version of the code; the 64K memory map is less constrained than the default 32K model and opens up a wider choice of memory maps, much larger contiguous data regions, and potentially greater code efficiency by not requiring as much dead code as the 32K model.
- Code that uses enhanced ROMs through SYS function extensions; there is massive scope for providing accelerated functionality to the vCPU, GCL and BASIC programming models by embedding expensive vCPU code into native functions. Marcel showed the way with some of his great accelerated functions in ROMv2 and above, (Mode, Sprite, Fill, etc), and this could be expanded even further with SYS functions for expensive arithmetic, (* / mod), generic memcpy, line/circle drawing, etc. My guesstimate is that native SYS functions end up roughly an order of magnitude faster than the equivalent vCPU/GCL code, (even taking the 16-bit and 8-bit programming models into account), and that the overall raw processing speed of native code in a video mode such as Mode 2 is roughly equivalent to an original Acorn Archimedes.
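The calling shape for any such SYS extension would be the usual one, (SYS_MulDiv_vX_nn, the sysArgs usage and the operand nn here are purely illustrative; check the ROM listing and your assembler's convention for the real cycle-count operand):
Code: Select all
; generic SYS extension call: sysFn selects the routine, sysArgs carry parameters
LDWI  SYS_MulDiv_vX_nn
STW   sysFn
LDWI  someOperand
STW   sysArgs0    ; parameters go in via the sysArgs words
SYS   nn          ; operand reflects the function's maximum cycle cost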
When you consider how much work the native code is doing, (bit-banging input, audio and video, and then interpreting vCPU/6502 at the application level on top), it's astonishing what this tiny bit of 70's tech is capable of.