I don't understand Ruby much at all, but I am well enough versed in vASM and GCL that I can see your code structure and follow its low-level flow; and so far your code looks great.
- You already seem to have mastered the vASM low-level coding tricks, e.g. using LDI where possible, incrementing the high bytes of zero-page variables, etc.
- I'm a little confuzzled by XORWI: is it a macro that saves/restores vAC and performs the XOR with an immediate 16-bit value? It obviously can't be an extended vCPU instruction or a wrapped SYS function, as your code runs on current real hardware.
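If it is a macro, I'd guess at an expansion along these lines, (tmpA/tmpB here are hypothetical zero-page scratch words, since XORW only accepts a zero-page operand):
Code: Select all
; one plausible XORWI expansion: XOR vAC with a 16-bit immediate
STW   tmpA        ; save vAC, the value being XOR'd
LDWI  $1234       ; load the 16-bit immediate
STW   tmpB        ; XORW needs its operand in zero page
LDW   tmpA        ; restore vAC
XORW  tmpB        ; vAC = vAC ^ $1234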
- I assume your Ruby macros are emitting the DEF and RET instructions appropriately?
- I personally would try to batch SYS function calls to reduce the amount of SYS preamble, which is effectively dead code, e.g.
Code: Select all
;batch non flipped sprites
LDWI SYS_Sprite6_v3_64
STW sysFn
CALL spriteNoFlip0
...
CALL spriteNoFlipN
- Zero-page usage: zero-page RAM is one of the most contested resources on the Gigatron. Your code's global vars, function pointers and stack have to share it with system vars, system constants and system scratch, (e.g. the VBlank temps in ROMv5a and above). With the model you are currently using, everything is awesome until it is not; I generally find that I run out of zero-page RAM before I run out of main RAM, (or before main RAM becomes too fragmented), because of global var and function pointer growth. Letting less frequently used functions avoid zero-page pointers can be a real life-saver, e.g. the LDWI/CALL pattern or CALLI.
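For anyone following along, the two zero-page-friendly call patterns look like this, (scratchW is any zero-page word you can spare; CALLI needs ROMv5a or above):
Code: Select all
; pre-ROMv5a: call a function without a dedicated zero-page pointer
LDWI  myFunc      ; 16-bit address of the function
STW   scratchW    ; CALL takes its target from a zero-page word
CALL  scratchW

; ROMv5a and above: no zero-page word required at all
CALLI myFunc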
- Your code seems to follow the GCL convention of DEF'd functions first and main code later. There are other ways of organising your memory map, (obviously there is no right or wrong in this discussion, only what works), so I am not trying to dissuade you from the methodology you have chosen, just offering alternatives.
In my projects I manage the Gigatron's resources, (RAM size, RAM fragmentation, vCPU cycles and zero-page RAM usage), in roughly the following priority order, (especially as a project gets bigger and starts to approach the boundaries of the default memory map):
Code: Select all
- allocate the largest RAM fragments to the largest contiguous RAM data structures >96 bytes,
  (e.g. arrays, LUT's, etc).
- allocate largest RAM fragments to largest most frequently accessed code blocks, e.g. main loop,
graphics loops, etc.
- allocate offscreen video memory, (96 byte fragments), to initialisation code, functions <=96 bytes
and contiguous data <= 96 bytes, e.g. most functions, small data structures, (strings and small
arrays/LUT's, etc).
- organise code to reduce the number of page jumps required.
- organise code and data to reduce RAM fragmentation.
- write code to save vCPU size and cycles wherever possible, e.g. self-modifying code instead of
  multiple functions.
- dead code elimination, e.g. SYS function preambles, page jumps, etc.
- move function pointers out of zero-page RAM as I require more global vars, relying on LDWI/CALL
  and CALLI to call functions.
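As a sketch of the self-modifying code point above, (blitSrcOp, spriteData and scratchW are hypothetical labels; the routine must live in RAM to be patchable):
Code: Select all
; patch the 16-bit operand of an LDWI inside a blit routine,
; so one routine serves every sprite instead of one routine per sprite
LDWI  blitSrcOp   ; address of the operand bytes inside the routine
STW   scratchW
LDWI  spriteData  ; new source address for the blit
DOKE  scratchW    ; overwrite the operand: the routine now reads spriteData
CALLI blitLoop    ; (or the LDWI/CALL pattern on ROMs before v5a)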
Some final thoughts for expanded/upgraded Gigatrons. Ideally every Gigatron owner would upgrade their ROM and RAM and we could write code without the current default limitations, but this is an unlikely scenario, so this is mostly just spitballing on my part, (although I do already provide some of these features as pragmas in my compiler):
- Multiple ROM versions of the code, (if possible), to take advantage of newer features, e.g. CALLI, (which can be a massive help in reducing dead code, RAM usage and RAM fragmentation). CALLI support would probably require wrapping/macros around your page jumps/function calls, which could be completely non-trivial, as it could cause major changes to your overall memory map, (i.e. code and data relocation).
- A 64K RAM memory model version of the code; the 64K memory map is less constrained than the default 32K model and opens up a wider choice of memory maps, much larger contiguous data regions, and potentially greater code efficiency by not requiring as much dead code as the 32K model.
- Code that uses enhanced ROMs through SYS function extensions; there is massive scope for providing accelerated functionality to the vCPU, GCL and BASIC programming models by embedding expensive vCPU code into native functions. Marcel showed the way with some of his great accelerated functions in ROMv2 and above, (Mode, Sprite, Fill, etc), and this could be expanded even further with SYS functions for expensive arithmetic, (* / mod), generic memcpy, line/circle drawing, etc. My guesstimate is that native SYS functions end up roughly an order of magnitude faster than the equivalent vCPU/GCL code, (even taking the 16-bit and 8-bit programming models into account), and that the overall raw processing speed of native code in a video mode such as Mode 2 is roughly equivalent to an original Acorn Archimedes.
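The calling shape for any such SYS extension would be the usual one, (SYS_MulDiv_vX_nn, the sysArgs usage and the operand nn here are purely illustrative; check the ROM listing and your assembler's convention for the real cycle-count operand):
Code: Select all
; generic SYS extension call: sysFn selects the routine, sysArgs carry parameters
LDWI  SYS_MulDiv_vX_nn
STW   sysFn
LDWI  someOperand
STW   sysArgs0    ; parameters go in via the sysArgs words
SYS   nn          ; operand reflects the function's maximum cycle cost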
When you consider how much work the native code is doing, (bit-banging input, audio and video, and then interpreting vCPU/6502 at the application level on top), it's astonishing what this tiny bit of 70's tech is capable of.