Which sprite/memory functions to add to ROM?

Posted: 02 Jul 2018, 16:21
by marcelk
I would like to add some sprite and/or memory copy functions to an upcoming (very minor) ROM update for the kit. Any suggestions? In the short term (this week), I'm just looking for SYS functions that are already well-tested and stable, preferably no truly new stuff. I’m a bit at a loss about the status of these in Contrib:
  • SYS_SpriteRow_118: Does 4x4 block copy?
  • SYS_SpriteCopy_118: What’s the difference? Same?
  • SYS_ClearRow32_56: Looks useful. I can add this..
  • SYS_BlinkyBlast_142: Doesn’t sound too generic... But it is cool.. I'll sleep on it
  • SYS_DrawPixel2x2_32: Small but useful
Any new insights I missed? For example, optimise on handling 1D sprites with memcpy? Other mechanisms? I haven’t thought through any of this...

BTW: the video loop has been reordered a bit to enable more retro modes and speed levels. This will make it easier (in ROMs even further away in the future) to assign blitter-type functions to certain scanline types. For now I just use it to get more vCPU cycles, but I see room to use it for other purposes.

Re: Which sprite/memory functions to add to ROM?

Posted: 02 Jul 2018, 16:30
by Cwiiis
I've been a bit busy recently, but I've been following the developments on the forum and github and I'm very impressed with everyone's work :)

Of those listed functions, I think a 4x4 sprite blitter would be very useful - I have a GCL version of this that blits 4x4 sprites from packed sprite data ( ... is.gcl#L31), but I'd certainly take the memory hit for the enhanced speed. Even better would be a 4x4 packed sprite blitter though, as I find memory is at a much higher premium than speed on the default 32k system.

If we could go further though, a generic XxY blit function would be amazing, even if it had some strict limitations (on size, multiples, whatever). Some virtual 'DMA' functions would be very useful.

Re: Which sprite/memory functions to add to ROM?

Posted: 03 Jul 2018, 07:50
by at67
marcelk wrote:
02 Jul 2018, 16:21
  • SYS_SpriteRow_118: Does 4x4 block copy?
  • SYS_SpriteCopy_118: What’s the difference? Same?
  • SYS_ClearRow32_56: Looks useful. I can add this..
  • SYS_BlinkyBlast_142: Doesn’t sound too generic... But it is cool.. I'll sleep on it
  • SYS_DrawPixel2x2_32: Small but useful
- SYS_SpriteRow_118 and SYS_SpriteCopy_118 are the same routine and do a 4x4 copy using an intermediate buffer. This allows you to do the full gamut of sprites, from border erasing to complete background save and restore.

- Modifying this routine to do 8 rows by 2 lines and having it called 4 times per sprite is probably a more efficient way of drawing sprites than doing 4 times 4x4, (should save a few vCPU cycles).

- SYS_ClearRow32_56 lets you clear the screen in around 100ms.

- Unless you want a tonne of Pacman clones, I'm not sure about SYS_BlinkyBlast_142 going into ROM. If you do put it in, you should probably do the other 3 and Mr Pacman as well.
- I would use this routine for a couple of generic shapes, maybe a few different sized filled circles, rectangles, triangles, missiles etc; some basic shapes that can possibly be used to create other shapes.
- This blit routine is amazingly fast and the fact that it contains the sprite data embedded in the blitter itself is extremely cool, if we had even just a few KBytes of scratch RAM for SYS routines like this one, oh what we could do.
- The system font can have an internal blit routine per character, and thus you could print text probably 10-20 times faster than we currently do. Once you are printing text that fast, real time character based BASIC games become a reality.
- It's 10x10 with a full black border, it can be optimised to 8x8, as you don't need to draw the full erase border, just one or 2 lines of the appropriate thickness, depending on how many pixels you are stepping and in what direction.
- DrawPixel2x2 was one of my first experiments. I wouldn't use it in its current form, but an optimised version would be useful for magnified pixels and blocks (i.e. LIFE, etc).

Re: Which sprite/memory functions to add to ROM?

Posted: 04 Jul 2018, 16:58
by marcelk
Thanks for your thoughts. I think we need to think it through a bit longer.

As a timeline indication, we're aiming for a more substantial ROM update by the end of this summer, or after. That should include embedded BASIC, but also the keyboard mappings needed when hooking up a matrix keyboard by Wattsekunde's scheme. If you have an ATTiny85 dongle for PS/2, you don't really need the BASIC to be embedded in the ROM because the dongle can inject it. But if you have a matrix keyboard, BASIC must be embedded, and it must ALSO understand the mapping. That's why ROM BASIC and keyboard mappings are interdependent. My C=16 keyboard just arrived, so I can start playing with it and compare it to my C=64 keys.

This should give time to think through the sprite business as well. We also have to think about compatibility of GT1 files and the meaning of the romType variable. I have drafted some thoughts on that today. I will put them in another thread for review/sharing later.

In the existing functions I see all kinds of variations in dimensions, source layout and source byte meaning.

For pixel source layouts there are three schemes I can think of:
  • Linear fixed-width (Eg: 4x4 sprite stored in 16 consecutive bytes in same page)
  • Linear variable width (something Racer does: offset per row and zero-terminated streams per row. The sprite source is almost as if it is written in an 8-bit loop-less mini-language that gets executed by a dumb processor)
  • Rectangular (4x4 sprite stored over 4 pages). This is useful for saving and restoring a background.
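To make the difference between the first and third layout concrete, here is a minimal Python sketch of where pixel (col, row) of a sprite would live in each. The function names are made up for illustration, and the rectangular case assumes the usual arrangement in which the high address byte selects the row (one 256-byte page per row) and the low byte selects the column. The variable-width layout is left out because its per-row format is specific to Racer.

```python
def linear_address(base, col, row, width=4):
    """Linear fixed-width: rows follow each other in consecutive bytes."""
    return base + row * width + col


def rectangular_address(base, col, row):
    """Rectangular: each row sits one 256-byte page further on."""
    page = ((base >> 8) + row) & 0xFF      # high byte selects the row
    offset = ((base & 0xFF) + col) & 0xFF  # low byte selects the column
    return (page << 8) | offset


# Pixel (col=1, row=2) of a 4x4 sprite in each layout
print(hex(linear_address(0x3000, 1, 2)))       # 0x3009
print(hex(rectangular_address(0x0850, 1, 2)))  # 0xa51
```

The rectangular form matches how a screen region is addressed, which is why a background can be saved and restored with the same addressing on both sides.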
For source byte interpretation there are also some schemes:
  • Byte-sized pixels blind copy
  • Byte-sized pixels with some kind of operation added (e.g. transparency, collision detection, palette color mapping). It quickly becomes slow.
  • 6-bit packed pixels (I'm not a fan)
  • 8 pixels per byte (is very flexible with colors)
  • No source image: 1 variable color (good for clearing blocks)
  • No source image, 1 fixed color (black). We should not concern ourselves with this...
Then there is the size variation. One line of thought is to make SYS functions for several fixed sizes in one dimension, and for each source layout of interest. To handle size variation in the other dimension we can let these SYS functions "self-repeat" with a counter, each time going back to vCPU while setting back vPC, causing a restart of the same SYS function for as long as there is more work (and advancing the drawing position at the same time). After all, much of the typical vCPU/GCL speed penalty is loop handling as much as individual byte copying. Racer has a function, SYS_RacerUpdateVideoX_40, that already does something like that. There is a second advantage, and that is that very long SYS functions (>80 cycles?) waste quite a bit of time (on average) waiting for the next scan line. So there should be some sweet spot for the trade-off between SYS-call-maximum-duration and invocations-per-sprite-segment.
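The self-repeat idea can be sketched with a toy interpreter in Python. This is only an illustration under simplifying assumptions: the `cpu` dictionary, `sys_draw_chunk` and `run` are hypothetical names, a SYS instruction is taken to occupy 2 bytes, and no cycle counting or scanline timing is modelled.

```python
def sys_draw_chunk(cpu):
    """Hypothetical SYS function: handles one chunk of work per invocation."""
    cpu["chunks_left"] -= 1
    cpu["invocations"] += 1
    if cpu["chunks_left"] > 0:
        cpu["vPC"] -= 2  # set vPC back so the same SYS instruction runs again


def run(program, cpu):
    """Toy interpreter: 'program' maps vPC values to SYS handlers."""
    while cpu["vPC"] in program:
        handler = program[cpu["vPC"]]
        handler(cpu)
        cpu["vPC"] += 2  # normal advance past the 2-byte SYS instruction
    return cpu


cpu = run({0x0200: sys_draw_chunk},
          {"vPC": 0x0200, "chunks_left": 6, "invocations": 0})
print(cpu["invocations"], hex(cpu["vPC"]))  # 6 0x202
```

The calling program contains a single SYS instruction and no loop of its own, yet the handler runs six times, each time as a short invocation.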

[Edit: For now, all of this ignores the idea of doing sprite stuff outside the scope of SYS calls through vCPU...]

Re: Which sprite/memory functions to add to ROM?

Posted: 02 Sep 2018, 21:17
by marcelk
I'm proposing this concept:


Code:

# Extension SYS_Sprite4_v3_54

# sysArgs[0:1] Source address (Yx4 pixels (values 0..63) terminated by byte value -Y)
# sysArgs[2:3] Destination address
# sysArgs[4:7] Scratch (used as copy buffer)
This SYS function draws a sprite 4 pixels wide and Y pixels high. The pixel data is read sequentially, from RAM, in horizontal chunks of 4 pixels at a time, and written to the screen through the destination pointer (each chunk below the previous), drawing a 4xY stripe with one invocation. Pixel values should be non-negative. The first negative byte N after a chunk signals the end of the sprite data. So the sprite's height Y is determined by the source data and is therefore flexible. This negative byte value, typically N == -Y, is then used to adjust the destination pointer's high byte, to make it easier to draw sprites wider than 4 pixels: just repeat the SYS call for as many 4-pixel wide stripes as you need. All arguments are already left in place to facilitate this. After one call, the source pointer will point past that source data, effectively
src += Y * 4 + 1
The destination pointer will have been adjusted as
dst += (Y + N) * 256 + 4
(With arithmetic wrapping around on the same memory page)
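As a cross-check of the pointer arithmetic, here is a small Python reference model of what one complete call is described to do. It is a sketch, not the ROM code: `sys_sprite4` and the flat `ram` bytearray are names chosen for illustration, all the work is done in one loop, and timing is ignored.

```python
def sys_sprite4(ram, src, dst):
    """Model one complete call; returns the updated (src, dst) pointers."""
    while True:
        first = ram[src]
        if first >= 0x80:  # negative as a signed byte: this is terminator N
            dst_hi = ((dst >> 8) + first) & 0xFF  # high byte += N (mod 256)
            dst_lo = ((dst & 0xFF) + 4) & 0xFF    # step right to next stripe
            src = (src & 0xFF00) | ((src + 1) & 0xFF)
            return src, (dst_hi << 8) | dst_lo
        for i in range(4):  # copy one 4-pixel chunk, wrapping within the page
            s = (src & 0xFF00) | ((src + i) & 0xFF)
            d = (dst & 0xFF00) | ((dst + i) & 0xFF)
            ram[d] = ram[s]
        src = (src & 0xFF00) | ((src + 4) & 0xFF)
        dst = ((((dst >> 8) + 1) & 0xFF) << 8) | (dst & 0xFF)  # next row


ram = bytearray(65536)
sprite = list(range(1, 13)) + [-3 & 0xFF]  # 4x3 pixels, then N == -Y
ram[0x3000:0x3000 + len(sprite)] = bytes(sprite)
new_src, new_dst = sys_sprite4(ram, 0x3000, 0x0810)
print(hex(new_src), hex(new_dst))  # 0x300d 0x814
```

With Y == 3 and N == -3 the destination ends up on its original row, 4 pixels to the right, so the next call can draw the neighbouring stripe without touching the arguments.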

Y is only limited by source memory, not by CPU cycles. The implementation is such that the SYS function self-repeats, each time drawing the next 4-pixel chunk. It can typically draw 2x4 pixels per scanline this way. So the user program only sees one SYS call (per stripe), but under the hood the work is split in chunks.
Totally untested implementation:

Code:

              0c00 1124  ld   [$24],x     ;Pixel data source address
              0c01 1525  ld   [$25],y
              0c02 0d00  ld   [y,x]       ;Next pixel or stop
              0c03 f410  bge  .sysDpx0
              0c04 de00  st   [y,x++]
              0c05 8127  adda [$27]       ;Adjust dst for convenience
              0c06 c227  st   [$27]
              0c07 0126  ld   [$26]
              0c08 8004  adda $04
              0c09 c226  st   [$26]
              0c0a 0124  ld   [$24]       ;Adjust src for convenience
              0c0b 8001  adda $01
              0c0c c224  st   [$24]
              0c0d 1403  ld   $03,y       ;Normal exit (no self-repeat)
              0c0e e0cb  jmp  y,$cb
              0c0f 00f2  ld   $f2
.sysDpx0:     0c10 c228  st   [$28]       ;Gobble 4 pixels into buffer
              0c11 0d00  ld   [y,x]
              0c12 de00  st   [y,x++]
              0c13 c229  st   [$29]
              0c14 0d00  ld   [y,x]
              0c15 de00  st   [y,x++]
              0c16 c22a  st   [$2a]
              0c17 0d00  ld   [y,x]
              0c18 de00  st   [y,x++]
              0c19 c22b  st   [$2b]
              0c1a 1126  ld   [$26],x     ;Screen memory destination address
              0c1b 1527  ld   [$27],y
              0c1c 0128  ld   [$28]       ;Write 4 pixels
              0c1d de00  st   [y,x++]
              0c1e 0129  ld   [$29]
              0c1f de00  st   [y,x++]
              0c20 012a  ld   [$2a]
              0c21 de00  st   [y,x++]
              0c22 012b  ld   [$2b]
              0c23 de00  st   [y,x++]
              0c24 0124  ld   [$24]       ;src += 4
              0c25 8004  adda $04
              0c26 c224  st   [$24]
              0c27 0127  ld   [$27]       ;dst += 256
              0c28 8001  adda $01
              0c29 c227  st   [$27]
              0c2a 0116  ld   [$16]       ;Self-repeating SYS call
              0c2b a002  suba $02
              0c2c c216  st   [$16]
              0c2d 1403  ld   $03,y
              0c2e e0cb  jmp  y,$cb
              0c2f 00e5  ld   $e5
If this works out, we could easily add variants that mirror the data in X and/or Y direction while reusing the same pixel source data and reducing RAM pressure. This is for vCPU/GCL programs. There will be no bindings with BASIC initially.

Please shoot...

[Eg: One idea is to go for 6-pixel chunks, and put the destination pointer in vAC..]