Using SYS calls (when possible) is simply a way to read or write more pixels in the same amount of time.
But you then have to account for the setup time, that is, the time spent deciding and specifying which pixels to copy.
There should be an opcode NCOPY in ROMvX0 that is slower than SYS_CopyMemory but may have a smaller setup time. I wrote it, but alas I do not remember its peak performance. Looking at the code, it seems to achieve 8 pixels per scanline, two thirds of the peak speed of SYS_CopyMemory. But when the chunks are small, the setup time and the details matter a lot, and NCOPY might come out ahead.
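A back-of-the-envelope model of this trade-off treats each copy as a fixed setup cost plus streaming at peak rate (one pixel is one byte in the Gigatron framebuffer). The 8 bytes/scanline for NCOPY is the estimate above; the 12 bytes/scanline for SYS_CopyMemory is only inferred from the "two thirds" remark, and the setup costs below are made-up placeholders, not measurements. `copy_time` and `break_even` are hypothetical helper names.

```python
from fractions import Fraction

# Peak rates in bytes (pixels) per scanline.
# NCOPY: the author's estimate from reading its code.
# SYS_CopyMemory: inferred from "8 is two thirds of its peak", i.e. 12.
NCOPY_RATE = Fraction(8)
SYS_COPY_RATE = Fraction(12)


def copy_time(n_bytes: int, setup: Fraction, rate: Fraction) -> Fraction:
    """Idealized time in scanlines: fixed setup plus streaming at peak rate."""
    return setup + Fraction(n_bytes) / rate


def break_even(setup_fast: Fraction, rate_fast: Fraction,
               setup_slow: Fraction, rate_slow: Fraction) -> Fraction:
    """Chunk size at which both methods take the same time.

    Assumes setup_fast > setup_slow: below this size the slower method
    with the cheaper setup finishes first.
    """
    return (setup_fast - setup_slow) / (1 / rate_slow - 1 / rate_fast)


# Hypothetical setup costs in scanlines (placeholders, not measurements).
SYS_SETUP = Fraction(3, 2)
NCOPY_SETUP = Fraction(1, 2)

print(break_even(SYS_SETUP, SYS_COPY_RATE, NCOPY_SETUP, NCOPY_RATE))  # 24
for n in (8, 16, 24, 48):
    t_ncopy = copy_time(n, NCOPY_SETUP, NCOPY_RATE)
    t_sys = copy_time(n, SYS_SETUP, SYS_COPY_RATE)
    print(n, float(t_ncopy), float(t_sys))
```

With these placeholder numbers, a one-scanline setup advantage lets NCOPY win for chunks under 24 bytes, since the break-even size is the setup difference times 1/(1/8 - 1/12) = 24. Real costs are not linear for tiny chunks, so this only shows the shape of the argument.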
```python
# pc = 0x23cd, Opcode = 0xcd
# Instruction NCOPY (lb3361): copy n bytes from [vAC] to [vDST]. vAC+=n. vDST+=n
label('NCOPY')
```
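As a reading aid, here is a behavioural model of NCOPY as its header comment describes it. This is a Python sketch, not the ROM code: the forward byte order and the absence of 16-bit wrap-around are assumptions, and `ncopy` is a hypothetical name.

```python
def ncopy(mem: bytearray, vac: int, vdst: int, n: int) -> tuple[int, int]:
    """Model of NCOPY: copy n bytes from [vAC] to [vDST], advance both by n."""
    # Forward byte-by-byte copy (assumed order; matters only if ranges overlap).
    for i in range(n):
        mem[vdst + i] = mem[vac + i]
    # Both pointers end just past the copied block (no wrap-around modelled).
    return vac + n, vdst + n


mem = bytearray(0x10000)              # 64 KiB address space
mem[0x1000:0x1008] = bytes(range(1, 9))
vac, vdst = 0x1000, 0x2000
vac, vdst = ncopy(mem, vac, vdst, 4)  # first chunk
vac, vdst = ncopy(mem, vac, vdst, 4)  # second chunk, no pointer reload
print(mem[0x2000:0x2008] == bytes(range(1, 9)), hex(vac), hex(vdst))
```

Because both pointers advance, consecutive chunks at contiguous addresses need no reloading of vAC and vDST between calls, which is one place where setup time can be saved.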
```python
# Instruction COPYN (35 cf nn)
# * Copy nn bytes from [T3] to [T2].
# * Handles page crossings. Peak rate 10 bytes/scanline.
# * On return, T3 and T2 contain the next addresses.
# * N:Cycles 1:58 2:84 3:110 4:84 5:112 6:138 7:164 8:138
# * Origin: this is an improved version of the copy
#   opcode I wrote for ROMvX0.
oplabel('COPYN_v7')
```
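The cycle counts in the COPYN header show why small chunk sizes deserve attention. The short script below only tabulates the listed figures; nothing beyond that table is assumed, and the helper names are hypothetical.

```python
# Cycle counts copied from the COPYN header comment: bytes copied -> cycles.
COPYN_CYCLES = {1: 58, 2: 84, 3: 110, 4: 84, 5: 112, 6: 138, 7: 164, 8: 138}


def cycles_per_byte(table: dict[int, int]) -> dict[int, float]:
    """Average cost per byte for a single call copying n bytes."""
    return {n: cycles / n for n, cycles in table.items()}


def marginal_cycles(table: dict[int, int]) -> dict[int, int]:
    """Extra cycles for copying n bytes instead of n - 1 in one call."""
    return {n: table[n] - table[n - 1] for n in sorted(table) if n - 1 in table}


avg = cycles_per_byte(COPYN_CYCLES)
for n in sorted(avg):
    print(f"{n} bytes: {COPYN_CYCLES[n]} cycles, {avg[n]:.2f} cycles/byte")
print(marginal_cycles(COPYN_CYCLES))
```

In this table, copying 4 bytes costs the same 84 cycles as copying 2, and 8 bytes costs the same 138 cycles as 6, so the per-byte cost is lowest at N=8 (17.25 cycles/byte). Within the listed range, chunk sizes that are multiples of 4 look favourable, while a single byte pays 58 cycles on its own.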