lb3361 wrote: ↑28 Oct 2021, 14:28 It is probably impossible to split vCPU instructions into 7 cycle subunits. Maybe one could use a FIFO chip (as in the Video Repeater). The Gigatron would fill the FIFO at the beginning of each scanline, and the FIFO would deliver the pixels on time for the VGA screen. That way, all the inter-pixel time is consolidated in a single chunk that can be used to run the vCPU...
That is partly why I'd suggest going with a shadowed ROM-based CU. You could then add a couple more registers (by changing the control matrix in ROM) and keep the state across the pixels. I believe the biggest problem with interleaving code with pixels is the lack of registers. To put things in perspective: 7-9 cycles between pixels sounds generous, but if you need 4+ cycles to save the state and another 4+ to set back up for the port, you either overrun the time constraint or have no time left for actual work. With more registers, such as 2 index register pairs instead of 1, most of that state switching goes away. The port could have a dedicated register during the lines while the other pair does the useful work. During the porches, a 2nd index pair would also be handy for speeding things up, though the main reason for adding it is to make it possible to run more vCPU code between pixels.
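To make the cycle budget concrete, here is a rough back-of-the-envelope sketch. The function name is made up for illustration, and it takes the "4+" figures at their minimum of exactly 4 cycles each. The "two pairs" case assumes the best outcome, where no save or restore is needed at all.

```python
def useful_cycles(gap, save=4, restore=4):
    """Cycles left for real work in one inter-pixel gap.

    gap     -- cycles available between two pixel outputs
    save    -- cycles spent saving vCPU state before the pixel (assumed minimum)
    restore -- cycles spent setting back up for the port (assumed minimum)
    A negative result means the gap is overrun.
    """
    return gap - (save + restore)


if __name__ == "__main__":
    for gap in (7, 8, 9):
        one_pair = useful_cycles(gap)                       # state must be swapped
        two_pairs = useful_cycles(gap, save=0, restore=0)   # best case: nothing to swap
        print(f"gap={gap}: one pair leaves {one_pair}, two pairs leave {two_pairs}")
```

With a single pair, even the 9-cycle gap leaves only one cycle for work, and anything shorter overruns, which is the squeeze described above.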
And yes, a FIFO would help, like you said, by consolidating the pixel output into one burst so that the non-pixel time is consolidated too. I guess the transfer window would then be separate from the sync window, and external circuitry could pull the two together.
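A toy model of the FIFO idea, just to see how the time consolidates. All the numbers are assumptions for illustration (160 pixels per line, one push per CPU cycle during the burst, one pixel drained every 8 cycles), and `simulate_scanline` is a made-up name, not anything from the Gigatron ROM or the Video Repeater.

```python
from collections import deque


def simulate_scanline(pixels=160, pixel_period=8, start_delay=1):
    """Cycle-by-cycle model of one scanline with a FIFO between CPU and VGA.

    The CPU pushes one pixel per cycle at the start of the line (the burst).
    The video side pops one pixel every `pixel_period` cycles, starting
    `start_delay` cycles into the line.
    """
    fifo = deque()
    total_cycles = start_delay + pixels * pixel_period
    pushed = popped = 0
    max_depth = 0
    underrun = False

    for cycle in range(total_cycles):
        # CPU side: keep pushing until the whole line is in the FIFO.
        if pushed < pixels:
            fifo.append(pushed)
            pushed += 1
            max_depth = max(max_depth, len(fifo))

        # Video side: drain one pixel on every pixel tick.
        since_start = cycle - start_delay
        if since_start >= 0 and since_start % pixel_period == 0 and popped < pixels:
            if fifo:
                fifo.popleft()
            else:
                underrun = True  # video wanted a pixel that was not there yet
            popped += 1

    return {
        "underrun": underrun,
        "max_depth": max_depth,
        # Everything after the burst is one uninterrupted chunk for the vCPU.
        "contiguous_free_cycles": total_cycles - pixels,
    }


if __name__ == "__main__":
    print(simulate_scanline())
```

In this model the free time comes out as a single block after the burst, but the FIFO has to hold most of a line at its peak, so depth is something to check when picking a part.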
Also, the snooping video controller idea could help. If the controller could just monitor the bus to get the data, one wouldn't need bit-banging at all. Though I suspect that the faster the CPU clock, the harder it would be to build a snooping controller that can keep up. If it could keep everything in the FPGA's internal block RAM (BRAM), that would be easier than dealing with an SRAM external to it (latency and number of ports).
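As a behavioral sketch of the snooping idea (not HDL, and not an existing design), the controller only needs to watch write cycles and mirror the ones that land in the video window. The class name, the window base and the window size below are hypothetical placeholders.

```python
class SnoopingShadow:
    """Mirrors CPU writes that fall inside a video window into a private copy."""

    def __init__(self, base, size):
        self.base = base                # first address of the snooped window (assumed)
        self.size = size                # window length in bytes (assumed)
        self.shadow = bytearray(size)   # stands in for the FPGA's internal BRAM

    def observe(self, addr, data, write):
        """Call once per bus cycle; returns True if the write was captured."""
        if write and self.base <= addr < self.base + self.size:
            self.shadow[addr - self.base] = data & 0xFF
            return True
        return False  # reads and out-of-window writes are ignored


if __name__ == "__main__":
    snoop = SnoopingShadow(base=0x0800, size=0x0100)
    snoop.observe(0x0805, 0x3F, write=True)   # captured into the shadow copy
    snoop.observe(0x0805, 0x00, write=False)  # a read, ignored
    snoop.observe(0x0100, 0x55, write=True)   # outside the window, ignored
    print(hex(snoop.shadow[5]))
```

The video side then reads pixels from the shadow copy at its own pace, so the CPU never has to bit-bang them. In real hardware, `observe` would have to complete within one CPU bus cycle, which is where the clock-speed concern comes in.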