10MHz, 12.5MHz and Beyond!

Sugarplum
Posts: 43
Joined: 30 Sep 2020, 22:19

Re: 10MHz, 12.5MHz and Beyond!

Post by Sugarplum »

lb3361 wrote: 28 Oct 2021, 14:28 It is probably impossible to split vCPU instructions into 7 cycle subunits. Maybe one could use a FIFO chip (as in the Video Repeater). The Gigatron would fill the FIFO at the beginning of each scanline, and the FIFO would deliver the pixels on time for the VGA screen. That way, all the inter-pixel time is consolidated in a single chunk that can be used to run the vCPU...

That is partly why I'd suggest going with a shadowed ROM-based CU. You could then add a couple more registers (by changing the control matrix in ROM) and keep the state across the pixels. I believe the biggest problem with interleaving code with pixels is the lack of registers. To put things in perspective: 7-9 cycles between pixels sounds generous, but if it takes 4+ cycles to save the state and 4+ more to set back up for the port, you either overrun the time constraint or have no time left for actual work. With more registers, such as 2 index register pairs instead of 1, most of that state swapping goes away. The port could have a dedicated register during the scanlines while the other pair does the useful work. A 2nd index pair would also be handy during the porches to speed things up, though the main reason for adding it is to make it possible to run more vCPU code between pixels.
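To put rough numbers on that budget, here is a small Python sketch. The 4-cycle save and restore costs are just the lower bounds from the "4+" figures above, and the function name is made up for illustration.

```python
def usable_cycles(gap, save=4, restore=4, dedicated_port_regs=False):
    """Cycles left for real work in one inter-pixel gap.

    gap: free cycles between two pixel outputs.
    save/restore: cycles spent swapping state (assumed 4 each, the
    lower bound mentioned in the post).
    dedicated_port_regs: True if extra registers are reserved for the
    video port, so no state swap is needed.
    """
    overhead = 0 if dedicated_port_regs else save + restore
    return max(gap - overhead, 0)


for gap in (7, 8, 9):
    without_extra = usable_cycles(gap)
    with_extra = usable_cycles(gap, dedicated_port_regs=True)
    print(f"gap={gap}: {without_extra} usable now, {with_extra} with a dedicated port register")
```

Even at the optimistic end (a 9-cycle gap and exactly 4+4 cycles of overhead), only one cycle is left per pixel without the extra registers.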

And yes, as you said, a FIFO would help: consolidating the pixel output into one burst also consolidates the non-pixel time. I guess the transfer window would be separate from the sync window, and external circuitry could pull the two together.
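A toy model of the FIFO idea, as a rough sketch only. It assumes 160 visible pixels and 200 pixel-times per scanline (the stock timing as far as I know), a CPU that can push one pixel per cycle, and 8 CPU cycles per pixel (roughly a 50 MHz clock). It ignores sync timing entirely, and all names are made up.

```python
from collections import deque


def simulate_scanline(pixels=160, line_pixel_times=200, cycles_per_pixel=8):
    """Burst-fill a FIFO at the start of the line, drain it at pixel rate.

    Returns (free_cpu_cycles, max_fifo_depth, pixels_left_in_fifo).
    """
    fifo = deque()
    written = 0
    free_cycles = 0
    max_depth = 0
    for cycle in range(line_pixel_times * cycles_per_pixel):
        # CPU side: write one pixel per cycle until the whole line is queued,
        # after which every remaining cycle is free for other work.
        if written < pixels:
            fifo.append(written)
            written += 1
        else:
            free_cycles += 1
        max_depth = max(max_depth, len(fifo))
        # Video side: take one pixel at the end of each pixel period.
        if cycle % cycles_per_pixel == cycles_per_pixel - 1 and fifo:
            fifo.popleft()
    return free_cycles, max_depth, len(fifo)


free, depth, left = simulate_scanline()
print(f"free cycles in one chunk: {free}, deepest FIFO fill: {depth}, left over: {left}")
```

The point is only that the free cycles come as one contiguous block after the burst, instead of a few cycles per pixel.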

Also, the snooping video controller idea could help. If it could just monitor the bus to get the data, no bit-banging would be needed at all. Though I suspect that the faster the CPU clock, the harder it would be to build a snooping controller that can keep up. If it could keep everything in the FPGA's on-chip memory (BRAM), that would be easier than dealing with an external SRAM (latency and number of ports).

Post by Sugarplum »

While I don't see 100-125 MHz as practical on a hardwired Gigatron, I think I know how to get a little closer. With a shadowed LUT control unit, you could try to simplify the AND/OR of an immediate with RAM data that is then sent to OUT. A 2-bit mini "ALU" might be an option, since to create syncs you only need to toggle bits 6-7. Or change the instruction to a "set" instruction. Either way, the low memory bits and the high immediate bits are the only ones touched for 2 specific instructions. Bit-banging syncs would then stay as easy as it is now, even if the instruction set changes to disallow chaining the memory and the ALU.
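As a sketch of what such a "set" instruction might compute (a hypothetical op, not something in the current instruction set), assuming bits 0-5 of OUT carry the colour and bits 6-7 the syncs:

```python
SYNC_MASK = 0xC0   # bits 6-7: the sync bits on OUT
PIXEL_MASK = 0x3F  # bits 0-5: colour bits, taken from memory


def out_with_sync(mem_byte, imm):
    """Hypothetical 'set sync' op: low 6 bits from RAM, top 2 from the immediate.

    No full 8-bit ALU pass is needed; the two fields never overlap,
    so the result is a plain bit merge.
    """
    return (mem_byte & PIXEL_MASK) | (imm & SYNC_MASK)


# Pixel 0x15 with both sync bits set in the immediate.
print(hex(out_with_sync(0x15, 0xC0)))
```

Because the two fields are disjoint, the merge is just wiring plus a 2-bit select, which is the latency saving being suggested.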

Another consideration could be splitting the AU and the LU to reduce latency. Some 2 ns gates could then do the logic and add less latency for memory+logic ops. But you'd still have the AU+memory latency, so a split ALU might help only minimally here. Still, a shadowed LUT CU would make things more flexible. Its role could then become more complex, letting you do things like adding more registers or splitting the ALU into an AU and an LU without worrying about latency.

Post by Sugarplum »

The more I think about it, a 4-stage Gigatron-compatible CPU could allow for a theoretical maximum of 100-120 MHz. Looking at the instructions, the only write operations on RAM are simple stores, while reads can have ALU operations done on them. So stage 1 would be Fetch, stage 2 Decode, stage 3 memory access, and stage 4 ALU. On reads, stage 3 would get the data and stage 4 would compute with it if needed. Writes can be done in stage 3 while stage 4 sits idle.
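Here is a minimal sketch of how instructions would flow through those 4 stages. It is an idealized model with no stalls, hazards, or branch handling, and the mnemonics are only illustrative.

```python
STAGES = ("Fetch", "Decode", "Mem", "ALU")


def pipeline_rows(program):
    """For each cycle, list which instruction occupies each stage (None = empty)."""
    n = len(program)
    rows = []
    for cycle in range(n + len(STAGES) - 1):
        # The instruction in stage s entered the pipeline s cycles ago.
        rows.append([program[cycle - s] if 0 <= cycle - s < n else None
                     for s in range(len(STAGES))])
    return rows


program = ["ld [x]", "adda [y]", "st [z]"]
for cycle, row in enumerate(pipeline_rows(program)):
    cells = [f"{STAGES[s]}={instr or '-'}" for s, instr in enumerate(row)]
    print(cycle, "  ".join(cells))
```

When the store reaches the ALU stage in the last cycle it has nothing to compute, which is the idle slot mentioned above.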

This approach may open possibilities. For instance, if there are new registers (building on the LUT-based CU), some instructions could do simultaneous memory ops and register ops. You could load (or store) 1 thing from memory and work on something else in the otherwise unused ALU slot in the next cycle.

Somewhat unrelated, but something that can be tied in, is the secondary execution unit idea I've floated before. It would be used with opcodes that take no operand, in order to make the ROM more compact. Care would need to be taken not to slow the critical path and not to compete for resources. I'm not sure whether it should have RAM access. Alternatively, it could have its own RAM, though I'm not sure there are enough operand-free instructions to justify that, let alone a 2nd control unit. And of course, if it does have RAM access, it could cut into the critical path on both sides, perhaps because of an additional multiplexer. If it existed, a different ROM would be needed, and to get more use out of it, some of the existing register-only instructions could replace the operand versions that do the same thing. For instance, instead of a load of 0, XOR Ac, Ac could be used to free up the operand slot for another instruction.
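The last substitution could be done mechanically when building such a ROM. A minimal sketch, assuming a made-up representation of the program as (mnemonic, operand) tuples:

```python
def free_operand_slots(program):
    """Replace 'load immediate 0' with the operand-free 'xor Ac with Ac'.

    program: list of (mnemonic, operand) tuples; this format is
    hypothetical and only used for this illustration.
    """
    rewritten = []
    for mnemonic, operand in program:
        if mnemonic == "ld" and operand == 0:
            # Ac ^ Ac is always 0, so the operand byte is no longer needed.
            rewritten.append(("xora", "ac"))
        else:
            rewritten.append((mnemonic, operand))
    return rewritten


print(free_operand_slots([("ld", 0), ("ld", 5), ("adda", 1)]))
```

Only the first instruction changes; the other two really need their operands.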

With the 100+ MHz idea, there would be 15 cycles between pixels. On top of that, more registers could allow vCPU to run in those gaps. And if the secondary execution unit works, that should boost power too. And of course, the bus-snooping video coprocessor could be used. In that case, Out could be removed entirely if the new controller handles sound and lights too and if that instruction space is needed. Now, going that fast, the coprocessor would need a well-pipelined input memory unit so that it doesn't affect the critical path of the main CPU or lose data by failing to keep up.
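For reference, the 15-cycle figure follows if the pixel rate stays at the stock 6.25 MHz (an assumption here, along with the function name):

```python
import math

PIXEL_RATE_MHZ = 6.25  # assumption: pixel rate stays at the stock clock rate


def cycles_between_pixels(cpu_mhz):
    """CPU cycles left between two pixel outputs, one cycle being the OUT itself."""
    return math.floor(cpu_mhz / PIXEL_RATE_MHZ) - 1


for mhz in (6.25, 12.5, 50, 62.5, 100):
    print(f"{mhz} MHz: {cycles_between_pixels(mhz)} cycles between pixels")
```

The same arithmetic puts the 7-9 cycle gaps discussed earlier at roughly 50 to 62.5 MHz.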