10MHz, 12.5MHz and Beyond!

Sugarplum
Posts: 93
Joined: 30 Sep 2020, 22:19

Re: 10MHz, 12.5MHz and Beyond!

Post by Sugarplum »

lb3361 wrote: 28 Oct 2021, 14:28
It is probably impossible to split vCPU instructions into 7-cycle subunits. Maybe one could use a FIFO chip (as in the Video Repeater). The Gigatron would fill the FIFO at the beginning of each scanline, and the FIFO would deliver the pixels on time for the VGA screen. That way, all the inter-pixel time is consolidated in a single chunk that can be used to run the vCPU...

That is partly why I'd suggest going with a shadowed ROM-based CU: you could then add a couple more registers (by changing the control matrix in ROM) and keep state across the pixel intervals. I believe the biggest problem with interleaving code with pixels is a shortage of registers. To put things into perspective, 7-9 cycles between pixels would be generous, but if you need 4+ cycles to save state and another 4+ to set the port back up, you either overrun the time constraint or have no time left for actual work. With more registers, such as 2 index register pairs instead of 1, most of that state saving would go away. One pair could stay dedicated to the port during visible lines while the other does the useful work. During the porches, a 2nd index pair would also be handy for speeding things up, though the main reason for adding it would be to make it possible to run more vCPU code between pixels.
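A rough budget makes this squeeze concrete. The sketch below is hypothetical and assumes the pixel rate stays at the stock 6.25 MHz, so a CPU running at N times that clock has N - 1 spare cycles between pixel stores; the 4+4 save/restore cost is the figure from the paragraph above. The function names are made up for illustration.

```python
# Hypothetical cycle-budget model for interleaving work between pixels.
# Assumption: pixels still leave at the stock Gigatron rate of 6.25 MHz and
# one CPU cycle per pixel is spent on the pixel store itself.
PIXEL_CLOCK_MHZ = 6.25


def free_cycles_per_pixel(cpu_clock_mhz: float) -> int:
    """CPU cycles available between two consecutive pixel stores."""
    cycles_per_pixel = round(cpu_clock_mhz / PIXEL_CLOCK_MHZ)
    return cycles_per_pixel - 1  # minus the cycle that outputs the pixel


def useful_cycles(cpu_clock_mhz: float, save: int, restore: int) -> int:
    """Cycles left for real work after saving state and restoring the port
    setup. A negative value means the next pixel would be late."""
    return free_cycles_per_pixel(cpu_clock_mhz) - save - restore


for mhz in (50.0, 62.5, 100.0):
    print(f"{mhz:>5} MHz: gap={free_cycles_per_pixel(mhz)}, "
          f"after 4+4 save/restore={useful_cycles(mhz, 4, 4)}")
```

Under these assumptions, 50 MHz gives a 7-cycle gap that the 4+4 overhead overruns by one cycle, while 62.5 MHz gives 9 cycles and leaves just one. Extra registers shrink the save/restore terms rather than widening the gap.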

And yes, a FIFO would help, as you said, by consolidating the pixel-output time into one block, just as the non-pixel (blanking) time already is. I guess the transfer window would be separate from the sync window, and external circuitry could bring the two together.
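A toy model shows what the FIFO buys. It assumes the stock line timing of 200 pixel-clock periods per scanline with 160 visible pixels, a CPU running k times faster that bursts one pixel per cycle into the FIFO at the start of each line, and a video side that drains one pixel per pixel clock. Sync generation and other housekeeping are ignored, so the leftover figure is an upper bound; `simulate_line` is a hypothetical helper.

```python
from collections import deque
from typing import Tuple

# Assumed stock Gigatron line timing, in pixel-clock (6.25 MHz) periods.
LINE_PIXEL_CLOCKS = 200  # full scanline
VISIBLE_PIXELS = 160     # active pixels per line


def simulate_line(k: int) -> Tuple[int, int]:
    """Model one scanline with a FIFO between the CPU and the video output.

    k is the number of CPU cycles per pixel clock. The CPU pushes one pixel
    into the FIFO per cycle at the start of the line; the video side pops one
    pixel every k CPU cycles. Returns (peak FIFO occupancy, CPU cycles left
    over in the line), ignoring sync generation and other overhead.
    """
    fifo = deque()
    pushed = popped = peak = 0
    total = LINE_PIXEL_CLOCKS * k
    for cycle in range(total):
        if pushed < VISIBLE_PIXELS:  # CPU fill burst, one pixel per cycle
            fifo.append(pushed)
            pushed += 1
        peak = max(peak, len(fifo))
        if popped < VISIBLE_PIXELS and cycle % k == 0:  # drain at pixel rate
            if not fifo:
                raise RuntimeError(f"FIFO underrun at cycle {cycle}")
            fifo.popleft()
            popped += 1
    return peak, total - VISIBLE_PIXELS


print(simulate_line(1), simulate_line(8))
```

At k = 1 (stock speed) the model leaves 40 cycles per line, the horizontal blanking time. At k = 8 (50 MHz) it leaves 1440 contiguous cycles, but the burst fill needs a FIFO holding up to 140 bytes.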

Also, the snooping video controller idea could help. If the controller could simply monitor the bus to pick up the data, there would be no need for bit-banging at all. I suspect, though, that the faster the CPU clock, the harder it would be to build a snooping controller that can keep up. If it could keep everything in the FPGA's internal block RAM (BRAM), that would be easier than dealing with an SRAM external to it (latency and number of ports).
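A behavioral sketch of the snooping idea, purely hypothetical: the controller never drives the bus, it only watches each CPU cycle and mirrors writes that land in video memory into its own copy. It assumes the stock layout of one 256-byte page per row starting at $0800 with 160 visible pixels, and ignores any row remapping the ROM might do. The class and method names are invented for illustration.

```python
# Assumed stock layout: 120 rows, one 256-byte page per row from $0800,
# 160 visible pixels per row.
VIDEO_BASE = 0x0800
ROWS, ROW_STRIDE, WIDTH = 120, 256, 160


class SnoopingVideoController:
    """Hypothetical bus-snooping video controller.

    Each CPU cycle it sees the address, data and write-enable lines and
    mirrors writes that fall in the video region into its own copy (on an
    FPGA this copy would sit in block RAM). Scan-out reads only the copy,
    so the CPU no longer has to bit-bang pixels.
    """

    def __init__(self) -> None:
        self.shadow = bytearray(ROWS * ROW_STRIDE)

    def observe(self, addr: int, data: int, write_enable: bool) -> None:
        """Called once per CPU cycle with the current bus state."""
        offset = addr - VIDEO_BASE
        if write_enable and 0 <= offset < len(self.shadow):
            self.shadow[offset] = data & 0xFF

    def scanline(self, row: int) -> bytes:
        """Pixels for one row, read from the shadow copy only."""
        start = row * ROW_STRIDE
        return bytes(self.shadow[start:start + WIDTH])


ctrl = SnoopingVideoController()
ctrl.observe(0x0800, 0x3F, True)   # CPU writes a white pixel at row 0, x 0
ctrl.observe(0x0801, 0x15, False)  # a read cycle is ignored
ctrl.observe(0x0905, 0x0C, True)   # row 1, x 5
ctrl.observe(0x0100, 0xFF, True)   # outside video memory: ignored
print(ctrl.scanline(0)[:2].hex(), ctrl.scanline(1)[5])
```

The hard part in hardware is the one raised above: `observe` has to keep up with every bus cycle, which is easy in a model and much harder at 100 MHz.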
Sugarplum
Posts: 93
Joined: 30 Sep 2020, 22:19

Re: 10MHz, 12.5MHz and Beyond!

Post by Sugarplum »

While I don't see 100-125 MHz as practical on a hardwired Gigatron, I think I know how to get a little closer. With a shadowed LUT control unit, you could try to simplify the instructions that AND/OR an immediate with a RAM value and send the result to OUT. A 2-bit mini "ALU" might be an option, since creating syncs only requires toggling bits 6-7. Alternatively, the instruction could become a "set" instruction. Either way, only the low bits from memory and the high bits from the immediate are involved, and only for 2 specific instructions. Bit-banging the syncs would thus stay as easy as it is now, even if the instruction set changes to disallow chaining the memory access and the ALU.
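The "set" variant can be compared with an OR-style merge in a few lines. This is a sketch, assuming the stock OUT layout (bits 0-5 are 2-bit R, G, B; bit 6 is hsync; bit 7 is vsync); `or_out` and `set_out` are hypothetical names, not real instructions.

```python
# Assumed stock OUT register layout: bits 0-5 are 2-bit R, G, B;
# bit 6 is horizontal sync, bit 7 is vertical sync.
PIXEL_MASK = 0x3F
SYNC_MASK = 0xC0


def or_out(ram_byte: int, sync_bits: int) -> int:
    """OR-style merge: a full 8-bit logic operation through the ALU."""
    return (ram_byte | sync_bits) & 0xFF


def set_out(ram_byte: int, sync_bits: int) -> int:
    """Hypothetical 'set' instruction: bits 0-5 pass straight from RAM and
    bits 6-7 come from the immediate, so only a 2-bit select is needed."""
    return (ram_byte & PIXEL_MASK) | (sync_bits & SYNC_MASK)


# For normal pixel bytes (bits 6-7 clear) both forms give the same OUT value.
same = all(or_out(p, s) == set_out(p, s)
           for p in range(64) for s in (0x00, 0x40, 0x80, 0xC0))
print(same, hex(or_out(0xFF, 0x40)), hex(set_out(0xFF, 0x40)))
```

The two only differ when a pixel byte has its top bits set, where the set form keeps the sync bits intact. No full ALU path is involved, which is the point of the 2-bit mini "ALU".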

Another option could be splitting the arithmetic unit (AU) from the logic unit (LU) to reduce latency. Some 2 ns gates could then do the logic and add less latency to memory+logic operations. You'd still have the AU+memory latency, though, so a split ALU might help only minimally here. Still, a shadowed LUT CU would make things more flexible: you could give it a more complex role, such as adding more registers or splitting the ALU into the AU and LU, without worrying about its latency.
Sugarplum
Posts: 93
Joined: 30 Sep 2020, 22:19

Re: 10MHz, 12.5MHz and Beyond!

Post by Sugarplum »

The more I think about it, a 4-stage Gigatron-compatible CPU could allow a theoretical maximum of 100-120 MHz. Looking at the instruction set, the only RAM write operations are simple stores, while reads can have ALU operations applied to them. So stage 1 would be Fetch, stage 2 Decode, stage 3 memory access, and stage 4 ALU. On reads, stage 3 would fetch the data and stage 4 would compute with it if needed. Writes can be done in stage 3 while stage 4 stays idle.
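The stage split can be visualized with a small scheduler. This is a sketch of a hypothetical in-order pipeline with one instruction issued per cycle and no stalls; the `Op` records and the Gigatron-style mnemonics are only labels. It makes the idle slots visible: the ALU slot after a plain store and the memory slot of a register-only op.

```python
from typing import Dict, List, NamedTuple, Optional

STAGES = ("FETCH", "DECODE", "MEM", "ALU")


class Op(NamedTuple):
    name: str
    mem: Optional[str]  # "read", "write", or None for register-only ops
    alu: bool           # True if stage 4 does work for this instruction


def schedule(program: List[Op]) -> List[Dict[str, str]]:
    """Cycle-by-cycle stage occupancy of a hypothetical in-order 4-stage
    pipeline: one instruction enters FETCH per cycle and nothing stalls."""
    table = []
    for cycle in range(len(program) + len(STAGES) - 1):
        row = {}
        for depth, stage in enumerate(STAGES):
            i = cycle - depth  # instruction that entered FETCH `depth` cycles ago
            if not 0 <= i < len(program):
                row[stage] = "-"
            elif stage == "MEM":
                op = program[i]
                row[stage] = f"{op.name}:{op.mem}" if op.mem else "idle"
            elif stage == "ALU":
                row[stage] = program[i].name if program[i].alu else "idle"
            else:
                row[stage] = program[i].name
        table.append(row)
    return table


prog = [
    Op("ld [x]", "read", True),      # RAM read in stage 3, result via stage 4
    Op("adda [y,x]", "read", True),  # RAM read, then add in stage 4
    Op("st [x]", "write", False),    # plain store: stage 3 only, stage 4 idle
    Op("xora ac", None, True),       # register-only: stage 3 idle
]
for cycle, row in enumerate(schedule(prog)):
    print(cycle, "  ".join(f"{s}={row[s]}" for s in STAGES))
```

N instructions take N + 3 cycles, and every store leaves an unused ALU slot behind it. That is the slot the dual memory/register instructions below would fill.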

This approach may open up possibilities. For instance, with new registers (building on the LUT-based CU), some instructions could do memory and register operations simultaneously: you could load (or store) one thing from memory and work on something else in the otherwise unused ALU slot in the next cycle.

Somewhat unrelated, but something that could be tied in, is the secondary execution unit idea I've floated before. It would be used with opcodes that take no operand, making use of their unused operand slot to make the ROM more compact. Care would be needed to avoid slowing the critical path and to avoid competing for resources. I'm not sure whether it should have RAM access. Alternatively, it could have its own RAM, though I'm not sure there are enough operand-free instructions to justify that, let alone a 2nd control unit. And of course, if it does have RAM access, it could lengthen the critical path on both sides, likely because of an additional multiplexer. If it existed, a different ROM would be needed, and to make more use of it, some of the existing register-only instructions could replace operand versions that do the same thing. For instance, instead of a load of the constant 0, XOR AC, AC could be used, freeing up the operand slot for an instruction.
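To make the operand-slot idea concrete, here is a hypothetical decode rule. It assumes the stock opcode field layout (operation in bits 7-5, addressing mode in bits 4-2, bus source in bits 1-0, with xora = 3, st = 6, and bus AC = 2) and treats an instruction as operand-free when it is neither a store nor a jump and takes its data from AC or IN. The `split_word` routing to a secondary unit is an assumption for illustration, not an existing design.

```python
# Assumed stock Gigatron opcode fields: operation in bits 7-5,
# addressing mode in bits 4-2, bus source in bits 1-0.
OP_LD, OP_XORA, OP_ST = 0, 3, 6
BUS_D, BUS_RAM, BUS_AC, BUS_IN = 0, 1, 2, 3


def encode(op: int, mode: int, bus: int) -> int:
    return (op << 5) | (mode << 2) | bus


def operand_is_free(opcode: int) -> bool:
    """True if the instruction never reads its operand byte: a load/logic/
    arithmetic op (not a store or jump) whose source is AC or IN, so the
    operand is neither immediate data nor a RAM address."""
    op, bus = opcode >> 5, opcode & 0b11
    return op < OP_ST and bus in (BUS_AC, BUS_IN)


def split_word(opcode: int, operand: int):
    """Hypothetical decode for a secondary execution unit: when the primary
    instruction ignores its operand, that byte is handed to the secondary
    unit as its own opcode. Returns (primary, operand, secondary)."""
    if operand_is_free(opcode):
        return opcode, None, operand
    return opcode, operand, None


LD_ZERO = (encode(OP_LD, 0, BUS_D), 0x00)  # ld $00: the operand IS the data
XOR_AC_AC = encode(OP_XORA, 0, BUS_AC)     # xora ac: AC ^ AC is always 0

print(split_word(*LD_ZERO), split_word(XOR_AC_AC, 0x5A))
```

Both forms leave AC at 0, since x ^ x == 0 for every byte, but only the XOR form frees its second byte for the secondary unit.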

With the 100+ MHz idea, there would be 15 cycles between pixels, and more registers could allow the vCPU to run during them. If the secondary execution unit works out, that should add more power too. And of course, the bus-snooping video coprocessor could be used. In that case, OUT could be removed entirely if the new controller also handles sound and the LEDs, and if that instruction space is needed. At that speed, though, the coprocessor would need a deeply pipelined input stage so that it neither affects the main CPU's critical path nor loses data by failing to keep up.
Sugarplum
Posts: 93
Joined: 30 Sep 2020, 22:19

Re: 10MHz, 12.5MHz and Beyond!

Post by Sugarplum »

I think I know some mods for a near-stock Gigatron. The more I've thought about it, the more I wonder: what would happen if we got rid of the control unit, and combined that with the carry-skip adder and all the performance mods that have already been done?

The control unit outputs 19 signals. What if you removed the entire CU and the IR, replaced them with enough ROMs and registers, and then wrote software to generate the new ROMs from the old ROM? With the control signals themselves stored as the opcodes and held in registers, decoding becomes an inherent part of the fetch, which removes the control unit's bottleneck. The registers would preserve the delay slot.
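The ROM-conversion step could look something like the hypothetical translator below, which reads the old ROM as (opcode, operand) pairs and writes one wide word per instruction. The field split (operation in bits 7-5, mode in bits 4-2, bus in bits 1-0) is assumed to be the stock encoding, but the one-hot control layout is a toy stand-in for the real 19 control signals; a real converter would substitute the actual decoder equations.

```python
from typing import Iterable, List, Tuple

# Toy "pre-decoded" layout, NOT the real 19 control signals: the stock
# opcode fields are expanded into one-hot groups so that no decoder is
# needed after fetch. Substitute the actual control equations for real use.
OP_BITS, MODE_BITS, BUS_BITS = 8, 8, 4
CONTROL_BITS = OP_BITS + MODE_BITS + BUS_BITS  # 20 in this toy layout


def predecode(opcode: int, operand: int) -> int:
    """Turn one stock instruction (opcode byte + operand byte) into a wide
    control word: [bus one-hot | mode one-hot | op one-hot | operand]."""
    op, mode, bus = opcode >> 5, (opcode >> 2) & 0b111, opcode & 0b11
    controls = (1 << op) | (1 << (OP_BITS + mode)) | (1 << (OP_BITS + MODE_BITS + bus))
    return (controls << 8) | (operand & 0xFF)


def translate_rom(old_rom: Iterable[Tuple[int, int]]) -> List[int]:
    """Generate the new, wider ROM image word by word from the old one."""
    return [predecode(opcode, operand) for opcode, operand in old_rom]


new_rom = translate_rom([(0x00, 0x00), (0x62, 0x5A)])  # ld $00; xora ac
print([f"{w:07x}" for w in new_rom])
```

Because each new word maps one-to-one onto an old word at the same address, branch targets and the delay slot keep their positions.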

Such a strategy would not only remove the control unit's delay (or most of it) by folding it into the 70 ns of the first stage, it could also allow new instructions in the wider control matrix. You could get more modes out of the existing chips, since you could create opcodes from combinations of existing signals that the current decoder never produces, without rewiring anything beyond building this design. And if you added more chips, you could easily add several more registers, such as a 2nd accumulator and another set of index registers.

With the above, you'd need 27 bits: 19 control lines plus 8 data lines. That got me thinking: what do you do with the other 5 bits? They could be used for more registers as suggested above, or even as more data. If you used one of them to put the Y register on a 2nd bus, you could have 12-bit operands. That would give you 4K of immediate addressing, or let you store 2 pixels per address (with a more efficient unpacker).
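The bit budget and the 2-pixels-per-address packing can be checked with a little arithmetic. This sketch assumes the 5 spare bits come from a 32-bit-wide ROM word (for example, four 8-bit ROMs side by side). The pixel order inside the 12-bit value is an arbitrary choice for illustration.

```python
ROM_WIDTH = 32                  # assumed: four 8-bit ROMs side by side
CONTROL_LINES, DATA_LINES = 19, 8
SPARE_BITS = ROM_WIDTH - CONTROL_LINES - DATA_LINES  # the "other 5"
OPERAND_BITS = 12


def pack_pixels(p0: int, p1: int) -> int:
    """Pack two 6-bit pixels (2 bits each of R, G, B) into one 12-bit value."""
    if not (0 <= p0 < 64 and 0 <= p1 < 64):
        raise ValueError("pixels are 6-bit values")
    return (p1 << 6) | p0


def unpack_pixels(word: int) -> tuple:
    """Inverse of pack_pixels: the low 6 bits hold the first pixel."""
    return word & 0x3F, (word >> 6) & 0x3F


# Spare ROM bits, and how many addresses a 12-bit operand can reach.
print(SPARE_BITS, 2 ** OPERAND_BITS)
print(hex(pack_pixels(0x15, 0x2A)), unpack_pixels(0xA95))
```

Two 6-bit pixels fill a 12-bit operand exactly, so the unpacker only needs a mask and a 6-bit shift.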

With this approach, the "wasted opcodes" are irrelevant and take up no space if they are not used. With a proper assembler for this configuration, you could use control-line combinations that are not otherwise possible, allowing more efficient execution, though ROM space would be used less efficiently overall.
Hans61
Posts: 102
Joined: 29 Dec 2020, 16:15
Location: Saxonia

Re: 10MHz, 12.5MHz and Beyond!

Post by Hans61 »

I played around with the Gigasaur today. Since the Gigasaur uses the 74F logic family, I used a 50 MHz oscillator.
This means the Gigatron clone runs at 12.5 MHz. To get an image, I used ROMv3y.
Compared to ROMv3, the Mandelbrot is about 4 times faster. However, it is still significantly slower than the Mandelbrot in the dev128k ROM at 7.25 MHz.
My game controller doesn't work with ROMv3 or ROMv3y; only the keyboard does.
Attachments:
1-gigasaur-ROMv3y.jpg
2-gigasaur-12,5MHz-ROMv3y.jpg
3-gigasaur-12,5MHz-ROMv3y-mandelbrot.jpg
monsonite
Posts: 101
Joined: 17 May 2018, 07:17

Re: 10MHz, 12.5MHz and Beyond!

Post by monsonite »

Good work, Hans.