Re: 10MHz, 12.5MHz and Beyond!
Posted: 25 May 2021, 21:33
Going much past 12 MHz is where you have to start rethinking other things.
Redesigning the ALU to use a carry-select adder configuration would likely help, because both carry possibilities for the upper nybble are calculated at the same time and then selected with a minor switching delay, rather than calculating the nybbles in series. That adds only 2 more chips. Once you get to 20 MHz, you'd likely need to build the adders yourself from many of the fastest AND and XOR gates, such as Drass is using in his 100 MHz TTL/CMOS 6502 project. But those would not be 5 V tolerant at all. Once you get really fast, you'd want to split the arithmetic and logic functions into 2 separate units. Or you could distribute the math/logic across even more chips so that each operation has its own circuit.
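To make the "both carry possibilities at the same time" idea concrete, here is a minimal behavioural sketch in Python. It assumes an 8-bit ALU built from 4-bit adder chips, with the 2 extra chips being one more 4-bit adder and one 2-to-1 selector. The function names are made up for illustration, and the model only shows the logic, not the timing.

```python
def add4(a, b, carry_in):
    """Model one 4-bit adder chip: returns (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + (carry_in & 1)
    return total & 0xF, total >> 4


def select_add8(a, b, carry_in=0):
    """8-bit add from three 4-bit adders plus a 2-to-1 selector (assumed layout)."""
    # Low nybble is added as usual.
    lo_sum, lo_carry = add4(a, b, carry_in)
    # In hardware these two adders run in parallel with the low nybble,
    # one assuming carry-in 0 and the other assuming carry-in 1.
    hi_if_0, cout_if_0 = add4(a >> 4, b >> 4, 0)
    hi_if_1, cout_if_1 = add4(a >> 4, b >> 4, 1)
    # The low nybble's carry only drives the selector, so the remaining
    # delay is the mux switching time rather than a second full add.
    if lo_carry:
        hi_sum, carry_out = hi_if_1, cout_if_1
    else:
        hi_sum, carry_out = hi_if_0, cout_if_0
    return (hi_sum << 4) | lo_sum, carry_out
```

The result is identical to a plain ripple add; the gain is only that the upper nybble no longer waits for the lower one to finish.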
Then you reach the idea of adding another pipeline stage. That would require a new ROM and inserting registers into the control lines to separate the control unit from the ALU. Theoretically, that should allow for up to 50% more speed, depending on how well balanced the latencies are between the stages. By that point, your ROM will be the limiting factor.
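The "up to 50%" figure and its dependence on balance can be shown with a little arithmetic. This is only a sketch: the stage latencies below are invented for illustration, and register setup and clock-to-output overhead are ignored.

```python
def max_clock_mhz(stage_delays_ns):
    """The clock period can be no shorter than the slowest stage."""
    return 1000.0 / max(stage_delays_ns)


# Hypothetical latencies in ns, not measurements of any real board.
two_stage = [60.0, 60.0]               # 120 ns of work in 2 stages
three_balanced = [40.0, 40.0, 40.0]    # same work, split evenly in 3
three_unbalanced = [60.0, 45.0, 15.0]  # same work, split badly

for name, stages in [("2-stage", two_stage),
                     ("3-stage balanced", three_balanced),
                     ("3-stage unbalanced", three_unbalanced)]:
    print(f"{name}: {max_clock_mhz(stages):.2f} MHz")
```

With an even split the third stage gives the full 1.5x. With the bad split the slowest stage is still 60 ns, so the extra registers buy nothing.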
When the ROM becomes the limiting factor, one can add the fastest 16-bit SRAM available, with a circuit that copies the ROM into the SRAM on boot, and then execute out of the SRAM shadow copy. So with a 3-stage pipeline and 40 ns SRAM shadowing the ROM, that would put you closer to 25 MHz. If you are not afraid of 3.3 V and lower voltages or SMT parts, you might find 8-10 ns SRAM (with a theoretical maximum of 100-125 MHz, depending on the other stages).
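A rough model of the shadowing idea, for anyone who wants to play with the numbers. It assumes one memory access per clock and treats the access time as the whole cycle, so the results are upper bounds only. The function names and the tiny ROM contents are made up.

```python
def shadow_rom(rom_words):
    """What the boot-time copy circuit would do: ROM to SRAM, word by word."""
    sram = [0] * len(rom_words)
    for address, word in enumerate(rom_words):
        sram[address] = word & 0xFFFF  # 16-bit wide memory
    return sram


def fetch_limited_mhz(access_time_ns):
    """Clock ceiling if an instruction fetch must finish within one cycle."""
    return 1000.0 / access_time_ns


rom = [0x0000, 0x1234, 0xBEEF, 0xFFFF]  # placeholder contents
sram = shadow_rom(rom)                  # after boot, fetches come from here

for t_ns in (40.0, 10.0, 8.0):
    print(f"{t_ns:g} ns SRAM: at most {fetch_limited_mhz(t_ns):g} MHz")
```

The other pipeline stages and any address decoding in front of the SRAM will pull the real figure below these ceilings.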
If one is not interested in native mode compatibility, one might rework the ISA to simplify the control unit. That would be incompatible with what we have, but you could still have vCPU compatibility. Finding a way around chaining decoders would be desirable. If you can't avoid that, then maybe one could borrow a cue from the carry-select adder arrangement: calculate multiple values at the same time and use a "switch" (multiplexer) to put the correct one on the bus, as determined by the earlier decoder.
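The decoder trick can be sketched the same way as the adder. Everything here is hypothetical: the 8-bit opcode split, the four groups and the second-level functions are invented just to show the structure, and they are not the actual control unit.

```python
def first_decoder(opcode):
    """Hypothetical first-level decode: top 2 bits pick one of 4 groups."""
    return (opcode >> 6) & 0x3


# Hypothetical second-level decoders, one per group.
SECOND_LEVEL = [
    lambda op: op & 0x0F,
    lambda op: (op >> 2) & 0x0F,
    lambda op: (op ^ 0x3F) & 0x3F,
    lambda op: 0x80 | (op & 0x07),
]


def chained_decode(opcode):
    """Serial version: the second decoder waits for the first one."""
    group = first_decoder(opcode)
    return SECOND_LEVEL[group](opcode)


def parallel_decode(opcode):
    """Every candidate is computed at once; the first decoder only selects."""
    candidates = [decode(opcode) for decode in SECOND_LEVEL]  # parallel in hardware
    return candidates[first_decoder(opcode)]                  # the multiplexer
```

Both functions return the same value for every opcode. In hardware the second form costs extra chips but replaces one decoder delay with a mux delay.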
Of course, clock rate is not everything, and other speedups would be fruitful. The line repeater would allow you to use mode 4 all the time. Separating the video generation from the CPU would be helpful too. Doing that will increase performance at lower clock rates, but depending on how you do it, that could limit higher clock rates (unless you get more sophisticated with caches). More usable native opcodes would help gain speed by improving code density. Adding more registers and instructions that work on multiple data would speed things up, as would being able to run multiple instructions at the same time. More cores could help, but that depends on the software.