Who wants to see a 100 MHz Gigatron?
Posted: 20 Jan 2022, 04:38
Intro
I figured this needed its own thread so I don't clog the 10+ MHz thread. Over the last couple of years, I've brainstormed various ways to speed up the Gigatron and discarded many of them.
One way to make it faster is to drop real opcodes in favor of horizontal microcode and remove the control unit. That would be harder to program (preferably with macros) and would need more ROMs and pipeline registers. It would be a fairly simple way to even out the pipeline and raise the clock rate a little, since no decoding would be needed. I likely wouldn't do that, as it isn't space-efficient. If one wanted to, one could use this approach and shadow the ROMs. If all the SRAM were 7-8 ns and everything were shadowed, that could take you to about 45 MHz. And if the shadow RAM were fast enough (as fast as a register), one could even remove the delay slot. I wouldn't do that either, since it would prevent using any trampoline code.
The biggest latency is in the execution unit, particularly when RAM is used, though the delayed clock helps with that, at least at slower clock rates. Within the execution unit, the control unit takes about the longest. I had proposed a carry-skip adder arrangement for the high nibble, but that would only gain a couple of MHz over what has already been tried, even when combined with those ideas. Even if that doesn't get you to 18 MHz, 15 MHz would be more stable than on the test machine. However, splitting the execution unit in half should do more to raise the clock rate, and a carry-skip arrangement would be moot unless drastic design changes are made. Marcel had suggested finding a way to decouple the memory, though nobody really commented on that. Below, I propose how to do that.
Something to keep in mind at higher speeds is video generation. The reason Marcel put everything on the left side of the screen in the test ROM is that there's currently no easy way to do processing between pixels: there isn't room for any meaningful instruction between them. At 100 MHz, each 6.25 MHz pixel lasts 16 cycles, leaving 15 cycles between pixels, so it becomes essential to figure out how to run the vCPU there. While some have suggested buffering the video output and adding circuitry to use it as needed, I'd add several more registers (and the needed opcodes). That would give room for both the video context and the vCPU context, so you could take the time of several pixels for a vCPU instruction, or whatever. Plus, I think you could then avoid restarting vCPU instructions that get interrupted.
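The "15 cycles" figure is just the ratio of the CPU clock to the 6.25 MHz pixel clock, minus one cycle. The sketch below assumes (for illustration only) that exactly one cycle per pixel is spent emitting it and the rest are free for other work.

```python
PIXEL_CLOCK_MHZ = 6.25  # Gigatron VGA pixel rate

def spare_cycles_per_pixel(cpu_clock_mhz: float) -> int:
    """Cycles per pixel minus the one cycle assumed to be spent emitting it."""
    cycles_per_pixel = round(cpu_clock_mhz / PIXEL_CLOCK_MHZ)
    return cycles_per_pixel - 1

for clock in (6.25, 12.5, 25.0, 100.0):
    print(f"{clock:6.2f} MHz: {spare_cycles_per_pixel(clock)} spare cycle(s) per pixel")
```

At 12.5 MHz there is only a single spare cycle per pixel, which is why nothing useful fits between pixels today.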
Design changes needed for going faster
Beyond the earlier changes (faster parts, more board layers, more copper fill, faster diodes, smaller resistors), one would need to rethink the design as a whole. I propose a 4-stage pipeline: Fetch, Decode, Access, and Execute. That would make each stage shorter and the stages more balanced. If you keep the 70 ns ROM, a 4-stage pipeline would get you close to 14 MHz without other optimizations. With 40 ns RAM and ROM, it would get you closer to 25 MHz. To reach 100 MHz, the slowest stage cannot exceed 10 ns.
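The numbers above follow from one rule: a pipeline can be clocked no faster than 1 / (delay of its slowest stage). A minimal sketch; only the memory access times come from the text, and the other per-stage delays are made-up placeholders.

```python
def max_clock_mhz(stage_delays_ns: dict[str, float]) -> float:
    """A pipeline can clock no faster than its slowest stage allows."""
    slowest = max(stage_delays_ns.values())
    return 1000.0 / slowest  # 1000 / (delay in ns) gives MHz

# Hypothetical worst-case delays; only the memory figures come from the post.
slow_parts = {"Fetch": 70.0, "Decode": 35.0, "Access": 70.0, "Execute": 40.0}
fast_parts = {"Fetch": 40.0, "Decode": 25.0, "Access": 40.0, "Execute": 30.0}
target = {"Fetch": 10.0, "Decode": 10.0, "Access": 10.0, "Execute": 10.0}

for name, stages in (("70 ns memory", slow_parts),
                     ("40 ns memory", fast_parts),
                     ("100 MHz target", target)):
    print(f"{name}: {max_clock_mhz(stages):.1f} MHz")
```

Balancing the stages matters because any stage that is faster than the slowest one simply waits.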
The clock
With faster designs, you might want to move beyond a clock built from discrete chips. I'd propose an oscillator "can" and perhaps a chip to buffer and distribute the signal. A clock splitter would also let you use chips with different supply voltages, since you could add resistors and Zener diodes on one output without loading the other lines. The current clock has some ripple from higher harmonics, and a really fast machine would benefit from a cleaner signal.
The Native ROM (Fetch)
The native ROM could be shadowed into SRAM during boot, cutting the access time from 70 ns to 7-8 ns.
The Control Unit (Decode)
The control unit is one of the slower parts of the Gigatron. You can speed it up somewhat with faster parts, or rearrange the chips around a new opcode map to find a faster combination. For a 100 MHz Gigatron, though, I'm considering a LUT-based control unit. While that sounds slower, the LUT could be shadowed into 7-8 ns SRAM. What makes a LUT attractive to me is the ability to generate arbitrary control signals. A current shortcoming is not being able to process between pixels at 12.5 MHz or faster, and 3 more registers would help with that. Instructions such as shifts and multiplication would also improve vCPU efficiency. With a LUT-based control unit, you could add the control lines needed to make such things possible. The current ALU has only 3 operation lines, and a LUT ALU could have more.
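A LUT-based control unit replaces decode logic with a table indexed by the opcode byte: the table is computed once, burned into ROM, and shadowed at boot. The sketch below generates such a table. The opcode fields, the control-word layout, and the `decode` rule are all hypothetical placeholders, not the Gigatron's actual encoding.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Control:
    alu_op: int       # 5 bits: room for up to 32 ALU functions
    bus_src: int      # 2 bits: which source drives the data bus
    mem_write: bool   # 1 bit: store to RAM in the Access stage
    is_jump: bool     # 1 bit: instruction is a jump/branch

    def pack(self) -> int:
        """Pack the fields into one 9-bit control word (hypothetical layout)."""
        return ((self.alu_op << 4) | (self.bus_src << 2)
                | (int(self.mem_write) << 1) | int(self.is_jump))

def decode(opcode: int) -> Control:
    """Hypothetical decode rule: top 3 bits select the operation, low 2 bits the bus."""
    op = (opcode >> 5) & 0b111
    bus = opcode & 0b11
    return Control(alu_op=op, bus_src=bus, mem_write=(op == 6), is_jump=(op == 7))

# The LUT is just the decode rule evaluated once for every possible opcode.
CONTROL_LUT = [decode(opcode).pack() for opcode in range(256)]
```

Adding a new control line then means widening the word and regenerating the table, with no rewiring of decode gates.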
User RAM (Access)
The user RAM cuts into the critical path; the Gigatron uses a Clk2 signal to help mitigate this. In my approach, RAM access becomes its own pipeline stage. It would be placed before the ALU, since the ALU operates on values read from RAM. However, no writeback stage is needed: you can read RAM in stage 3 and use the value in stage 4, and writes can also be done in stage 3. It might be helpful to find other work for this stage, since not all instructions use memory, and the changes in the two neighboring units would reduce memory use even further.
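To see how the Access stage fits, the sketch below prints which instruction occupies each of the four stages on every cycle. An instruction's RAM read happens in Access one cycle before the ALU uses it in Execute, while one instruction still completes per cycle in steady state. The mnemonics are placeholders, and the model ignores any hazards between neighboring instructions.

```python
STAGES = ["Fetch", "Decode", "Access", "Execute"]

def pipeline_chart(instructions: list[str]) -> list[list[str]]:
    """Return one row per clock cycle showing which instruction occupies each stage."""
    cycles = len(instructions) + len(STAGES) - 1
    chart = []
    for cycle in range(cycles):
        row = []
        for stage_index in range(len(STAGES)):
            i = cycle - stage_index  # instruction that entered Fetch stage_index cycles ago
            row.append(instructions[i] if 0 <= i < len(instructions) else "-")
        chart.append(row)
    return chart

program = ["ld [$10]", "add [$11]", "st [$12]"]  # placeholder instructions
print("  " + " | ".join(f"{s:9}" for s in STAGES))
for cycle, row in enumerate(pipeline_chart(program)):
    print(cycle, " | ".join(f"{cell:9}" for cell in row))
```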
The ALU
Another bottleneck to keep in mind with faster machines is the ALU. Near 20 MHz, a carry-skip adder arrangement shows some improvement. When you go even faster, you must rethink the ALU. The newer single-gate (1G) SMD parts in the 74xx family don't include any adders, so you'd need a different ALU altogether. One way is to use high-speed gates and transparent latches; Drass at 6502.org managed to build a 6.9 ns adder that way. However, Drass won't be available for a while, and my other contact isn't available much. There is another way. If I were to design this, I'd consider a ROM-based LUT for the ALU. At first, you might ask how that would help when ROMs tend to be slow. That's where shadowing the ALU ROM into 7-8 ns SRAM comes in. It would also get rid of the multiplexers and the diodes. And with a big enough ROM/RAM combination, you could even add ALU functionality. For instance, one could use a ROM with 21 address lines: 16 for the two 8-bit operands and the other 5 as control lines, allowing up to 32 operations. With a 16-bit-wide ROM, each entry could hold an 8-bit result plus flags. If one wanted to add a multiplier, the upper byte would be needed for the product's most significant byte, so a multiplexer and a control line would probably be needed to route it to either a FLAGS register or an upper accumulator (or a sub-lower accumulator for fixed-point division). If division of any sort is added, it might be good to limit results to 15 bits (if fixed-point results are desired) to save the upper bit as an exception/divide-by-zero (DBZ) bit.
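A minimal sketch of the 21-address-line, 16-bit-wide ALU table described above. The op numbers, the flag bit positions, and the "carry means no borrow" convention for subtraction are all assumptions for illustration. For multiply, the upper byte carries the product's high byte instead of flags, which is exactly why a multiplexer would be needed downstream.

```python
OP_ADD, OP_SUB, OP_AND, OP_MUL = 0, 1, 2, 3   # hypothetical 5-bit op codes
CARRY, ZERO = 0x01, 0x02                       # hypothetical flag bits in the upper byte

def lut_address(op: int, a: int, b: int) -> int:
    """21-bit address: 5 control bits above two 8-bit operands."""
    assert 0 <= op < 32 and 0 <= a < 256 and 0 <= b < 256
    return (op << 16) | (a << 8) | b

def lut_entry(op: int, a: int, b: int) -> int:
    """16-bit word: flags (or the product's high byte) in bits 15-8, result in bits 7-0."""
    if op == OP_MUL:
        return a * b                   # full 16-bit product; upper byte is the MSB
    if op == OP_ADD:
        raw = a + b
        carry = raw > 0xFF
    elif op == OP_SUB:
        raw = a - b
        carry = a >= b                 # assumed convention: carry set means no borrow
    elif op == OP_AND:
        raw, carry = a & b, False
    else:
        raw, carry = 0, False          # unused op slots
    result = raw & 0xFF
    flags = (CARRY if carry else 0) | (ZERO if result == 0 else 0)
    return (flags << 8) | result

def build_op_slice(op: int) -> list[int]:
    """All 65,536 entries for one operation, in address order (a major, b minor)."""
    return [lut_entry(op, a, b) for a in range(256) for b in range(256)]
```

The full image would be the 32 slices concatenated, 2^21 words in total.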
Differences in booting
Since most functionality lives in LUTs, there has to be a way to fill them. The boot mechanism shouldn't run any faster than 14 MHz (for 70 ns ROMs), and 8-10 MHz should be fast enough. The largest LUT would be the ALU, which could have up to 2^21 (about 2 million) addresses. Copying one entry per cycle at 8-10 MHz takes roughly 0.21-0.26 seconds, so about 1/4 second to boot isn't bad. As for how this would work, I imagine one could add some multiplexers and a large enough counter to step through the addresses. The rest of the machine would need to be held in reset until the copy is complete.
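A sketch of the boot copy and its timing, assuming all LUTs are filled in parallel from one shared counter, so the largest table (the ALU's 2^21 entries) sets the boot time. The `shadow_copy` function is only a behavioral model of the counter-plus-multiplexer hardware, not a real interface.

```python
ALU_LUT_ENTRIES = 1 << 21  # 21 address lines

def boot_copy_seconds(entries: int, copy_clock_mhz: float) -> float:
    """One word copied per clock while the counter steps through addresses."""
    return entries / (copy_clock_mhz * 1e6)

def shadow_copy(rom: list[int], sram: list[int]) -> None:
    """Behavioral model of the counter-driven copy; the CPU is held in reset meanwhile."""
    for address in range(len(rom)):   # the counter supplies each address in turn
        sram[address] = rom[address]  # boot multiplexers route ROM data into the SRAM

for mhz in (8.0, 10.0, 14.0):
    print(f"{mhz:4.1f} MHz: {boot_copy_seconds(ALU_LUT_ENTRIES, mhz):.3f} s")
```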
The motherboard
Such a design would likely need every motherboard optimization in the book. One would go for 4 layers and maximum copper fill for sure. I am wondering how far one should go with inter-trace grounding. Parallel ATA cables, for instance, add a ground wire between every pair of signal wires for the faster Ultra DMA modes (UDMA/66 and up). So I don't know whether one should add vias so that half the traces run on each side of the board, with shield traces between them that are grounded at each end. Cross-talk could be an issue at these speeds, and even with good shielding, I don't really see how anything over 133 MHz would be possible, again judging by hard-drive interfaces: UDMA/133 was the fastest parallel ATA mode, and it took serial ATA (SATA) to reach 150 MB/s and beyond. (Those figures are transfer rates in MB/s rather than clock frequencies, so the comparison is only a rough guide.) I don't think going that fast would be possible. Even if you could get all 4 stages down to 7 ns, about 140 MHz would be the theoretical maximum I see. The traces would need to be as short as possible, since you add about 1 ns for every 7 inches of trace. If the worst delay ends up around 9 ns, then 111 MHz would be the maximum. It would be nice to keep the clock at an integer multiple of 6.25 MHz. 112.5-112.95 MHz might be a good upper limit to shoot for, but 100 MHz would be wonderful. And if things don't go as expected, even 75 MHz would be okay.
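Putting the stage delay, the trace-delay rule of thumb (about 1 ns per 7 inches, from the text), and the 6.25 MHz constraint together gives a quick way to pick a target clock. A sketch; the trace lengths passed in are hypothetical.

```python
import math

PIXEL_CLOCK_MHZ = 6.25
INCHES_PER_NS = 7.0  # rule of thumb from the text: about 1 ns per 7 inches of trace

def best_clock_mhz(worst_stage_ns: float, trace_inches: float = 0.0) -> float:
    """Highest multiple of 6.25 MHz that still fits the worst stage plus trace delay."""
    period_ns = worst_stage_ns + trace_inches / INCHES_PER_NS
    f_max = 1000.0 / period_ns
    return math.floor(f_max / PIXEL_CLOCK_MHZ) * PIXEL_CLOCK_MHZ

print(best_clock_mhz(7.0))        # all 4 stages at 7 ns
print(best_clock_mhz(9.0))        # 9 ns worst case
print(best_clock_mhz(9.0, 7.0))   # 9 ns plus 7 inches of trace
```

With a 9 ns worst case, the nearest 6.25 MHz multiple is 106.25 MHz; reaching 112.5 MHz would need the worst path under about 8.9 ns.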
Extra features
It would be nice to integrate Pluggy Reloaded and the I/O expander and do so in a way that takes the best features of both and removes redundancy.
The LUT control unit and the LUT ALU could allow for a 1-cycle "hardware" multiplier (up to 8/8/16 widths for A/B/Q, i.e., two 8-bit operands and a 16-bit product). That would give faster multiplication than even a 286, and far faster than the 8088/8086. Better multiplication is one of several reasons the 286 was faster than the 8086. You also wouldn't need an FPU as much if the ALU had some FPU-like functions. Since the ALU would be a LUT, there is no reason one couldn't add some basic trig functions and a simple divider (maybe 8/8/8).
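As an illustration of how the vCPU's 16-bit arithmetic could use an 8/8/16 table multiplier, the low 16 bits of a 16x16 product need only three lookups plus shifts and adds. This is a sketch of the general partial-product technique, with a hypothetical `mul8` standing in for the single-cycle hardware lookup.

```python
# 8x8 -> 16 product table, as a LUT ALU might hold it (index = a * 256 + b).
MUL_TABLE = [a * b for a in range(256) for b in range(256)]

def mul8(a: int, b: int) -> int:
    """One lookup: stands in for the hypothetical single-cycle 8/8/16 multiply."""
    return MUL_TABLE[(a << 8) | b]

def mul16_low(x: int, y: int) -> int:
    """Low 16 bits of a 16x16 product from three 8x8 lookups.

    The high-byte x high-byte term only affects bits 16 and up, so it is skipped.
    """
    xl, xh = x & 0xFF, x >> 8
    yl, yh = y & 0xFF, y >> 8
    cross = mul8(xh, yl) + mul8(xl, yh)   # these partial products land 8 bits up
    return (mul8(xl, yl) + (cross << 8)) & 0xFFFF

print(mul16_low(300, 2))
```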
I have mixed feelings about a separate video controller. If one were to build this, maybe they should add a socket for a Digilent A7 or other small FPGA board, as well as jumpers and cable headers. Then a memory-snooping video/sound/lights coprocessor could be added, though that would require a little more thinking. With this much CPU power, such a controller would certainly not be necessary. However, since the idea is to integrate the I/O controller anyway, it might make sense to tightly integrate the two. Then one could use the FPGA to help with faster I/O and possibly open the door to a real math coprocessor. Adding such a controller could also simplify the main ROM, and the vCPU could reach its full potential, since it would no longer have to share time with video. Even then, software syncs might still be needed, depending on how the rest of the I/O is done. At the least, keep a "vertical sync" in software for the benefit of the keyboard/game controller and of user applications.
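"Memory snooping" means the coprocessor never drives the bus; it only watches CPU writes and mirrors the ones that land in regions it cares about. A behavioral sketch: the `MemorySnooper` class is hypothetical, and the address ranges are placeholders to be checked against the real Gigatron memory map. The dirty flag is one way a controller could notice that software has changed the sound samples.

```python
class MemorySnooper:
    """Mirrors CPU writes that fall inside watched address ranges."""

    def __init__(self, ranges: dict[str, range]):
        self.ranges = ranges
        self.shadow = {name: bytearray(len(r)) for name, r in ranges.items()}
        self.dirty = {name: False for name in ranges}

    def observe_write(self, address: int, data: int) -> None:
        """Called for every (address, data) pair seen on the bus during a write."""
        for name, r in self.ranges.items():
            if address in r:
                self.shadow[name][address - r.start] = data & 0xFF
                self.dirty[name] = True

# Placeholder regions; check the real memory map before relying on these.
snooper = MemorySnooper({"video": range(0x0800, 0x8000),
                         "samples": range(0x0700, 0x0800)})
snooper.observe_write(0x0701, 0x2A)   # a program rewriting a waveform sample
snooper.observe_write(0x0005, 0x11)   # ignored: outside every watched range
```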
The above-proposed controller (or set of controllers) would make higher-resolution sound more feasible. While the current ROM uses 6-bit samples, the sound portion of the controller could do 8-bit output. That could make for cleaner sound when mixing the channels, and it would also make internal 8-bit samples possible. In that case, the controller should have at least a 10-bit ALU (really, an adder-shifter) to give enough mixing headroom, since four 8-bit channels can sum to as much as 1020. However, it would be wise to leave the 6-bit samples in the memory map. Some software relies on them for non-sound purposes, so the controller should have a fallback mode in which the ones in RAM are used. That means the controller should detect when software changes the samples and shadow/use the changes. That way, PucMon and other games would sound as expected. A neat feature would be to collect all the user-modified samples, store them in the controller, and provide a way to select different sound palettes. That could make for interesting audio software: with more samples, and hopefully the ability to change them rapidly, one could write something closer to an Amiga tracker. And depending on how the controller is done, one might also be able to break past the 3900 Hz ceiling. While 15 kHz would be nice to have, even 7800 Hz would be better than now. The controller would have to translate the rates to whatever is actually used. For instance, the video could be clocked at 12.5 MHz with pixel doubling to emulate 6.25 MHz, which might allow faster I/O and higher sound frequencies.
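The 10-bit headroom figure follows from the Gigatron's four sound channels: 4 × 255 = 1020 fits in 10 bits, and a 2-bit right shift brings the mix back to 8 bits for the DAC. A minimal sketch, assuming unsigned samples and a plain average as the scaling step (a real mixer might use signed samples or different scaling).

```python
CHANNELS = 4  # the Gigatron has four sound channels

def mix(samples: list[int]) -> int:
    """Sum four unsigned 8-bit samples, then scale back to 8 bits for the DAC."""
    assert len(samples) == CHANNELS and all(0 <= s <= 0xFF for s in samples)
    total = sum(samples)          # at most 4 * 255 = 1020, which needs 10 bits
    assert total < (1 << 10)
    return total >> 2             # divide by 4: the "shifter" half of the adder-shifter

print(mix([10, 20, 30, 40]))
```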
Plus, a faster video clock would allow a crisper text mode: video data coming from the machine would be treated as 6.25 MHz, while the controller's internal data could be treated as 12.5 MHz. So the internal character set could have a higher resolution than what the Gigatron provides.
I don't know how feasible a hardware RNG would be, and I know this sounds a bit like feature creep. The existing software technique of scouring memory for entropy could still be used, but a little extra circuitry might give another option. There would already need to be 2 clocks: one at about 14 MHz or lower (12.5 or 6.25 MHz would also work) to initialize the various SRAMs, and the system clock. Adding a PLL or clock multiplier/divider chip could provide more if needed. I don't know how well it would work to XOR 2 different clocks, feed the result into a shift register, and sample it with a third clock, deliberately ignoring clock-domain-crossing rules. I also thought of putting a table in the ALU ROM. The only problem is that it would be predictable (no worse than a linear-feedback shift register approach), since the numbers would only be scrambled. Perhaps the "ALU" could fetch another number whenever it would otherwise be stalled. That could supplement the RAM entropy method. It depends on what one wants to do in the ROM.
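For comparison, the predictable baseline is easy to model: an 8-bit maximal-length LFSR steps through all 255 non-zero states in a fixed order, so anyone who knows one output can predict the rest. The tap mask 0xB8 is a standard maximal-length choice, not anything taken from the Gigatron ROM.

```python
def lfsr8_step(state: int) -> int:
    """One step of an 8-bit Galois LFSR with tap mask 0xB8 (x^8 + x^6 + x^5 + x^4 + 1)."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB8
    return state

def period(seed: int = 0x01) -> int:
    """Count steps until the LFSR returns to its (non-zero) seed."""
    state, steps = lfsr8_step(seed), 1
    while state != seed:
        state, steps = lfsr8_step(state), steps + 1
    return steps

print(period())
```

A seed of zero would lock up, which is why real LFSR circuits need a non-zero reset value.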
Something cute, though I likely wouldn't add it unless there is demand, would be "TV emulation." The video controller could delay using the memory contents and instead use an LFSR or a table to produce "snow." The LFSR (or the noise sample) could also be used to create audio white noise. And if one were into details, being able to send a 15.75 kHz tone out another sound channel would be neat. Shoot, maybe even add a small amount of 50-60 Hz hum. So when you turn it on, you could have a more retro experience. Going with that theme, one could even add some I/O and/or typing noises. On the Atari 800, for instance, there were keyboard clicks and disk I/O noises; the clicks came from the console speaker driven through GTIA, and the I/O noises from POKEY (which also produced the actual sound), not from the PIA. The PIA provided two 8-bit parallel ports with handshake and interrupt lines. The VIA (which Commodore used) was a more advanced PIA that added timers and a shift register. According to WDC, the VIA was geared more to 16-bit machines, but plenty of 8-bit machines used it.
If anyone has suggestions for extra features, let us know. What is mentioned above is more of a wish list. The only real must-haves in this category would be enhanced storage, memory, and keyboard support. Everything else is optional.
Questions and Considerations
I'm not sure I am up to the task, but it sounds like it could be fun. I know next to nothing about SMT. Obviously, the supply voltages of the chips need to be taken into account, with level shifters or other parts used to match them. For some signals, resistors with occasional Zeners could be enough, but bidirectional traffic will need proper level shifters. It is best to aim for a clock frequency that is an integer multiple of 6.25 MHz; slightly faster should be fine. 6.25 MHz is slightly slower than standard VGA timing (a quarter of the 25.175 MHz VGA pixel clock would be about 6.29 MHz), and Marcel dealt with that by making the porches a tad smaller. That is why the vertical refresh is slightly under 60 Hz.
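The refresh rate is just the clock divided by cycles per line and lines per frame. A sketch comparing standard 640x480 VGA with Gigatron-style timing. The Gigatron figures (200 cycles per line, i.e. 800 / 4, and 521 lines per frame) are assumptions, not values taken from the post. Any clock that is an integer multiple of 6.25 MHz keeps each scanline an integer number of cycles (3200 at 100 MHz).

```python
def refresh_hz(clock_mhz: float, cycles_per_line: int, lines_per_frame: int) -> float:
    """Vertical refresh = line rate / lines per frame."""
    line_rate_hz = clock_mhz * 1e6 / cycles_per_line
    return line_rate_hz / lines_per_frame

# Standard 640x480 VGA: 800 pixel clocks per line, 525 lines per frame.
print(f"VGA:      {refresh_hz(25.175, 800, 525):.2f} Hz")
# Gigatron-style timing, assuming 200 cycles per line and 521 lines per frame.
print(f"Gigatron: {refresh_hz(6.25, 200, 521):.2f} Hz")
# At 100 MHz, each scanline would simply be 16 times as many cycles.
print(f"100 MHz:  {refresh_hz(100.0, 3200, 521):.2f} Hz")
```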
I do have many questions and design considerations I'm unsure of, but I might start a separate thread for those, since they would have more value as general reference material for anyone wanting to modify, respin, or create peripherals: for example, questions about the vCPU, LDR, Pluggy, RNG, sound, and the I/O Expander. That would be more useful for making a ROM than for building new hardware.
As I said, I might not be up to the task, but if I start it, I'd need help, most likely with part selection, board design, schematic software, and SMT. As Walter suggested before, Hackaday is probably more suitable for this. I might start a page there, and if anyone wants to join as a "team member," I'd gladly add them.