Gigatron Hackers

Posted: **20 Jan 2022, 04:38**

Intro
I figured this needs its own thread. I don't want to clog the 10+ Mhz thread. Over the last couple of years, I've been brainstorming various ways to speed up the Gigatron and have discarded many.

One way to make it faster is to not use actual opcodes, just horizontal microcode, and remove the control unit. That would be harder to program (preferably with macros) and take more ROMs and pipeline registers. That would be a somewhat simple way to even up the pipeline some and get the clock rate up a little since no decoding would be needed. I likely wouldn't do that as it isn't efficient in terms of space. If one wanted to, they could use this approach and shadow it. If you could get all the SRAM to 7-8 ns and shadow everything, it could take you to about 45 Mhz. And if the shadow RAM is fast enough (as fast as a register), one could then remove the delay slot. I wouldn't do that either since that would prevent using any trampoline code.

The biggest latency is in the execution unit, particularly when RAM is used, though the delayed clock helps that, at least at slower clock rates. In the execution unit, the control unit takes about the longest. I had proposed a carry skip adder arrangement for the high nibble, but that would only gain a couple of Mhz (over what has been tried and using those ideas). Even if that doesn't get you to 18 Mhz, 15 Mhz would be more stable than on the test machine. However, if you split the execution unit in half, that should do more to increase the clock rate, and a CSA arrangement would be moot unless drastic design changes are made. Marcel had suggested finding a way to decouple the memory, though nobody really commented on that. Below, I will propose how to do that.

Something to keep in mind at higher speeds is video production. The reason that Marcel put everything on the left side of the screen in the test ROM is that there's currently no easy way to process between pixels. You can't use any meaningful instruction between the pixels. If you go to maybe 100 Mhz, you have 15 cycles between each pixel, making it a necessity to figure out how to use vCPU between the pixels. While some have said to buffer the video output and have circuitry to use it as needed, I'd say to add several more registers (and the needed opcodes). That would give room for both the video context and the vCPU context. So you could then take the time of several pixels for a vCPU instruction, or whatever. Plus I think you could then get rid of restarting the vCPU for instructions that are interrupted.

Design changes needed for going faster
Beyond most of the earlier changes such as using faster parts, more board layers, more board fill, faster diodes, smaller resistors, one would need to rethink the design as a whole. I propose a 4-stage pipeline. So you have Fetch, Decode, Access, and Execute. That would make the pipeline stages take less time and be more balanced. If you keep the 70 ns ROM, a 4-stage pipeline would get you closer to 14 Mhz without other optimizations. But if you use 40 ns for the RAM and ROM with a 4 stage pipeline, then you'd get closer to 25 Mhz. To get 100 Mhz, the slowest stage cannot exceed 10 ns.
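The clock figures above follow from treating the slowest pipeline stage as the limit on the cycle time. A minimal sketch of that arithmetic, assuming the stage delay is the only constraint (no setup time, clock skew, or trace delay); the function name is just for illustration:

```python
def max_clock_mhz(slowest_stage_ns: float) -> float:
    """Upper bound on clock rate when one cycle must cover the slowest stage."""
    return 1000.0 / slowest_stage_ns


# Stage delays mentioned in the post: 70 ns ROM, 40 ns RAM/ROM, 10 ns target.
for stage_ns in (70, 40, 10):
    print(f"{stage_ns:>3} ns slowest stage -> {max_clock_mhz(stage_ns):6.1f} Mhz at best")
```

Real boards would land below these numbers once register setup times and wiring delays are added.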

The clock
With faster designs, you might want to move beyond a discrete chip clock. I'd propose an oscillator "can" and perhaps a chip to buffer and distribute the signal. The clock splitter would be good in that you could use different voltage chips since you could add resistors and Zener diodes without loading the other lines. The current clock has some ripple from higher harmonics, and for a really fast machine, it would help to have a cleaner signal.

The Native ROM (Fetch)
The native ROM could be shadowed during boot to go from 70 ns to 7-8 ns.

The Control Unit (Decode)
The control unit is one of the slower parts of the Gigatron. You can speed it up some with faster parts. Or one could rearrange the chips with a new opcode map and try to find a faster combination. Now, for a 100 Mhz Gigatron, I'm considering a LUT-based control unit. While that sounds slower, it could be buffered to a 7-8 ns SRAM. What makes LUT-based attractive to me is the ability to arbitrarily create the control signals. A current shortcoming is not being able to process between the pixels when operating at 12.5 Mhz or faster. But 3 more registers would help that. Plus instructions such as shifts and multiplication would help the vCPU efficiency. So designing a LUT-based control unit means you'd be able to add lines to make such things possible. The ALU has only 3 operation lines, and a LUT ALU could have more.

User RAM (Access)
The user RAM cuts into the critical path. The Gigatron uses a Clk2 signal to help mitigate this. In my approach, it can be a pipeline stage. It would be placed before the ALU since the ALU modifies reads. However, no writeback stage is needed. So you can read RAM in stage 3 and use it in stage 4. Writes can be done here too. It might be helpful to find other things to do during this stage since all instructions don't use memory, and we can use memory even less thanks to changes in the 2 surrounding units.

The ALU
A bottleneck you have to keep in mind with faster machines is the ALU. Once you get close to 20 Mhz, you can show some improvement from a Carry-Skip Adder arrangement. When you go even faster, you must rethink your ALU. The new 1G SMD parts in the 74xx family don't include any adder chips. So you'd need to make a different ALU altogether. One way is to use high-speed gates and transparent latches. Drass at 6502.org managed to create a 6.9 ns adder that way. Drass won't be available for a while and my other contact isn't available much. There is another way. If I were to design this, I'd consider a ROM-based LUT for the ALU. At first, you might ask, how will that help if ROMs tend to be slow? That's where I'd consider shadowing the ALU ROM into 7-8 ns SRAM. So that would get rid of the multiplexers and the diodes. Plus if you use a big enough ROM/RAM combo for this, you could even add more ALU functionality. For instance, one could use a ROM with 21 address lines. So you use 16 bits for operands and the other 5 would be control lines. That means you could have up to 32 ops. Since it would be a 16-bit ROM, you could have an 8-bit result and flags. If one wants to add a multiplier, then the upper byte would be needed for the most significant byte. So I guess a multiplexer and a control line would be what's needed to split between a FLAGS register and an upper accumulator (or a sub-lower accumulator for fixed-point division). If division of any sort is added, it might be good to make it only 15 bits max (if fixed-point results are desired) to save the upper bit as an exception/DBZ bit.
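To make the LUT ALU idea concrete, here is a rough sketch of how one table entry could be computed. The address layout (5 op bits, then operand A, then operand B), the op codes, and the flag positions are all assumptions made up for illustration; nothing here is a settled design.

```python
# Hypothetical op codes; the real assignment would come from the control unit design.
OP_ADD, OP_SUB, OP_AND, OP_OR, OP_XOR, OP_MUL = range(6)


def lut_address(op: int, a: int, b: int) -> int:
    """Pack 5 op bits and two 8-bit operands into a 21-bit address."""
    return ((op & 0x1F) << 16) | ((a & 0xFF) << 8) | (b & 0xFF)


def lut_entry(addr: int) -> int:
    """Return the 16-bit word the ROM would hold at this address."""
    op = (addr >> 16) & 0x1F
    a = (addr >> 8) & 0xFF
    b = addr & 0xFF
    if op == OP_MUL:
        return (a * b) & 0xFFFF  # upper byte is the high half of the product
    if op == OP_ADD:
        full = a + b
    elif op == OP_SUB:
        full = a + (~b & 0xFF) + 1  # invert B and add with carry-in set
    elif op == OP_AND:
        full = a & b
    elif op == OP_OR:
        full = a | b
    elif op == OP_XOR:
        full = a ^ b
    else:
        full = 0  # unassigned ops read as zero in this sketch
    result = full & 0xFF
    carry = (full >> 8) & 1
    zero = 1 if result == 0 else 0
    flags = carry | (zero << 1)  # assumed flag layout: bit 0 carry, bit 1 zero
    return (flags << 8) | result


print(hex(lut_entry(lut_address(OP_ADD, 200, 100))))
print(lut_entry(lut_address(OP_MUL, 255, 255)))
```

Looping `lut_entry` over all 2^21 addresses and writing the words out would give the ROM image. The multiply case shows why the upper byte has to be shared between flags and a high result byte.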

Differences in booting
Since most functionality is contained in LUTs, there has to be a way to fill them. The boot mechanism shouldn't be any faster than 14 Mhz (for 70 ns ROMs), and 8-10 Mhz should be fast enough. The largest LUT would be the ALU, which could have up to 2M addresses (21 address lines). At one word per clock, filling it takes about 1/4 second, which isn't bad. As for how this would work, I imagine one could throw in some multiplexers and a large enough counter. I guess it would need to hold things in reset until complete.
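The quarter-second estimate can be checked with a few lines. This is a sketch that assumes the copy moves one word per boot-clock cycle and that the ALU table is the only large one; `cycles_per_word` is a made-up knob for slower copy logic.

```python
def fill_time_s(address_lines: int, clock_mhz: float, cycles_per_word: int = 1) -> float:
    """Time to copy every word of a ROM with the given address width."""
    words = 2 ** address_lines
    return words * cycles_per_word / (clock_mhz * 1e6)


for mhz in (8, 10):
    print(f"21 address lines at {mhz} Mhz: {fill_time_s(21, mhz):.3f} s")
```

If the copy logic needs two cycles per word (one to read the ROM, one to write the SRAM), the time doubles to roughly half a second.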

The motherboard
Such a design would likely need every motherboard design optimization in the book. One would likely go for 4-layer and maximum fill for sure. I am wondering how far one should go with inter-trace grounding. ATA hard drive cables, for instance, add grounds between all the signal wires for the faster UDMA modes. So I don't know if one should add vias for SMD chips to where half the traces are on each side with shield traces between them that are grounded on each end. Cross-talk could be an issue at these speeds, and even with good shielding, I don't really see how over 133 Mhz would be possible, again judging by hard drives. UDMA-133 was the fastest ATA interface, and SATA took it to 150 and beyond. I don't think going that fast would be possible. Even if you could get all 4 stages to 7 ns, about 140 Mhz would be the theoretical maximum I see. The traces would need to be as short as possible since you add about 1 ns for every 7 inches of trace. If things end up about 9 ns as the worst delay, that means 111 Mhz would be the max. It would be nice to keep the clock at an even multiple of 6.25. 112.5-112.95 Mhz might be a good upper limit to shoot for, but 100 would be wonderful. But if things don't go as expected, even 75 Mhz would be okay.

Extra features
It would be nice to integrate Pluggy Reloaded and the I/O expander and do so in a way that takes the best features of both and removes redundancy.

The LUT Control Unit and the LUT ALU could allow for a 1-cycle "hardware" multiplier (up to 8/8/16 width, with the numbers being A/B/Q). That would give faster multiplication than even a 286, and loads faster than the 8088/8086. Better multiplication is one of a number of reasons the 286 was faster than the 8086. You wouldn't need an FPU as much if the ALU had some FPU functionality. Since the ALU would be a LUT, there is no reason why one couldn't add some basic trig functions and a simple divider (maybe 8/8/8).

I have mixed feelings about a separate video controller. I'm thinking that maybe if one were to make this, they should add a socket for a Digilent A7 or other small FPGA board, as well as jumpers and cable headers. Then a memory-snooping video/sound/lights coprocessor could be added. That would require a little more thinking. With this much power, such a controller would not be strictly necessary. However, since the idea is to integrate the I/O controller, tightly integrating the 2 controllers would be worth considering. Then one could use an FPGA to help with faster I/O and possibly open the door to a real math coprocessor. Adding such a controller could help simplify the main ROM. The vCPU could then have maximum potential since video would not be a consideration. There might still need to be software syncs, even then, depending on how the rest of the I/O is done. At the least, keep a "vertical sync" in software for the benefit of the keyboard/game controller, and for user applications.

The above-proposed controller or controller set would make higher resolution sound more possible. While the current ROM uses 6-bit samples, the sound portion of the controller could do 8-bit output. While that could make for cleaner sound when merging the channels, it could be also possible to include internal 8-bit samples. In that case, the controller should have at least a 10-bit ALU (really, adder-shifter) to give enough mixing headroom. However, it would be wise to leave the 6-bit samples in the memory map. Some software relies on those for non-sound purposes, and the controller should have a fall-back mode to where the ones in RAM are used. So the controller should determine if any software changes the samples and shadow/use the changes. That way, PucMon, and other games would sound as expected. A neat feature could be to collect all the user-modified samples, put them in the controller and have a way to select different sound palettes. That could make for interesting audio software since more samples and hopefully the ability to change the samples rapidly, thus making software that's closer to an Amiga tracker. And depending on how the controller is done, one might also be able to break past the 3900 Hz ceiling. While 15 Khz would be nice to have, even 7800 Hz would be better than now. The controller would have to translate the rates to whatever is actually used. For instance, the video could be clocked at 12.5 Mhz with pixel doubling to emulate 6.25 Mhz. That would allow for maybe faster I/O and higher sound frequencies.
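The 10-bit figure for the mixer can be sanity-checked by asking how many bits the worst-case sum of all channels needs. This sketch assumes four sound channels and unsigned samples; the channel count is an assumption here, not something stated above.

```python
def mixer_bits(channels: int, sample_bits: int) -> int:
    """Bits needed to hold the sum of all channels at full scale."""
    peak = channels * (2 ** sample_bits - 1)
    return peak.bit_length()


print("4 channels of 6-bit samples:", mixer_bits(4, 6), "bits")
print("4 channels of 8-bit samples:", mixer_bits(4, 8), "bits")
```

With those assumptions, 8-bit samples need a 10-bit adder before shifting back down to the output width, which matches the headroom suggested above.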

Plus, with a faster video clock, one could have a crisper text mode, so video information coming from the machine would be treated as 6.25 Mhz while internal data could be treated as 12.5. So the internal character set could be a higher resolution than what the Gigatron provides.

I don't know how feasible a hardware RNG would be. I know this sounds a bit like feature creep. The memory scouring software technique could still be used. However, a little extra circuitry might give another option. I mean, there would need to be 2 clocks. There would need to be one about 14 Mhz or lower (12.5 or 6.25 would also work) to initialize the various SRAMs, and there would need to be the system clock. So that is 2 clocks right there. Adding a PLL or clock multiplier/divider chip could add more if needed. I don't know how well it would work to XOR 2 different clocks, feed it into a shift register, and sample with a 3rd clock with no respect to domain-crossing rules. I thought of the idea of having a table in the ALU ROM. The only problem is that it would be predictable (no worse than a linear feedback shift register approach), and the numbers would only be scrambled. It could be possible that the "ALU" could fetch another number when it would be otherwise stalled. That could be used as a supplement to the RAM entropy method. It depends on what one wants to do in the ROM.
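For comparison with the table idea, this is what the LFSR baseline looks like. It is a sketch of a standard 16-bit Galois LFSR using the well-known feedback mask 0xB400; the seed value is arbitrary, and none of this is tied to a specific hardware design.

```python
def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Galois LFSR by one step (feedback mask 0xB400)."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB400
    return state


def lfsr_period(seed: int = 0xACE1) -> int:
    """Count steps until the register returns to its seed."""
    state = lfsr16_step(seed)
    steps = 1
    while state != seed:
        state = lfsr16_step(state)
        steps += 1
    return steps


print("period:", lfsr_period())
```

The sequence visits every nonzero 16-bit value once before repeating, so it is fully predictable. XORing its output with whatever the RAM entropy method produces would be one way to use it as a supplement rather than a replacement.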

Something cute, though I likely wouldn't consider adding it unless there is demand, would be "TV emulation." So the video controller could delay in using the memory contents and use an LFSR or a table to produce "snow." The LFSR (or the noise sample) could be used to create audio white noise. And if one were into details, being able to send 15.75 Khz out another sound channel would be neat. Shoot, maybe even add a small amount of 50-60 Hz hum. So when you turn it on, you could have a more retro experience. Going with that theme, one could even add some I/O and/or typing noises. On the Atari 800, for instance, there were keyboard chirps and disk I/O noises, perhaps produced by the PIA chip (the Pokey was used for actual sound). The PIA was a couple of shift registers and timers with the ability to make IRQs. The VIA (Commodore used that) was a more advanced PIA. The VIA (as suggested by WDC) was geared more to 16-bit machines, but plenty of 8-bit machines used it.

If anyone has suggestions for extra features, let us know. What is mentioned above is more of a wish list for extra features. The only real must-have in this category would be enhanced storage/memory/keyboard. Everything else is optional.

Questions and Considerations

I'm not sure I am up to the task, but it sounds like it could be fun. I know next to nothing about SMT. Obviously, the voltages of the chips used need to be taken into account, and levelers or other parts used to match things. For some things, resistors with occasional Zeners could be enough, but bidirectional traffic will need level shifters. It is best to shoot for a frequency that is a multiple of 6.25 Mhz (or of a larger step such as 12.5 or 25), and slightly faster should be fine. The 6.25 Mhz is slightly slower than standard, and Marcel dealt with that by making the porches a tad smaller. That is why the vertical refresh is slightly under 60 Hz.

I do have many questions and design considerations that I'm unsure of, but I might want to start a thread for those since they would have more value as general reference material applying to anyone wanting to modify, respin, or create peripherals. Like asking some vCPU, LDR, Pluggy, RNG, sound, and I/O Expander questions. That would be more useful in making a ROM than building new hardware.

As I said, I might not be up to the task, but if I start it, I'd need help. The areas where I'd likely need help would be part selection, board design, schematics software, and SMT. As Walter suggested before, Hackaday is probably more suitable for this. I might start a page there and if anyone wants to join as a "team member," I'd gladly add them.

Posted: **31 Jan 2022, 23:10**

Any thoughts or comments? Any considerations I need to take into account?

I guess I'd need to study all the instructions. In a way, the Native would then work like vCPU, like a LUT. Then for the instructions not used, come up with tentative replacement instructions and work out how many control lines are needed.

I did discover that finding suitable memory for doing it my way is next to impossible. Anything faster than 7-8 ns will likely be synchronous SRAM, BGAs, DDR/QDR, with more control lines than I'm used to. I don't have experience with any of that tech. I might be able to find 10-15 ns in the sizes/dimensions I'm looking for. So I might need to be less ambitious and shoot for 62.5 to 75 Mhz. Much faster SRAM is available (maybe 300 ps), but I don't know how to use it. When you get to 300 Mhz or so, you end up working with synchronous DDR/QDR SRAM if you still want to use SRAM.

So if anyone wants to collaborate or help with making a BOM and schematics, I could use it. The chip shortage situation is certainly not helping.

Posted: **01 Feb 2022, 13:58**

I have read your post carefully, but my knowledge of electronics is not enough for me to judge how feasible it is.

But I would like to put it in perspective: a Gigatron at 6 MHz vs a "Gigatron" at 100 MHz are a bit like an Intel 8086 vs an Intel Pentium (P5) meaning 15 years of technological progress gap.

On one hand, you will probably have a hard time finding a set of discrete components that will run together at 100 MHz; on the other hand, it is possible to implement a Gigatron on an RPi board at a clock frequency of several hundred MHz, with HDMI output.

Posted: **01 Feb 2022, 14:39**

Just my 2 cents, but here's my guess as to why this topic hasn't generated more responses... At what point does it cease to be a Gigatron? For me personally, the appeal of the Gigatron is that it's simple enough that I still have hope of understanding what's going on and yet there is enough of an ecosystem in place that you can do some really cool stuff for just a handful of chips. While what you're proposing sounds very interesting and I look forward to seeing its fruition, much of it is over my head and I suspect that the moment you lose code compatibility you will lose access to said ecosystem and subsequently the attention of some folks on this forum as it is no longer a Gigatron. You will probably have a larger audience on Hackaday (or AnyCPU or VCF) like Walter suggested as a novel 100MHz TTL computer that is Gigatron inspired. If you broke down the changes into smaller chunks while maintaining code compatibility and take a more evolutionary (meaning folks can modify their existing boards to replicate your changes) vs revolutionary (ie requiring a totally new motherboard design) approach, you might see more participation from others on this forum.

Posted: **02 Feb 2022, 03:01**

Thank you both for the perspective. I think we could actually go even faster than 100 Mhz with the memory in the ps range, but I wouldn't know how to wire it up. When it goes that fast, the throughput is faster than the clock rate, it is synchronous, and there are more control lines. As far as that goes, going from 8 to 16-bits wide with memory introduces more control lines.

The reason for building this is like building anything else. I mean, why have a Gigatron at all if you can make an RPI/ARM run things faster? In the case of the original Gigatron, we can argue history or nostalgia. But mine would have other reasons I guess since I'd extend the native set directly. And I'd want to try to figure out how to incorporate the add-on boards.

At that speed, I don't know how I'd be able to build an asynchronous bus snooper due to crossing clock domains. I'd rather not have to cross any, but if I wanted to put the snooper on a Digilent A7 board, it would have to be async. Even if I could use clock tiles to go 100 Mhz or faster, it could never match the main clock. Of course, one solution would be to install the FPGA directly onto the board and add a USB UART or a JTAG plug. Then a PLL or clock multiplier chip could sync them or add the desired skew. That would be ideal. Of course, at that speed, bit-banging would be fine.

Now, as for understandability, what I propose is actually easier in some ways, though you miss some of the educational value. Plus it is the very approach that Marcel didn't want to take. The FAQ has the question, "Why not just use a great big ROM?" In this case, it would be multiple ROMs that have been copied to SRAMs.

The control unit would be only memory. You send the opcodes in as addresses and get control signals out the data lines. And of course, you'd need software, even a BASIC program, to create the ROM image for you. The ALU would be a ROM too. That is simple. Just use the two 8-bit operands and the operation control lines for the addresses; use the data lines as the result (and any flags). Other than that, shadow the ROMs before starting the CPU, and have 4 pipeline stages. That just means to separate the stages with registers. A register is always a cycle behind. I don't know how having 4 stages would affect trampoline code. And you'd have 3 delay slots.
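The "a register is always a cycle behind" point, and the 3 delay slots, can be seen in a toy model where each pipeline register is one slot in a shift queue. This is only a sketch of the timing; it does no decoding or execution, and the instruction names are placeholders.

```python
from collections import deque

STAGES = ("Fetch", "Decode", "Access", "Execute")


def run_pipeline(program, cycles):
    """Return (cycle, instruction in Execute) for each clock cycle."""
    # One slot per stage; appendleft on a full deque drops the oldest entry.
    pipe = deque([None] * len(STAGES), maxlen=len(STAGES))
    trace = []
    for cycle in range(cycles):
        fetched = program[cycle] if cycle < len(program) else None
        pipe.appendleft(fetched)
        trace.append((cycle, pipe[-1]))
    return trace


program = ["i0", "i1", "i2", "i3", "i4"]
for cycle, executing in run_pipeline(program, 8):
    print(cycle, executing)

# Instructions already fetched behind one that is in Execute:
print("delay slots:", len(STAGES) - 1)
```

The first instruction reaches Execute on cycle 3, by which time three later instructions are already in flight. If a branch is resolved in Execute, those three are the delay slots, and trampoline code would have to account for them.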

As for compatibility, the goal is to keep vCPU-level compatibility. Just imagine a system call getting a hardware multiplier. That and the clock rate could make a blindingly fast Mandelbrot. And yes, throttling would be needed for compatibility. On the vCPU side, there would be no changes other than speed. More pipelines, more speed, and more native instructions won't break vCPU, but that would need a new ROM.

Posted: **02 Feb 2022, 04:18**

As for a new board, it can't be avoided. Too bad the Gigatron didn't start as a backplane board. That would be neat as you could replace sections of functionality at a time.

The incremental approach is also inefficient. For instance, let's say you need higher clock rates on other boards. Then you'd need a multiplier or PLL on every board that needs that. And TBH, that isn't a bad approach. If you use the same multiplier/PLL chips then each board would have a clean signal.

You couldn't change the entire ALU on the existing Gigatron with a high-speed one. Changing the adders would be about the best you can do. And the same for the control unit as it is distributed over much of the board. I wouldn't know how to drastically speed up the existing CU. I don't even understand it. But a lookup table in a RAM copied from a ROM to drive control lines seems simpler and makes more sense to me. The biggest speed hit in the existing CU to me is the use of all the decoders, with at least 1 being cascaded from another. That gives the same type of performance hit that you get in the ALU with the cascaded adders.

And if by chance you could add a ROM+SRAM ALU or control unit to the Gigatron, there would need to be lots of bodges, and for the necessary boot delay, I guess one would tie into the power supervisor line or something.

So my goal is to have a more "finished" Gigatron from the start. So more speed, more native instructions, better I/O, etc. So it would be nice to have all the functionality of all the add-on boards. And when more compatibility is needed, the ROM can take care of that.

Posted: **02 Feb 2022, 10:03**

I cannot speak for the two fathers of the Gigatron themselves, but my guess is that the Gigatron is the way it is because it is a trade-off between multiple goals: making a TTL computer, making it as easy as possible to understand and to build for almost anyone, keeping costs reasonable, and making a beautiful object that is almost a piece of art (the original kit was sold with a lovely wooden frame). It is also meant to be educational, both while you build it, thanks to an incredible manual, and while you look at it, because each section's purpose is written on the PCB, you can follow the busses on the PCB, etc.

I discovered the Gigatron only recently, at the end of 2021, and it is for those reasons that I ordered the Budgetronics kit. I am now building it and exchanging with the community here.

Posted: **03 Feb 2022, 03:45**

I wasn't really speaking for anyone, just paraphrasing their own words. One of Marcel's philosophies was to not use hardware unless you had to. Plus there is what he said in the FAQ about not wanting to build a machine entirely out of memory. The memory was too slow and expensive for use back in the day.

And I agree with Zebulon in that you can't get much more builder-friendly than the Gigatron. What I propose requires learning new skills. I am right at 50 and my eyes aren't working any better than a couple of decades ago. So having to build things that require a microscope will be a challenge. Every once in a while, things amaze me in that regard, like thinking I saw a pixel crawling up the screen with a "spidery" type motion. I knew that was impossible, and a hunch told me that was a mite. Looking at the types of mites, I found the most likely type, a common dust mite. I was amazed I could see that. So if I can identify a near-microscopic arachnid, then maybe I can work with SMDs.

As for the artistic comments and attention to detail, I agree. I've heard the term "German engineering" in television ads, and yes, that is true. Imagine using things efficiently for multiple purposes and still having them look like works of art. The silk-screening and the way the diodes are arranged are neat touches and help understandability. I still don't get the ALU. I get that the multiplexers form the logic portion and the adders form the arithmetic portion. But as for how the multiplexers do logic, I don't grasp that. I get the subtraction part -- just invert (NOT) the 2nd number and add with the carry-in line set. (-A = (!A+1)).
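On how multiplexers can do logic: the usual trick is to feed the two operand bits into the select inputs of a 4-to-1 multiplexer and put a 4-bit truth table on its data inputs, so the control lines choose the function. The sketch below shows that general technique along with the subtraction identity; it is not traced from the actual Gigatron schematic, and the names are made up.

```python
def mux4(truth_table: int, a_bit: int, b_bit: int) -> int:
    """4-to-1 multiplexer: operand bits select one of four data inputs."""
    return (truth_table >> ((a_bit << 1) | b_bit)) & 1


# Data-input patterns, indexed by (a, b) = 00, 01, 10, 11 from bit 0 upward.
AND_TT = 0b1000
OR_TT = 0b1110
XOR_TT = 0b0110


def logic8(truth_table: int, a: int, b: int) -> int:
    """Apply one mux per bit position across an 8-bit word."""
    out = 0
    for i in range(8):
        out |= mux4(truth_table, (a >> i) & 1, (b >> i) & 1) << i
    return out


def sub8(a: int, b: int) -> int:
    """A - B as A + NOT(B) + 1, the carry-in supplying the +1."""
    return (a + (~b & 0xFF) + 1) & 0xFF


print(hex(logic8(AND_TT, 0xF0, 0xCC)), hex(logic8(XOR_TT, 0xF0, 0xCC)))
print(sub8(5, 3), sub8(3, 5))
```

Changing the four data inputs changes the function without touching the wiring, which is why a handful of control lines can select AND, OR, XOR, or a pass-through of either operand.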

I wish that I had gotten the Budgetronics kit when it came out since the global chip shortage continues. One of the largest foreign foundries might have a new fab by next year or so, and COVID is blamed in part because more people are working or learning from home. The world has had such shortages before, but not this bad. In the late '80s or so, only RAM was affected, perhaps due to a fire.

Posted: **03 Feb 2022, 17:11**

??? It is in stock:
https://www.budgetronics.eu/en/building ... a-25779-20

Posted: **08 Feb 2022, 17:02**

Okay, I think I got confused. The one with the purple PCB has gone up.

Where I could use help with my idea is in finding memory that matches my specs. That is harder than I thought. ROMs with 21 address lines are hard to find, marked obsolete, and likely only 90 ns. You can likely get them on eBay. And I'm not sure what to do about the control unit LUT or how many data lines I'd need. I think there may be some 1K ROMs out there.

As for the various SRAMs, 10-15 ns might be the fastest for certain sizes. If 15 ns is the fastest I can commonly get, then I guess clock it at 62.5 Mhz since 66.6 would be the fastest possible like that. That would still leave 9 instructions between pixels. I'd like to be on some even boundary of 25 Mhz (or 25.1) if possible. If not, then a boundary of 12.5, and if not, a boundary of 6.25 Mhz.

Who wants to see a 100 Mhz Gigatron?
