It's been a long time since I've looked at the Gigatron, but it keeps calling me back. I want to make a version that removes the things I didn't really like about it, but wanted to hear what people here thought.
I want a von Neumann architecture so I can run native assembly out of RAM (no vCPU reliance). This opens the possibility of expansion slots for hardware with their own memory space. I also didn't like counting lines for racing the beam in native assembly, so I want to include an interrupt and an external timer. Then you shove the video drawing into an interrupt service routine and never have to worry about counting lines again. And of course, I still want to get a native keyboard (no pluggy).
I laid out a design, but it adds 10 more chips. It keeps the same instructions with two exceptions. First, the input register is no longer put on the data bus; it becomes a peripheral in the memory space, and the Y register can be put on the data bus in its place. This is required for saving the register values during an interrupt anyway. Second, I have to add a return-from-interrupt instruction.
Of course this means software is not really compatible anymore. It might not take a lot to convert some software, but I just wanted to make native assembly more fun to write. What do people here think about this? Do you just prefer to write in vCPU or BASIC? Is it worth the 10 extra chips?
New Gigatron architecture idea
Re: New Gigatron architecture idea
I read your post, I understand your wishes.
I met Marcel in September 2019 in Cambridge UK, for a vintage computer festival. We presented the Gigatron to the UK computer enthusiasts.
Marcel and I worked on a superfast Gigatron 12.5 MHz.
IMHO, please don't try to re-invent the Gigatron. Leave it as it is, as a lasting memorial to its creator, Marcel van Kervinck.
If you have ambitious ideas, then post them here, but leave the Gigatron for what it is, so everyone can enjoy it and Marcel's talent.
Re: New Gigatron architecture idea
I agree. The Gigatron and Marcel are a great inspiration. And they are not the only ones that I draw inspiration from. Even the Gigatron includes ideas from other designs. For example, the ALU design did not originate with Marcel, and other CPU designs share the same ALU. I think it's the choice of which ideas to draw from that creates something new and unique. The Novasaur, the Kobold, the Isetta, Eater's CPU: they are all very similar because they're all gate-level CPU designs with graphics outputs. But they are still all very different. I don't have a name for mine just yet, but I'll let you know when I do.
Re: New Gigatron architecture idea
Are you familiar with the Minimal-64x4 project?
https://github.com/slu4coder/Minimal-64x4-Home-Computer
https://www.youtube.com/watch?v=L1oECH6rPvs
Re: New Gigatron architecture idea
I think that for me a lot of what is interesting about the Gigatron, and would thus make something "Gigatron-like" is exactly the sort of thing you might be trying to get away from.
I feel Marcel found a point of balance, where he could use a relatively small amount of logic and yet get a machine that was relatively fast and powerful. I imagine that most small changes would result in something that was both bigger and worse. Quite how much luck or judgement were involved, I don't know.
Programming the Gigatron often involves quite a lot of "Sudoku solving", but I think the quirky architecture throws up solutions as often as problems. A good example of this is the way that the combination of the Harvard architecture, pipeline without hazard detection and segmented memory map allows for the right-shift table, which would be much slower and more space intensive otherwise. Apparent shortcomings that actually go a long way to compensate for another apparent shortcoming (no right-shift instruction).
A good point for discussion might be what the essential Gigatron traits are, and which are incidental. For example, I definitely consider the Harvard architecture an essential feature, but the XOUT register is definitely an implementation detail.
Re: New Gigatron architecture idea
I'll touch on what is already here and then post some ideas. I use "Gigasimilar" to refer to homebrew ideas/projects that heavily borrow from the Gigatron. We really shouldn't try to promote something as a "Gigatron 2.0" or similar since you cannot significantly "improve" on the Gigatron and still have a Gigatron. We have the effort and sacrifice of Marcel. Most of us didn't know about his health, and he pushed through. He wanted to leave us a legacy, and we should respect that.
Touching on the first post
For the Gigatron, there isn't that much difference between Von Neumann and Harvard really. If you remove most of what is in ROM and replace it with only a vCPU implementation, and do the rest in a "classical" manner and not use multiplexing, then you would have a VN machine with little effort. The ROM would function as microcode. As for interrupts, while it's ideal to use specialized hardware, you could repurpose the Input port (use DMA or I/O mapping to add one) and poll it each instruction. Leave the function calls, though some would need to be modified for the new usage.
Better Hardware vCPU Support
Building on the above, while you can use most of the Gigatron as-is as a VN machine, there are things one can add. One of the bottlenecks, of course, is having to bit-bang the peripherals, but so is the overhead of all the context-switching and dispatching. So avoiding bit-banging not only increases the net speed, it also improves the gross speed in that overhead is not spent changing contexts, scheduling, and dispatching.
Another bottleneck is the lack of a true ADC (and I guess SBB) instruction. Unlike the 6502 which uses ADC and requires CLC if you want to use ADC as ADD, the Gigatron has the opposite issue that is harder to manage. There is an ADD instruction and no ADC. So there is no way to propagate the carry for multibyte additions, and that contributes to the vCPU bottlenecks. The ADDW instruction and likely the vPC suffer from needing to do multi-byte additions another way than expected.
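To make the cost concrete, here is a rough sketch of what a multi-byte add looks like when there is no carry flag to lean on. It is only an illustration in Python, not what the Gigatron ROM actually does: the function names are made up, and the point is that the carry has to be reconstructed from the operands and the sum with extra logic operations.

```python
def add8_with_carry(a, b, carry_in=0):
    """Add two bytes and recover the carry without a carry flag.

    Hypothetical helper: uses only add, AND, OR and NOT on byte values.
    """
    s = (a + b + carry_in) & 0xFF
    # Carry out of bit 7: both top bits were set, or one was set
    # and the top bit of the sum ended up cleared.
    carry_out = (((a & b) | ((a | b) & ~s)) >> 7) & 1
    return s, carry_out


def add16(x, y):
    """16-bit add built from two 8-bit adds plus the recovered carry."""
    lo, carry = add8_with_carry(x & 0xFF, y & 0xFF)
    hi, _ = add8_with_carry((x >> 8) & 0xFF, (y >> 8) & 0xFF, carry)
    return (hi << 8) | lo
```

With a real ADC the second half would be a single instruction; here the carry recovery alone costs several logic operations per byte.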
Then there is the jumplist issue. You cannot currently use all the virtual opcode slots in the map. You can't even make a decent trampoline (in the general sense, not the Gigatron sense) since there is not enough room. Also, jumping to another jump isn't efficient. So what if you could jump a whole paragraph or two at a time in the ROM? The solution I see is the ability to have 256 paragraph jumps: AC or the immediate field supplies the middle 8 bits of the address, the lowest 4 bits are always set to 0, and the remaining top 4 bits come from the top half of Y. So I'd like to see a native paragraph jump instruction. That way, you can write most of your vCPU handlers as inline code and make every potential opcode available without prefixes. (If there is a significant number of handlers where this isn't enough, you could make it double-page addressable. If it is just a few, then reserve enough room to jump elsewhere.)
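As a sanity check on the address math, this is how I picture the target of such a paragraph jump being formed. It is a sketch that assumes a 16-bit ROM address and 16-word paragraphs; `paragraph_jump_target` is just a name I made up for the illustration.

```python
PARAGRAPH_SIZE = 16  # words per paragraph, since the low 4 address bits are forced to 0


def paragraph_jump_target(y, operand):
    """Form the 16-bit ROM address of a hypothetical paragraph jump.

    bits 15..12: top half of Y
    bits 11..4 : AC or the immediate field (the vCPU opcode)
    bits 3..0  : always 0
    """
    return ((y & 0xF0) << 8) | ((operand & 0xFF) << 4)
```

For example, `paragraph_jump_target(0x3A, 0x5C)` gives `0x35C0`. All 256 opcodes land on distinct 16-word slots inside one 4K-word block, so no opcode slot is lost to alignment.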
Another thing that could help would be more registers. Even if you keep bit-banging, you can save context swap time this way. But we must be reasonable since, like adders and decoders, muxes don't scale well. One way to visualize a multiplexer is to decode the selector lines, AND them with their corresponding inputs, and OR the results of each stage. The more selectors mean more decoding circuitry to signal a 1 when only the desired combination is selected, and the longer the OR chain. (I know this is an over-simplification, and it might use the opposite type of logic, since negative logic may be cheaper and faster.) So you'd likely use tristate buffers instead of muxes to add more registers, though you have to be mindful of parasitic capacitance.
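The decode/AND/OR picture can be written out as a tiny model. This is the same textbook simplification as above, one bit wide, with invented function names; real 74-series muxes are built differently inside.

```python
def mux(inputs, select):
    """n-to-1 multiplexer, one bit wide, modelled as decode -> AND -> OR."""
    # Decoder: exactly one enable line is 1, the one matching the selector.
    enables = [1 if i == select else 0 for i in range(len(inputs))]
    # AND each input with its enable line.
    gated = [e & x for e, x in zip(enables, inputs)]
    # OR everything together.
    out = 0
    for g in gated:
        out |= g
    return out


def gate_estimate(select_bits):
    """Rough size of the sum-of-products form for a given selector width."""
    n = 2 ** select_bits
    return {
        "inputs": n,
        "and_gates": n,                   # one per input
        "and_fan_in": select_bits + 1,    # selector lines plus the data bit
        "or_fan_in": n,                   # one wide OR (or a chain of them)
    }
```

Going from 2 to 3 selector lines doubles the AND gates and the OR width, and that is per data bit, which is why tristate buffers on a shared bus start to look attractive.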
If Speed Is All That Matters
If you really want to overclock things and beat the record of around 15 MHz, you could first start with the usual stuff: a faster crystal, a 4-layer board with ground planes, BAT42/BAT43/Toshiba diodes, lower-value resistors, 74F or 74ACT components, etc. You could also mod the ALU with another adder and at least one mux. Now, one more step would be to gut most of the control unit, add 1-2 more ROMs, and add just as many flip-flops as new ROMs. Then each "instruction" would be a set of control signals instead of a numbered opcode. When clocked at the normal speed, it should work the same as before. Just write a program to read the existing ROM file and convert it to your control signal matrix format. And you will actually have more instructions available at the discrete signal level, and you might find some that make the ROM code faster.
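The conversion program could start out something like the sketch below. It assumes the usual Gigatron opcode split of 3 bits operation, 3 bits mode and 2 bits bus; the control word layout (one-hot operation, one-hot bus enable, mode passed through) is entirely made up for illustration and is not a worked-out control unit.

```python
OPERATIONS = ["LD", "AND", "OR", "XOR", "ADD", "SUB", "ST", "BCC"]
BUSES = ["D", "RAM", "AC", "IN"]


def split_opcode(opcode):
    """Split an 8-bit opcode into (operation, mode, bus) fields."""
    return (opcode >> 5) & 7, (opcode >> 2) & 7, opcode & 3


def to_control_word(opcode):
    """Map an opcode to a hypothetical wide control word."""
    op, mode, bus = split_opcode(opcode)
    word = 1 << op            # bits 0..7  : one-hot operation select
    word |= 1 << (8 + bus)    # bits 8..11 : one-hot bus driver enable
    word |= mode << 12        # bits 12..14: mode, left encoded here
    return word


def convert_rom(opcodes):
    """Convert a sequence of opcode bytes into control words."""
    return [to_control_word(o) for o in opcodes]
```

A real version would also have to expand the mode field, since its meaning depends on whether the operation is a store or a branch, and it would have to split the result across the extra ROMs.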
My Previous 4-Stage Pipeline Idea
Now, I had considered a 4-stage pipeline, hoping that 100 MHz was possible. I now figure that with existing parts you can get, maybe 25-37.5 MHz is more reasonable. The 4 proposed stages were Fetch, Decode, Access, and Execute. But I found a problem. You have to think about how neighboring instructions interact with one another in a pipeline. See, I was planning on reading/writing the SRAM in Stage 3 and making stage 4 have the ALU. But I recently realized that would cause a race condition and would almost never work. If you compute a value in Stage 4 and write it during the next instruction, the asymmetry here would cause the Accumulator to be flushed a cycle early (while the ALU is working on the needed value). So maybe the way to mitigate it is to force writing in Stage 4 (instead of using the ALU for computations). Reading in Stage 3 makes sense so the ALU stage can read this out of the pipeline. But writing there will likely be a hazard.
Re: New Gigatron architecture idea
I've been refining my thoughts.
Hardware vCPU Support, Continued
If you want to make a machine like a Gigatron closer to Von Neumann, one of the first steps would be to make the rest of the machine not rely on bit-banging. After designing for that, give a Giga-similar machine a ROM paragraph jump.
It would be nice to give the vCPU its own program counter that can be treated as another register, with vPC++ and instructions to move values in and out of the vPC registers. I don't think writing to RAM at vPC would be of much use, but indirect reads would be handy. So make it so that when the paragraph jump uses vPC to read the RAM and jump to the ROM at that location to execute the vCPU handler, it can also enable the post-increment. So with inline code arranged as a microcode jump table, do you even need a main interpreter loop at all? It can be like a self-contained state machine. When the last instruction is reached in a handler group, it could simply jump to RAM@vPC++ in ROM. ChatGPT gave me that refinement to try.
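Here is a toy model of that idea, just to show the control flow. Everything in it is hypothetical: the three opcodes are invented (they are not real vCPU opcodes), and `fetch_jump` stands in for the proposed "jump to ROM paragraph RAM[vPC++]" instruction. The `while` loop in `run` plays the role of the clock, not of an interpreter loop; each handler picks its own successor.

```python
class Machine:
    def __init__(self, ram):
        self.ram = list(ram)
        self.vpc = 0
        self.acc = 0
        self.halted = False

    def fetch_jump(self):
        """Read RAM[vPC], post-increment vPC, return the ROM paragraph address."""
        opcode = self.ram[self.vpc]
        self.vpc = (self.vpc + 1) & 0xFFFF
        return opcode << 4


def op_load_imm(m):
    m.acc = m.ram[m.vpc]                   # operand byte follows the opcode
    m.vpc += 1
    return m.fetch_jump()                  # handler ends by dispatching the next opcode


def op_add_imm(m):
    m.acc = (m.acc + m.ram[m.vpc]) & 0xFF
    m.vpc += 1
    return m.fetch_jump()


def op_halt(m):
    m.halted = True
    return 0


# ROM modelled as paragraph address -> handler (made-up opcodes 1, 2, 3).
ROM = {0x01 << 4: op_load_imm, 0x02 << 4: op_add_imm, 0x03 << 4: op_halt}


def run(program):
    m = Machine(program)
    address = m.fetch_jump()
    while not m.halted:                    # stands in for the clock ticking
        address = ROM[address](m)
    return m.acc
```

Running `run([0x01, 40, 0x02, 2, 0x03])` loads 40, adds 2 and halts with 42 in the accumulator, without ever returning to a central dispatcher.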
Interrupts and DMA
If you use memory-mapped addresses, DMA, bus-sniffing, etc., then you could sacrifice the ports and repurpose them. One low-hanging fruit would be the input port. And you can find a way to mux it rather than give it up completely. For instance, the game and keyboard logic could use an interrupt signal on the In port and then let you read the same port for the data. Or use another mechanism. But the idea is to use the port to signal the IRQ number. There can be a separate IRQ signal line, and with some logic it could drive an address pin on the ROM under certain conditions. So you can then have a performance vCPU handler set and, in another segment, an interrupt-mode handler set. The interrupt version would jump to a central handler, check for interrupts, and branch to them as needed. The interrupt signal should only take effect between instructions. You do the paragraph jump in ROM and can end up in one of three places: the performance vCPU handlers, the interrupt/DMA version of the handlers, or the interrupt-mode central handler loop.
DMA can be easier. The DRQ line could enable the slower vCPU mode with the IRQ/DMA polling. When the handler code detects the DMA signal, it sends an ACK signal and "halts" the CPU (in a spinlock), and the device is free to unlatch the RAM from the CPU, manipulate it, and let go of the DRQ/halt line. So there is no need to mess with the clock or the native PC, only to put the ROM in a polling loop.
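The "IRQ line drives a ROM address pin" part can be sketched as plain address arithmetic. The bank addresses and the choice of bit 12 are assumptions picked so that each handler set fills one 4K-word block of 256 paragraphs; nothing here is a worked-out design.

```python
FAST_BANK = 0x0000      # performance vCPU handlers
POLLING_BANK = 0x1000   # hypothetical: same handlers, each ending in the IRQ/DMA check


def handler_address(opcode, irq_line):
    """ROM address reached by a paragraph jump, with the IRQ line wired to bit 12."""
    base = POLLING_BANK if irq_line else FAST_BANK
    return base | ((opcode & 0xFF) << 4)
```

So `handler_address(0x5C, False)` is `0x05C0` and `handler_address(0x5C, True)` is `0x15C0`: the vCPU program is unchanged, and the pending request only changes which copy of the handler the next dispatch lands in.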