I've thought of other ways to mod the Gigatron. For instance, with an I/O expander board, why not connect the ports to it and let it provide the keyboard, video, and sound? The ports could be repurposed in the ROM to send and receive commands to and from the expander/controller. You could even emulate interrupts: every so often, the ROM reads the In port, takes the data from it, and uses the jump-to-address trick to go to the relevant handler. A device could even request a functional "halt" or DMA access this way. It sends a signal requesting the halt/DMA time, the ROM sees that, maybe outputs an acknowledge code, and then spins reading the In port until a clear signal arrives. The ROM leaves the RAM untouched during this time, so external devices are free to manipulate it.
The controller would be free to snoop the bus for the video and sound, understand the indirection table system, provide its own syncs, accept input and place it directly into RAM, produce its own sound and video, provide file I/O, and more. All Pluggy and Pluggy reloaded functionality could go on that board. So you get the much wider parallel pipe and microcontroller assistance in one place, with an outside controller that works mostly out of memory.
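To make the halt/DMA handshake described above a bit more concrete, here is a rough Python model of one polling point in the ROM loop. The command codes, the acknowledge value, and the function names are all made up for illustration; nothing like this exists in the Gigatron ROM today.

```python
# Hypothetical command codes; the idea above does not define any.
CMD_NONE, CMD_HALT_REQ, CMD_CLEAR = 0x00, 0x01, 0x02
ACK_HALT = 0xA5  # made-up acknowledge value driven on the Out port


def rom_poll(in_port, out_port, max_spins=1000):
    """Model one polling point in the ROM loop.

    in_port  : callable returning the byte currently on the In port
    out_port : callable that drives a byte on the Out port
    Returns how many polls were spent halted (0 if no halt was requested).
    """
    if in_port() != CMD_HALT_REQ:
        return 0
    out_port(ACK_HALT)             # tell the controller the RAM is free
    spins = 0
    while in_port() != CMD_CLEAR:  # spin: touch nothing but the In port
        spins += 1
        if spins >= max_spins:     # safety net for the model only
            raise TimeoutError("controller never sent CLEAR")
    return spins


def make_controller(script):
    """Fake controller that replays a fixed list of In-port values."""
    values = iter(script)
    return lambda: next(values)


sent = []
controller = make_controller(
    [CMD_HALT_REQ, CMD_HALT_REQ, CMD_HALT_REQ, CMD_CLEAR])
spins = rom_poll(controller, sent.append)
```

In real native code the spin loop would have to fit the ROM's timing, which this model ignores. It only shows the order of events: request, acknowledge, spin, clear.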
If you can move all the bit-banging to a controller board, you would be free to clock the base machine at any speed you want without new ROMs. The controller could update at least the vertical sync in memory, or otherwise make the base machine aware of when it changes, so that software relying on a real-time clock would still work. That would also allow dynamic profiling on boot to learn how many machine cycles there are per frame, and maybe per raster line as well. The ROM would then be free to alter its behavior dynamically based on the speed difference between the syncs and the base machine.
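The boot-time profiling could look something like the sketch below. It assumes the controller keeps a frame counter somewhere the base machine can read; `read_frame_counter` and `fake_frame_counter` are hypothetical stand-ins, and the result is counted in polling-loop passes, which the ROM would multiply by the known cycle cost of one pass.

```python
def measure_loops_per_frame(read_frame_counter, max_loops=1_000_000):
    """Count polling-loop passes between two frame-counter changes."""
    first = read_frame_counter()
    current = first
    # Phase 1: wait for a frame boundary so counting starts aligned.
    for _ in range(max_loops):
        current = read_frame_counter()
        if current != first:
            break
    else:
        raise TimeoutError("frame counter never changed")
    # Phase 2: count passes until the next boundary.
    boundary = current
    loops = 0
    for _ in range(max_loops):
        loops += 1
        if read_frame_counter() != boundary:
            return loops
    raise TimeoutError("frame counter stopped changing")


def fake_frame_counter(reads_per_frame):
    """Stand-in for the controller: the counter ticks every N reads."""
    reads = 0

    def read():
        nonlocal reads
        value = reads // reads_per_frame
        reads += 1
        return value

    return read


fast_machine = measure_loops_per_frame(fake_frame_counter(400))
slow_machine = measure_loops_per_frame(fake_frame_counter(100))
```

A faster base clock simply gets more passes in per frame, and that number is what the ROM would use to scale its timing.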
I haven't forgotten about my 75+ MHz Giga-similar machine idea, but I likely will never do it. It does sound neat: a 3-4 stage pipeline and more native instructions like full shifting, multiplication, division, random numbers, more registers, some native 16-bit support, etc. The stages would be Fetch, Decode, Access, and Execute. Access would come before Execute because instructions only ever modify values that were read, never values being written. So if you need to write/store, that could be done in the next instruction, using its Access stage. What would also be neat is an auxiliary ALU in the Access stage, to use that slot another way when RAM is not needed. So you could natively do 16-bit addition/subtraction/logic, but only on registers. The extra "ALU" could also provide random numbers when it is not being used for instructions and allow the result to be manipulated in the next pipeline slot (such as inverting it or adding an offset).
Additional registers would be needed if you still want to do bit-banging at 75+ MHz. That way, the video thread and the vCPU thread would both have their contexts live at the same time and could switch without penalty. So you can have 1 clock for a pixel, 11 instructions for vCPU, then output the next pixel, etc. At such speeds, you really don't need much external assistance, as you'd have more power than you need. But my idea for how to make fast, custom control units would be costly, inefficient, and require SMD parts for most things. The idea would be to use memory for the CU and the ALU(s): copy from ROM to fast SRAMs on boot and use LUTs for everything.
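A quick check of the 1 + 11 slot split, assuming a full pipeline that retires one instruction per clock and a clock of exactly 75 MHz (the function and parameter names here are just for illustration):

```python
from fractions import Fraction


def slot_budget(clock_hz, pixel_slots=1, vcpu_slots=11):
    """Split a one-instruction-per-clock machine into pixel and vCPU slots."""
    period = pixel_slots + vcpu_slots
    pixel_rate = Fraction(clock_hz * pixel_slots, period)
    vcpu_rate = Fraction(clock_hz * vcpu_slots, period)
    return pixel_rate, vcpu_rate


pixel_rate, vcpu_rate = slot_budget(75_000_000)
print(f"pixel rate: {float(pixel_rate) / 1e6} MHz")        # 6.25 MHz
print(f"vCPU slots: {float(vcpu_rate) / 1e6} M per second")  # 68.75 M
```

The pixel rate comes out at 6.25 MHz, which is the stock Gigatron's clock, so the video timing would stay the same while the vCPU gets 11 slots per pixel during the visible part of a line.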
***
Moving on
The more I think about things, the more I might want to just mess with a Propeller 2 chip and make my own ISA, memory map, etc. Its 8 cogs are enough for at least one CPU, one or more coprocessors, sound, I/O, etc. But I don't know what instruction set and features I'd like to add.
Instructions and ISA
While I could use the native P2 instructions, I think it might be more fun to make my own. I don't know what all to include. Probably most of what is in the 6502 and/or vCPU instruction set, and if there is any space left over in the opcode map, things like RNG, Mult, Div, and maybe a trig function or two.
As for the ISA size, I haven't worked that out yet. I'd love to get to a point where I can use word-wide memory with 20 address lines as external memory. That sounds like a challenge. With 20 address lines, 16 data lines, and up to 5 control lines (word and wider memory add a control line per byte), that would take 41 GPIO lines out of the 56 non-shared ones (64 in total). That isn't too bad. The 15 left would go to video (5, using the built-in DAC), keyboard (2), SD (5 or whatever), and maybe sound (2). If more are needed, maybe the external memory bus could be multiplexed.
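Here is the pin budget above as a quick tally. The 16 data lines are my reading of "word memory" (41 minus 20 address minus 5 control), and the peripheral counts are the rough guesses from the paragraph, not a worked-out design.

```python
TOTAL_FREE_PINS = 56  # non-shared pins, figure taken from the post

memory_bus = {"address": 20, "data": 16, "control": 5}
peripherals = {"video": 5, "keyboard": 2, "sd": 5, "sound": 2}


def pin_budget(total, *groups):
    """Return (pins used, pins left) for the given groups of pin counts."""
    used = sum(sum(group.values()) for group in groups)
    return used, total - used


bus_used, after_bus = pin_budget(TOTAL_FREE_PINS, memory_bus)
all_used, spare = pin_budget(TOTAL_FREE_PINS, memory_bus, peripherals)
print(f"memory bus: {bus_used} pins, {after_bus} left")
print(f"with peripherals: {all_used} pins, {spare} spare")
```

So everything just fits with one pin to spare, which is why multiplexing the memory bus might be worth it if anything else gets added.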
As for the ISA, I'm not sure. If I want to use external 16-bit RAM, maybe have instructions with 8-bit and 16-bit opcodes. The byte instructions can have a byte for an operand. An 8-bit opcode with a 16-bit operand will tie up a doubleword, and I'm not sure what to do with the leftover byte: let it select one of up to 256 byte registers, make the operand 24 bits, or both. The 24-bit operand might be a good thing, since it would allow an absolute jump anywhere in the entire range as an immediate. That might be better than how Intel did things, since even protected mode didn't access memory as a flat address space. Oh, it was presented to the user like that as an emulation, but it always used segment:offset under the hood, even when the user didn't see it, and despite 32-bit operands. I'm not sure what instructions I'd like to have beyond the basics. I'd like Mult, Div, RND, bounded RND, and, due to the overhead of it being an emulation, probably block instructions and maybe loop instructions. I'm not sure if I'd want elaborate memory instructions like ternary memory ops (e.g., [mem]+[mem]=[mem]).
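One possible way to pack the 8-bit opcode plus 24-bit operand form into two 16-bit words is sketched below. The layout (opcode in the high byte of the first word, then the top 8 operand bits, then the low 16 bits in the second word) and the opcode value are arbitrary choices for illustration, not a decided format.

```python
def encode_long(opcode, operand):
    """Pack an 8-bit opcode and a 24-bit operand into two 16-bit words."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in 8 bits")
    if not 0 <= operand <= 0xFFFFFF:
        raise ValueError("operand must fit in 24 bits")
    word0 = (opcode << 8) | (operand >> 16)  # opcode + operand bits 23..16
    word1 = operand & 0xFFFF                 # operand bits 15..0
    return word0, word1


def decode_long(word0, word1):
    """Inverse of encode_long: returns (opcode, operand)."""
    return word0 >> 8, ((word0 & 0xFF) << 16) | word1


OP_JMP_ABS = 0x4C  # placeholder opcode number, not a real assignment
words = encode_long(OP_JMP_ABS, 0x0FFFFF)  # top of a 20-bit address range
print([hex(w) for w in words])
```

A 24-bit immediate covers 16M locations, so it reaches every word behind 20 address lines with room left for growth.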
Of course, I'd need to decide on what to do for sound and video. I'd want no fewer than 4 sound channels, and probably 15-18 kHz as the top frequency. For accuracy, I'd likely want to use an external crystal. Sure, the P2 has an internal clock that runs at around 20 MHz, but that varies per chip. The exact frequency doesn't matter, so long as it is known and doesn't drift. One can code the ROM to use the internal PLL and VCO to get whatever frequency is needed. And I don't know what waveforms and capabilities to provide. Obviously, I'd want square, ramp, triangle, and noise. Sine might be nice to have, as well as combination waveforms and near-instrument sounds. I might want a sound coprocessor besides just a sound generator to produce more complex sounds.
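For the basic waveforms, a phase-accumulator generator is one common way to do it. The sketch below is only a model in Python: the 8-bit samples, the 16-bit accumulator, and the 32 kHz sample rate in the example are my assumptions, and the noise source is the textbook 16-bit Galois LFSR with taps 0xB400 rather than anything P2-specific.

```python
def square(phase):
    """8-bit phase in, 8-bit sample out: low half-cycle, then high."""
    return 255 if phase & 0x80 else 0


def ramp(phase):
    """Rising sawtooth: the sample is just the phase."""
    return phase & 0xFF


def triangle(phase):
    """Rise over the first half of the cycle, fall over the second."""
    p = phase & 0xFF
    return p * 2 if p < 128 else (255 - p) * 2 + 1


def lfsr_step(state):
    """Advance a 16-bit Galois LFSR (taps 0xB400); state must be non-zero."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB400
    return state


def render(wave, freq_hz, sample_rate_hz, count):
    """Generate `count` samples of `wave` using a 16-bit phase accumulator."""
    step = round(freq_hz * 65536 / sample_rate_hz)
    phase = 0
    samples = []
    for _ in range(count):
        samples.append(wave(phase >> 8))  # top 8 bits select the sample
        phase = (phase + step) & 0xFFFF
    return samples


one_cycle = render(square, 1000, 32_000, 32)  # 1 kHz at 32 kHz = 32 samples
```

Mixing four channels would then be a matter of summing four such streams and scaling, and a noise channel could output the low byte of the LFSR state each sample.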
For video, I don't know if I want 320x240 or what. I'd want a text mode. I'm not sure what other features I'd want, but probably hardware scrolling and sprites. I don't know if I'd want to use 2 cogs for video or not. You could use 2 (preferably neighboring ones, so they can share LUT RAM) and have one for rendering/effects and one for output. Some old computers did it that way, namely the Ataris: one chip rendered on the fly and another handled the output.
Any ideas? Wishlist?