List of possible Gigatron mods
Posted: 18 Nov 2020, 19:36
Here is a list of as many possible Gigatron modifications as I can see. Keep in mind that this list is only to give ideas and inspire creativity. Nothing said here is to be taken as a criticism of the platform. The Gigatron is already beautiful as it is.
Opcode Mods
Add more memory addressing modes -- Most would love to have more addressing modes to simplify coding.
Increment Y as a carry of X -- Of course, this should not be the only option here. There is beauty in X rolling over without incrementing Y since that facilitates video scrolling. The 96 bytes past the 160-pixel limit mean that you can have extra graphics data for a row. However, if you need the pointers to act in a "far" fashion, an upper byte counter would be nice. Or add a new register that does this and give it flexible address mode options.
Modify the Ac=Ac+Ac instruction to be a full left-shift instruction -- One could make SHL 0 the same as SHL 1 to maintain compatibility. This would make unsigned multiplication easier. The immediate field could be used to specify the shift distance.
Add a Shift Right instruction -- This will make programming life easier and allow some of the most common division operations from a single instruction.
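To make the two shift ideas above concrete, here is a rough behavioral sketch. The names `shl` and `shr` are made up for illustration, and I'm assuming the current Ac=Ac+Ac encoding would decode as a shift distance of 0, which is why 0 is treated the same as 1.

```python
def shl(ac: int, distance: int) -> int:
    """Hypothetical 8-bit shift left; `distance` comes from the immediate field.

    A distance of 0 behaves like 1 so the old Ac=Ac+Ac encoding
    (assumed here to carry a zero immediate) keeps working.
    """
    if distance == 0:
        distance = 1
    return (ac << distance) & 0xFF


def shr(ac: int, distance: int = 1) -> int:
    """Hypothetical 8-bit logical shift right, i.e. unsigned division by 2**distance."""
    return (ac & 0xFF) >> distance
```

With these, unsigned multiply by 16 is `shl(ac, 4)` and unsigned divide by 4 is `shr(ac, 2)`, each in a single instruction.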
Add additional ISAs -- One could use a redundant instruction to page between instruction sets. So if you need more memory access modes than what can fit in the standard design, you could place them on another page. So you could fit 510 or more instructions in an 8-bit space. It would likely be good to have the port instructions on every page and keep the paging instruction in the same place on every page. An idea from the forum proposes converting the control unit into a module and inserting different control units.
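The "510 instructions" figure falls out of two pages of 256 opcodes with one opcode on each page given up for the paging instruction. A small sketch of that decode, assuming a hypothetical `PAGE_SWITCH` opcode value and a simple "go to next page" behavior (neither comes from the actual hardware):

```python
PAGE_SWITCH = 0xFF  # hypothetical opcode, reserved at the same place on every page


class PagedDecoder:
    """Toy model of an instruction decoder with a page register."""

    def __init__(self, pages: int = 2):
        self.pages = pages
        self.page = 0

    def decode(self, opcode: int):
        # The paging instruction means the same thing on every page.
        if opcode == PAGE_SWITCH:
            self.page = (self.page + 1) % self.pages
            return ("PAGE", self.page)
        # Everything else is interpreted relative to the current page.
        return ("OP", self.page, opcode)


def usable_instructions(pages: int = 2) -> int:
    """Each page loses one of its 256 opcodes to the paging instruction."""
    return pages * (256 - 1)
```

Keeping the port instructions identical on every page would reduce the number of distinct instructions further, but it would not change the decode scheme.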
Use the immediate field in instructions with no operands -- Instructions with no operands waste the immediate space, so it would be nice to do work in those. However, jumps and things that need the main bus would be out of the question. It would be interesting to generate sound or toggle lights while slinging pixels, for instance. Alternatively, use that space to drive a port directly or even use that area for program storage, much like steganography. So it would be possible to store additional vCPU programs in this space.
Use the Instruction field for a block copy opcode -- For storage of tables and images, one could have an opcode that specifies a block ROM to RAM copy command with an argument to specify the number of words. So up to 512 bytes or 680 packed bytes could be stored in a single block. That would mainly affect density, not speed, depending on how this is implemented. Halting may be needed as it copies, though you could implement interleaved RAM channels that are 8-bit addressed but can be made to act as 16-bits (though that would not give enough time if you store packed bytes). Or some sort of concurrent DMA could be used with very fast SRAM.
Port Mods
Add a port status/command register with supporting instructions -- This could be a command port for changing video modes, multiplexing the ports, and more. Or at least add a line to reset an external frame buffer if you use one. Then you'd have a known state. This idea might work even better if one adds a vCPU coprocessor. Then it could have 16-bit port instructions and send commands and data together. So a Harvard-like port. One bit of a higher port could be used to provide 9-bit graphics. That is something I know of no platform ever using. One could use the high bit to choose between using a 15-bit video mode or using the upper 8 bits for commands. If you want to use the port for 16-bit data transfers, one possibility would be to add an "appointment" command so the mode bit could be used for data for a specified number of cycles.
Use different port strategies -- The Gigatron lets the port hold onto a byte, so you can run other code as long as the byte doesn't need to change. There may be times when a single-shot signal is more appropriate. For instance, if you use a ported frame buffer for video, you don't want to overwrite what is in there with what is in the port if you don't want to send anything. Another possibility would be to have the port tied to a more autonomous memory unit. So when you are using only the ROM, you could have built-in DMA capabilities to send data to the port from RAM.
General Performance Mods
Use a carry-select adder arrangement in the ALU -- This involves adding a 3rd adder chip and a multiplexer. It is faster to switch which high nybble is on the bus than waiting for the high nybble adder to add the carry. One high nybble adder would get the carry signal from the ground plane, and the other would get it from Vcc. The carry-out line from the low nybble would determine which result goes on the bus. Doing this could help achieve faster clock rates.
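Here is a behavioral sketch of the carry-select arrangement described above, modeled with three 4-bit adders and a multiplexer. The function names are made up for illustration, and this only models the logic, not the timing gain.

```python
def add4(a: int, b: int, carry_in: int):
    """One 4-bit adder chip: returns (sum nybble, carry out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def carry_select_add8(a: int, b: int, carry_in: int = 0):
    """8-bit add using a low adder plus two speculative high adders."""
    lo, lo_carry = add4(a, b, carry_in)
    # Both high nybble results are computed at the same time as the low one.
    hi0, cout0 = add4(a >> 4, b >> 4, 0)  # carry input tied to ground
    hi1, cout1 = add4(a >> 4, b >> 4, 1)  # carry input tied to Vcc
    # The multiplexer: the low nybble's carry out picks the right result.
    hi, carry_out = (hi1, cout1) if lo_carry else (hi0, cout0)
    return (hi << 4) | lo, carry_out
```

The point is that the high nybble no longer waits for the low carry to ripple through an adder; it only waits for the multiplexer to switch.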
Rework the control unit to avoid using Ac on non-port Moves -- This would prevent clobbering the Accumulator during moves and create less work for the programmer in some cases. An unfortunate side-effect is that X-Out will likely stop working. One could add instructions to give another port or at least change how the Out port is multiplexed. Besides simplifying some code, another benefit could be more stability at higher clock rates since the ALU would be utilized less, reducing the critical path for some instructions and possibly reducing heat issues. It would need to use the ALU (or at least Ac) on port-related moves to ensure that the ports can still use logic ops.
Decentralize the ALU -- One could probably do some of the ALU ops faster using dedicated chips for specific functions. That could even get past the previous item's problems since the port could have its own logic unit. After all, the X-register has its own adder. So the port could have its own AND and OR gates. This would be better to attempt in FPGA.
Add another pipeline stage -- This could increase the clock rate since the execution stage is the longest. So this could entail using registers to split decoding from the ALU. That might push things closer to 25 MHz. Some quick calculations say at least 18 MHz. If a 6502 designed in TTL can do 20 MHz, I'm sure the Gigatron could be boosted to that. A side-effect here might be a second delay slot after branches. That would require a new ROM, since it would alter how trampoline code and branches work.
Video Mods
Hardware-generated syncs -- In and of itself, this can give modest gains with bit-banging since there would be fewer port instructions. You'd still need to count cycles. It doesn't particularly matter if this is clocked at 25 MHz since the Gigatron could use quadruple the pixels in both directions. Just make sure the syncs are in phase with the Gigatron. One way to do that would be to derive the Gigatron clock from the video clock. If you had 2 more registers, it would be easier to alternate between vCPU and bit-banged video since it would be easier to interleave the 2.
Do line quadrupling in hardware -- This would give slightly better performance than skipping 3 lines (due to ROM changes) and would allow sending all the lines.
Compress 4 pixels into 3 bytes in hardware -- This would allow four 64-color pixels to be sent in the time of 3. So you save a clock cycle every 4 pixels. This would be more useful with a frame buffer (and hardware syncs) since you can make all the free time contiguous.
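Four 6-bit pixels are 24 bits, which is exactly 3 bytes. A minimal sketch of the packing and of what the unpacking hardware would have to do, assuming the first pixel goes in the most significant bits (that bit order is my choice, nothing dictates it):

```python
def pack4(pixels):
    """Pack four 6-bit (64-color) pixels into 3 bytes, first pixel in the top bits."""
    p0, p1, p2, p3 = (p & 0x3F for p in pixels)
    word = (p0 << 18) | (p1 << 12) | (p2 << 6) | p3
    return bytes([(word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF])


def unpack4(data):
    """Inverse of pack4: recover the four 6-bit pixels from 3 bytes."""
    word = (data[0] << 16) | (data[1] << 8) | data[2]
    return [(word >> shift) & 0x3F for shift in (18, 12, 6, 0)]
```

For example, `unpack4(pack4([1, 2, 3, 4]))` gives back `[1, 2, 3, 4]`.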
Add a frame buffer -- This could improve performance in various ways. This would mean video persistence, so you can send a frame and use entire frames for data processing. (Actually, we already have a frame buffer, though I imagine a remote one would have some advantages. You could probably ask it to do things to free up memory accesses on the local one.)
Create a text mode -- There are many ways to do this. The simplest way would be to create a monochrome text mode and send at least 6 pixels at once (8 if you do hardware syncs). This approach would free up memory and use fewer instructions. If you want colors, add color registers or an attribute map. If you use an attribute map (one entry per character), you wouldn't want to attempt graphics while in that mode, or you'd get color-smear. If one is making a video controller, you could send it only ASCII when it is in text mode. The video circuitry could do a ROM lookup and plot the pixels according to the ASCII code.
Add higher res modes -- For instance, 320 x 240 is not far out of reach. If memory is a concern, use 4 bits for the pixels, 2 for the color, and 2 for the syncs (unless other mods are applied). Without the syncs, you could send 4 pixels, 4 colors each. To do a full 64 colors or more, you'd likely need a frame buffer with 75K RAM.
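The 75K figure is just 320 x 240 at one byte per pixel. A quick way to check that and other combinations, with `framebuffer_bytes` being a made-up helper and the assumption that pixels are stored back to back with no padding:

```python
def framebuffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed for a tightly packed frame buffer (rounded up)."""
    return (width * height * bits_per_pixel + 7) // 8


full_color = framebuffer_bytes(320, 240, 8)   # one byte per pixel
four_color = framebuffer_bytes(320, 240, 2)   # 4 colors, 4 pixels per byte
```

Here `full_color` is 76800 bytes (75 KiB), while `four_color` is 19200 bytes, the same as 160 x 120 at one byte per pixel.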
Add hardware sprites -- If you are making a video card and using a frame buffer, you could do sprites in hardware, especially if using programmable logic. What would be nice would be a sprite "chase" mode. What if one sprite could use the frame buffer as a map so that it only travels through reachable transparent/background areas on its way to another sprite (with non-colliding behavior regarding other sprites in the same layer) and somehow report its position to code? I wonder what that could do for Pac-Man if the ghosts were managed in hardware.
Use hardware scrolling -- With a frame buffer, this could be a matter of virtualizing the video RAM addresses. Just change the wrapping points. One would need a protocol for filling the missing/corrupted pixels after scrolling. Vertical scrolling is mostly a prerequisite for text mode, and virtualizing the frame buffer might be the most efficient way to "scroll" the screen. Maybe have an adder that works during syncs.
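A sketch of what "virtualizing the addresses" could look like: scrolling only changes an origin, and every access is translated through it with wraparound. The class and method names are hypothetical, and a real controller would do this with an adder rather than a modulo.

```python
class ScrollingFrameBuffer:
    """Toy frame buffer where scrolling moves the origin, not the pixels."""

    def __init__(self, width: int, height: int):
        self.width = width
        self.height = height
        self.ram = bytearray(width * height)
        self.x_origin = 0
        self.y_origin = 0

    def physical_address(self, x: int, y: int) -> int:
        # Translate a screen coordinate into a RAM address, wrapping at the edges.
        px = (x + self.x_origin) % self.width
        py = (y + self.y_origin) % self.height
        return py * self.width + px

    def scroll(self, dx: int, dy: int) -> None:
        # No pixel data is copied; only the wrapping point changes.
        self.x_origin = (self.x_origin + dx) % self.width
        self.y_origin = (self.y_origin + dy) % self.height

    def write(self, x: int, y: int, color: int) -> None:
        self.ram[self.physical_address(x, y)] = color & 0x3F

    def read(self, x: int, y: int) -> int:
        return self.ram[self.physical_address(x, y)]
```

After `scroll(0, 1)`, what was screen row 1 shows up as row 0, and the old top row reappears at the bottom. That wrapped row is the stale data that the fill protocol mentioned above would need to overwrite.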
Add a command port -- This would allow for changing video modes in software, changing palettes, and more. This could even be used for sound, lights, and storage, assuming you have a frame buffer and hardware syncs. This idea would work better on a 16-bit design.
Add a .GIF/.PNG decompressor -- This would be more for advanced FPGA designs. If you build a port controller that includes hardware graphics, sound, and storage, you could tell the hardware CODEC to read from storage, convert, and send its data to the frame buffer. One might want to stick to the GIF87 format since it is simpler. GIF89 includes animation.
Add "weird" video modes -- What if we clock the Gigatron at 8.3 MHz and have 213 x 160? That means that the virtual rows would be 3 actual lines rather than 4. Or, if one uses a frame buffer, one could do it like the Atari 800 and have a mode where you have a graphics window and a text box. The reason to do it that way is to save some buffer memory since it takes less RAM to store references to characters than individual pixels. While it would be good for hi-res games (put the stats, title, score, etc., in the box), it could have some technical or scientific uses. One could have a bitmapped diagram and an item key or text description in the text area.
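The 213 x 160 numbers can be derived by scaling from the stock mode. This sketch assumes the stock machine is 160 pixels wide at 6.25 MHz with 480 visible VGA lines, and it reads "8.3 MHz" as one third of 25 MHz; `mode_geometry` is a made-up helper.

```python
def mode_geometry(clock_mhz: float, lines_per_row: int,
                  base_clock_mhz: float = 6.25, base_width: int = 160,
                  visible_lines: int = 480):
    """Scale the stock resolution to a different clock and line-repeat count."""
    # The line time is fixed by the monitor, so pixels per line scale with the clock.
    width = int(base_width * clock_mhz / base_clock_mhz)
    # Each virtual row is repeated over `lines_per_row` physical scanlines.
    height = visible_lines // lines_per_row
    return width, height


stock = mode_geometry(6.25, 4)      # (160, 120)
weird = mode_geometry(25 / 3, 3)    # (213, 160)
```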
Coprocessors
Add a math coprocessor -- At the least, add a hardware multiplier. You can easily do the first ten multipliers for unsigned values in a single cycle. You can simultaneously fill registers with all the shifted values you need and then add or subtract. For instance, if you want to multiply by 5, you have the original multiplicand with it shifted already by 1, by 2, by 3, etc. Then you'd take the SHL 2 and add to the original, then put that on the bus. Multiplying by 7 is a tad trickier, and subtracting seems to be the easiest strategy. So take the SHL 3 value and subtract the original. Really, going this far, one might want to build a binary multiplier. It works like long multiplication, though no actual multiplication is needed. Then add all the shifted intermediates together. This might be doable in 4 cycles, depending on how you implement it. This is much like the previous method, but without lookup tables or matrices.
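The shift-and-add idea above can be sketched like this. In hardware all the shifted copies would exist at once and the additions would happen in an adder tree; the loop here only models the arithmetic, and the function names are made up.

```python
def times5(a: int) -> int:
    """Multiply by 5 as (a SHL 2) + a."""
    return (a << 2) + a


def times7(a: int) -> int:
    """Multiply by 7 as (a SHL 3) - a."""
    return (a << 3) - a


def shift_add_multiply(a: int, b: int) -> int:
    """Unsigned 8x8 -> 16-bit binary multiplier, like long multiplication in base 2."""
    product = 0
    for bit in range(8):
        # Each set bit of the multiplier contributes one shifted copy of a.
        if (b >> bit) & 1:
            product += a << bit
    return product & 0xFFFF
```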
Another multiplier idea could be a LUT and addition hybrid. You could have 4 nybble tables and use a total of 1k. Then you could look up and add the nybbles in parallel. That might save a cycle from the shift-add idea. This would be much like the FOIL method in algebra.
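A sketch of the LUT hybrid: a 16 x 16 nybble product table is 256 bytes, so four copies (one per partial product, so they can be read in parallel) come to 1K. The code below uses one table four times, which gives the same result; the names are hypothetical.

```python
# 256-entry table of nybble products; the largest entry is 15 * 15 = 225, so a byte is enough.
NIBBLE_TABLE = bytes(x * y for x in range(16) for y in range(16))


def lookup(x: int, y: int) -> int:
    """Product of two nybbles via the table, addressed as xxxxyyyy."""
    return NIBBLE_TABLE[(x << 4) | y]


def lut_multiply(a: int, b: int) -> int:
    """Unsigned 8x8 -> 16-bit multiply from four nybble lookups (FOIL style)."""
    ah, al = a >> 4, a & 0xF
    bh, bl = b >> 4, b & 0xF
    first = lookup(ah, bh) << 8   # high * high
    outer = lookup(ah, bl) << 4   # high * low
    inner = lookup(al, bh) << 4   # low * high
    last = lookup(al, bl)         # low * low
    return first + outer + inner + last
```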
Add a vCPU coprocessor -- One way to speed up vCPU would be to make it an actual core. It should have a way to access RAM directly. So you'd need a memory unit or an arbiter of some kind that ties into the Gigatron's memory unit. One way to do this could be to give the Gigatron priority to the memory and only run vCPU when the RAM is not used. Another way could be to have a time-slicing memory controller that operates at a much higher speed. Thus the memory unit would have different channels. Or do a combination of both to make it more overclocking friendly. On the ROM side of things, the emulator could be replaced with some "listener" code to determine when the Gigatron core needs to do something. In FPGA, you might be able to have a BRAM table to tell the vCPU how long to halt before a valid result. Of course, you could simply have the system calls done on the vCPU core.
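The first arbitration scheme above (Gigatron has priority, vCPU only gets RAM when it is idle) is simple enough to sketch cycle by cycle. This is only a model of the policy, with made-up names; it says nothing about how the stalling would be wired.

```python
def arbitrate(gigatron_requests, vcpu_requests):
    """Return who gets RAM on each cycle: 'G', 'V', or '-' for idle."""
    grants = []
    for g, v in zip(gigatron_requests, vcpu_requests):
        if g:
            grants.append("G")   # the Gigatron core always wins
        elif v:
            grants.append("V")   # the vCPU core uses the leftover cycles
        else:
            grants.append("-")
    return grants
```

For example, `arbitrate([1, 0, 0, 1], [1, 1, 0, 1])` gives `['G', 'V', '-', 'G']`, so the vCPU core stalls on the first and last cycles.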
Sound
Hardware PSG -- While it is neat that the Gigatron can do everything in software, a hardware PSG would have some advantages. Like a dedicated video controller, it can manage the timings itself, so you save cycles during the precious sync timings. The sound could be cleaner since it would be working during the entire frame. That said, you must admit that the "gritty" sound has an endearing quality, so if you want to keep the roughness, that could be a sound controller mode. The PSG could add additional waveforms and noise. An FPGA version could be more flexible, but one could find a way to wire in the TI sound chip (used in the TI-99/4A, and also in the Sega Genesis alongside the Yamaha chip).
A better idea might be to implement the Gigatron's method as a PSG in FPGA. So it uses all the same memory. In that case, it could be a part of the video controller and let the syncs do arbitration as they do now. That would avoid hardware races.
Music mode -- I've never seen a PSG with a music mode. It could have a table of count values for every note. The 440 Hz equal-temperament scale would be the most useful, but it could be tuned to other scales. At one point, the A above middle C was tuned to 434 Hz before it was standardized worldwide around WWII. It could even have added features such as buffering, looping, tempo adjustment, different note lengths, etc.
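A sketch of how such a note table could be generated. I'm assuming a phase-accumulator style PSG where each note is a per-sample increment, and I'm borrowing MIDI note numbers (69 = the A above middle C) as the index; the sample rate, accumulator width, and function names are all placeholders, not Gigatron values.

```python
def note_frequency(midi_note: int, a4_hz: float = 440.0) -> float:
    """Equal-temperament frequency: each semitone is a factor of 2**(1/12)."""
    return a4_hz * 2 ** ((midi_note - 69) / 12)


def note_increment(midi_note: int, sample_rate_hz: float,
                   acc_bits: int = 16, a4_hz: float = 440.0) -> int:
    """Per-sample phase increment for an acc_bits-wide accumulator."""
    return round(note_frequency(midi_note, a4_hz) * (1 << acc_bits) / sample_rate_hz)


def note_table(sample_rate_hz: float, a4_hz: float = 440.0):
    """One octave of increments starting at middle C (MIDI note 60)."""
    return [note_increment(n, sample_rate_hz, a4_hz=a4_hz) for n in range(60, 72)]
```

Retuning to a different reference pitch is then just a matter of passing another `a4_hz` and regenerating the table.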
Add hardware .MP3/.OGG support -- An integrated I/O controller could contain a CODEC for this.
Blinkenlights
Add an autonomous light controller -- The Blinkenlights are fun, but removing them could offer a slight performance boost. A compromise would be to add a light sequencer. To stay within the spirit of the design, such a controller could have at least 3 modes. It could have a learn mode, a replay mode, and a manual mode. It could also have an off mode (or enable line) and a default mode. So you can set it and forget it if you want, use it the old way, or program the sequence and run that.
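Here is one way the sequencer modes could fit together, as a behavioral sketch. The mode names follow the list above, but the built-in default pattern, the 4-bit LED value, and the method names are my own placeholders.

```python
from enum import Enum


class Mode(Enum):
    OFF = 0
    DEFAULT = 1
    MANUAL = 2
    LEARN = 3
    REPLAY = 4


DEFAULT_PATTERN = [0b0001, 0b0010, 0b0100, 0b1000]  # hypothetical built-in chase


class LightSequencer:
    """Toy model of an autonomous controller for 4 LEDs."""

    def __init__(self):
        self.mode = Mode.DEFAULT
        self.learned = []
        self.step = 0
        self.leds = 0

    def set_mode(self, mode: Mode) -> None:
        if mode is Mode.LEARN:
            self.learned = []  # start recording a fresh sequence
        self.mode = mode
        self.step = 0

    def write(self, value: int) -> None:
        """A write from the CPU, as the software does today."""
        value &= 0x0F
        if self.mode is Mode.LEARN:
            self.learned.append(value)
            self.leds = value
        elif self.mode is Mode.MANUAL:
            self.leds = value

    def tick(self) -> int:
        """Advance one sequencer step and return the LED state."""
        if self.mode is Mode.OFF:
            self.leds = 0
        elif self.mode is Mode.DEFAULT:
            self.leds = DEFAULT_PATTERN[self.step % len(DEFAULT_PATTERN)]
            self.step += 1
        elif self.mode is Mode.REPLAY and self.learned:
            self.leds = self.learned[self.step % len(self.learned)]
            self.step += 1
        return self.leds
```

In learn mode the CPU drives the lights as usual while the controller records, and in replay mode the recorded sequence loops with no further CPU involvement.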
Peripherals and Storage
USB -- At the least, if one uses FPGA, one could have a USB slave port with at least UART capabilities. That would allow for installing, running, or transferring programs.
Different Platform Ideas
vCPU as a platform -- If one were to make dedicated controllers for everything such as video, sound, lights, I/O, etc., and if vCPU were an actual core, then would the Gigatron core be needed at all? Yes, that defeats the whole purpose of this project. The I/O controller might need to initialize the RAM with what the ROM puts there now.
Use 2 Gigatrons -- There are various ways to do this. Marcel gave possible plans for merging 2 Gigatrons. That plan was to let one handle the input and user code, and the other handle the lights, sound, and video. Another way to do this could be to multiplex the memory between them.
Build a Gigatron-like machine around Monosonite's Suite-16 ISA -- Since the Suite-16 is designed to use 16-bit RAM, it can do 8-bit operations in a single cycle. But it could do 16-bit (or even 24-bit) operations in 2 cycles.
Use a Gigatron with a Propeller chip -- The Propeller is a 32-bit microcontroller that has 8 "cogs." It was designed using FPGA and converted to an ASIC. You can use different cogs to provide support for different peripherals. So you could use 1-2 cogs for video, 1-2 cogs for sound, 1 cog for keyboard support, 1 cog for mouse support, etc. One might be able to use it as a math coprocessor too. It has built-in video support with, yes, 6 bits and 2 syncs. Someone even used the Propeller to emulate the Commodore SID chip.