I've thought of other ways to mod the Gigatron. For instance, with an I/O expander board, why not connect the ports to it and let it provide the keyboard, video, and sound? The ROM could repurpose the ports to send and receive commands to the expander/controller, which would even allow emulated interrupts: every so often the ROM reads the In port, takes data from it, and can use the jump-to-address trick to reach the relevant handler. A device could even request a functional "halt" or DMA access this way. If a device wants the Gigatron to halt, it sends a signal requesting the halt/DMA time; the ROM sees that, perhaps outputs an acknowledge code, and then enters a spinlock, reading the In port until a clear signal appears. The RAM is untouched during this time, and external devices are free to manipulate it. The controller could snoop the bus for video and sound, understand the indirection-table system, provide its own syncs, accept input and place it directly into RAM, produce its own sound and video, provide file I/O, and more. All of the Pluggy and Pluggy Reloaded functionality could go on that board, along with file I/O assistance, so you'd have the much wider parallel pipe and microcontroller assistance in one place. That would allow richer communication with an outside controller that works mostly out of memory.
If you can move all the bit-banging to a controller board, you'd be free to clock the base machine at any speed you want without needing new ROMs each time. The controller could update at least the vertical sync in memory, or otherwise make the base machine aware of when it changes, so that software relying on a real-time clock would still work. That would also allow dynamic profiling on boot to determine how many machine cycles fit in a video frame, and maybe per raster line as well. The ROM would then be free to alter its behavior dynamically based on the speed difference between the syncs and the base machine.
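As a rough illustration of that boot-time profiling idea (not taken from any existing ROM), here is a minimal C sketch. The vsync flag address and the per-iteration cycle cost are made-up placeholders for whatever the controller and the calibration loop would actually provide:

```c
#include <stdint.h>

/* Hedged sketch of boot-time profiling: count how many iterations of a
 * known-cost loop fit between two vertical-sync edges reported by the
 * controller.  VSYNC_FLAG is a made-up, controller-updated RAM location;
 * CYCLES_PER_ITER is whatever the calibration loop costs on the real machine. */
#define CYCLES_PER_ITER 8u

static volatile uint8_t *const VSYNC_FLAG = (volatile uint8_t *)0x00FF;  /* hypothetical */

uint32_t profile_cycles_per_frame(void)
{
    uint32_t iterations = 0;

    while (*VSYNC_FLAG == 0) { }            /* wait for a vsync edge */
    while (*VSYNC_FLAG != 0) { }            /* ...and for it to clear */
    while (*VSYNC_FLAG == 0)                /* count until the next edge */
        iterations++;

    return iterations * CYCLES_PER_ITER;    /* approximate machine cycles per frame */
}
```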
I haven't forgotten about my 75+ MHz Giga-similar machine idea, but I likely will never build it. It does sound neat: a 3-4 stage pipeline, more native instructions like full shifting, multiplication, division, random numbers, more registers, some native 16-bit support, etc. The stages would be Fetch, Decode, Access, and Execute. Access comes before Execute because only reads are modified by instructions, never writes; if you need to write/store, that can be done in the next instruction, using the Access stage then. It would also be neat to have an auxiliary ALU in the Access stage, to use that slot another way when RAM is not needed. You could then natively do 16-bit addition/subtraction/logic, but only between registers. The extra "ALU" could also provide random numbers when it is not being used by instructions and allow the result to be manipulated in the next pipeline slot (such as inverting it or adding an offset). Additional registers would be needed to still do bit-banging at 75+ MHz; that way, both the video thread and the vCPU thread would have their contexts live at the same time and could switch without penalty. So you could spend 1 clock on a pixel, 11 instructions on vCPU, then output a pixel, and so on. At such speeds you really don't need much external support, as you'd have more power than you need. However, a machine with custom, RAM-based control units would be costly, inefficient, and require SMD parts for most things. The idea would be to use memory for the CU and the ALU(s), copy from ROM to fast SRAMs on boot, and use LUTs for everything.
***
Moving on
The more I think about things, the more I want to just mess with a Propeller 2 chip and make my own ISA, memory map, etc. With 8 cogs, that is enough to have at least one CPU, one or more coprocessors, sound, I/O, etc. But I don't know what instruction set and features I'd like to add.
Instructions and ISA
While I could use the native P2 instructions, I think it might be more fun to make my own. I don't know what all to include; probably most of what is in the 6502 and/or vCPU instruction sets, and if there is any space left in the opcode map, things like RNG, multiply, divide, and maybe a trig function or two.
I haven't worked out the ISA size yet. I'd love to get to a point where I can use 16-bit-wide external memory with 20 address lines. That sounds like a challenge. Counting 20 address lines, 16 data lines, and up to 5 control lines (word and wider memories add a control line per byte), that would take 41 GPIO lines out of the 56 non-shared lines (64 in total). That isn't too bad. Of the 15 left, that would mean 5 for video (built-in DAC), 2 for keyboard, 5 or so for SD, and maybe 2 for sound. If more are needed, maybe the external memory lines could be multiplexed.
As for the ISA, I'm not sure. If I want to use external 16-bit RAM, maybe have instructions with 8- and 16-bit opcodes. The byte instructions can carry a byte operand. A 16-bit opcode with a 16-bit operand would tie up two words, and I'm not sure what to do with the other byte: let it select up to 256 byte registers, widen the operand to 24 bits, or allow both. A 24-bit operand might be a good thing, since it would allow an absolute jump across the entire range as an immediate. That might be better than how Intel did things, since even protected mode didn't really access memory as a flat plane; it was presented to the user that way, but under the hood it always used segment:offset, even with 32-bit operands and even when the user didn't see it. I'm not sure what instructions I'd like beyond the basics. I'd like multiply, divide, RND, bounded RND, and, because this would be an emulation with per-instruction overhead, probably block instructions and maybe loop instructions. I'm not sure if I'd want elaborate memory instructions like ternary memory ops (e.g., [mem]+[mem]=[mem]).
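To make the 8/16-bit opcode idea concrete, here is a purely hypothetical decode sketch in C. The opcode ranges, field layout, and the decode helper are all inventions for illustration, not part of any real design:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical encoding: a short form packs an 8-bit opcode and an 8-bit
 * operand into one 16-bit word; a long form uses a second word so the spare
 * byte plus that word give a 24-bit operand (the spare byte could instead
 * select one of 256 byte registers -- the "do both" option discussed above). */
typedef struct {
    uint8_t  opcode;
    uint32_t operand;   /* 8-bit immediate or 24-bit absolute address */
    int      words;     /* instruction length in 16-bit memory words */
} decoded_t;

static decoded_t decode(const uint16_t *mem)
{
    decoded_t d = { .opcode = (uint8_t)(mem[0] >> 8) };

    if (d.opcode < 0x80) {                 /* short form: opcode + imm8 */
        d.operand = mem[0] & 0xFF;
        d.words   = 1;
    } else {                               /* long form: opcode + imm24 */
        d.operand = ((uint32_t)(mem[0] & 0xFF) << 16) | mem[1];
        d.words   = 2;
    }
    return d;
}

int main(void)
{
    const uint16_t prog[] = { 0x1234, 0x9A01, 0xBEEF };   /* made-up words */
    decoded_t a = decode(&prog[0]);   /* short form: opcode 0x12, operand 0x34 */
    decoded_t b = decode(&prog[1]);   /* long form: opcode 0x9A, operand 0x01BEEF */
    printf("%02X/%X  %02X/%X\n", a.opcode, (unsigned)a.operand,
                                 b.opcode, (unsigned)b.operand);
    return 0;
}
```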
Of course, I'd need to decide what to do for sound and video. I'd want no fewer than 4 sound channels, and probably 15-18 kHz as the top frequency. For accuracy, I'd likely want an external crystal. Sure, the P2 has an internal oscillator that runs at around 20 MHz, but that varies per chip. The exact frequency doesn't matter as long as it is known and doesn't drift; the ROM can use the internal PLL and VCO to derive whatever is needed. I also don't know what waveforms and capabilities to provide. Obviously I'd want square, ramp, triangle, and noise. Sine might be nice to have, as would combination waveforms and near-instrument sounds. I might want a sound coprocessor, not just a sound generator, to produce more complex sounds.
For the video, I don't know if I want 320x240 or what. I'd want a text mode. I'm not sure what other features I'd want; probably hardware scrolling and sprites. I don't know if I'd want to use 2 cogs for video or not. You could use 2 (preferably neighboring cogs, to share LUT RAM) and have one for rendering/effects and one for output. Some old computers did it that way, namely the Ataris: one chip rendered on the fly and another handled the output.
Any ideas? Wishlist?
Beyond the Gigatron
Re: Beyond the Gigatron
Something else to consider, if one wants to move beyond the Gigatron while keeping compatibility with it, is alternatives to the GT1 format with new extensions. For instance, to use word RAM and move to a true 16-bit machine, one could keep mostly the same format as GT1 but use words for the payload, with the segment length counted in words.
TBH, a new memory map should be used in that case, preferably with the most important addresses on word-aligned boundaries. A word-based memory map would then be more feasible, and an extended machine with word memory and word-wide native instructions would be possible. Since the native core machine is 8 bits, no thought was given to alignment and misalignment penalties; those only matter when memory is wider than 8-9 bits. There is 8-bit, 16-bit, 32-bit, and the less common 24-bit memory. There is also parity RAM and FPGA BRAM, which can be 9, 18, 27, or 36 bits wide. You don't have to use the extra bit as parity (where circuitry counts the 1s and sets the bit accordingly); the extra bits can be used for other things. If the machine is 8-bit on 16-bit memory (or 8-on-24, 8-on-32, 9-on-18, etc.) and every access is still a single byte, you don't have to worry about alignment, because every fetch, load, and store pays the same cost regardless. But if you have 16-bit memory and want to do a 16-bit transfer, you'd prefer to move 16 bits at a time. An unaligned access could then add up to 2 cycles on some systems, though here it might add just one thanks to the load-with-increment feature: on an unaligned read, the microcode or core ROM has to read the upper byte of the base word and the lower byte of the next word. So if I (or anyone) make something like the Gigatron with 16-bit RAM and native instructions to use it, I'd want a new memory map that puts everything important on even addresses (if using a byte map with word memory) or that uses word addresses throughout.
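As a small illustration of that unaligned-read penalty (with RAM modeled as an array of 16-bit words, little-endian byte order within each word; nothing here is Gigatron-specific), a load routine might look like this:

```c
#include <stdint.h>
#include <stdio.h>

/* Aligned word: one memory access.  Unaligned word: the high byte of the base
 * word plus the low byte of the next word, i.e. two accesses in hardware (or
 * one extra access with a load-with-increment feature). */
static uint16_t load16(const uint16_t *ram, uint32_t byte_addr)
{
    uint32_t word = byte_addr >> 1;

    if ((byte_addr & 1) == 0)
        return ram[word];                          /* aligned: single access */

    return (uint16_t)((ram[word] >> 8) |           /* high byte of base word  */
                      ((ram[word + 1] & 0xFF) << 8)); /* low byte of next word */
}

int main(void)
{
    const uint16_t ram[] = { 0x2211, 0x4433 };     /* bytes 11 22 33 44 */
    printf("aligned:   %04X\n", load16(ram, 0));   /* prints 2211 */
    printf("unaligned: %04X\n", load16(ram, 1));   /* prints 3322 */
    return 0;
}
```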
And then there are the allowed segment addresses. GT1 files only provide pages and offsets, yet I hear talk of loading programs larger than 64K. How is that done? With a new format like the one proposed above that uses words for the program data, one could include a header flag saying whether the code expects a machine with an explicit memory-segment register, i.e., a way of specifying whether addresses are 2 bytes, 3 bytes, or more.
Another consideration is how one might do overlays or DLL equivalents. Right now there is no way to use files with more code than can fit into memory at once and still make use of all of it (beyond using GT1 as an animated bitmap display format). Sure, GT1 files can overwrite memory they have already written to, but only what was placed there last can run. This hasn't really been discussed because folks are still working on mass-storage peripherals and gaining speed there; there is no point in a game with a huge playfield map if it takes two minutes to load the next map fragment every time you cross a threshold. But assuming overlays can be used, it would be nice to have a file extension, either for an overlay-specific file or for an extensible format that includes internal overlays. How would such a program work? You'd have initialization code, common code and global variables, an overlay manager, and overlay modules. Any initialization code or splash screens would be fair game to evict once executed. There would likely need to be an API to select the file segments and jump to them. Maybe the overlay file should contain a table at a fixed location with the addresses of each code fragment. From there it could look much like a GT1 file: a master table of file locations for each code module (possibly with IDs, or at least a count of how many exist), and at each location the pages, offsets, lengths, and code, followed by the entry point (like a GT1 file).
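Just to make the idea tangible, here is one hypothetical way the on-disk layout could look, written as C structs. The "GTOV" magic, field names, and layout are inventions for illustration, not a proposed standard; only the segment/trailer shape mirrors the GT1-style layout described above:

```c
#include <stdint.h>
#include <stdio.h>

#pragma pack(push, 1)
typedef struct {
    char     magic[4];        /* e.g. "GTOV" to distinguish from plain GT1 */
    uint8_t  module_count;    /* how many overlay modules follow */
    /* uint32_t module_offset[module_count];  file offsets of each module  */
} overlay_header_t;

typedef struct {              /* one GT1-style segment inside a module */
    uint8_t  page;            /* high byte of the load address */
    uint8_t  offset;          /* low byte of the load address  */
    uint8_t  length;          /* 0 means 256 bytes, as in GT1  */
    /* uint8_t data[length]; */
} overlay_segment_t;

typedef struct {              /* trailer of each module, like a GT1 file */
    uint8_t  zero;            /* 0 terminates the segment list */
    uint8_t  exec_page;       /* entry point, high byte */
    uint8_t  exec_offset;     /* entry point, low byte  */
} overlay_entry_t;
#pragma pack(pop)

int main(void)
{
    printf("header %zu bytes, segment header %zu bytes, trailer %zu bytes\n",
           sizeof(overlay_header_t), sizeof(overlay_segment_t), sizeof(overlay_entry_t));
    return 0;
}
```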
Re: Beyond the Gigatron
I have only begun to read about your intentions.
If so, why not use a RISC-V core, for example the base processor of the latest ESP modules, for all of these ideas?
What is the Gigatron?
The Gigatron has every part of a modern personal computer, but each part is only as primitive as it can be while still fulfilling its function.
Instead of a sound card it beeps; its video "accelerator" actually slows it down; and the video board and the CPU are really the same thing, just with different microcode for each task.
Its math is deadly slow, but sufficient to demonstrate every kind of software.
But exactly this primitive approach makes it possible to understand what is inside every computer, where all the complexity is built from the same primitive units the Gigatron reuses.
At a flea market I bought a motherboard fitted with 4 GB of memory. It already has everything you sketched out as a development target.
Why not make a matrix of Gigatrons with multiple processors? If one could be dedicated to sound and one or two to video, we could have several kinds of sampled/synthesized sound and higher-resolution video. It would also be a good idea, I think, to craft a memory-module interface with memory protection and a quicker ALU.
Re: Beyond the Gigatron
Why no multiplication and division?
For example, when I needed an envelope for a wavetable sound synthesizer on an ATMEGA, I used a reduced 4- or 5-bit multiplication applied to fixed sawtooth wavetable voices.
Even that reduced 4- or 5-bit multiplication took more processor time than the rest of the sound synthesis.
It was a macro built from conditional addition and shifting of both operands.
There were four voices (two solo, one bass, and one drum), and the reduced multiplication took more processor time than the melody and sound synthesis combined.
Full multiplication and division would tempt developers to abuse them, and without an appropriate hardware implementation they are really slow.
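For readers unfamiliar with the technique, "conditional addition and shifting of both operands" is plain shift-and-add multiplication. Here is a generic C rendering cut down to a 4-bit multiplier (this is not the poster's actual AVR macro, just an illustration of the same scheme):

```c
#include <stdint.h>
#include <stdio.h>

/* Shift-and-add multiply with a 4-bit second operand: at most four
 * conditional-add/shift steps per sample, as described above. */
static uint16_t mul4_shift_add(uint8_t a, uint8_t b4)
{
    uint16_t acc = 0;
    uint16_t x   = a;

    for (int i = 0; i < 4; i++) {
        if (b4 & 1)          /* conditional addition */
            acc += x;
        x  <<= 1;            /* shift one operand left...  */
        b4 >>= 1;            /* ...and the other one right */
    }
    return acc;
}

int main(void)
{
    printf("%u\n", mul4_shift_add(200, 9));   /* prints 1800 */
    return 0;
}
```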
P.S. In the mornings that synthesizer used to play something like a mixture of the Persian blues of a group named "Fortran-V" with traditional Persian music. My consciousness simply adapts to samplers, and even loud Kurzweil-style sounds do not wake me. (It makes me wonder how Kurzweil himself planned to wake the dead in his theory.) That is why I crafted my own synthesizer of melodies and sound.
Recently, before falling into bed, I have been listening to air-raid alarms: a really psychedelic polyphonic sound with deep glissando and deep natural reverberation.
Re: Beyond the Gigatron
Hello there.
I thought: why not make a matrix-style interface between different boards?
It would be a unified hardware interface through which different boards could be connected.
For example, one board makes sound, another produces video, yet another possibly manages the network, and another manages the file system.
They may be different architectures. Some may even be an Arduino running KontikiOS; another may be a Gigatron with an appropriate ROM to run Z-LisP or Prologue machines, with video implemented on yet another one.
A few lines of interface would be enough.
In total there could be from 1 to 8 such interfaces, some of which might even be unidirectional.
Output signals:
- D0..D7
- tag ready
- data ready
- received
Input signals:
- D0..D7
- tag ready
- data ready
- received
Re: Beyond the Gigatron
Another idea for Gigatron development is to make a board with the opportunity to upgrade it to better functionality later.
Basically only AC, X, and Y, maybe not even ADD, plus a few ways to upgrade: possibly more registers, better instructions, etc.,
together with a conditional generator for the system ROM.
Re: Beyond the Gigatron
On multiplication and division, my Gigasimilar idea was to have that in an external ROM. But there are other ways to do it in a homebrew design. In the 74xx family, there used to be "Wallace tree adder" parts. I don't remember if Radio Shack used to sell them, but I had seen them somewhere. I didn't know what they were: they had 8 lines going in and 8 lines going out. Then recently it dawned on me that those were nibble multipliers. Propagation delay was quite hefty, around 45 ns or so.
I think most here know that doubling the operand size for multiplication takes about 6 times the effort. In hardware, if those Wallace tree adders were available, you could use 4 of them and do 8x8-to-16 multiplication. You do the 2 end multiplications and place them in your 16-bit accumulator, if you have one. You also do the 2 cross multiplications. In the next cycle, add the 2 cross products, and in the last cycle add that partial to the accumulator starting one nibble up. Three cycles isn't bad for that, and modern microcontrollers can do it in a single cycle, even at hundreds of MHz.
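Here is a quick C check of that decomposition, with mul4() standing in for one hardware nibble multiplier; it only verifies the arithmetic, it is not a hardware description:

```c
#include <stdint.h>
#include <stdio.h>

/* One 4x4 -> 8-bit "nibble multiply", playing the role of a single part. */
static uint8_t mul4(uint8_t a, uint8_t b) { return (a & 0x0F) * (b & 0x0F); }

/* 8x8 -> 16-bit multiply from four nibble multiplies, as described above. */
static uint16_t mul8_from_nibbles(uint8_t a, uint8_t b)
{
    uint8_t al = a & 0x0F, ah = a >> 4;
    uint8_t bl = b & 0x0F, bh = b >> 4;

    uint16_t acc   = mul4(al, bl) | ((uint16_t)mul4(ah, bh) << 8); /* the two "end" products   */
    uint16_t cross = (uint16_t)mul4(al, bh) + mul4(ah, bl);        /* sum of the cross products */
    return acc + (cross << 4);                                     /* add partial one nibble up */
}

int main(void)
{
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (mul8_from_nibbles((uint8_t)a, (uint8_t)b) != a * b) {
                printf("mismatch\n");
                return 1;
            }
    printf("all 65536 products match\n");
    return 0;
}
```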
Or, if you only want to do an unsigned multiply by 10 and get a 12-bit result, doing that in hardware is rather easy and requires only 2 adders. Since x*10 is x*2 plus x*8, you take the original number and add just the upper 6 bits of the same number to it. The 2 missing inputs on the top adder are tied to the ground plane. Bits 3-11 of the result come from the adders (bit 11 from the carry-out of the top nibble), bit 0 comes from the ground plane, and bits 1-2 come from bits 0-1 of the original number. So you can do all that within Gigatron speeds, easily in a single cycle.
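A small C sketch of that wiring, just to verify the bit bookkeeping (the function is an illustration, not part of any design):

```c
#include <stdint.h>
#include <stdio.h>

/* x*10 = (x<<1) + (x<<3).  Bit 0 of the result is always 0, bits 1-2 come
 * straight from bits 0-1 of x, and from bit 3 up the adders add the full
 * byte x to its own upper six bits (x>>2), carry-out giving bit 11. */
static uint16_t times10_wired(uint8_t x)
{
    uint16_t upper = (uint16_t)(x >> 2) + x;   /* what the two 4-bit adders produce */
    return (upper << 3) | ((x & 0x03) << 1);   /* bit 0 grounded, bits 1-2 pass through */
}

int main(void)
{
    for (int x = 0; x < 256; x++)
        if (times10_wired((uint8_t)x) != x * 10) {
            printf("mismatch at %d\n", x);
            return 1;
        }
    printf("times10_wired matches x*10 for all bytes\n");
    return 0;
}
```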
Multiplying by 10 and adding is commonly used to convert a string of digits to binary: you take the right-most digit and add it to the next digit times 10, the next times 100, and so on. (I left out the step of subtracting the ASCII code for '0' from each character before the multiplies and adds.)
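In C, the usual left-to-right (Horner) form of the same idea looks like this:

```c
#include <stdint.h>
#include <stdio.h>

/* Multiply the running total by 10 and add the next digit, after subtracting
 * the ASCII code for '0'.  Equivalent to summing digit * power-of-10 terms. */
static uint16_t str_to_u16(const char *s)
{
    uint16_t value = 0;
    while (*s >= '0' && *s <= '9')
        value = value * 10 + (uint16_t)(*s++ - '0');
    return value;
}

int main(void)
{
    printf("%u\n", str_to_u16("1234"));   /* prints 1234 */
    return 0;
}
```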
One reason early machines used BCD was that it made conversion to strings easier. Since each nibble is a decimal digit, you only add 48 ('0') to each digit and you have a string; you don't have to keep dividing by 10 and taking the remainder each time to build the string from right to left.
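A minimal C sketch of the BCD case, assuming a packed 16-bit BCD value holding four digits:

```c
#include <stdint.h>
#include <stdio.h>

/* With BCD, each nibble is already a decimal digit, so building the string is
 * just "nibble + 48" per digit -- no repeated divide-by-10 needed. */
static void bcd16_to_string(uint16_t bcd, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)(((bcd >> (12 - 4 * i)) & 0x0F) + '0');
    out[4] = '\0';
}

int main(void)
{
    char buf[5];
    bcd16_to_string(0x1984, buf);   /* the BCD value 0x1984 encodes decimal 1984 */
    printf("%s\n", buf);            /* prints "1984" */
    return 0;
}
```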
***
Now, I wouldn't remove ADD/SUB from the Gigatron. The challenge is to reuse the opcode slots of the more useless/redundant instructions. Most of those are register instructions where the same register is both operands. XOR or SUB of the accumulator with itself always gives 0, and AND/OR/LD of the accumulator with itself gives the same number back, making it a NOP. Adding the accumulator to itself is useful, since that is a left shift. Conditions are irrelevant for register-only instructions (they don't affect the PC), and that leads to more duplication.
But if one wants a memory-mapped or near-DMA-based Gigalike machine, you could get rid of all the IN and OUT instructions and free up 100 opcodes (64 INs + 36 remaining OUTs). Then you could use my idea of a unified, autonomous I/O controller. Though really, keeping the ports would be desirable, as they could be repurposed.
One way to "arbitrarily" change the instructions would be to change the control unit with some sort of fast ROM or programmable logic. Then you can use new control lines where the current opcodes don't make sense to use.
***
At this point, I am pondering replacing the ALU with an L4C383 (or IDT7383). That is a 16-bit ALU with lower latency than the Gigatron's ALU. One challenge is that it is a PLCC part. It could replace 11 chips and do 16-bit operations, but then I'd have to rework the diode matrix to generate the correct control codes and figure out how to do the other operations. The Gigatron ALU also handles loads, stores, and branches, so I'd need other logic for those.
Re: Beyond the Gigatron
I'd rather use the Propeller 2 as an emulator. It is an 8-core, 32-bit MCU. But if I combined it with a Gigasimilar machine, it would be the complete I/O controller. The approach I'd use is mostly snooping through a fixed window: maybe have up to 3 bytes as an indirection for the indirection table so it can be moved around. You might be able to reserve 2 pages for that, holding all the I/O addresses, the table, and part of a framebuffer or display list, and communicate with the controller through there. For the keyboard, communicate either through DMA or by writing to memory on the falling clock edge.
I worked out how to do bus-mastering DMA on the Gigatron. The Gigatron should initiate it: the native code sends the controller a command, the controller uses multiplexers to take control of the RAM, and the next Gigasimilar instruction is a spinlock that hammers an address until it gets the expected result, which breaks the spinlock.
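Here is a conceptual C model of that handshake. The mailbox address, the request/clear values, and the controller_does_dma() stand-in are all invented for illustration; in real hardware the controller would run concurrently rather than being called from inside the loop:

```c
#include <stdint.h>
#include <stdio.h>

/* ram[] stands in for the Gigatron's RAM; none of these addresses or values
 * reflect a real Gigatron memory map. */
enum { MAILBOX = 0x00FE, DMA_REQUEST = 0x01, DMA_DONE = 0x00 };

static uint8_t ram[1 << 16];

static void controller_does_dma(void)      /* the external controller's side */
{
    ram[0x0800] = 0xAA;                     /* manipulate RAM while the CPU is parked */
    ram[MAILBOX] = DMA_DONE;                /* write the clear value: release the spinlock */
}

int main(void)
{
    ram[MAILBOX] = DMA_REQUEST;             /* Gigatron acknowledges the halt request */

    /* The native-code spinlock: hammer one address until the clear value appears. */
    while (ram[MAILBOX] != DMA_DONE)
        controller_does_dma();              /* in hardware this runs concurrently */

    printf("DMA finished, ram[0x0800] = 0x%02X\n", ram[0x0800]);
    return 0;
}
```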
But imagine how powerful the Gigatron could be with the P2 chip doing all the I/O tasks. It could likely be as powerful as a 286 or more, and changing the ALU might make it nearly as powerful as a 386 (in real mode). The P2 could be a math coprocessor too, and it could provide things such as FAT32 handling; 128 of a cog's registers could serve as a "sector" buffer.
The ideas mentioned above about a Gigaplex sound interesting. When it comes to video and sound, you could get by with just one extra Gigatron, but 2 Gigatrons could be chained to the first one to provide more advanced video. The idea is that the first one after the supervisor board would do graphics primitives: text mode, drawing primitives, etc. It then passes the finished data to the last one, which is a renderer. The renderer machine doesn't have to be complete unless it also handles sound. Extra features could be added, such as a screen-blanking mode on the last one: the video support board could clear its memory while the final one outputs from a register instead of memory, and both would be free to clear their memory during that register-blanking time.