Gigatron Similar machine
Posted: 28 Aug 2021, 14:46
I've wondered whether, if I were to build my own design, I'd even want to copy the vCPU or the memory locations. On one hand, I could do things differently and make use of special features. On the other, I would then need to create a toolchain and software environment. What does everyone think?
I've considered splitting the video out to DMA. I'm not sure of the best way to prevent software races. I've considered a "watchdog" sort of unit that halts the CPU if a new frame is being created in memory. Having nothing but vCPU time means that you could overwrite the frame buffer multiple times in a frame, and this gets worse the higher you push the clock. A naive solution would be a "crippled" mode where the CPU is halted during line-drawing. A more sophisticated method would be to have the watchdog unit monitor the memory addresses via address snooping and selectively halt when the I/O unit has more data than it can handle. So for sound or video, if there are active updates, the CPU can halt and prevent software races for those devices. I could do the video and sound in a custom controller and have it read the RAM via DMA.
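To make the difference between the two halting schemes concrete, here is a rough Python model. It is only one possible reading of the idea: the video page range and the function names are assumptions for illustration, and the "halt" decision here is simply "a write to video memory while a line is being drawn."

```python
# Hypothetical video region: pages 0x08..0x7F (an assumption for this sketch).
VIDEO_FIRST_PAGE = 0x08
VIDEO_LAST_PAGE = 0x7F


def naive_halts(line_active):
    """'Crippled' mode: the CPU stops whenever a line is being drawn."""
    return line_active


def watchdog_halts(addr, is_write, line_active):
    """Snooping mode: stop only for writes that land in video memory
    while the video unit is busy drawing a line."""
    page = addr >> 8
    in_video = VIDEO_FIRST_PAGE <= page <= VIDEO_LAST_PAGE
    return is_write and in_video and line_active
```

With the snooping version, reads and writes outside the video pages keep running at full speed during line-drawing, which is the whole point of doing it that way.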
If I had my own mode and vCPU additions, I could add manual flow-control opcodes. For instance, there could be a "halt until condition" instruction. The operands could give the condition. The conditions could include h-sync, v-sync, sound finished, FPU operation complete, file operation complete, network/com operation complete, keyboard data incoming, etc. So hardware busy-wait polling.
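As a sketch of how a "halt until condition" opcode might behave, the Python below models the operand as a bit mask and the hardware status as one word per cycle. The bit assignments and the `run_until` name are made up for illustration; nothing here is an existing vCPU feature.

```python
from enum import IntFlag


class Cond(IntFlag):
    """Hypothetical condition bits for the operand of the halt opcode."""
    HSYNC = 0x01
    VSYNC = 0x02
    SOUND_DONE = 0x04
    FPU_DONE = 0x08
    KEY_READY = 0x10


def run_until(mask, status_per_cycle):
    """Return how many cycles the CPU stays halted before any condition
    in `mask` shows up in the per-cycle hardware status words."""
    stalled = 0
    for status in status_per_cycle:
        if status & mask:
            return stalled
        stalled += 1
    raise RuntimeError("condition never became true")
```

For example, `run_until(Cond.VSYNC | Cond.KEY_READY, statuses)` would wake on whichever of the two comes first, so one opcode covers both "wait for the frame" and "wait for a key."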
Do others know of alternative strategies to prevent software races if using a hardware, DMA-based I/O controller? I know that other systems used interrupts to signal back to the CPU that the I/O was done. But things modeled after the Gigatron lack interrupts. So, I guess the hardware would need to be better at anticipating what the software intends to do. So address snooping could be used to see when different frames are being sent. For sound, I guess this would be needed since even though the sounds/calculations would be done in hardware, sending the sounds would still be bit-banging.
Speaking of sound, a possible upgrade when using a custom controller would be to change the sample rate dynamically depending on how many channels are in use. So roughly 3937 Hz when using 4 channels, 7875 Hz when using 2, and about 15,750 Hz when using 1. That would require a different note table for each mode (in BRAM), and software would need to be aware of this. I think that could be backward compatible, since current programs would not use the extra range of the modes with fewer channels. (Though maybe a coder could fake this by putting the same thing on multiple channels.)
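To show why each mode needs its own note table, here is a small Python sketch. The rates are the rough figures from the paragraph above; the 16-bit phase accumulator and the function name are assumptions for illustration, not how any existing sound code is known to work.

```python
# Rough sample rates per channel count, taken from the figures above.
SAMPLE_RATE = {4: 3937, 2: 7875, 1: 15750}

PHASE_BITS = 16  # assumed width of a phase accumulator


def phase_step(freq_hz, channels):
    """Note-table entry: how far the phase accumulator advances per sample
    to produce freq_hz at the sample rate of the given channel mode."""
    rate = SAMPLE_RATE[channels]
    if freq_hz > rate / 2:
        raise ValueError("note is above the Nyquist limit for this mode")
    return round(freq_hz * (1 << PHASE_BITS) / rate)
```

The same note gets a different step in each mode, and notes above half the sample rate are only reachable in the modes with fewer channels.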
For the keyboard, I guess that could use DMA and address snooping. So an idea would be to have a keyboard read buffer and empty the buffer into a single memory location as it is used. So that could be used as "clothesline memory." When the address is read, the memory unit updates with whatever is sitting in the keyboard buffer. And the keyboard buffer could be cleared similarly to how it is done on a PC, though using a memory reading loop, not a port reading loop.
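A quick Python model of the keyboard idea follows, mostly to check that the read-triggered update and the flush loop make sense. The address `KEY_ADDR`, the "no key" value `NO_KEY`, and the class name are all assumptions for this sketch.

```python
from collections import deque

KEY_ADDR = 0x000F  # hypothetical snooped memory location
NO_KEY = 0xFF      # hypothetical "buffer empty" value


class SnoopedMemory:
    """RAM whose KEY_ADDR cell is refilled from a keyboard FIFO on each read."""

    def __init__(self, size=65536):
        self.ram = bytearray(size)
        self.keys = deque()

    def key_event(self, code):
        """Keyboard side: queue one scancode."""
        self.keys.append(code & 0xFF)

    def read(self, addr):
        """CPU side: reading KEY_ADDR pulls the next key, if any."""
        if addr == KEY_ADDR:
            self.ram[addr] = self.keys.popleft() if self.keys else NO_KEY
        return self.ram[addr]

    def flush_keys(self):
        """Clear the buffer with a memory-reading loop, PC style."""
        while self.read(KEY_ADDR) != NO_KEY:
            pass
```

One consequence worth noting: every read of the location consumes a key, so software has to copy the value somewhere else if it wants to look at it twice.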
In my design, even some port control opcodes would be nice. For instance, a set active port opcode to allow up to 256 ports. Or have some port control memory locations. In that case, have a port control descriptor table. That could allow up to maybe 255 ports (with 0 meaning inactive) or up to 255 commands per port.
Now, if I want to abandon the memory map, I could add a stack in page 1 like the 6502 and save page 0 for its intended purpose. With the current design in mind, I could add a private stack in BRAM. That has some extra potential since private stack usage could be done in parallel with other operations, even those that use RAM. So one could combine LD/ST with PUSH/POP.
The above could be used in interesting ways. For instance, I'd use a LUT-based decoder unit. So it would be trivial to add another page of native opcodes, even if "private," and let the vCPU microcode store access those. Since BRAM is 9 bits wide, the extra bit could select the private page. That would be mainly for opcodes that are only useful to microcode, such as "return to native code." So there could be extra opcodes that do multiple things at once, beyond just incrementing the X register alongside another operation.
I could do some things differently. For instance, NOP could be a true NOP and not use the ALU. Or the vCPU halt/stop instruction could cause the native code to jump to the start address.
Also, I think a neat way to handle extra 16-bit opcodes would be to let the Memory Unit and DMA controller handle those. It could have its own incrementer. So you could do a 16-bit op on a given Y:X location and the faster memory controller could complete it to both addresses in a single slower cycle.
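Here is a minimal Python sketch of what the memory unit would do for a 16-bit store and load at Y:X. The little-endian byte order and the incrementer wrapping within the page are assumptions on my part.

```python
def store16(ram, y, x, value):
    """Memory-unit side of a 16-bit store: low byte at Y:X, high byte at Y:X+1.
    The private incrementer wraps within the page (an assumption)."""
    lo_addr = (y << 8) | x
    hi_addr = (y << 8) | ((x + 1) & 0xFF)
    ram[lo_addr] = value & 0xFF
    ram[hi_addr] = (value >> 8) & 0xFF


def load16(ram, y, x):
    """Matching 16-bit load using the same two addresses."""
    lo_addr = (y << 8) | x
    hi_addr = (y << 8) | ((x + 1) & 0xFF)
    return ram[lo_addr] | (ram[hi_addr] << 8)
```

From the CPU's side this is one operation. The two byte accesses happen inside the faster memory controller.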
Another idea could be to store vCPU or similar programs on a ROM and treat it as an I/O device. TBH, that could probably be done on a regular Gigatron and make it easier to load different programs or use cartridges. So if a cartridge is present, the machine could pull up a menu to select from the collection found on the cartridge. Then Loader could read the chosen program from the cartridge into RAM (if enough RAM exists, or report an error if not). As for a cartridge index, I'd keep it simple. Have a byte for the number of entries, 2 bytes for the location of each program, 2 bytes for the length of each program, and maybe 8-11 bytes for the name of each. If one wanted to, they could add a byte for attributes (such as marking an entry hidden/private, or marking it as a table for additional circuitry rather than a program). That would be similar to the old .LIB archiver. So have a crude byte-based "ISAM" table and not anything as complex/bulky as a block table such as FAT-12. With a Gigatron, what would be nice would be a native cartridge and an app cartridge.
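The index described above can be sketched with Python's `struct` module. The choices here are assumptions, not a spec: little-endian fields, a fixed 8-byte space-padded name, one attribute byte, and the fields of each entry stored together.

```python
import struct

# One entry: location (2 bytes), length (2 bytes), name (8 bytes), attributes (1 byte).
ENTRY = struct.Struct("<HH8sB")


def pack_index(entries):
    """Build the index: a count byte followed by fixed-size entries.
    `entries` is a list of (name, location, length, attrs) tuples."""
    out = bytearray([len(entries)])
    for name, location, length, attrs in entries:
        padded = name.encode("ascii").ljust(8)  # pad with spaces
        out += ENTRY.pack(location, length, padded, attrs)
    return bytes(out)


def parse_index(data):
    """Read the index back into (name, location, length, attrs) tuples."""
    count = data[0]
    result = []
    for i in range(count):
        location, length, raw, attrs = ENTRY.unpack_from(data, 1 + i * ENTRY.size)
        result.append((raw.decode("ascii").rstrip(), location, length, attrs))
    return result
```

With these sizes an entry is 13 bytes, so a full 255-entry index would take 3316 bytes of ROM. A menu only needs to walk the entries and skip the ones whose attribute byte marks them hidden.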
To be honest, a ROM cartridge could be good for overriding the control unit and including a separate one (if one is handy with SMD). So you could intercept certain opcodes and send NOP to the native one. If one had new registers or whatever, they could share with the existing ones on the Gigatron through the LD [imm] instruction as an aliased instruction, while a different "internal native" instruction is done in the cartridge. So one could run one opcode internally on a custom coprocessor while emitting a different opcode to the Gigatron. As far as that goes, one could include an Arduino or similar on their cartridge and use the 2 together. Plus taking things further, one could have a cable between the native ROM and the app ROM. So that could provide more room for tables and things.
I mentioned an FPU option. I imagine one way to do that using my own memory map would be to have a reserved RAM area of 13 or 14 bytes. So have a byte for the opcode, 4 bytes for the FPU accumulator, 4 bytes for operand A, 4 bytes for operand B, and possibly a status byte. I think 14 bytes would be better, since I'd clear the opcode "register" when finished and report the outcome separately. A status register could denote carry, sign, exception, NaN, overflow, underflow, etc. Plus the status register would be useful on boot, since it could return presence and revision. I guess something similar could be done on a Gigatron if it were modded to have maybe a 2-phase clock to create time/room for concurrent DMA. If the FPU used a set number of cycles, other instructions could be run before reading the result, or given the way vCPU works, a single instruction might give enough time. So if the FPU works in 4 native cycles, the 7 native cycles of a vCPU instruction would provide enough time. Since I'd want at least 32-bit registers for an FPU, there might be times it could work more like MMX and return multiple results.
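Here is a rough Python model of the 14-byte version of that block, to show the handshake: software fills the operands, writes the opcode last, and the FPU clears the opcode when done. The field order follows the description above; the opcode numbers, status bits, and IEEE 754 single-precision little-endian format are assumptions for this sketch.

```python
import math
import struct

# Offsets inside the 14-byte block.
OP, ACC, A, B, STATUS = 0, 1, 5, 9, 13

OP_ADD, OP_MUL = 1, 2                               # hypothetical opcodes
ST_SIGN, ST_OVERFLOW, ST_EXCEPTION = 0x01, 0x02, 0x04  # hypothetical status bits


def fpu_step(block):
    """FPU side: run the pending opcode, if any, then clear it."""
    op = block[OP]
    if op == 0:
        return  # nothing requested
    (a,) = struct.unpack_from("<f", block, A)
    (b,) = struct.unpack_from("<f", block, B)
    status = 0
    if op == OP_ADD:
        result = a + b
    elif op == OP_MUL:
        result = a * b
    else:
        result, status = 0.0, ST_EXCEPTION  # unknown opcode
    try:
        struct.pack_into("<f", block, ACC, result)
    except OverflowError:  # finite, but too large for 32 bits
        struct.pack_into("<f", block, ACC, math.inf)
        status |= ST_OVERFLOW
    if result < 0:
        status |= ST_SIGN
    block[STATUS] = status
    block[OP] = 0  # signals "finished" to software
```

Software would then either poll the opcode byte until it reads 0, or, if the cycle count is fixed and short enough, just read the accumulator on the next instruction.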