I've been wondering, if I were to make my own design, whether I'd even want to copy the vCPU or the memory map. On one hand, I could do things differently and make use of special features. But then one would need to create a whole toolchain and software environment. What does everyone think?
I've considered splitting the video out to DMA, but I'm not sure of the best way to prevent software races. One idea is a "watchdog" sort of unit that halts the CPU while a new frame is being created in memory. With nothing but vCPU time, you could otherwise overwrite the frame buffer multiple times within a single frame, and the problem worsens as you increase the clock. A naive solution would be a "crippled" mode where the CPU is halted during line-drawing. A more sophisticated method would be to have the watchdog monitor memory addresses via address snooping and selectively halt when the I/O unit has more data than it can handle. So for sound or video, if there are active updates, the CPU can halt, preventing software races for those devices. I could do the video and sound in a custom controller and have it read the RAM via DMA.
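To make the snooping idea concrete, here is a minimal sketch of the decision logic such a watchdog might apply. The frame-buffer page range and the `scanout_page` input are my assumptions, loosely modeled on the Gigatron's one-scanline-per-page layout; real hardware would do this combinationally on the address bus.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical frame-buffer layout, loosely modeled on the Gigatron's:
   one scanline per 256-byte page, pages 0x08..0x7F. */
#define FB_FIRST_PAGE 0x08
#define FB_LAST_PAGE  0x7F

/* Return true if the CPU should be halted: the snooped write lands on
   the scanline page the video unit is currently reading out. */
bool snoop_should_halt(uint16_t write_addr, uint8_t scanout_page)
{
    uint8_t page = write_addr >> 8;
    if (page < FB_FIRST_PAGE || page > FB_LAST_PAGE)
        return false;               /* not a frame-buffer write */
    return page == scanout_page;    /* racing the beam: halt */
}
```

This only halts on actual conflicts, so code that stays out of the line being scanned runs at full speed.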
If I had my own mode and vCPU additions, I could add manual flow-control opcodes. For instance, there could be a "halt until condition" instruction, with the operands giving the condition. The conditions could include h-sync, v-sync, sound finished, FPU operation complete, file operation complete, network/com operation complete, keyboard data incoming, etc. In effect, busy-wait polling done in hardware.
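One plausible encoding for the operand is a condition bitmask, with the halt releasing when any selected event is pending. The bit assignments below are invented for illustration, not anything the post specifies.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical condition bits for a "halt until condition" opcode.
   The operand byte selects which event(s) release the CPU. */
enum {
    COND_HSYNC = 1 << 0,
    COND_VSYNC = 1 << 1,
    COND_SOUND = 1 << 2,  /* sound operation finished */
    COND_FPU   = 1 << 3,  /* FPU result ready */
    COND_FILE  = 1 << 4,  /* file operation complete */
    COND_COM   = 1 << 5,  /* network/com operation complete */
    COND_KEY   = 1 << 6,  /* keyboard data incoming */
};

/* The halt releases when ANY selected condition is pending. */
bool halt_released(uint8_t cond_mask, uint8_t pending_events)
{
    return (cond_mask & pending_events) != 0;
}
```

An OR-style release keeps the hardware to a single AND/OR network on the event lines.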
Do others know of alternative strategies to prevent software races if using a hardware, DMA-based I/O controller? I know that other systems used interrupts to signal back to the CPU that the I/O was done. But things modeled after the Gigatron lack interrupts. So, I guess the hardware would need to be better at anticipating what the software intends to do. So address snooping could be used to see when different frames are being sent. For sound, I guess this would be needed since even though the sounds/calculations would be done in hardware, sending the sounds would still be bit-banging.
Speaking of sound, a possible upgrade with a custom controller could be to dynamically change the frequency response depending on channel usage: roughly 3937.5 Hz when using 4 channels, 7875 Hz when using 2, and 15,750 Hz when using 1. That would require different note table sets (in BRAM), and software would need to be aware of this. I think that could be backward compatible, since current programmers would not use the extra ranges of the lower channel modes. (Though maybe a coder could artificially do this by putting the same thing on multiple channels.)
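The scaling here is just the fixed mixing-slot rate divided among the active channels, using the post's 15,750 Hz figure (actual hardware rates may differ):

```c
/* Per-channel update rate scales inversely with the number of active
   channels, assuming a fixed 15750 Hz slot rate (the post's figure). */
double channel_rate_hz(int active_channels)
{
    return 15750.0 / active_channels;
}
```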
For the keyboard, I guess that could use DMA and address snooping. So an idea would be to have a keyboard read buffer and empty the buffer into a single memory location as it is used. So that could be used as "clothesline memory." When the address is read, the memory unit updates with whatever is sitting in the keyboard buffer. And the keyboard buffer could be cleared similarly to how it is done on a PC, though using a memory reading loop, not a port reading loop.
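The "clothesline" behavior can be sketched as a small FIFO whose front byte is exposed at one memory location and consumed on read. All the names and the 16-byte depth here are my own assumptions:

```c
#include <stdint.h>

/* "Clothesline" keyboard cell: a hardware FIFO drains into one
   memory-mapped location; a snooped read of that address consumes the
   byte and hangs the next one out in its place. */
typedef struct {
    uint8_t fifo[16];
    unsigned head, tail;
} kbd_t;

void kbd_push(kbd_t *k, uint8_t key)   /* keystroke arrives from hardware */
{
    k->fifo[k->tail++ & 15] = key;
}

uint8_t kbd_read_cell(kbd_t *k)        /* CPU reads the clothesline address */
{
    if (k->head == k->tail)
        return 0;                      /* buffer empty */
    return k->fifo[k->head++ & 15];    /* consume; next byte takes its place */
}
```

Emptying the buffer would then be a memory-reading loop until the cell returns 0, much like the port-reading loop on a PC.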
In my design, even some port control opcodes would be nice. For instance, a set active port opcode to allow up to 256 ports. Or have some port control memory locations. In that case, have a port control descriptor table. That could allow up to maybe 255 ports (with 0 meaning inactive) or up to 255 commands per port.
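A minimal shape for that descriptor table might look like the following; the field names and layout are hypothetical, with 0 reserved to mean "inactive" as described:

```c
#include <stdint.h>

/* Hypothetical port-control descriptor table: a device value of 0 means
   the slot is inactive, leaving up to 255 usable ports (or, read the
   other way, up to 255 commands per port). */
typedef struct {
    uint8_t device;     /* 0 = descriptor unused */
    uint8_t command;    /* current command for this port */
} port_desc_t;

static port_desc_t ports[256];
static uint8_t active_port;   /* target of a "set active port" opcode */

void set_active_port(uint8_t p) { active_port = p; }

int port_active(uint8_t p) { return ports[p].device != 0; }
```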
Now, if I want to abandon the memory map, I could add a stack in page 1 like the 6502 and save page 0 for its intended purpose. With the current design in mind, I could add a private stack in BRAM. That has some extra potential since private stack usage could be done in parallel with other operations, even those that use RAM. So one could combine LD/ST with PUSH/POP.
The above could be used in interesting ways. For instance, I'd use a LUT-based decoder unit, so it would be trivial to add another page of native opcodes, even if "private," and let the vCPU microcode store access those. Since BRAM is 9 bits wide, that would be trivial. That would mainly be for opcodes that are only useful in microcode, such as "return to native code." So there could be extra opcodes that do multiple things at once, beyond just incrementing the X register alongside whatever else.
I could do some things differently. For instance, NOP could be a true NOP and not use the ALU. Or the vCPU halt/stop instruction could cause the native code to jump to the start address.
Also, I think a neat way to handle extra 16-bit opcodes would be to let the Memory Unit and DMA controller handle those. It could have its own incrementer. So you could do a 16-bit op on a given Y:X location and the faster memory controller could complete it to both addresses in a single slower cycle.
Another idea could be to store vCPU or similar programs on a ROM and treat it as an I/O device. TBH, that could probably be done on a regular Gigatron and would make it easier to load different programs or use cartridges. If a cartridge exists, the machine could pull up a menu to select from the collection found on it, and Loader could then read the selection into RAM (if enough RAM exists, or report an error if not). As for a cartridge index, I'd keep it simple: a byte for the number of entries, 2 bytes for each program's location, 2 bytes for its length, and maybe 8-11 bytes for its name. If one wanted to, they could add a byte for attributes (such as hidden/private, or which table to use for additional circuitry). That would be similar to the old .LIB archiver. So a crude byte-based "ISAM" table, not anything as complex/bulky as a block table such as FAT-12. With a Gigatron, what would be nice would be a native cartridge and an app cartridge.
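As a sketch of parsing such an index, here is one concrete layout under my own assumptions where the post leaves details open: 1 count byte, then per entry a 2-byte little-endian location, 2-byte length, 1 attribute byte, and a fixed 8-byte name (the post allows 8-11).

```c
#include <stdint.h>
#include <string.h>

#define NAME_LEN 8                      /* assumed; post says 8-11 */
#define ENTRY_SIZE (2 + 2 + 1 + NAME_LEN)

typedef struct {
    uint16_t location, length;
    uint8_t  attrs;
    char     name[NAME_LEN + 1];
} cart_entry_t;

/* Read entry n from a raw cartridge index; returns -1 if out of range. */
int cart_read_entry(const uint8_t *index, int n, cart_entry_t *out)
{
    if (n >= index[0])                  /* index[0] = number of entries */
        return -1;
    const uint8_t *e = index + 1 + n * ENTRY_SIZE;
    out->location = e[0] | (e[1] << 8);
    out->length   = e[2] | (e[3] << 8);
    out->attrs    = e[4];
    memcpy(out->name, e + 5, NAME_LEN);
    out->name[NAME_LEN] = '\0';
    return 0;
}
```

A menu would loop n from 0 until the call fails, printing each name.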
To be honest, a ROM cartridge could be good for overriding the control unit and including a separate one (if one is handy with SMD). So you could intercept certain opcodes and send NOP to the native one. If one had new registers or whatever, they could share with the existing ones on the Gigatron through the LD [imm] instruction as an aliased instruction, while a different "internal native" instruction is done in the cartridge. So one could run one opcode internally on a custom coprocessor while emitting a different opcode to the Gigatron. As far as that goes, one could include an Arduino or similar on their cartridge and use the 2 together. Plus taking things further, one could have a cable between the native ROM and the app ROM. So that could provide more room for tables and things.
I mentioned an FPU option. I imagine one way to do that using my own memory map would be to have a reserved RAM area of at least 13 or 14 bytes: a byte for the opcode, 4 bytes for the FPU accumulator, 4 bytes for operand A, 4 bytes for operand B, and possibly a status byte. I think 14 bytes would be better, since I'd clear the opcode "register" when finished. A status register could be good since it would denote carry, sign, exception, NaN, overflow, underflow, etc. Plus the status register would be useful at boot, since it could report presence and revision. I guess similar could be done on a Gigatron if it were modded to have maybe a 2-phase clock to create time/room for concurrent DMA. If the FPU used a set number of cycles, other instructions could be used before the result, or given the way vCPU works, single instructions might give enough time. So if the FPU works in 4 native cycles, the 7 native cycles of a vCPU instruction would provide enough time. Since I'd want at least 32-bit registers for an FPU, there might be times it could work more like MMX and return multiple results.
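The 14-byte block described above can be written down as a C overlay; the offsets follow the post, while the field names are mine. The cleared opcode byte conveniently doubles as a "done" flag for software that wants to poll:

```c
#include <stdint.h>

/* Memory-mapped FPU interface: 14 bytes of reserved RAM. */
typedef struct {
    uint8_t opcode;   /* command; cleared by the FPU when finished */
    uint8_t acc[4];   /* FPU accumulator (result) */
    uint8_t opa[4];   /* operand A */
    uint8_t opb[4];   /* operand B */
    uint8_t status;   /* carry/sign/exception/NaN/overflow/underflow */
} fpu_block_t;

/* Poll for completion: opcode == 0 means the FPU is idle/done. */
int fpu_done(const volatile fpu_block_t *f)
{
    return f->opcode == 0;
}
```

Since every field is a byte array, the struct packs to exactly 14 bytes with no padding.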
Gigatron Similar machine
Last edited by Sugarplum on 27 Sep 2021, 01:32, edited 2 times in total.
Re: Gigatron Similar machine
Maybe this is something for hackaday.io? Over there, you'll have a bigger audience than just the Gigatron crowd. Also, if it takes off, you'd already have everything neatly in one place.
Re: Gigatron Similar machine
Thanks, Walter. I do have a project page there. I'm doing more sharing than looking for advice. I tried on AnyCPU, but I find I don't fit with that crowd. Yet, my project seems to be beyond the intended scope here.
Below is the logic as to how I got to this point in what I'd like to build.
The challenge is creating the syncs outside of software. Actually doing that is the easy part. Standard VGA timings are listed all over the place, and there are videos such as those by Ben Eater that explain how to create those. That's just counters and simple logic. But the challenge then would be having the video controller in a known state while the Gigatron is sending to it. So that's why I am considering using concurrent DMA and giving the controller awareness of the indirection table. So the display would happen like the Gigatron does it, but using hardware.
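The "counters and simple logic" for the syncs really are that simple. Here is the standard 640x480@60 VGA timing (visible/front-porch/sync/back-porch: 640/16/96/48 pixels per line, 480/10/2/33 lines per frame) expressed as the comparators a counter-based generator would use; both syncs are active-low:

```c
#include <stdbool.h>

/* Standard 640x480@60 VGA timing, active-low syncs. */
#define H_VISIBLE 640
#define H_FRONT    16
#define H_SYNC     96
#define H_TOTAL   800   /* 640 + 16 + 96 + 48 */
#define V_VISIBLE 480
#define V_FRONT    10
#define V_SYNC      2
#define V_TOTAL   525   /* 480 + 10 + 2 + 33 */

/* x is the pixel counter (0..799), y the line counter (0..524). */
bool hsync(int x)
{
    return !(x >= H_VISIBLE + H_FRONT &&
             x <  H_VISIBLE + H_FRONT + H_SYNC);
}

bool vsync(int y)
{
    return !(y >= V_VISIBLE + V_FRONT &&
             y <  V_VISIBLE + V_FRONT + V_SYNC);
}
```

In hardware this is two counters (x wraps at 800 and increments y, y wraps at 525) plus the two range comparators above.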
Then, of course, that creates a new problem to solve. The other software devices would need to know the sync signals. When fully software-created, it is impossible to get out of sync, since everything is done in a single thread. Once you add another processing core or a DMA controller, then the software could run at any time in relation to video, sound, keyboard, etc. The software would need to run when the ports are ready for it to run, and we'd have no way to ensure that unless ALL of the peripherals were done in hardware. So creating a PSG and a keyboard controller in hardware too would eliminate possible hardware race conditions.
So having a unified I/O controller would get the syncs and video copying out of software while making sure everything can access those sync signals. However, this would open up new possibilities for problems, namely software race conditions. If the CPU can operate at full speed without needing to wait on the video, it is possible that the frame buffer would be overwritten several times before the video is finished displaying the current items. I imagine there would be a similar issue with the sound.
A naive solution would be to give the CPU a halt line and to halt during active drawing for at least 1 out of 4 native scanlines. That would be marginally faster than Mode 3, since the sound and video would be produced in their own hardware thread. That would not be an elegant solution and would still limit performance. But it just might work well enough to keep vCPU compatibility.
A more sophisticated solution would be to monitor writes to I/O regions in memory and to halt based on the activity there. In older systems, there is precedent for such. If you wanted to build accelerator boards for 6502 machines, you'd need a way to keep I/O compatibility. Even just changing the variety of 6502 CPU or altering the clock rate would cause problems. In the Apple II, sound and disk I/O were bit-banged. If you install a 65C02 or even the rare 65CE02, you would break the sound and disk writes (reads were more forgiving of cycle variations). So those making accelerator boards would selectively accelerate based on writes to I/O regions.
The above is just one way to move video and other I/O to hardware. I am sure there are other ways to do this. For instance, most 6502-based machines used a 6520-6522 PIA or VIA chip. You can get those today, but you won't be able to go past 14 MHz if you use them. Those rely on interrupts to send signals to the correct devices. The Gigatron doesn't have hardware interrupts, and adding interrupt abilities seems rather complex to me since so many things would need to be done.
As for how older computers did the video, it varied per manufacturer and platform. For instance, the Atari 2600 used the TIA chip, but still mostly relied on bit-banging. An advantage of using the TIA was the ability to use multiple resolutions at the same time. So if you had a white background, you could specify a pixel size the width of the screen for the lines that are entirely white. So that gave extra CPU time. And technically, that can be done on the Gigatron now if one were to write a game in native assembly. The Out port uses a register so you could specify a single color at the beginning of a line and do whatever you want until syncs where you'd blank the screen, change the syncs (both in a single instruction), and then continue code processing until the next line.
However, with the Atari 800, they added a coprocessor called Antic to do the video using display lists. To do that, they modified the 6502 to add a halt line to allow DMA. This is also how they did the DRAM refresh. This is the bus-mastering form of DMA, where another device takes over the buses while the CPU is paused. Plus Antic and the PIA used hardware interrupts, so an interrupt was sent at the start of v-sync. Sound, ports, and the keyboard were all handled in POKEY (the POtentiometer/KEYboard controller). The PIA (and VIA) are capable of producing sound, but on the Atari 800 that ability is mainly used as an error siren and to chirp during keystrokes.
The Commodore 64 uses cycle-stealing for its DMA. The 6502 uses a multiphase clock and effectively accesses the buses every other cycle. Plus the C64 uses the VIC-II video coprocessor. The SID is used for the sound, and there are 2 CIAs (6526) for managing I/O.