Would emulating vCPU on a Propeller 2 be feasible?
Posted: 28 Jun 2022, 00:30
I wanted to build a 75-100 MHz Gigatron, and while I think that's possible if I redesign the architecture around a 4-stage pipeline, I don't know if I can pull it off given how memory prices have gone. Though it goes against Marcel's original philosophy, I wanted to do the CU and the ALU in SRAM LUTs. And going that far, I could add registers, 16-bit native instructions, etc. The memory slot could give room/time for another ALU.
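To make the SRAM-LUT ALU idea concrete, here is a minimal sketch that generates the contents of such a table. The opcode set, the op:a:b address layout, and the names `lut_address` and `build_alu_lut` are all hypothetical; the point is only to show how big the table gets for 8-bit operands and 8 operations.

```python
"""Sketch: an 8-bit ALU implemented as an SRAM lookup table.

Assumption (hypothetical layout, not the Gigatron's): the SRAM address is
a 3-bit opcode concatenated with two 8-bit operands, and the byte stored
at that address is the ALU result.
"""

OPS = {
    0: lambda a, b: a & b,   # AND
    1: lambda a, b: a | b,   # OR
    2: lambda a, b: a ^ b,   # XOR
    3: lambda a, b: a + b,   # ADD (carry dropped by the 8-bit mask)
    4: lambda a, b: a - b,   # SUB (borrow dropped by the 8-bit mask)
    5: lambda a, b: b,       # pass B
    6: lambda a, b: a,       # pass A
    7: lambda a, b: ~a,      # NOT A
}

def lut_address(op: int, a: int, b: int) -> int:
    """Concatenate op:a:b into a 19-bit SRAM address."""
    return (op << 16) | (a << 8) | b

def build_alu_lut() -> bytearray:
    """Fill a 2**19-byte (512 KB) image, one result byte per address."""
    table = bytearray(len(OPS) << 16)
    for op, fn in OPS.items():
        for a in range(256):
            for b in range(256):
                table[lut_address(op, a, b)] = fn(a, b) & 0xFF
    return table
```

So a single 512 KB SRAM with 19 address lines holds the whole table; getting a carry-out as well would need a 9th data bit, i.e. a second chip or a wider part.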
But I wonder if one could emulate vCPU and run .GT1 files on a Propeller 2. The Propeller 2 is 32-bit and has 64 GPIO pins, 8 cogs, 4 DACs per cog, 512 double-words (2 KB) of executable space per cog, another 2 KB of LUT space per cog, and 512 KB of hub memory. Each cog has its own timeslot on the hub for accessing hub memory. Neighboring pairs of cogs (whose numbers differ only in the last bit) can access each other's LUT RAM. If you need more memory, you could use a cog as a memory controller and access an external serial RAM. It is rated at 180 MHz, but 300+ MHz is possible. There are built-in VGA features (as well as HDMI and composite), though those may dictate your clock rate. For HDMI, the system clock must run at 10x the pixel clock, and I'm not sure whether you can go faster than that. If not, one could shoot for a 25 MHz pixel clock and emulate the Gigatron's 6.25 MHz pixel clock by sending each pixel 4 times.
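The clock arithmetic behind that plan, worked out as a small script. It only multiplies the figures given above (6.25 MHz Gigatron pixels, 4x repetition, 10x system clock for HDMI); the constant names are just labels for those numbers.

```python
"""Clock arithmetic for driving Gigatron-style video from a P2."""

GIGATRON_PIXEL_MHZ = 6.25   # Gigatron pixel clock
REPEAT = 4                  # send each Gigatron pixel 4 times
HDMI_SYSCLK_FACTOR = 10     # system clock = 10 x pixel clock for HDMI
P2_RATED_MHZ = 180          # rated P2 clock

def required_sysclk_mhz(pixel_mhz: float, repeat: int, factor: int) -> float:
    """System clock needed when each source pixel is repeated `repeat` times."""
    return pixel_mhz * repeat * factor

# 25 MHz output pixel clock, close to standard 640x480 timing
out_pixel_mhz = GIGATRON_PIXEL_MHZ * REPEAT
sysclk = required_sysclk_mhz(GIGATRON_PIXEL_MHZ, REPEAT, HDMI_SYSCLK_FACTOR)
print(f"pixel clock {out_pixel_mhz} MHz -> sysclk {sysclk} MHz "
      f"({sysclk / P2_RATED_MHZ:.2f}x the rated {P2_RATED_MHZ} MHz)")
# prints: pixel clock 25.0 MHz -> sysclk 250.0 MHz (1.39x the rated 180 MHz)
```

So the 4x-repeat HDMI route implies a 250 MHz system clock: above the 180 MHz rating, but inside the 300+ MHz range mentioned above.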
For sound, one could use the internal DACs and output on 1 wire (or 2 if stereo is desired). That would allow 8-bit samples, though keeping 6-bit samples might be good.
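If the emulator keeps 6-bit samples internally but feeds an 8-bit DAC, each sample needs scaling. A minimal sketch using bit replication, a generic trick rather than anything P2-specific; the function name is hypothetical.

```python
def six_to_eight_bit(sample: int) -> int:
    """Scale a 6-bit sample (0..63) to the full 8-bit DAC range (0..255).

    A plain left shift by 2 would top out at 252; copying the top two
    bits into the freed low bits maps 63 to exactly 255 and 0 to 0.
    """
    if not 0 <= sample <= 63:
        raise ValueError("expected a 6-bit sample")
    return (sample << 2) | (sample >> 4)
```

For example, `six_to_eight_bit(32)` gives 130, and the endpoints 0 and 63 land on 0 and 255.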
For the lights, one could probably do what the Gigatron does now with only 2-3 wires. Since the pins can be tri-stated (driven high, driven low, or left floating), the LEDs can be wired as diodes in both directions between pin pairs, so at any moment some act as common to ground and some as common to Vcc. Then some code would convert from the Gigatron's LED state to the LED mappings. With the chip clocked at 180-320 MHz, scanning should be fast enough to give the illusion of them being solidly lit. In short, one can Charlieplex them to save pins or add more LEDs.
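Mapping the Gigatron's 4 LEDs onto 3 pins is a textbook Charlieplexing case: n tri-state pins can address n(n-1) LEDs, so 2 pins give 2 LEDs and 3 pins give 6. The sketch below models only the drive patterns; the LED-to-pin-pair numbering, the one-LED-at-a-time scan, and the `led_bits` argument standing in for the Gigatron's LED state are assumptions, not a P2 driver.

```python
"""Charlieplexing sketch: n tri-state pins drive n*(n-1) LEDs.

Each LED sits between an ordered pair of pins (anode, cathode). To light
one LED, drive its anode high, its cathode low, and float every other pin.
"""
from itertools import permutations

HIGH, LOW, FLOAT = "H", "L", "Z"

def led_map(num_pins: int) -> list[tuple[int, int]]:
    """Assign each LED index an (anode_pin, cathode_pin) pair."""
    return list(permutations(range(num_pins), 2))

def pin_states(num_pins: int, led: int) -> list[str]:
    """Pin drive pattern that lights only LED `led`."""
    anode, cathode = led_map(num_pins)[led]
    states = [FLOAT] * num_pins
    states[anode] = HIGH
    states[cathode] = LOW
    return states

def scan_frames(num_pins: int, led_bits: int) -> list[list[str]]:
    """One drive pattern per lit LED; cycling them quickly looks solid."""
    return [pin_states(num_pins, i)
            for i in range(len(led_map(num_pins)))
            if (led_bits >> i) & 1]
```

With 3 pins, `scan_frames(3, 0b0101)` yields two patterns, `['H', 'L', 'Z']` and `['L', 'H', 'Z']`: the same two pins with the polarity flipped, which is exactly the "diodes in both directions" wiring.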
Things like the multiplication syscalls can be sped up, since the P2 has a 2-cycle 16x16 hardware multiply, and its CORDIC unit can do 32x32-bit multiplications with 64-bit results and 64/32-bit divisions (pipelined, with a latency of several dozen clocks rather than 2 cycles). Really, it would be nice to have a function call for random numbers, since that could make full use of the hardware RNG. That uses a 128-bit algorithm, is seeded from a TRNG source on boot, and returns different results per cog or pin. Otherwise, I'd need to find a way to update the memory fast enough, or maybe alias the address, so that when the vCPU RNG address is read, the value comes from cog memory rather than the hub or any external RAM.
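The aliasing option could look like the following in an emulator's memory layer: reads of the entropy bytes bypass RAM and return fresh random values, so nothing has to keep them updated. This is a minimal sketch; `Memory` is a hypothetical class, `random.Random` stands in for the P2's hardware RNG, and `ENTROPY_ADDRS` is assumed to be zero-page bytes 0x06-0x08, which should be checked against the ROM's interface definitions.

```python
"""Sketch: aliasing the vCPU's entropy bytes to a host RNG."""
import random

# Assumption: entropy lives in zero page at 0x06-0x08; verify against the ROM.
ENTROPY_ADDRS = range(0x06, 0x09)

class Memory:
    def __init__(self, size: int = 0x10000, rng=None):
        self.ram = bytearray(size)
        # Stand-in for the P2 hardware RNG
        self.rng = rng if rng is not None else random.Random()

    def read(self, addr: int) -> int:
        addr %= len(self.ram)
        if addr in ENTROPY_ADDRS:
            # Aliased: a fresh random byte on every read, no periodic update
            return self.rng.getrandbits(8)
        return self.ram[addr]

    def write(self, addr: int, value: int) -> None:
        self.ram[addr % len(self.ram)] = value & 0xFF
```

On the P2, the equivalent check would sit in the emulator cog's load path, which keeps the RNG read out of hub and external RAM entirely.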
So more syscalls and opcodes could come in handy, but then I'd need a compiler that could use them. I guess with the new Z register that has been (or will be) added, 16 MB would be the limit. I'm also not quite sure how to divide up the memory, namely how much hub memory needs to be reserved for the emulation. I'd imagine up to half of it, but I wouldn't know until I did it.
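The 16 MB figure follows if the Z register supplies 8 extra address bits on top of vCPU's 16-bit addresses. The sketch below assumes exactly that (an 8-bit Z forming bits 16-23); `linear_address` is a hypothetical helper, and the hub split just reuses the "up to half" guess above.

```python
"""Address-space arithmetic for an assumed 8-bit bank ("Z") register."""

HUB_BYTES = 512 * 1024
BANK_BYTES = 0x10000  # one 64 KB vCPU address space

def linear_address(z: int, addr16: int) -> int:
    """Assumption: Z supplies address bits 16-23."""
    return ((z & 0xFF) << 16) | (addr16 & 0xFFFF)

max_bytes = linear_address(0xFF, 0xFFFF) + 1  # 2**24 = 16,777,216 bytes
banks_in_hub = HUB_BYTES // BANK_BYTES        # 8 banks fit in hub RAM
half_hub_banks = banks_in_hub // 2            # if half the hub is reserved
```

So even with half the hub left for the emulator, only 4 of the 256 possible 64 KB banks would be on-chip; the rest would have to sit in external RAM behind the memory-controller cog.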
As for cog allocation, I'm not sure how many cogs the core emulator would need. Besides that, there should be a video cog that could maybe also do sound, a memory controller, and maybe a couple of I/O controllers. Maybe have one for input devices like a game controller and keyboard, or possibly even a mouse. I think a mouse isn't much harder to bit-bang than a keyboard, but the timings and protocol may be tighter, and it's easier to get out of sync. And maybe dedicate a cog to file I/O. If 8 cogs are enough, one might be able not only to emulate a Gigatron but to expand it too.
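One way to sanity-check that budget is to write it down as a table and count. The roles come from the paragraph above; every count, including 2 cogs for the emulator core, is a guess for illustration, not a measurement.

```python
"""One possible cog budget (hypothetical counts, not measurements)."""

P2_COGS = 8

cog_plan = {
    "vCPU emulator": 2,                           # unknown; could be 1 or more
    "video (maybe also sound)": 1,
    "memory controller": 1,
    "input (game controller, keyboard, mouse)": 1,
    "file I/O": 1,
}

used = sum(cog_plan.values())
spare = P2_COGS - used
assert used <= P2_COGS, "plan needs more cogs than the P2 has"
print(f"{used} cogs used, {spare} spare for expansion")
# prints: 6 cogs used, 2 spare for expansion
```

Under these guesses there are 2 cogs left over for expansion, such as a dedicated sound cog.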
Video would take some delicate work. The P2 has built-in VGA facilities that work almost like the Gigatron's, though making the color maps compatible may take swapping some wires around, since the colors are ordered as BGR. To be honest, one could skip the hardware features and probably bit-bang, though one would need to pay attention to the code timings and the clock rate. If the built-in instructions limit the speed, one could opt to clock it faster and bit-bang. For instance, if the hub is used as conventional memory, one could stream from the hub to cog RAM ahead of time during blank lines, then either bit-bang or use the built-in features from cog RAM.
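Instead of swapping wires, the color remap could be done in software with a 64-entry palette, which fits easily in a cog's LUT RAM. A minimal sketch, assuming the Gigatron pixel layout 00BBGGRR (red in bits 0-1, green in 2-3, blue in 4-5). The output channel order is a parameter because the exact packing the P2 output path wants is left open here; all names are hypothetical.

```python
"""Sketch: expanding Gigatron 6-bit colors into a 24-bit palette."""

def gigatron_to_rgb(pixel: int) -> tuple[int, int, int]:
    """Assumed layout 00BBGGRR; each 2-bit channel scales to 0/85/170/255."""
    r = pixel & 0b11
    g = (pixel >> 2) & 0b11
    b = (pixel >> 4) & 0b11
    return r * 85, g * 85, b * 85

def pack24(pixel: int, order: str = "BGR") -> int:
    """Pack as 24 bits, with the first letter of `order` in the top byte."""
    channels = dict(zip("RGB", gigatron_to_rgb(pixel)))
    word = 0
    for name in order:
        word = (word << 8) | channels[name]
    return word

# All 64 Gigatron colors, precomputed once
PALETTE = [pack24(p) for p in range(64)]
```

Precomputing the table moves the per-pixel cost down to one lookup, which matters if the cog is also streaming lines from the hub.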
Or should I abandon vCPU and come up with my own instruction set that better utilizes the P2? Of course, that means writing new tools and software. Or do both; there might be enough flash ROM for multiple core sets. If I were to roll my own, I'd certainly use a different memory map and try to keep things word- or double-word-aligned, since most of the important places in the current memory map are on odd boundaries. If the most important data, or the conventional memory, goes in the hub, it would then be easy to read a double-word at a time.
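The alignment concern can be made concrete: with double-word (32-bit) fetches, a 16-bit word at byte offset 0, 1 or 2 within a long needs one read, but a word at offset 3 straddles two longs and needs two. A minimal sketch, where a `bytearray` stands in for hub RAM and `read_long` and `read_word` are hypothetical helpers that allow only aligned reads.

```python
"""Sketch: fetching a little-endian 16-bit word via aligned 32-bit reads."""

def read_long(hub: bytearray, addr: int) -> int:
    """Aligned 32-bit little-endian read."""
    assert addr % 4 == 0, "aligned reads only"
    return int.from_bytes(hub[addr:addr + 4], "little")

def read_word(hub: bytearray, addr: int) -> int:
    base = addr & ~3
    offset = addr & 3
    if offset <= 2:
        # Both bytes sit inside one long: a single read suffices
        return (read_long(hub, base) >> (8 * offset)) & 0xFFFF
    # offset == 3: the word straddles two longs, so two reads are needed
    lo = read_long(hub, base) >> 24
    hi = read_long(hub, base + 4) & 0xFF
    return (hi << 8) | lo
```

So an odd address only costs an extra fetch a quarter of the time; a memory map with word-aligned hot spots avoids the straddling case entirely.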
What would be neat is if I could use 2 of those and dedicate one just to sound. What if it could emulate the POKEY, the SID, the TI noise chip, and some of the Yamaha chips? Or make something custom that combines the best features of each or provides modes that never existed. For instance, the POKEY defaults to 4 channels of 8 bits. You can put it in a 16-bit mode at the cost of pairing two channels per 16-bit channel. But what if it had 4 or more 16-bit channels? Then again, in a way, that was already done, since there was a quad POKEY chip. So I guess one could have 8 16-bit channels (the quad POKEY's 16 channels paired up), making both high resolution and stereo possible. I'd love something with a lot of channels and a wide frequency range. Part of me has nostalgia for the TV horizontal sweep (15,750 Hz), and there is really no need to go higher than that, though a few people can hear up to 18 kHz.
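One generic way to give every channel 16-bit frequency resolution is a phase-accumulator (direct digital synthesis) oscillator. This is not a model of any of the chips named above. The sketch assumes a 31,500 Hz sample rate, which is the Nyquist minimum for reproducing tones up to 15,750 Hz; `freq_word` and `square_wave` are hypothetical names.

```python
"""Sketch: a 16-bit phase-accumulator (DDS) tone channel."""

SAMPLE_RATE = 31_500  # Hz, assumed: 2 x 15,750 Hz (Nyquist)
ACC_BITS = 16
ACC_MASK = (1 << ACC_BITS) - 1

def freq_word(hz: float) -> int:
    """Frequency word whose accumulator wraps `hz` times per second."""
    return round(hz * (1 << ACC_BITS) / SAMPLE_RATE)

def square_wave(word: int, n: int) -> list[int]:
    """n samples of a square wave: the accumulator's top bit is the output."""
    acc, out = 0, []
    for _ in range(n):
        acc = (acc + word) & ACC_MASK
        out.append(acc >> (ACC_BITS - 1))
    return out

# Step size between adjacent frequency words, about 0.48 Hz
resolution_hz = SAMPLE_RATE / (1 << ACC_BITS)
```

Each extra channel is just another accumulator and frequency word, so 8 or more 16-bit channels mostly cost cycles in the sound cog's mixing loop.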
I think the philosophy behind the Propeller is that interrupts are not needed if you can dedicate cogs to peripherals and use semaphores and other flow control to communicate between the cogs (the P2 does add some interrupt support, but the idea still holds).
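As an illustration of interrupt-free communication, here is a single-producer/single-consumer mailbox of the kind cogs can share through hub RAM. The producer only advances `head` and the consumer only advances `tail`, so one writer and one reader need no lock; multiple writers would need a semaphore. The `Mailbox` class is hypothetical, and a real cog version would use wrapping fixed-width counters instead of Python's unbounded integers.

```python
"""Sketch: a single-producer/single-consumer mailbox in shared memory."""

class Mailbox:
    def __init__(self, size: int = 16):
        assert size & (size - 1) == 0, "size must be a power of two"
        self.buf = [0] * size
        self.mask = size - 1
        self.head = 0  # next slot the producer writes (producer-owned)
        self.tail = 0  # next slot the consumer reads (consumer-owned)

    def put(self, value: int) -> bool:
        if self.head - self.tail == len(self.buf):
            return False  # full: the producer polls and retries
        self.buf[self.head & self.mask] = value
        self.head += 1  # publish only after the data is written
        return True

    def get(self):
        if self.tail == self.head:
            return None  # empty: the consumer keeps polling
        value = self.buf[self.tail & self.mask]
        self.tail += 1
        return value
```

For example, the input cog could `put` key codes while the emulator cog calls `get` once per frame, and neither ever has to interrupt the other.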
As a name, maybe such a contraption could be called an Octotron. However, that is already the name of an amusement ride and a sound synthesizer, and Hydra is already taken by a P1-based console. Or maybe Propellotron.
But to do this, I'd need to know a lot of things about vCPU that I don't currently know. That would include things like the vCPU entry point, what I/O facilities the vCPU has, etc.