Gigatron Hackers

Posted: **28 Jun 2022, 00:30**

I wanted to do the 75-100 MHz Gigatron, and while I think that's possible if I redesign the architecture and go for a 4-stage pipeline, I don't know if I can pull it off with the way memory prices have gotten. While that's against Marcel's original philosophy, I wanted to do the CU and the ALU in SRAM LUTs. And going that far, I could add registers, 16-bit native instructions, etc. The memory slot could give room/time for another ALU.

But I wonder if one could emulate vCPU and run .GT1 files on a Propeller 2. The Propeller 2 is 32-bit, has 64 GPIO pins, 8 cogs, 4 DACs per cog, 512 double-words (2K) of executable space per cog, another 2K of LUT space per cog, and 512K of hub memory. Each cog has its own timeslot on the hub for accessing hub memory. Neighboring cog pairs (cog numbers that differ only in the last bit) can access each other's LUT RAM. If you need more memory, you could use a cog as a memory controller and access an external serial RAM. It is rated at 180 MHz, but 300+ MHz is possible. There are built-in VGA features (or HDMI or composite), though that may dictate your clock rate. It must run at 10x the pixel clock, though I'm not sure whether you can go faster than that. If not, one could shoot for a 25 MHz pixel clock and emulate 6.25 MHz by sending each pixel 4 times.
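To make the 4x pixel repetition concrete, here is a minimal Python sketch. The 160-pixel line width and the function name `repeat_pixels` are assumptions for illustration only; on a P2 this would be done by the streamer or a tight PASM loop, not by Python.

```python
def repeat_pixels(scanline, factor=4):
    """Stretch a scanline by emitting every source pixel `factor` times."""
    out = []
    for pixel in scanline:
        out.extend([pixel] * factor)
    return out

# Assumed 160 source pixels at 6.25 MHz become 640 output pixels at 25 MHz.
line = list(range(160))
stretched = repeat_pixels(line)
```

The output line takes the same time on the wire as the source line would at 6.25 MHz, since 4 output pixels at 25 MHz last as long as 1 source pixel.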

On the sound, it is possible to use the internal DAC and send on 1 wire (or 2 if stereo is desired). So one could do 8-bit samples, though keeping 6-bit samples might be good.

If one wants to do lights, they could probably do what you can do now with only 2-3 wires. The pins can be tri-stated, so you can treat things as having a floating power supply, with diodes in both directions, some common to ground and some common to Vcc. Then have some code to convert from the Gigatron system to the LED mappings. Clocking at 180-320 MHz should allow multiplexing fast enough to give the illusion of them being solid. So one can Charlieplex them if they want to save pins or add more LEDs.
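As a rough illustration of the Charlieplexing idea, the sketch below maps an LED index to the state of each pin: one pin driven high (anode), one driven low (cathode), and the rest tri-stated. The LED numbering and the names `charlieplex_pairs` and `pin_states` are made up for this example; a real board's wiring would define its own mapping.

```python
from itertools import permutations

def charlieplex_pairs(n_pins):
    """Every ordered (anode, cathode) pin pair; n pins give n*(n-1) LEDs."""
    return list(permutations(range(n_pins), 2))

def pin_states(led, n_pins):
    """Return 'H', 'L' or 'Z' (tri-state) per pin to light a single LED."""
    anode, cathode = charlieplex_pairs(n_pins)[led]
    states = ["Z"] * n_pins
    states[anode] = "H"
    states[cathode] = "L"
    return states

# Three pins are enough for six LEDs, more than the Gigatron's four.
pairs = charlieplex_pairs(3)
first = pin_states(0, 3)
```

Only one LED is lit at a time, so the emulator would cycle quickly through the LEDs that should be on.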

For things like multiplication, the system calls can be sped up since the P2 can do 32/32/64 multiplications and 64/32/32 divisions in 2 cycles. Really, it would be nice to have a function call for random numbers since that could make full use of the hardware RNG. That uses a 128-bit algorithm, it is seeded with a TRNG source on boot, and it returns different results per cog or pin. Otherwise, I'd need to find a way to update the memory fast enough, or maybe alias the address. So when the vCPU RNG address is read, it could come from cog memory, not the hub or any external RAM.
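The address-aliasing idea could look something like this sketch: reads of one address are redirected to a generator instead of RAM. The address `RNG_ADDR`, the `Memory` class, and the use of Python's `random` module in place of the P2 hardware RNG are all assumptions for illustration.

```python
import random

class Memory:
    """64K byte memory where selected addresses can be aliased to a callable."""

    def __init__(self, size=65536):
        self.ram = bytearray(size)
        self.read_hooks = {}

    def alias(self, addr, fn):
        # Reads of `addr` will call fn() instead of touching RAM.
        self.read_hooks[addr] = fn

    def read(self, addr):
        hook = self.read_hooks.get(addr)
        return hook() & 0xFF if hook else self.ram[addr]

    def write(self, addr, value):
        self.ram[addr] = value & 0xFF

RNG_ADDR = 0x0006  # hypothetical location of the vCPU-visible random byte
mem = Memory()
rng = random.Random(1234)  # stand-in for the hardware RNG
mem.alias(RNG_ADDR, lambda: rng.getrandbits(8))
sample = mem.read(RNG_ADDR)
```

The cost is one extra lookup per read, so a real implementation would probably special-case the address range rather than check a table on every access.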

So more syscalls and opcodes could come in handy, but I'd then need a compiler that could use them. I guess with the new Z register that is added or will be added, 16 MB would be the limit. I'm not quite sure what to do with the memory. I mean, I am not sure how much hub memory needs to be reserved for the emulation. I'd imagine up to half of it, but I wouldn't know unless I did it.

As for cog allocation, I'm not sure how many would be needed for the core emulator. Besides that, there should be a video cog that maybe could do sound, a memory controller, and maybe a couple of I/O controllers. Maybe have one for input devices like game controllers and keyboards, or possibly even a mouse. I think a mouse isn't much harder to bit-bang than a keyboard, but the timings and protocol may be tighter, and it's easier to get out of sync. And maybe dedicate a cog to file I/O. If 8 cogs are enough, one might not only be able to emulate a Gigatron but expand it too.

On the video, that would take some delicate work. There are VGA facilities built-in, and it works almost like the Gigatron, though to make the color maps compatible, it may take swapping some wires around. The colors are ordered as BGR. Though to be honest, one can skip the hardware features and probably bit-bang, though one would need to pay attention to the code timings and the clock rate. If the built-in instructions limit the speed, then one could opt to clock it faster and bit-bang. For instance, if the hub is used as conventional memory, then one can stream from the hub to the cog RAM ahead of time during off lines and bit-bang or use the built-in features from the cog RAM.
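If rewiring is not an option, the color order can also be fixed in software. The sketch below assumes the BGR remark refers to the Gigatron's 6-bit pixel, packed as 2 bits per channel with red in the lowest bits (xxBBGGRR); the function name `gigatron_to_rgb` is made up for this example.

```python
def gigatron_to_rgb(pixel):
    """Expand an assumed xxBBGGRR pixel byte to an 8-bit-per-channel (R, G, B)."""
    r = pixel & 0x03
    g = (pixel >> 2) & 0x03
    b = (pixel >> 4) & 0x03
    # Scale each 2-bit level (0..3) to 0..255; 3 * 85 == 255.
    return (r * 85, g * 85, b * 85)

white = gigatron_to_rgb(0x3F)
red = gigatron_to_rgb(0x03)
```

With only 64 possible colors, a 64-entry lookup table built once from this function would avoid the shifts in the inner loop.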

Or, should I abandon vCPU and come up with my own instruction set that better utilizes the P2? Of course, that means writing new tools and software. Or do both. There might be enough flash ROM for multiple core sets. I'd certainly use a different memory map and try to keep things word or double-word aligned if I were to roll my own. Most of the important places in the memory map are on odd boundaries. If the most important stuff or the conventional memory will go in the hub, then it would be easy to read a double-word at a time.

What would be neat would be if I could use 2 of those and have one just for sound. Like what if it could emulate the POKEY, SID, the TI noise chip, and some of the Yamaha stuff? Or make something custom that uses the best features of each or provides modes that never existed. For instance, the POKEY defaults to 4 channels and uses 8 bits. You can put it in a 16-bit mode at the cost of a channel per 16-bit channel. But what if it had 4 or more 16-bit channels? Then again, in a way, that was already done, since there was a quad POKEY chip. So I guess one could have 8 of the 16-bit channels, making high-res and stereo possible. I'd love something with a lot of channels and a wide frequency range. Part of me has nostalgia for the TV horizontal sweep (15,750 Hz). And there is really no need for higher than that, though a few people can hear up to 18 kHz.

I think the philosophy behind the Propeller is that interrupts are not needed if you can dedicate cogs to peripherals and use semaphores and other flow control to communicate between the cogs.

As a name, maybe such a contraption could be called an Octotron. However, that is already the name of an amusement ride and a sound synthesizer. Hydra is already used for a P1-based console. Or Propellotron.

But to do this, I'd need to know a lot of things about vCPU that I don't currently know. That would include things like the vCPU entry point, what I/O facilities the vCPU has, etc.

Posted: **28 Jun 2022, 11:31**

I think you could emulate vCPU on a $5 Raspberry Pi Pico. You would have to smartly use its DMA and "state machines" to output the video and the sound, possibly with the help of the second Arm core. This would make a very cheap and very fast Gigatron...

Posted: **28 Jun 2022, 11:44**

I've been toying with that idea too. The 8 PIO state machines offload much of the serial data manipulation from the main dual-core M0+. It's crazy when you think about how much processing power you get for $4.

Posted: **28 Jun 2022, 12:33**

If you want to just emulate the Von Neumann vCPU 16-bit instruction set, that's pretty trivial on any decent modern architecture. If you want 100% .gt1 compatibility then you have to emulate all the Sys calls, zero page registers, audio registers, VBlank registers, scanline VTable, etc, etc. Not so trivial.

Of course you can just run Marcel's C emulation code, but then you need a decent 1GHz+ architecture to run that at 60Hz.

Either aim for full compatibility, or create something completely different, anything else is a complete waste of time IMO.

Posted: **28 Jun 2022, 14:01**

bmwtcu wrote: ↑28 Jun 2022, 11:44 I've been toying with that idea too. The 8 PIO state machines offload much of the serial data manipulation from the main dual-core M0+. It's crazy when you think about how much processing power you get for $4.

Absolutely crazy.

After reading your posts I bought a Tang Nano 9K. Crazy power too. Too bad its PSRAM has stupid latencies.

Another interesting piece of hardware is the CMOD A7 (https://digilent.com/shop/cmod-a7-35t-b ... pga-module), which is more expensive and out of stock, but has just what one needs to make a crazy fast Gigatron (including a 512K SRAM with 8 ns access time).

Posted: **28 Jun 2022, 14:25**

at67 wrote: ↑28 Jun 2022, 12:33 If you want to just emulate the Von Neumann vCPU 16-bit instruction set, that's pretty trivial on any decent modern architecture. If you want 100% .gt1 compatibility then you have to emulate all the Sys calls, zero page registers, audio registers, VBlank registers, scanline VTable, etc, etc. Not so trivial.

Of course you can just run Marcel's C emulation code, but then you need a decent 1GHz+ architecture to run that at 60Hz.

Either aim for full compatibility, or create something completely different, anything else is a complete waste of time IMO.

Yes, with the P2 idea in mind, full compatibility is what I had in mind. There might need to be a native setup program to fill in all the tables, and maybe do it like an overlay so that it is no longer in the kernel space after it is used. Syscalls could be handled like the other instructions, etc. The different cogs could help make up for some of the bottlenecks imposed by the arch. The video production would get its own cog, be indirection aware, etc. And I could probably stream the hub memory into the cog memory several lines in advance. Sound could have its own cog and cache the sound tables while keeping the ability to be software changeable. And it can be improved on while maintaining compatibility, since one could put the virtual registers, zero page, etc., in cog registers. The hub RAM is multi-ported, so no need to stop execution to wait on the video.
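A very rough sketch of an indirection-aware prefetch, to show the shape of the idea: a per-line table says where each scanline's pixels live, and the video side copies a few lines ahead into a local buffer. The table layout here (one `(page, offset)` pair per line, wrapping within the page), the 160-pixel width, and all names are simplifying assumptions, not the actual Gigatron video table format.

```python
LINE_WIDTH = 160  # assumed visible pixels per line

def fetch_line(ram, table, line):
    """Copy one scanline out of RAM using the (page, offset) table entry."""
    page, offset = table[line]
    return bytes(ram[(page << 8) | ((offset + x) & 0xFF)]
                 for x in range(LINE_WIDTH))

def prefetch(ram, table, first_line, count):
    """Fetch `count` lines ahead, wrapping around at the end of the table."""
    return [fetch_line(ram, table, (first_line + i) % len(table))
            for i in range(count)]

ram = bytearray(65536)
ram[0x0800] = 7                    # first pixel of the line stored in page 8
table = [(0x08, 0), (0x09, 0)]     # hypothetical two-line table
lines = prefetch(ram, table, 0, 2)
```

On the P2 the list of buffered lines would correspond to cog or LUT RAM filled from hub RAM while the emulator cog keeps running.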

As for something new, that can be done too. Once the hardware arch is settled upon and peripherals are coded, other instruction sets and system call sets could be added. In that case, have a bootstrap loader to determine which one is installed on boot. Someone even emulated an XT on one.

Posted: **29 Jun 2022, 08:05**

at67 wrote: ↑28 Jun 2022, 12:33 Of course you can just run Marcel's C emulation code, but then you need a decent 1GHz+ architecture to run that at 60Hz.

I'd been thinking about this sort of thing (but for other reasons - maybe more to come), and it seems to me that it should be quite achievable to get realtime performance on an emulator on a small microcontroller. The trick would be to do dynamic binary recompilation, which is what systems like qemu do. Basically it's a JIT compiler for machine code - rather than interpret each instruction, you work out the equivalent host machine instruction, write it to memory, and run that. Optimise hot loops and so on.

For generic emulators like qemu this is apparently quite a lot of engineering work, but if you're just doing the Gigatron, and you're just targeting Thumb (for the Pi Pico), I think it should be reasonably straightforward. The performance benefit would come from caching the generated code - and since there can be at most 65536 instructions, the cache management could be pretty sloppy. The optimisation phase could be nonexistent too.
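The caching part can be sketched in a few lines. In this toy version, Python closures stand in for the generated Thumb code, and the four-instruction guest ISA (`LDI`, `ADDI`, `DJNZ`, `HALT`) is invented for the example and is not the Gigatron's instruction set. Each guest address is translated once, then reused on every later visit.

```python
def translate(op, arg, pc):
    """Build a host callable for one guest instruction; it returns the next pc."""
    if op == "LDI":
        def run(s):
            s["acc"] = arg & 0xFFFF
            return pc + 1
    elif op == "ADDI":
        def run(s):
            s["acc"] = (s["acc"] + arg) & 0xFFFF
            return pc + 1
    elif op == "DJNZ":
        def run(s):
            s["count"] -= 1
            return arg if s["count"] != 0 else pc + 1
    elif op == "HALT":
        def run(s):
            return None
    else:
        raise ValueError(f"unknown guest op {op!r}")
    return run

def execute(program, state):
    """Run until HALT; return how many instructions had to be translated."""
    cache = {}          # guest pc -> translated callable, never evicted
    translations = 0
    pc = 0
    while pc is not None:
        fn = cache.get(pc)
        if fn is None:
            op, arg = program[pc]
            fn = cache[pc] = translate(op, arg, pc)
            translations += 1
        pc = fn(state)
    return translations

program = [("LDI", 0), ("ADDI", 3), ("DJNZ", 1), ("HALT", 0)]
state = {"acc": 0, "count": 5}
translations = execute(program, state)
```

The loop body runs five times but is translated only once, which is where the speedup over plain interpretation would come from.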

Another thing for the list of Gigatron projects I'll never get around to.

Posted: **29 Jun 2022, 17:40**

https://forum.gigatron.io/viewtopic.php?t=82

https://www.jcwolfram.de/projekte/gtmicro_en/main.php

Posted: **29 Jun 2022, 19:51**

That should probably be added to the pinned topic

Posted: **30 Jun 2022, 10:19**

I think we sorta lost the intention of the thread. It is nice what others are doing on more modest controllers, and I applaud those who are doing those. Thank you for the links and the controller recommendations.

The difference here is that the Propeller 2 has 8 cogs, and the challenge is to maximize cog usage. Due to its use of cog RAM and hub RAM, as well as the ability to access external memory, if one adds it, concurrent DMA is inherent to the design.

It seems the key would be figuring out how to do jump lists. If it has that ability, it won't be as costly as needing to parse, poll, etc.
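A jump list in this sense is a table indexed by the opcode byte, so dispatch is one lookup instead of a chain of comparisons. The sketch below shows the pattern with two made-up opcodes (`0x01`, `0x02`) and a fixed two-byte instruction format; neither is taken from the real vCPU encoding.

```python
def op_ldi(cpu, operand):
    cpu["acc"] = operand

def op_addi(cpu, operand):
    cpu["acc"] = (cpu["acc"] + operand) & 0xFFFF

def op_illegal(cpu, operand):
    raise ValueError("illegal opcode")

# One handler per possible opcode byte; unused slots trap.
JUMP_TABLE = [op_illegal] * 256
JUMP_TABLE[0x01] = op_ldi    # hypothetical opcode numbers
JUMP_TABLE[0x02] = op_addi

def run(code, cpu):
    """Execute opcode/operand byte pairs by indexing straight into the table."""
    for pc in range(0, len(code), 2):
        JUMP_TABLE[code[pc]](cpu, code[pc + 1])
    return cpu

cpu = run(bytes([0x01, 40, 0x02, 2]), {"acc": 0})
```

In native code the same idea becomes an indirect jump through a table of handler addresses, so the cost per instruction stays constant no matter how many opcodes exist.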

I get what others are saying about dynamic recompiling. That could probably be done with a dedicated cog. Maybe reserve enough hub RAM for the kernel and then divide it into 2 pools with virtual code going into one, getting read by the converter cog and being written back to the other "partition," with that other partition containing executable native code.

But since it will never be the real thing, and due to the different memories, with each cog having its own, and multiple opportunities for concurrent operation, I'd like to take advantage of that and have the different processes instead of multiplexing things in code. So there could be a table-aware video cog, which may or may not be able to also do sounds and lights, an input controller with integrated Pluggy, some sort of memory controller (at least if external RAM is added), and a mass storage cog.

Would emulating vCPU on a Propeller 2 be feasible?