16-bit Gigatron similar machine

Sugarplum · Post by **Sugarplum** » 03 Oct 2023, 11:05

I'm trying to figure out the best way to build a Gigatron-like machine with a 16-bit native machine under the hood. The vCPU code can still be 16-bit. The memory map could be different; it could even be a 16-bit (word-addressed) memory map. I know of no old CPU that did that.

So how far should I take the 16-bit idea? For instance, should the ROM be 24 bits wide (an 8-bit opcode plus a 16-bit operand, instead of today's 8-bit opcode and 8-bit operand) to allow for 16-bit operands?
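To make the trade-off concrete, here is a minimal Python sketch of the two ROM word layouts. It assumes the opcode sits in the upper bits of the word; the real Gigatron bit order and any 24-bit encoding are assumptions here, not a specification.

```python
def pack16(opcode: int, operand: int) -> int:
    """Current-style 16-bit ROM word: 8-bit opcode, 8-bit operand (assumed order)."""
    assert 0 <= opcode <= 0xFF and 0 <= operand <= 0xFF
    return (opcode << 8) | operand


def pack24(opcode: int, operand: int) -> int:
    """Hypothetical 24-bit ROM word: 8-bit opcode, 16-bit operand."""
    assert 0 <= opcode <= 0xFF and 0 <= operand <= 0xFFFF
    return (opcode << 16) | operand


def unpack24(word: int) -> tuple:
    """Split a 24-bit word back into (opcode, operand)."""
    return (word >> 16) & 0xFF, word & 0xFFFF


def words_to_load_imm16(wide_rom: bool) -> int:
    """ROM words needed to put a 16-bit constant into a 16-bit accumulator:
    one with a 16-bit operand field, two (low byte, then high byte) with 8 bits."""
    return 1 if wide_rom else 2
```

For a 64K-word ROM, the wider word grows the part from 1 Mbit to 1.5 Mbit.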

Should it use 16-bit wide SRAM? I have mixed feelings here. Going with 16-bit RAM would break compatibility unless one could live with the alignment penalty.
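A minimal sketch of the alignment penalty, assuming a 16-bit-wide SRAM that can only fetch one aligned byte pair per cycle (the function name is hypothetical):

```python
def bus_cycles(addr: int, size: int, bus_bytes: int = 2) -> int:
    """Memory cycles needed to access `size` bytes starting at byte address
    `addr`, when the memory is `bus_bytes` wide and only aligned words can be
    fetched in one cycle."""
    first_word = addr // bus_bytes
    last_word = (addr + size - 1) // bus_bytes
    return last_word - first_word + 1


# A 16-bit value at an even address: one cycle on 16-bit RAM.
print(bus_cycles(0x30, 2))                # 1
# The same value at an odd address straddles two words: two cycles,
# no better than 8-bit RAM.
print(bus_cycles(0x31, 2))                # 2
print(bus_cycles(0x31, 2, bus_bytes=1))   # 2
```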

I've already decided on a 16-bit ALU. I still don't know how that ALU works internally, but there are NOS (new old stock) and working pulls of the IDT7383 and similar chips. It seems 26 ns versions are the most common. That eliminates my concern that the Gigatron ALU wouldn't scale well. I think the current latency is about 45 ns, so simply doubling the width of its ripple-carry chain would push that toward 90 ns, which would be unacceptable. That said, one could pull Z80 tricks or similar, either by taking 2 cycles for ALU ops (making things less ALU-centric and more control-unit-centric) or by adding a pipeline stage. But if a prepackaged ALU is available with just over half the latency and twice the width, I don't see why it shouldn't be used.
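The "2 cycles per ALU op" option can be sketched in Python as one 8-bit adder used twice, with the carry held in a latch between passes, much as the Z80 runs 8-bit operations through a 4-bit ALU in two steps. The function names are illustrative only:

```python
def add8(a: int, b: int, carry_in: int) -> tuple:
    """One 8-bit adder slice: returns (8-bit sum, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add16_two_pass(a: int, b: int) -> tuple:
    """16-bit add done as two passes through the same 8-bit adder.
    The carry from pass 1 is held in a latch and fed into pass 2."""
    lo, carry_latch = add8(a & 0xFF, b & 0xFF, 0)       # pass 1: low bytes
    hi, carry_out = add8(a >> 8, b >> 8, carry_latch)   # pass 2: high bytes
    return (hi << 8) | lo, carry_out
```

At roughly 45 ns per pass, two passes cost about as much time as a doubled ripple chain; what this buys is reuse of the narrow ALU, not speed.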

Technically, I guess one could use those 16-bit ALU chips for 8-bit operations with some clever wiring. In that case, output lines 7 and 8 would feed the condition decoder: data output line 7 would give the sign, and line 8 (the 9th bit) would be used in place of the carry line to determine >=. You'd tie all the unused inputs low. You'd get better latency than the current ALU, though probably not a 2x improvement, since the chip's delay is dominated by its internal tree (carry-lookahead) adder, which already works largely in parallel.
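A behavioral Python sketch of that wiring, assuming the chip simply adds or subtracts 16-bit values (the `alu16_*` functions stand in for the chip and are not its real interface). With the upper 8 input bits tied low, bit 7 of the result is the 8-bit sign and bit 8 is the carry; on subtraction, bit 8 acts as a borrow, so it is clear exactly when A >= B:

```python
def alu16_add(a: int, b: int) -> int:
    """Stand-in for a 16-bit ALU doing A + B, truncated to 16 bits."""
    return (a + b) & 0xFFFF


def alu16_sub(a: int, b: int) -> int:
    """Stand-in for the same ALU doing A - B in 16-bit two's complement."""
    return (a - b) & 0xFFFF


def add8_on_alu16(a: int, b: int) -> tuple:
    """8-bit add on the 16-bit ALU: returns (result, sign, carry)."""
    r = alu16_add(a & 0xFF, b & 0xFF)   # upper input bits tied low
    result = r & 0xFF
    sign = (r >> 7) & 1                 # output line 7
    carry = (r >> 8) & 1                # output line 8, the 9th bit
    return result, sign, carry


def ge_unsigned8(a: int, b: int) -> bool:
    """A >= B (unsigned): the zero-extended difference borrows into bit 8 iff A < B."""
    r = alu16_sub(a & 0xFF, b & 0xFF)
    return ((r >> 8) & 1) == 0
```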

I'd need to figure out how to attach the 16-bit ALU. The control line differences would require replacement diode arrays. Since more lines are involved, I'd have to consider whether doubling the decoders would be necessary to provide enough fanout current.

Speaking of current, one might want a dedicated power supply, such as a transformer of at least 6 volts with a rectifier, a "Big Blu" capacitor (even if they're black), and a voltage regulator IC rated at about 5 amps. Or hell, use an ATX supply.

How should I handle the 3 operations the prepackaged ALU lacks? How would I move load, store, and branch outside the ALU? Would changing the diode ROMs allow one to put registers/memory directly on the bus without going through the ALU? How would I handle branches?

More importantly, what instructions would I need in addition to the existing ones? I think there would need to be a way to move data between the accumulator halves and to store things to Y:X.

Would it be useful to have an instruction that jumps to AC*2 (or AC times another power of 2)? I think that could be the way to use all 256 virtual opcodes. If you have 512 locations of vector table space, you'd have enough room for 256 branches and 256 NOPs (each branch paired with a NOP for its delay slot). Also, what is the length of the largest single vCPU instruction handler? Could it fit within 8 instructions, or would it need more? If handlers are short enough, it could be more efficient to skip the dispatch/vector table and jump directly to the instruction handlers. That means being able to jump only to certain addresses, since the shift reserves space between entries. If you can only jump to even addresses, the jump list takes 2 pages. But if you shift by a larger power of 2, you can code the handlers inline in the jump list itself.
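A small Python sketch of the address arithmetic behind this, assuming 256-word ROM pages and a hypothetical jump instruction that targets base + (opcode << k):

```python
PAGE_WORDS = 256  # native ROM page size, assumed to be 256 words


def dispatch_target(base: int, opcode: int, stride_log2: int) -> int:
    """ROM address reached by a hypothetical 'jump to base + (opcode << k)'."""
    return base + (opcode << stride_log2)


def table_pages(num_opcodes: int, stride_log2: int) -> int:
    """Number of ROM pages the whole jump list occupies."""
    return (num_opcodes << stride_log2) // PAGE_WORDS


def fits_inline(handler_words: int, stride_log2: int) -> bool:
    """True if a handler can live directly in its slot instead of behind a jump."""
    return handler_words <= (1 << stride_log2)


# Stride 2 (jump to AC*2): each slot holds a branch plus the NOP in its delay slot.
assert table_pages(256, 1) == 2
# Stride 8: handlers of up to 8 words sit inline, at a cost of 8 pages.
assert table_pages(256, 3) == 8 and fits_inline(8, 3) and not fits_inline(9, 3)
```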

monsonite · Post by **monsonite** » 04 Nov 2023, 13:59

Personally, I feel that the Gigatron achieves all that was ever intended of it, plus a lot more.

It should provide the inspirational basis for new architectures, possibly with different end goals, and these priorities may be varied, depending on the application.

I would like to see a hardware "Carry" implemented. It could simply be latched from the upper 74HC283 onto bit 0 of the data bus, with an ADC instruction provided. This would cut the 16-bit ADD instruction to roughly half its existing execution time and simplify some of the branching logic.
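Without a carry flag, software has to reconstruct the carry from the operands and the truncated sum before it can add the high bytes. The Python sketch below shows a standard bit trick for that; it illustrates the extra work that a hardware carry and ADC would remove, and is not the actual vCPU ROM code:

```python
def carry_from_msbs(a: int, b: int, s: int) -> int:
    """Carry out of an 8-bit add, rebuilt from bit 7 of the operands and of the
    8-bit sum: set if both operands have bit 7 set, or if exactly one does and
    the sum's bit 7 is clear."""
    return (((a & b) | ((a | b) & ~s)) >> 7) & 1


def add16_without_carry_flag(a: int, b: int) -> int:
    """16-bit add on an 8-bit ALU that does not latch a carry."""
    lo = ((a & 0xFF) + (b & 0xFF)) & 0xFF          # ADD low bytes; carry is lost
    c = carry_from_msbs(a & 0xFF, b & 0xFF, lo)    # extra steps to recover it
    hi = ((a >> 8) + (b >> 8) + c) & 0xFF          # ADD high bytes plus carry
    return (hi << 8) | lo
```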

The Gigatron is capable of 12.5MHz clocking with a few chip upgrades. This makes it attractive as a host for virtual machines such as vCPU and v6502, and possibly for emulating other classic 8-bit machines such as the 8080, so that the CP/M software archive could be made accessible. With hardware carry, a faster 16-bit add would make the 8080's pseudo-16-bit operations more viable.

A further point: if the 74HC377 accumulator were replaced with a 74HC299 universal (serial/parallel in/out, tri-state) shift register, it would allow both left and right shifts and rotates to be performed. This would make bit-testing and multiplication/division easier.
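A behavioral Python sketch of what a shift-register accumulator enables, here an 8x8 shift-and-add multiply. The class models a generic universal shift register (hold, load, shift left, shift right) in the spirit of the 74HC299; its method names are illustrative and do not follow the datasheet's mode-select encoding:

```python
class ShiftReg8:
    """8-bit universal shift register: parallel load, shift left, shift right."""

    def __init__(self, value: int = 0):
        self.q = value & 0xFF

    def load(self, value: int) -> None:
        self.q = value & 0xFF

    def shift_left(self, serial_in: int = 0) -> int:
        """Shift toward bit 7; returns the bit shifted out of bit 7."""
        out = (self.q >> 7) & 1
        self.q = ((self.q << 1) | (serial_in & 1)) & 0xFF
        return out

    def shift_right(self, serial_in: int = 0) -> int:
        """Shift toward bit 0; returns the bit shifted out of bit 0 (a bit test)."""
        out = self.q & 1
        self.q = (self.q >> 1) | ((serial_in & 1) << 7)
        return out


def mul8(a: int, b: int) -> int:
    """8x8 -> 16-bit unsigned multiply using only shifts and adds."""
    multiplier = ShiftReg8(b)
    addend = a & 0xFF
    product = 0
    for _ in range(8):
        if multiplier.shift_right():   # next multiplier bit, LSB first
            product += addend
        addend = addend + addend       # doubling, which the adder can do
    return product
```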

A shift-register accumulator would also provide an easier route to SPI peripheral support, without having to resort to a credit-card-sized expansion PCB.
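A minimal Python sketch of an SPI mode-0 byte transfer through one 8-bit shift register, MSB first. The `slave` callback and the `fixed_reply` device are hypothetical stand-ins for real hardware:

```python
from typing import Callable


def spi_xfer(tx: int, slave: Callable[[int], int]) -> int:
    """Bit-bang one SPI mode-0 byte, MSB first. Each clock shifts the next MOSI
    bit out of bit 7 and the MISO bit into bit 0, so after 8 clocks the register
    holds the received byte. `slave(mosi)` is called once per rising SCK edge
    and returns the MISO bit the device is presenting."""
    reg = tx & 0xFF
    for _ in range(8):
        mosi = (reg >> 7) & 1              # set up MOSI while SCK is low
        miso = slave(mosi) & 1             # rising SCK edge: both sides sample
        reg = ((reg << 1) | miso) & 0xFF   # shift left, serial-in from MISO
    return reg


def fixed_reply(byte: int) -> Callable[[int], int]:
    """Hypothetical device that always answers with `byte`, MSB first."""
    bits = iter([(byte >> (7 - i)) & 1 for i in range(8)])
    return lambda _mosi: next(bits)
```

With MISO looped back to MOSI, `spi_xfer(0xA5, lambda bit: bit)` returns 0xA5.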

Using CORDIC algorithms (shifts and adds) to calculate trigonometric, transcendental and scientific functions, the Gigatron could take the role of a programmable floating-point calculator, along the lines of the HP9100A or even the HP35.
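A minimal fixed-point CORDIC sketch in Python (rotation mode, Q16). The arctangent table and gain constant are computed here with `math` for convenience; on the machine they would be constants in ROM, and only the loop, which uses shifts, adds and subtracts, would run:

```python
import math

FRAC = 16   # Q16 fixed point
ITER = 16   # one iteration per bit of precision

# arctan(2^-i) in Q16; would be a ROM table on the real machine.
ATAN = [round(math.atan(2.0 ** -i) * (1 << FRAC)) for i in range(ITER)]

# CORDIC gain correction, pre-applied to the starting vector.
K = 1.0
for i in range(ITER):
    K /= math.sqrt(1 + 2.0 ** (-2 * i))
K_FIXED = round(K * (1 << FRAC))


def cordic_sin_cos(theta: int) -> tuple:
    """`theta` is a Q16 angle in [-pi/2, pi/2]. Returns (sin, cos) in Q16."""
    x, y, z = K_FIXED, 0, theta
    for i in range(ITER):
        if z >= 0:   # rotate counterclockwise by atan(2^-i)
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN[i]
        else:        # rotate clockwise
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN[i]
    return y, x
```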

Regarding expanding to 16 bits, I think this would add too much complexity on top of the existing Gigatron design. The Gigatron is a moderately fast 8-bit machine, but having a mix of 8- and 16-bit pseudo-instructions in vCPU (as on the Z80/8080) certainly improves its versatility.

Your comments about the time overhead of ripple carry are perfectly valid, and just doubling the existing ALU width would probably be a lot of work and somewhat disappointing. I don't think that the Gigatron is the best starting point for creating an efficient 16-bit architecture. Too much of the design is tied up with the 1/4 VGA colour video hardware, and separating these functions would be like separating conjoined twins.

The existing design has proven that a doubling of the clock frequency is possible, as is quadrupling the stock RAM to 128K bytes. I am aware that there has been some development on higher-resolution, reduced-colour-palette video schemes. A higher-resolution monochrome text/graphics mode would be more attractive for certain applications, even to the point where the Gigatron becomes THE graphics controller for a different CPU that has no dedicated video generation hardware.

Finally, I believe that the control unit, which mostly consists of combinational logic, can be simplified by placing it inside another AT27C1024 (45 ns) ROM. This would provide ample inputs and outputs to decode a larger instruction set, using a technique similar to the OPR and IOT instructions on the PDP-8.
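A Python sketch of the ROM-as-control-unit idea: the decode logic is written once as a function, "burned" into a table indexed by instruction bits plus flags, and each cycle becomes a single lookup. A 64K × 16 part such as the AT27C1024 has 16 address inputs and 16 outputs. The opcodes and control signals below are hypothetical and much smaller than a real design:

```python
from enum import IntFlag


class Ctl(IntFlag):
    """Hypothetical control-word bits (not the Gigatron's actual signals)."""
    ALU_ADD = 1 << 0
    ALU_SUB = 1 << 1
    AC_LOAD = 1 << 2
    MEM_WRITE = 1 << 3
    PC_LOAD = 1 << 4   # taken branch


OPCODE_BITS = 3
OP_ADD, OP_SUB, OP_ST, OP_BEQ = 0, 1, 2, 3


def decode(opcode: int, zero: int) -> Ctl:
    """Reference logic the ROM contents are generated from."""
    if opcode == OP_ADD:
        return Ctl.ALU_ADD | Ctl.AC_LOAD
    if opcode == OP_SUB:
        return Ctl.ALU_SUB | Ctl.AC_LOAD
    if opcode == OP_ST:
        return Ctl.MEM_WRITE
    if opcode == OP_BEQ:
        return Ctl.PC_LOAD if zero else Ctl(0)
    return Ctl(0)   # unused opcodes do nothing


# "Burn" the ROM: address = (opcode << 1) | zero_flag.
CONTROL_ROM = [int(decode(addr >> 1, addr & 1))
               for addr in range(1 << (OPCODE_BITS + 1))]


def control_word(opcode: int, zero: int) -> Ctl:
    """What the hardware does each cycle: one ROM lookup, no gates."""
    return Ctl(CONTROL_ROM[(opcode << 1) | (zero & 1)])
```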

Sugarplum · Post by **Sugarplum** » 05 Nov 2023, 20:43

Thank you, monsonite!

I agree, the Gigatron fits the design philosophy well. I'm seriously considering spinning something similar. I think I'd want to play with some working pulls of the few 16-bit ALU chips that were around. They could replace at least 10 chips (possibly more, such as the accumulator and maybe the bus bridge buffer), and they would also replace the chips proposed in the other thread (20-plus in total).

I think the ALU I propose does include some flags, and whatever it doesn't latch can be latched externally. Just moving to a 16-bit ALU would greatly reduce the native code needed for 16-bit addition, and a proper carry flag would then help with 32-bit calculations.

The shift register proposal would be interesting. I don't think the Ac+= instruction and the register's left-shift function could be combined. The register shift would shift what is already in it (plus maybe 1 more bit shifted in), while the adder shifts the value arriving at its inputs. I don't think the register can load a new value and shift it in the same operation. So I think you can left-shift with the adder or with the register, but not both simultaneously, unless I am mistaken.

I take it your further comment about the shift register means it could be used for bit-banging serial transfers in native code.

The complexity issue is why I'd use the packaged ALU. For breadboarding purposes, I could use the PLCC-to-DIP adaptor.

***

Now, my proposed project might use the P2. If possible, I think bus snooping should be the main DMA strategy. The idea would be to make it work a bit like the X16's Vera "chip" (actually an FPGA board). Then one could dedicate a couple of pages of conventional RAM as a window for talking to the proposed general-purpose I/O controller. This way, I/O tasks wouldn't consume much of the conventional RAM. One page could be mostly "registers" and the other a buffer. Through that window, the CPU could use the controller's RAM and do everything: video, sound, keyboard, file I/O, and perhaps random numbers and math assistance. For random numbers, if a microcontroller is used as the I/O controller, I'd say the best strategy would be to sacrifice a GPIO line for a white-noise mode and add a shift register to the board, then somehow add an instruction to use it.
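A minimal Python sketch of the snooping side, assuming the window is the last two pages of a 32K map and that offset 0 of the register page is a command register (both assumptions, as are all names). The controller never drives the bus; it only mirrors CPU writes that land in its window:

```python
WINDOW_BASE = 0x7E00            # hypothetical: last two pages of a 32K map
REG_PAGE = WINDOW_BASE          # "registers" page
BUF_PAGE = WINDOW_BASE + 0x100  # buffer page
CMD_OFFSET = 0x00               # hypothetical command register offset


class SnoopingController:
    """Watches CPU writes and mirrors the ones that land in its window."""

    def __init__(self):
        self.regs = bytearray(256)
        self.buf = bytearray(256)
        self.commands = []

    def on_write(self, addr: int, data: int) -> None:
        if REG_PAGE <= addr < REG_PAGE + 0x100:
            self.regs[addr - REG_PAGE] = data
            if addr - REG_PAGE == CMD_OFFSET:
                self.commands.append(data)   # a write here starts a request
        elif BUF_PAGE <= addr < BUF_PAGE + 0x100:
            self.buf[addr - BUF_PAGE] = data
        # Anything else is ordinary RAM traffic and is ignored.


class Bus:
    """CPU-side RAM; every write is also seen by the snooper."""

    def __init__(self, snooper: SnoopingController):
        self.ram = bytearray(0x8000)
        self.snooper = snooper

    def write(self, addr: int, data: int) -> None:
        self.ram[addr] = data & 0xFF
        self.snooper.on_write(addr, data & 0xFF)
```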

As for how to "talk" the other way, 2 ideas come to mind. If there is enough time, cycle stealing can be done. If not, and the ROM code can accommodate it, use a spinlock approach to simulate bus-mastering DMA. That would be fine if all I/O is moved out of the CPU. Since the machine is Harvard, the ROM code doesn't need the RAM in order to run. So when expecting a result from the controller, you'd issue the controller instruction (from native code, into the monitored address range) and then immediately enter a spinlock, trying to read a known value from a specific RAM location. During this time, the controller claims the SRAM and uses it. To signal that it's done, the controller simply gives the SRAM back; the read then returns the known value, which satisfies the spinlock and lets the code continue.
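A Python simulation of that handshake, under stated assumptions: the flag location permanently holds a known value, and the CPU reads a different value (assumed 0xFF here) while the SRAM is detached. The addresses, values and the controller's "work" are all hypothetical, and the controller is stepped once per poll rather than running in parallel:

```python
READY = 0x5A           # hypothetical known value kept at the flag location
FLOATING = 0xFF        # assumed value the CPU sees while the SRAM is detached
FLAG_ADDR = 0x7E01     # hypothetical flag location
RESULT_ADDR = 0x7F00   # hypothetical result location


class SharedSram:
    def __init__(self):
        self.cells = bytearray(0x8000)
        self.cells[FLAG_ADDR] = READY
        self.owner = "cpu"

    def cpu_read(self, addr: int) -> int:
        # While the controller owns the SRAM, the CPU is cut off from it.
        return self.cells[addr] if self.owner == "cpu" else FLOATING


class Controller:
    """Claims the SRAM on a command, works for `busy_polls` CPU polls,
    writes a result, then hands the SRAM back."""

    def __init__(self, sram: SharedSram, busy_polls: int):
        self.sram = sram
        self.busy_polls = busy_polls
        self.remaining = 0
        self.pending = 0

    def command(self, cmd: int) -> None:
        self.sram.owner = "controller"
        self.remaining = self.busy_polls
        self.pending = cmd

    def tick(self) -> None:
        if self.sram.owner != "controller":
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.sram.cells[RESULT_ADDR] = (self.pending * 2) & 0xFF  # stand-in work
            self.sram.owner = "cpu"   # releasing the SRAM is the "done" signal


def issue_and_wait(sram: SharedSram, ctrl: Controller, cmd: int) -> int:
    """CPU side: issue the command, then spin until the flag reads READY.
    Returns the number of failed polls."""
    ctrl.command(cmd)
    polls = 0
    while sram.cpu_read(FLAG_ADDR) != READY:
        polls += 1
        ctrl.tick()   # step the controller once per poll in this simulation
    return polls
```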

***

Other considerations: port instructions take up to 76 opcodes. There would be 64 Input instructions and 12 additional Output instructions. The reason for the discrepancy is twofold: Input instructions include conditional instructions (Output instructions don't), and I'm counting Input instructions that overlap with Output instructions. Now, it seems that if one uses an external controller, those 76 instructions could be replaced. But I wouldn't replace them entirely (maybe partially). One argument for keeping many of them is that the ports could be repurposed, for example as controller command and status lines. You could emulate IRQs this way: check the In port for a nonzero value every so often, and if there is one, decode it, handle whatever needs handling, then use Out to tell the controller the request was serviced. Random numbers could come that way too, for example by sending a port command to request one and reading it on the other port in the next instruction or whenever convenient.
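A Python sketch of that polled "IRQ" scheme. The request codes, handler table, and the convention of acknowledging by echoing the request code on the output port are all hypothetical:

```python
class PortLink:
    """Hypothetical wiring: the controller drives the CPU's input port with a
    request code (0 = nothing pending) and watches the output port for an ack."""

    def __init__(self):
        self.in_port = 0
        self.out_port = 0


HANDLERS = {   # hypothetical request codes
    0x01: lambda: "vsync",
    0x02: lambda: "key",
}


def poll_requests(link: PortLink, log: list) -> bool:
    """Called every so often from the main loop. Returns True if a request was serviced."""
    req = link.in_port                  # IN
    if req == 0:
        return False                    # common case: one read and one test
    handler = HANDLERS.get(req)
    log.append(handler() if handler else f"unknown {req:#04x}")
    link.out_port = req                 # OUT: acknowledge by echoing the code
    return True
```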

Really, one might want to double the registers in order to get past the resolution being locked to the clock rate. Marcel halved the screen to make 12.5 MHz work because there is no efficient way to fit a useful instruction between individual Out commands. Having separate accumulator and MAR sets would help, but by itself that only does so much. While you could load and store between the pixels, do calculations, etc., you could not branch in the middle of a video loop. So what you'd really need is predicated instructions, which would be easier with a proper flags register.
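A toy Python interpreter illustrating predication: each instruction carries a condition on a latched zero flag and is either executed or squashed in place, so both paths through the code take the same number of cycles. The three-instruction ISA is hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str                 # "test", "ld", or "out"
    arg: int = 0
    pred: str = "always"    # "always", "if_z", or "if_nz"


def run(program, ac: int):
    """Execute a straight-line, predicated program. Returns (pixels, cycles)."""
    z = False               # latched zero flag, only changed by "test"
    pixels, cycles = [], 0
    for ins in program:
        cycles += 1         # every slot costs a cycle, executed or squashed
        if (ins.pred == "if_z" and not z) or (ins.pred == "if_nz" and z):
            continue        # squashed: behaves like a NOP, no branch taken
        if ins.op == "test":
            z = ac == 0
        elif ins.op == "ld":
            ac = ins.arg & 0xFF
        elif ins.op == "out":
            pixels.append(ac)
    return pixels, cycles


PROGRAM = [
    Instr("test"),
    Instr("ld", 0x3F, "if_z"),
    Instr("ld", 0x15, "if_nz"),
    Instr("out"),
]
```

Both `run(PROGRAM, 0)` and `run(PROGRAM, 7)` take four cycles, which is the fixed timing a video loop needs.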

***

Now let me get more into why I started this thread. I have specific questions about this eventual project.

1. Would adding another ROM to allow for wider operands be justified?

2. Besides a wider accumulator and possibly an additional MAR register or pair, would any other registers be needed/warranted? And yes, the wider ALU chip already has 3 registers, namely 2 input registers and an accumulator. There may be reasons to use or not use those. One reason to use the input registers is to favor simplicity over speed: if the design uses microcode, a single 16-bit bus could feed both operands over 1 or more additional cycles.

3. Should I use wider SRAM? It won't give any advantage unless you only do aligned reads/writes. The issue with the current memory map is that important things like the registers straddle word boundaries. That means you'd be stuck in an 8-bit mode, taking twice as long to transfer a word. At least it wouldn't be as slow as on some older machines, since the X++ feature means the next address is already taken care of.

So, if I were to do this, I wouldn't bother with a compatible memory map. I'd want to rearrange things or move to a word map.

4. Assuming 16-bit SRAM, what would be the best way to handle bit-banging or bus snooping? The challenge is using GPIO lines wisely when bus-snooping with a controller. Using 16-bit SRAM would mean needing about 40 lines (20 address lines, 16 data lines, and 4-5 control lines). But temporary bit-banging would still be done 8 bits at a time. I guess there could be ways around that, so maybe snooping could be done in a cycle-stealing fashion if latches are used.

5. What do you think of my idea of jumping by paragraph in a vCPU jump list? Then you wouldn't need prefixes unless you need more than 256 instructions. Dispatching should be simpler: the PC would jump to the first byte of the selected paragraph. The lowest bits would be tied to ground, and the bits that don't overlap with Y would come from the opcode.

6. Are there any other instructions that wouldn't be too hard to implement and would be good to add to a Harvard machine, or one that does emulation? For instance, jump lists of paragraphs come to mind, so you could interpret/emulate all the essential instructions inline as described above. Predication would be another, since it would make it easier to bit-bang video at a faster clock rate without wasting as much time.
