Thinking of creating a custom CPU

General project related announcements and discussions. Events, other retro systems, the forum itself...
PurpleGirl
Posts: 41
Joined: 09 Sep 2019, 08:19

Thinking of creating a custom CPU

Post by PurpleGirl » 05 Dec 2019, 15:50

I am not committed yet, but I've been considering a custom CPU. I am only in the crude planning stages about the architecture I'd want.

I'd likely need to learn how to do FPGA, especially if I were to do 32 or more bits. So I'd need to get accustomed to using FPGA design software and Verilog/HDL.

It would be nice to keep Gigatron compatibility. With 16 bits, you could keep the entire Gigatron opcode set as-is. Maybe add one more access-mode bit, so the 3-to-8 decoder could be swapped for a 4-to-16, thus not worrying about losing opcodes. There could be a bit that overrides the entire Gigatron instruction set, allowing all the other bits to feed a new decoder. So if that bit is set, you have space for 32k (32768) opcodes in a totally new instruction mode. Otherwise, you'd have the original 256 plus 128 new ones (1 bit is used as an escape into the new set).
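To make the escape-bit idea concrete, here's a rough Python sketch of the decode split. The bit layout (bit 15 as the escape, low byte as the original opcode) is just one assumption about how it could be wired, not anything the real Gigatron defines:

```python
# Sketch of the escape-bit decode idea (hypothetical bit layout:
# bit 15 = escape into the new set; nothing here is from the real ROM).
ESCAPE_BIT = 1 << 15

def decode(word: int):
    """Classify a 16-bit instruction word under the proposed scheme."""
    assert 0 <= word <= 0xFFFF
    if word & ESCAPE_BIT:
        # Escape set: the remaining 15 bits form a brand-new opcode space.
        return ("new", word & 0x7FFF)          # 32768 possible opcodes
    # Escape clear: low byte is the original 8-bit Gigatron opcode,
    # and the 7 remaining high bits select among the extended encodings.
    return ("gigatron", word & 0xFF, (word >> 8) & 0x7F)

print(decode(0x8000))  # ('new', 0)
print(decode(0x00E0))  # ('gigatron', 224, 0)
```

With the escape clear, original binaries pass through unchanged as long as their upper byte is zero, which is the compatibility property the scheme is after.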

Reworking the original Gigatron instructions
In addition, unused or unusable opcodes of the original set could be subject to a secondary decoder to make them usable -- so long as this doesn't introduce too many delays or any race conditions (for instance, an instruction ending up taking 2 cycles and its output being used before it's ready). We pretty much know that any instruction with the top 3 bits high will be unique, if my studying and math are right, but I'm not sure whether that means any of those pins or all of them (AND vs. OR). I'd have to study that more. Seeing the instruction set in binary would be helpful. But I think anything that's 224 or higher should certainly not be touched. Then I need to find the rules for the rest of them.
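The AND-vs-OR question above is easy to pin down numerically. A quick Python check of both readings of "top 3 bits high" over the 256 one-byte opcodes:

```python
# Does "top 3 bits high" mean all of bits 7..5 set (AND) or any of
# them set (OR)? Count both interpretations over the 256 opcodes.
all_high = [op for op in range(256) if (op & 0xE0) == 0xE0]  # AND case
any_high = [op for op in range(256) if (op & 0xE0) != 0x00]  # OR case

print(len(all_high), min(all_high))  # 32 opcodes, starting at 224
print(len(any_high), min(any_high))  # 224 opcodes, starting at 32
```

So the AND reading is exactly the "224 or higher" range (32 opcodes, 0xE0-0xFF), while the OR reading sweeps in everything from 32 up, which is most of the map. Which reading applies to the Gigatron's decoder still has to be checked against the actual schematic.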

Another idea I had for the original Gigatron would be a crude piggybacked decoder that looks for specific NOPs and uses them to change the instruction context, where the same binary code would take on different functions once a specific NOP is seen, with a NOP in the new context restoring the original opcode meanings. That would mean anything needed for video processing would have to function the same regardless of context.
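A toy model of that NOP-driven context switch, to show the stateful decode it implies. The toggle encoding here is made up, not a real Gigatron NOP:

```python
# Toy model of the NOP-driven context switch: one magic NOP encoding
# flips the decoder into an alternate opcode table; the same NOP
# flips it back. (CTX_NOP is a made-up value, not a real encoding.)
CTX_NOP = 0x02   # hypothetical "context toggle" NOP

class Decoder:
    def __init__(self):
        self.alt = False                 # start in the original context

    def decode(self, op: int) -> str:
        if op == CTX_NOP:
            self.alt = not self.alt      # toggle contexts; still acts as a NOP
            return "nop"
        table = "alt" if self.alt else "orig"
        return f"{table}:{op:02x}"

d = Decoder()
print(d.decode(0x10))    # orig:10
print(d.decode(CTX_NOP)) # nop — switch to the alternate meanings
print(d.decode(0x10))    # alt:10 — same bits, different function
```

The cost this sketch makes visible is that decode now depends on history, which is exactly why the video-critical opcodes would have to decode identically in both tables.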

8/16-bit register bonding modes
Maybe there could be 1-2 bits in the upper 8 bits to determine the register bonding mode. That mainly involves whether the carries are joined in the ALU; if not joined, the upper registers get their own ALU. If 16-bit, I'd like the new half-registers to operate in bonded mode with the existing ones, with all calculations able to use 16 bits, while also being able to split them and use them differently and simultaneously, in more of an MMX fashion. That would give the freedom to do 8-bit operations at any time, and 16-bit (or 2 simultaneous 8-bit) operations during porches and vblank. Thus the 16-bit instructions would let emulated code run better, while doing instructions in MMX fashion would allow full-speed 8-bit native code with the occasional 16-bit instruction. If there's not enough room to do different addressing modes simultaneously in augmented-control-logic mode, those things can be done in the new instruction-set mode with its new control logic.
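The bonded-vs-split distinction is really just about what happens to the carry out of the low byte. A small Python sketch of the two ALU behaviors (a software model of the idea, obviously not the hardware):

```python
def add_bonded(a: int, b: int) -> int:
    """16-bit add with the carry chained between halves (bonded mode)."""
    return (a + b) & 0xFFFF

def add_split(a: int, b: int) -> int:
    """Two independent 8-bit adds packed in one word (split/MMX-style):
    no carry propagates from the low byte into the high byte."""
    lo = ((a & 0xFF) + (b & 0xFF)) & 0xFF
    hi = (((a >> 8) & 0xFF) + ((b >> 8) & 0xFF)) & 0xFF
    return (hi << 8) | lo

# The carry out of the low byte is the only difference:
print(hex(add_bonded(0x00FF, 0x0001)))  # 0x100 — carry ripples up
print(hex(add_split(0x00FF, 0x0001)))   # 0x0 — low byte wraps alone
```

In hardware the mode bit would simply gate whether the low ALU's carry-out feeds the high ALU's carry-in, which is why 1-2 control bits are enough.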

Architecture Change
Currently, the Gigatron uses a Harvard architecture, and it meets the goals of the Gigatron. But I'd like to see the ability to use native mode all the time, even when running software out of RAM. The vCPU emulator can still exist for compatibility. So expanding the Harvard architecture to include the RAM, and adding some instructions that make it seem more von Neumann, might be a goal. Right now, only the ROM is connected to the Instruction Register, and that's the major reason why vCPU is necessary if you want to run programs out of RAM. Since the IR cannot see the RAM and execute instructions from there, there has to be software to read the RAM into registers and interpret it. So you'd have to find a way for the IR to access the RAM.

The first thought there is to use multiplexers. But that poses a challenge: the current ROM is 16-bit while the memory is currently 8-bit. The ROM uses 8-bit instructions in the low byte and an 8-bit operand in the high byte. That would not run on a 16-bit machine, since all 16 bits would be an opcode. So a 32-bit ROM would be one way to work around this. But I have difficulty finding any 32-bit EPROM or EEPROM that is not serial, and serial ROM would take an extensive memory controller clocked at least 8 times faster. Or you could have 2 16-bit ROMs, with one being code and the other being data. Or a compromise would be to keep the ROM as-is but connect it only to the low IR and the low DR, with possible overrides (when ROM is active) on the upper registers, such as tristate buffers or tying the upper lines low, being careful not to introduce skew in 16-bit mode due to such complexity. But that would cripple the point of it being 16-bit, since then only RAM programs would be able to use any of the new architecture, and how would you get to any addresses past the ones the ROM occupies without using paging, let alone get to use RAM at all, or get back into ROM from a RAM access? So 32-bit ROM would make more sense.
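The width mismatch can be summed up in one place: how a ROM word splits into the IR and DR under the current 16-bit ROM versus a 32-bit one. A quick sketch (the low-byte-opcode ordering follows the description above; treat it as an assumption rather than a reading of the schematic):

```python
def split16(word: int):
    """Current 16-bit ROM word: 8-bit opcode (low) + 8-bit operand (high)."""
    return word & 0xFF, (word >> 8) & 0xFF        # (IR, DR)

def split32(word: int):
    """Proposed 32-bit ROM word: 16-bit opcode + 16-bit operand."""
    return word & 0xFFFF, (word >> 16) & 0xFFFF   # (IR, DR)

print([hex(x) for x in split16(0xAB01)])       # IR=0x01, DR=0xAB
print([hex(x) for x in split32(0xBEEF_0102)])  # IR=0x0102, DR=0xBEEF
```

Feeding a `split16`-style word into a machine that expects `split32`-style words is exactly the failure described above: the whole 16 bits would be swallowed as an opcode.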

Now let's suppose we have a 32-bit ROM. You'd still only have 64k of addresses, addressing it as 16-bit in 8-bit mode. But addressing it the usual way would mean the low word goes to the IR and the high word to the DR. The current ROM uses all 64k, and the current RAM can only go just as far if upgraded. But the registers would be twice as wide, meaning you could address up to 4 GB. So the ROM could sit below the RAM. Or alternatively, you could page out the ROM. Then, if the program in RAM exited, it would need to page the ROM back in and jmp to it, since that's where the "shell" is.
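Here's a minimal sketch of the ROM-below-RAM option, just the address-routing decision. The boundary value follows the 64k figure above, but the flat layout itself is only one of the two options (the other being paging the ROM out):

```python
# Hypothetical flat map with the 64K-word ROM sitting at the bottom
# of the space the 32-bit registers can reach. Bounds are illustrative.
ROM_WORDS = 0x10000          # 64k instruction words of ROM

def route(addr: int) -> str:
    """Decide which device a 32-bit address selects."""
    assert 0 <= addr < 1 << 32          # 32-bit registers => 4 GB reach
    return "ROM" if addr < ROM_WORDS else "RAM"

print(route(0x0000FFFF))  # ROM — last ROM word
print(route(0x00010000))  # RAM — first address above the ROM
```

In hardware this is just a comparator on the upper address lines driving the chip selects, which is part of the appeal of the flat layout over paging.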

But that leaves another challenge. Would the operands in RAM require 32-bit-wide RAM, or would the instruction have to increment the address and read from the next one? That could mean 2 cycles. I see the shortcomings of both approaches, and I'd prefer not to get into needing thunking.
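The second option (increment and read again) can be sketched to show where the extra cycle goes. This assumes a simple memory model of one 16-bit word per address, which is itself one of the open choices:

```python
# Sketch of the two-cycle fetch that narrower RAM would force:
# one read for the instruction word, a second for the operand word.
def fetch(mem, pc):
    """Return (opcode, operand, next_pc, cycles) for 16-bit-wide RAM."""
    opcode = mem[pc]          # cycle 1: instruction word
    operand = mem[pc + 1]     # cycle 2: operand word at the next address
    return opcode, operand, pc + 2, 2

mem = [0x1234, 0x5678]
print(fetch(mem, 0))  # (0x1234, 0x5678, 2, 2)
```

With 32-bit-wide RAM both words arrive in one read and `cycles` drops to 1, at the cost of a wider (and less commonly available) memory array — which is the trade-off weighed above.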

Speaking of thunking, what if blocks of memory are used only for data? How would you access or calculate (ALU) all 4 bytes of an address? I know a way, but it introduces feature creep and increases hardware complexity.

Another problem with expanding the Harvard architecture to RAM comes to mind. You might only be able to take advantage of the split registers being made autonomous (effectively two 8-bit cores) while staying in the first 64k addresses. Paging out the ROM might be helpful here, and I think you could take advantage of the branching quirk (jump and then change pages while jumping), but the added paging instruction could take too long, and I'm not sure the other approach would work, so there might be a need for NOPs at the beginning of the ROM or a loaded program. But what if you need to use ROM routines on the same page?

Beyond that would require full 16-bit mode and tying up 2 registers or a word register (or a byte register for 24 bits) and an immediate. So for the video, if you still want to bit-sling, you might as well send a doubleword to Out with circuitry there to serialize the bytes. That could buy 3 clock cycles. It would require bonding 2 16-bit registers, and Out would likely need to be a register.
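A quick model of what that serializing circuitry on Out would do with a doubleword, assuming low-byte-first order (the ordering is an assumption; the hardware could just as well shift out high byte first):

```python
def serialize_out(dword: int):
    """Yield the 4 bytes of a 32-bit word, low byte first — a software
    model of the shift/latch circuitry proposed for Out, one byte per
    pixel clock."""
    for shift in (0, 8, 16, 24):
        yield (dword >> shift) & 0xFF

print([hex(b) for b in serialize_out(0xAABBCCDD)])
# ['0xdd', '0xcc', '0xbb', '0xaa']
```

One register write covering 4 output bytes is where the "buy 3 clock cycles" figure comes from: the CPU touches Out once instead of four times per group of pixels.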
