Everything started more than a year ago, when I saw a presentation by a guy (Walter) who, with a beer in one hand, was talking about a strange microcomputer at a hacker conference in a hotel… I was immediately interested, and as soon as I saw the very nice documentation, ideas for hacking it and optimizing it even further started gathering (I'm passionate about minimalistic things)!
For a month I accumulated them, but as I got deeper into the “build your own processor” community I got into relay-based computers, which are similarly challenging in terms of design but easier to build and test with rudimentary electronic skills and instruments, and in some ways more fun (the satisfaction of watching and hearing electro-mechanical relays move is different from looking at static black ICs, and for this reason the blinkenlights were a wonderful idea!).
The relay computer project is still ongoing (together with a few others), but again by chance, after watching the fantastic PucMon YouTube video, I felt the call to at least put the ideas together and share them with the community, hoping that some of you might like to collaborate on them. That would motivate me to go forward, bring even more ideas, and surely minimize the errors that would be made in an implementation.
Some of these changes retain full "retro compatibility", but in general the idea is to simplify the design even further while trying to remain compatible at vCPU level, keeping the 70s philosophy and the “software can replace hardware” approach of the project (in fact, in a few places extending this concept)!
All of them still have to be tested. Personally I think that most of them should work, and even being able to implement just some of them would be an interesting result!
I analyzed each part of the machine, trying to find ways to reduce components and logic gates while preserving as much of the functionality as possible, and where possible even extending it (as for VGA). In terms of pure reduction, implementing all the proposed changes could save around 10 integrated circuits, but even half of that would, I think, be a great result!
Reducing the number of chips even further should also help more people get involved in this very interesting area of homebrew CPU design, making the machine easier to build and hack!
Below is a selected set of changes to the original platform:
**Removal of the 2:1 multiplexers for Y**
Since the high byte of the RAM address can be either 0 or Y, the same effect as the two 2:1 multiplexers can be obtained by using a 374 chip for the Y register in combination with pull-down resistors, simply routing the multiplexer selection signal to the Y output enable.
This change should also be fully usable with the "standard" Gigatron, saving 2 chips on the total count.
To optimize current consumption with TTL, the pull-down circuits can be turned into pull-ups by switching page 0 to page FF. This also depends on whether pull-down is faster than pull-up or not!
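The addressing behaviour described above can be modelled in a few lines. This is only a sketch under my own assumptions: the function name `ram_high_byte` and the `idle_level` parameter are hypothetical, and the model ignores all timing. When the Y register's outputs are enabled the bus carries Y; when they are tri-stated the resistors define the value (0x00 with pull-downs, 0xFF with pull-ups).

```python
def ram_high_byte(y: int, y_output_enabled: bool, idle_level: int = 0x00) -> int:
    """Return the high byte of the RAM address seen on the resistor-terminated bus.

    idle_level is 0x00 for pull-down resistors and 0xFF for pull-up resistors
    (hypothetical parameter, used only for this illustration).
    """
    if not 0 <= y <= 0xFF:
        raise ValueError("Y must fit in 8 bits")
    if idle_level not in (0x00, 0xFF):
        raise ValueError("idle level must be 0x00 (pull-down) or 0xFF (pull-up)")
    # Outputs enabled: the register drives the bus. Outputs disabled: the resistors do.
    return y if y_output_enabled else idle_level


if __name__ == "__main__":
    print(hex(ram_high_byte(0x12, True)))          # Y drives the bus
    print(hex(ram_high_byte(0x12, False)))         # pull-downs select page 0
    print(hex(ram_high_byte(0x12, False, 0xFF)))   # pull-ups select page FF
```

With pull-ups, the "zero page" variables simply live in page FF instead of page 0.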
**Removal of the 2:1 multiplexers for X**
Without the 2:1 multiplexers on X we lose direct memory addressing from ROM, so one additional instruction is required to load X. One more instruction where needed, but two chips fewer overall, seems a good tradeoff and fully aligned with the Gigatron ideas!
The pull-down resistors can be applied here too, making it possible to address location 0 directly (this might be used, for example, in horizontally scrolling games for pixel row displacement).
Note that with this easy addressing, [0,0] can be used as an additional auxiliary register!
The X incrementer is kept in order to output to VGA at full speed. Having the two chips also allows the nibbles to be loaded separately if needed.
**Increasing the RAM**
This is a trivial but important one: 64 kbytes of RAM by default. Self-explanatory, and surely missing on the standard Gigatron.
**Increasing the clock frequency**
The Gigatron has already been reported to run at double speed, and the current strip-down even shortens path lengths, which should help overclocking possibilities or stability.
Increasing the clock frequency would reduce the impact of those hardware simplifications that require some additional instructions on the software side.
Note that to stay aligned with the VGA pixel frequency of 25.175MHz, the usable clocks above the current 6.29375MHz (pixel clock /4) would be 12.5875MHz or 8.3917MHz (respectively /2 and /3). And surely any multiple of 6.29375MHz or of the others could also fit.
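The frequencies above are just the VGA pixel clock divided by a small integer. A quick sketch to reproduce them (the names `VGA_PIXEL_CLOCK_HZ` and `cpu_clock_hz` are mine, chosen for illustration):

```python
from fractions import Fraction

# VGA pixel frequency quoted in the text: 25.175 MHz.
VGA_PIXEL_CLOCK_HZ = Fraction(25_175_000)


def cpu_clock_hz(divisor: int) -> Fraction:
    """CPU clock obtained by dividing the pixel clock by an integer divisor."""
    if divisor < 1:
        raise ValueError("divisor must be a positive integer")
    return VGA_PIXEL_CLOCK_HZ / divisor


if __name__ == "__main__":
    for divisor in (4, 3, 2):
        mhz = float(cpu_clock_hz(divisor)) / 1e6
        print(f"pixel clock / {divisor} = {mhz:.5f} MHz")
```

The divisor is also the number of VGA pixel periods that fit in one CPU cycle, which is what matters when software generates the video signal.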
**Including a keyboard**
Some design decisions are not fully aligned with the main idea of the Gigatron. One is the use of the pretty complex serial chip but, even more, requiring an entire microcontroller to be able to use a keyboard is surely not in line with the original philosophy.
Removing it and using an old-style matrix keyboard (with no external controller) is much more aligned and uses much simpler logic. If required, a joystick can be attached to an expansion port (again a much simpler one, just based on switches).
With a 32-key (4x8) matrix keyboard, the blinkenlight LED lines might be used for the keyboard row signals, which should not interfere much with the LED blinking. For the columns a dedicated buffer chip should be used (de facto replacing the current serial chip).
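To give an idea of how little software a 4x8 matrix needs, here is a rough scanning sketch. Everything in it is an assumption of mine: the function names, the active-high convention (a pressed key on the driven row reads as a 1 in its column bit), and the `read_columns` callback standing in for "drive the row lines, then read the column buffer". Debouncing and ghosting are ignored.

```python
from typing import Callable, Iterable, List, Tuple

ROWS, COLS = 4, 8  # 32-key matrix, as proposed in the text


def scan_matrix(read_columns: Callable[[int], int]) -> List[Tuple[int, int]]:
    """Return the (row, column) pairs of all pressed keys.

    read_columns(row_mask) is a hypothetical hardware access: it drives the
    row lines with row_mask and returns the 8 column bits.
    """
    pressed = []
    for row in range(ROWS):
        columns = read_columns(1 << row) & 0xFF  # drive exactly one row line
        for col in range(COLS):
            if columns & (1 << col):
                pressed.append((row, col))
    return pressed


def make_fake_keyboard(keys_down: Iterable[Tuple[int, int]]) -> Callable[[int], int]:
    """Build a software stand-in for the matrix, for testing the scanner."""
    keys = set(keys_down)

    def read_columns(row_mask: int) -> int:
        value = 0
        for row, col in keys:
            if row_mask & (1 << row):
                value |= 1 << col
        return value

    return read_columns


if __name__ == "__main__":
    print(scan_matrix(make_fake_keyboard({(0, 0), (3, 7)})))
```

On the real machine the scan would be spread over the video loop, one row per time slot, rather than done in a single pass.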
**Minimizing Program Counter**
Design idea: the most significant byte of the register can be incremented at the end of the page with an unconditional fixed jump to address 0 of the next page. This allows the use of a standard flip-flop chip instead of the double 161 incrementer. Benefits: one chip fewer and shorter carry propagation. Downside: one or two ROM words per page are used for the jump, and on average slightly more in the cases where the Temp/Flag register needs to be saved and/or restored. Note that this limitation applies just to native instructions and not to vCPU ones, and just to code that spans multiple pages (very limited, if it exists at all, even in the current implementation).
As a side note, the "rollover" within the same page might even be used on purpose in some extreme optimizations.
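A small simulation may help to visualize the proposed program counter. It is a simplified sketch with assumptions of mine: the name `run_addresses` is hypothetical, the page-crossing jump is modelled as sitting in the last word of the page, and the pipelining delay slot of the real machine is ignored. Only the low byte counts; the high byte changes solely when a jump loads it.

```python
from typing import List


def run_addresses(start: int, steps: int, has_page_jump: bool = True) -> List[int]:
    """Return the sequence of 16-bit ROM addresses fetched from `start`.

    The low byte is an 8-bit counter that wraps inside the page. The high
    byte is a plain register: it only changes when a jump writes to it.
    """
    pc_high, pc_low = (start >> 8) & 0xFF, start & 0xFF
    visited = []
    for _ in range(steps):
        visited.append((pc_high << 8) | pc_low)
        if has_page_jump and pc_low == 0xFF:
            # The ROM word here is the fixed jump to address 0 of the next
            # page; the target page is a constant stored in that word.
            pc_high = (pc_high + 1) & 0xFF
            pc_low = 0
        else:
            pc_low = (pc_low + 1) & 0xFF  # no carry into the high byte
    return visited


if __name__ == "__main__":
    print([hex(a) for a in run_addresses(0x01FE, 3)])
    print([hex(a) for a in run_addresses(0x01FE, 3, has_page_jump=False)])
```

The second call shows the "rollover": without the jump, execution wraps to the start of the same page.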
**Minimizing the Control Unit**
Design idea: CU signal "unrolling", using two ROMs and storing in them the “already decoded” instruction signals (24 bits for signals and 8 for data).
Other than removing some chips, removing the CU decoding logic (together with the ALU, described later) also shortens delay paths, increasing the chance of compatibility with higher clock frequencies. Another benefit is the possibility to specify all the parallel activity that an instruction might need, since the signals are stored separately.
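As an illustration of the "already decoded" ROM word, here is a sketch of how 24 signal bits and 8 data bits could be packed into 32 bits. The bit layout (data in the low byte, signals above it) and all the signal names are hypothetical placeholders of mine, not a proposed final assignment.

```python
from typing import Iterable, Set, Tuple

# Hypothetical control signals, one ROM bit each (up to 24 are available).
SIGNAL_NAMES = ("LOAD_X", "LOAD_Y", "Y_OUTPUT_ENABLE", "RAM_WRITE", "INC_X")


def encode(signals: Iterable[str], data: int) -> int:
    """Pack the active signals and the 8-bit data field into one 32-bit word."""
    if not 0 <= data <= 0xFF:
        raise ValueError("data must fit in 8 bits")
    control = 0
    for name in signals:
        control |= 1 << SIGNAL_NAMES.index(name)
    return (control << 8) | data


def decode(word: int) -> Tuple[Set[str], int]:
    """Split a ROM word back into its set of active signals and its data byte."""
    control = word >> 8
    active = {name for bit, name in enumerate(SIGNAL_NAMES) if control & (1 << bit)}
    return active, word & 0xFF


if __name__ == "__main__":
    # One word asserting two signals at once: a store with X post-increment.
    word = encode({"RAM_WRITE", "INC_X"}, 0x42)
    print(f"{word:08x}", decode(word))
```

Since every signal has its own bit, any combination of parallel actions can be expressed in a single word without extra decoding logic.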
Expansion bus: this one adds chips instead of removing them, but keeping it simple, with just a dedicated register plus a write signal and one buffer plus a read signal on the bus, we would have a 20-pin connector (including Vcc and Ground) for any future use.
These chips are optional and can be soldered only if/when needed.
The VGA chips are already at the bare minimum, but it might be possible to increase the number of VGA colors, still keeping the single output register, by using the Digital Composite Sync feature (horizontal and vertical sync carried on hsync alone) to get 128 colors (ref. https://www.avrfreaks.net/forum/found-h ... ga-feature, https://hackaday.com/2015/12/17/attiny- ... -8-colors/, https://www.avrfreaks.net/forum/impossi ... er-atiny85). The monitor has to support the DCS feature, but it seems almost all monitors do!
There is an even more daring version, obtained by mixing the sync with the signal (ref. https://www.avrfreaks.net/forum/impossi ... y85?page=3).
Note that more modes with higher resolution, for example 256x192, can also be made available in software, focusing resources (about 50k of RAM and a lot of CPU) just on displaying the picture.
Conversely, there might be situations where maximum computational power is needed and no video is required. This could be coded as a new vCPU instruction “FAST” (analogous to the ZX81 one), leaving a very powerful computer that can, for example, compute a chess move!
**Minimizing the Arithmetic Logic Unit**
We already removed the CPU, and inside it the CU; the ALU is the last piece remaining... let's remove it too!
This is the last change, but surely the most important: the one with the largest chip reduction, and the most disruptive.
The proposal is to use loads and lookup tables instead of dedicated ALU chips for operations, while remaining efficient (and even having all the conditional signals) thanks to a purposely designed architecture composed of an "expanded" MAU, two "mixing" registers, one temp/"flag" register and one register to combine nibble results. For comprehensive details and some code examples please have a look at the overall picture later on.
This way of operating might be a bit slower for logical and arithmetic operations, but it can cover whatever operation you need in the same efficient way, and with more registers (and also flags) it can be even faster in some operations. And it is surely fully in line with the Gigatron philosophy of using software instead of hardware!
Regarding RAM utilization, each 8-bit lookup table consumes 256 bytes of RAM. Ten of them, which are probably enough for most operations, would consume 2560 bytes (about 2.5 kbytes), still leaving a lot of space for the rest.
Moreover, from my analysis, logical and arithmetic operations are just a subset of the total, and far from being the majority.
Removing the ALU together with the CU decoding logic also shortens delay paths and improves the chances of increasing the clock frequency (ALU7 is one of the slowest signals in the standard Gigatron).
Below you can see a picture trying to show all the components and the interactions among them:
For example, if you need to XOR two operands and you have loaded them into S and T, the instructions would be:
```
U, V = S x T      # pair the nibbles of the two input operands in U and V
Y = XorLutPage    # point to the look up containing in the low nibbles the XOR of the low and high nibble of the index
T = [Y,U]         # calculate the XOR of the low nibbles
W = T + [Y,V]     # calculate the XOR of the high nibbles and join them with low ones to store the result in W
```
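The XOR listing above can be checked with a small software model. This is a sketch under assumptions of mine: the nibble order inside U and V, and the fact that the joining register places the second lookup result in the high nibble, are guesses about the hardware; the Python names are only illustrative.

```python
from typing import Tuple

# 256-entry table: the low nibble of each entry holds the XOR of the low and
# high nibble of its index (the high nibble of the entry is unused, left 0).
XOR_LUT = [(index & 0x0F) ^ (index >> 4) for index in range(256)]


def pair_nibbles(s: int, t: int) -> Tuple[int, int]:
    """Model of 'U, V = S x T': U pairs the low nibbles, V pairs the high ones."""
    u = ((s & 0x0F) << 4) | (t & 0x0F)
    v = (s & 0xF0) | (t >> 4)
    return u, v


def xor_via_lut(s: int, t: int) -> int:
    """XOR of two bytes using only nibble pairing and table lookups."""
    u, v = pair_nibbles(s, t)
    low = XOR_LUT[u]           # T = [Y,U]
    high = XOR_LUT[v]          # [Y,V]
    return (high << 4) | low   # W: high result nibble joined with the low one


if __name__ == "__main__":
    print(hex(xor_via_lut(0xA5, 0x3C)), hex(0xA5 ^ 0x3C))
```

Any other two-operand nibble-wise operation (AND, OR, ...) works the same way, just by pointing Y at a different table.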
Similarly, adding a byte to vAC would be:
```
1. X = address of vAC
2. T = [0,X]          # load vAC into T (supposedly vAC is in page 0)
3. U, V = T x bytetoadd   # load U and V with the low and high nibbles to be added
4. Y = AddLutTable    # the ADD look up table contains the result of adding the two nibbles of the index
5. T = [Y,U]          # add the low nibble ("obviously" the carry will be in bit 5)
6. if not T.HalfCarryFlag jump to step 8   # because of pipelining, this instruction would need to be physically placed one instruction earlier when coded
7. Y = AddPlus1LutTable   # in case of carry use the ADD lookup table with +1 included
8. W = T . [Y,V]      # join the result of the high nibble addition with the one saved in T (low nibbles)
9. [0,X] = W          # store result back into vAC
```
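The addition listing above can be modelled in the same way. Again this is only a sketch with assumptions of mine: I read the "bit 5" of the listing as the fifth bit counting from 1, that is mask 0x10, and I assume the joining step keeps only the low four bits of each lookup result. The carry out of the high nibble is returned as well, although the listing does not use it.

```python
from typing import Tuple

# Each entry is the sum of the two nibbles of its index (0..30, so 5 bits).
ADD_LUT = [(index & 0x0F) + (index >> 4) for index in range(256)]
# Same sums with a carry-in of 1 already included (1..31).
ADD_PLUS1_LUT = [value + 1 for value in ADD_LUT]

HALF_CARRY = 0x10  # fifth bit of a lookup result


def pair_nibbles(a: int, b: int) -> Tuple[int, int]:
    """U pairs the low nibbles of the operands, V pairs the high nibbles."""
    return ((a & 0x0F) << 4) | (b & 0x0F), (a & 0xF0) | (b >> 4)


def add_via_lut(a: int, b: int) -> Tuple[int, int]:
    """Return (8-bit sum, carry out) using only lookups, as in the listing."""
    u, v = pair_nibbles(a, b)
    t = ADD_LUT[u]                                         # add the low nibbles
    table = ADD_PLUS1_LUT if t & HALF_CARRY else ADD_LUT   # select table on half carry
    high = table[v]                                        # add the high nibbles
    result = ((high & 0x0F) << 4) | (t & 0x0F)             # join the two nibbles
    return result, (high >> 4) & 1


if __name__ == "__main__":
    print(add_via_lut(0x3F, 0x01))  # half carry propagates into the high nibble
    print(add_via_lut(0xFF, 0x01))  # wraps to 0 with carry out
```

Subtraction, increments or shifts would follow the same pattern with their own tables.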
And incrementing vPC to execute the next vInstruction:
```
X = address of vPC
X = [0,X]          # load vPC low in X
Y = DoubleIncLut   # 2 bytes per vInstruction
S = [Y,X]          # increment vPC
X = address of vPC
[0,X] = S ; X++    # save incremented vPC low and set X on vPC high (example of parallel execution)
Y = [0,X]          # load vPC high
X = S              # saved vPC low
Jump to [Y,X]      # to execute vInstruction
```
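The vPC listing above uses a lookup table as an incrementer. A minimal model follows, with assumptions of mine: `VPC_ADDR` is a hypothetical page-0 location for vPC low (with vPC high in the next byte), and, as in the listing, only the low byte is incremented, so it wraps inside the page.

```python
VPC_ADDR = 0x16  # hypothetical page-0 address of vPC low; vPC high at VPC_ADDR + 1

# Table lookup that replaces an "add 2" circuit (2 bytes per vInstruction).
DOUBLE_INC_LUT = [(index + 2) & 0xFF for index in range(256)]


def fetch_next(page0: bytearray) -> int:
    """Advance vPC by one vInstruction and return the address to jump to."""
    x = VPC_ADDR
    x = page0[x]              # X = [0,X]      load vPC low
    s = DOUBLE_INC_LUT[x]     # S = [Y,X]      increment through the table
    x = VPC_ADDR
    page0[x] = s              # [0,X] = S      save the incremented vPC low ...
    x += 1                    # ... ; X++      and point X at vPC high
    y = page0[x]              # Y = [0,X]      load vPC high
    x = s                     # X = S
    return (y << 8) | x       # Jump to [Y,X]


if __name__ == "__main__":
    ram = bytearray(256)
    ram[VPC_ADDR], ram[VPC_ADDR + 1] = 0x10, 0x02
    print(hex(fetch_next(ram)))
```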
I have experience in microprocessor architectures and I've written various assembler applications, but as for electronics, this would be my first "non-trivial" build! Any feedback, suggestion, or further improvement is more than welcome!
I'm glad you managed to read this far; I hope you found it interesting!
In the meantime, happy hacking!