Extending the Gigatron instruction set

marcelk
Posts: 488
Joined: 13 May 2018, 08:26

Re: Extending the Gigatron instruction set

Post by marcelk »

Naturally the overall instruction set efficiency can improve by adding complexity (and with that, chips) to the control unit. We didn't even optimise the current 6-chip control unit design after we finally allowed the condition decoder in (it was a 5-chip design originally, with conditional branching only on AC bit 7). Given that, there should be room for improvement by making better use of the dark gates in U12, U32, U33 (and U1) before adding even more chips. I believe that was the original proposal, but it seems that is not exactly the case any more.

Once adding a chip is allowed, there are suddenly many more things that can be considered. Perhaps a [Y++,X] mode to help "Wolfenstein 3D" render walls? Or perhaps something to assist direct keyboard hookup. Or doubled horizontal resolution...
HGMuller
Posts: 20
Joined: 14 May 2018, 05:46

Re: Extending the Gigatron instruction set

Post by HGMuller »

Well, Y++ requires more than just a change of the control unit: you would have to replace Y by a counter. Of course allowing [X,D] also requires some fiddling with the data path, but these are just wire connections.

It is true that my latest design abandoned the initial philosophy of making a 'component-neutral' enhancement. It is also true that the more chips you add, the more you can do. But there is a 'law of diminishing returns' here; at some point you will be adding features that are almost never useful, and only enhance the speed with which a typical program runs by a percent or so. Lack of the mode [X,D], and of a variety of modes for loading the X and Y registers, can hurt enormously. In the tentative code I was writing for the chess program it slowed things down by a factor of 2-3, on average: for almost every data move you would have to save and restore a register, while availability of the proper mode (even if only on load instructions) would allow you to do it in a single instruction without touching any auxiliary register.

So the issue is really: would an extra chip be worth it to get a performance boost of a factor 2-3? I would say 'yes' to that. You already have some 38 chips, so the performance per chip would sky-rocket. For a boost of a mere 10%: probably not. Who cares about 10%? For a typical speedup of 1%, you would only decrease performance per chip when it required an extra chip.
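To put rough numbers on that argument, here is a minimal sketch. It assumes a baseline of 38 chips (the "some 38" above), exactly one added chip, and takes "performance per chip" to mean speedup divided by chip count, relative to the unmodified machine. The function name is only for illustration.

```python
def relative_perf_per_chip(speedup, base_chips=38, extra_chips=1):
    """Performance per chip after the mod, relative to the unmodified machine.

    Assumption: performance per chip = speedup / chip count, normalised so
    that the stock machine scores 1.0.
    """
    return speedup * base_chips / (base_chips + extra_chips)


# The three cases discussed above: a factor 2, a 10% boost and a 1% boost.
for speedup in (2.0, 1.10, 1.01):
    ratio = relative_perf_per_chip(speedup)
    print(f"speedup {speedup:.2f}x -> performance per chip x{ratio:.3f}")
```

Under these assumptions the break-even speedup for one extra chip is 39/38, a little under 3%, so only the 1% case comes out as a net loss per chip.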

The temptation to make something that is 'near perfect' (i.e. every mode for every destination and even on store instructions), as opposed to making something that is just a bit better (e.g. an extra mode on instructions that load ACCU), is just too much for me. Even if it requires a lousy extra 74x00.
HGMuller
Posts: 20
Joined: 14 May 2018, 05:46

Re: Extending the Gigatron instruction set

Post by HGMuller »

tocksin wrote: 13 Jul 2018, 13:58
What about something a little less intrusive? I'd hate to become incompatible with the current instruction set with all of the work people are doing developing the software. What if we just replace the instructions which are currently unusable? Specifically, the store instructions which write to RAM. So, when IR1,IR0 = 01 and /W = 0

My thought was the same as yours - let's create an [X,D] memory access mode to aid in block memory copies. But we only change the above modes. So we disable reading from memory on these instructions, and instead read from AC. And when we do this, we select the upper memory address byte to be the X register. This is a fairly simple change.

We break into the /OE line and make the /newOE = /OE or W. So it will go low normally unless there's also a store command.
We need an extra control line to tell us when we are overriding the normal command which would be /OVERRIDE = (/OE or /W).
We break into the /AE line and make the /newAE = /AE and /OVERRIDE. So it will go low normally and if we are overriding the old command. I'm pretty sure you can't mix wired-ANDs and wired-ORs, so we may need an extra chip here.

Connect the X register to the unused multiplexer input next to the Y input. Then route the inverted /OVERRIDE line to the select pin. So whenever you would normally do a store [Y,X] or [Y,D], now you store [X,X] or [X,D].

So now you do block memory copies by doing a LD [Y,D] to AC, then do a ST [X,D] from AC. Easy-peasy for two wired gates, one extra AND chip, and using the extra inverter while maintaining full reverse compatibility. We could further discuss doing [X++,D] instead to aid in block copies by putting the /OVERRIDE pin into the IX line for one extra diode. So what would you rather have? st AC->[X,D] or st AC->[X++,D] ?
OK, I had to digest this a bit. Some remarks:

The issue with the RAM /OE can probably be solved by connecting the bus decoder's /E input to the clock. The bus will then be driven only while the clock is low (the second half of a cycle), which should be good enough, as the data there is only needed at the end of the cycle, when the rising edge clocks it into a register. This would mask /OE entirely by /WE on store instructions.

The inverter on the MUXH select input could be eliminated by swapping X and Y on its input instead. This is probably not worth it from a practical point of view (e.g. counting the number of pins you have to disconnect from the PCB and the number of wires you have to use to reroute them to other holes). But it is interesting for the theoretical issue of whether the enhancement could be done in a 'component-neutral' way. In this respect, some of the choices that were completely arbitrary in the original Gigatron design turn out, in hindsight, to be unfortunate. E.g. if the mapping of bus-decoder outputs O0-O3 onto output enables had been /DE, /IE, /AE, /OE rather than /DE, /OE, /AE, /IE, the /newAE signal could have been generated as /AE and /(W and IR1), which could have been done through two wired ANDs plus the spare inverter. Now this cannot be done without altering the opcode map.

I think I would rather have [X,D] than [X++,D], but I don't think there is any reason why we could not have both. The proposed modification already requires an extra gate. Given that gates come in packages that contain four of them, there is no need to be modest and leave some of them unused. E.g. whenever we force X onto MUXH, we could force D on MUXL, by making its select line newEL = EL or OVERRIDE = /EL nand /OVERRIDE. This would give us [X,D] instead of both [Y,D] and [Y,X], at the expense of the useless [X,X]. The ST [Y,X++] that already exists would then be overridden to a ST [X++,D], without the need to do anything to IX.
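The identity used for the MUXL select line, newEL = EL or OVERRIDE = /EL nand /OVERRIDE, is just De Morgan's law, which is why a single spare NAND gate would be enough. A minimal truth-table check, with function names made up for this sketch and every value treated as a plain 0/1 logic level:

```python
from itertools import product


def new_el_as_or(el, override):
    """newEL written with the active-high signals: EL or OVERRIDE."""
    return el | override


def new_el_as_nand(el_n, override_n):
    """newEL written with the active-low lines: /EL nand /OVERRIDE."""
    return 1 - (el_n & override_n)


# Walk the full truth table and confirm both forms agree.
for el, override in product((0, 1), repeat=2):
    el_n, override_n = 1 - el, 1 - override  # levels on the inverted lines
    assert new_el_as_or(el, override) == new_el_as_nand(el_n, override_n)
    print(el, override, new_el_as_or(el, override))
```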
tocksin
Posts: 25
Joined: 22 Jun 2018, 14:12

Re: Extending the Gigatron instruction set

Post by tocksin »

I like this idea. This would change the ST AC -> [X++, X++] instruction to ST AC -> [X++, D] which is much more useful. More instructions for no more chips is good!
marcelk
Posts: 488
Joined: 13 May 2018, 08:26

Re: Extending the Gigatron instruction set

Post by marcelk »

HGMuller wrote: 19 Sep 2018, 09:36
The issue with the RAM /OE can probably be solved by connecting the bus decoder's /E input to the clock. The bus will then be driven only while the clock is low (the second half of a cycle), which should be good enough, as the data there is only needed at the end of the cycle, when the rising edge clocks it into a register. This would mask /OE entirely by /WE on store instructions.
Some caution is needed in this area. Instructions that read RAM followed by an arithmetic operation are on the critical path when there is a carry from the low adder into the high adder. In fact, if you sum the typical setup, propagation and hold times listed in the data sheets for that path, the Gigatron shouldn't work at all with the 160 ns cycle time it is clocked at: they sum to ~190 ns for 70 ns RAM, 30 ns too long. There is also a dependency on the prior instruction's addressing mode, but I've never fully characterised that contribution.
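The arithmetic behind that statement, as a small sketch. The 160 ns cycle time and the ~190 ns path total are the figures quoted above; the variable names are only illustrative, and the 190 ns value is the approximate datasheet sum, not a measurement.

```python
CYCLE_NS = 160.0            # cycle time quoted above
DATASHEET_PATH_NS = 190.0   # approximate datasheet sum for 70 ns RAM (from the post)

clock_mhz = 1000.0 / CYCLE_NS               # clock frequency implied by the cycle time
shortfall_ns = DATASHEET_PATH_NS - CYCLE_NS  # how far the datasheet sum overshoots

print(f"clock: {clock_mhz} MHz")
print(f"datasheet path exceeds the cycle by {shortfall_ns} ns")
```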

In reality the chips are a bit faster than advertised, and the registers borrow a few ns from the next CPU cycle by receiving the shifted CLK2 signal. When selecting the RAM and clock speeds, we made plenty of multi-hour eye diagrams, both with 70 ns RAM and 55 ns RAM. Our conclusion was that, at 6.25 MHz, 70 ns RAM really requires the shift provided by CLK2, but that more modern 55 ns RAM doesn't need this shift. We kept both signals in the kit edition (1) for compatibility with vintage parts and (2) for overclocking potential.

Therefore, letting RAM sit idle for the entire ɸ1 duration sounds a bit brave: ɸ1 is much longer than 70 ns - 55 ns = 15 ns after all: bus decoding must be fast for an early /OE.

BTW: the test pads at the top of the board (GND, ALU7 and CLK2) are there to make it easy to measure this. You want to verify that ALU7 is stable before CLK2 rises for at least the duration of the registers' setup time (~20 ns or so for TTL). Here's one overview I was able to dig up:

[Attachment: eyes.png]
HGMuller
Posts: 20
Joined: 14 May 2018, 05:46

Re: Extending the Gigatron instruction set

Post by HGMuller »

The point is that the data-valid time from /OE of a RAM is usually much shorter than the 'access time' (which is data-valid from address-change or /CS). This is understandable from how the RAM works internally: an address change has to first propagate through the row decoder, then drive a row-enable line that must enable hundreds of gates (and hence has high capacitance), after which the data has to propagate (again through lines that connect hundreds of cells) to and through a multiplexer to select the right cell from the row. /OE just activates the tri-state output drivers that are at the end of this chain.

That propagation delays in practice are shorter than the specs is normal, as the specs are really for worst case: maximum fan-out, worst ambient conditions (like freezing temperatures), sub-standard supply voltage, high capacitive load... For a given design you usually have relatively low fan-out, short connections with little capacitance, etc.

I did some more thinking on the 'binary compatible approach'. This is of course bound to require more additional logic for a given enhancement. Yet it might be the most sensible thing to do. Completely redefining the opcode map just to save a TTL chip does indeed sound like a bad tradeoff. Of course when you are still in the design stage, where you still have complete freedom in how to encode the instructions, you would pick an encoding that leads to the simplest decoder, but that horse has already left the barn (as Bob Hyatt likes to say). So given where we are, putting binary compatibility before simplicity is not a bad idea.

I am not satisfied with just ST ACCU->[X,D] as an extra instruction, though. But fortunately it seems one can do significantly more than that:

The idea is to not only repurpose the opcodes for the (originally undefined) instructions ST mem->mem, but also those for the (useless) instructions that use ACCU as a second logical operand (such as LD ACCU,X and XOR ACCU). The latter group can be used for instructions that read their second operand from memory instead. Rather than what was discussed before (i.e. replacing [Y,D] by [X,D] in 'overridden' cases), I want to do it by replacing [D] by [X,D]. So override the MUXH enable rather than its select. The MUXH select can then be driven by /IR3 (requiring an inverter). This makes use of the fact that in the original design MUXH is only enabled (passing Y) in mode/destination combinations 2, 3 and 7 ([Y,D]->ACCU, [Y,X]->ACCU and [Y,X++]->OUT). In mode/destination 0, 4 and 5 (where /IR3 would select X) they are all [D]->{ACCU, X or Y}. This would turn into [X,D]->{ACCU, X or Y} if MUXH were enabled. (Which will only be done in the opcodes to be overridden.) So we add the instructions ST [X,D] , ST [X,D],X , ST [X,D],Y , LD [X,D],ACCU , LD [X,D],X , LD [X,D],Y , plus all versions of the latter three with AND, OR or XOR instead of LD.
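The claim that /IR3 can serve as the MUXH select can be checked against the mode table. The sketch below assumes the stock encoding with the mode field in IR4..IR2; entries 0, 2-5 and 7 are the ones quoted above, while entries 1 and 6 are filled in from my understanding of the standard Gigatron instruction set and should be treated as an assumption. The helper names are hypothetical.

```python
# Mode/destination table (mode field = IR4..IR2). Entries 1 and 6 are an
# assumption based on the stock instruction set; the rest are quoted above.
MODES = {
    0: ("[D]", "AC"),
    1: ("[X]", "AC"),
    2: ("[Y,D]", "AC"),
    3: ("[Y,X]", "AC"),
    4: ("[D]", "X"),
    5: ("[D]", "Y"),
    6: ("[D]", "OUT"),
    7: ("[Y,X++]", "OUT"),
}


def ir3(mode):
    """IR3 is the middle bit of the 3-bit mode field IR4..IR2."""
    return (mode >> 1) & 1


def muxh_source(mode):
    """Proposed MUXH select driven by /IR3: IR3=1 keeps Y, IR3=0 picks X."""
    return "Y" if ir3(mode) else "X"


for mode, (address, dest) in MODES.items():
    # Every mode that needs Y as the high address byte must still select Y.
    if address.startswith("[Y"):
        assert muxh_source(mode) == "Y"
    print(mode, address, "->", dest, "| MUXH would select", muxh_source(mode))
```

Since MUXH is only enabled in modes 2, 3 and 7 on the stock machine, the X selection in the other modes has no effect unless the enable is overridden as well.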

To achieve this we must generate the signals:

/OVERRIDE1 = /W or /OE
/OVERRIDE2 = IR7 or /AE
/ACCU= (/ADD and /W and /J) or /AE
/newOE = /OVERRIDE2 and /OE
/newAE = /OVERRIDE1 and /ACCU
/newEH = /OVERRIDE1 and /OVERRIDE2 and EH

This requires 3 OR gates. (Wired OR is not recommended with TTL, where the voltage for logical 0 is so close to GND.) The AND operations could be done with 9 diodes and 3 pull-ups. (The AND for /newEH can just attach two more diodes to the existing diode matrix.) The term /ADD and /W and /J (all from the function decoder) is a (rather cumbersome) way to generate /IR7 without using an inverter. (This is why I dislike using OR gates; you cannot use those you have to spare as inverters...) The /ACCU signal indicates where it is actually useful to drive ACCU onto the bus. Note that this includes the case of the ADD ACCU instruction, which is equivalent to LSL, and thus not completely useless like the other operations with ACCU operand. (NOP or CLR can be done in other ways.) Also note I made no attempt to suppress the RAM /OE when /OVERRIDE1 drives ACCU onto the bus, as I still assume the trick of enabling the bus decoder only during phi2 will work.
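For anyone who wants to play with these equations, here is a literal transcription as a sketch. Every value is a wire level (0 or 1), names ending in `_n` stand for the lines written with a leading '/', and EH is taken exactly as written in the /newEH equation. The function name and the example input combinations are assumptions made for illustration, not decoded from real opcodes.

```python
def override_signals(w_n, oe_n, ae_n, ir7, add_n, j_n, eh):
    """Evaluate the six proposed equations on wire levels (0 = low, 1 = high)."""
    override1_n = w_n | oe_n                   # /OVERRIDE1 = /W or /OE
    override2_n = ir7 | ae_n                   # /OVERRIDE2 = IR7 or /AE
    accu_n = (add_n & w_n & j_n) | ae_n        # /ACCU = (/ADD and /W and /J) or /AE
    return {
        "/newOE": override2_n & oe_n,          # /newOE = /OVERRIDE2 and /OE
        "/newAE": override1_n & accu_n,        # /newAE = /OVERRIDE1 and /ACCU
        "/newEH": override1_n & override2_n & eh,
    }


# Assumed example 1: a store with the bus field selecting RAM (/W and /OE low).
store_from_ram = override_signals(w_n=0, oe_n=0, ae_n=1, ir7=1, add_n=1, j_n=1, eh=1)
# Assumed example 2: a logic op with ACCU as bus operand and IR7 = 0.
logic_with_accu = override_signals(w_n=1, oe_n=1, ae_n=0, ir7=0, add_n=1, j_n=1, eh=1)
# Assumed example 3: ADD with ACCU as bus operand (/ADD low, IR7 = 1).
add_with_accu = override_signals(w_n=1, oe_n=1, ae_n=0, ir7=1, add_n=0, j_n=1, eh=1)

for name, result in [("ST, RAM on bus", store_from_ram),
                     ("logic op, ACCU on bus", logic_with_accu),
                     ("ADD, ACCU on bus", add_with_accu)]:
    print(name, result)
```

In the first case /newAE goes low while /newOE stays low too, which is the unsuppressed RAM /OE mentioned above. In the second case the RAM is read instead of ACCU, and in the third ACCU is still driven, so ADD ACCU keeps working.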
marcelk
Posts: 488
Joined: 13 May 2018, 08:26

Re: Extending the Gigatron instruction set

Post by marcelk »

For reference, I dug up more measurements from the time we were comparing various types of clock circuits, inverter types and values for the passives.

[Attachment: clk51.png]

This is a critical-path breakdown I made at the time (calculated and measured). Shortly after this we settled for a 6.25 MHz clock or 160 ns cycle time.

[Attachment: breakdown.png]
HGMuller
Posts: 20
Joined: 14 May 2018, 05:46

Re: Extending the Gigatron instruction set

Post by HGMuller »

Nice. Is the calculation for LS or HCT chips?

Anyway, you can see that the RAM cannot start doing anything useful at all before 18 + 27 + 18 = 63 ns into the cycle, as it isn't even fed the right address before that. And in the measurement this is even 86 ns. The bus-mode decoder is not on the critical path; tri-state enable times of the TTL data sources it drives are all very fast, and the worst delay is from the RAM.

The diagram doesn't show when phi2 of the clock starts. But if this enables the mode decoder it would require the enable delay of the '155 (assumed 18ns here) plus the output-enable to data-valid time of the RAM, which is typically half the access time. (Say 40ns for a 70ns RAM.) That doesn't seem like a problem.
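A back-of-the-envelope version of that comparison, using only the figures mentioned in this post (63 ns until the address is valid, 18 ns for the '155, about 40 ns from /OE to data valid) plus the nominal 70 ns access time. This is simple arithmetic on assumed datasheet-style numbers, not a timing analysis, and the variable names are only illustrative.

```python
address_valid_ns = 18 + 27 + 18   # calculated time until the RAM sees its address
ram_access_ns = 70                # nominal access time of the RAM (assumed worst case)
decoder_enable_ns = 18            # assumed enable delay of the '155
oe_to_valid_ns = 40               # assumed /OE to data-valid time for a 70 ns RAM

# Data valid if only the address path limits the RAM.
data_valid_via_address_ns = address_valid_ns + ram_access_ns
# Delay added after phi2 starts when /OE is gated by the clock.
oe_path_ns = decoder_enable_ns + oe_to_valid_ns
# Latest phi2 start for which gating /OE does not delay the data at all.
latest_phi2_start_ns = data_valid_via_address_ns - oe_path_ns

print("data valid via address path:", data_valid_via_address_ns, "ns")
print("phi2 start to data valid:", oe_path_ns, "ns")
print("latest phi2 start without penalty:", latest_phi2_start_ns, "ns")
```

If phi2 starts later than that, the difference would come straight out of whatever margin the rest of the path has, so the actual clock phase timing still needs to be checked on the board.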

To side-track a little bit: The Gigatron was designed for simplicity rather than speed, and the timing diagram shows this clearly. The critical path of information flow is very long. Despite what is sometimes claimed, the Gigatron is NOT a RISC machine: one instruction specifies an address calculation, memory access and ALU operation all together. This is what we call a complex instruction. In a true RISC design this would all be done in separate instructions: ALU operations would only have register operands and destinations, the program would be responsible for any address calculation, and memory access would be limited to loading and storing a register at an address contained in a (possibly dedicated) register. That would allow a much shorter cycle time for the same RAM speed. (But would probably take many more chips, and lead to longer programs, although these could have shorter instructions.)
HGMuller
Posts: 20
Joined: 14 May 2018, 05:46

Re: Extending the Gigatron instruction set

Post by HGMuller »

Oops! I overlooked that the ACCU mode is not useless on the instructions that load X or Y, as the LD instruction passes the operand from the bus. LDX A and LDY A are the most obvious choices for transferring A to X or Y, so they are likely heavily used. There are plenty of alternatives, such as OR $00,X, but this would destroy the binary compatibility.
tocksin
Posts: 25
Joined: 22 Jun 2018, 14:12

Re: Extending the Gigatron instruction set

Post by tocksin »

If the diode logic is really causing that much of a slowdown, you could decrease the value of the pull-up resistors. It would consume a lot more current, but that might be an easy tradeoff for speed.