An ominous warning and a hardware puzzle

Using, learning, programming and modding the Gigatron and anything related.
Forum rules
Be nice. No drama.
Post Reply
lb3361
Posts: 360
Joined: 17 Feb 2021, 23:07

An ominous warning and a hardware puzzle

Post by lb3361 »

The first three multiplexer inputs are tied to the instruction bits that determine when a branch should occur. For instance 100 means BGT, 110 means BNE, etc.
For an unconditional branch, the instruction bits are 111.
I maintain an experimental rom to play with all kind of crazy idea. I am particularly fond of the latest one which has to do with dispatching work using a low overhead finite state machine. Anyway, the compilation gives ominous warning because I am using a conditional branch to an address fetched from RAM

Code: Select all

1403 f52b  bge  [$2b]
Warning: large propagation delay (conditional branch with RAM on bus)

The offending code looks like this:

Code: Select all

FSM14_NEXT:   1401 8115  adda [$15]       6095  adda([vTicks])                  #0 NEXT
              1402 f52b  bge  [$2b]       6096  bge([fsmState])                 #1
              1403 c215  st   [$15]       6097  st([vTicks])                    #2
Marcel left a helpful comment in asm.py

Code: Select all

  # Warning for conditional branches with a target address from RAM. The (unverified) danger is
  # that the ALU is calculating `-A' (for the condition decoder) as L+R+1, with L=0 and R=~A. But B
  # is also an input to R and comes from memory. The addressing mode is [D], which requires high
  # EH and EL, and this is slower when the diodes are forward biased from the previous instruction.
  # Therefore R might momentarily glitch while B changes value and the AND/OR layers in the 74153
  # multiplexer resettles. Such a glitch then potentially ripples all the way through two 74283
  # adders and the control unit's 74153. This all depends on the previous instruction's addressing
  # mode and the values of AC and [D], which we can't know with static analysis.
  # See also https://github.com/kervinck/gigatron-rom/issues/78
All this speaks about an unverified race condition, suggesting that it has not been observed. Yet it is reasonable to assume that Marcel developed this unverified yet complicated theory for a reason. Needless to say, my experimental rom (attached) uses this with very high frequency and is working flawlessly.

First question: has anyone has actually observed this? Should this instruction branch unexpectedly, then the attached ROM cannot work. Yet, as far as I know, it works very well. Maybe Walter knows something about Marcel's original concern?

Second question: it seems that applying the same reasoning as Marcel's to U12 (the condition code decoder, also a multiplexer) suggests that if BRA[$dd] works, then so does BGE[$dd]. Can you see a flaw in the following argument?

The two selector bits of the condition code decoder are the very stable AC7 (the high bit of the accumulator flip-flop) and the potentially glitching C0 (the carry from the ALU). Together these two signals define the three possible conditions for the accumulator.

Code: Select all

  {AC7,C0}   AC is      BRA   BGE
 ---------------------------------
    00       >0         y     y
    01       =0         y     y
    10       <0         y     n
    11     Illegal      n     n

The first three multiplexer inputs are tied to the instruction bits that determine when a branch should occur. For instance 100 means BGT, 110 means BNE, 111 means BRA, etc. The fourth multiplexer input is wired to the ground, which means that a glitching C0 could prevent a branch from happening even when the jump instruction is an unconditional branch. In fact we can compare BRA and BGE in two scenarios:
  • When AC7 is 0, both are supposed to branch. A C0 glitch only affects the first two multiplexer AND gates because the last two are held low by /AC7. The situation of BRA and BGE is therefore exactly identical. If a glitch on C0 can cause BGE to fail, then it can also cause BRA to fail.
  • When AC7 is 1, only BRA is supposed to branch. This time, the first two AND gates are safely held low by AC7 and are out of the picture. In the case of BGE, the last two gates are also safely held low by the multiplexer inputs, therefore we are certain that BGE does not branch. In the case of BRA, a glitch on C0 could temporarily cause the illegal combination 11 and prevent BRA from branching.
We know that BRA[$dd] works perfectly because it is found in the video output code of all ROMs so far. Therefore this reasoning suggests that BGE[$dd] must work just as well. The wiring of U12 ensures that if BRA works as expected, then so does BGE, regardless of what comes after or before. The same argument also applies to BLT.
Attachments
exp.rom
(128 KiB) Downloaded 63 times
lb3361
Posts: 360
Joined: 17 Feb 2021, 23:07

Re: An ominous warning and a hardware puzzle

Post by lb3361 »

I realize I did not explain why I am using this strange conditional branch instructions.

I tried to estimate the hidden cost of dispatching long instructions and found that it increases very quickly after 64 cycles. Splitting them into independently scheduled pieces is penalized by the dispatching cost (15 to 20 cycles, depending how you count) and is also very complex. At some point I stumbled on a very fast way to write a dispatcher as a finite state machine. It only has the minimal set of entry points, ENTER and NEXT. Everything has to be in one page. But the overhead is only 3 cycles (or 5 if you count the jump to NEXT):

Code: Select all

fillers(until=0xff)
label('FSM14_ENTER')
bra(pc()+4)                     #0
align(0x100, size=0x100)
bra([fsmState])                 #1
assert (pc() & 255) == (symbol('NEXT') & 255)
label('FSM14_NEXT')
adda([vTicks])                  #0
bge([fsmState])                 #1
st([vTicks])                    #2
adda(maxTicks)                  #3
bgt(pc()&255)                   #4
suba(1)                         #5
ld(hi('vBlankStart'),Y)         #6
jmp(Y,[vReturn])                #7
ld(0)                           #8
I tried to apply this to AT67's already very fast SYS_Multiply_s16 and SYS_Divide_s16. Instead of executing the multiplication as 16 fragments of 56-66 cycles, it does so as 34 fragments which are all less then 28 cycles. The division is even more extreme: 49 fragments. Other than that, this was basically AT67's code. I was able to make further improvement that would have been very hairy in a traditional self-restarting SYS call. The code is at https://github.com/lb3361/gigatron-rom/ ... m.py#L5850.

Here are the timing results for 3000 16x16 multiplications with semi-random arguments (mode 1):

Code: Select all

                             +----------+----------+----------+----------+
                             |   vCPU   |  Native  |    FSM   |  Newest  |
+----------------------------+----------+----------+----------+----------+
| 3000 multiplications 16x16 |   12.6   |   4.15   |   3.42   |   2.75   |
| 3000 divisions       16/16 |   15.4   |   7.45   |   4.93   |   4.93   |
+----------------------------+----------+----------+----------+----------+
Not only this saves time, but it is also far easier to program. Since each fragment starts at cycle 3, one can write a surprising amount in 28 cycles. In addition, there is little need to make complicated adjustments when one merges two control paths with different execution time. One can always branch to NEXT and be done with it. This of course has a lot of consequences on how to program long opcodes.

But there is more. One of my goals was to rewrite sys_Exec in purely native code because the v5a one uses memory on the stack, and I have other plans for the stack. The idea was to use the same scheduler but use fsmState as a program counter. Therefore I write a bunch of microinstructions and a program that uses them. You can see that at https://github.com/lb3361/gigatron-rom/ ... m.py#L6072. Then I realized that I could also write a native Loader that shares most of its code with the native Exec.

This is why I would really like to know whether the ominous warning hides a real monster. So far so good...
Post Reply