
Re: 10MHz and Beyond!

Posted: 20 Apr 2019, 16:20
by marcelk
marcelk wrote: 14 Apr 2019, 07:21 It shouldn't be too hard to make a custom ROM that sends out more pixels per line. If not from RAM, they could be black, giving only half a screen. Maybe even squeeze some vCPU time in there. I can look into this during the Easter weekend. This would bring back the horizontal pulse frequency into a range the monitor can accept.
It always takes much more time than expected, but I believe I have something that should bring back video for a system that runs at 12.5 MHz: ROM v3y (ROM image attached to this post).

ROM v3y is based on standard ROM v3. It adds 200 cycles to every scanline by running vCPU for that duration. This brings back the horizontal sync frequency to 31.25 kHz.
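The arithmetic behind that 31.25 kHz figure can be checked with a short Python sketch. It assumes the standard ROM spends 200 CPU cycles per scanline, which is what 31.25 kHz at the stock 6.25 MHz clock implies; the function and constant names are made up for illustration.

```python
def hsync_khz(clock_mhz, cycles_per_line):
    """Horizontal sync frequency in kHz for a given CPU clock and scanline length."""
    return clock_mhz * 1000.0 / cycles_per_line

STOCK_CYCLES = 200                 # assumed: cycles per scanline in standard ROM v3
V3Y_CYCLES = STOCK_CYCLES + 200    # ROM v3y adds 200 cycles of vCPU time per line

print(hsync_khz(6.25, STOCK_CYCLES))   # 31.25  stock system
print(hsync_khz(12.5, STOCK_CYCLES))   # 62.5   overclocked with the standard ROM
print(hsync_khz(12.5, V3Y_CYCLES))     # 31.25  overclocked with ROM v3y
```

So doubling the clock doubles the line rate, and doubling the line length in cycles brings it back to nominal.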

The display becomes just the left half of the screen at 160x120 pixels, with a pixel aspect ratio of 2:1. So it's obviously suitable for experimentation only. The good news is that pixel lines can now spend half of their time running application code. The extra vCPU power is very noticeable in BASIC programs and in Mandelbrot.

A small technical detail: retro scanlines (and invisible vertical blank lines) invoke vCPU twice, instead of just once for a longer combined period. This gives a small overhead, but is needed to prevent overflow of the signed 8-bit vTicks variable. (vTicks tracks the remaining time for vCPU in the current time slice.) Some refactoring was needed to keep the video code inside its pages due to this. It barely fits now.

This is still not the "proper" way to do it, because the horizontal porches and pulse are still at 50% of their nominal duration. But it is a reasonable shot and I give it a fair chance that VGA monitors accept the signal again.

There are many other ways to do this. For example doubling every pixel. Or streaming 256 pixels instead of 160. Or even streaming 320 pixels if you don't mind the repetition when wrapping around the RAM page boundary. But that all requires more reshuffling in code page 2, because that currently doesn't have room for a longer pixel burst. It can be done, but maybe not worth it (yet).

Caveat emptor: I tested it in emulation, in gtemu.c and in Phil's Javascript emulator with the clock cranked up. On hardware I have only checked it with the scope (with the slower standard crystal), where the video sync signal looks stable when pressing buttons and changing modes. That's an indication there are no major errors in timing consistency, so I'm happy to ship a ROM for testing. I can only test it on fast hardware myself after I've made my 74F-based build, and that isn't scheduled yet.

Edit: I figured I could just as well put the emulated 12.5 MHz Javascript version online, so you can experience the speedup without soldering: Mandelbrot renders in just under 3.5 minutes, in the default video mode(!), vs. almost 19 minutes on the stock system. A speedup of 550% for just double the clock :-)

ROM v3y at 12.5 MHz

Re: 10MHz and Beyond!

Posted: 22 Apr 2019, 11:33
by marcelk
I made eye diagrams for a 74HCT board with the 55 ns RAM that comes with the kit, and one with the 10 ns RAM you kindly sent me. I'm not a SMT soldering hero, so I'm surprised it works after the heat abuse I let it undergo. Each trace was captured for somewhere between 1h15m and 1h30m while running Mandelbrot in video mode 3 (fastest for vCPU). The purple trace is ALU7.

10 ns SRAM

For the overclock potential we have to consider the setup time for the user registers. Let's take the 74x377 as reference. The setup time tSU for 74LS is ~20 ns, for 74HCT it is ~11 ns at 5 V and 25 °C, and for 74F it is 2 ns (!?).

From the trace I estimate we have 3/5th + 1 + 3/5th divisions between stable ALU7 and rising CLK2: 44 ns. So I say that this board when overclocked to 1000/(160-44+11) = 7.8 MHz will probably still be 100% ok. At 10 MHz there must be errors creeping in, because the setup time has to be negative for it to fit.

Below is the same board with the fast RAM installed. The difference is much less than 55-10=45 ns. The gap looks more favourable when you sample for just a minute or so. I give it 3/10th + 1 + 1 + 3/5th divisions at best, or 58 ns. So it will be 100% ok at 8.8 MHz, but it becomes a bit shady above that.
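Both estimates above follow the same formula, sketched here in Python. The 20 ns per division timebase is inferred from "2.2 divisions = 44 ns" and is an assumption, as are the function and constant names.

```python
CYCLE_NS = 160.0      # clock period at the stock 6.25 MHz
NS_PER_DIV = 20.0     # assumed scope timebase, inferred from 2.2 div = 44 ns
T_SU_HCT_NS = 11.0    # 74HCT377 setup time quoted above

def max_clock_mhz(slack_divs, setup_ns=T_SU_HCT_NS):
    """Clock at which the observed slack has shrunk to exactly the setup time."""
    slack_ns = slack_divs * NS_PER_DIV
    min_period_ns = CYCLE_NS - slack_ns + setup_ns
    return 1000.0 / min_period_ns

ram_55ns = max_clock_mhz(3/5 + 1 + 3/5)        # 2.2 divisions of slack
ram_10ns = max_clock_mhz(3/10 + 1 + 1 + 3/5)   # 2.9 divisions of slack
print(f"55 ns RAM: {ram_55ns:.2f} MHz, 10 ns RAM: {ram_10ns:.2f} MHz")
```

This gives roughly 7.87 and 8.85 MHz, in line with the 7.8 and 8.8 MHz figures quoted.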

I've ordered the missing parts to complete my 74F build I just started. This is fun...

55 ns SRAM LY62256-55

10 ns SRAM

Re: 10MHz and Beyond!

Posted: 22 Apr 2019, 12:06
by monsonite
Hi Marcel,

This is great feedback. I appreciate the effort that you have put in over this weekend to create this proof of concept.

As well as the timing gain from the 10 ns RAM, I think the resistor pull-ups in the decoder section are an "easy win" to improve the low-to-high transition time. For my experimentation, I just soldered additional SIL resistor networks to the underside of the board; these work in parallel with the existing 2K2 parts to lower the resistance. I now have about 600R pull-ups and this has reduced the rise time to better than 30 ns.
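For anyone wanting to repeat the pull-up modification, the parallel-resistor arithmetic is sketched below in Python. The value of the added networks isn't stated above, so this only computes what would be needed to get from 2K2 to about 600R; the function names are made up for illustration.

```python
def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel (ohms)."""
    return r1 * r2 / (r1 + r2)

def needed_parallel(existing, target):
    """Resistor to place in parallel with `existing` to reach `target` ohms."""
    return existing * target / (existing - target)

added = needed_parallel(2200.0, 600.0)
print(added)                      # 825.0
print(parallel(2200.0, added))    # 600.0

# To first order the rise time of a pulled-up node scales with R (RC constant),
# so this is the expected improvement factor under that assumption.
print(round(2200.0 / 600.0, 2))   # 3.67
```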

Your soldering of the 55 ns RAM chip looks fine. SMT soldering is a skill - but it is quite easy when you get the hang of it. Good flux makes the job 99% easier, and when you realise that surface tension will create a good joint for you, you don't have to try to manually solder every pin - just let the wave of solder on the tip of the iron find its way under each pin. It's hard to describe - but I learnt it mostly from watching young lads on a Chinese production line on one of my many Chinese adventures.

Re: 10MHz and Beyond!

Posted: 27 Apr 2019, 09:06
by marcelk
My 74F based build runs at 12.5 MHz now as well:
  • All 74F logic, except for clock (74HCT04) and input (74HC595)
  • Pull-up resistors in control unit lowered from 2.2kΩ to 680Ω
  • BAT42 signal diodes
  • Removed CLK1-CLK2 delay. The registers didn't update properly otherwise, even with just C3 removed.
  • Faster RAM (10 ns i.s.o. 55/70 ns). This is on the critical path.
  • Faster EPROM (45 ns OTP i.s.o. 100/150 ns UV erasable). More than fast enough.
  • Modified software to slow down the video signal 2x (giving 5.5x more time for applications)
  • Inner GND/Vcc layers and smaller clearances in copper pour
  • No I/O connected yet, and only about half of the decoupling capacitors are on (I ran out of parts)
  • 12.5 MHz crystal, and removed(!) C1 and C2.
The current draw is 870 mA compared to 500 mA for 74LS, or 80 mA for 74HCT. The chips and the board get a bit warm, but not as bad as with original TTL. I'll see if I can make a FLIR image later today.

As I didn't have an input device hooked up at this stage, I couldn't switch to Mandelbrot. The eye diagram looks healthy when running the main menu. As usual: CLK1=Yellow, CLK2=Blue, ALU7=Purple.

12.5 MHz main menu

It looks so good, I put on a 15 MHz crystal!

15.0 MHz main menu

That definitely works. But we have just 6 ns between ALU7 settling and CLK2 rising. So it might not hold under load. I quickly improvised an input device, started Mandelbrot and let it run for a couple of minutes:

15.0 MHz Mandelbrot

Ouch, it's really struggling now, but it isn't crashing. It looks like it needs more caps. It's probably not meeting the 2 ns setup time, but it is getting close. We can try shifting CLK2 a bit. This is definitely at or over the edge of where we can go at this moment.

Edit: 10 ns SRAM is far out of the date range of course (Q: any indication of when it became available? 1990s?). But 55 ns isn't, and with that 10 MHz would have worked back in the day, provided you could have built a fast-enough ROM replacement as well. But that can always be another RAM.

Re: 10MHz and Beyond!

Posted: 27 Apr 2019, 11:28
by monsonite
Hi Marcel,

This is excellent progress, especially that you have found 12.5MHz to be stable, and then pushed on to explore 15MHz.

The ROM V3y arrived in the post today - so I got a chance to try my system at 12.5MHz.

I am still using a clock signal generated by an ARM microcontroller - that is not particularly stable.

However, despite my somewhat flaky clock arrangement, I got Tetris, Wozmon and Pictures to run as normal.

The images below were taken off my monitor with the machine running at 12.5MHz. You only get half a screen - and the change in aspect ratio is particularly noticeable - especially with the Saturn image. You get a free "Easter Egg"! :D

Tiny BASIC would not run at all at 12.5MHz on my system - so I reduced my clock back to 12MHz - which is the nearest crystal I have - and Tiny BASIC is now running stably.

I am eager to hear how you get on when you have the full keyboard input and video output running.

Perhaps we should rename this thread 12.5MHz and Beyond! :D
Start Screen.jpg
Parrot.jpg
Happy Easter.jpg

Re: 10MHz and Beyond!

Posted: 27 Apr 2019, 12:15
by monsonite
Having had time to experiment briefly with the new ROM v3y, the most noticeable thing is the difference it makes to program execution speed, especially in Tiny BASIC. Those extra 200 cycles of vCPU time at the end of each video line make an appreciable difference.

If you take a simple program to output an incrementing number to the screen:

10 n=n+1
20 print n
30 goto 10

At 6.25MHz and no line blanking, this takes a glacial 8 minutes 31 seconds to count to 1000.
With 3 line blanking - it takes 47 seconds to reach 1000.

So currently line blanking is essential for any real improvement in speed: it gives nearly an 11 times speed improvement.

With the ROM v3y and a 12.00MHz crystal - the results were different - and impressive. Times in seconds to reach 1000 for each of the video modes:

0 blank 26.5s
1 blank 22.9s
2 blank 20.8s
3 blank 18.1s

So whilst line blanking offers some incremental improvement - it is no longer the dominant effect on speed.

Scaling these figures to a 12.5MHz crystal gives approximately:

0 blank 25.4s
1 blank 22.0s
2 blank 20.0s
3 blank 17.4s

So doubling the crystal appears to give about a 2.7 times improvement for 3 line blank mode, but a massive 20 times improvement for 0 blank line video.
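The scaling and the speedup factors above can be reproduced with a few lines of Python. This is just a sketch of the arithmetic, assuming run time scales inversely with clock frequency; the variable names are made up.

```python
# Measured seconds to count to 1000, per number of blanked lines
times_12mhz = {0: 26.5, 1: 22.9, 2: 20.8, 3: 18.1}   # ROM v3y, 12.00 MHz crystal
stock = {0: 8 * 60 + 31, 3: 47}                      # standard ROM, 6.25 MHz

# Assumption: run time is inversely proportional to the clock frequency.
times_12_5mhz = {mode: t * 12.0 / 12.5 for mode, t in times_12mhz.items()}

for mode, t in times_12_5mhz.items():
    line = f"{mode} blank: {t:.1f} s"
    if mode in stock:
        line += f"  ({stock[mode] / t:.1f}x faster than stock)"
    print(line)
```

This prints 25.4, 22.0, 20.0 and 17.4 seconds, with speedups of about 20.1x for 0 blank lines and 2.7x for 3 blank lines.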

If anyone has a Commodore 64, BBC Micro, ZX Spectrum or similar early 1980s machine still in working condition - it would be very interesting to run some comparative benchmarks for the simple BASIC program given above.

Edit - I found a C64 emulator here

Time to execute the BASIC program above: 49.36 seconds.

This suggests that the unmodified Gigatron with 3 line blanking is slightly faster than a C64 for this benchmark. The 12.5MHz modified machine is between 2 and 3 times faster than the C64 for BASIC.


Caveats:

Remember that Tiny BASIC was written in GCL which is itself an interpreted language - which suggests two levels of interpretation.

C64 BASIC was written in 6502 assembly language and much of the video and sound generation was offloaded from the 6510 to the VIC-II graphics chip and the SID sound chip.

I think that this gives a fair representation of the performance of the Gigatron in comparison to some of the classic retro machines.

Re: 10MHz, 12.5MHz and Beyond!

Posted: 27 Apr 2019, 16:39
by marcelk
This is an IR image of the 15 MHz configuration running Mandelbrot. I took it to a friend's house, as I don't have a FLIR camera myself. The user registers AC, OUT and Y get warmest. AC warms up to 40 degrees Celsius, OUT slightly above that. For reference, original TTL gets to 60 degrees. Surprisingly, the entire control unit stays cool, all 6 chips of it [Edit: the inverters on the left (U15) warm up differently. I missed that]. When switching off the power, the OUT register has a much slower cool down than all other chips: it's socketed and has a poorer thermal contact with the inner copper layers.

IR image of 15 MHz 74F running Mandelbrot

Video of cold boot into the main menu here:

Re: 10MHz, 12.5MHz and Beyond!

Posted: 28 Apr 2019, 07:05
by marcelk
Thanks for benchmarking, and good to see the modified ROM generates a stable video signal that gets accepted.

monsonite wrote: 27 Apr 2019, 12:15 Caveats:

Remember that Tiny BASIC was written in GCL which is itself an interpreted language - which suggests two levels of interpretation.

C64 BASIC was written in 6502 assembly language and much of the video and sound generation was offloaded from the 6510 to the VIC-II graphics chip and the SID sound chip.

I think that this gives a fair representation of the performance of the Gigatron in comparison to some of the classic retro machines.
Microsoft BASIC's variables are floating point. It does have integer variables also, but those can't be used in FOR-TO-NEXT loops.

Microsoft BASIC is tokenised while Tiny BASIC parses everything again and again. Even processing a single digit number already involves multiplication by 10.
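To illustrate that last point: an interpreter that re-parses its source has to rebuild every numeric constant from its characters each time the line runs, at one multiply-by-10 per digit. The Python sketch below only shows the general technique; it is not the actual Tiny BASIC source, and the function name is made up.

```python
def parse_number(text, pos=0):
    """Convert the decimal digits starting at text[pos] into an integer.

    Returns (value, next_pos). Every digit, even a single one, costs one
    multiplication by 10 plus an addition.
    """
    value = 0
    while pos < len(text) and "0" <= text[pos] <= "9":
        value = value * 10 + (ord(text[pos]) - ord("0"))
        pos += 1
    return value, pos

print(parse_number("10 n=n+1"))     # (10, 2)   the line number
print(parse_number("goto 10", 5))   # (10, 7)   the jump target
```

A tokenising interpreter does this conversion once, when the line is entered, and stores the result.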

Re: 10MHz, 12.5MHz and Beyond!

Posted: 30 Apr 2019, 17:55
by monsonite
Further progress this evening with the arrival of a 13MHz crystal.

Tetris still runs perfectly, WozMon runs perfectly, TinyBASIC runs intermittently.

Mandelbrot runs for several "loops" but crashes to black screen when trying to render the 24th scan line from the bottom of the screen.

At 12MHz all is perfectly stable. I think Marcel has the advantage of better power distribution and signal integrity on his 4 layer board.

Strangely enough, the system is more stable when neither the instruction register nor the data register (U8, U9) has been upgraded from the standard 74HCT273.

I then realised that my accumulator register had not been upgraded to 74F377. And so another trip to Ebay.....


Re: 10MHz, 12.5MHz and Beyond!

Posted: 09 May 2019, 17:44
by marcelk
If I take a step back, the way I look at it now is that a "Gigatron" is something that could have been possible in a "more or less similar" form with the technology of the era (excluding microprocessors and other highly integrated logic). If we look at the 6.25 MHz design, the CPU logic was genuinely possible using common 74LS chips, and the RAM was possible with a row of 55ns or 70ns Intel 2147 chips. In the kit edition we put 74HCT components for reduced power consumption over the USB port. And we use a more highly integrated 62256 RAM out of availability considerations (and that's ok, it's still a bit more about the CPU after all). In the kit we're also using a relatively large and fast EPROM for the same reasons. We wouldn't need the full 64K in 1979 as 8K is sufficient for a video/sound/vCPU loop and BASIC. But back in the day we would have needed some program memory fast enough to keep up with the 160ns cycle time. Contemporary EPROMs were probably still much too slow. But the assumption is that a matching solution can be found for that, worst-case by using RAM chips there as well.

So, tongue in cheek, we can still say it's a kind of "lost" late-1970s design that was "found back" in 2017.

If we find ourselves in 1979 doing this, the video loop will generate TV signals of course (PAL or NTSC). Probably we run vCPU in alternating video frames instead of every few scanlines. Such a primal incarnation also creates a higher horizontal resolution because of the slower electron beam sweep. Today's software generates VGA and pays for that with the 160 pixel horizontal resolution. In 1979 we would add some tiling capability to the video circuit and/or reduce the number of colors, so we can do with less than 32K of RAM. We would halve the ROM width for the region only used for program storage, because the instruction byte repeats every 256 bytes.

From the 12-15 MHz results with 1979-era 74F logic and a 1990s (2000s?) 10ns SRAM we learnt that the 74F logic eats 1000/15-10 = 57ns out of the cycle time budget. So we can reasonably say that the two-stage pipeline design can be pushed to 1000/(57+55) = 8.9 MHz using parts commercially available by the end of the 1970s.

I think the point is, for 9 MHz and above, we start to depend on RAM with specs from a much later era. Of course some of the above is a bit of handwaving. But if it's approximately correct and if we want to push for higher clock rates, I prefer to adjust the Gigatron architecture instead: decouple the RAM from the rest of the logic, turn it into a three-stage pipeline design, and keep the 55ns RAM in. Coincidentally, a three-stage pipeline is what the original RISC architectures proposed. We then have a path to roughly 1000/57 = 17.5 MHz. 16 MHz is a bit more realistic, but still a 1.8x speedup. I haven't figured out precisely what it takes in additional chips to buffer the control signals for the 3rd stage, but hopefully no more than two 8-bit registers can do the trick, so the design goal of staying under 40 logic chips can still be met. [Edit: Or maybe do away with some of the control signal buffering. After all, we can also live with the branch delay slot...]
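The timing budget from the last two paragraphs, sketched in Python. The model is deliberately crude and is an assumption of this sketch: in the two-stage design logic and RAM delays add up within one cycle, while in a three-stage design the slower of the two sets the clock. The names are made up for illustration.

```python
RAM_FAST_NS = 10.0   # SRAM used in the 15 MHz experiment
RAM_OLD_NS = 55.0    # RAM access time taken as period-appropriate

# At 15 MHz the cycle is 1000/15 ns; what the RAM doesn't use is eaten by logic.
# Rounded to whole nanoseconds, as in the text above.
logic_ns = round(1000.0 / 15.0 - RAM_FAST_NS)

two_stage_mhz = 1000.0 / (logic_ns + RAM_OLD_NS)        # RAM and logic share a cycle
three_stage_mhz = 1000.0 / max(logic_ns, RAM_OLD_NS)    # RAM access gets its own stage

print(f"logic: {logic_ns} ns")
print(f"two-stage:   {two_stage_mhz:.1f} MHz")
print(f"three-stage: {three_stage_mhz:.1f} MHz")
print(f"16 MHz vs two-stage: {16.0 / two_stage_mhz:.1f}x")
```

With these numbers the logic, not the 55ns RAM, becomes the limiting stage of the three-stage design.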

Just a rambling mind, not the announcement of yet another new subproject...