microtron

Lerc · Post by **Lerc** » 27 Jul 2020, 21:07

steve wrote: ↑27 Jul 2020, 17:30 I'm not sure I got this point (what exactly to OR and for what reason), but let me say that I would probably prefer suggestions to remove chips than to add if possible!

The OR combining would be for using two 8 bit inputs instead of nibbles on Load W. It would let you do operations where bits in the inputs would influence bits in the output further away.

It would allow things like Logical rotate A Left B times. It adds no capability as such but has the potential to vastly increase the speed of some operations (like swap nibbles). So if you are aiming for absolute minimum chipcount then skip it.

steve wrote: ↑27 Jul 2020, 17:30 Many of them could be combined, but I'm trying to avoid it since if put decoders before the IR registers they would even require one more of them. So the optimal solution would be to put them after the IR registers (they might even allow to save one IR chip, I think saving two would be difficult but if possible we can even have back the single ROM design), but in this case they will introduce also delays (as in the current gigatron implementation anyway).

Yeah, there looks to be scope for a 3->8 decoder (BUS) and a dual 2->4 decoder (vga-color/vga-sync/LEDs/expansion) (MAU low). That only saves 5 existing lines, so there is more potential to add stuff but not quite enough to save an IR chip. Additionally, losing parallel register load would save an IR chip but cost a decoder. I'd be more inclined to try and find uses for the saved lines.

Would the delays of the decoders be much of an issue? I was under the impression most of the speed limit was in ROM access time. The pathway delays of a couple of decoders aren't that huge.

steve · Post by **steve** » 27 Jul 2020, 22:34

Lerc wrote: ↑27 Jul 2020, 21:07 The OR combining would be for using two 8 bit inputs instead of nibbles on Load W. It would let you do operations where bits in the inputs would influence bits in the output further away.
It would allow things like Logical rotate A Left B times. It adds no capability as such but has the potential to vastly increase the speed of some operations (like swap nibbles). So if you are aiming for absolute minimum chipcount then skip it.

Got it!

I will think about it, even if at the moment it seems to me that "the game might not be worth the candle", as we say in Italy.

Lerc wrote: ↑27 Jul 2020, 21:07 Yeah, there looks to be scope for a 3->8 decoder (BUS) and a dual 2->4 decoder (vga-color/vga-sync/LEDs/expansion) (MAU low) That only saves 5 existing lines. So more potential to add stuff but not quite enough to save a IR chip. Additionally, losing parallel register load would save a IR chip but cost a decoder. I'd be more inclined to try and find uses for the saved lines. Would the delays of the decoders be much of an issue? I was under the impression most of the speed limit was in ROM access time. The pathway delays of a couple of decoders aren't that huge.

ROM throughput certainly cannot be exceeded, but thanks to the pipelining this latency is isolated from the others.
The decoders add some tens of ns of delay, and this adds to the delay of the rest of the circuits involved.

Assuming we are willing to use decoders (with some more limits on overclocking), I was trying to reason about an 8-bit encoding of the full instruction set: it might be possible in some way, but before going further in the reasoning I think it is a good idea to stabilize the ALU and the overall design.

For example, regarding the expansion bus, we might in fact "steal" another bit from the AuxKeyboardIn and use it to implement an SPI interface with just the output expansion register (so removing the optional input expansion register, and even leaving free bits on the output). I think this can be a good idea: standard, quite easy to implement, and with a lot of potential for interfacing expansions.

Post by **at67** » 30 Jul 2020, 04:36

steve wrote: ↑22 Jul 2020, 19:16 I’m sorry for this long post, but I hope that some of you can reach the end of it appreciating the synthesis I did of more than one year of scattered thinking of the various solutions…

There is a lot to digest here and without fully simulating/implementing some of these ideas I can only provide some abstract thoughts and opinions to the discussion.

With any new hardware/firmware in the Gigatron Eco system, I always ask myself the following:
1: Is it backwards compatible with current hardware/peripherals?
2: Is it Native code compatible with current firmware ROMs?
3: Is it vCPU and 6502 compatible with current application software?
4: If it is none of the above, what is its end goal? i.e. (purely for personal satisfaction, an evolution of the Gigatron to something else, etc).

*Note* IMHO there is no right, wrong or mandatory path that must be followed when making design decisions in a project of this scope; by its very nature and core it is all about motivation, learning, experimentation and the realisation of a vision. If others obtain some sort of knowledge, value or experience out of your journey, then I always consider that a bonus.

So keeping these questions and ideas in mind will make it easier to understand my thoughts.

steve wrote: ↑22 Jul 2020, 19:16 **Introductory notes**
Some of these changes retain full "retro compatibility", but in general the idea is to simplify even more the design trying to remain compatible at vCPU level, keeping the 70s philosophy and the “software can replace hardware” approach of the project (in fact in few places extending this concept)!

Remaining compatible at the vCPU level obviously allows all current vCPU software, (which is most of the applications in the repo), to be migrated to new hardware; I personally think this is an excellent starting point. IMHO, once you give up on vCPU compatibility you give up on being part of the Gigatron's Software Eco system and become something else. This is not right/wrong or good/bad, it's just different; someone might want to produce an end product following a completely different tangent but based on Marcel's and Walter's original design philosophy.

steve wrote: ↑22 Jul 2020, 19:16 **Removal 2:1 multiplexers for Y**
Since high byte of RAM address can be 0 or Y, the same effect of the double 2:1 multiplexers can be obtained using a 374 chip for Y register in combination with pull-down resistors, simply routing the multiplexer selection to Y output enable

This seems like a good optimisation, it would need to be verified from a timing perspective though; i.e. using OE to tri-state the 374's outputs to either pull up or pull down resistors would need to meet the setup time of whatever SRAM device you choose.
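To make the addressing logic of this proposal concrete, here is a minimal Python sketch. It models only the logic levels (Y driven onto the bus, or all lines held low by pull-downs while the register is tri-stated), not the timing concern raised above. The function names are hypothetical and chosen for illustration only.

```python
def ram_high_address(y_register: int, y_output_enabled: bool) -> int:
    """High byte of the RAM address as seen by the SRAM.

    Output enabled: the Y register drives the lines.
    Output tri-stated: the pull-down resistors hold every line at 0.
    """
    return (y_register & 0xFF) if y_output_enabled else 0x00


def ram_address(y_register: int, low_byte: int, y_output_enabled: bool) -> int:
    """Full 16-bit RAM address built from the high and low bytes."""
    high = ram_high_address(y_register, y_output_enabled)
    return (high << 8) | (low_byte & 0xFF)
```

With the output enabled the address is page Y; with it disabled the same low byte lands in page zero, which is the behaviour the two 2:1 multiplexers provide today.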

steve wrote: ↑22 Jul 2020, 19:16 This change should be fully usable also with the "standard" gigatron, saving 2 chips on the total count.

I'm not following here, do you mean for the X register as well, as you explain further down?

steve wrote: ↑22 Jul 2020, 19:16 **Removal 2:1 multiplexers for X**
Without the 2:1 multiplexer on X we lose the possibility of a direct memory addressing from ROM, that will require one additional instruction to load X. One instruction more where required but two chip less in general, seems a good tradeoff and fully aligned with gigatron ideas!

This will probably require a non-trivial rewrite of the firmware, (which I assume you have already decided to do), the non-trivial parts mostly involve timing considerations for the sampling/generation of Input/Audio/Video and for the vCPU interpreter that is interleaved within the Input/Audio/Video loop.

e.g. Currently the vCPU instructions have a maximum of 28 clock cycles before they can no longer be dispatched and executed within the interpreter's tight requirements and a lot of vCPU instructions are already at the limit, (ADDW, SUBW, LSLW, DEEK, POKE and many more). So if your changes require these instructions to be re-written in Native code, (and some of them will), what happens when they break the maximum 28 clock cycle limit? You would probably have to completely re-design and re-write the entire vCPU interpreter, (breaking up instructions into individual packets, fetch, decode, execute, etc). Currently the vCPU instructions are implemented as a LUT that spans multiple 256 byte ROM pages, (some of the smaller instructions implement a 2 page jump to allow more room in the first ROM vCPU instruction page for new instructions, see this thread https://forum.gigatron.io/viewtopic.php?f=4&t=136 as to how Marcel made room for CALLI, CMPHS and CMPHU).

steve wrote: ↑22 Jul 2020, 19:16 The pull-down resistors can be applied also here giving the possibility to address directly the 0 address (might be used for example in horizontal scrolling games for pixel rows displacement).

There is a video indirection table located at 0x0100 to 0x01EF that contains two byte pointers for every scan line, they use a differential system for the least significant byte, (effectively the horizontal scroll register for that scan line, X byte), and an absolute value for the most significant byte, (effectively the vertical scroll register for that scan line, Y byte).

- To horizontally scroll the entire screen you only need to modify the byte location 0x0101.
- Changing any X byte, (bytes at odd addresses between 0x0101 and 0x01EF), gets you free horizontal scrolling for all
scan lines coming after the scan line you modified because of the differential X system.
- To horizontal scroll one scan line you would need to modify an X byte appropriately and apply the negative of that
change to the next scan line's X byte.
- To vertically scroll the entire screen you need to modify all the byte locations at even addresses starting at 0x0100
and ending at 0x01EE in a loop, i.e. there is no differential system for the Y bytes.
- You can obtain more RAM for code and data by duplicating scan-lines, (I do this in PucMon), i.e. I set scan line 2 to
a particular pattern and then point scan-lines 0, and 1 to scan-line 2's memory, set scan-line 117 to another particular
pattern and then point 118 and 119 to 117's memory. This effectively frees an extra 4*160 bytes for code and data
and gives me a chunky border look around the PucMon playfield, (with the vertical sections of the border costing 1/3
the memory).
- You can perform seriously fancy screen wipe, scanline, interleaving effects using these X and Y registers: I'll release
a gtBASIC demo of some of the things you can do with the video indirection table when I get a chance.
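The table layout described above can be modelled in a few lines of Python. This is only a sketch of the addressing scheme from this post (Y bytes at even addresses, differential X bytes at odd addresses, 0x0100 to 0x01EF), not firmware code. The initial page numbers and all function names are assumptions made for illustration.

```python
VIDEO_TABLE = 0x0100   # first table entry, as stated in the post
SCAN_LINES = 120       # 120 two-byte entries fill 0x0100..0x01EF


def make_ram() -> bytearray:
    ram = bytearray(65536)
    for line in range(SCAN_LINES):
        ram[VIDEO_TABLE + 2 * line] = 8 + line   # Y byte (page numbers are an assumption)
        ram[VIDEO_TABLE + 2 * line + 1] = 0      # X byte, differential
    return ram


def effective_x(ram: bytearray) -> list:
    """Running sum (mod 256) of the X bytes: the offset each scan line ends up with."""
    offsets, x = [], 0
    for line in range(SCAN_LINES):
        x = (x + ram[VIDEO_TABLE + 2 * line + 1]) & 0xFF
        offsets.append(x)
    return offsets


def scroll_all_horizontal(ram: bytearray, dx: int) -> None:
    """One write to 0x0101 shifts every line, because later X bytes are relative."""
    ram[VIDEO_TABLE + 1] = (ram[VIDEO_TABLE + 1] + dx) & 0xFF


def scroll_one_line_horizontal(ram: bytearray, line: int, dx: int) -> None:
    """Shift one line, then cancel the change on the following line."""
    addr = VIDEO_TABLE + 2 * line + 1
    ram[addr] = (ram[addr] + dx) & 0xFF
    if line + 1 < SCAN_LINES:
        ram[addr + 2] = (ram[addr + 2] - dx) & 0xFF


def scroll_all_vertical(ram: bytearray, dy: int) -> None:
    """Y bytes are absolute, so every even address has to be rewritten."""
    for addr in range(VIDEO_TABLE, VIDEO_TABLE + 2 * SCAN_LINES, 2):
        ram[addr] = (ram[addr] + dy) & 0xFF
```

The asymmetry is visible in the code: a full horizontal scroll is a single write, while a full vertical scroll is a loop over 120 entries.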

steve wrote: ↑22 Jul 2020, 19:16 Note that with the easy addressing, [0,0] can be used as an additional auxiliary register!

I'm assuming you mean a zero register for easy access in code? If so, Marcel already initialises 0x0000 to 0 and 0x0080 to 1; the firmware uses these (vCPU can as well) as constants, but also as a simple LUT for converting sign bits to false/true.
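A small Python sketch of the sign-bit trick: masking a value with 0x80 yields either 0x00 or 0x80, and those are exactly the two addresses holding 0 and 1. The function name is hypothetical and the snippet models the idea only, not the native instruction sequence.

```python
ram = bytearray(256)   # zero page only
ram[0x00] = 0          # constant 0, per the post
ram[0x80] = 1          # constant 1, per the post


def sign_to_bool(value: int) -> int:
    """Return 1 when bit 7 of value is set, else 0, using the two-entry table."""
    return ram[value & 0x80]
```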

steve wrote: ↑22 Jul 2020, 19:16 **Increasing the clock frequency**
Increasing the clock frequency would minimize the effect of some simplification of the hw side that requires some additional instruction on the sw one.

I agree it most likely would, but because ROMv3y shows the limitations of a synchronously coupled system, (i.e. the video timing to the base clock rate), I think you would find it difficult to produce a satisfactory display using the current video system logic on anything other than 6.25MHz, (satisfactory is of course subjective, but I mean clean signals, stabilised syncs on the majority of VGA monitors, correct handling of underscan/overscan/centering, etc).

A high res mode that sacrifices colour fidelity for spatial resolution but effectively keeps the same video timing would work, (as outlined in the hires video thread), but the only real way, (IMHO), to allow for proper video timing at any base clock rate is to completely decouple the video timing logic from the Native code, (this is something that I am currently working on). Not only would this free up the native code from video timing constraints, (to some extent), but it would open up a whole world of extra capabilities in colour and resolution depths, even allowing for paletted modes, e.g. 256/4096 and still staying backward compatible with the current vram/pixel layout and video timing.

steve wrote: ↑22 Jul 2020, 19:16 Note that to keep compliance with VGA pixel frequency of 25.175MHz, higher usable clocks (of the 6.29375MHz /4 one) should: 12.5875MHz or 8.3917MHz, (respectively /2 and /3). And surely whatever multiple of 6.29375MHz or the others can also fit.

This could work, but once again it would not be trivial to implement in the firmware, (you would effectively have multiple video generation loops, which already exist for the different scanline modes), and the generation of the clocks might be more difficult than you think. e.g. Esoteric clock frequencies can be hard to generate/obtain: you can have bespoke crystals cut to whatever frequency you desire, (used to be expensive and I have no idea if this service is still offered to the general consumer), or generate clocks using divisors and state machines, but then you can get asymmetrical waveforms if you are not careful, or quickly have your chip count balloon out.
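The candidate frequencies in the quoted note are simply the 25.175MHz VGA pixel clock divided by a small integer. A short Python check of that arithmetic (the function name is hypothetical):

```python
VGA_PIXEL_CLOCK_MHZ = 25.175   # from the quoted post


def cpu_clock_mhz(divisor: int) -> float:
    """CPU clock that stays in an integer ratio with the VGA pixel clock."""
    return VGA_PIXEL_CLOCK_MHZ / divisor


if __name__ == "__main__":
    for divisor in (4, 3, 2):
        print(f"/{divisor}: {cpu_clock_mhz(divisor):.5f} MHz")
```

Note that /3 does not give a round number (8.3917MHz is a rounded value), which ties in with the point above about esoteric frequencies being awkward to source.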

steve wrote: ↑22 Jul 2020, 19:16 **Including a keyboard**
With a 32 keys (4x8) matrix keyboard, blinker led lines might be used for keyboard row signals and should not interfere much with led blinking. For the columns a specific buffer chip should be used (de facto replacing the current serial chip).

Wattsekunde prototyped and built a matrix keyboard style interface here that you might find interesting:
https://forum.gigatron.io/viewtopic.php?f=4&t=5
https://forum.gigatron.io/viewtopic.php?f=4&t=39
Also this as to how/why PS2 was chosen as a keyboard interface:
https://forum.gigatron.io/viewtopic.php?f=4&t=4
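For readers unfamiliar with matrix keyboards, the 4x8 scan proposed in the quote could look roughly like the Python sketch below. It is not taken from Wattsekunde's design or from any firmware: `read_columns` is a hypothetical stand-in for driving one row line and reading the column buffer, and the key numbering is an assumption.

```python
ROWS, COLS = 4, 8   # 32 keys, as in the quoted proposal


def read_columns(pressed: set, row: int) -> int:
    """Hypothetical hardware stand-in: one bit per column for the driven row."""
    bits = 0
    for (r, c) in pressed:
        if r == row:
            bits |= 1 << c
    return bits


def scan_matrix(pressed: set) -> list:
    """Drive each row in turn and collect the key numbers that read as pressed."""
    keys = []
    for row in range(ROWS):
        cols = read_columns(pressed, row)
        for col in range(COLS):
            if cols & (1 << col):
                keys.append(row * COLS + col)
    return keys
```

A full scan costs four row selections and four column reads, which is the software work that replaces the serial input chip in this scheme.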

steve wrote: ↑22 Jul 2020, 19:16 **Minimizing Program Counter**
Design idea: the most significant byte of the register can be incremented at the end of the page with an unconditional fixed jump to the address 0 of the next page. This will allow the use of a standard flip-flop chip instead of the double 161 incrementer. Benefits: one chip less and shorter carry propagation; downside: one/two ROM words per page used to jump, with average even slightly more for the cases which Temp/Flag register need to be saved and/or restored. Note that this limitation applies just to native instructions and not vCPU ones, and just to code that go multipage (very limited if at all existing even in the current implementation).

The 161 counters aren't just incrementers, they are also pretty nifty presetable, cascadable and resetable counters that allow the Native code far jump instruction to exist, (jmp Y,D). The Native code uses the far jump instruction in some critical places, especially for the vCPU and 6502 instruction dispatch, SYS calls, etc; unless I am missing something, replacing the most significant byte of the PC with an auto incrementer would cause you some serious Native coding grief and require a complete re-write of the firmware and probably require a completely different vCPU instruction set and implementation, (i.e. I don't see how you could produce the same vCPU feature set without a Native code far jump).

steve wrote: ↑22 Jul 2020, 19:16 **Minimizing the Control Unit**
Design idea: CU Signal "unrolling", using two ROMs, and putting into the ROMs the “already decoded” instructions signals (24 bits for signals and 8 for data).

Other than removing some chips, CU decoding logic removal (together with ALU described later) is also shortening delay paths increasing the possibility for higher clock frequency compatibility. Another benefit is having the possibility to specify all the parallel activity that might be needed by the instruction since they are stored separately.

A lot of people originally thought the entire CPU circuit of the Gigatron was a ROM LUT, because of the size and modern implementation of the 64Kx16 ROM, not realising that it is purely a storage device for Native code and data for the Harvard architecture that the Gigatron implements. To me it seems the current ROM was purely chosen based on price, accessibility and flexibility for a simple, 70s/80s-themed upgrade path. The problem with putting any part of the control unit in ROM is speed. If you're going to remain true to the ethos of using only chips from roughly that era, there is no way you can find any kind of ROM that would meet your timing requirements at 6.25MHz. But if you decide that using a modern ROM, PAL, GAL, CPLD, FPGA etc. is acceptable, then all bets are off and you will find it trivial to implement as much or as little of the control logic in modern devices as you like.

steve wrote: ↑22 Jul 2020, 19:16 **Minimizing the Arithmetic Logic Unit**
The proposal is to use loads and lookup tables instead of ALU dedicated chips for operations, but remain efficient (and having even all conditional signals) with a purposefully designed architecture composed of an "expanded" MAU, two "mixing" registers, one temp/"flag" register and one to combine nibbles results. For the comprehensive details and some code examples please have a look at the overall picture later on.

Where would the LUT's exist? In ROM you have the complexity of the Harvard architecture making ROM lookup and ROM LUT's non trivial and inefficient, (the Gigatron's Harvard architecture always has a 1:1 instruction:data pair). In RAM you get the benefits of software control and configuration of your actual ALU, but RAM is already a scarce and fragmented commodity, (I do like the RAM idea though).

steve wrote: ↑22 Jul 2020, 19:16 This way of operating might be a bit slower in doing logical and arithmetic operations, but it can cover in the same efficient way whatever operation you would need, and having more registers (and also flags) can be even faster in some operations. And surely is fully in line with the gigatron philosophy of using software instead of hardware!

I think it would be significantly slower at the Native code level if it was all done in software without hardware assistance, e.g. current hardware ALU operations take 1 clock cycle, if implemented at the software level they would I guess be at least an order of magnitude slower as you would have to software decode the ALU operation, create a LUT address out of operands, fetch the result from the RAM/ROM LUT and then write it to your destination, (ROM would be significantly slower/inefficient than RAM).
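To illustrate the steps listed above (build a LUT address from the operands, fetch the result, write it back), here is a Python sketch of an 8-bit add done with a nibble-wide lookup table. The table layout (carry-in, nibble A, nibble B) is an assumption chosen for illustration; it is not the design proposed in this thread, and it says nothing about real cycle counts.

```python
# ADD_LUT[carry_in][a][b] -> (sum nibble, carry out); 2 * 16 * 16 = 512 entries
ADD_LUT = [
    [[((a + b + c) & 0xF, (a + b + c) >> 4) for b in range(16)] for a in range(16)]
    for c in range(2)
]


def add8(a: int, b: int) -> tuple:
    """Add two bytes using two table lookups; returns (8-bit result, carry out)."""
    a &= 0xFF
    b &= 0xFF
    low, carry = ADD_LUT[0][a & 0xF][b & 0xF]     # low nibbles, no carry in
    high, carry = ADD_LUT[carry][a >> 4][b >> 4]  # high nibbles plus carry
    return (high << 4) | low, carry
```

Even in this idealised form a single addition needs two dependent lookups plus the nibble splitting and recombining, which gives a feel for why a pure-software ALU costs several native cycles per operation.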

steve wrote: ↑22 Jul 2020, 19:16 **Ending**
I have experience in microprocessor architectures and I've realized various assembler applications, but as for electronics, this would be my first "not trivial" realization! Any feedback, suggestion, or further improvement is more than welcome!

I'm glad to know you managed to read till here; hope you found it interesting!

In the meantime, happy hacking!
_Stefano

It's a non trivial undertaking that you have set yourself upon and a truly formidable firmware update if you plan to implement so much of it in software, I think most of your ideas are based on a solid foundation and are absolutely doable with varying degrees of difficulty.

The success of your project is completely up to you, if it was me approaching this magnitude of change I would start off small and forget about the big picture, (but the first thing I would ask myself is how much compatibility I would want with the current Gigatron at all levels). Once I answered that question I would then attack the problems from small/easy to large/difficult in that order.

I have no doubt that you can optimise the current Gigatron design in a myriad of ways whilst remaining true to its original design principles. Good luck, and keep this thread updated with your ideas, successes and failures, as all of it is part of the learning journey.

alastair · Post by **alastair** » 30 Jul 2020, 15:55

at67 wrote: ↑30 Jul 2020, 04:36 It's a non trivial undertaking that you have set yourself upon and a truly formidable firmware update.

I started a similar TTL computer project 16 months ago. The hardware is done and working great, but most of that time has been spent working on the firmware. I'm about a year in and just starting to see light at the end of the tunnel. I knew what I was getting myself into, but as at67 stated... it is truly formidable!

I'm also using lookup tables for the ALU. It is Harvard Architecture, but I have a fetch and execute context on the program/ROM side. The fetch and immediate load uses the ROM in the program context and the execute uses the ROM in the ALU context (my instructions need at least two cycles to complete). There is a tradeoff though. Even though you eliminate the ALU chips, you will need additional logic, state, and pipelining to make it work.

I got my CPU design to 22 chips and a simple PAL. The PAL replaces about 6 TTL chips, so if you count those you get to 28 vs 32 chips for the Gigatron CPU. However, when you compare the gate count, I'm at 935 vs the Gigatron 930. The Gigatron is already an optimum design, so you really can't reduce it anymore and maintain the same functionality.

I don't want to discourage your design, just give some context.

This is a fun and challenging field. If I was to sum it up I would say this is a huge software project with a tiny bit of hardware at the beginning. Good luck!

steve · Post by **steve** » 03 Aug 2020, 17:30

at67 wrote: ↑30 Jul 2020, 04:36 There is a lot to digest here and without fully simulating/implementing some of these ideas I can only provide some abstract thoughts and opinions to the discussion.

Dear at67, first of all let me thank you very much for your feedback. As you know your wonderful PucMon implementation was for me the push to write my thoughts!

at67 wrote: ↑30 Jul 2020, 04:36 With any new hardware/firmware in the Gigatron Eco system, I always ask myself the following:
1: Is it backwards compatible with current hardware/peripherals?
2: Is it Native code compatible with current firmware ROM's.
3: Is it vCPU and 6502 compatible with current application software.
4: If it is none of the above, what is it's end goal? i.e. (purely for personal satisfaction, an evolution of the Gigatron to something else, etc).

As you've seen I proposed a number of changes that came to me analyzing each section of the schematics.

A few of them are in the first category (and are also compatible with the standard firmware ROM): if they work, implementing them gives you a gigatron that works in the same way as a standard one, just with fewer chips (e.g. 29 instead of 32). In particular, they are the removal of the 2 multiplexer chips on Y, and the use of a single register instead of the 2 counters for the high byte of the PC. The 64K RAM also fits in this category, and this has already been tested.

Some of them instead require, let's say, minor changes to the ROM code (minor especially compared to the third-category ones), for example the VGA color increase or using an embedded matrix keyboard. The somewhat already prototyped clock speed increase also fits here, as does the removal of the decoding logic, but in my opinion it doesn't make sense to do the latter without also simplifying the ALU, since the ALU will anyway remain the longest path for the clock speed increase.

All the rest instead fits in the third category, meaning that they will allow reuse of the vCPU software (and for me a working PucMon would be the target!! LOL) and - as you also pointed out - "being part of the Gigatron's Software Eco system"!

The difficulty of implementation also increases with the category, with the first-category changes being the easiest.

at67 wrote: ↑30 Jul 2020, 04:36
steve wrote: ↑22 Jul 2020, 19:16 **Removal 2:1 multiplexers for Y**
Since high byte of RAM address can be 0 or Y, the same effect of the double 2:1 multiplexers can be obtained using a 374 chip for Y register in combination with pull-down resistors, simply routing the multiplexer selection to Y output enable
This seems like a good optimisation, it would need to be verified from a timing perspective though; i.e. using OE to tri-state the 374's outputs to either pull up or pull down resistors would need to meet the setup time of whatever SRAM device you choose.

As said, it should be tested, but I personally hope and think that the pull-down circuit will be at least "on par" with the delay introduced by the two removed 74157.

at67 wrote: ↑30 Jul 2020, 04:36
steve wrote: ↑22 Jul 2020, 19:16 **Removal 2:1 multiplexers for X**
Without the 2:1 multiplexer on X we lose the possibility of a direct memory addressing from ROM, that will require one additional instruction to load X. One instruction more where required but two chip less in general, seems a good tradeoff and fully aligned with gigatron ideas!
This will probably require a non-trivial rewrite of the firmware, (which I assume you have already decided to do), the non-trivial parts mostly involve timing considerations for the sampling/generation of Input/Audio/Video and for the vCPU interpreter that is interleaved within the Input/Audio/Video loop.

Yes, all the "third category" changes unfortunately need an absolutely non-trivial rewrite of the ROM code.

at67 wrote: ↑30 Jul 2020, 04:36 e.g. Currently the vCPU instructions have a maximum of 28 clock cycles before they can no longer be dispatched and executed within the interpreter's tight requirements and a lot of vCPU instructions are already at the limit, (ADDW, SUBW, LSLW, DEEK, POKE and many more). So if your changes require these instructions to be re-written in Native code, (and some of them will), what happens when they break the maximum 28 clock cycle limit? You would probably have to completely re-design and re-write the entire vCPU interpreter, (breaking up instructions into individual packets, fetch, decode, execute, etc). Currently the vCPU instructions are implemented as a LUT that spans multiple 256 byte ROM pages, (some of the smaller instructions implement a 2 page jump to allow more room in the first ROM vCPU instruction page for new instructions, see this thread https://forum.gigatron.io/viewtopic.php?f=4&t=136 as to how Marcel made room for CALLI, CMPHS and CMPHU).

I'm fully aware of the requirements, since I have already analyzed, at least at a high level, the vCPU interpreter and the video loop.

The real limit is in the third level of changes, and in particular in the implementation of the arithmetic operations without an ALU. I've done some checks (e.g. vCPU instruction frequency and a rough port of some parts of the interpreter loop) and in my opinion doubling the clock frequency should compensate for the number of additional cycles required.
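The reasoning behind the clock-doubling argument can be written out as a small calculation. This sketch assumes the 28-cycle limit is really a wall-clock window fixed by the video timing, so that a faster clock fits more native cycles into the same window; that assumption, and the function name, are illustrative only and not measured results.

```python
def cycles_in_same_window(cycles: int, old_mhz: float, new_mhz: float) -> float:
    """Native cycles that fit in the time `cycles` takes at the old clock."""
    return cycles * new_mhz / old_mhz


if __name__ == "__main__":
    # 28 cycles at 6.25MHz is a 4.48 microsecond window
    print(cycles_in_same_window(28, 6.25, 12.5))
```

Under that assumption a vCPU instruction could grow to roughly twice its current native length before hitting the same limit, which is the headroom the ALU-less implementation would have to fit into.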

at67 wrote: ↑30 Jul 2020, 04:36
steve wrote: ↑22 Jul 2020, 19:16 The pull-down resistors can be applied also here giving the possibility to address directly the 0 address (might be used for example in horizontal scrolling games for pixel rows displacement).
There is a video indirection table located at 0x0100 to 0x01EF that contains two byte pointers for every scan line, they use a differential system for the least significant byte, (effectively the horizontal scroll register for that scan line, X byte), and an absolute value for the most significant byte, (effectively the vertical scroll register for that scan line, Y byte).

You're right, this use case was already addressed (and all the optimizations you've found for PucMon are impressive!), but I'm pretty sure that being able to directly address the "0 address" of every page (as for the zero page overall) will find a nice use.

at67 wrote: ↑30 Jul 2020, 04:36
steve wrote: ↑22 Jul 2020, 19:16 **Increasing the clock frequency**
Increasing the clock frequency would minimize the effect of some simplification of the hw side that requires some additional instruction on the sw one.
I agree it most likely would, but because ROMv3y shows the limitations of a synchronously coupled system, (i.e. the video timing to the base clock rate), I think you would find it difficult to produce a satisfactory display using the current video system logic on anything other than 6.25Mhz, (satisfactory is of course subjective, but I mean clean signals, stabilised syncs on the majority of VGA monitors, correct handling of underscan/overscan/centering, etc).

A high res mode that sacrifices colour fidelity for spatial resolution but effectively keeps the same video timing would work, (as outlined in the hires video thread), but the only real way, (IMHO), to allow for proper video timing at any base clock rate is to completely decouple the video timing logic from the Native code, (this is something that I am currently working on). Not only would this free up the native code from video timing constraints, (to some extent), but it would open up a whole world of extra capabilities in colour and resolution depths, even allowing for paletted modes, e.g. 256/4096 and still staying backward compatible with the current vram/pixel layout and video timing.

This is very, very interesting news!!
The continuous need to couple screen display with (v)code execution is surely one of the most complex parts handled by the ROM. Solving it would make the rewriting task much easier!
I hope you'll share some more details on this, looking forward to more news!!

at67 wrote: ↑30 Jul 2020, 04:36
steve wrote: ↑22 Jul 2020, 19:16 Note that to keep compliance with VGA pixel frequency of 25.175MHz, higher usable clocks (of the 6.29375MHz /4 one) should: 12.5875MHz or 8.3917MHz, (respectively /2 and /3). And surely whatever multiple of 6.29375MHz or the others can also fit.
This could work, but would once again would not be trivial to implement in the firmware, (you would effectively have multiple video generation loops, which there already is for the different scanline modes), and the generation of the clocks might be more difficult than you think. e.g. Esoteric clock frequencies can be hard to generate/obtain, you can have bespoke crystals cut to whatever frequency you desire, (used to be expensive and I have no idea if this service is still offered to the general consumer), or generate clocks using divisors and state machines but then you can get asymmetrical waveforms if you are not careful or quickly have your chip count balloon out.

I mean using just one of them - an easily found one - and adapting the code to it (as, for example, in the current implementation the slight clock difference has been compensated for in code).

at67 wrote: ↑30 Jul 2020, 04:36
steve wrote: ↑22 Jul 2020, 19:16 **Including a keyboard**
With a 32 keys (4x8) matrix keyboard, blinker led lines might be used for keyboard row signals and should not interfere much with led blinking. For the columns a specific buffer chip should be used (de facto replacing the current serial chip).
Wattsekunde prototyped and built a matrix keyboard style interface here that you might find interesting:
https://forum.gigatron.io/viewtopic.php?f=4&t=5
https://forum.gigatron.io/viewtopic.php?f=4&t=39
Also this as to how/why PS2 was chosen as a keyboard interface:
https://forum.gigatron.io/viewtopic.php?f=4&t=4

I hadn't seen the matrix keyboard posts. It seems he went down the same route that I was planning to take. I have to read them more carefully.

PS: looking at the photos, it seems Marcel was using a Cherry ML based keyboard (I also like the low profile switches!)

at67 wrote: ↑30 Jul 2020, 04:36
steve wrote: ↑22 Jul 2020, 19:16 **Minimizing Program Counter**
Design idea: the most significant byte of the register can be incremented at the end of the page with an unconditional fixed jump to the address 0 of the next page. This will allow the use of a standard flip-flop chip instead of the double 161 incrementer. Benefits: one chip less and shorter carry propagation; downside: one/two ROM words per page used to jump, with average even slightly more for the cases which Temp/Flag register need to be saved and/or restored. Note that this limitation applies just to native instructions and not vCPU ones, and just to code that go multipage (very limited if at all existing even in the current implementation).
The 161 counters aren't just incrementers, they are also pretty nifty presettable, cascadable and resettable counters that allow the Native code far jump instruction to exist, (jmp Y,D). The Native code uses the far jump instruction in some critical places, especially for the vCPU and 6502 instruction dispatch, SYS calls, etc; unless I am missing something, replacing the most significant byte of the PC with an auto incrementer would cause you some serious Native coding grief and require a complete re-write of the firmware and probably require a completely different vCPU instruction set and implementation, (i.e. I don't see how you could produce the same vCPU feature set without a Native code far jump).

It seems I was not clear enough here. Replacing the two 74161s with a single 74273 loses only the automatic increment feature. The ability to preset them (to execute long jumps Y,D - in my case T,D, but it is the same) and to reset them (during initialization) are both kept!

The only feature lost is the one used when the PC reaches the end of a page (PC low byte equal to 255) and the pipelined instruction is not a jump. In this case the "standard" gigatron would go to address 0 of the next page, while the modified one would go to address 0 of the same page. This "feature" needs to be taken care of only in native ROM code (not vCPU code).

In all the ROM code I've seen, I didn't find any place where removing the feature has an impact (the code either stays within a page or moves to other pages with long jumps). If anyone knows of places that move between pages without long jumps, please share them!
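A minimal sketch of the difference at the page boundary, assuming the PC is modelled as a plain 16-bit value with the page in the high byte. The function names are hypothetical and only illustrate the behaviour described above; they are not taken from the Gigatron sources.

```python
def next_pc_standard(pc: int) -> int:
    """Counter for the high byte: the carry from the low byte ripples into the page."""
    return (pc + 1) & 0xFFFF


def next_pc_latched_page(pc: int) -> int:
    """Plain register for the high byte: only the low byte increments and wraps."""
    page = pc & 0xFF00
    offset = (pc + 1) & 0x00FF
    return page | offset


def far_jump(page: int, offset: int) -> int:
    """A far jump loads both halves, so it behaves the same in both designs."""
    return ((page & 0xFF) << 8) | (offset & 0xFF)


if __name__ == "__main__":
    pc = 0x12FF  # last address of page 0x12, next instruction is not a jump
    print(hex(next_pc_standard(pc)))      # moves on to page 0x13
    print(hex(next_pc_latched_page(pc)))  # wraps to the start of page 0x12
    print(hex(far_jump(0x13, 0x00)))      # explicit jump to the next page
```

With the register variant, code that runs past the end of a page would need an explicit far jump such as the last call above as its final instruction.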

at67 wrote: ↑30 Jul 2020, 04:36
steve wrote: ↑22 Jul 2020, 19:16 **Minimizing the Control Unit**
Design idea: CU Signal "unrolling", using two ROMs, and putting into the ROMs the “already decoded” instructions signals (24 bits for signals and 8 for data).

Besides removing some chips, removing the CU decoding logic (together with the ALU, described later) also shortens delay paths, which makes higher clock frequencies more feasible. Another benefit is the possibility to specify all the parallel activity an instruction might need, since the signals are stored separately.
A lot of people originally thought the entire CPU circuit of the Gigatron was a ROM LUT, because of the size and modern implementation of the 64Kx16 ROM, not realising that it is purely a storage device for Native code and data for the Harvard architecture that the Gigatron implements. To me it seems the current ROM was purely chosen based on price, accessibility and flexibility for a simple and 70/80's theme upgrade path. The problem with putting any part of the control unit in ROM, is speed. If you're going to remain true to the ethos of using only chips from roughly that era, there is no way you can find any kind of ROM that would meet your timing requirements at 6.25MHz. But if you decide to use a modern ROM, PAL, GAL, CPLD, FPGA etc, then all bets are off and you will find it trivial to implement as much or as little of the control logic in modern devices as you like.

No PAL/GAL/CPLD/FPGA. I was thinking of using a ROM similar to the one used to overclock the standard Gigatron.

at67 wrote: ↑30 Jul 2020, 04:36
steve wrote: ↑22 Jul 2020, 19:16 **Minimizing the Arithmetic Logic Unit**
The proposal is to use loads and lookup tables instead of ALU dedicated chips for operations, but remain efficient (and having even all conditional signals) with a purposefully designed architecture composed of an "expanded" MAU, two "mixing" registers, one temp/"flag" register and one to combine nibbles results. For the comprehensive details and some code examples please have a look at the overall picture later on.
Where would the LUTs exist? In ROM you have the complexity of the Harvard architecture making ROM lookup and ROM LUTs non trivial and inefficient, (the Gigatron's Harvard architecture always has a 1:1 instruction:data pair). In RAM you get the benefits of software control and configuration of your actual ALU, but RAM is already a scarce and fragmented commodity, (I do like the RAM idea though).

LUTs can exist both in RAM and in ROM, and the choice can be made based on convenience.

For the ROM ones, the pipelined architecture allows - exactly as in the standard gigatron - the nice trick of jumping to an address inside a list of loads and coming back after a single load, thanks to the pipelined jump to the next instruction.

And I've seen that LUTs are used pretty extensively in the ROM. But this trick requires the jump instructions and the table to be in the same ROM page, and since most of the logic and arithmetic tables are 256 bytes long (to allow looking up the result of an operation on two full nibbles), unfortunately the ROM trick can't be used (at least not easily).
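To make the 256-byte table size concrete, here is a small sketch of an 8-bit addition done with nibble lookups. The table layout (index = first nibble in the high 4 bits, second nibble in the low 4 bits, result = 5-bit sum) and the use of a second table for carry-in are assumptions made for this illustration, not the actual microtron design.

```python
from typing import Tuple


def build_add_table(carry_in: int) -> bytes:
    """256 entries: index = (a_nibble << 4) | b_nibble, value = a + b + carry_in.

    Bits 0-3 of each entry are the sum nibble, bit 4 is the carry out.
    """
    return bytes((i >> 4) + (i & 0x0F) + carry_in for i in range(256))


ADD_C0 = build_add_table(0)  # table used when carry-in is 0
ADD_C1 = build_add_table(1)  # table used when carry-in is 1


def add8(a: int, b: int) -> Tuple[int, int]:
    """Add two bytes using two table lookups; returns (result, carry_out)."""
    lo = ADD_C0[((a & 0x0F) << 4) | (b & 0x0F)]
    table = ADD_C1 if (lo >> 4) else ADD_C0
    hi = table[(a & 0xF0) | ((b >> 4) & 0x0F)]
    result = ((hi & 0x0F) << 4) | (lo & 0x0F)
    return result, hi >> 4


if __name__ == "__main__":
    print(add8(0x0F, 0x01))  # carry crosses from the low to the high nibble
    print(add8(0xFF, 0x01))  # result wraps and the carry flag is set
```

Each table fills exactly one 256-byte page, which is why the table and the jump that uses it cannot easily share a ROM page.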

at67 wrote: ↑30 Jul 2020, 04:36
steve wrote: ↑22 Jul 2020, 19:16 This way of operating might be a bit slower in doing logical and arithmetic operations, but it can cover in the same efficient way whatever operation you would need, and having more registers (and also flags) can be even faster in some operations. And surely is fully in line with the gigatron philosophy of using software instead of hardware!
I think it would be significantly slower at the Native code level if it was all done in software without hardware assistance, e.g. current hardware ALU operations take 1 clock cycle, if implemented at the software level they would I guess be at least an order of magnitude slower as you would have to software decode the ALU operation, create a LUT address out of operands, fetch the result from the RAM/ROM LUT and then write it to your destination, (ROM would be significantly slower/inefficient than RAM).

Given how frequently such code occurs and how efficiently nibble LUTs can be handled with the carry, zero and sign flags ready to use, I estimated that the overall instruction count will double, and that this can be offset by more streamlined and shorter delay paths, which would allow the clock to be doubled while remaining stable.

If it is not possible to double the clock, a more "retro" scanline configuration would have to be used instead (I'm not sure whether displaying one line in two would suffice or one line in four would be required).

It's true that the full ALU would be faster, but the essential gigatron philosophy is all about replacing hardware with software!!

Anyway, as I said, I'm open to a new ALU design (and any other ideas) if it can be more efficient with the same hardware, or equally efficient with less hardware!

alastair wrote: ↑30 Jul 2020, 15:55 I started a similar TTL computer project 16 months ago. The hardware is done and working great, but most of that time has been working on the firmware. I'm about a year in and just starting to see light at the end of the tunnel. ...
I'm also using lookup tables for the ALU. It is Harvard Architecture, but I have a fetch and execute context on the program/ROM side. The fetch and immediate load uses the ROM in the program context and the execute uses the ROM in the ALU context (my instructions need at least two cycles to complete). There is a tradeoff though. Even though you eliminate the ALU chips, you will need additional logic, state, and pipelining to make it work.

Ciao Alastair!

I tried to search for your schematics but didn't find them.

I'm curious to see whether some more ideas can be borrowed, and if and how you designed the circuit to optimize LUTs (or even what you're using LUTs for; in fact, with a double cycle they might even be used for instruction decoding).

alastair wrote: ↑30 Jul 2020, 15:55 I got my CPU design to 22 chips and a simple PAL. The PAL replaces about 6 TTL chips, so if you count those you get to 28 vs 32 chips for the Gigatron CPU. However, when you compare the gate count, I'm at 935 vs the Gigatron 930. The Gigatron is already an optimum design, so you really can't reduce it anymore and maintain the same functionality.

I personally like the gigatron design with "plain" circuits (so no PAL, FPGA, CPLD, etc), and I was also glad to be able to suggest removal of the more complex serial 74595. But I can also understand that someone wants to experiment with different approaches and technologies.

Anyway, I don't know if you can use some of the ideas presented here to reduce the count even more. E.g. if you're able to apply the first two you should already be able to start from a 29-chip "standard" gigatron.

at67 wrote: ↑30 Jul 2020, 04:36 It's a non trivial undertaking that you have set yourself upon and a truly formidable firmware update if you plan to implement so much of it in software, I think most of your ideas are based on a solid foundation and are absolutely doable with varying degrees of difficulty.
The success of your project is completely up to you, if it was me approaching this magnitude of change I would start off small and forget about the big picture, (but the first thing I would ask myself is how much compatibility I would want with the current Gigatron at all levels). Once I answered that question I would then attack the problems from small/easy to large/difficult in that order.
I have no doubt that you can optimise the current Gigatron design in a myriad of ways whilst remaining true to its original design principles, good luck and keep this thread updated with your ideas, successes and failures; as all of it is part of the learning journey.

alastair wrote: ↑30 Jul 2020, 15:55 I knew what I was getting myself into, but as at67 stated... it is truly formidable!
I don't want to discourage your design, just give some context. This is a fun and challenging field. If I was to sum it up I would say this is a huge software project with a tiny bit of hardware at the beginning. Good luck!

I'm not sure if/when/how I will implement this project that is surely interesting.

I know that the software part would not be easy at all, but in my case, with my little experience, the hardware part scares me just as much or more!

As also suggested, I would start by building a "standard" gigatron and applying changes one at a time, starting from the fully compatible ones, then the "minor ROM changes", and finishing with the major ones!

If anyone tests some of the changes in the meantime (e.g. the ones in the first tier), it would be very interesting to see the results.

And if someone is willing to go deeper and even collaborate, I would be glad to share the experience. In Turin or Milan it might even be in person, but remotely might work too, especially in these covid times.

Let's see. I'll keep you all updated.

In the meantime as usual,
Happy hacking!

alastair · Post by **alastair** » 04 Aug 2020, 03:42

steve wrote: ↑03 Aug 2020, 17:30 Ciao Alastair!

I tried to search for your schematics but didn't find them.

Ciao Steve!

I have a project page on Hackaday. There's also a Git Repo with the current build tools. The ALU lookup tables are built with this Ruby script (outputs Intel HEX). It's still a work-in-progress, but I hope to have a completed version by the end of this month.

steve wrote: ↑03 Aug 2020, 17:30 I personally like the gigatron design with "plain" circuits (so no PAL, FPGA, CPLD, etc), and I was also glad to be able to suggest removal of the more complex serial 74595. But I can also understand that someone wants to experiment with different approaches and technologies.

My design also includes a "GPU" rather than bit banging the video. It is really just a DMA controller that works in transparent mode by accessing the memories on the alternate cycle to the CPU. The initial CPU/GPU design needed 50 TTL chips, but I was able to reduce this to 34 (22 CPU, 12 GPU). I decided early on to use one of the original PALs since it fits the late 70's timeframe I'm designing to. A single PAL16R4 (introduced in 1978) was able to replace the 6 TTL chips used in the execution state machine. Another advantage of a programmable part here is the ability to modify how the CPU executes instructions without changing the hardware.

steve wrote: ↑03 Aug 2020, 17:30 Anyway, I don't know if you can use some of the ideas presented here to reduce the count even more. E.g. if you're able to apply the first two you should already be able to start from a 29-chip "standard" gigatron.

I spotted this optimization as well and ended up using pull-up resistors in four different places: selecting the zero page (X=FF), selecting a boot page (page=FF), selecting an audio sample during blanking (H=FF, V=FF), and overriding the lower part of the instruction register on single cycle fetch.

steve · Post by **steve** » 04 Aug 2020, 21:56

alastair wrote: ↑04 Aug 2020, 03:42 I have a project page on Hackaday. There's also a Git Repo with the current build tools. The ALU lookup tables are built with this Ruby script (outputs Intel HEX). It's still a work-in-progress, but I hope to have a completed version by the end of this month.

I had a look at it and... I was impressed!!!
I would say that comparing gigatron to a Commodore 64 this would be an Amiga!

Regarding the LUTs, though, I'm not sure about the efficiency of the design you used. I commented directly on the post there for more context.

alastair wrote: ↑04 Aug 2020, 03:42 My design also includes a "GPU" rather than bit banging the video. It is really just a DMA controller that works in transparent mode by accessing the memories on the alternate cycle to the CPU. The initial CPU/GPU design needed 50 TTL chips, but I was able to reduce this to 34 (22 CPU, 12 GPU). I decided early on to use one of the original PALs since it fits the late 70's timeframe I'm designing to. A single PAL16R4 (introduced in 1978) was able to replace the 6 TTL chips used in the execution state machine. Another advantage of a programmable part here is the ability to modify how the CPU executes instructions without changing the hardware.

I like the idea of "time" sharing the video output, and I think that Marcel's barrel computer suggestion was appropriate!
It puts more demands on the RAM, but since RAM is faster than ROM it is easier to make it cope with that.
I will think about it for sure, but for now I'm not sure I can borrow the idea without adding many chips to the solution.

I spotted this optimization as well and ended up using pull-up resistors in four different places: selecting the zero page (X=FF), selecting a boot page (page=FF), selecting an audio sample during blanking (H=FF, V=FF), and overriding the lower part of the instruction register on single cycle fetch.

Good, then it seems it works! One off!!
I've seen you used pull-ups instead of pull-downs. Was this because they react faster? And was the rationale behind the 1K resistor the same?

alastair · Post by **alastair** » 04 Aug 2020, 23:19

steve wrote: ↑04 Aug 2020, 21:56 It puts more demands on the RAM, but since RAM is faster than ROM it is easier to make it cope with that.

The ROM and RAM are used concurrently, so I can only go as fast as the slowest chip. The fastest ROM and RAM chips in the 32-pin DIP package are 55ns, so that's the speed limit. The smaller chips are faster though, but not big enough for my design.

steve wrote: ↑04 Aug 2020, 21:56 I've seen you used pull-ups instead of pull-downs. Was this because they react faster? And was the rationale behind the 1K resistor the same?

I remember being taught to use pull up over pull down. I can't remember when though... it must have been a long time ago.

It makes sense though. When the buffer comes out of tri-state it has to either source or sink current. It is better at sinking current, so it is better at pulling the high (pulled up) state low. When TTL goes into tri-state, the line only has to go from 0.5V to 2V to become high, but has to go from 3.2V to 0.8V to go low. Basically, the pull up will be faster.

There's parasitic capacitance, so your pull up/down resistor forms an RC circuit. You'll see numbers in the 15-50pF range on the data sheets and this seems about right. The breadboard was around 40pF and you need about 0.5RC to go from low to high. I used 270 ohm resistors to match the 5ns rise time of the logic. The PCB capacitance is a lot better and I relaxed the time to 6ns with 1k. I did the math, but also socketed the resistor networks and played around with different values.
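The numbers above can be checked with a few lines of arithmetic. This is only a sketch under stated assumptions: an ideal RC charge towards a 5V supply, the 0.5V and 2V levels mentioned earlier in this post, and the 0.5RC rule of thumb. The function names are made up for the example, and the PCB capacitance it prints is merely what the rule of thumb implies for 1k and 6ns, not a measured value.

```python
import math


def rc_rise_time_ns(r_ohms, c_pf, v_start=0.5, v_threshold=2.0, v_supply=5.0):
    """Time for an ideal RC pull-up to charge the line from v_start to v_threshold."""
    rc_ns = r_ohms * c_pf * 1e-3  # ohm * pF = 1e-12 s = 1e-3 ns
    return rc_ns * math.log((v_supply - v_start) / (v_supply - v_threshold))


def rule_of_thumb_ns(r_ohms, c_pf, factor=0.5):
    """The '0.5 RC' estimate quoted in the post."""
    return factor * r_ohms * c_pf * 1e-3


def implied_capacitance_pf(r_ohms, t_ns, factor=0.5):
    """Capacitance that the rule of thumb implies for a given resistor and time."""
    return t_ns / (factor * r_ohms * 1e-3)


if __name__ == "__main__":
    # Breadboard case: 270 ohm pull-up, about 40 pF
    print(round(rule_of_thumb_ns(270, 40), 2))   # about 5.4 ns
    print(round(rc_rise_time_ns(270, 40), 2))    # about 4.4 ns with the exact formula
    # PCB case: 1k pull-up and a 6 ns budget
    print(round(implied_capacitance_pf(1000, 6.0), 1))  # about 12 pF
```

The exact charging formula gives ln(4.5/3), roughly 0.41RC, so the 0.5RC figure is a slightly conservative estimate.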

steve · Post by **steve** » 14 Aug 2020, 12:34

I've found a very interesting article on generating full vga signal via a single counter and few other components: https://hackaday.io/project/9782-nes-za ... generation

Taking Alastair's idea of accessing the graphics memory pages on alternating clock phases, and using two counters as the memory address (one for the page, pointing to the row, the other for the offset, pointing to each pixel), and resetting them with the sync signals (page counter with Vsync and offset counter with Hsync), it will be possible to have a very important "decoupled" VGA signal generation!

Since the X incrementer will no longer be a requirement, replacing the X buffer with a register, this solution should be able to use the same number of components as now, plus the ones required by the sync generation (and it seems that with a genius design little might need to be added!)

Surely memory addresses would be changed (eg screen starting at 0 page), but the game would be surely "worth the candle" if technically possible with minimal hw!

And now the current key question: does anyone have an idea on how to generate the two sync signals in the "very minimal hw" manner?

@Alastair, is there something that can be reused from your projects?

@At67, is this similar to the circuit you have in mind?

Here I surely need help, since it seems a lot of electronics is involved and I even had difficulty understanding how the circuit mentioned at the start works!

Just as notes:
1. the page counter should use the 8 bits from Q9 to Q2 (not using Q1 and Q0) for RAM page addressing, since each pixel row should be repeated four times (on the X axis the repetition is automatically generated by the slower clock, which ticks once every 4 pixels)
2. since sync signals should be generated independently from RAM data, full 256 colors can be possible
3. this change would fit in the "second category of microtron changes", meaning it also fits the "standard Gigatron" with ROM changes (a lot of "simplification" work to remove the VGA coupling and to manage the address changes)
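Note 1 can be sketched as a small address calculation. It assumes a 10-bit scanline counter (Q9..Q0) reset by Vsync, an 8-bit offset counter reset by Hsync and clocked once per displayed pixel, and the screen starting at page 0. The function name and this exact layout are hypothetical, for illustration only.

```python
def video_ram_address(scanline: int, pixel: int) -> int:
    """Map the two counters to a 16-bit RAM address (page in the high byte)."""
    page = (scanline >> 2) & 0xFF  # Q9..Q2: Q1 and Q0 are dropped, so 4 scanlines share a row
    offset = pixel & 0xFF          # offset counter already runs at the slower pixel clock
    return (page << 8) | offset


if __name__ == "__main__":
    # Scanlines 0..3 all read page 0; scanline 4 moves on to page 1.
    for line in range(6):
        print(line, hex(video_ram_address(line, 5)))
```

Dropping the two lowest counter bits is what repeats every RAM row on four consecutive scanlines without any extra logic.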

Lerc · Post by **Lerc** » 16 Aug 2020, 21:05

steve wrote: ↑14 Aug 2020, 12:34 I've found a very interesting article on generating full vga signal via a single counter and few other components: https://hackaday.io/project/9782-nes-za ... generation

I must say, that seems like a little bit of genius. I especially like the first comment saying it won't work. A brave choice of words to place under a photo of it working. I would expect it to be maybe a little temperamental.

One issue with a self managed video output is you potentially need some feedback for the CPU to know what's going on. When the CPU is driving the display itself it implicitly knows when vblank is etc.

On the other hand if you had it working, you wouldn't necessarily have to maintain a whole video page you could output just one or two scanlines. Have one scanline outputted repeatedly until the CPU has created the contents of the next one. Then you could do things like the current video modes only instead of having blank scanlines you would keep displaying the current line for free. If that provided enough time to generate a scanline for a tiled mode then the world opens up greatly (and if there were any leftover time after that, Sprites!).
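A toy model of that repeated-scanline idea, assuming the video hardware keeps showing the last completed row while the CPU needs a fixed number of scanline times to prepare the next one. The function and parameter names are invented for this sketch, and it ignores blanking and sync entirely.

```python
def displayed_rows(total_scanlines: int, scanlines_per_render: int) -> list:
    """Return, for each output scanline, the index of the source row being shown."""
    shown = []
    current_row = 0  # row currently held for output (assumed ready at the start)
    progress = 0     # scanline times the CPU has spent on the next row
    for _ in range(total_scanlines):
        shown.append(current_row)
        progress += 1
        if progress == scanlines_per_render:
            current_row += 1  # next row is complete: show it from the next scanline
            progress = 0
    return shown


if __name__ == "__main__":
    # With four scanline times per rendered row, each row is shown four times.
    print(displayed_rows(8, 4))
```

The slower the CPU is at producing a row, the more often the previous row is repeated, which trades vertical resolution for rendering time instead of leaving blank lines.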

As an aside, I've been thinking about the world of microcontroller VGA and was looking at running the lines of an SPI RAM out to video for a somewhat similar effect. Using Quad mode: first scanline in write mode (the screen gets a copy going past), repeating scanlines in read mode, with the microcontroller ignoring the data and it just turning up on screen. It might have a few command pixels visible on the left side of the screen.

Combined with this timing trick above you could just about do a decent picture with an ATTiny and SPI Ram, that would be a thing. (Off topic for gigatron, but cool nonetheless)
