steve wrote: ↑22 Jul 2020, 19:16
I’m sorry for this long post, but I hope that some of you can reach the end of it appreciating the synthesis I did of more than one year of scattered thinking of the various solutions…
There is a lot to digest here, and without fully simulating/implementing some of these ideas I can only contribute some abstract thoughts and opinions to the discussion.
With any new hardware/firmware in the Gigatron ecosystem, I always ask myself the following:
1: Is it backwards compatible with current hardware/peripherals?
2: Is it Native code compatible with current firmware ROMs?
3: Is it vCPU and 6502 compatible with current application software?
4: If it is none of the above, what is its end goal (e.g. purely personal satisfaction, an evolution of the Gigatron into something else, etc.)?
*Note* IMHO there is no right, wrong or mandatory path that must be followed when making design decisions in a project of this scope; by its very nature and core it is all about motivation, learning, experimentation and the realisation of a vision. If others gain some knowledge, value or experience from your journey, then I always consider that a bonus.
Keeping these questions and ideas in mind will make my thoughts easier to follow.
steve wrote: ↑22 Jul 2020, 19:16
**Introductory notes**
Some of these changes retain full "retro compatibility", but in general the idea is to simplify the design even more while trying to remain compatible at vCPU level, keeping the 70s philosophy and the “software can replace hardware” approach of the project (in fact extending this concept in a few places)!
Remaining compatible at the vCPU level obviously allows all current vCPU software (which is most of the applications in the repo) to be migrated to new hardware; I personally think this is an excellent starting point. IMHO, once you give up on vCPU compatibility you give up on being part of the Gigatron's software ecosystem and become something else. This is not right/wrong or good/bad, it's just different; someone might want to produce an end product that follows a completely different tangent but is based on Marcel's and Walter's original design philosophy.
steve wrote: ↑22 Jul 2020, 19:16
**Removal of 2:1 multiplexers for Y**
Since the high byte of the RAM address can be 0 or Y, the same effect as the double 2:1 multiplexers can be obtained using a 374 chip for the Y register in combination with pull-down resistors, simply routing the multiplexer selection to the Y output enable.
This seems like a good optimisation, but it would need to be verified from a timing perspective: when OE tri-states the 374's outputs, the pull-up or pull-down resistors must bring the address lines to a valid logic level within the setup time of whatever SRAM device you choose.
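A rough way to check this is to treat each tri-stated address line as an RC discharge through its pull-down resistor. The sketch below is only a back-of-the-envelope model: the ~15 pF per line, the 5 V starting level and the 0.8 V input-low threshold are assumptions, not measured values, and in practice only part of the 160 ns cycle is available for SRAM address setup.

```python
import math

CLOCK_HZ = 6.25e6             # Gigatron base clock
PERIOD_NS = 1e9 / CLOCK_HZ    # 160 ns per native cycle


def settle_time_ns(r_ohms: float, c_farads: float,
                   v_start: float = 5.0, v_il: float = 0.8) -> float:
    """Time for a tri-stated address line to discharge through a pull-down
    resistor from v_start to the input-low threshold v_il, in nanoseconds.

    RC discharge: v(t) = v_start * exp(-t / (R*C)),
    so v(t) = v_il gives t = R*C*ln(v_start / v_il).
    """
    return r_ohms * c_farads * math.log(v_start / v_il) * 1e9


if __name__ == "__main__":
    # Assumed load: ~15 pF per address line (SRAM input, 374 output, traces).
    for r in (1_000, 4_700, 10_000):
        t = settle_time_ns(r, 15e-12)
        print(f"R = {r:>6} ohm -> {t:6.1f} ns ({t / PERIOD_NS:.0%} of one cycle)")
```

Under these assumptions a 1k pull-down settles comfortably within a cycle, while a 10k one takes longer than the whole cycle, so the resistor value is not a free choice.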
steve wrote: ↑22 Jul 2020, 19:16
This change should also be fully usable with the "standard" Gigatron, saving 2 chips on the total count.
I'm not following here; do you mean for the X register as well, as you explain further down?
steve wrote: ↑22 Jul 2020, 19:16
**Removal of 2:1 multiplexers for X**
Without the 2:1 multiplexer on X we lose direct memory addressing from ROM, which will require one additional instruction to load X. One more instruction where required, but two fewer chips in general, seems a good trade-off and fully aligned with Gigatron ideas!
This will probably require a non-trivial rewrite of the firmware (which I assume you have already decided to do); the non-trivial parts mostly involve timing considerations for the sampling/generation of input/audio/video and for the vCPU interpreter that is interleaved within the input/audio/video loop.
For example, vCPU instructions currently have a maximum of 28 clock cycles before they can no longer be dispatched and executed within the interpreter's tight timing requirements, and a lot of vCPU instructions are already at that limit (ADDW, SUBW, LSLW, DEEK, POKE and many more). So if your changes require these instructions to be rewritten in Native code (and some of them will), what happens when they exceed the 28 clock cycle limit? You would probably have to completely redesign and rewrite the entire vCPU interpreter (breaking instructions up into individual fetch, decode, execute packets, etc.). Currently the vCPU instructions are implemented as a LUT that spans multiple 256-byte ROM pages (some of the smaller instructions use a two-page jump to leave more room for new instructions in the first vCPU instruction page; see this thread
https://forum.gigatron.io/viewtopic.php?f=4&t=136 for how Marcel made room for CALLI, CMPHS and CMPHU).
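The budget problem can be illustrated with simple counting: every native instruction is one clock cycle, so each direct [D] access that now needs an extra instruction to load X adds a cycle. The handler below is entirely hypothetical (24 cycles, 6 direct zero-page accesses) and is not taken from the ROM; it only shows how quickly a handler near the limit tips over.

```python
MAX_VCPU_CYCLES = 28  # per-instruction limit quoted in the post


def native_cycles(listing, has_x_mux=True):
    """Count native cycles for a listing of (mnemonic, uses_direct_address).

    Every native instruction takes one clock cycle. Without the X
    multiplexer, each direct [D] RAM access is assumed to need one extra
    instruction to load X with the address first.
    """
    total = 0
    for _mnemonic, direct in listing:
        total += 1
        if direct and not has_x_mux:
            total += 1  # extra instruction: load X with the address
    return total


# Hypothetical handler: 24 cycles in total, 6 of them direct zero-page accesses.
# This is NOT a real ROM routine, just a shape similar to the busier vCPU ops.
example = [("direct", True)] * 6 + [("other", False)] * 18

for mux in (True, False):
    n = native_cycles(example, has_x_mux=mux)
    status = "fits" if n <= MAX_VCPU_CYCLES else "exceeds the budget"
    print(f"X mux {'present' if mux else 'removed'}: {n} cycles, {status}")
```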
steve wrote: ↑22 Jul 2020, 19:16
The pull-down resistors can also be applied here, making it possible to directly address location 0 (this might be used, for example, in horizontal scrolling games for pixel row displacement).
There is a video indirection table located at 0x0100 to 0x01EF that contains a two-byte pointer for every scan line. It uses a differential scheme for the least significant byte (effectively the horizontal scroll register for that scan line, the X byte) and an absolute value for the most significant byte (effectively the vertical scroll register for that scan line, the Y byte).
- To horizontally scroll the entire screen you only need to modify the byte at 0x0101.
- Changing any X byte (bytes at odd addresses between 0x0101 and 0x01EF) gets you free horizontal scrolling for all scan lines after the one you modified, because of the differential X scheme.
- To horizontally scroll a single scan line, modify its X byte appropriately and apply the negative of that change to the next scan line's X byte.
- To vertically scroll the entire screen you need to modify all the bytes at even addresses from 0x0100 to 0x01EE in a loop, i.e. there is no differential scheme for the Y bytes.
- You can obtain more RAM for code and data by duplicating scan lines (I do this in PucMon): I set scan line 2 to a particular pattern and point scan lines 0 and 1 at scan line 2's memory, then set scan line 117 to another pattern and point 118 and 119 at 117's memory. This frees an extra 4*160 bytes for code and data and gives a chunky border look around the PucMon playfield (with the vertical sections of the border costing 1/3 of the memory).
- You can perform seriously fancy screen wipe, scanline and interleaving effects using these X and Y registers; I'll release a gtBASIC demo of some of the things you can do with the video indirection table when I get a chance.
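The behaviour of the table described above can be modelled in a few lines. This is a sketch, not ROM code: it assumes the default layout where scan line n shows page 0x08 + n with all X deltas at 0, and that the running X offset restarts at 0 at the top of every frame.

```python
SCANLINES = 120
TABLE_BASE = 0x0100  # [Y, dX] pairs for each scan line, 0x0100..0x01EF


def default_table():
    """240-byte model of the table, assuming line n shows page 0x08 + n."""
    table = bytearray(2 * SCANLINES)
    for n in range(SCANLINES):
        table[2 * n] = 0x08 + n  # Y byte: absolute page (even address)
        table[2 * n + 1] = 0     # X byte: delta (odd address)
    return table


def effective_rows(table):
    """(page, x_offset) fetched for each scan line.

    The X offset is a running sum of the X bytes mod 256, assumed to
    start at 0 each frame; the Y byte is used as-is.
    """
    rows, x = [], 0
    for n in range(SCANLINES):
        x = (x + table[2 * n + 1]) & 0xFF
        rows.append((table[2 * n], x))
    return rows


def scroll_screen(table, dx):
    # Only the first X byte (0x0101) changes; every later line inherits it.
    table[1] = (table[1] + dx) & 0xFF


def scroll_line(table, n, dx):
    # Shift line n by dx, then cancel the change on line n + 1.
    table[2 * n + 1] = (table[2 * n + 1] + dx) & 0xFF
    if n + 1 < SCANLINES:
        table[2 * n + 3] = (table[2 * n + 3] - dx) & 0xFF


def share_line(table, n, source):
    # Point line n at another line's page, freeing line n's own page.
    table[2 * n] = table[2 * source]
```

For example, `share_line(t, 0, 2)` and `share_line(t, 1, 2)` mimic the PucMon trick of reusing scan line 2's memory for the top border.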
steve wrote: ↑22 Jul 2020, 19:16
Note that with the easy addressing, [0,0] can be used as an additional auxiliary register!
I'm assuming you mean a zero register for easy access in code? If so, Marcel already initialises 0x0000 to 0 and 0x0080 to 1; the firmware uses these (and vCPU can as well) as constants, but also as a simple LUT for converting sign bits to false/true.
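The sign-bit trick works because masking a byte with 0x80 leaves exactly one of the two initialised addresses. The model below only shows the idea in Python; it does not reproduce the actual native instruction sequence used in the firmware.

```python
# Model of zero page with the two constants described above.
ram = bytearray(256)
ram[0x00] = 0
ram[0x80] = 1


def sign_to_bool(value):
    """Map the sign bit of an 8-bit value to 0/1 with one table lookup.

    value & 0x80 is either 0x00 or 0x80, which are exactly the two
    initialised locations, so no branch is needed.
    """
    return ram[value & 0x80]
```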
steve wrote: ↑22 Jul 2020, 19:16
**Increasing the clock frequency**
Increasing the clock frequency would minimize the effect of those hardware simplifications that require additional instructions on the software side.
I agree it most likely would, but as ROMv3y shows the limitations of a synchronously coupled system (i.e. video timing tied to the base clock rate), I think you would find it difficult to produce a satisfactory display using the current video system logic at anything other than 6.25 MHz (satisfactory is of course subjective, but I mean clean signals, stable syncs on the majority of VGA monitors, correct handling of underscan/overscan/centering, etc.).
A high-res mode that sacrifices colour fidelity for spatial resolution but effectively keeps the same video timing would work (as outlined in the hires video thread), but the only real way (IMHO) to allow proper video timing at any base clock rate is to completely decouple the video timing logic from the Native code (this is something I am currently working on). Not only would this free the Native code from video timing constraints (to some extent), but it would open up a whole world of extra capabilities in colour and resolution depth, even allowing for paletted modes, e.g. 256/4096, while still staying backward compatible with the current VRAM/pixel layout and video timing.
steve wrote: ↑22 Jul 2020, 19:16
Note that to stay compliant with the VGA pixel frequency of 25.175 MHz, the usable clocks higher than the 6.29375 MHz (/4) one would be 12.5875 MHz or 8.3917 MHz (/2 and /3 respectively). And surely any multiple of 6.29375 MHz or of the others can also fit.
This could work, but once again it would not be trivial to implement in the firmware (you would effectively need multiple video generation loops, as there already are for the different scanline modes), and generating the clocks might be more difficult than you think. Esoteric clock frequencies can be hard to generate/obtain: you can have bespoke crystals cut to whatever frequency you desire (this used to be expensive, and I have no idea if the service is still offered to the general consumer), or you can generate clocks using divisors and state machines, but then you can end up with asymmetrical waveforms if you are not careful, or quickly see your chip count balloon.
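One extra thing worth checking for each candidate clock is whether a whole number of CPU cycles fits in one VGA line. The sketch assumes standard 640x480 @ 60 Hz timing with 800 pixel clocks per line (visible plus blanking); it is arithmetic only, not a statement about how the firmware would be written.

```python
from fractions import Fraction

VGA_PIXEL_HZ = Fraction(25_175_000)  # standard 640x480 @ 60 Hz pixel clock
H_TOTAL = 800                        # pixel clocks per line, visible + blanking


def clock_options(divisors=(4, 3, 2)):
    """For each divisor return (divisor, cpu_hz, cpu_cycles_per_line) as Fractions."""
    return [(d, VGA_PIXEL_HZ / d, Fraction(H_TOTAL, d)) for d in divisors]


for d, hz, per_line in clock_options():
    note = "" if per_line.denominator == 1 else " (not a whole number of cycles)"
    print(f"/{d}: {float(hz) / 1e6:.5f} MHz, {float(per_line):.2f} cycles per line{note}")
```

Under this assumption /4 and /2 give 200 and 400 cycles per line, while /3 gives 266 2/3, so an exact horizontal period at 8.3917 MHz would not be possible with a fixed per-line loop.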
steve wrote: ↑22 Jul 2020, 19:16
**Including a keyboard**
With a 32-key (4x8) matrix keyboard, the blinker LED lines might be used for the keyboard row signals and should not interfere much with LED blinking. For the columns a dedicated buffer chip should be used (de facto replacing the current serial chip).
Wattsekunde prototyped and built a matrix keyboard style interface here that you might find interesting:
https://forum.gigatron.io/viewtopic.php?f=4&t=5
https://forum.gigatron.io/viewtopic.php?f=4&t=39
Also see this thread on how and why PS/2 was chosen as the keyboard interface:
https://forum.gigatron.io/viewtopic.php?f=4&t=4
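The row/column scan proposed in the quote can be simulated to show what the software side would do each frame. This is a hypothetical sketch: it assumes active-high signalling, one row driven at a time and all 8 columns read back as one byte through the buffer, and it ignores ghosting from multiple pressed keys.

```python
ROWS, COLS = 4, 8  # 32-key matrix from the quoted proposal


def scan_matrix(pressed):
    """Simulate scanning a 4x8 key matrix one row at a time.

    `pressed` is a set of (row, col) tuples for keys held down. For each
    row, the (hypothetical) row line is driven and the 8 column lines are
    read back as one byte through the column buffer.
    """
    column_bytes = []
    for row in range(ROWS):
        byte = 0
        for col in range(COLS):
            if (row, col) in pressed:
                byte |= 1 << col  # key closes the row/column contact
        column_bytes.append(byte)
    return column_bytes


def decode(column_bytes):
    """Turn the per-row column bytes back into key numbers 0..31."""
    return [row * COLS + col
            for row, byte in enumerate(column_bytes)
            for col in range(COLS)
            if byte & (1 << col)]


print(decode(scan_matrix({(0, 0), (3, 7)})))
```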
steve wrote: ↑22 Jul 2020, 19:16
**Minimizing Program Counter**
Design idea: the most significant byte of the register can be incremented at the end of the page with an unconditional fixed jump to address 0 of the next page. This would allow the use of a standard flip-flop chip instead of the double 161 incrementer. Benefits: one chip less and shorter carry propagation; downside: one or two ROM words per page used for the jump, with on average slightly more in the cases where the Temp/Flag register needs to be saved and/or restored. Note that this limitation applies only to native instructions, not vCPU ones, and only to code that spans multiple pages (very limited, if it exists at all, even in the current implementation).
The 161 counters aren't just incrementers; they are also pretty nifty presettable, cascadable and resettable counters that allow the Native code far jump instruction (jmp Y,D) to exist. The Native code uses the far jump instruction in some critical places, especially for vCPU and 6502 instruction dispatch, SYS calls, etc. Unless I am missing something, replacing the most significant byte of the PC with an auto incrementer would cause you some serious Native coding grief, require a complete rewrite of the firmware and probably require a completely different vCPU instruction set and implementation (i.e. I don't see how you could produce the same vCPU feature set without a Native code far jump).
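To make the difference in sequential behaviour concrete, here is a simplified model of the two program counters. It ignores the Gigatron's branch delay slot, and for the proposed design it assumes the high-byte register can still be loaded by a jump, which is precisely the capability under discussion here rather than a settled fact.

```python
class CounterPC:
    """Current design: 16-bit PC built from cascaded 161 counters."""

    def __init__(self, addr):
        self.addr = addr & 0xFFFF

    def step(self):
        self.addr = (self.addr + 1) & 0xFFFF  # carry ripples into the high byte

    def far_jump(self, y, d):
        self.addr = ((y & 0xFF) << 8) | (d & 0xFF)  # jmp Y,D loads both bytes


class PagedPC:
    """Proposed design (model): only the low byte counts; the high byte
    is a plain register that changes only when a jump loads it."""

    def __init__(self, addr):
        self.high, self.low = (addr >> 8) & 0xFF, addr & 0xFF

    @property
    def addr(self):
        return (self.high << 8) | self.low

    def step(self):
        self.low = (self.low + 1) & 0xFF  # wraps within the same page

    def far_jump(self, y, d):
        self.high, self.low = y & 0xFF, d & 0xFF
```

In this model, falling off the end of a page with `PagedPC` lands back at the start of the same page, which is why the proposal spends the last ROM word(s) of a page on an explicit jump.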
steve wrote: ↑22 Jul 2020, 19:16
**Minimizing the Control Unit**
Design idea: CU Signal "unrolling", using two ROMs, and putting into the ROMs the “already decoded” instructions signals (24 bits for signals and 8 for data).
Besides removing some chips, removing the CU decoding logic (together with the ALU changes described later) also shortens delay paths, increasing the chance of running at higher clock frequencies. Another benefit is the possibility of specifying all the parallel activity an instruction might need, since the signals are stored separately.
A lot of people originally thought the entire CPU circuit of the Gigatron was a ROM LUT, because of the size and modern implementation of the 64Kx16 ROM, not realising that it is purely a storage device for Native code and data in the Harvard architecture that the Gigatron implements. To me it seems the current ROM was chosen purely on price, accessibility and flexibility, giving a simple, 70s/80s-themed upgrade path. The problem with putting any part of the control unit in ROM is speed. If you're going to remain true to the ethos of using only chips from roughly that era, there is no way you can find any kind of ROM that would meet your timing requirements at 6.25 MHz. But if you decide to use a modern ROM, PAL, GAL, CPLD, FPGA, etc., then all bets are off and you will find it trivial to implement as much or as little of the control logic in modern devices as you like.
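The quoted "unrolled" control word (24 signal bits plus 8 data bits) maps naturally onto two 16-bit ROMs read in parallel. The sketch below only shows the bit packing; how the 24 signals would be assigned to bits is left open, and the ROM A/ROM B split is an assumption for illustration.

```python
SIGNAL_BITS, DATA_BITS = 24, 8


def pack(signals, data):
    """Split a 32-bit pre-decoded word across two 16-bit ROMs.

    Layout (assumed): bits 31..8 = control signals, bits 7..0 = data byte.
    Returns (rom_a_word, rom_b_word).
    """
    if not (0 <= signals < 1 << SIGNAL_BITS and 0 <= data < 1 << DATA_BITS):
        raise ValueError("signals must fit in 24 bits and data in 8 bits")
    word = (signals << DATA_BITS) | data
    return word >> 16, word & 0xFFFF


def unpack(rom_a, rom_b):
    """Recombine the two ROM outputs into (signals, data)."""
    word = (rom_a << 16) | rom_b
    return word >> DATA_BITS, word & 0xFF
```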
steve wrote: ↑22 Jul 2020, 19:16
**Minimizing the Arithmetic Logic Unit**
The proposal is to use loads and lookup tables instead of ALU dedicated chips for operations, but remain efficient (and having even all conditional signals) with a purposefully designed architecture composed of an "expanded" MAU, two "mixing" registers, one temp/"flag" register and one to combine nibbles results. For the comprehensive details and some code examples please have a look at the overall picture later on.
Where would the LUTs live? In ROM you have the complexity of the Harvard architecture, which makes ROM lookups and ROM LUTs non-trivial and inefficient (the Gigatron's Harvard architecture always has a 1:1 instruction:data pair). In RAM you get the benefit of software control and configuration of your actual ALU, but RAM is already a scarce and fragmented commodity (I do like the RAM idea though).
steve wrote: ↑22 Jul 2020, 19:16
This way of operating might be a bit slower in doing logical and arithmetic operations, but it can cover in the same efficient way whatever operation you would need, and having more registers (and also flags) can be even faster in some operations. And surely is fully in line with the gigatron philosophy of using software instead of hardware!
I think it would be significantly slower at the Native code level if it were all done in software without hardware assistance. Current hardware ALU operations take 1 clock cycle; implemented in software they would, I'd guess, be at least an order of magnitude slower, as you would have to decode the ALU operation in software, build a LUT address out of the operands, fetch the result from the RAM/ROM LUT and then write it to your destination (ROM would be significantly slower and less efficient than RAM).
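As an illustration of the LUT approach and its costs, here is a generic nibble-table 8-bit adder. It is a well-known technique, not the specific MAU/mixing-register design from the quoted post: two 256-entry tables (512 bytes of RAM in total) replace the adder chips, and each 8-bit add needs two index computations and two lookups plus the carry handling between them.

```python
def build_tables():
    """Two 256-entry nibble-addition LUTs indexed by (a_nib << 4) | b_nib.

    Each entry holds the 4-bit sum in bits 0..3 and the carry-out in bit 4.
    ADD assumes carry-in 0; ADC assumes carry-in 1.
    """
    add, adc = bytearray(256), bytearray(256)
    for a in range(16):
        for b in range(16):
            add[(a << 4) | b] = a + b      # 0..30 fits in 5 bits
            adc[(a << 4) | b] = a + b + 1  # 1..31
    return add, adc


ADD, ADC = build_tables()


def add8(a, b):
    """8-bit add using table lookups instead of adder chips.

    Returns (sum, carry_out).
    """
    lo = ADD[((a & 0x0F) << 4) | (b & 0x0F)]  # low nibbles, carry-in 0
    table = ADC if lo >> 4 else ADD           # pick table from low carry
    hi = table[(a & 0xF0) | (b >> 4)]         # high nibbles
    return ((hi & 0x0F) << 4) | (lo & 0x0F), hi >> 4


print(add8(0x3C, 0xD9))
```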
steve wrote: ↑22 Jul 2020, 19:16
**Ending**
I have experience in microprocessor architectures and I've realized various assembler applications, but as for electronics, this would be my first "not trivial" realization! Any feedback, suggestion, or further improvement is more than welcome!
I'm glad to know you managed to read till here; hope you found it interesting!
In the meantime, happy hacking!
_Stefano
It's a non-trivial undertaking that you have set yourself, and a truly formidable firmware update if you plan to implement so much of it in software. I think most of your ideas are based on a solid foundation and are absolutely doable, with varying degrees of difficulty.
The success of your project is completely up to you. If I were approaching a change of this magnitude, I would start off small and forget about the big picture (though the first thing I would ask myself is how much compatibility I want with the current Gigatron at all levels). Once I had answered that question, I would attack the problems from small/easy to large/difficult, in that order.
I have no doubt that you can optimise the current Gigatron design in a myriad of ways whilst remaining true to its original design principles. Good luck, and keep this thread updated with your ideas, successes and failures, as all of it is part of the learning journey.