hnaves wrote: ↑14 May 2021, 21:30
I am still far from understanding the big picture though... How do you plan to implement the words? Are you going to implement the full set of words of the Forth standard in native code (ROM)?
It's hard to see the big picture when it's only half drawn.
In ROM, yes, in native code, no.
To the extent that there is a big picture, it's roughly this: there are more or less two Forths - and only one of them has been written. View this as a proof of concept - that hasn't yet proven the concept.
The idea is that when the user starts the system, there is nothing in memory except the dictionary. It's all available to the user. All of the code lives in, and runs from, ROM. From the user's point of view this will be a fairly conventional indirect-threaded code system - when the user types a colon definition, it's compiled as a list of pointers to pointers to code, as in Jonesforth (and your system, I believe). The only odd part is that the first pointer is a RAM address (usually a pointer into the dictionary entry) and the second a ROM address (because the code is in ROM). This is what I call RAM mode, and only bits of it exist.
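To make the double indirection concrete, here's a toy model in Python. It isn't Gigatron code and it isn't taken from the repository: the addresses, the end marker and all the names are invented for illustration.

```python
# Toy model of the RAM-mode (indirect-threaded) layout.
# Every address, name and the end marker are hypothetical.

ROM = {}    # ROM address -> Python callable standing in for native code
RAM = {}    # RAM address -> 16-bit cell
stack = []  # data stack


def rom_word(addr):
    """Register a Python function as the 'native code' at a ROM address."""
    def register(fn):
        ROM[addr] = fn
        return fn
    return register


@rom_word(0x1000)
def dup():
    stack.append(stack[-1])


@rom_word(0x1010)
def mul():
    b = stack.pop()
    a = stack.pop()
    stack.append(a * b)


# Dictionary entries live in RAM; each code field holds a ROM address.
RAM[0x0800] = 0x1000  # code field of DUP
RAM[0x0810] = 0x1010  # code field of *

# Thread for  : SQUARE DUP * ;  compiled as a list of RAM addresses.
SQUARE = 0x0900
RAM[0x0900] = 0x0800
RAM[0x0902] = 0x0810
RAM[0x0904] = 0       # end marker, a stand-in for EXIT


def run(ip):
    while RAM[ip] != 0:
        code_field = RAM[ip]         # first pointer: a RAM address
        rom_addr = RAM[code_field]   # second pointer: a ROM address
        ROM[rom_addr]()              # run the "native" code
        ip += 2


def square(n):
    stack.append(n)
    run(SQUARE)
    return stack.pop()
```

Calling `square(7)` walks the thread, follows each RAM pointer to a code field, and from there into "ROM".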
The other part is what I call ROM mode. This is compiled Forth code that lives in ROM. This is a direct-threaded code system, and it's the bit that does exist. In this system threads are compiled as pointers to code, or they would be if pointers were a thing in the ROM. I encode them as sequences like this (and yes, I know what you're going to say, this is not a dense encoding; in a lot of cases things could be done a lot better).
Code:
st $low-byte [Y, X++]
jmp Y, <address of a zero-page routine that moves the IP>
st $high-byte [Y, X++]
This is what that bootstrap compiler is doing.
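As a rough illustration of that encoding (not the actual bootstrap compiler), here's a small Python sketch that turns a list of 16-bit addresses into the three-instruction sequences shown above. The function name `encode_thread` and the `mover` label are made up, and the output mimics the notation of the listing rather than real assembler syntax.

```python
def encode_thread(addresses, mover="<address of a zero-page routine that moves the IP>"):
    """Emit one st / jmp / st triple per 16-bit address in the thread."""
    lines = []
    for addr in addresses:
        low = addr & 0xFF
        high = (addr >> 8) & 0xFF
        lines.append(f"st ${low:02x} [Y, X++]")   # low byte first
        lines.append(f"jmp Y, {mover}")           # jump to the IP-moving routine
        lines.append(f"st ${high:02x} [Y, X++]")  # high byte follows the jump
    return lines


if __name__ == "__main__":
    for line in encode_thread([0x1234, 0xABCD], mover="move-ip"):
        print(line)
```

Every cell of the thread costs three ROM instructions, which is where the "not a dense encoding" complaint comes from.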
hnaves wrote: ↑14 May 2021, 21:30
I see labels such as forth.DO-DOCOL-ROM and forth.DO-DOCOL-RAM, which makes me wonder if some of the words have two versions... Additionally, I see labels such as forth.core.RSHIFT, forth.core.ABS, which are non-trivial words, so it seems that you are implementing everything in ROM....
In a DTC system, every thread starts with some machine code which updates the return stack, moves the IP, etc. That's all the DO-DOCOL stuff you're seeing in the listing. Because we have to do different things when being called from RAM mode, we actually have two prefixes - in principle some words might only need one or the other, but for now I think I always put in both.
As you'll surmise, there are two implementations of NEXT: one that treats the IP as a RAM pointer and reads the next address from RAM, and one that executes a "ROM pointer". Which one is in use is determined by a zero-page variable.
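In toy form, the dispatch looks something like the Python sketch below. This is only a model of the idea: the class, the constants and the memory layout are all hypothetical, and the step of three per ROM cell is an assumption based on the three-instruction encoding.

```python
# Toy model of a mode-dependent NEXT. All names and layouts are hypothetical.

RAM_MODE, ROM_MODE = 0, 1
RAM_CELL_SIZE = 2  # a 16-bit pointer takes two bytes of RAM
ROM_CELL_SIZE = 3  # assumption: one encoded pointer is three ROM instructions


class Interp:
    def __init__(self, ram, rom):
        self.ram = ram        # RAM address -> 16-bit cell
        self.rom = rom        # ROM address -> address encoded at that spot
        self.mode = ROM_MODE  # stands in for the zero-page mode variable
        self.ip = 0

    def next_ram(self):
        """Treat the IP as a RAM pointer and read the next address from RAM."""
        target = self.ram[self.ip]
        self.ip += RAM_CELL_SIZE
        return target

    def next_rom(self):
        """Treat the IP as a ROM pointer and 'execute' the cell found there."""
        target = self.rom[self.ip]
        self.ip += ROM_CELL_SIZE
        return target

    def next(self):
        if self.mode == RAM_MODE:
            return self.next_ram()
        return self.next_rom()


def demo():
    interp = Interp(ram={0x0900: 0x0800}, rom={0x2000: 0x1000})
    interp.mode, interp.ip = RAM_MODE, 0x0900
    from_ram = (interp.next(), interp.ip)
    interp.mode, interp.ip = ROM_MODE, 0x2000
    from_rom = (interp.next(), interp.ip)
    return from_ram, from_rom
```

`demo()` runs one NEXT in each mode and returns the fetched address together with the updated IP for both.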
There will be other things that need to be duplicated, for example, I have a ROM-mode ?BRANCH, but it will need a different implementation for RAM mode. Most of RAM mode can be written in Forth though.
Roughly, I'm trying to have a sensible core written in assembly, but use Forth where possible (like ABS, see core.f). Some things (like RSHIFT) are in assembly because it's fun (have a look at forth/_shift.py! I felt pretty proud of it at the time).
hnaves wrote: ↑14 May 2021, 21:30
I started reading your code, and I found the way you defined the bootstrap compiler in forth/bootstrapforth very interesting.
Interesting is one way of putting it. It seemed like such a simple idea, but the way I had to keep recompiling the same files with different dictionaries to get a version that does the necessary things was... well, confusing to say the least. I feel that there's got to be a better way.
hnaves wrote: ↑14 May 2021, 21:30
Have you thought about implementing a Forth virtual CPU in native code, something like a J1 clone (such as H2)?
No, but it sounds like a fun idea. It would be interesting to see how well it compares. I'm glad that you've made GtForth, because otherwise I was going to have to complete a vCPU Forth after the ROM Forth, just to see if I wasted my time!