That sounds nice, but unfortunately it doesn't work like this in my Forth; we can't really have pointers in ROM. Because of the Gigatron's Harvard architecture we can't treat the ROM like RAM and read arbitrary bytes - this is possible on other Harvard architectures like the i8051, but not on the Gigatron. We can hold data in ROM, carried in the operands of instructions, but the only way to access that data is to execute those instructions.
In my case I encode my "pointers" as three instructions: st $ll,[y,x++]; jmp y,$cc; st $hh,[y,x++] - where $ll is the low byte of the address and $hh is the high byte. This copies my pointer to an arbitrary RAM address (set by the Y and X registers), then jumps to another routine at address $cc (in the corresponding ROM page). The second store sits after the jmp because the Gigatron always executes the instruction following a branch before the branch takes effect. NEXT jumps in with y = 0 and x pointing to my W variable in RAM page zero. The routine in ROM page zero just bumps the Interpreter Pointer (IP) by the amount passed in the accumulator (3 in this case), so that the next time around we read the next pointer, and then jumps through W. (It's all a little more complicated than this in reality.) I don't claim this is the best way to do it, and it certainly isn't the only way; it's just how I am doing it for now.
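To make the three-instruction "pointer" concrete, here is a minimal Python model of what executing one such cell does. It is only a sketch: the function name, the W_ADDR location and the dictionary standing in for RAM are assumptions made for illustration, not the actual ROM code.

```python
# Hypothetical model of one encoded pointer cell; all names are illustrative.
W_ADDR = 0x30  # assumed zero-page location of the W variable


def run_pointer_cell(ram, y, x, lo, hi, target):
    """Model `st $ll,[y,x++]; jmp y,$cc; st $hh,[y,x++]`.

    ram is a dict mapping 16-bit RAM addresses to bytes.
    Returns the ROM address control ends up at and the final X register.
    """
    ram[(y << 8) | x] = lo           # st $ll,[y,x++]: store low byte
    x = (x + 1) & 0xFF
    next_pc = (y << 8) | target      # jmp y,$cc: page from Y, offset $cc
    ram[(y << 8) | x] = hi           # st $hh,[y,x++]: runs before the jump lands
    x = (x + 1) & 0xFF
    return next_pc, x


ram = {}
# NEXT enters with y = 0 and x pointing at W; the cell holds the address $1234.
next_pc, x = run_pointer_cell(ram, y=0, x=W_ADDR, lo=0x34, hi=0x12, target=0xCC)
w = ram[W_ADDR] | (ram[W_ADDR + 1] << 8)
print(hex(w), hex(next_pc))
```

After the cell has run, W holds the 16-bit address assembled from the two operands, and control is at $cc in ROM page zero, where the IP would be bumped before jumping through W.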
Actually no, that's not what the constraint is. I have a scheme where each page that holds the start of a Forth word (including threads; this is a Direct Threaded Code Forth) must have a "trampoline" routine at a fixed offset in the page. This is used in determining whether there is sufficient time remaining to execute a word. Again, I don't claim this is the best way to do it, and it certainly isn't the only way; it's just how I am doing it for now. This is why threads need to be contained in a single page. They could perhaps be split across pages, but I haven't worked out a mechanism for this, and haven't needed it yet. It also makes the code that bumps the IP simpler - it doesn't have to detect overflow out of the low byte.
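The single-page constraint can be pictured as a layout check that a compiler might run before emitting a thread. The sketch below is hypothetical: the trampoline's offset and size are made-up values, and a thread is assumed to consist only of three-instruction pointer cells.

```python
PAGE_SIZE = 256          # instructions per ROM page
CELL_SIZE = 3            # three instructions per encoded pointer
TRAMPOLINE_START = 0xF0  # hypothetical fixed offset of the trampoline
TRAMPOLINE_END = 0xFF    # hypothetical last offset of the trampoline (inclusive)


def thread_fits(start_offset, n_cells):
    """True if a thread of n_cells cells starting at start_offset stays
    inside the page and does not overlap the trampoline."""
    if n_cells < 1:
        raise ValueError("a thread needs at least one cell")
    end_offset = start_offset + n_cells * CELL_SIZE - 1
    if end_offset >= PAGE_SIZE:
        return False  # would spill into the next page
    overlaps = start_offset <= TRAMPOLINE_END and end_offset >= TRAMPOLINE_START
    return not overlaps


print(thread_fits(0x00, 80))  # ends at offset 0xEF, just below the trampoline
print(thread_fits(0x00, 81))  # would run into the trampoline
```

A thread that fails the check would have to start in a fresh page, which is what keeps the IP's high byte constant for the whole thread.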
Given that I start out with this constraint, my branches can never leave the current page, so signed 8-bit relative offsets are fine. This works quite nicely, as the move_ip routine already does the required update of the IP. So BRANCH and ?BRANCH just jump back into the thread, where I load the relative movement into the accumulator before jumping to move_ip (at least conceptually - again, some details elided). That's how I ended up wanting to calculate the addresses of labels relative to pc() - my compiler emits labels at branch targets.
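The offset calculation described above can be sketched in Python as follows. The helper names are illustrative, and the model of move_ip assumes it does an 8-bit add on the low byte of the IP only; the real routine has details that are elided here.

```python
def branch_offset(target, origin):
    """Byte to load into the accumulator so that adding it to `origin`
    reaches `target`. Both are ROM addresses and must share a page."""
    if (target >> 8) != (origin >> 8):
        raise ValueError("branch target must be in the same page")
    return (target - origin) & 0xFF  # two's-complement encoding


def to_signed(byte):
    """Interpret an encoded offset as a signed 8-bit value."""
    return byte - 256 if byte >= 128 else byte


def move_ip(ip, acc):
    """Assumed behaviour of move_ip: add the accumulator to the low byte
    of the IP, leaving the page (high byte) untouched."""
    return (ip & 0xFF00) | ((ip + acc) & 0xFF)


back = branch_offset(target=0x1210, origin=0x1240)   # backward branch
fwd = branch_offset(target=0x1250, origin=0x1240)    # forward branch
print(to_signed(back), to_signed(fwd))
print(hex(move_ip(0x1240, back)), hex(move_ip(0x1240, fwd)))
```

In an assembler written in Python, `origin` would come from something like pc() at the point where the offset is emitted, and `target` from the label the compiler placed at the branch destination.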