v6502

marcelk
Posts: 377
Joined: 13 May 2018, 08:26

Post by marcelk » 14 Jun 2019, 23:30

I just pushed some experimental code to GitHub that might become part of the platform: a 6502 emulator. This is very much a work in progress, based on an idea I had been pondering for a while and that needed to get out.

v6502 is a MOS 6502 emulator written in native 8-bit Gigatron code that can coexist with the Gigatron's video and sound generation. It's about 1K in size, or more than double that of vCPU. It follows many of the same implementation principles as our 16-bit vCPU that runs all applications. The MOS 6502 is of course the microprocessor that formed the heart of the Apple 1, Apple II, Atari 2600, Commodore 64, BBC Micro, NES game console and Tamagotchi. If you want to know how it works, watch this amazing talk.

This new emulator operates as an alternative to the 16-bit vCPU. It is easy to switch between the two, even within the same VGA scan line. So we can now say that "the microcomputer without microprocessor is becoming dual core"! :lol:

v6502-first-spin.png

I haven't benchmarked it, but we can estimate: the processing of 6502 instructions is split into a fetch phase and an execute phase. The fetch reads the instruction and its operands and updates the program counter. The execute then performs the action. Each of these two steps typically takes up to 38 clock cycles (depending on its complexity; sometimes it's still a bit more). To improve the granularity of execution, the two steps can be scheduled on different VGA scanlines. This is much more advanced than vCPU, but necessary because of the relative complexity of 6502 instructions. This way a VGA scanline can typically accommodate 3-5 such steps, so on average we should get almost 2 full instructions done per scanline. In the fastest video mode we have (521-120)*60 scanlines per second available, so we should get around 0.05M instructions per second. With even more handwaving we can hope that's about 5-10 times slower than the original 1 MHz 6502 chip.
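
The estimate above is easy to reproduce. The following is only a back-of-the-envelope sketch in Python using the figures quoted in this post; the constant and function names are made up for illustration, and "2 instructions per scanline" is the optimistic end of "almost 2".

```python
# Rough v6502 throughput estimate, using only the numbers from the post.
TOTAL_SCANLINES = 521          # scanlines per VGA frame
PIXEL_SCANLINES = 120          # scanlines spent on pixels in the fastest video mode
FRAMES_PER_SECOND = 60
INSTRUCTIONS_PER_SCANLINE = 2  # "almost 2", so treat the result as an upper bound


def free_scanlines_per_second():
    """Scanlines per second that are left over for the virtual CPU."""
    return (TOTAL_SCANLINES - PIXEL_SCANLINES) * FRAMES_PER_SECOND


def instructions_per_second():
    """Estimated 6502 instructions per second in the fastest video mode."""
    return free_scanlines_per_second() * INSTRUCTIONS_PER_SCANLINE


if __name__ == "__main__":
    print(free_scanlines_per_second())  # 24060
    print(instructions_per_second())    # 48120, i.e. about 0.05M
```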

v6502 code will always be slower than vCPU code. That's because it processes only 8 bits per instruction, and because it must emulate a lot of stuff that might not be needed, such as page crossing and status flags. For example, for the ADC instruction the execute step alone takes 38 Gigatron cycles (6 µs), and that just adds two 8-bit values. That is something the hardware can do in a single cycle. The other 37 cycles are mostly just busy work to figure out the carry and overflow status flags. For reference, vCPU's 16-bit ADDW instruction takes 28 cycles, and that includes both fetch AND execute. Much more effective.

But the attraction is not the speed: it's the familiarity and the code base that is out there. MS BASIC with floating point? MicroChess? The coolness of implementing a 6502 with half of the gates that are in the actual 6502...? It should still be faster than the MOnSter 6502 (and that one doesn't do video :o)

Keep you posted!


P.S: Some technical details from the commit message:

v6502: 50% implemented, basic testing ok on actual hardware
  • All of the addressing modes implemented
  • Half of the instructions implemented
  • Linear address space, unlike the underlying hardware
  • Most status flags implemented, including overflow flag for ADC/SBC
  • Smooth switching between vCPU and v6502:
    • The v6502 program counter is vLR, and v6502 doesn't touch vPC
    • Resuming vCPU with the BRK instruction
    • vCPU can then save/restore the vLR with PUSH/POP
    • Stacks are shared, vAC is shared. High vAC byte will be cleared by BRK
    • 6502 can also set vPC before BRK, and vCPU will continue there (at vPC+2)
  • Makefile targets for subproject ('make mos' and 'make burnmos')
  • Cycle correct video timing for demo program below

Code:

[def                            {      org $0202   }
        $EA#                    { 0202 nop         }
        $AE# $00# $08#          { 0203 ldx $0800   }
        $E8#                    { 0206 inx         }
        $FE# $00# $08#          { 0207 inc $0800,X }
        $D0# $FA#               { 020a bne *-6     }
        $00#                    { 020c brk         }
] \vLR= { = v6502 PC }

Code:

{ GCL notation for main loop }
$0b0c \sysFn=                   { SYS_v6502_v4_80 (preliminary address) }
[do
  push                          { Save start address on stack }
  80!                           { Run v6502 until BRK }
  pop                           { Restore start address }
  loop]                         { Forever }

monsonite
Posts: 62
Joined: 17 May 2018, 07:17

Re: v6502

Post by monsonite » 15 Jun 2019, 08:25

Hi Marcel,

This is a colossal, yet worthy sub-project - and judging from your test program, you have already passed the point of proving it to be viable - Well Done!

I thought that we hadn't heard from you for a while - and I happened to revisit the Gigatron GitHub ROM repository yesterday morning when I stumbled across v6502 - and my first reaction was WTF!

Having looked through some of the code, I could see that to emulate a 6502 and keep it within the timing constraints of the video generation is no small undertaking.

With a virtual 6502 on board - this opens up access to a vast range of software - and languages such as Pascal, Forth and floating point BASIC that were widely implemented on the 6502. Plus Wozniak's Sweet16 virtual machine as a means of manipulating data and strings within memory.

Where processor-intensive work, such as floating point calculations, is required, I wonder if it would be possible to suppress alternate frames of video, with the only overhead being Hsync and Vsync generation. That would almost double the time available to run v6502 and still produce a recognisable text display.

If vCPU and v6502 can run together within the same scanline, could the task of video generation be devoted to vCPU while v6502 is used for running the application?

I look forward to the official release of this exciting new venture.

Perhaps we can then build one of these.....
Benders_Brain.JPG
https://spectrum.ieee.org/tech-history/ ... ders-brain

cmpxchg
Posts: 7
Joined: 07 Jun 2019, 11:44

Re: v6502

Post by cmpxchg » 15 Jun 2019, 11:15

Very nice! Yet another 'encapsulated processing core' - similar to a Russian doll.
There is one thing which I dearly miss in both cores, and that is interrupt handling - NMI or normal IRQs - within 'reasonable' latency.
It doesn't need to be cycle-exact, as long as the user-level end-purpose - reliable working input and output - is reached.

I am thinking of exploring a separate clock domain next to the Gigatron's own clock, to process external asynchronous events and serial frames.

This second clock domain can be clocked by an analog VCO like a CD4046B, providing a coarse clock for initial symbols, which can be aligned with minimal hardware to the semi-regularly spaced edges in an incoming serial datastream.

The incoming serial datastream could be floppy data at 250 or 500 kbps, decoded in hardware by aligning the VCO clock with the incoming data, or asynchronous UART data, or an Ethernet frame from the physical layer (10BASE-T) - all using hardware to fine-steer the VCO.

The decoded data could be cut into raw 8-bit words and pushed word-for-word into a small dual-ported register bank attached somewhere in the top half of the Gigatron's RAM address space.
A modified bit counter reset circuit could skip the start bit of a UART frame, along with a few other bits that select the mode the 'serial coprocessor' is in.

A simple external 3- or 4-bit register index counter can make a multi-word FIFO. This counter, at another location next to the register file, could be read out by the main processor as well, to get an idea of how many and which of the words are valid - perhaps read twice or thrice (or copied upon update into another register in the Gigatron clock domain).

To output data, the same clock domain and another register file, or another section of the same register file, could be used.


Re: v6502

Post by marcelk » 15 Jun 2019, 11:56

cmpxchg wrote:
15 Jun 2019, 11:15
There is one thing which I dearly miss in both cores, and that is interrupt handling - NMI or normal IRQs - within 'reasonable' latency.
It doesn't need to be cycle-exact, as long as the user-level end-purpose - reliable working input and output - is reached.
I would do that by polling the signals of interest from the video/audio/led/input driver, and then redirecting the virtual CPU that must handle the interrupt by modifying its program counter. The reset interrupt already works like that.
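
To make the idea concrete, here is a toy Python model of "poll in the driver, then redirect the virtual CPU". It is not Gigatron code: `VirtualCpu`, `driver_poll`, `return_from_interrupt` and the handler address are all hypothetical names chosen for this sketch, and saving the old program counter on a stack is just one possible convention.

```python
from dataclasses import dataclass, field

IRQ_HANDLER = 0x0300  # hypothetical address of the interrupt handler


@dataclass
class VirtualCpu:
    """Minimal stand-in for a virtual CPU: a program counter and a stack."""
    pc: int = 0x0200
    stack: list = field(default_factory=list)


def driver_poll(cpu, irq_line_active):
    """Called once per driver pass (e.g. per scanline or per frame).

    If the polled signal is active, remember where the virtual CPU was
    and point its program counter at the handler. Returns True if the
    CPU was redirected.
    """
    if not irq_line_active:
        return False
    cpu.stack.append(cpu.pc)  # so the handler can return later
    cpu.pc = IRQ_HANDLER
    return True


def return_from_interrupt(cpu):
    """Resume the interrupted code."""
    cpu.pc = cpu.stack.pop()
```

The latency of such a scheme is bounded by how often the driver polls, not by the instruction being executed, which fits the "reasonable latency, not cycle-exact" requirement.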


Re: v6502

Post by marcelk » 15 Jun 2019, 12:42

monsonite wrote:
15 Jun 2019, 08:25
Having looked through some of the code, I could see that to emulate a 6502 and keep it within the timing constraints of the video generation is no small undertaking.
For personal reasons I haven't been able to spend much time on these projects recently. I needed a break from tackling the C compiler register allocation issue, because I had burnt too much energy on that already and I was getting stuck. Drafting the v6502 is a surprisingly fun activity during which I can switch off my brain. Getting the timing right is much like solving simple Sudoku puzzles.

monsonite wrote:
With a virtual 6502 on board - this opens up access to a vast range of software - and languages such as Pascal, Forth and floating point BASIC that were widely implemented on the 6502.
That's the emerging common theme of all these subprojects: working towards external standards (PS/2, WozMon, TinyBASIC, SPI, ANSI-C, 6502, ...).

monsonite wrote:
Plus Wozniak's Sweet16 virtual machine as a means of manipulating data and strings within memory.
Or use vCPU for that...

monsonite wrote:
Where processor-intensive work, such as floating point calculations, is required, I wonder if it would be possible to suppress alternate frames of video, with the only overhead being Hsync and Vsync generation. That would almost double the time available to run v6502 and still produce a recognisable text display.
That's also always in the back of my mind. Unfortunately, I don't think there is an interlaced VGA mode that LCD monitors can display. So you need to hook up a CRT and that is holding me back. We can fake interlacing by displaying every other frame as completely black. Or by sending out just one color per scanline in those frames. We can even make the number of such "fast frames" part of the video mode definition ("N out of N+1"). But we have to watch out for photosensitive epilepsy effects.
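
As a rough illustration of what an "N out of N+1" mode could buy, here is a small Python sketch. It reuses the (521-120)*60 figure from the first post for a normal frame in the fastest video mode, and it assumes (my simplification, not something measured) that a "fast frame" frees up all 521 scanlines and that sync overhead can be ignored.

```python
TOTAL_SCANLINES = 521   # scanlines per VGA frame
PIXEL_SCANLINES = 120   # pixel scanlines in the fastest video mode
FRAMES_PER_SECOND = 60


def free_scanlines_per_second(fast_frames_n):
    """Average free scanlines per second when N out of every N+1 frames
    are 'fast frames' that draw no pixels at all (assumption: such a
    frame leaves every one of its scanlines to the virtual CPU)."""
    normal_frame = TOTAL_SCANLINES - PIXEL_SCANLINES
    fast_frame = TOTAL_SCANLINES
    per_group = fast_frames_n * fast_frame + normal_frame
    return per_group * FRAMES_PER_SECOND / (fast_frames_n + 1)


if __name__ == "__main__":
    for n in (0, 1, 3):
        print(n, free_scanlines_per_second(n))
```

Under these assumptions the gain over the fastest existing mode is modest, because that mode already leaves most scanlines free; the gain would be larger relative to the slower video modes.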


Re: v6502

Post by monsonite » 15 Jun 2019, 13:18

Marcel,

Thanks for the prompt reply.

I was thinking that a 30Hz frame rate might be tolerable - but anything less than that would be hard on the eyes and the brain. Films, back in the old days, were shot at 24 fps, and had a sort of motion blur, depending on the shutter angle - that gave them a softer look.

Without the timing constraints of video generation, the Gigatron has a performance that approaches that of a 1MHz 6502. Perhaps we have reached the point where one Gigatron emulates a 6502 and another Gigatron becomes the GPU?

Are there any low hanging fruit - where a native Gigatron instruction performs an exact equivalent of a 6502 instruction?

Currently I am looking at an easy extension to BabelFish using an external Ferroelectric FRAM device - to provide non-volatile storage of user programs.

regards

Ken


Re: v6502

Post by marcelk » 15 Jun 2019, 13:59

monsonite wrote:
Are there any low hanging fruit - where a native Gigatron instruction performs an exact equivalent of a 6502 instruction?
Not really, but there's not much overhead in the emulation either. The 6502 instruction set is more orthogonal than I remembered, and decoding was easier than expected. Implementation wise, there are a lot of "happy little accidents" that could have gone in the wrong direction just as easily. With respect to behaviour, the 6502 has a few modules inside that work in parallel, and you just have to emulate their effect sequentially.

Initially I feared most for the program status register "P". The breakthrough came by splitting it into a P and a Q variable. Q simply gets a copy of the calculation result when the N and Z flags must be remembered. That takes 1 cycle. v6502_BEQ/BNE/BMI/BPL need 1 cycle to read it back and then the hardware condition decoding can do the rest. The "true" P will be reconstructed only with PHP/PLP/RTI.
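
The P/Q split can be illustrated with a small Python model. This is only a sketch of the idea as described above, not the emulator's actual data layout: the class and method names are invented here, and only the PHP direction (rebuilding P from Q) is shown. The bit positions for N ($80) and Z ($02) are the standard 6502 ones.

```python
N_FLAG = 0x80  # standard 6502 negative flag bit in P
Z_FLAG = 0x02  # standard 6502 zero flag bit in P


class LazyFlags:
    """N and Z are not stored as flags; Q keeps the last result instead."""

    def __init__(self):
        self.p = 0x00  # the other flags (C, V, ...); N/Z bits here are ignored
        self.q = 0x00  # copy of the last result that should set N and Z

    def remember_nz(self, result):
        """What an instruction like LDA or INX does: a single store."""
        self.q = result & 0xFF

    # Branch conditions test Q directly, no flag decoding needed.
    def beq_taken(self):
        return self.q == 0

    def bne_taken(self):
        return self.q != 0

    def bmi_taken(self):
        return (self.q & 0x80) != 0

    def bpl_taken(self):
        return (self.q & 0x80) == 0

    def true_p(self):
        """Reconstruct the architectural P, as needed for PHP."""
        p = self.p & ~(N_FLAG | Z_FLAG) & 0xFF
        if self.q & 0x80:
            p |= N_FLAG
        if self.q == 0:
            p |= Z_FLAG
        return p
```

The design choice is to make the common case (most instructions touch N and Z) as cheap as possible, and to pay the reconstruction cost only in the rare instructions that expose P as a byte.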

The hardest feature (apart from decimal mode) is the overflow flag V, insofar as it is generated by ADC/SBC ("Add/Subtract with Carry"). In the 6502 this is the job of a few transistors. The emulator uses 12 "ticks" (24 instructions) just for determining the C and V flags and stuffing them in P. Luckily, no other instruction requires this complex V calculation; even CMP/CPX/CPY don't set it. (Thank you Chuck!) I plan to ignore decimal mode completely.
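
For readers who want to see what C and V have to come out as, here is a reference model of binary-mode ADC in Python. It is a sketch for checking results, not a translation of the native code: `adc` uses a 9-bit sum, which Python can do but an 8-bit machine cannot, and `carry_from_8bit` (a name made up here) shows how the carry can be recovered from the top bits of 8-bit values alone.

```python
def adc(a, m, carry_in):
    """Binary-mode 6502 ADC on 8-bit inputs.

    Returns (result, carry_out, overflow), with both flags as 0 or 1.
    """
    total = a + m + carry_in
    result = total & 0xFF
    carry_out = total >> 8
    # Signed overflow: both operands have the same sign, and the
    # result's sign differs from it.
    overflow = ((a ^ result) & (m ^ result) & 0x80) >> 7
    return result, carry_out, overflow


def carry_from_8bit(a, m, result):
    """Carry out of bit 7, derived without a 9-bit sum.

    There is a carry if both top bits are set, or if exactly one is set
    and the result's top bit came out clear.
    """
    return (((a & m) | ((a | m) & ~result)) >> 7) & 1


if __name__ == "__main__":
    print(adc(0x50, 0x50, 0))  # (160, 0, 1): 80 + 80 overflows a signed byte
    print(adc(0xFF, 0x01, 0))  # (0, 1, 0): unsigned carry, no signed overflow
```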

This is what "ADC" looks like now (it still needs to undergo rigorous testing):

Code:

v6502_ADC:    101b 152a  ld   [$2a],y
              101c 0126  ld   [$26]       ;Carry in
              101d 2001  anda $01
              101e 8118  adda [$18]       ;Sum
              101f f03a  beq  .adc2
              1020 8d00  adda [y,x]
              1021 c227  st   [$27]       ;Update Q
              1022 6118  xora [$18]       ;Overflow flag V
              1023 c218  st   [$18]
              1024 0d00  ld   [y,x]
              1025 6127  xora [$27]
              1026 2118  anda [$18]
              1027 2080  anda $80
              1028 c21d  st   [$1d]
              1029 0127  ld   [$27]       ;Update A
              102a c218  st   [$18]
              102b e82f  blt  .adc0       ;Carry out C
              102c ad00  suba [y,x]
              102d fc31  bra  .adc1
              102e 4d00  ora  [y,x]
.adc0:        102f 2d00  anda [y,x]
              1030 0200  nop
.adc1:        1031 3080  anda $80,x
              1032 0126  ld   [$26]       ;Update P
              1033 207f  anda $7f
              1034 4500  ora  [x]
              1035 411d  ora  [$1d]
              1036 c226  st   [$26]
              1037 140e  ld   $0e,y
              1038 e020  jmp  y,$20
              1039 00ed  ld   $ed
.adc2:        103a c218  st   [$18]       ;Special case
              103b 0126  ld   [$26]
              103c 207f  anda $7f         ;V=0, keep C
              103d c226  st   [$26]
              103e 140e  ld   $0e,y
              103f e020  jmp  y,$20
              1040 00f5  ld   $f5

monsonite wrote:
Currently I am looking at an easy extension to BabelFish using an external Ferroelectric FRAM device - to provide non-volatile storage of user programs.
That's cool. The output bandwidth is now 1 bit per vPulse. Barely ok for small TinyBASIC programs. There is a fully-tested SYS function still commented-out in the source that does 2 bits per vPulse. I left it out of ROM v3, because my monitors didn't like that much pulse width variation. Look for "SYS_SendSerial2_vX_110".


Re: v6502

Post by marcelk » 17 Jun 2019, 11:19

A bit more debugging done, and larger programs are possible. Video link: https://youtu.be/9jHlEjr7xJk


munch.jpg

Code:

        #$85 #$30               { munch sta $30         }
        #$a9 #$00               {       lda #0          }
        #$85 #$31               {       sta $31         }
        #$a9 #$10               {       lda #16         }
        #$85 #$32               {       sta $32         }
        #$a9 #$18               { nextT lda #$18        }
        #$85 #$33               {       sta $33         }
        #$a5 #$33               { nextY lda $33         }
        #$45 #$31               {       eor $31         }
        #$a8                    {       tay             }
        #$a5 #$30               {       lda $30         }
        #$91 #$32               {       sta ($32),y     }
        #$e6 #$33               {       inc $33         }
        #$10 #$f3               {       bpl *-13        }
        #$e6 #$31               {       inc $31         }
        #$10 #$eb               {       bpl *-21        }
        #$a5 #$30               {       lda $30         }
        #$00                    {       brk             }
This simple benchmark suggests it does two 6502 instructions on average for each black VGA scanline. At the fastest video mode, this will be the equivalent of 125,000 cycles per second, or 8 times slower than the original NMOS chip at 1 MHz. Not bad for half the transistors that are in a true 6502.
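
For those who don't read 6502 fluently, the listing above can be paraphrased in Python roughly as follows. This is my reading of the machine code, with a flat 64K `bytearray` standing in for memory; the function name and the interpretation of the zero page variables (colour in $30, counter in $31, pointer in $32/$33) are assumptions made for the sketch.

```python
def munch(memory, colour):
    """Python paraphrase of the 'munch' routine.

    memory is a 64K bytearray, colour is the value in A on entry.
    """
    pointer_low = 16                     # $32: lda #16 / sta $32
    for t in range(0x00, 0x80):          # $31: runs until INC makes it negative
        for page in range(0x18, 0x80):   # $33: pointer high byte, same exit test
            y = page ^ t                 # lda $33 / eor $31 / tay
            # sta ($32),y: the pointer is page:16, indexed by Y
            memory[(page << 8) + pointer_low + y] = colour
    return colour                        # lda $30 before brk


if __name__ == "__main__":
    ram = bytearray(0x10000)
    munch(ram, 0x3F)
    print(ram.count(0x3F))  # number of bytes written at least once
```

Each pass of the outer loop writes one byte per memory page at an offset given by the XOR of the page number and the counter, which is what produces the familiar "munching squares" pattern as the counter advances.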


Re: v6502

Post by marcelk » 23 Jun 2019, 16:41

v6502 now manages to run wozmon. We already have a built-in WozMon of course, but that's a rewrite in GCL/vCPU with extra features (such as backspace).

This is the original Apple-1 wozmon running in v6502 land. This means you can use it to toggle in MOS 6502 code and run it with R.

Of course I needed to patch the memory mapped I/O because our hardware is different. See code in GitHub. Changes I made are:
  1. Relocate to $600 because $FF00/$7F00 conflicts with default screen memory
  2. Relocate zero page variables to above $30
  3. Replace keyboard input ("KBD") with some 6502 instructions that peek at the serial input
  4. Change keyboard codes for CR (13) to LF (10), and "rubout" ($5F) to delete ($7F)
  5. Replace terminal output ("DSP") with vCPU code that can draw characters and scroll the screen
  6. Substitute space separators with a smaller shift, so that the output doesn't wrap around
  7. Everything could be patched without relocating unrelated code
Tongue in cheek, we can say it turns the Gigatron into an Apple-1 clone :D . After all: "Apple1 = 6502+RAM+WozMon+PIA", and "PIA = DSP+KBD". There wasn't more to it and we emulate it all! In reality, any original Apple-1 software will likely need some relocation and a tiny bit of patching for I/O.

Apple1emu.png

I put an emulator version online here, where you can type to try it out. Note that it doesn't emulate the signature cursor symbol (`@'), and letters must be typed in UPPER case!


P.S: The `@' in the screenshot above is a fake cursor symbol: I keyed it in manually for the photo-op.


Re: v6502

Post by marcelk » 03 Jul 2019, 08:28

Today's HaD coverage made me realise that a lot has happened in the last week.

HaD-20190703-small.png

Recent developments include:
  1. All MOS 6502 instructions, addressing modes and flags are implemented(*)
  2. Not all are fully functionally tested yet, but timing-wise (with respect to VGA) it all looks good
  3. We do have big parts of Microchess running
  4. Virtual CPU switching can happen between any two instructions, so within the same VGA scanline
  5. The Apple-1 mockup remaps the Gigatron video a bit (it's software-defined after all). Short story is that wozmon routines are now visible at their standard location in $FFxx, even on the 32K system
  6. An '@' sign is displayed while waiting for input, flashing at 1Hz with a 75% duty cycle
  7. Keyboard input is mapped to upper case automatically, and that also maps DEL to rubout ('_')
  8. All DSP/KBD mockup code is moved to page 3, making the entire input buffer $200..$27F available
  9. Even more of the original wozmon bytes are restored to their authentic value
All for a vastly improved retro experience. And oh, surprisingly, something like

Code:

FF00.FFFF
is much faster than on the original Apple-1.

I'm tempted to support

Code:

E000R
as well (Apple1 BASIC), at the expense of some screen area to make room for it on the 32K system. But VTL02 and Microchess will likely go first to shake out any hidden bugs and to clear the path for larger programs.



(*) Except that ADC/SBC ignore decimal mode, and it will likely remain that way
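
A side note on item 7 of the list above: mapping DEL to rubout falls out naturally if upper-casing is done by clearing bit 5 of every code from $60 upwards, because $7F (DEL) then becomes $5F ('_'). The Python sketch below shows that arithmetic; whether the mockup code does it exactly this way is an assumption, and `to_apple1_key` is a made-up name.

```python
DEL = 0x7F     # ASCII delete
RUBOUT = 0x5F  # '_', used as rubout by the original wozmon


def to_apple1_key(code):
    """Fold the range $60..$7F onto $40..$5F by clearing bit 5.

    Lower-case letters become upper case, and DEL becomes '_' for free.
    """
    if code >= 0x60:
        return code & 0xDF
    return code


if __name__ == "__main__":
    print(chr(to_apple1_key(ord("a"))))  # A
    print(hex(to_apple1_key(DEL)))       # 0x5f
```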
