v6502
Posted: 14 Jun 2019, 23:30
I just pushed to GitHub some experimental code that might become part of the platform: a 6502 emulator. This is very much a work in progress, based on an idea I had been pondering for a while and that needed to get out.
v6502 is a MOS 6502 emulator written in 8-bit native Gigatron code that can coexist with the Gigatron's video and sound generation. It's about 1K in size, or more than double that of vCPU. It follows much the same implementation principles as our 16-bit vCPU that runs all applications. The MOS 6502 of course is the microprocessor that formed the heart of the Apple 1, Apple II, Atari 2600, Commodore 64, BBC Micro, NES game console and Tamagotchi. If you want to know how it works, watch this amazing talk.
This new emulator operates as an alternative to the 16-bit vCPU. It is easy to switch between the two, even within the same VGA scanline. So we can now say that "the microcomputer without microprocessor is becoming dual core"!
I haven't benchmarked it, but we can estimate. The processing of each 6502 instruction is split into a fetch phase and an execute phase. The fetch reads the instruction and its operands and updates the program counter. The execute then performs the action. Each of these two steps typically takes up to 38 clock cycles (depending on complexity; sometimes it's still a bit more). To improve the granularity of execution, the two steps can be scheduled on different VGA scanlines. This is much more advanced than vCPU, but necessary because of the relative complexity of 6502 instructions. This way a VGA scanline can typically accommodate 3-5 such steps, so on average we should get almost 2 full instructions done per scanline. In the fastest video mode we have (521-120)*60 = 24,060 scanlines per second available, so we should get around 0.05M instructions per second. With even more handwaving we can hope that's about 5-10 times slower than the original 1 MHz 6502 chip.
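To make the handwaving explicit, here is the same back-of-envelope estimate as a few lines of Python. It only uses the figures quoted above. The function names are mine, and the average cycle count passed to the slowdown function is an assumption (real 6502 instructions take 2 to 7 clock cycles each), so treat the result as a rough sketch, not a benchmark.

```python
# Back-of-envelope throughput estimate, using the figures quoted in the post.
SCANLINES_PER_FRAME = 521        # the 521 in (521-120)*60
SCANLINES_NOT_AVAILABLE = 120    # the 120 subtracted in the fastest video mode
FRAMES_PER_SECOND = 60
INSTRUCTIONS_PER_SCANLINE = 2    # "almost 2", so this is an optimistic bound

def free_scanlines_per_second():
    """Scanlines per second that are available for running v6502 steps."""
    return (SCANLINES_PER_FRAME - SCANLINES_NOT_AVAILABLE) * FRAMES_PER_SECOND

def estimated_instructions_per_second():
    """Optimistic v6502 instruction rate."""
    return free_scanlines_per_second() * INSTRUCTIONS_PER_SCANLINE

def slowdown_vs_real_6502(avg_cycles_per_instruction, clock_hz=1_000_000):
    """How many times slower than a real 6502 at clock_hz.

    avg_cycles_per_instruction is an assumption supplied by the caller.
    """
    real_rate = clock_hz / avg_cycles_per_instruction
    return real_rate / estimated_instructions_per_second()

print(free_scanlines_per_second())          # 24060
print(estimated_instructions_per_second())  # 48120
```

For example, assuming an average of 4 cycles per instruction on the real chip, `slowdown_vs_real_6502(4)` comes out a little over 5, which is at the optimistic end of the 5-10 range.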
v6502 code will always be slower than vCPU code. That's because it processes only 8 bits per instruction, and because it must emulate a lot of stuff that might not be needed, such as page crossing and status flags. For example, for the ADC instruction the execute step alone takes 38 Gigatron cycles (6 µs), and that just adds two 8-bit values. The addition itself is something the hardware can do in a single cycle. The other 37 cycles are mostly just busy work to figure out the carry and overflow status flags. For reference, vCPU's 16-bit ADDW instruction takes 28 cycles, and that includes both fetch AND execute. Much more efficient.
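To show what that busy work has to compute, here is a small Python sketch of the flag logic for a binary-mode ADC. This is the standard 6502 rule set, not a transcription of the native Gigatron code. The function name and return layout are my own, and decimal mode is deliberately left out.

```python
def adc(a, operand, carry_in):
    """Binary-mode 6502 ADC on 8-bit values (decimal mode not modelled).

    Returns (result, carry, overflow, zero, negative).
    """
    total = a + operand + carry_in
    result = total & 0xFF
    carry = total > 0xFF
    # Signed overflow: both inputs have the same sign bit, and the
    # result's sign bit differs from it.
    overflow = ((a ^ result) & (operand ^ result) & 0x80) != 0
    zero = result == 0
    negative = (result & 0x80) != 0
    return result, carry, overflow, zero, negative

# $50 + $50 = $A0: no carry out, but signed overflow (80 + 80 does not fit in -128..127)
print(adc(0x50, 0x50, 0))  # (160, False, True, False, True)
# $FF + $01 = $00: carry out, no signed overflow (-1 + 1 = 0)
print(adc(0xFF, 0x01, 0))  # (0, True, False, True, False)
```

In a high-level language this is a handful of expressions. On a CPU without flags of its own, each of them has to be derived with explicit instructions, which is where the cycles go.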
But the attraction is not the speed: it's the familiarity and the code base that is out there. MS BASIC with floating point? MicroChess? The coolness of implementing a 6502 with half of the gates that are in the actual 6502...? It should still be faster than the MOnSter 6502 (and that one doesn't do video).
I'll keep you posted!
P.S: Some technical details from the commit message:
v6502: 50% implemented, basic testing ok on actual hardware
- All of the addressing modes implemented
- Half of the instructions implemented
- Linear address space, unlike the underlying hardware
- Most status flags implemented, including overflow flag for ADC/SBC
- Smooth switching between vCPU and v6502:
  - The v6502 program counter is vLR, and v6502 doesn't touch vPC
  - Resuming vCPU with the BRK instruction
  - vCPU can then save/restore the vLR with PUSH/POP
  - Stacks are shared, vAC is shared. High vAC byte will be cleared by BRK
  - 6502 can also set vPC before BRK, and vCPU will continue there (at vPC+2)
- Makefile targets for subproject ('make mos' and 'make burnmos')
- Cycle correct video timing for demo program below
Code:
[def { org $0202 }
$EA# { 0202 nop }
$AE# $00# $08# { 0203 ldx $0800 }
$E8# { 0206 inx }
$FE# $00# $08# { 0207 inc $0800,X }
$D0# $FA# { 020a bne *-6 }
$00# { 020c brk }
] \vLR= { = v6502 PC }
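For readers who don't have the 6502 opcodes memorised, here is a minimal Python sketch that interprets just the five opcodes used by the listing above. It is plain Python for illustration, not how v6502 is implemented, and the names `make_memory` and `run_demo` are hypothetical. Only the zero flag is tracked, since that is all BNE needs. Note that the branch offset $FA is -6 counted from the address after the BNE ($020c), so the loop re-enters at the INX at $0206.

```python
PROGRAM = bytes([0xEA,               # 0202 nop
                 0xAE, 0x00, 0x08,   # 0203 ldx $0800
                 0xE8,               # 0206 inx
                 0xFE, 0x00, 0x08,   # 0207 inc $0800,X
                 0xD0, 0xFA,         # 020a bne (back to 0206)
                 0x00])              # 020c brk
ORG = 0x0202

def make_memory():
    """64K of zeroed RAM with the demo program loaded at ORG."""
    mem = bytearray(0x10000)
    mem[ORG:ORG + len(PROGRAM)] = PROGRAM
    return mem

def run_demo(mem, pc=ORG):
    """Run until BRK. Returns (x, instruction_count)."""
    x, zero, count = 0, False, 0
    while True:
        op = mem[pc]
        count += 1
        if op == 0xEA:                       # NOP
            pc += 1
        elif op == 0xAE:                     # LDX absolute
            x = mem[mem[pc + 1] | (mem[pc + 2] << 8)]
            zero = x == 0
            pc += 3
        elif op == 0xE8:                     # INX
            x = (x + 1) & 0xFF
            zero = x == 0
            pc += 1
        elif op == 0xFE:                     # INC absolute,X
            addr = ((mem[pc + 1] | (mem[pc + 2] << 8)) + x) & 0xFFFF
            mem[addr] = (mem[addr] + 1) & 0xFF
            zero = mem[addr] == 0
            pc += 3
        elif op == 0xD0:                     # BNE relative
            raw = mem[pc + 1]
            offset = raw - 256 if raw & 0x80 else raw
            pc += 2                          # offset counts from the next instruction
            if not zero:
                pc += offset
        elif op == 0x00:                     # BRK: in the post, control returns to vCPU
            return x, count
        else:
            raise ValueError(f"opcode ${op:02X} not handled in this sketch")

mem = make_memory()
mem[0x0800] = 5       # initial X is loaded from here
mem[0x0806] = 0xFE    # becomes $FF, loop continues
mem[0x0807] = 0xFF    # wraps to $00, loop ends
print(run_demo(mem))  # (7, 9)
```

So the program walks upward through the bytes at $0800+X, incrementing each one, and stops at BRK as soon as an increment wraps a byte to zero.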
Code:
{ GCL notation for main loop }
$0b0c \sysFn= { SYS_v6502_v4_80 (preliminary address) }
[do
push { Save start address on stack }
80! { Run v6502 until BRK }
pop { Restore start address }
loop] { Forever }