Virtual 6510 - Technical Overview
Introduction:
Ever wanted to upgrade your C64's speed as often as PC technology
changes? Well, this project aims to be just that: a software/hardware
combination that would give 8-bit Commodore machines a "virtual"
CPU upgrade. The real CPU (a 6510 or 6502 chip) is not replaced
by electronics, but rather by a software emulation of it run on
a modern PC. The PC provides accelerated performance because of
its modern design and faster speed, whilst the real Commodore hardware
provides compatibility as it is used for sound, video and device
I/O.
To make all this possible a hardware adaptor and cable is needed,
in addition to the emulation software.
Work on this project has now stopped due to a general lack of interest in it by the C64 online community. The information below may be of interest to those who want a review of the challenges involved.
Overview of Concept:
The PC runs the V6510 software which emulates the functionality
of the Commodore's CPU. Whilst the emulation software is able to
do the CPU processing, it has to access the real Commodore hardware
for anything to be visible on the screen or for commands to be received
from the keyboard (for example). To do this the software directs
most write operations through to the real Commodore hardware.
Sometimes the PC also needs to read from the real Commodore.
These reads are only needed when a value changes on the C64
hardware that is not otherwise detectable by the PC (for example
a key is pressed or the joystick moves).
Information is exchanged between the PC and the Commodore via
a cable and hardware adaptor. Other accelerators for the Commodore
do essentially the same thing, but instead of a 65C02 or 65816
processor, the PC runs a software model, or "virtual"
processor, which uses its own copy of the memory map. With
the appropriate hardware adaptor, the same software could be adapted
to accelerate all 8-bit CBM computer types.
The benefit of using PC software is that speed and emulation
capability can be developed independently of the availability
to buy a faster 65xxx processor. Simply put, as PC technology
is upgraded, so will the speed of the "virtual" CPU
upgrade for your Commodore. The challenge, as will become clear,
is maximising the speed of the virtual processor whilst handling
all the I/O which needs to be done to keep the real hardware in-sync
with the software emulation.
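The trade-off between raw emulation speed and hardware access can be put into a rough model. The sketch below is only an illustration: the function name, its parameters and the figures in the usage note are assumptions, not measurements from the project.

```python
def effective_mhz(raw_mhz, io_per_cycle, io_cost_us):
    """Estimate virtual speed once hardware accesses are included.

    raw_mhz      -- emulated cycles per microsecond with no hardware access
    io_per_cycle -- average hardware accesses per emulated cycle (assumed)
    io_cost_us   -- time per hardware access in microseconds (assumed)
    """
    if raw_mhz <= 0:
        raise ValueError("raw_mhz must be positive")
    # Time for one emulated cycle = pure emulation time + its share of I/O time.
    time_per_cycle_us = 1.0 / raw_mhz + io_per_cycle * io_cost_us
    return 1.0 / time_per_cycle_us
```

With made-up figures, a 50MHz raw engine that touches the hardware once per 100 emulated cycles at 2us per access, `effective_mhz(50, 0.01, 2.0)`, drops to roughly 25MHz. The slower the port, the more the I/O term dominates.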
How does Virtual6510 work?
The heart of V6510 is the software engine which emulates the 6510
CPU. This software engine reads code and processes it. Most operations
are conducted in the software engine, which includes the Commodore's
memory map and ROMs. Depending upon configuration settings, operations
which write to memory or I/O are written to both the virtual model
and the real hardware.
The operation of writing to real hardware needs a high degree
of optimisation in order to not slow the operation of the software
engine. If the hardware interface could achieve the write operation
in 1us (1 clock cycle) then the process will be faster than a
real C64, and as fast as the SuperCPU. Unfortunately, without
a 32-bit synchronised I/O card on the PC this I/O speed is not
possible.
V6510 proposes to use two priority levels when writing to the
hardware. Firstly, all memory writes are cached to the virtual
map and a byte-by-byte system is proposed for writing the data
to the real hardware, skipping unchanged locations. This method
ensures the memory gets synchronised, the rate of which depends
upon the optimisation mode configured. Secondly, all writes to
Commodore I/O chips (such as the VIC, SID, etc) take priority
over the memory sync bytes.
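The two priority levels can be sketched as follows. This is a minimal illustration, not the V6510 code: the class and method names are hypothetical, the I/O area is assumed to be permanently visible at $D000-$DFFF, and picking the lowest dirty address stands in for whatever sweep order the real design would use.

```python
from collections import deque

IO_START, IO_END = 0xD000, 0xDFFF  # C64 I/O area (VIC-II, SID, CIAs)


class WriteScheduler:
    """Two-level write queue: I/O writes first, then changed memory bytes."""

    def __init__(self):
        self.virtual = bytearray(65536)  # PC-side copy of the memory map
        self.real = bytearray(65536)     # what the PC believes the C64 holds
        self.io_queue = deque()          # high priority: (address, value)
        self.dirty = set()               # low priority: addresses awaiting sync

    def write(self, addr, value):
        """Called by the emulated CPU for every store."""
        self.virtual[addr] = value
        if IO_START <= addr <= IO_END:
            # Every I/O write matters, even if the value repeats.
            self.io_queue.append((addr, value))
        elif self.real[addr] != value:
            self.dirty.add(addr)
        else:
            # Value matches the C64 again, so no transfer is needed.
            self.dirty.discard(addr)

    def next_transfer(self):
        """Return the next (address, value) to send to the C64, or None."""
        if self.io_queue:
            return self.io_queue.popleft()
        if self.dirty:
            addr = min(self.dirty)  # simplification: lowest address first
            self.dirty.remove(addr)
            self.real[addr] = self.virtual[addr]
            return addr, self.virtual[addr]
        return None
```

In this sketch a write to $D020 queued after a write to screen memory is still sent first, and a store that leaves a location unchanged generates no transfer at all.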
Similar to I/O writes, I/O reads have the highest level priority.
This means that a read from an I/O location which has the potential
to change will be read from the hardware in preference to the
memory sync operations. Some I/O locations do not change on their
own and so data from these is read from the virtual map (eg a
sprite location will not change under hardware control, it only
changes by software).
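A sketch of how reads might be routed is shown below. The names are hypothetical and the set of "volatile" registers is only a small assumed sample ($D012 is the VIC-II raster register, $DC00/$DC01 are the CIA 1 ports used for keyboard and joysticks); the full list used by V6510 is not given in this overview.

```python
# Registers that can change without the CPU writing to them (sample only).
VOLATILE_IO = {0xD012, 0xDC00, 0xDC01}


class ReadRouter:
    """Serve reads from the virtual map unless the hardware may have changed."""

    def __init__(self, hardware_read):
        self.virtual = bytearray(65536)     # PC-side copy of the memory map
        self.hardware_read = hardware_read  # callable: address -> byte
        self.hardware_reads = 0             # counts slow round trips to the C64

    def read(self, addr):
        if addr in VOLATILE_IO:
            self.hardware_reads += 1
            value = self.hardware_read(addr) & 0xFF
            self.virtual[addr] = value      # keep the virtual map up to date
            return value
        # Everything else only changes under software control.
        return self.virtual[addr]
```

A read of the sprite 0 X position at $D000 is answered from the virtual map, whilst a read of $DC00 goes out to the real machine.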
The actual hardware read and write operations are synchronised
by the C64's readiness for data (thus taking into account the
VIC-II stolen cycles). Data however is sent in an economical fashion
using tokens, meaning that most operations need only one or two
PC I/O calls. In between token requests, the PC runs at full speed
to process code. When a token has been received by the C64, it
executes one of its microcode routines, typically consisting of
3-4 instructions. Because microcode is used rather than DMA,
each C64 read or write operation takes longer than it would on
either a real C64 or SuperCPU. The efficiency of the microcode
minimises the delay. With the use of appropriate optimisation
modes (modes similar to the SuperCPU as well as smart modes) I/O
delays are minimised, and memory synchronising is fast enough to
minimise most graphics aliasing effects. C64 memory will only be
read back into the virtual map after a disk LOAD operation.
There is one other situation where the C64's CPU is relied upon,
and that is to run timing critical 1MHz disk I/O code. JiffyDOS
routines in particular are critical to within a couple of microseconds.
Certain kernal calls must be made inside the real machine rather
than by simulation on the virtual engine. This is similar to what
a SuperCPU does as it also has to slow down (it has a patched
kernal) before doing disk I/O.
Proposed hardware adaptor:
The block diagram shows the proposed functionality of the hardware
adaptor. The implementation is achieved with several EPROMs and
a latch. One of the EPROMs is configured to act as a low-speed Programmable
Logic Array (PLA). The adaptor is expected to be most neatly configured
as a cartridge; potentially minimising the effort of building the
project by recycling an existing CBM cartridge design. Similar adaptors
could be configured for Commodores other than the C64.
System Issues which had to be Overcome:
This project requires a circuit to adapt and synchronise the signals
from the PC to the C64 and vice versa. The circuit is more complex
than a simple X1541 cable, but is much simpler to build than a SuperCPU
(especially so if a specific CBM cartridge is used as the donor!)
The need for the circuit comes from the fact that I/O communications
on the PC bus can be severely limited in both speed and bus width.
The limitation means that the PC (even the very fast ones) cannot
directly write and read into the Commodore's bus. Additionally,
there is no standard I/O port available on all PCs which has sufficient
control lines.
The LPT port used for the X-cables is a robust connection that
is standard on most PCs. Unfortunately, the LPT connection is
also effectively an 8-bit data bus with 4-bit control and 5-bit
status lines.
In SPP or Bi-Dir modes the maximum speed the LPT port can reach is
approximately 1MHz, but on most machines it is much slower. This
limitation is largely independent of processor speed and exists
for historical reasons. The LPT port could be operated in an enhanced
mode such as EPP, but this only provides a marginal increase in
data throughput (max. 2.4MB/s) whilst it mandates a significant
increase in hardware complexity between the PC and C64.
On the other hand, developing a PC-Card to handle the data exchange
in hardware is complex and costly, and it is also unlikely users
will want to build their own. Additionally, the I/O limitation described
above for the LPT port also applies to other conventional cards
on the PC-bus and so only the complex 32-bit I/O buses such as VLB
or PCI would improve speeds. With a 32-bit bus it may be possible
to Direct Memory Access (DMA) the Commodore hardware and also minimise
the software overhead in accessing the card. However, these 32-bit bus
modes need special I/O chips to comply with the operating
standard and thus further increase cost and complexity. Also, only
the C64/C128 has DMA capability and thus the project would in that
case be restricted to just those two machines.
Clearly 8-bit PC I/O is the limitation, and intelligence, rather
than power, is necessary if a budget solution is to be found.
Synchronous vs Asynchronous Emulation
Certain elements of the simulation are synchronous whilst others
are not.
Whilst exchanges between the PC and C64 are synchronised (to
the Commodore's readiness), the software CPU emulation is not.
The use of an asynchronous CPU model improves processing speed
(or virtual MHz).
Consider a JMP instruction: all that is needed is the operation
PC=new_value. On the other hand an instruction such as LDA #$00
needs A=0, Z=1, N=0, etc. The JMP instruction is emulated faster
by the virtual engine than the LDA (the opposite of what happens
on the real CPU).
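The difference in emulation cost can be seen in a small sketch of the two handlers. The class and function names are hypothetical and only the two addressing modes mentioned are shown (JMP absolute is opcode $4C, LDA immediate is opcode $A9); the program counter is assumed to point at the opcode.

```python
class CPU:
    """Minimal register set for the two example instructions."""

    def __init__(self):
        self.pc = 0       # program counter
        self.a = 0        # accumulator
        self.z = False    # zero flag
        self.n = False    # negative flag


def op_jmp_abs(cpu, mem):
    """JMP $hhll: a single assignment to the program counter."""
    lo = mem[(cpu.pc + 1) & 0xFFFF]
    hi = mem[(cpu.pc + 2) & 0xFFFF]
    cpu.pc = lo | (hi << 8)


def op_lda_imm(cpu, mem):
    """LDA #$nn: load A, then update two flags and advance the PC."""
    value = mem[(cpu.pc + 1) & 0xFFFF]
    cpu.a = value
    cpu.z = (value == 0)         # Z is set when the loaded value is zero
    cpu.n = bool(value & 0x80)   # N mirrors bit 7 of the loaded value
    cpu.pc = (cpu.pc + 2) & 0xFFFF
```

The JMP handler finishes after one store to the emulated registers, whereas the LDA handler has four, even though the real 6510 needs three cycles for the JMP and only two for the LDA.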
Wherever possible, the virtual engine would assume the state
of some inputs and continue. This is particularly beneficial in
polling loops where sampling the real input at a faster speed
than was possible on the original hardware is not advantageous
(eg, when polling for a joystick click, reading the port at 50MHz
is no more advantageous than reading it at 1MHz).
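One way to assume the state of an input inside a polling loop is to reuse the last sample until a refresh interval has passed. The sketch below is an assumption about how this could be done, not the project's method: the class name, the `refresh_cycles` parameter and the use of an emulated cycle counter as the clock are all hypothetical.

```python
class PolledInput:
    """Cache a hardware input and refresh it only every so many cycles."""

    def __init__(self, hardware_read, refresh_cycles=1000):
        self.hardware_read = hardware_read    # callable returning the port value
        self.refresh_cycles = refresh_cycles  # assumed refresh interval
        self.cached = None                    # last value seen on the real port
        self.last_sample = 0                  # cycle count of the last real read
        self.hardware_reads = 0               # counts slow round trips to the C64

    def read(self, now_cycles):
        """Return the port value as seen at emulated cycle `now_cycles`."""
        stale = (self.cached is None or
                 now_cycles - self.last_sample >= self.refresh_cycles)
        if stale:
            self.cached = self.hardware_read()
            self.last_sample = now_cycles
            self.hardware_reads += 1
        return self.cached
```

A tight joystick loop then costs one real port access per interval rather than one per iteration, leaving the rest of the time for emulation and memory synchronisation.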
Advanced Options for the Future
The emulation software could be continually developed to make use
of more features available on modern PC hardware. For example, a
16MB REU is not easily obtainable in real hardware, but in software
such a device is configurable within the RAM of a typical modern
PC. Likewise, serial modems and other expansions could be emulated.
Feasibility Testing and PC Survey
Several PC configurations have already been assessed and more data
is required, especially for faster PC types. The data gathered so
far suggests that even though 486-class machines can achieve 5MHz
virtual operation, the need for memory synchronisation and I/O access
is likely to eliminate much of this power and result in no significant
speed acceleration.
High-end PC processors in the 500-600MHz range can achieve upwards
of 50MHz when operating on simple instructions, but more typically
achieve upwards of 15MHz with the benchmarking code and allowances
made for memory synchronisation and I/O access.
The project is now entering the public feasibility phase. If
you would like to take part in the survey send an email to
v6510@lycos.com for a link to the
software test suite. The test suite will check the speed of emulation
possible on your machine. It would be appreciated if you could
email back the result file with some feedback and comments so
I can assess whether this project is worth pursuing in the future.