Wednesday, March 28, 2012

A 65C02 Disassembler

I just wrote a disassembler that runs on the Replica 1. I did it mostly as a personal programming challenge, since there are already lots of them around. Woz did one for the Apple 1 -- I think it was published in Byte and was probably the one included in the Apple II ROMs.

Years ago on my first computer (a 6502-based Ohio Scientific) I wrote one, first in BASIC and then in machine language. I remember it used a somewhat simplified format, e.g. STAX $nn for STA $nn,X and JMPI $1234 for JMP ($1234) etc. It was written on paper and hand-assembled. Once I had a disassembler, it was much easier to catch errors in hand-assembled code.

My implementation is in assembler, written for the CC65 cross-assembler. It supports all 65C02 mnemonics including the Western Design Center (WDC)-only opcodes. The output is virtually identical to Krusader's disassembler. To test it I captured the output of my disassembler and Krusader's and checked for any differences.
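
A comparison like this is easy to script once both outputs have been captured as text. The following is a minimal sketch only; the function name and the "mine" and "krusader" labels are made up for illustration, and it assumes trailing whitespace differences should be ignored.

```python
import difflib


def listing_differences(mine, reference):
    """Return unified-diff lines between two captured listings (empty if they match)."""
    # Ignore trailing whitespace, which terminal captures often add.
    a = [line.rstrip() for line in mine.splitlines()]
    b = [line.rstrip() for line in reference.splitlines()]
    return list(difflib.unified_diff(a, b, "mine", "krusader", lineterm=""))
```

An empty result means the two listings agree line for line.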

I have a standalone version which disassembles memory a screen at a time. I also integrated it into my JMON machine language monitor program to which I added a new U (unassemble) command.

Here is a screen shot with some sample output:

[Image: Screen Shot of Disassembler Output]

And here is an example of some 65C02 instructions being disassembled:

[Image: Some 65C02 Instructions]

I didn't look at any other disassembler implementations, but there are only so many ways to do it. About half of the information is in data structures or tables.

I have one lookup table of all the instruction mnemonics. They are 3 characters each and there are 71 of them including all the 65C02 instructions. I have another table of all 256 possible opcodes. For each opcode, the table has two entries: the instruction (an index into the mnemonic table) and the addressing mode. Thus, for a given opcode, say $EA, I can look up in the table that it is a NOP instruction using implicit addressing mode, and that the mnemonic is "NOP". Another small table lists the number of instruction bytes for each addressing mode. For example, for implicit addressing it is one byte. There are 16 possible addressing modes.

I initially included the number of instruction bytes in the table of opcodes until I realized that for a given addressing mode it was always the same so I could use a small lookup table based on the addressing mode.
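
To make the table layout concrete, here is a rough Python sketch of the three tables. It is not the actual assembler source: the names are hypothetical, only a handful of the 71 mnemonics and 16 addressing modes are filled in, and unlisted opcodes simply fall back to a "???" placeholder.

```python
# Table 1: mnemonics, three characters each (only a few of the 71 shown).
MNEMONICS = ["???", "NOP", "LDA", "STA", "JMP"]

# Addressing modes (a subset of the 16), used as indexes into LENGTHS.
IMPLICIT, IMMEDIATE, ZERO_PAGE, ABSOLUTE, INDIRECT = range(5)

# Table 3: instruction length in bytes for each addressing mode.
LENGTHS = [1, 2, 2, 3, 3]

# Table 2: one (mnemonic index, addressing mode) pair per opcode.
# Opcodes not filled in below stay as "???" in this sketch.
OPCODES = [(0, IMPLICIT)] * 256
OPCODES[0xEA] = (1, IMPLICIT)    # NOP
OPCODES[0xA9] = (2, IMMEDIATE)   # LDA #$nn
OPCODES[0x85] = (3, ZERO_PAGE)   # STA $nn
OPCODES[0x8D] = (3, ABSOLUTE)    # STA $nnnn
OPCODES[0x4C] = (4, ABSOLUTE)    # JMP $nnnn
OPCODES[0x6C] = (4, INDIRECT)    # JMP ($nnnn)


def lookup(opcode):
    """Return (mnemonic, addressing mode, length in bytes) for an opcode."""
    mnemonic_index, mode = OPCODES[opcode]
    return MNEMONICS[mnemonic_index], mode, LENGTHS[mode]
```

Because the length comes from the addressing mode, the opcode table needs only two small values per entry rather than three.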

The major part that is hardcoded rather than in tables is the logic that displays the instruction operands appropriately given the addressing mode. The total size is about 1.5K including all utility routines of which a little over half is code and the rest is data. It will run out of ROM if desired.
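
The operand display logic amounts to one case per addressing mode. A minimal sketch in Python follows, covering only a few modes; the mode names and the function are assumptions for illustration, not taken from the real code. The relative case shows how a branch offset becomes an absolute target address.

```python
def format_operand(mode, operand, address=0):
    """Format operand bytes (a list of ints) for a few addressing modes.

    address is the location of the opcode byte; it only matters for
    relative branches.
    """
    if mode == "implicit":
        return ""
    if mode == "immediate":
        return "#$%02X" % operand[0]
    if mode == "zeropage":
        return "$%02X" % operand[0]
    if mode == "zeropage,x":
        return "$%02X,X" % operand[0]
    if mode == "absolute":
        # Operands are stored low byte first.
        return "$%04X" % (operand[0] | (operand[1] << 8))
    if mode == "indirect":
        return "($%04X)" % (operand[0] | (operand[1] << 8))
    if mode == "relative":
        # Signed 8-bit offset, measured from the byte after the instruction.
        offset = operand[0] - 256 if operand[0] >= 0x80 else operand[0]
        return "$%04X" % ((address + 2 + offset) & 0xFFFF)
    raise ValueError("mode not covered by this sketch: " + mode)
```

For example, the bytes 40 05 in absolute mode come out as $0540.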

If you really wanted to optimize the code for size, I suppose you could shrink the opcode table by taking advantage of the fact that over a quarter of the 6502 opcodes are not valid (those ending in hex 3, 7, B, or F, for example). This would complicate the table lookup logic, though, and once you add the 65C02 instructions many of those invalid opcodes are in use.
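
For the original 6502 only, the idea could look something like the sketch below. Opcodes ending in 3, 7, B, or F are exactly those with both low bits set, so dropping them leaves 192 entries. The function name is hypothetical, and this is not something the disassembler actually does.

```python
def compact_index(opcode):
    """Map an opcode to an index in a 192-entry table.

    Returns None for opcodes ending in 3, 7, B, or F (both low bits set),
    which have no entry in the compacted table.
    """
    if (opcode & 0x03) == 0x03:
        return None
    # One opcode is skipped in every group of four below this one.
    return opcode - (opcode >> 2)
```

The extra test and subtraction on every lookup are the added complexity mentioned above, and the mapping stops being useful once the 65C02 opcodes fill in those columns.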

Adding 65C02 support did not add much code although handling the SMB, RMB, BBR, and BBS instructions was a little complex due to the funky format they have.
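
What makes these four awkward is that the bit number is encoded in the opcode itself (bits 4 to 6), and BBR/BBS take both a zero page address and a relative branch offset. Here is a hedged Python sketch of the decoding. The function name is made up, and the output syntax (e.g. BBR0 $20,$1005) is one common convention, not necessarily what this disassembler or Krusader prints.

```python
def decode_bit_instruction(opcode, operand, address):
    """Decode RMBn/SMBn ($x7) and BBRn/BBSn ($xF); return None for other opcodes.

    operand is the list of bytes following the opcode; address is the
    location of the opcode byte.
    """
    low = opcode & 0x0F
    if low not in (0x07, 0x0F):
        return None
    bit = (opcode >> 4) & 0x07      # bit number 0..7
    upper_half = opcode & 0x80      # $8x-$Fx are the SMB/BBS variants
    if low == 0x07:
        name = "SMB" if upper_half else "RMB"
        return "%s%d $%02X" % (name, bit, operand[0])
    name = "BBS" if upper_half else "BBR"
    # Three-byte instruction: opcode, zero page address, signed offset.
    offset = operand[1] - 256 if operand[1] >= 0x80 else operand[1]
    target = (address + 3 + offset) & 0xFFFF
    return "%s%d $%02X,$%04X" % (name, bit, operand[0], target)
```

Note that the branch target is measured from the end of the three-byte instruction, unlike the two-byte conditional branches.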

I put in an assemble-time option so that the output can contain only the instructions and not the memory data bytes, so you could feed the output into an assembler. Here is an example of it running in that mode:

    JSR   $0540
    LDX   #$94
    LDY   #$08
    JSR   $0579
    JSR   $0540
    LDA   #$80
    STA   $37
    LDA   #$02
    STA   $38
    JSR   $0540
    LDA   #$17
    JSR   $02B7
    SBC   #$01
    BNE   $029A
    LDX   #$6F
    LDY   #$08
    JSR   $0579
    JSR   $055F
    CMP   #$20
    BEQ   $0295
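
The two output modes differ only in whether each line is prefixed with the address and raw bytes. A small Python sketch of that choice is shown here. The function name is hypothetical, and the column layout of the address-and-bytes form is an assumption, since only the instruction-only form appears in the listing above.

```python
def format_line(address, raw, mnemonic, operand, show_bytes=True):
    """Format one disassembled instruction, with or without address and bytes."""
    # Drop the trailing spaces when there is no operand (e.g. NOP).
    text = ("%s   %s" % (mnemonic, operand)).rstrip()
    if not show_bytes:
        # Instruction only, indented so it can be fed back to an assembler.
        return "    " + text
    hex_bytes = " ".join("%02X" % b for b in raw)
    return "%04X   %-8s   %s" % (address, hex_bytes, text)
```

In the real program the choice is made when the code is assembled rather than through a run-time flag like show_bytes.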

The code is licensed under the Apache license, so you are free to use it if you wish. This first version can be considered beta -- it is complete but may still have bugs.

The standalone version and the version of JMON with the disassembler can be downloaded from these links:
