Learn Multi platform 65816 Assembly Programming... For Ultimate Power!

While most hardware developers moved to the 68000, and those looking for a more powerful 8 bit could move to the 6809, there was 'another alternative'.... The 65816

Backwards compatible with the 65C02, and with full 16 bit capabilities, the 65816 (AKA W65C816) gave the 'best of both worlds'... although 1/3 slower than the 68000 MHz for MHz, the 65816 was powerful and capable, and was the driving force behind the SNES and Apple IIGS...

Its 24 bit address bus gives it the capability to address 16 megabytes - like the 68000
The 65802 is essentially a 65816 with a 16 bit address bus, so it can only address 64k.

If you want to learn 65816 assembly get the Cheatsheet! It covers all the varieties of the 6502 and its derivatives, including the 65816!

We'll be using WLA as our 65816 assembler in these tutorials.

You can get the source and documentation for AS from the official website HERE

The 65816 contains all of the commands and functionality of the 6502, and then some!

In this tutorial we'll cover the new features, and what's changed, but because we don't like repeating ourselves, we won't cover the stuff that's the same, so you'll need to read over our 6502 tutorials as well at some point.

Hello World Series

Hello World on the Super Nintendo - 65816 Assembly Lesson H1 [SNS]

Simple Samples

Lesson S1 - Moving a bitmap on the SNES! [SNS]

65816: 6502 Upgraded!
The 65816 improved on the 6502, upgrading all of the registers to 16 bits, enhancing the addressable memory range to 24 bit (16MB), and allowing the Stack and Zero Page to be relocated anywhere in the first 64k range ($000000-$00FFFF)

As well as the registers being upgraded to 16 bits, commands that access memory also work at 16 bits. All the classic commands like EOR, ROR and ADC can now act on 16 bit values in all addressing modes.

There are many more addressing modes, to handle Long addressing (outside of a 64k range) and adding more functionality

New commands have been added, some make old tasks easier, like being able to directly Push and Pop X and Y, others add significant power, like block memory copy MVN and MVP.


The 65816 Registers
The 65816 extends the 6502 with an extra 8 bit 'B' accumulator (like the 6809). In 16 bit mode A and B are combined, and are referred to as the 16 bit 'C' Accumulator.

24 bit memory addressing is achieved by adding two registers, DBR (Data Bank Register) and PBR (Program Bank Register), which effectively supply the extra top 8 bits needed for 24 bit addressing (similar to segments on the 8086)

The Zero Page is now called the Direct Page - because it no longer has to be at $0000xx
The 65816 has an 'Emulation mode' in which it works as an 8-bit 65c02, but we can switch it to 'Native mode' for 16 bit operation... the CPU will start in 8 bit 'Emulation mode'.

Even in 'Native mode', the Accumulator+Memory ops, as well as the X+Y registers, can be toggled between 8 and 16 bits via two new flags.

Undocumented 6502 commands will not work correctly on the 65816, in fact they are all now new 65816 commands - and these can even be used in 6502 emulation mode, so you can use commands like "XBA" to gain access to the B accumulator in 6502 mode.

When we set the Accumulator to 8 bits (by setting the flag M=1) , commands like EOR, ROR and ADC which act on memory will also become 8 bit.
When we set the Accumulator to 16 bits (by setting the flag M=0) , commands like EOR, ROR and ADC which act on memory will also become 16 bit.
The C register (16 bit accumulator) becomes the combination of BA... switching from 16 to 8 bit does not zero either byte in the B+A accumulator pair

We can also set the X+Y registers to 8 bit (by setting flag X=1) or set the X+Y registers to 16 bit (by setting flag X=0). We cannot set them separately. The state of the X+Y registers does not affect commands which act on memory.

Switching from 16 bit mode to 8 bit mode will set the top byte of X/Y to 0...  they will still be 0 if you switch back to 16 bit mode - so the top byte of X+Y will have been lost!

If data is transferred between X and A, when the source is 16 bit, and the destination is 8 bit, then only the Low byte will be transferred.

When we surround something with brackets () we're loading indirectly from a 16 bit address, eg LDA ($12) loads from the address stored in Direct Page $12+$13 (16 bit address - the top byte will be the DBR)

When we use SQUARE brackets [] we're loading indirectly from a Long address, eg LDA [$12] loads from the address stored in Direct Page $12+$13+$14 (24 bit address)


Register | 24 Bit | 16 Bit | 8 Bit | Use cases
Accumulator A | | | A | 8 Bit Accumulator
Accumulator B | | B | | 8 Bit Accumulator
16-Bit Accumulator C | | B | A | A+B combined to make a 16 bit accumulator
Processor flag register | | | P | Flags
Indirect X | DBR | X (high) | X (low) | Indirect Register
Indirect Y | DBR | Y (high) | Y (low) | Indirect Register
Hardware Stack Pointer | 00 | S (high) | S (low) | Stack
Program Counter | PBR | PC (high) | PC (low) | Running Command
Direct page Register (D) | 00 | D (high) | D (low) | Zero page is relocatable on 65816
Databank Register (B) | DBR | | | Top byte of 16 bit addresses
Program Bank Register (K) | PBR | | | Top byte of program counter

65816 Flags: NVMXDIZC / -------E
6502 Flags:  NV1BDIZC

Flag | Name | Meaning
N | Negative | 1=Negative
V | Overflow | 1=True
M | Memory / Accumulator Select | 1=8bit / 0=16bit
X | XY index register select | 1=8bit / 0=16bit
D | Decimal mode | 1=True
I | IRQ disable | 1=Disable
Z | Zero | 1=Result Zero
C | Carry | 1=Carry
E | Emulator mode (accessed via carry flag with XCE) | 0=native 16 bit / 1=Emulated 8 bit
B | Brk Flag (6502 mode only) |

The Break Flag only exists in 6502 mode, the BRK event is a separate interrupt vector in 65816 mode.

The ZeroPage (ZP) has been renamed to the Direct Page (DP) - but it's basically the same thing - it's just that now it doesn't need to be at $00xx they felt the need to rename it!... any addressing modes you're used to with the ZeroPage will work EXACTLY the same now with DP... and in these tutorials we'll keep the DP in the same memory location as the ZeroPage was to keep things simple!

Maybe we should call it "The addressing mode formerly known as ZeroPage" or something! (TAMFKAZ!!)

Converting from VASM to Macro-AS or WLA

Because VASM doesn't support 65816, we're going to have to use Macro-AS for assembly. Unfortunately, there are a few differences!
1. Symbol definitions must precede first use
   In VASM an EQU statement could come after its first use
2. #>label and #<label do not work
   Use #label/256 and #label&255 instead
3. Macro definitions are different
4. We have to tell Macro AS we're defaulting to 8 bit mode


The Old 6502 Addressing Modes
The 6502 has 11 different addressing modes... many have no comparable equivalent on the Z80
Mode | 6502 Codes | 65c02 Codes | 65816 Codes | Description | Syntax | Sample Command | Z80 Equivalent | Effective result
Implied / Inherent | Imp | i | i | A command that needs no parameters | | SEC (set carry) | SCF |
Relative | Rel | r | r | A command which uses the program counter PC with an offset nn (-128 to +127) | $nn | BEQ [label] (branch if equal) | JR Z,[label] |
Accumulator | Accum | A | A | A command which uses the Accumulator as the parameter | | ROL (ROtate bits Left) | RLCA |
Immediate | Imm | # | # | A command which takes a byte nn as a parameter | #$nn | ADC #1 | ADC 1 | &nn
Absolute | Abs | a | a | Take a parameter from a two byte memory address $nnnn | $nnnn | LDA $2000 | LD a,(&2000) | (&nnnn)
Absolute Indexed | Abs,X / Abs,Y | a,X / a,Y | a,x / a,y | Take a parameter from a two byte memory address $nnnn+X (or Y) | $nnnn,X | LDA $2000,X | | (&nnnn+X)
Direct Page | ZeroPg | zp | d | Take a parameter from the direct page address $00nn | $nn | ADC $32 | | (&00nn)
Direct Page Indexed | ZeroPg,X / ZeroPg,Y | zp,X / zp,Y | d,x / d,y | Take a parameter from memory address $00nn+X | $nn,X | ADC $32,X | | (&00nn+X)
Indirect | (Ind) | (a) | (a) | Take a parameter from the pointer at address $nnnn... if $nnnn contains $1234 the parameter would come from the address $1234 | ($nnnn) | | LD HL,(&1000) / JP (HL) | (&nnnn)
Pre Indexed (Indirect,X) | (Ind,X) | (zp,X) | (d,x) | Take a parameter from the pointer at address $nn+X... if X contained 4, and $nn+4 contains $1234, the parameter would come from the address $1234 | ($nn,X) | ADC ($32,X) | | ((&00nn+X))
Postindexed (Indirect),Y | (Ind),Y | (zp),Y | (d),y | Take a pointer from address $nn, add Y... get the parameter from the result... if $nn contains $1234, and Y contained 4, the pointer $1234 would be read from $nn, then 4 would be added, and the parameter would be read from the resulting address $1238 | ($nn),Y | ADC ($32),Y | | ((&00nn)+Y)

The New 65c02 Addressing Modes
Mode | Description | 65c02 Codes | 65816 Codes | Syntax | Sample Command | Effective result
Direct page Indirect | Read from the address in the Direct page | (Zp) | (d) | ($nn) | LDA ($10) | (($nn))
Absolute indexed indirect | Read from a 16 bit address with an X offset | (a,X) | (a,x) | ($nnnn,x) | JMP ($1000,X) | (($nnnn+X))

The New 65816 Addressing Modes
Mode | 65816 Code | Syntax | Sample Command
Program Counter Relative Long | rl | $nnnn | BRL Label
Stack Relative (Stack + Fixed number offset) | D,s | $nn,S | LDA 3,S
Stack Relative Indirect Indexed with Y (Stack + Fixed number offset indirect + Y) | (D,s),y | ($nn,S),Y | LDA (3,S),Y
Block Move | xyc | src,dest | MVP 0,0
Absolute Long | al | $nnnnnn | LDA $020000
Absolute Long Indexed with X | al,x | $nnnnnn,X | LDA $020000,X
Absolute Indirect Long | [a] | [$nnnn] | JML [$2000]
Direct Page Indirect Long | [d] | [$nn] | LDA [$20]
Direct Page Indirect Long Indexed with Y | [d],y | [$nn],Y | LDA [$20],Y

Interrupt vectors
The top few bytes of the first 64k contain the addresses of the interrupt handlers - each entry is a 16 bit address (meaning the handler must start in the first 64k of memory, bank zero)

Some of the handlers work in 8 and 16 bit mode, others are exclusive to one mode or the other

From | To | Purpose
$00FFE4 | $00FFE5 | 65816 - COP
$00FFE6 | $00FFE7 | 65816 - BRK
$00FFE8 | $00FFE9 | 65816 - ABORT
$00FFEA | $00FFEB | 65816 - NMI: Non Maskable Interrupt Vector
$00FFEC | $00FFED | -
$00FFEE | $00FFEF | 65816 - IRQ: Interrupt Vector
$00FFF0 | $00FFF1 | -
$00FFF2 | $00FFF3 | -
$00FFF4 | $00FFF5 | 6502 - COP
$00FFF6 | $00FFF7 | -
$00FFF8 | $00FFF9 | 6502 - ABORT
$00FFFA | $00FFFB | 6502 - NMI: Non Maskable Interrupt Vector
$00FFFC | $00FFFD | 6502 - Reset Vector
$00FFFE | $00FFFF | 6502 - IRQ: Interrupt Vector


Defining data
The following are used for defining data in WLA
Size | Bits | WLA Syntax
Byte | 8 | .DB
Word | 16 | .DW
24 bit long | 24 | .DL
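As a rough sketch of how these directives might look in a WLA source file (the label ExampleData and the values are made up for illustration):

```asm
ExampleData:            ; hypothetical label
    .DB $01,$02,$03     ; three 8 bit bytes
    .DW $1234           ; one 16 bit word, stored low byte first ($34,$12)
    .DL $030400         ; one 24 bit long, stored low byte first ($00,$04,$03)
```

All multi-byte values are stored in little endian order, which is also the order the CPU expects for pointers.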

Forcing Sizes!
WLA may get confused about the data size we're trying to use, if it does, we can force it to a specific size with a .B .W or .L extension
    lda TestCode.L,x

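Simulating Commands:
The 65816 has no BSR (Branch to SubRoutine) command, but 'BSR subprogram' can be simulated with this pair of commands:
PER retaddr-1
BRL subprogram
(retaddr is the address of the command after the BRL... RTS adds 1 to the address it pulls from the stack, so we push retaddr-1)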
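A minimal sketch of that simulated relative call, assuming WLA accepts a label expression as the PER operand (the labels ReturnHere and SubProgram are hypothetical):

```asm
    per ReturnHere-1    ; push the address RTS expects (return address minus 1)
    brl SubProgram      ; relative long branch to the routine
ReturnHere:
    ; ... the calling code carries on here after the RTS

SubProgram:             ; hypothetical routine, placed elsewhere in the same bank
    rts                 ; pulls the pushed address, adds 1, and resumes at ReturnHere
```

Because both PER and BRL are relative to the program counter, this pair still works if the code is moved to a different address.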



Lesson 1 - 8 and 16 bit modes on the 65816
On power-on the 65816 will start in 6502 emulation mode - apart from undocumented opcodes it will work the same as a classic 6502.

To gain access to the 16 bit features we need to turn them on... let's learn how.

Lesson1.asm

Working in 8 Bit Mode 6502 emulation mode

While the 65816 starts in emulation mode, we may need to turn it back on, or check it's on if we don't know what the operating system has done.

The CPU mode is handled by the 'E' Flag... this isn't part of the regular flags, and there's no command to directly change it.
To change it, we must set (SEC) or clear (CLC) the carry, then use the 'XCE' command (eXchange Carry and Emulation flags)

Here we've set the Carry to 1, then transferred this to the E flag.... now E=1 , meaning 6502 emulation is enabled in the processor - and the processor is 'Fully 8 bit'
In many cases the bytes our assembler needs to produce are different depending on whether the CPU is in 8 bit or 16 bit mode, so after setting 6502 mode, we need to tell the assembler the Accumulator (flag M=1) and X and Y registers (flag X=1) are both 8 bit

With the WLA assembler we do this with ".ACCU 8" and ".INDEX 8"
We can now use regular 8 bit commands as before... we won't be able to use any 16 bit values.
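The steps above might look something like this in WLA syntax (a sketch, not the contents of Lesson1.asm; the immediate values are arbitrary):

```asm
    sec             ; carry = 1
    xce             ; swap carry and E: E is now 1 (6502 emulation mode)
    .ACCU 8         ; tell WLA the accumulator is 8 bit
    .INDEX 8        ; tell WLA X and Y are 8 bit
    lda #$12        ; assembled as a 2 byte command (opcode + 1 byte value)
    ldx #$34
```

Note that XCE is an exchange, so after it runs the carry holds the old value of E.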
6502 undocumented opcodes no longer work on the 65816. These opcodes are now new functions.

In fact, these new opcodes can even be used in 6502 emulation mode!
Here are the results
The Assembler directives you need to use will depend on your assembler.

".ACCU and .INDEX" are used by WLA, but there's no standard, so you've no choice but to check the manual of your assembler to know what's right.

Turning on 65816 mode and 16 bit mode

If we want to enable 65816 mode, we need to clear the 'E' flag, once again we do this via the carry.

This time we clear the carry (CLC) and use 'XCE' to set E=0
The CPU is now in 65816 mode, but that doesn't mean the registers are 16 bit yet.

There are two parts of the CPU we can switch... they are set by two new flags in the P flags register.
The Accumulator and memory-commands (EOR ROL ASL etc) are set by the M flag (Bit 5). if M=1 they are both 8 bit, if M=0 they are both 16 bit
The Index Registers (X and Y) are set by the X flag (Bit 4). if X=1 they are both 8 bit, if X=0 they are both 16 bit.

We can use the new REP command to clear these two bits, any '1' in the parameter is a bit in the P flags to clear.

Finally we need to tell the assembler the X/Y and A/Memory are now 16 bit, we do this with ".ACCU 16" and ".INDEX 16"
We can now use 16 bit parameters... yay!
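Putting those steps together, a sketch of the switch to native mode with 16 bit registers could look like this (values chosen arbitrarily):

```asm
    clc             ; carry = 0
    xce             ; swap carry and E: E is now 0 (65816 native mode)
    rep #$30        ; $30 = %00110000: clear M (bit 5) and X (bit 4)
    .ACCU 16        ; tell WLA the accumulator is 16 bit
    .INDEX 16       ; tell WLA X and Y are 16 bit
    lda #$1234      ; assembled as a 3 byte command (opcode + 2 byte value)
    ldx #$5678
    ldy #$9ABC
```

If the assembler directives and the real CPU flags ever disagree, immediate values will be assembled at the wrong size and the following commands will be misread by the CPU.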
It's unlikely you'll want to switch back to 6502 mode once you enable 16 bits... You may want to switch the X/Y and Accumulator to 8 bit, but that's not the same as the 6502 mode, as the direct page is still relocatable, the stack pointer is still 16 bit, and so on.

Switching between 8 and 16 bit in 65816 mode.

There may be times we need to switch our registers to 8 bit mode: to save space (LDA #1 will take 2 bytes in 8 bit mode, or 3 in 16 bit mode), for compatibility with old code, or for working in 8 bit with memory.

We can do this with the SEP command (the opposite of REP) this can be used to set one or both the X and M bits to switch the registers to 8 bit.

Here we've switched to 8 bit mode... but the CPU is still in 65816 mode.

Doing this causes the top byte of the 16-bit X,Y registers to be zeroed. If we switch back to 16 bit mode, these top bytes are not restored.

The top byte of the 16 bit 'C' accumulator is not lost... The 8 bit halves of the register are referred to as B and A
We loaded in 16 bit values into X and Y.

Switching to 8 bit caused the top bytes to be lost.

These bytes are still zero after the return to 16 bit.

The Accumulator B+A / C are unchanged
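A sketch of that experiment, assuming the CPU is already in native mode (the test values are arbitrary):

```asm
    rep #$30        ; M=0, X=0: everything 16 bit
    .ACCU 16
    .INDEX 16
    lda #$1234
    ldx #$5678
    ldy #$9ABC

    sep #$30        ; M=1, X=1: everything 8 bit
    .ACCU 8
    .INDEX 8
                    ; now A=$34 (B still holds $12), X=$78, Y=$BC

    rep #$30        ; back to 16 bit
    .ACCU 16
    .INDEX 16
                    ; C=$1234 again, but X=$0078 and Y=$00BC
```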

Mixed 8 and 16 bit...Never Go full 8 bit!

We can have 8 bit Accumulator and Memory commands, but 16 bit Index registers (X/Y)

Here we've reset the X bit, but Set the M bit... this means the X register is 16 bit, but the accumulator and memory commands are 8 bit.
Here is the result
Here we've reset the M bit, but Set the X bit... this means the accumulator and memory commands are 16 bit but the X register is 8 bit.
Here is the result
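Both mixed combinations can be sketched like this, again assuming native mode (values are arbitrary):

```asm
    rep #$10        ; X flag = 0: X and Y are 16 bit
    sep #$20        ; M flag = 1: accumulator and memory commands are 8 bit
    .ACCU 8
    .INDEX 16
    ldx #$1234      ; 16 bit index
    lda #$56        ; 8 bit accumulator

    rep #$20        ; M flag = 0: accumulator and memory commands are 16 bit
    sep #$10        ; X flag = 1: X and Y are 8 bit
    .ACCU 16
    .INDEX 8
    lda #$1234      ; 16 bit accumulator
    ldx #$56        ; 8 bit index
```

The first combination is a common choice when working with byte data through 16 bit offsets.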

M Flag... for 8/16 bit Accumulator and Memory

When we set the Accumulator to 16 bit with the M flag (M=0) it also affects commands which write to memory addresses like the zero page.

Here we've loaded some test values, one into the accumulator, and the other into direct page entry z_HL

We repeatedly perform INC z_HL, and ASL - in this case in 16 bit.
The bits in the C Accumulator (16 bit accumulator) shifted in 16 Bits... the two bytes at z_HL (and z_HL+1) acted as a 'Little Endian' 16 bit pair, so when the value went over $80FF, it rolled over to $8100
We now repeat the procedure with the same commands, this time with the M flag set (M=1), meaning the Memory and Accumulator are now 8 bit.
The bits in the A Accumulator (8 bit accumulator) shifted in 8 Bits... this time the two bytes at z_HL (and z_HL+1) were only altered in 8 bit, so when the value went over $80FF, the top byte was not changed, meaning it returned to $8000
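The difference can be sketched with a single INC in each mode. The address given to z_HL here is an assumption for the example, and the Direct Page is assumed to be at $0000:

```asm
.DEFINE z_HL $20    ; assumed direct page address for this sketch

    rep #$20        ; M=0: accumulator and memory commands are 16 bit
    .ACCU 16
    lda #$80FF
    sta z_HL        ; z_HL=$FF, z_HL+1=$80
    inc z_HL        ; 16 bit increment: the pair now holds $8100

    lda #$80FF
    sta z_HL        ; put the test value back
    sep #$20        ; M=1: accumulator and memory commands are 8 bit
    .ACCU 8
    inc z_HL        ; 8 bit increment: only the low byte wraps, the pair holds $8000
```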

Lesson 2 - New Addressing Modes
As well as the 6502 addressing modes, the 65816 is capable of the extra 65c02 addressing modes, and also adds some new 65816 exclusive modes!

Let's learn about all the new modes available.

Lesson2.asm


Because the memory layout of every system is different, this tutorial will not work on all systems.

It's been tested on the SNES, so you should use it on that machine - if you want to try it on another system, you're on your own!

Direct Page - The relocatable Zero Page

On the 65816 the Zeropage does not have to be at $000000-$0000FF, we can position it anywhere between $000000-$00FFFF. It's now referred to as the 'Direct Page'

A 2 byte 'D' register defines the base of the Direct page... it must be in 'bank zero' so bits 16-23 of the address are zero. We can't load the D register directly, we must set the 16 bit C accumulator, and transfer it to the Direct page register (D) with the TCD command.
When we specify a one byte address, in this case $80 or $82, it will now be loaded from the direct page - so address $380 and $382 in this case.
Here we've loaded X and Y from the two direct page addresses
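A sketch of relocating the Direct Page, assuming a base of $0300 (which matches the $380 and $382 addresses mentioned above):

```asm
    rep #$30
    .ACCU 16
    .INDEX 16
    lda #$0300
    tcd             ; D = $0300: the Direct Page now starts at $000300
    ldx $80         ; reads the 16 bit value at $000380
    ldy $82         ; reads the 16 bit value at $000382
```

Direct Page accesses are fastest when the low byte of D is zero, so page aligned bases like $0300 are the usual choice.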

Direct Page Indirect

On the 6502 if we wanted to use an indirect address from the Zero page, we had to use an index register... even if the index was zero

Like the 65c02, the 65816 can use an indirect address in the Direct page without an index register
We used the command "LDA ($81)"... this loaded from the 16 bit address specified by direct page entries $81-$82.

As it's a 16 bit address, the top 8 bits of the 24 bit address are made up of the data bank register.
the bytes at that address are $01 $02... so the resulting address the data is loaded from is $0201
This time we used square brackets to specify the address "LDA [$81]"... This specifies we're loading a full 24 bit 'Long' address... This time we load in 3 bytes $01 $02 $03 - so we load our first value from $030201. (Direct Page Indirect Long)

The second example uses Y as an index (Direct Page Indirect Long Indexed with Y)... We use the command "LDA [$81],Y"... this time we load from address $030201+Y ($030202)
here are the results

(Shown on Apple II GS emulator)
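The three indirect forms described above can be sketched side by side. This assumes, as in the text, that the direct page bytes from $81 onwards hold $01 $02 $03:

```asm
    rep #$30
    .ACCU 16
    .INDEX 16
    lda ($81)       ; 16 bit pointer $0201, bank byte comes from the DBR
    lda [$81]       ; 24 bit pointer: loads from $030201
    ldy #$0001
    lda [$81],y     ; 24 bit pointer plus Y: loads from $030202
```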

Absolute Indexed Indirect

This can only be used with the jump commands (JMP, and on the 65816 also JSR).

This addressing mode takes a 16 bit address, and the X index register as an offset.

The "jmp (TestJumps,X)" command will effectively jump to address 'X' in a lookuptable of destination addresses. (A Jump Vector table)... in this case, X should be a multiple of 2
In this example X=4, so the Jump went to the 3rd address in the list "PrintC"

This prints a C to the screen
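A jump vector table along those lines might be laid out as follows. PrintC is the routine named above; PrintA and PrintB are hypothetical names for the other entries, all assumed to be defined elsewhere in the same bank:

```asm
    .INDEX 16           ; assumes the X flag is 0 (16 bit index registers)
    ldx #$0004          ; entry 2 (counting from 0): each entry is 2 bytes
    jmp (TestJumps,x)   ; jumps to the address stored at TestJumps+4

TestJumps:
    .DW PrintA          ; X=0
    .DW PrintB          ; X=2
    .DW PrintC          ; X=4
```

An entry number held in the accumulator can be turned into a suitable X value by doubling it with ASL before transferring it with TAX.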

Stack Relative

The 65816 adds a new 'Stack Relative' addressing mode.

This loads from a fixed offset from the stack pointer, allowing items on the stack to be read directly without pulling them.

This could be used to read parameters pushed onto the stack before calling a subroutine.
Here we loaded a pair of bytes from S+1, and S+2
we can also use "Stack Relative Indirect Indexed with Y" addressing.

This takes an address from the stack plus fixed offset... Y is added to this address, and the bytes are loaded from that address.

Here we've used "LDA (2,S),Y"... the resulting address is (S+2)+Y
In this case, the address $0203 is taken from the stack, Y was 1, so the final address was $0204,

At this address was the word $6665, which was loaded into C
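Both stack relative forms can be sketched like this. The pushed values are arbitrary, and the offsets differ from the lesson's own example because nothing else is on the stack here:

```asm
    rep #$30
    .ACCU 16
    .INDEX 16
    pea $1234       ; push a 16 bit value ($12 is pushed first, then $34)
    lda 1,s         ; S+1 holds $34 and S+2 holds $12, so C=$1234
    plx             ; discard the pushed word without changing C

    pea $0203       ; push a 16 bit pointer
    ldy #$0001
    lda (1,s),y     ; pointer $0203 plus Y: loads the word at $0204 in the data bank
    plx             ; discard the pointer
```

The stack pointer always points at the next free byte, which is why the most recently pushed byte is found at offset 1 rather than 0.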

Block Move

Clearly jealous of the Z80's LDIR command (!) The 65816 adds block memory copy commands.

There are two new commands: MVN (Move Next) and MVP (Move Previous)

MVN (Move Next) copies data in ascending order.
MVP (Move Previous) copies data in descending order.

These use X as the 16 bit source address, and Y as the 16 bit destination address.
The 16 bit accumulator C holds the bytecount-1... The loop repeats until C goes below zero (wrapping to $FFFF), so if C=7 eight bytes will be copied.

X and Y are only 16 bit, the top byte of the 24 bit source and destination address are specified as parameters of the command.

It should be noted that the current databank will be changed if the destination bank is not the current one.
We can prevent this by backing up, and restoring the bank with PHB & PLB

These are the only two commands which use this addressing form.
Here we've copied 8 bytes from "TestData" in bank $00 to $030000 in ascending order with MVN.

We then copied 8 bytes from "TestData" in bank $00 to $03000F in descending order with MVP.
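A sketch of the ascending copy described above. The bank operands are written as source then destination, matching the 'src,dest' syntax listed earlier, but operand order has differed between assemblers and versions, so treat it as an assumption and check your assembler's manual:

```asm
    rep #$30
    .ACCU 16
    .INDEX 16
    phb                 ; back up the data bank, MVN will change it
    ldx #TestData       ; source offset (TestData is assumed to be in bank $00)
    ldy #$0000          ; destination offset
    lda #8-1            ; byte count minus 1
    mvn $00,$03         ; copy from bank $00 to bank $03, ascending
    plb                 ; restore the data bank
```

After the copy finishes, X and Y point just past the last bytes copied and C holds $FFFF.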

Absolute Longs

We can specify the address of a parameter with a complete 24 bit long address (Absolute Long Addressing)

We can also use X as an offset (Absolute Long Indexed with X)... Here we test with an offset of X=1, and X=2
We loaded directly from $030200, we loaded from an Absolute Long indexed with X from $030200+1 and $030200+2
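Those three loads might be written like this (a sketch; WLA is assumed to pick the long form from the 24 bit address, and a .L suffix can force it if not):

```asm
    rep #$30
    .ACCU 16
    .INDEX 16
    lda $030200         ; Absolute Long: loads from $030200
    ldx #$0001
    lda $030200,x       ; Absolute Long Indexed with X: loads from $030201
    ldx #$0002
    lda $030200,x       ; loads from $030202
```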
Absolute Indirect Long addressing can only be used with Jumps.

We specify a 16 bit address in square brackets with JML (JuMp Long)... this will load 24 bits from the specified address, and jump to that address.

In this example we've used "jml [LongTest]"... this loads the address at LongTest jumping to address $030400
Here we've loaded a test program to $030400, this loads the 16 bit 'C' accumulator with $6666
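A sketch of the indirect long jump, assuming the pointer at LongTest sits in bank zero, which is where the CPU reads the pointer from for this addressing mode:

```asm
    jml [LongTest]      ; read 3 bytes at LongTest, jump to that 24 bit address

LongTest:
    .DL $030400         ; 24 bit destination, stored low byte first
```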
Like the 6502, These addressing modes are not available for all commands.

The best thing to do is check the Cheatsheet... or just try the commands, and see if the assembler will compile it!

Lesson 3 - New Commands
As well as new addressing modes the 65816 adds lots of new commands.

We've covered a few already, but let's take a look at all the others, and what they do.

Lesson3.asm

Pushing and Pulling new registers

The 3 new registers of the 65816 - Data Bank (B) , Program Bank (K) and Direct Page (D) - can be manipulated via the stack
Data Bank (B) and Direct Page (D) can be pushed or pulled with PHB, PLB, PHD and PLD

We can push the Program Bank (K) with PHK, but there is no way to pull the Program Bank (K) - this is because the program bank is the top byte of the program counter, so changing it would alter the running command.


Here we pushed all these registers onto the stack


PHB/PLB are the only way of directly reading or changing the DataBank Register (B), although the block move commands also change it as a side effect
PHK is the only way to get the value in the Program bank register (K)
The Direct page register (D) can also be set with TCD and read with TDC
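A common use of these commands is copying the program bank into the data bank, sketched here along with a backup and restore of B and D:

```asm
    phb             ; back up the data bank (1 byte)
    phd             ; back up the direct page register (2 bytes)

    phk             ; push the program bank...
    plb             ; ...and pull it into the data bank, so DBR = PBR

    pld             ; restore in the opposite order to the pushes
    plb
```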

Pushes Exchange Transfers

Like the 65C02 we now have Push and Pull commands for the X and Y register

PHX and PLX will push and Pull X, PHY and PLY will push and Pull Y,

We also have TXY to transfer X to Y, and TYX to transfer Y to X

The 16 bit C accumulator is made up of 8 bit A and B halves... we can swap these with XBA
Here are the results... notice we popped X and Y in the opposite order, effectively swapping X and Y
We can perform the same commands in 8 bit mode... this time only 8 bit will be pushed to and pulled from the stack
Here are the results
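A sketch of these commands in 16 bit mode (the values are arbitrary):

```asm
    rep #$30
    .ACCU 16
    .INDEX 16
    ldx #$1111
    ldy #$2222
    phx             ; push X
    phy             ; push Y
    plx             ; X = $2222 (the old Y)
    ply             ; Y = $1111 (the old X)

    lda #$1234
    xba             ; swap B and A: C = $3412
    txy             ; Y = X = $2222
```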

Transfer for 65816 registers

If we want to set the Direct page Register (D) we can do so via the C accumulator with TCD... if we want to get the value back, we can use TDC.

We can back up and restore the registers with PHD and PLD

here are the results
if we want to change the 16 bit stack pointer we can use TCS and TSC.

Items pushed will go onto the new stack

Of course there's no way to push or pop the stack pointer!
We've set a new Stack pointer, and pushed an item onto the new stack.

Push Effective Addresses

The 65816 includes commands to Push Effective Addresses onto the stack.

This can be used to store parameters onto the stack, for use in subroutines, or popped off later.

PEA will Push an Effective Address - effectively putting an immediate 16 bit value onto the stack.

PEI will Push an Effective Indirect address - this command will load a 16 bit value from the direct page, and push it onto the stack.

PER will Push an Effective Relative address - this adds an immediate value (or the offset to a label) to the program counter, and pushes the resulting address onto the stack.
We pushed an immediate, an address via the direct page, and two relatives.
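The three commands might be used like this. The direct page address $20 and the label SomeLabel are hypothetical, and the PEI operand syntax is assumed to follow the usual bracketed form:

```asm
    pea $1234       ; pushes the value $1234 itself
    pei ($20)       ; pushes the 16 bit value stored at direct page $20/$21
    per SomeLabel   ; pushes the address of SomeLabel, worked out relative to the PC
SomeLabel:
```

All three always push 16 bits, whatever the state of the M and X flags.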

Test and Set / Test and Reset

Like the 65C02, the 65816 has two commands to work with bits. These simultaneously test, and set or clear, bits in a destination address using the accumulator as a mask.

TSB (Test and Set Bits) sets the Zero flag like an AND command, then sets the bits like an OR command.

TRB (Test and Reset Bits) sets the Zero flag like an AND command, then clears every bit in memory where the accumulator has a 1 (like an AND with the accumulator's bits flipped) - Like BIC on some systems.
We've used TSB to set the top nibble of direct page $0010, then used TRB to clear the next  two nibbles.

In both cases the Zero flag was not set.
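A sketch of both commands in 8 bit mode. The values here are chosen for illustration and differ from the lesson's own test, so the Zero flag results differ too:

```asm
    sep #$20
    .ACCU 8
    lda #$0F
    sta $10         ; test value %00001111
    lda #$F0
    tsb $10         ; Z=1 ($F0 AND $0F = 0), then $10 = $0F OR $F0 = $FF
    lda #$3C
    trb $10         ; Z=0 ($3C AND $FF is not 0), then $10 = $FF AND NOT $3C = $C3
```

The Zero flag always reflects the memory value from before the bits were changed.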

New Branches

Like the 65c02, the 65816 includes a Branch Always command - BRA

BRA can only jump -128 to +127 bytes, but there is a second command Branch always Long - BRL - which can jump -32768 to +32767
We've used BRA and BRL to skip over the commands which would change C

Long Calls and Jumps

There will probably be times we need to jump to a memory location that's not in the same program bank,

For these occasions we can use JML to Jump to a Long address, and JSL to Jump to a Long Subroutine.
The accumulator was changed by jump and call to the two long addresses
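A sketch of a long call, with FarRoutine as a hypothetical routine in another bank. A routine called with JSL has to return with RTL rather than RTS, because three bytes were pushed (program bank plus 16 bit return address):

```asm
    .ACCU 16            ; assumes the M flag is 0 (16 bit accumulator)
    jsl FarRoutine      ; pushes the program bank and return address, then jumps
    ; ... execution continues here after the RTL

FarRoutine:             ; hypothetical routine, may be in any bank
    lda #$6666
    rtl                 ; pulls the 3 byte return address and goes back
```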

Other Commands!

There are a few other commands we need to mention for completeness, though you probably won't need them.

STP stops the processor until the CPU is reset - it's intended for power saving.

WAI will wait for an interrupt

COP is intended for use with a co-processor.

WDM was intended for future extra commands (it was never used)