65816 Assembly programming for the SNES and Super Famicom

The Super Nintendo (AKA Super Famicom) was the successor to the NES... relatively late to the 16 bit game, the SNES made the unusual choice not to use the 68000 for its processor, favoring the slower and less efficient 65816... this is rumoured to be because backwards compatibility with the NES was planned during development...

The 65816 is a true 16 bit CPU, but it has a 6502 compatibility mode, in which it works just like a 6502, so we can perform basic SNES programming without using a single 16 bit command!
Cpu         3.58 MHz 65816
Ram         128k
Vram        64k
Tiles       1024
Sprites     128 onscreen (32 per line)
Resolution  512x448
Colors      256 (16 per tile) from 32768
Sound chip  SPC700, 8 channel DSP, 64k

Memory Map

The 65816 has a 24 bit address bus. It generally works with a 16 bit address, and an 8 bit Data Bank register (DB) supplies the top 8 bits

The ZeroPage and Stack pointer are relocatable, but default to the 6502 norm of $0000-$01FF

Our minimal 32k cartridge will appear in ROM from $8000, and execution will start there. The CPU starts up in 8 bit (6502 compatibility) mode.
Bank     16 bit address  Purpose
$00-$3F  $0000-$1FFF     RAM
         $2000-$5FFF     Hardware Regs
         $6000-$7FFF     Expand (???)
         $8000-$FFFF     Cartridge Rom
$70      $0000-$7FFF     Battery Backed Up Ram
$7E      $0000-$1FFF     Scratchpad RAM (same as bank $00 to $3F)
         $2000-$FFFF     RAM
$7F      $0000-$FFFF     RAM


VRAM addresses are reconfigurable, here's a sample map... note the addresses are in WORDS... so the 64k memory is accessed by addresses $0000-$7FFF

Address  Use
$0000    BG1 Tilemap
$1000    Tile Patterns
$4000    Sprite Patterns
$7FFF    Last word of VRAM
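
Because VRAM addresses count words, it's easy to get confused when laying out data in a file on the PC. As a rough sketch (in Python, as a host-side helper, not SNES code), this is the conversion between the two. The function name is made up for this example.

```python
def vram_word_to_byte(word_addr):
    """Convert a VRAM word address ($0000-$7FFF) to a byte offset in the 64k."""
    if not 0 <= word_addr <= 0x7FFF:
        raise ValueError("VRAM word addresses run from $0000 to $7FFF")
    return word_addr * 2  # each address holds one 16 bit word


# The sample map's sprite patterns start at word $4000, halfway through VRAM
print(hex(vram_word_to_byte(0x4000)))
```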

Cartridge Header

Address  Bytes  Category    Purpose                          Example
$FFC0    21     Rom         Cartridge title (Space Padded)   Test Rom9012345678901
$FFD6    1      Rom         ROM/RAM information on cart.     $00
$FFD7    1      Rom         ROM size.                        $01
$FFD8    1      Rom         RAM size.                        $00
$FFD9    1      Rom         Country (destination) code.      $00
$FFDA    1      Rom         Developer ID code.               $00
$FFDB    1      Rom         Version number.                  $00
$FFDC    2      Rom         Checksum complement.             $????
$FFDE    2      Rom         Checksum.                        $????
$FFE0    2      65816 Mode
$FFE2    2      65816 Mode
$FFE4    2      65816 Mode  COP Vector                       $0000
$FFE6    2      65816 Mode  BRK Vector                       $0000
$FFE8    2      65816 Mode  Abort Vector (Unused)            $0000
$FFEA    2      65816 Mode  NMI Vector (V-blank)             $0000
$FFEC    2      65816 Mode  Reset Vector (Unused)            $0000
$FFEE    2      65816 Mode  IRQ Vector (H/V/External)        $0000
$FFF0    2      6502 Mode
$FFF2    2      6502 Mode
$FFF4    2      6502 Mode   COP Vector                       $0000
$FFF6    2      6502 Mode   BRK Vector (Unused)              $0000
$FFF8    2      6502 Mode   Abort Vector (Unused)            $0000
$FFFA    2      6502 Mode   NMI Vector (V-blank)             $0000
$FFFC    2      6502 Mode   Reset Vector (6502 Mode)         $8000
$FFFE    2      6502 Mode   IRQ/BRK Vector                   $0000
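
The two checksum fields are shown as $???? because they depend on the rest of the ROM. Here is a minimal sketch of how they could be filled in on the PC after assembling. It assumes a 32k image with no extra file header, so $FFC0 sits at file offset $7FC0, and it assumes the ROM size is a power of two. The checksum is the 16 bit sum of every byte, and the complement is the checksum XOR $FFFF. The function names are just for this example.

```python
HEADER_OFFSET = 0x7FC0  # assumption: file offset of $FFC0 in a 32k image mapped at $8000


def snes_checksum(rom):
    """Return (checksum, complement) for a ROM image."""
    data = bytearray(rom)
    # While summing, the complement field counts as $FFFF and the checksum as $0000
    data[HEADER_OFFSET + 0x1C:HEADER_OFFSET + 0x20] = b"\xFF\xFF\x00\x00"
    checksum = sum(data) & 0xFFFF
    return checksum, checksum ^ 0xFFFF


def patch_checksum(rom):
    """Return a copy of the ROM with $FFDC-$FFDF filled in (low byte first)."""
    checksum, complement = snes_checksum(rom)
    data = bytearray(rom)
    data[HEADER_OFFSET + 0x1C:HEADER_OFFSET + 0x1E] = complement.to_bytes(2, "little")
    data[HEADER_OFFSET + 0x1E:HEADER_OFFSET + 0x20] = checksum.to_bytes(2, "little")
    return bytes(data)
```

Most emulators will run a ROM with a bad checksum anyway, so this step is optional while testing.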

Sound on the SNES via the SPC700
To make sound on the SNES we have to use its SPC700 processor! What's the SPC700? Well, it's an 8 bit dedicated sound CPU with 64k of isolated memory - meaning that memory can't be directly accessed from the main CPU!
The SPC700 is a great sound processor, used by the Super Nintendo and many other systems like... er... the Super Nintendo!
OK, so nothing else uses it, and it's a total pain! While its sound ability is good, it's a real hassle to program... it uses its own special instruction set, its bytecode matches no other CPU, and data has to be transferred to it using a special procedure because we can't access its memory directly.

The CPU is 8 bit and little endian, so $1234 is stored in RAM as $34,$12... its registers are the same as the 6502, but it has some 16 bit commands that use YA as a 16 bit pair, like the Z80

It has a ZeroPage, but the ZeroPage is referred to as the DirectPage, as it can be at $00xx or $01xx... $01xx is also used as the Stack

Most data transfer commands are done with MOVe commands (like the 68000), but like the Z80, the destination is on the left... for example:
mov a, #$00     ;Set A to $00

I'm not planning to go into greater detail than I need on this CPU, so please see the tutorials below:
Best SPC700 reference
Also a good tutorial

Sound samples must also be held in SPC700 ram...

Pointers to sound samples are stored in a single block... this block must be aligned to a 256 byte boundary, and can hold up to 256 sound definitions - each of which contains 2 pointers... one for the start of the sample, and one for the loop point

For example, take the definition below. This will be loaded into SPC700 RAM at $0300... we would tell the sound chip to use $03 as the memory position using the "DIR" sound register $5D

The sound samples themselves are made up of 9 byte chunks: the first byte is a header, and the other 8 bytes hold 16 nibbles of sound data... the final chunk of the sample should have the End bit set in the header, plus the Loop bit if you want the sample to loop
    align 8                                 ;The pointer block must start on a 256 byte page
SFXBank:                                    ;We're going to load this into $0300
    dw SFXBank_Sound1-SFXBank+$300          ;Sample 0 main
    dw SFXBank_Sound1-SFXBank+$300          ;Sample 0 Loop
    align 4
SFXBank_Sound1:                             ;Start of the sample data for Sample 0
    ;     SSSSFFLE S= bitshift (0-12) FF=Filter L=Loop E=End
    db  %11000111,$FF,$F0,$F0,$F0,$F0,$F0,$F0,$F0
    ;              01  23  45  67  89  AB  CD  EF
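
If you're generating sample data with a script on the PC, the header byte and the pointer pairs can be built like this. This is only a sketch of the SSSSFFLE layout and the two word pointer format described above. The function names are invented for the example, and the $0310 address is just a sample value.

```python
import struct


def brr_header(shift, filter_num, loop, end):
    """Build the SSSSFFLE header byte of one 9 byte sample chunk."""
    if not 0 <= shift <= 12:
        raise ValueError("shift must be 0-12")
    if not 0 <= filter_num <= 3:
        raise ValueError("filter must be 0-3")
    return (shift << 4) | (filter_num << 2) | (int(loop) << 1) | int(end)


def directory_entry(start_addr, loop_addr):
    """Build one 4 byte sound definition: start pointer, then loop pointer."""
    return struct.pack("<HH", start_addr, loop_addr)  # both little endian words


# The header used in the listing above: shift 12, filter 1, Loop and End set
print(format(brr_header(12, 1, True, True), "08b"))
print(directory_entry(0x0310, 0x0310).hex())
```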

SPC700 and VASM

Vasm doesn't support the SPC700, but we can simulate the commands with macros. For example, we can set an immediate value with this macro:
        macro s_mov_a_ii,aval        ;Set A=immediate value
            db $E8
            db \aval
        endm
If we want to do a relative jump, we can calculate the relative offset with Destination-(*+1):
        s_bne_r SoundCallPause-(*+1)
If we want to include our SPC700 code in our main ROM, we'll need to adjust call addresses for the changing location. To do this we can use the formula [DestinationLabel]-[StartOfProgramInMainRom]+[DestinationOfProgramInSPC700Ram]:
        s_call_addr SoundCallResume-SoundProgram+SndPrgMemLoc
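
To check the two formulas by hand, here they are as a small Python sketch. It assumes '*' is the address of the offset byte itself when the expression is evaluated, so *+1 is the address of the next instruction. The function names and the sample addresses are made up for the example.

```python
def relative_offset(dest, operand_addr):
    """Destination-(*+1): the branch offset as an unsigned byte."""
    offset = dest - (operand_addr + 1)
    if not -128 <= offset <= 127:
        raise ValueError("branch target is out of range for a relative jump")
    return offset & 0xFF  # two's complement, as stored in the instruction


def relocate(label_in_rom, program_start_in_rom, program_dest_in_spc):
    """[DestinationLabel]-[StartOfProgramInMainRom]+[DestinationOfProgramInSPC700Ram]"""
    return label_in_rom - program_start_in_rom + program_dest_in_spc


# A backwards branch from an offset byte at $0205 to $0200 needs offset -6
print(hex(relative_offset(0x0200, 0x0205)))
# A label $23 bytes into a program that will be copied to $0200
print(hex(relocate(0x8123, 0x8100, 0x0200)))
```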

Tile Definitions
Tile definitions use 4 bitplanes for 16 colors, and tiles are 8x8 pixels - so 32 bytes each... Data is transferred in words, and rather strangely we send bitplanes 1 and 2 for each of the 8 lines, one line at a time... then we do the same for bitplanes 3 and 4
Byte 1 Byte 2
First 16 bytes 00111100
Second 16 bytes 00333300
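
If you want to convert graphics yourself rather than use an editor, the layout above can be sketched in Python like this. It assumes the leftmost pixel of each line goes in bit 7 of the byte, and that the tile is given as 8 rows of 8 color numbers (0-15). The function name is made up for this example.

```python
def encode_tile_4bpp(pixels):
    """Convert 8 rows of 8 color numbers (0-15) into 32 bytes of tile data."""
    out = bytearray()
    for first_plane in (0, 2):            # bitplanes 1+2 first, then 3+4
        for row in pixels:                # one line at a time
            for plane in (first_plane, first_plane + 1):
                byte = 0
                for x, color in enumerate(row):
                    if (color >> plane) & 1:
                        byte |= 0x80 >> x  # leftmost pixel is the top bit
                out.append(byte)
    return bytes(out)


# A vertical bar of color 1 only sets bits in the first bitplane
tile = [[0, 0, 1, 1, 1, 1, 0, 0]] * 8
print(encode_tile_4bpp(tile).hex())
```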

Palette Definitions 
Palettes are stored in something called 'CGRAM'
There are 256 palette entries...
Because we can only select 8 palettes for Tiles/Sprites, assuming our sprites and tiles are 16 colors, the first 128 colors will be used by Tiles, and the second 128 will be used by Sprites

To define a palette entry we need to select a palette entry number by writing its number as a byte to $2121

We have to write two bytes to $2122... we write them in REVERSE order, so the low byte is written first!

      H byte (sent second)       L byte (sent first)
 F  E  D  C  B  A  9  8     7  6  5  4  3  2  1  0
 -  B  B  B  B  B  G  G     G  G  G  R  R  R  R  R
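
As a quick sketch of the bit layout above, this Python helper packs 5 bit red, green and blue values (0-31 each) into the two bytes, in the order they'd be written to $2122. The function name is only for this example.

```python
def pack_color(red, green, blue):
    """Return (low byte, high byte) for a -BBBBBGGGGGRRRRR palette entry."""
    for value in (red, green, blue):
        if not 0 <= value <= 31:
            raise ValueError("each channel is 5 bits (0-31)")
    word = (blue << 10) | (green << 5) | red
    return word & 0xFF, word >> 8  # low byte is sent first


# Pure green straddles both bytes
low, high = pack_color(0, 31, 0)
print(hex(low), hex(high))
```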

Sprite Definitions - Overview
Sprites use a special bank of 544 bytes of 'OAM' memory for their definitions (a 512 byte main table plus a 32 byte second table)... they also use standard VRAM for the pattern data.

In theory the Pattern data can be relocated... but in practice it's best to just assume it's at $2000 (address in 16 bit words)

Sprites can be various sizes - a 'default size' is set for all sprites... and certain selected sprites can be double size...

This is, however, a bit tricky... let's say you have the default size as 8x8... and one double size 16x16 sprite

If we point this 'double size' 16x16 sprite to pattern 'Tile 0', the four 8x8 chunks will be made up of tile numbers:
0 1
16 17
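
The pattern is easier to see if we treat the sprite patterns as a grid 16 tiles wide: moving right adds 1 to the tile number, moving down adds 16. Here is a small Python sketch of that idea. The function name is invented for the example, and it ignores what happens when a sprite runs off the edge of the grid.

```python
def sprite_tile_numbers(first_tile, width_px, height_px):
    """List the 8x8 tile numbers used by a sprite, one list per row of tiles."""
    rows = []
    for ty in range(height_px // 8):
        rows.append([first_tile + ty * 16 + tx for tx in range(width_px // 8)])
    return rows


# A 16x16 sprite pointed at tile 0
print(sprite_tile_numbers(0, 16, 16))
```

This is also why exporting at a width of 128 pixels (16 tiles) keeps the rows of a larger sprite lined up.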

Let's look at this example sprite in AkuSprite Editor

If we want to export this quickly, so we can use it as a single doublesize sprite, one option is to tick the 'FixedSize' tickbox, and set the size to 128,16

This will export the sprite correctly - of course there will be a lot of unused space in the exported file!
Sprite Definitions - Ports Used
Address  Name         Purpose                  Bits      Details
$2101    OBSEL        OAM size (Sprite)        sssnnbbb  sss=size nn=name addr bbb=base addr
$2102    OAMADDL      OAM address (low)        aaaaaaaa  a=oam address
$2103    OAMADDH      OAM address (high)       r000000m  m=oam address MSB
$2104    OAMDATA      OAM data                 ????????
$212C    TM           Main screen designation  ---S4321  S=sprites 4-1=enable BGx
$2138    OAMDATAREAD  Read data from OAM

Sprite Definitions - OAM Data
Selecting an OAM address is done by setting registers $2102 (L) and $2103 (H)
Each address below $0100 holds two bytes (the first table)... each address at $0100 or above holds just one! All data is written via port $2104
Note: Sprites use palettes from color 128... so the first color of the palette used is 128 + (CCC x 16)
Sprite data should only be written to OAM during Vblank.
Address  Byte 1    Byte 2         Meaning                                                                    SprNum
$0000    XXXXXXXX  YYYYYYYY       X=Xpos (bits 0-7) Y=Ypos                                                   0
$0001    TTTTTTTT  YXPPCCCT       T=tile number Y=yflip X=xflip P=priority compared to BG (C=palette +128)   0
$0002    XXXXXXXX  YYYYYYYY       X=Xpos (bits 0-7) Y=Ypos                                                   1
$0003    TTTTTTTT  YXPPCCCT       T=tile number Y=yflip X=xflip P=priority compared to BG (C=palette +128)   1
...
$00FE    XXXXXXXX  YYYYYYYY       X=Xpos (bits 0-7) Y=Ypos                                                   127
$00FF    TTTTTTTT  YXPPCCCT       T=tile number Y=yflip X=xflip P=priority compared to BG (C=palette +128)   127
$0100    SXSXSXSX  (no 2nd byte)  S=doubleSize sprite X=Xpos (bit 8)                                         0-3
$0101    SXSXSXSX  (no 2nd byte)  S=doubleSize sprite X=Xpos (bit 8)                                         4-7
...
$011F    SXSXSXSX  (no 2nd byte)  S=doubleSize sprite X=Xpos (bit 8)                                         124-127
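
To make the bit packing concrete, here is a rough Python sketch that builds the bytes for one sprite. As far as I know the four bytes of a first table entry go to $2104 in the order X, Y, tile number (low 8 bits), then the YXPPCCCT attribute byte, and in the second table the lowest numbered sprite of each group of four sits in the lowest two bits. Treat both as assumptions to check against your own reference. The function names are made up for the example.

```python
def pack_oam_entry(x, y, tile, palette, priority=0, xflip=False, yflip=False):
    """Return the 4 first table bytes for one sprite: X, Y, tile, YXPPCCCT."""
    attr = (
        (int(yflip) << 7)
        | (int(xflip) << 6)
        | ((priority & 3) << 4)
        | ((palette & 7) << 1)
        | ((tile >> 8) & 1)       # 9th bit of the tile number
    )
    return bytes([x & 0xFF, y & 0xFF, tile & 0xFF, attr])


def set_second_table_bits(table, sprite, x, double_size):
    """Store the S (size) and X (Xpos bit 8) bits for a sprite in the 32 byte table."""
    shift = (sprite & 3) * 2      # 4 sprites per byte, 2 bits each
    bits = ((x >> 8) & 1) | (int(double_size) << 1)
    index = sprite >> 2
    table[index] = (table[index] & ~(3 << shift) & 0xFF) | (bits << shift)


second_table = bytearray(32)
set_second_table_bits(second_table, 5, 0x100, True)
print(pack_oam_entry(0x10, 0x20, 0x101, 3, priority=2, xflip=True).hex())
print(second_table[1])
```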