Learn Multi platform Z80 Assembly Programming... With Vampires!
ChibiAkumas Technical Documentation
Lesson Aku16 - Gradient Draw
We've covered solid fill areas and Tilemap areas, but we've
overlooked arguably the most complex... the gradient fill.
In the original game, this routine did the entire background,
but now it just does part of it!
Set Scroll direction
Depending on the scroll
direction, we may need to change the direction the gradient
moves. We use self-modifying code for this, and alter the bits
within the gradient definition.
There are 3 options: Scroll Right, Left or no movement
(intended for vertical levels)
 |
Background Gradient
First, let's remember what the
gradient definition looks like!
The first two bytes are the starting pair... they are used for
the first and second line...
Every following entry is a new line number and an extra byte for
the gradient - this pushes out one of the old bytes, and the
new pair is used.
When a 255 is found, the gradient routine ends!
 |
When this routine is called, HL
points to the right of the top screenline... DE points to the
gradient definitions (as shown above), B is the linecount and
C is the bitmask of the scroll rate..
We use self-modifying code to store the scroll rate at the
correct point.
Next we load in the first pair for the gradient - we load them
into IXH and IXL... we also swap the source data position
into IY.
We also load in the 'next line' number - and self-modify this
in so we know when we need to read in more line data.
We also back up the stack pointer - we're going to use stack
misuse to draw to the screen quickly!
 |
We now start the main sequence.
First we check if we're at a 'new line' point... if we're not,
then we skip to the 'rendering routine'.
If we do need to do a shift, first we check if the line byte
is zero (blank area)... bitshifting zero makes no sense, so we
skip if it is...
If it's not zero, we next check whether the timer ticks ANDed with
the scroll rate are non-zero - if they are, we want to do
a bitshift at this point, so we jump to the correct routine to
do this (it will be altered by self-modifying code, depending
on the scroll direction)
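The decision described above can be modelled in a few lines of Python. This is only a sketch of the logic, not the Z80 routine itself: the function name and parameter names are assumptions made for illustration.

```python
def needs_shift(line_byte: int, ticks: int, scroll_rate: int) -> bool:
    """Return True when the gradient byte should be bit-shifted this frame."""
    if line_byte == 0:
        # A blank (zero) byte looks the same however it is shifted.
        return False
    # Shift only when the tick counter shares a set bit with the rate mask.
    return (ticks & scroll_rate) != 0
```

A mask with more bits set matches more tick values, so the gradient scrolls more often.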
 |
The drawing routine starts by
checking the B counter, to see if we need to draw any more
lines...
If B is not zero, we need to draw some more - we misuse the
stack pointer to draw quickly to the screen, so we swap HL to
the stack pointer...
Next we set D and E to IXH - the first line byte of the
gradient...
We then need a lot of PUSH DEs to quickly draw to the screen!
Note - on the CPC we keep interrupts enabled - the ChibiAkumas
interrupt handler can cope with running during stack misuse!
 |
While the interrupt handler can
cope with stack misuse, it won't work right unless there are a
few bytes for it to write to, so we disable interrupts before
the final push.
We need to move down a line. Because we know we're on an even
line, we can simply add &8 to H - but we need to re-update
the stack pointer.
The line drawing is done in pairs, so once we've done the first
line, we load in the data for the second from IXL and repeat
the procedure.
|
 |
For the finish up procedure, we
need to move down a line... however this time we need to check
bit 7, to see if we've gone over the bottom of the memory
range, and correct the HL position if we have.
We now repeat the procedure again.
The finishup procedure restores the proper stack pointer,
turns interrupts on and returns. |
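To see why adding &8 to H moves down a line, and why a correction is sometimes needed, here is a small Python model of the address arithmetic. It assumes the standard CPC layout (screen at &C000, 80 bytes per row); the game's own routine is patched by self-modifying code for double buffering, so treat the constants and names here as illustrative assumptions.

```python
SCREEN_BASE = 0xC000    # assumption: default CPC screen address
BYTES_PER_ROW = 0x50    # assumption: standard 80-byte-wide screen

def next_line(addr: int) -> int:
    """Return the screen address one pixel line below addr."""
    addr = (addr + 0x0800) & 0xFFFF        # same effect as adding &08 to H
    if not addr & 0x8000:                  # bit 7 of H clear: we left the bank
        # Wrap back into the screen, one character row further down.
        addr = (addr + SCREEN_BASE + BYTES_PER_ROW) & 0xFFFF
    return addr
```

Moving from an even line to the odd line below never leaves the bank, which is why the first step of each line pair can skip the check.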
 |
These
routines work in PAIRS - therefore a gradient change can
only occur on EVEN lines
The routine is limited to make it as fast as possible, so
many compromises were made!
|
 |
Shift procedure
The bitshift procedure is pretty simple: we need to pull out
the 2 bits that make up the rightmost pixel, shift the bits to
the left, and OR in the two bits to put them in the left
hand pixel.
We store the result back into the memory which makes up the
gradient definition
 |
Lesson Aku17 - Tile Draw MSX
The MSX version is very different to the CPC/Speccy version as
the VDP has very different capabilities and limitations.
Note: The V9990 uses a separate Gradient routine to the regular
MSX2 (but it's simpler than the MSX2 one!)
Let's take a look at it!
Just like on the CPC, the MSX2
screen is drawn in 3 sections... the Gradient,
Solid Fills,
and Tile Fills..
Unlike the CPC though, on the MSX2, the Z80 draws the Gradient
WHILE the VDP draws the fills and Tiles - We're doing 8 bit
Multithreaded programming! |
 |
Drawing the TileMap
When we call the Tilemap
drawing routine, we need to pass some parameters:
HL is the Tilemap Definition (example shown right)
DE is the V9990 tilemap definition (it can be more complex
because the V9K is faster)
IX is the Gradient definition for the MSX2 - we'll look at
that later
 |
The Background routine starts
with some special code on the V9K, which will switch in the
alternate definition - we do this here, because the level code
is common on both the V9K and MSX2
Now we need to load in the current screen position... the MSX2
screen will be at Ypos 0 or 256, depending on which
framebuffer is shown, we then replace the low byte with the
one read in from the definition, this gives us DY - our
destination to show the tile to.
Next we load in the number of lines to show, this is saved
into NY
The next one is a bit tricky, our tiles are saved as a block
in the sprite data - and we need to load in the start of that
block into this code (Via self-modified code) - we then add
the Y offset of the current strip to that block Y offset - the
result is the source Y - SY for the current strip. |
 |
You can see the Sprite data
here - in the Z80 coding bugs level, the Tiles should be at
the start, so the position will be calculated correctly
automatically
|
 |
Here's how the scrolling effect works.
The tilestrip is split in two parts by the 'scroll point'...
everything to the right is copied to the far left of the
screen... everything to the left is put on the right... then
the scroll point is increased...
The result is a tilestrip the full width of the screen, which
scrolls, and is filled with only 2 VDP commands (fewer
commands= faster) |
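The two-copy scroll can be modelled in Python as below. This is a sketch of the idea only: it uses plain arithmetic rather than the CPL trick the real code uses, and the function names and the (source X, width, destination X) tuple layout are assumptions made for illustration.

```python
SCREEN_WIDTH = 256

def scroll_copies(scroll_x: int):
    """Return the (source_x, width, dest_x) copies for one tilestrip."""
    # Everything right of the scroll point goes to the far left.
    copies = [(scroll_x, SCREEN_WIDTH - scroll_x, 0)]
    if scroll_x:
        # Everything left of the scroll point fills the remaining width.
        copies.append((0, scroll_x, SCREEN_WIDTH - scroll_x))
    return copies

def draw_strip(strip, scroll_x):
    """Apply the copies to one row of pixels, standing in for the VDP."""
    row = [None] * SCREEN_WIDTH
    for sx, nx, dx in scroll_copies(scroll_x):
        row[dx:dx + nx] = strip[sx:sx + nx]
    return row
```

With a scroll point of 0 the first copy already covers all 256 pixels, so only one command is needed.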
 |
Next we load in the
'Tickmask'... the top bit will allow the scroll direction to be
reversed, the other bits are ANDed with 'TicksOccured'... this
allows the rate of the scroll to be altered.
It should be noted there is a 'slowdown' section - this is
used on the V9990, and is self-modified to slow down the
effect of the scroll.
The MSX2 VDP can only work in pixel pairs, but the V9K can work in
individual pixels; the two NOPs will be converted to INC (HL)
or DEC (HL) instructions depending on the scroll direction.
We're copying from the middle of the strip to the end - so the
Width (NX) of the strip to copy is calculated by flipping all
the bits in the X pos (SX).
The Destination DX is the far left of the screen, and we
set the top byte of the NX to Zero.
|
 |
Next we need to do a check on the number of pixels we're
going to draw... usually NX will be less than 256 - so a single
byte - but if the low byte is 0, then we need to set the high byte
to 1... we do that now, by calling the 'ByteUp' command,
which will increase the high byte by one...
On the V9K we just draw the tilemap.
On the MSX2 we run 'BusyGradient'... this checks if the VDP is
busy, and if it is, then the Z80 will draw part of the
gradient via memory access... we'll have a look at this
command later.
When the VDP is idle, we actually draw the tiles.
|
 |
We now copy from the start of the tile definition up to the
scroll point, so we zero SX.
We now load in NX and check if it was 256 - if it was, we've
nothing more to draw, as we've filled the screen... if not, we
use this as the destination of pixels to copy - effectively
copying to the position the last copy ended.
We do another CPL to work out how many pixels we need to draw
to fill the remaining width of the screen.
Now we use the same routines as before, BusyGradient to draw
any gradient while the VDP is busy, then we actually draw the
tiles!
We now repeat until we've drawn all the tile definitions...
Finally we have the ByteUp command - it just sets NX to 256,
for the rare time we need to draw the full width in one go!
|

|
Lesson Aku18 - Solid fill & Gradient on the MSX
The MSX Tilemap is just part of the Tile Drawing - solid fill
handles the blank area... but the key to speed on the MSX2 is
the gradient routine
Let's see what magic it contains!
Solid Fill
The 'Solidfil' routine takes a definition table; each line
contains four bytes.
The first byte is a Y position - this is where the fill starts.
The second byte is the number of lines to be filled.
The third byte is a 'tickmask' - this is not used for fills...
but originally this function was intended to perform crude MSX2
gradients.
The fourth byte is the fill color.
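As a rough illustration of the table layout, this Python sketch walks a fill table and stops at an entry whose line count is zero (the end marker described for this routine). The function name and dictionary keys are assumptions made for illustration.

```python
def parse_fills(table: bytes):
    """Split a solid-fill table into its 4-byte entries."""
    fills = []
    for i in range(0, len(table) - 1, 4):
        y, lines = table[i], table[i + 1]
        if lines == 0:
            break                      # zero lines ends the fill commands
        fills.append({
            "y": y,                    # where the fill starts
            "lines": lines,            # height of the fill
            "tickmask": table[i + 2],  # unused for solid fills
            "color": table[i + 3],     # fill color byte
        })
    return fills
```

For example, a table holding two fills followed by a zero line count yields two entries.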
 |
The first thing to note is that
the solid fill routine is misnamed... it's called
'Background_Gradient'!
This is because this routine WAS going to draw a gradient, but
couldn't do a good job, so was demoted to solid fill!
We're going to set up MyHMMV to solid fill the area...
First we load in the Ypos of the drawing screen (DY)
Next we set the starting Xpos to 0 (DX)... we set the width to
256 (NX)
Now we load in the Y offset (one byte)... this is the start
position (SX)
Now we load in the number of lines - if it's Zero, we've ended
the fill commands... otherwise we load it into the height (NY) |
 |
We now load in a 'tickmask'...
this was used when this function drew gradients - and allowed
scrolling.
Once we've done any scrolling (never happens!) we load in the
color byte into the fill command |
 |
Next we check IX... this points
to the 'true gradient' definition - you see, the gradient is
drawn by the Z80 WHILE the VDP fills the blank areas... but if
the gradients are done, then we can use the Z80 to help fill the
blank areas with HYPER-FILL!
If we have gradients to do, then we check if the VDP is busy,
and draw gradients while it is, using BusyGradient... which
returns once the VDP is no longer busy... once the gradient is
done we can do the fill...
We now repeat the procedure, until all the fills are done!
 |
The solid fills are done after
the tile fills...
So once the solid fill areas are done, we see if all the
gradient is drawn - and if it isn't, we draw it until it is!
 |
Gradient fill via Z80 ram access
When the Tile/Fill routines run,
IYL is initialized to 0 - this is the line number.
IX needs to be passed a table of gradient definitions...
The definition starts with a line number - this is where the
gradient is drawn...
Then follow 8 entries... each entry fills 4 lines of the gradient
- MSX2 gradients are ALWAYS 32 lines tall.
Each entry totals 3 bytes:
the first two bytes are the gradient color, and there is
also a 1 byte tick mask - this controls when the scroll occurs.
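The layout can be sketched in Python as follows. It expands one definition into the byte pair used on each of its 32 lines, swapping the two bytes after every line as the drawing routine does. The byte order within an entry (two color bytes, then the tick mask) is an assumption, the names are illustrative, and scrolling is ignored here.

```python
LINES_PER_ENTRY = 4
ENTRIES = 8

def expand_gradient(defn: bytes):
    """Return (start_line, rows) for one gradient, or None at the end marker."""
    start_line = defn[0]
    if start_line == 255:
        return None                     # 255 marks the end of the table
    rows = []
    for entry in range(ENTRIES):
        offset = 1 + entry * 3
        a, b, _tickmask = defn[offset:offset + 3]
        for _ in range(LINES_PER_ENTRY):
            rows.append((a, b))
            a, b = b, a                 # swap the bytes for the next line
    return start_line, rows
```

Each row stands for the byte pair that is repeated across the whole screen line.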
 |
The BusyGradient should only run
when the VDP is busy - so first we check if it is... and return
if the VDP is idle...
Next we flip into the shadow registers - these are used
by the gradient routine - allowing us to do two 'tasks' in
parallel (the Gradient and the Tile/Fill)
We load in HL from the last time the gradient ran, and we see if
the line number is Zero - if it is, then this is the first
run...
We load in a byte, and see if it's 255 - this would mark the end
of the definition... otherwise we have gradient to draw! |
 |
We read in the Line offset from
the definition and store it in B, we load C with Zero, next we
get the drawing screenbuffer offset (0 or 256) and load it into
A
We use these values to calculate the address and set up VDP RAM
access, so we can write to video memory...
Next we set IYL to 32 - this is the number of lines to draw...
We load HL with the two bytes of the gradient, and compare the
TickMask of the gradient with the ticks occurred... if the result
is non-zero we need to scroll the gradient on this tick
 |
We now fill a full line by OUTing
the byte pairs to the DATA port of the VDP
 |
We need to swap the two bytes
before drawing the next line...
If we've drawn all the lines, we're done... otherwise we check
if we've drawn 4 lines; if not, then we repeat.
If we've drawn 4 lines, then we load in the next bytepair
and repeat the procedure.
 |
We're going to repeat the draw
line procedure, but first we need to check if the VDP is idle...
if it is then we back up HL, flip the shadow registers
and return.
We also have the "HyperdoneNext" routine which will load in the
next stage of the gradient. |
 |
We also have a bitshift routine
which rotates the 4 pixels... this moves the leftmost pixel in L
to the rightmost pixel in H |
 |
Lesson Aku19 - Gradient fill on the V9990
The V9990 uses a simpler, more flexible gradient draw - it was
originally designed for the basic MSX2, but it was too slow on
that platform... the V9K is super fast, and makes easy work
of it...
Let's see how to do it!
The theory!
We start with a blank screen... our gradient procedure is:
1. Fill a strip 4 pixels wide, and 192 pixels tall with the
gradient (on the left hand side)
2. Copy it over the whole screen
3. Add the tilestrips, sprites etc. |
 |
Drawing The Gradient
The gradient is made up of 2 alternating line definitions -
of 2 bytes per line...
The Gradient definition has to be passed in HL...
The definition has a start line - which is the first two lines
- each line takes 2 bytes...
All following lines have a line number, a byte-pair for the
new definition (which replaces the oldest of the two being
cycled) and a shift timer - which is a bitmask used with 'ticks
occured' to decide when to scroll the background.
Finally the last definition is ended by a 255 byte...
|
 |
When we call the routine, HL
will point to the gradient definition and
B will be the line count (which is always 192)
The first thing we do is INIT the destination ranges... we
need to set the 'screen start Y', which will either be 0 or
256, depending on if screen buffer 0 or 1 is shown
We now load in the two bytepairs for the first line.. we store
them into IX/IY first - but we soon move them into the shadow
registers BC/DE (which are faster)
Then we load in the next point we need to change the gradient,
and store it via self modifying code. |
 |
We need to initialize two HMM
commands... one to fill the strip on the left (HMMC) and the
other to fill the whole screen with that strip (HMMM)
Once we've set the command definitions up, we start the HMMC
command. |
 |
We send 4 bytes to the VDP data
port - remember our strip is only 2 bytes wide, so this
effectively fills 2 lines!
We now see if we need to do a gradient change - if not we jump
over the next bit...
Now we check if we need to do the Gradient scroll by ANDing in
the 'Ticks occured' counter... if the result is not Zero, then
we need to do a shift now! |
 |
When we need to load in a new
byte, we first swap DE into BC... then we load new bytes for
DE in from the source again
We also load in the new scroll tick mask, and the next line we
need to change the gradient again. |
 |
Whether any updates are done or not - we decrease B and
repeat until done!
 |
We're going to copy a 252x192
area from the far left to the right... this results in the
copy command copying the 4 pixels repeatedly over the entire
screen..
In theory we'd just copy the 4 pixels across the screen,
but in practice this doesn't work!... the reason for this is
that the V9K is caching the bytes we just wrote...
To counter this we double the 4x192 area to 8x192... then we
copy this over the entire screen...
The result is the gradient now fills the entire screen. |
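Why does copying a wide area a few pixels to the right repeat the strip? This Python sketch models one screen row and an overlapping left-to-right copy. It only shows the replication idea; it does not model the V9K's caching behaviour that makes the doubling step necessary, and the function name is an assumption.

```python
def replicate_strip(row, strip_width):
    """Copy the row onto itself, shifted right by strip_width, left to right."""
    for x in range(len(row) - strip_width):
        # Each pixel is read after it was already overwritten by the copy,
        # so the leftmost strip repeats across the whole row.
        row[x + strip_width] = row[x]
    return row
```

Starting from a 4 pixel strip, doubling it to 8 pixels first and then copying with a strip width of 8 gives the same final row.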
 |
Scrolling the gradient data
The background pixel shift for
the V9K is almost identical to the MSX2...
We take the rightmost pixel in the two bytes and shift it...
Then we shift the other 3 left, and put the shifted first
pixel at the far right.
Once the bytes have been shifted, we save them back to the
Gradient definition. |
 |
Lesson Aku20 - Level Init and Loop
When a level starts, some Initialization is required to get
things working.
Once this is done, the main level code will proceed with the
Level loop - which will continue to execute until either the
player dies, or wins the level.
Let's see how all this works!
Level Init
On the MSX systems, we need to
initialize the memory locations of the Tile data used to draw
the background - the positions these need to go will have been
set by the previous load commands - however we need to
Self-Modify these into the code.
Afterwards, we need to actually get the bitmap data...
The data is RLE compressed, and its bytes are stored in the
level data block, so we decompress this from RAM into
VRAM
 |
On the V9K on the MSX2,
ChibiAkumas boss battles use custom sprites for enemy
bullets, so we need to load them in here.
 |
For the next stage, we'll want the Player & game
parameters. |
 |
Some levels of ChibiAkumas on
the CPC flipped colors every frame to 'fake in between'
colors... if the user allowed it, it would be turned on here...
On the MSX2 we would see if the V9K was present, and turn on
the alternate background if it was.
 |
On the Spectrum we have a
problem! The 2nd set of sprites are held in the level data,
but we need to page it out, and page in bank 7 for the second
screen buffer - which means we can't get access to the
sprites!
To solve this, we copy the sprites into bank 7 where the
second buffer is.
 |
Now we need to initialize the 'Event Stream' (the level
contents)
HL points to the Event Stream data...
DE points to a bank of ram which can be used by the event
stream for an extra 16 object definitions (128 bytes)... it's
not needed if you don't use "evtSettingsBankEXT" |
 |
Next we need to INIT arkostracker to get the music playing.
Note that on the Speccy we use Bankswapper_CallHL - this
function swaps the bank in, calls HL, and restores the level
bank on the spectrum, so everything works OK |
 |
We now initialize the screen
buffers - this sets up double buffering...
Because of the memory layout of the CPC, and the fact we need
to recalculate the memory location every time we go outside of
the 16k bank of screen ram, we need to use self modifying code
to change the conditions of the "Get Next Line" routines... HL
will be set to the correct code by the ScreenBuffer_Init
routine. |
 |
Finally, we start the interrupt handler...
The main level loop will now begin! |
 |
Level Loop
We need to draw everything in
order, so first we draw the background. |
 |
We're going to do the level objects... first we calculate
any new level events via EventStream_Process...
Next we draw all the level objects with ObjectArray_Redraw
|
 |
We're going to start handling
the player...
First we run the Player_Handler, this draws the
player, and reads player controls.
Then we execute Player_StarArray_Redraw, this draws the
player bullets. |
 |
We're going to draw the enemy
bullets...
On the V9K boss battles we use animated sprites for bullets
- we need to calculate which to use - they are 6 pixels
wide, and we cycle through them for each frame of animation.
Whether we're on the MSX or not, we now need to draw the
enemy bullets with StarArray_redraw |
 |
On the V9K we will draw an extra foreground layer... this
is a tilestrip with transparency, and is handled by the same
routine that handles the background code. |
 |
We've finished the drawing of
the main level objects, so we now draw the UI (Scores/Lives
etc) |
 |
We're pretty much done...
finally we play any sound effects.
Also, if debugging is enabled we'll show the current level
time onscreen, so we can work out any problems with the
event stream.
 |
At this point we'll do any 'Level specific' operations
that need to occur before the page flip |
|
Now we do the actual page
flip... this shows the screen we just drew!
On the CPC we now need to update all the GetNextLine
commands again to take account of the change in the drawing
page's memory position.
On the Spectrum, we force an Interrupt to occur - this is
because the drawing procedure is too slow, and is likely to
have missed an interrupt - causing the music to slow down...
we have to do this on the Spectrum only because, unlike on
other systems, interrupts that occur while interrupts are
disabled are missed rather than delayed.
 |
Lesson Aku21 - RLE bitmaps with AkuSprite
When developing ChibiAkumas Episode 2, I wanted to be able
to store MORE graphics in less space - I decided I needed
RLE compression, and as I like to try to do everything
myself, I decided to write my own RLE
Compressor/Decompressor and file format!
Check out what I came up with!
What is RLE?
RLE stands for Run Length
Encoding... this is a compression method where consecutive
pixels that are the same are compressed... by storing them as
a color and a count.
RLE is LOSSLESS - the image will be a perfect recreation of
the original, but it will be slower to draw!
Let's imagine we have a bitmap... and the pixels are
colored 122222222221111111123 ... that's 21 characters...
Now let's imagine we stored that as 1x1.2x10.1x8.2x1.3x1...
that's 20 characters.
Now that's a ridiculous example, but the principle stands -
and rather than characters, we'll store our data in Nibbles
(half bytes)
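The text example above is easy to reproduce in Python. This is just the toy 'color x count' notation from the paragraph, not the ChibiAkumas file format, and the function name is made up for illustration.

```python
from itertools import groupby

def rle_text(pixels: str) -> str:
    """Encode a string of pixel colors as dot-separated color-x-count runs."""
    return ".".join(f"{color}x{len(list(run))}" for color, run in groupby(pixels))

# The 21 pixel example: one 1, ten 2s, eight 1s, then a 2 and a 3.
example = "1" + "2" * 10 + "1" * 8 + "23"
encoded = rle_text(example)
```

Runs of a single pixel cost more in this notation than they save, which is one reason the real format also has a 'linear' option.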
|
Rle Bitmap shown on the CPC!

|
With the right data, RLE files are much smaller than raw
bitmaps, and because the compressor 'understands' image data,
they can beat compressors designed to work with byte data (an
RLE compressor can save space on 5 consecutive pixels the same
color, a byte compressor cannot)
One thing to bear in mind, the resulting RLE will be more
efficient if lots of consecutive pixels are identical, and you
can use this in your art design if you need to save space! |
A file optimized for RLE - note the grey and
black 'lines' and few single pixels
These compress better than a checkerboard
 |
How does ChibiAkumas format its RLE images?
Data is stored in bytes, but these bytes are split into 2
nibbles - and depending on the data, the pixels will be drawn
accordingly... on the CPC, one nibble is 2x Mode1 pixels - and we
work in Pixel Pairs - this is because the ChibiAkumas graphics will
often use 'checkerboard' (alternating 2 pixel colors) to simulate
other colors - and the RLE compressor is designed to compress that!
Bits | Meaning
RRRRCCCC+ | C repetitions of RLE data R (if C=15 then read in the next byte and add it to the count... if that's 255 read in the next byte too!)
CCCC0000+ BBBBBBBB.... BBBBBBBB | C bytes of linear (uncompressed raw) byte data - there are C bytes of B (C can be more than one byte when 15/255+)
00000000 CCCCCCCC+ BBBBBBBB | C repetitions of Byte B - all bytes are the same color (C can be more than one byte when 255+)
Why so many options? COMPRESSION!
Not all data is RLE compressible - and trying to compress
it will make it bigger... so we have a 'linear'
option to resolve this...
Also writing lots of bytes of the same color is slow - so
we have a 'repeated byte' option too! |
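To make the three block types concrete, here is a Python sketch of a decoder that expands a stream into a flat list of nibbles. It is a model written from the description on this page, not a port of the Z80 code: it assumes a count nibble of 15 (all bits set) means extra count bytes follow, that a 255 count byte means yet another follows, and that the high nibble of a raw byte comes first. The function names are invented for illustration.

```python
def read_count(data, pos, count):
    """Add extension bytes to count; a 255 byte means another one follows."""
    while True:
        extra = data[pos]
        pos += 1
        count += extra
        if extra != 255:
            return count, pos

def decode(data: bytes):
    """Expand an RLE stream into a list of nibbles."""
    out, pos = [], 0
    while pos < len(data):
        head = data[pos]
        pos += 1
        high, low = head >> 4, head & 0x0F
        if head == 0:
            # Repeated byte: the count is entirely in the following bytes.
            count, pos = read_count(data, pos, 0)
            value = data[pos]
            pos += 1
            out += [value >> 4, value & 0x0F] * count
        elif low == 0:
            # Linear data: the top nibble counts raw bytes that follow.
            count = high
            if count == 15:
                count, pos = read_count(data, pos, count)
            for value in data[pos:pos + count]:
                out += [value >> 4, value & 0x0F]
            pos += count
        else:
            # RLE run: the low nibble is the count, the high nibble the data.
            count = low
            if count == 15:
                count, pos = read_count(data, pos, count)
            out += [high] * count
    return out
```

For instance, the byte &23 expands to three nibbles of value 2, while &00 &03 &5A expands to the byte &5A three times.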
 |
Creating an RLE!
My free, open source AkuSprite
Editor can create RLE files for the CPC, Spectrum, MSX and Sam
Coupe... the file format is basically the same, but because of
the different bit depths of the machines, the number (and
order) of the pixels compressed differs... on the Spectrum
the color information is also RLE compressed as an extra data
set.
It should be possible to write decompressors for other
systems, however I have no plans to do so at this time, as I
don't intend to use RLE on other systems - and
porting the decompressor is time consuming.
|
 |
Using RLE on the CPC
When we use the RLE function,
the RLE sourcecode is stored into the clipboard...
If we just add an ORG &8000 we can compile the code,
and just do a CALL &8000 to show our RLE...
Note, the color information is not stored in the RLE, so we'll
need to set the palette ourselves!
Also note, this example has a DI, HALT at the end - so we'll
need to change this depending on what we want to do!
 |
If we want to use the CPC
firmware (for example to return to basic), we need to make some
changes!
The firmware relies on the shadow register BC being unchanged,
so we need to back that up!
By adding an EXX and PUSH/POP of BC, we can keep the firmware
happy! |
 |
 |
The compressor was designed for the CPC, but was ported to the
other systems. On the speccy each nibble is 4 pixels - on
the MSX/SAM it's just one!
Basically the decompressor is the same though, just with
code reprogrammed for screen pixel drawing, and the
'GetNextLine' command is changed
|
Using RLE on the Sam Coupe
The RLE export for the SAM is
basically the same; we need to add some extra commands to
page in the video ram, but otherwise it's the same.
Like the CPC, the palette is not included.
 |
Using RLE on the ZX Spectrum
The Spectrum RLE export is
basically the same, it will export code that you can just go
ahead and run.
You should note that the decompressor uses the IY register -
again, this is needed by the Spectrum firmware, so you'll need
to back it up if you're using the speccy firmware.
 |
Using RLE on the MSX
The MSX version is the 'odd one out'...
Rather than saving the RLE to the clipboard, it is saved to a
file - this is because the MSX screen is very large (24k), and
the ram is very small (64k)... also the disk is relatively
slow...
For this reason ChibiAkumas usually streams RLE from the disk
to the VDP without using ram - this ends up faster, and
doesn't use any memory (well technically it uses a 128 byte
buffer)
However, when we use the export routine in AkuSprite Editor, a
simple 'Loader' will be created which does allow the RLE to be
viewed from ram.
This can just be pasted in and should run OK, provided your
screen is set up correctly, and you have the 'VdpMemory.asm'
module included - which handles the VDP init commands for the
RLE draw |
 |
The ChibiAkumas RLE format probably isn't the best - but it's
FREE...
ChibiAkumas was developed for personal achievement by the
developer, and the author wanted to try to make his own RLE
compressor - being the best or fastest wasn't the mission!
 |
Lesson Aku22 - ChibiAkumas RLE decompressor
While the Compressor is written in C#, the ChibiAkumas RLE
decompressor is Z80 based...
In this lesson we're going to look at the CPC version of the
decompressor and see how it works!
(the Speccy/SAM and MSX ones are fairly similar)
Initialization routine
As we saw before, when the RLE decompressor is started HL
points to the start of the data in RAM, DE points to the end, B
holds the Y line to start drawing, IXH is the width
in bytes, and IXL is the X pos of the RIGHT HAND SIDE...
This is because the decompressor works Right -> Left; this
was to allow the option of using stack misuse for filling.
 |
First we need to initialize
various parameters using self modifying code...
Next we need to calculate the starting screenpos, we use the
NextScreenLine function to do this, but it automatically adds
the image width (as it's designed to recalculate the starting
line during drawing) so we need to subtract that width for
each iteration
Once we're ready we set E to 255 - E is used during drawing
RLE data to mark which half of the byte is currently drawn -
as each RLE nibble is 2 pixels there are two iterations per
screen byte.
Once set up is done, we're ready to draw our RLE!
|
 |
The main loop
First we check what kind of
data we have... as discussed before, the second nibble is the
RLE count... if the count is zero - then this byte is linear
data.
Otherwise we store the count for later
 |
RLE Data
Now we load in the pixel
data... two visually consecutive pixels on the CPC are not
consecutive in RAM.
To optimize the RLE compression we store in the file according
to the visual layout, and bitshift into the correct position
for the screen now.
We also check the count... if the count is 15 (all bits of the
nibble are 1) we read in the next byte as well, and add it to
the count... if that byte is 255 then we keep reading in bytes
and adding them to the count until one isn't! |
 |
We're going to fill the bytes
of the screen with the nibble we read in,
we have two versions of the code, so that whatever the
position we need to plot our pixel pair to video ram, we can
do so.
If we've completed a line of the RLE, we recalculate the
position of the start of the next line.
We check IX to see when we've completed all the drawing for
this RLE batch - once we're done we back up the screen pos of
the current draw
Note 'ByteNibbles' - this is a special routine that will fill
entire bytes with the nibble for extra speed.
|
 |
We need some commands to allow
for the starting of RLE data half way through a screen byte.
These commands handle this, flipping E (the current nibble
selection), and jumping to the middle of the loop |
 |
 |
On the MSX each RLE nibble is just one pixel... on the Speccy
it's 4 pixels - also, on the speccy the format is simpler,
as unlike the CPC, the pixels on the speccy are
consecutive - so the code that shifts the bits is
different.
|
Nibble pairs (Full bytes)
If there is a lot of RLE data
left, it's faster to fill it in bulk, so we do that with the
ByteNibbles function
It essentially just copies the C byte to the screen,
decreasing the count by 2 each time, until there's only 3
nibbles left, then we switch back to the regular routine
 |
 |
The nibble pair code is just a speedup!
You could remove it and the regular RLE code could do
the job, the drawing would just be significantly slower. |
Byte Series Data
If the 'RLE count' nibble is
zero then the top nibble is a count for a series of bytes to
be shown to the screen
This is where we could not compress the data as an RLE, so
we're storing it 'As Is'
We need to shift the top nibble to the bottom, then (as
before) if the count is 15, we load in more bytes to get the
total count.
Once we've got our count, we start reading bytes from the
source, and writing them to the screen, checking for a newline
until we've done all the bytes |
 |
Repeated Byte Data
There will be many times
where a large area is filled with a single repeating byte,
although that byte may not be RLE compressible (eg 4
different colors repeated)- we have a special routine to
handle this!
This is defined by the first byte being zero - so the count
is entirely in the following bytes.
Once we have the count, we just copy bytes accordingly,
checking for a new line as usual
 |
Lesson Aku23 - ChibiAkumas Compiled Sprite Compressor for the CPC and Spectrum
The RLE Compressor allows us to make images small... but
sometimes we don't need things to be smaller... we need them
to be FAST!
On the CPC and Spectrum we do this with something called a
'Compiled Sprite'...
Let's learn about Compiled Sprites and how we can use them!
What is a 'Compiled Sprite'?
A normal sprite will have
some kind of Bitmap data, and some Code... the Code reads in
the Bitmap data, and 'Draws' the sprite to the screen
A Compiled sprite is different... there is no Bitmap Data to
read in... the sole purpose of the code is to draw that one
sprite...and it's optimized to do that job as fast as
possible!
Effectively a compiled sprite is an ASM program, and
AkuSprite Editor can produce the code to show the sprite for
us! |
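To get a feel for the idea, here is a toy Python 'sprite compiler' that turns bitmap bytes into Z80 source lines, so the pixel data ends up inside the instructions themselves. This is a simplified illustration only: the function name is invented, and the output of AkuSprite Editor is more sophisticated than this byte-by-byte approach.

```python
def compile_sprite(pixels: bytes):
    """Return Z80 source lines that write the bytes to consecutive addresses."""
    lines = []
    for value in pixels:
        lines.append(f"ld (hl),&{value:02X}")  # the pixel byte is the operand
        lines.append("inc hl")                 # move to the next screen byte
    lines.append("ret")
    return lines
```

Even this naive version shows the trade-off: each bitmap byte becomes several bytes of code, which is why compiled sprites are fast but large.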
 |
In ChibiAkumas, Compiled sprites are used for the
background of the last level of EP1, and the 'Sakuya' battle
of EP2 - which had a pre-rendered 3D background with up to 8
frames of repeating animation...
Completely redrawing a full screen each frame during
gameplay takes a lot of CPU power, so this is a time when
Compiled sprites are needed...
The Compiled sprites in ChibiAkumas do have some bitmap
data... this is because a pure compiled sprite of a complex
16k screen could easily end up as greater than 64k!
In this episode, we'll learn how to make and use compiled
sprites for the CPC and ZX spectrum... Because direct memory
access is slow on the MSX, compiled sprites are not possible
on the MSX! |
 |
RLE is good when you need the space, and can spare the CPU
power...
Compiled Sprites are for when you need the CPU power, and
can spare the space...
In both cases, it's unlikely you're going to be able to
clip the sprite (have the sprite partially onscreen) so
they're really best for 'special cases'
|
 |
Creating Compiled sprites for the CPC
We're going to convert the same bitmap we used in the RLE
example...
We just load the picture we want to convert into AkuSprite
Editor... and select the AddOne menu option... the source code is
copied to the clipboard.
AddOne adds an extra frame to the compiled sprite data
(multiple frames can be combined into one code file)...
AddOneDiff adds the difference between the new frame and the
last one - effectively a transparent layer.
Clear will remove all the compiled sprite data. |
 |
We can paste the compiled sprite code into WinAPE to execute it...
We do need to make a few changes first... |
 |
Delete the incomplete ORG statement |
 |
Remove the last EI command from the
EndCode - the Firmware interrupt handler won't like it. |
 |
To keep the firmware happy, we need to change the first Jump
to a Call, and back up the shadow BC register.
This will produce a program we can run from BASIC. |
Before:

After:
 |
We can show the sprite by typing "Call &8000" from BASIC.
The sprite will be shown - much faster than the RLE version - and
BASIC will continue working. |
 |
Creating Compiled sprites for the ZX Spectrum
Just like on the CPC, we can use the 'AddOne' option on
the ZX Spectrum to add an extra frame to the compiled sprite |
 |
After we paste in the clipboard, we need to delete the
first ORG statement |
 |
We need to change the start Jump to keep BASIC happy.
We need to change it to a Call, and back up IY...
|
Before:

After:
 |
The Compiled sprite generator has made a mistake, it
didn't define MultiPushDE31...
we can fix this though! |
 |
If we make MultiPushDE31 using MultiPushDE40 as a template,
the problem will be fixed |
 |
The compiled sprite will be shown to screen, and BASIC will
continue working! |
 |
 |
AkuSprite Editor also does Speccy colors, so you can do a
full-screen full color background...
This was used in ChibiAkumas EP1 V1.666 during the last
boss battle for the background. |
Lesson
Aku24 - Compiled Sprite Source
We looked at how to use a compiled sprite in the last
lesson... this time we'll look at the resulting compiled
sprite source-code, and see what magic is happening!
We're going to look at a CPC compiled sprite, but the Speccy
one is 95% identical
Starting the draw
The start point of the Compiled sprite is pretty simple...
we use Stack Misuse to get data to the screen as fast as
possible... this is where we use PUSH commands to write
bitmap data to the screen.
We first need to back up the real stack pointer, then point
SP to the right hand side of the top line of the
destination...
Next we load IX with a pointer to the DrawOrder line list,
and execute 'JumpToNextLine' which will handle the draw |
 |
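To see why SP has to start at the right hand side of the line, here's a minimal Python model of PUSH (a sketch, not the generated Z80 source). On the Z80, PUSH stores the high byte at SP-1 and the low byte at SP-2, then moves SP down by 2... so data lands right-to-left. The helper names and the 6 byte 'line' are assumptions for illustration only.

```python
def push(mem, sp, word):
    """Model Z80 PUSH: high byte goes to SP-1, low byte to SP-2."""
    mem[sp - 1] = word >> 8
    mem[sp - 2] = word & 0xFF
    return sp - 2


def draw_line(mem, line_start, line_bytes):
    """Fill one line by pointing SP just past its right edge and pushing
    byte pairs, rightmost pair first (expects an even number of bytes)."""
    sp = line_start + len(line_bytes)        # right-hand side of the line
    for i in range(len(line_bytes) - 2, -1, -2):
        word = (line_bytes[i + 1] << 8) | line_bytes[i]
        sp = push(mem, sp, word)
    return sp


screen = bytearray(16)
wanted = [0x80, 0x00, 0x00, 0x71, 0x40, 0x00]
final_sp = draw_line(screen, 4, wanted)
```

After the call, `screen[4:10]` holds the wanted bytes in left-to-right order, and SP has walked back to the left edge of the line. Note that a pushed word of &0080 appears on screen as the bytes &80 then &00.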
The DrawOrder is a set of pointers to sections of code...
the idea is that many images will have repeating parts, and
we can use the same code to draw multiple lines...
unfortunately this image has no such lines, so it doesn't
save space - but usually it will! |
 |
When we start drawing a line, we load in the address of
that line code, and jump to the code. |
 |
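The idea of sharing line code can be sketched in Python like this... it's only a model of the table-building step, assuming identical lines are detected by comparing their bytes. The function name `build_draw_order` and the 4 byte lines are hypothetical, and indexes stand in for the 2 byte code pointers the real table would hold.

```python
def build_draw_order(lines):
    """Return (routines, order): one routine per distinct line, plus the
    per-screen-line list saying which routine draws it."""
    routines = []        # distinct line contents, standing in for line code
    index_of = {}        # line contents -> position in routines
    order = []
    for line in lines:
        key = bytes(line)
        if key not in index_of:
            index_of[key] = len(routines)
            routines.append(key)
        order.append(index_of[key])
    return routines, order


blank = [0x00] * 4
stripe = [0xF0] * 4
screen_lines = [blank, stripe, blank, blank]
routines, order = build_draw_order(screen_lines)

POINTER_BYTES = 2                       # a Z80 address is 16 bits
table_bytes = len(order) * POINTER_BYTES
```

Here four screen lines need only two line routines... the table still costs one pointer per screen line, whether or not any lines repeat.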
The DrawOrder list allows us to reuse lines to save space,
however this still needs 400 bytes (a 2 byte pointer per line)
for a full 200 line screen... if many of the lines are the
same (eg if much of the screen is blank) we can use a
Looper... this will repeat one or two lines a certain number
of times, allowing for a smaller DrawOrder table to fill the
screen. |
 |
Fast filling with PUSH
Because PUSH is the fastest way to fill the screen we use
it for the fill..
The first few bytes are all blank, so we load DE with
&0000, and use 'MultiPush' to do the fill... this is a
set of PUSH DE commands... even though we're misusing the
stack, we can use a CALL here, as the return address lands in
the next two bytes we're about to draw over, and the code is
designed to work around this... we'll look at this in a moment...
The next bytes we need to write are &0080 - we use HL..
then we want to write &7100 - we use BC...
Now we want to write &0000 again... the compiler knows DE
still contains this, so we can just PUSH DE again! |
 |
We now need to write &0040... HL still contains &0080...
so if we just change L to &40, we'll have the right value to
push...
We need some more &0000's... and DE still contains that
value...
we also need &8000... BC contains &7100, so if we change B
to &80 we'll have the value we need...
Finally we need to push 2 DE's, and then we've completed the
line |
 |
 |
The C# Sprite Compiler remembers the
current state of each register, and tries to use the best
way to produce the new value we need... this can be
setting part of the register with commands like LD
L,&xx or copying parts of other registers like LD L,B
It's just a case of finding the simplest way to get the
resulting pair to PUSH, to save time and space! |
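Here's a minimal Python sketch of that register-tracking idea. It is NOT the real C# compiler's algorithm, which isn't described here... it assumes a simple 'least recently pushed register' policy, only models DE, HL and BC, and doesn't attempt register-to-register copies like LD L,B. The word list is an illustrative guess at the sequence from the walkthrough above (the exact repeat counts aren't known).

```python
def compile_line(words):
    """Return Z80-style mnemonics that PUSH each 16-bit word in order."""
    regs = {"DE": None, "HL": None, "BC": None}   # known contents (None = unknown)
    last_used = {name: -1 for name in regs}       # step of each pair's last PUSH
    out = []
    for step, word in enumerate(words):
        # Best case: a pair already holds the word, so a bare PUSH is enough.
        name = next((n for n, v in regs.items() if v == word), None)
        if name is None:
            # Assumed policy: reuse the pair that was pushed longest ago.
            name = min(regs, key=lambda n: last_used[n])
            old = regs[name]
            hi, lo = word >> 8, word & 0xFF
            if old is not None and (old >> 8) == hi:
                out.append(f"LD {name[1]},&{lo:02X}")    # only low half differs
            elif old is not None and (old & 0xFF) == lo:
                out.append(f"LD {name[0]},&{hi:02X}")    # only high half differs
            else:
                out.append(f"LD {name},&{word:04X}")     # load the whole pair
            regs[name] = word
        out.append(f"PUSH {name}")
        last_used[name] = step
    return out


words = [0x0000, 0x0080, 0x7100, 0x0000, 0x0040, 0x0000, 0x8000, 0x0000, 0x0000]
program = compile_line(words)
print("\n".join(program))
```

With this input the sketch happens to make the same choices as the walkthrough: LD L,&40 to turn &0080 into &0040, and LD B,&80 to turn &7100 into &8000, with DE kept at &0000 throughout.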
MultiPush
We CALLED MultiPushDE5... we're going to push bitmap data to
the screen, so we need to get that return address out of
there... we pop it into HL, and then do the 5 pushes...
Once we're done, we effectively return, by Jumping to the
address in HL that we popped earlier |
 |
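The trick is easier to follow with a tiny Python model of the stack (a sketch under assumptions, not the generated source). It assumes the final jump is a JP (HL) style jump, and the class and function names, memory size and addresses are all invented for the example.

```python
class StackModel:
    """Just enough of the Z80 stack to follow one MultiPush call."""

    def __init__(self, size, sp):
        self.mem = bytearray(size)
        self.sp = sp

    def push(self, word):
        self.mem[self.sp - 1] = word >> 8      # high byte
        self.mem[self.sp - 2] = word & 0xFF    # low byte
        self.sp -= 2

    def pop(self):
        word = self.mem[self.sp] | (self.mem[self.sp + 1] << 8)
        self.sp += 2
        return word


def call_multipush_de5(cpu, de, return_address):
    cpu.push(return_address)     # CALL: the return address lands on the screen
    hl = cpu.pop()               # POP HL: take it back, SP is where it started
    for _ in range(5):
        cpu.push(de)             # PUSH DE x5: overwrites those two bytes too
    return hl                    # jump to HL: resume just after the CALL


cpu = StackModel(size=16, sp=16)
resume_at = call_multipush_de5(cpu, de=0x1234, return_address=0x8123)
```

The return address is briefly written into screen memory, but the first PUSH DE covers the same two bytes... so nothing of it is left on screen, and execution still gets back to the caller.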
The NextLinePushDE2 is just a pair of DE pushes... but
execution falls into the code to calculate the start of the
next screen line, and then falls into the jump code which runs
the drawing routine for the next line |
 |
The NextLine Command will need to be modified if you're
intending to use a second screen buffer.
You will also need to reprogram the code if you want to
reposition the sprite |
 |
Bitmap data... when Push doesn't
work
Writing 2 bytes of data with "LD HL,&xxxx... PUSH HL"
takes 4 bytes (3 for the LD, 1 for the PUSH)... so in the worst
case scenario we're doubling our data size... a serious problem
when our screen is 16K and our RAM is 64K!
Unfortunately, we do have to use some bitmap data to stop
the program getting too large - we only do this where PUSH
commands aren't helping at all...
We can call BitmapPush to do this... the following 2 bytes
after the call are the address of the data. |
 |
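The worst case is quick to work out... this little Python calculation just uses the standard Z80 instruction sizes and the 16K screen size mentioned above, and only covers the plain LD/PUSH form (it ignores any register reuse, and any tables or helper routines).

```python
LD_PAIR_IMMEDIATE = 3    # LD HL,&xxxx: opcode + 2 data bytes
PUSH_PAIR = 1            # PUSH HL: single opcode byte
DATA_PER_PUSH = 2        # each PUSH writes 2 screen bytes

code_per_push = LD_PAIR_IMMEDIATE + PUSH_PAIR
expansion = code_per_push / DATA_PER_PUSH

screen_bytes = 16 * 1024
worst_case_code = screen_bytes // DATA_PER_PUSH * code_per_push
print(code_per_push, expansion, worst_case_code)
```

So a screen where no register value can ever be reused would need 32768 bytes of LD/PUSH code, half of the 64K address space for one frame.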
The bitmap data is... well... bitmap data!
The C# Compiler remembers all previously defined data, and
if it's possible to 'reuse' some that's already defined,
then it will! |
 |
Before we jump to the BitmapPush itself, we set B to the
number of WORDS of data we want to copy...
We back up DE... it may contain useful data we'll want to
push again later.
Next we POP the return address into IY... this points just after
the CALL in the calling code, so the 2 bytes we read from there
are the address of the bitmap data.
Now we read in a bytepair from the bitmap data and PUSH
it... we repeat until B reaches zero, then we restore DE and
return.
|
 |
 |
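The steps above can be modelled in a few lines of Python. This is a sketch only... it assumes the address after the CALL is stored low byte first, that the bitmap words are stored in the order they're pushed, and that the routine resumes after the 2 inline address bytes. It leaves out the DE backup and IY handling, and the memory layout is invented for the example.

```python
def bitmap_push(mem, sp, return_address, word_count):
    """Model of the routine: returns (new_sp, address_to_resume_at)."""
    # The two bytes after the CALL hold the bitmap address (low byte first).
    src = mem[return_address] | (mem[return_address + 1] << 8)
    for _ in range(word_count):              # B counts WORDS, not bytes
        lo, hi = mem[src], mem[src + 1]
        mem[sp - 1] = hi                     # PUSH: high byte to SP-1
        mem[sp - 2] = lo                     #       low byte to SP-2
        sp -= 2
        src += 2
    return sp, return_address + 2            # skip the inline address


mem = bytearray(0x40)
mem[0x10:0x12] = bytes([0x20, 0x00])         # inline address -> &0020
mem[0x20:0x26] = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66])   # bitmap data
sp, resume = bitmap_push(mem, sp=0x38, return_address=0x10, word_count=3)
```

Because each PUSH moves SP downwards, the first word read ends up furthest right on the line, so under these assumptions the data has to be stored right-to-left in pairs.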
We often try to combine the last command with the next line
command, with functions like "jp NextLinePushHl"
Because these commands will be used often, we can save a
byte or two with the last command in this way. |
| |
Buy my Assembly programming book on Amazon in Print or Kindle!



Available worldwide! Search 'ChibiAkumas' on your local Amazon website!
Click here for more info!
|