Optimizing an Imaginary Sprite

There was an interesting thread on Facebook this week about optimization. Its participants took an imaginary sprite routine and optimized the crap out of it. In this article I’m going to walk through the various stages of optimization and give my own analysis of why each change was made, and whether the optimization was any good.

I’m going to assume you know, or are at least familiar with, some arcane programming language, and that you are either somewhat familiar with assembly or enjoy punishing your brain.

You might be surprised by the results of versions 5 and 5p. Fewer instructions don’t always mean faster code.

Part 1 – The problem, EDTASM
Part 2 – Listing 1 – Unoptimized
Part 3 – Version 2 – A little fix
Part 4 – Version 3
Part 5 – Version 5
Part 6 – Version 5p – Bigger and Faster
Part 7 – Version 6c – Mind. Blown.
Part 8 – Longer and faster
Part 9 – Resources

Just a quick note: this is a rather technical post. For each version I will list who made the change, show the diff against the previous version, then give my analysis of the change. Writing this article was educational and a lot of fun, since it forced me to examine the code closely and work through it to see exactly what it was doing.

The Problem

Originally this started as a sprite routine for the CoCo3’s 320×192 16-color mode, which Paul T is using for his Caveman game. When Paul asked Simon for help with a routine, Simon jumped at the chance to spin some code. I should point out that the routine he created is only a proof of concept: it just reads and writes some test data, and it would need considerably more code to become a real sprite routine. But as it is, it is a perfect assembly programming exercise!

The routine reads from an imaginary sprite and combines it with an imaginary background, writing the result to a mix buffer. To make things a little more complicated, in 16-color mode each byte of memory stores two 4-bit pixels.
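To make the two-pixels-per-byte layout concrete, here is a small Python sketch (it is not part of the original listings). It assumes the left pixel of each pair sits in the high nibble, and the helper names pack, unpack and byte_offset are hypothetical. At two pixels per byte, one 320-pixel row takes 160 bytes, so the full 320×192 screen takes 30,720 bytes.

```python
# Sketch of 16-color packing: two 4-bit pixels per byte.
# ASSUMPTION: the left pixel of each pair is stored in the high nibble.
WIDTH, HEIGHT = 320, 192
BYTES_PER_ROW = WIDTH // 2  # 2 pixels per byte -> 160 bytes per row


def pack(left: int, right: int) -> int:
    """Combine two 4-bit color indices (0-15) into one byte."""
    assert 0 <= left <= 15 and 0 <= right <= 15
    return (left << 4) | right


def unpack(byte: int) -> tuple[int, int]:
    """Split a byte back into its (left, right) 4-bit color indices."""
    return byte >> 4, byte & 0x0F


def byte_offset(x: int, y: int) -> int:
    """Offset of the byte holding pixel (x, y) from the start of the screen."""
    return y * BYTES_PER_ROW + x // 2


print(BYTES_PER_ROW * HEIGHT)  # 30720 bytes for the whole screen
```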

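The truth table below shows how the bytes combine (all values are hex). Remember that each byte contains two pixels, and each pixel has to be dealt with separately: a sprite pixel of 0 is transparent and lets the background pixel show through, while any other sprite pixel replaces the background pixel.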

sprite backgnd mixbuf
00 DD DD
A0 DD AD
0A AD AA
A4 BD A4
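If you want something to check the assembly’s output against, the truth table boils down to one rule per nibble: a 0 in the sprite is transparent. The Python sketch below is only a reference model, not the 6809 routine from the thread. It implements the rule twice, once by testing each nibble directly and once with a transparency mask that keeps the background nibbles where the sprite is 0 and then ORs the sprite in. The function names are hypothetical.

```python
# Reference model of the truth table: a 0 sprite nibble is transparent.
# Plain Python for checking results; not the 6809 code from the thread.


def mix_byte(sprite: int, backgnd: int) -> int:
    """Mix one byte (two pixels) by testing each nibble directly."""
    hi = sprite & 0xF0 if sprite & 0xF0 else backgnd & 0xF0
    lo = sprite & 0x0F if sprite & 0x0F else backgnd & 0x0F
    return hi | lo


def mix_byte_masked(sprite: int, backgnd: int) -> int:
    """Same result via a mask with 1-bits where the sprite is transparent."""
    mask = 0
    if (sprite & 0xF0) == 0:
        mask |= 0xF0
    if (sprite & 0x0F) == 0:
        mask |= 0x0F
    # Keep background where the sprite is 0, then OR in the sprite pixels.
    return (backgnd & mask) | sprite


# Rows from the truth table above: (sprite, backgnd, mixbuf), all hex bytes.
TRUTH_TABLE = [(0x00, 0xDD, 0xDD), (0xA0, 0xDD, 0xAD),
               (0x0A, 0xAD, 0xAA), (0xA4, 0xBD, 0xA4)]

for sprite, backgnd, mixbuf in TRUTH_TABLE:
    assert mix_byte(sprite, backgnd) == mixbuf
    assert mix_byte_masked(sprite, backgnd) == mixbuf
```

The masked form is a common way to draw sprites because, once the mask is known, each byte needs only an AND and an OR.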

Compile and Run With EDTASM

My tool chain consisted of lwasm, XRoar and EDTASM. You can find the manual for Disk EDTASM at the TRS-80 Color Computer Archive. Compile the source with lwasm from the LW Tool Chain:

lwasm -9 -b -o spritetester.bin spritetester.asm 

Or use this command to include the CPU cycle counts in the listing. Make sure you are using version 4.12 of lwasm:

lwasm -9 -b -l -p cd -o spritetester.bin spritetester.asm 
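Before loading spritetester.bin into an emulator, it can be handy to confirm where it will land in memory. The -b option produces a Disk Extended Color BASIC (DECB) binary: a series of blocks, each starting with a 0x00 byte, a 16-bit length and a 16-bit load address followed by the data, and ending with a 0xFF block that carries the exec address. The Python sketch below lists those blocks; the script name decb_segments.py is just an assumption for illustration.

```python
# Sketch: list the blocks in a DECB .bin file (the format lwasm -b writes).
# Data block:  0x00, length (2 bytes), load address (2 bytes), data bytes.
# End block:   0xFF, 0x00 0x00, exec address (2 bytes). All big-endian.
import sys


def decb_segments(data: bytes):
    """Return ([(load_addr, length), ...], exec_addr) for a DECB binary."""
    segments = []
    pos = 0
    while pos < len(data):
        kind = data[pos]
        if kind == 0x00:
            length = int.from_bytes(data[pos + 1:pos + 3], "big")
            addr = int.from_bytes(data[pos + 3:pos + 5], "big")
            segments.append((addr, length))
            pos += 5 + length  # skip header and data bytes
        elif kind == 0xFF:
            exec_addr = int.from_bytes(data[pos + 3:pos + 5], "big")
            return segments, exec_addr
        else:
            raise ValueError(f"unexpected block type {kind:#04x} at offset {pos}")
    raise ValueError("missing 0xFF end block")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        segs, entry = decb_segments(f.read())
    for addr, length in segs:
        print(f"load ${addr:04X}  {length} bytes")
    print(f"exec ${entry:04X}")
```

Running python3 decb_segments.py spritetester.bin should report a load address of $3F00 if the source is assembled at that address, which is the address used in the EDTASM steps.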

Then load the binary into your favorite emulator. Simon and I used XRoar and EDTASM for this exercise, and I found it crazy easy to load a program and step through it with that combination. First, load EDTASM with File > Load and point it at the ROM file, probably EDTASM–1982–26–3250-U-.rom. If it crashes after loading, do a soft reset, then type EXEC &HC000 to start EDTASM.

To get into the debugger, type Z<Enter>.

From there, use File > Load again and point it at spritetester.bin.

To list the source, type 3F00/, then use the down arrow to see each following line. Press Enter when you are done.

To start stepping through the code, type 3F00, (the address followed by a comma). Keep pressing the comma to execute the next instruction. Press BREAK to stop.

To see registers type R.

To view memory, switch to byte mode by typing B. Then, as when listing the source, type 4000/ and use the down arrow. Press BREAK to stop listing. To go back to viewing mnemonics (the source), type M.

To run the program, type G3F00. Note that EDTASM won’t return you to the prompt unless the last instruction is SWI. The original listings use RTS, so be sure to change it to SWI if you are typing in the source. The source files are at the end of this article.

Once you have stepped through the code or run it, you can inspect the results at address 4200 by switching to byte mode and examining memory.
