Optimizing an Imaginary Sprite Part 5


Version 5

From Simon’s post, "Roger Taylor and John Strong gave me both some pointers yesterday regarding the transparent sprite copy routine…..

I proceeded to correct those errors today and came up with a v2 [edit: version 5 actually]….
then my good friend Pere Serrat from the dragon scene decided he needed a bit of a break from his current project and decided to provide some input

Pere Serrat, Hugo Dufort and i have had a collab before working on my whacky PAL artifact stuff…

@pere and i have bounced this transparent sprite thing back and forth for the last 2 hours or so, providing input to each other…

so, this is what we came up with…

and it’s ALOT different than v1 or v2"

sprite5.asm

Version 3 on left, 5 on right.

Things get interesting here, as the changes are a little more subtle. Put on your mad cap! You might be surprised by the result in this version.

In lines 7–8, Simon simply reversed the load order. No, he’s not crazy. Okay, yes he is, but that’s not why he did it.

You will also notice he tossed out the clr ,u from version 3, as it is no longer useful. It actually wasn’t useful in version 3 either. The CLR was there, I assume, to support OR’ing the individual nibbles into the destination.

Version 3

11 [4+0]                   sta     ,u              ; update dest buffer with sprite

Version 5

14 [4+0]                   stb     ,u              ; update dest buffer

In both cases, the sta/stb overwrites the entire destination byte, which makes the preceding CLR useless.
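To make the point concrete, here is a tiny Python model of the destination byte. The helpers store, clear and or_into are hypothetical stand-ins for the 6809 operations, not emulator code. A full-byte store gives the same result whether or not the byte was cleared first, while an OR-based merge (the use the CLR was presumably meant for) only works on a cleared byte.

```python
# Hypothetical Python stand-ins for 6809 memory operations on one byte.

def store(dest: int, value: int) -> int:
    """Like STA/STB ,u: the whole destination byte is replaced by value."""
    return value & 0xFF

def clear(dest: int) -> int:
    """Like CLR ,u: the destination byte becomes 0."""
    return 0

def or_into(dest: int, value: int) -> int:
    """An OR-style merge: bits already in dest survive."""
    return (dest | value) & 0xFF

stale = 0x5A   # leftover contents of the buffer byte
value = 0xF0   # nibble pattern being written

# A plain store ignores the old contents, so clearing first changes nothing.
assert store(clear(stale), value) == store(stale, value)

# An OR merge only produces the intended byte if the destination was cleared.
assert or_into(clear(stale), value) == 0xF0
assert or_into(stale, value) == 0xFA   # 0x5A | 0xF0 keeps the stale low bits
```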

Lines 10–14 deal with the left nibble. In his quest to drop instructions, Simon managed to drop one. Testing with bita instead of anda leaves A untouched, and loading B with the sprite byte (line 12) only after the test lets the sprite and background paths share the instructions at lines 13 and 14.

10 [2]     seeN1           bita    #$f0            ; test left nibble
11 [5]                     beq     useB1           ; if zero use background
12 [4+1]                   ldb     -1,x            ; else get sprite byte
13 [2]     useB1           andb    #$f0            ; use only left nibble and clear right one
14 [4+0]                   stb     ,u              ; update dest buffer
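To check that the reshuffle doesn't change what lands in the buffer, here is a rough Python model of the left-nibble code from both versions. The function names are hypothetical, registers are plain ints, and ldb -1,x is modeled by reusing the sprite value (x was post-incremented, so -1,x points back at the same sprite byte). The loop compares the destination byte for every sprite/background pair.

```python
def left_nibble_v3(sprite, background):
    """Model of version 3, lines 06-15 (CLR left out). Returns (dest byte, A afterwards)."""
    a = sprite              # lda ,x
    b = background          # ldb ,y
    a &= 0xF0               # anda #$f0  (A is modified)
    if a == 0:              # beq useB1
        b &= 0xF0           # useB1: andb #$f0
        dest = b            # stb ,u
    else:
        dest = a            # sta ,u ; bra seeN2
    return dest, a

def left_nibble_v5(sprite, background):
    """Model of version 5, lines 07-14. Returns (dest byte, A afterwards)."""
    b = background          # ldb ,y+
    a = sprite              # lda ,x+
    if (a & 0xF0) != 0:     # bita #$f0 ; beq useB1  (A is not modified)
        b = sprite          # ldb -1,x  re-reads the same sprite byte
    b &= 0xF0               # useB1: andb #$f0
    dest = b                # stb ,u
    return dest, a

for s in range(256):
    for g in range(256):
        d3, _ = left_nibble_v3(s, g)
        d5, a5 = left_nibble_v5(s, g)
        assert d3 == d5     # same destination byte either way
        assert a5 == s      # bita left the whole sprite byte in A
print("identical destination bytes for all 65,536 inputs")
```

A side effect the model makes visible: after version 5's left-nibble code, A still holds the full sprite byte, whereas version 3's anda has already discarded the right nibble.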

But is it faster? Let’s take a look at the number of instructions that have to be run when the left nibble of A is zero, and when it is not.

Number of instructions executed (left-nibble code only, not counting the two loads)

Version   Nibble not 0   Nibble = 0   Total
5         5              4            9
3         4              4            8

Hmm. Simon is actually running more instructions in the new version. Bad Simon, bad!

Instruction count isn’t the real deciding factor, though, so let’s take a closer look and include cycle counts. To be a little fairer, we’ll leave version 3’s CLR out.

Cycle counts (including the two loads)

Version   Nibble not 0   Nibble = 0   Total
5         30             25           55
3         24             21           45

Version 3

06 [4+0]   loop1           lda     ,x              ; get sprite byte
07 [4+0]                   ldb     ,y              ; get background byte
;08 [6+0]                   clr     ,u              ; clean dest buffer
09 [2]     seeN1           anda    #$f0            ; use left nibble
10 [5]                     beq     useB1           ; if zero use background
11 [4+0]                   sta     ,u              ; update dest buffer with sprite
12 [5]                     bra     seeN2           ; test right nibble
13                 
14 [2]     useB1           andb    #$f0            ; use background left nibble
15 [4+0]                   stb     ,u              ; update dest buffer

Version 5

07 [4+2]   loop1           ldb     ,y+             ; get background byte
08 [4+2]                   lda     ,x+             ; get sprite byte
09                         
10 [2]     seeN1           bita    #$f0            ; test left nibble
11 [5]                     beq     useB1           ; if zero use background
12 [4+1]                   ldb     -1,x            ; else get sprite byte
13 [2]     useB1           andb    #$f0            ; use only left nibble and clear right one
14 [4+0]                   stb     ,u              ; update dest buffer
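Both tables can be reproduced from the listings with a small Python tally. The cycle figures are copied exactly as printed in the listings (base plus indexed-mode extra, branches at 5), and the instruction count drops the two loads, as the instruction table does. The path layout and the helper names cycles and instructions are just an illustration.

```python
LOADS = 2  # the two loads at the top of each path; the instruction table leaves them out

# Instructions executed on each path, with cycles as printed: [base+extra] -> base + extra.
paths = {
    ("3", "not 0"):    [("lda ,x", 4 + 0), ("ldb ,y", 4 + 0), ("anda #$f0", 2),
                        ("beq useB1", 5), ("sta ,u", 4 + 0), ("bra seeN2", 5)],
    ("3", "equals 0"): [("lda ,x", 4 + 0), ("ldb ,y", 4 + 0), ("anda #$f0", 2),
                        ("beq useB1", 5), ("andb #$f0", 2), ("stb ,u", 4 + 0)],
    ("5", "not 0"):    [("ldb ,y+", 4 + 2), ("lda ,x+", 4 + 2), ("bita #$f0", 2),
                        ("beq useB1", 5), ("ldb -1,x", 4 + 1), ("andb #$f0", 2),
                        ("stb ,u", 4 + 0)],
    ("5", "equals 0"): [("ldb ,y+", 4 + 2), ("lda ,x+", 4 + 2), ("bita #$f0", 2),
                        ("beq useB1", 5), ("andb #$f0", 2), ("stb ,u", 4 + 0)],
}

def cycles(path):
    """Total cycles for one path."""
    return sum(c for _, c in path)

def instructions(path):
    """Instructions executed on one path, excluding the two loads."""
    return len(path) - LOADS

for version in ("5", "3"):
    nz = paths[(version, "not 0")]
    z = paths[(version, "equals 0")]
    print(f"v{version}: instructions {instructions(nz)}/{instructions(z)}, "
          f"cycles {cycles(nz)}/{cycles(z)}, total {cycles(nz) + cycles(z)}")
```

Note that the ,y+ and ,x+ post-increments add 2 cycles each, so 4 of version 5's extra cycles on every path come from them; whether version 3 pays for its pointer updates elsewhere in the loop isn't shown in these excerpts.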

So version 3, despite having one more instruction in its listing, takes 45 cycles against 55 for version 5. Oops.

Next, Version 5p