Previous Table of Contents Next


LISTING 58.4 L58-4.ASM

; Inner loop to draw a single texture-mapped vertical column,
; rather than a horizontal scanline. Maxed-out 32-bit version.
;
; At this point:
;       EAX = sum of integral X & Y source pointer advances
;       ECX = source pointer increment to advance one in Y
;       EDX = fractional source texture Y coordinate in lower
;             15 bits of DX, fractional source texture X coord
;             in high word of EDX, bit 15 set to 0
;       ESI = initial source texture pointer
;       EDI = initial destination pointer
;       EBP = fractional Y advance in lower 15 bits of BP,
;             fractional X advance in high word of EBP, bit
;             15 set to 0

SCANOFFSET=0

     REPT LOOP_UNROLL

     mov   bl,[esi]                    ;get image pixel
     add   edx,ebp                     ;advance frac Y in DX,
                                       ; frac X in high word of EDX
     adc   esi,eax                     ;advance source pointer by integral
                                       ; X & Y amount, also accounting for
                                       ; carry from X fractional addition
     mov   [edi+SCANOFFSET],bl         ;set screen pixel
                                       ; (located here to avoid 486
                                       ; AGI from previous byte op)
     test  dh,80h                      ;carry from Y fractional addition?
     jz    short @F                    ;no
     add   esi,ecx                     ;yes, advance Y by one
                                       ; (produces Pentium AGI for MOV BL,[ESI])
     and   dh,not 80h                  ;reset the Y fractional carry bit
@@:

SCANOFFSET = SCANOFFSET + SCANWIDTH

     ENDM

And there you have it: A five to 10-times speedup of a decent assembly language texture mapper. All it took was some help from my friends, a good, stiff jolt of right-brain thinking, and some solid left-brain polishing—plus the knowledge that such a speedup was possible. Treat every optimization task as if John Miles has just written to inform you that he’s made it faster than your wildest dreams, and you’ll be amazed at what you can do!

Texture Mapping Notes

Listing 58.3 contains no 486 pipeline stalls; it has Pentium stalls, but not much can be done for them because of the size prefix on ADD EDX,ECX, which takes 1 cycle to go through the U-pipe, and shuts down the V-pipe for that cycle. Listing 58.4, on the other hand, has been rearranged to eliminate all Pentium stalls save one. When the Y coordinate fractional part carries and ESI advances, the code executes as follows:


ADD ESI,ECX     ;cycle 1 U-pipe
AND DH,NOT 80H  ;cycle 1 V-pipe
                ;cycle 2 idle AGI on ESI
MOV BL,[ESI]    ;cycle 3 U-pipe
ADD EDX,EBP     ;cycle 3 V-pipe

However, I don’t see any way to eliminate this last AGI, which happens about half the time; even with it, the Pentium execution time for Listing 58.4 is 5.5 cycles. That’s 61 nanoseconds—a highly respectable 16 million texture-mapped pixels per second—on a 90 MHz Pentium.

The type of texture mapping discussed in both this and earlier chapters doesn’t do perspective correction when mapping textures. Why that is and how to handle perspective correction is a topic for a whole separate book, but be aware that the textures on some large polygons (not the polygon edges themselves) drawn with the code in this chapter will appear to be unnaturally bowed, although small polygons should look fine.

Finally, we never did get rid of the last jump in the texture mapper, yet John Miles claimed no jumps at all. How did he do it? I’m not sure, but I’d guess that he used a two-entry look-up table, based on the Y carry, to decide how much to advance the source pointer in Y. However, I couldn’t come up with any implementation of this approach that didn’t take 0.5 to 1 cycle more than the test-and-jump approach, so either I didn’t come up with an adequately efficient implementation of the table, John saved a cycle somewhere else, or perhaps John implemented his code in a 32-bit segment, but used the less-efficient table in his fervor to get rid of the final jump. The knowledge that I apparently came up with a different solution than John highlights that the technical aspects of John’s implementation were, in truth, totally irrelevant to my optimization efforts; the only actual effect John’s code had on me was to make me believe a texture mapper could run that fast.

Believe it! And while you’re at it, give both halves of your brain equal time—and watch out for aliens in short skirts, 60’s bouffant hairdos, and an undue interest in either half.


Previous Table of Contents Next

Graphics Programming Black Book © 2001 Michael Abrash