Previous Table of Contents Next

Listing 16.6 OPT2.ASM

 ;          Opt2         Final optimization word count
 ;          Written by   Michael Abrash
 ;          Modified by  Willem Clements
 ;                       C/ Moncayo 5,  Laurel de la Reina
 ;                       18140 La Zubia
 ;                       Granada, Spain
 ;                       Tel 34-58-890398
 ;                       Fax 34-58-224102
 parms          struc
                dw         2 dup(?)
 buffer         dw         ?
 bufferlength   dw         ?
 charflag       dw         ?
 wordcount      dw         ?
 parms          ends
                .model     small
 charstatustable label byte
                rept       2
                db         39 dup(0)
                db         1
                db         8 dup(0)
                db         10 dup(1)
                db         7 dup(0)
                db         26 dup(1)
                db         6 dup(0)
                db         26 dup(1)
                db         5 dup(0)
                public     _ScanBuffer
 _ScanBuffer    proc       near
                push       bp
                mov        bp,sp
                push       si
                push       di
                mov        si,[bp+buffer]
                mov        bx,[bp+charflag]
                mov        al,[bx]
                mov        cx,[bp+bufferlength]
                mov        bx,offset charstatustable
                xor        di,di      ; set wordcount to zero
                shr        cx,1       ; change count to wordcount
                jc         oddentry   ; odd number of bytes to process
                cmp        al,01h     ; check if last one is char
                jne        scanloop4  ; if not so, search for char
                jmp        scanloop1  ; if so, search for zero
 oddentry:      xchg       al,ah      ; last one in ah
                lodsb                 ; get first byte
                inc        cx
                cmp        ah,01h     ; check if last one was char
                jne        scanloop5  ; if not so, search for char
                jmp        scanloop2  ; if so, search for zero
 ;              locate the end of a word
 scanloop1:     lodsw                  ; get two chars
                xlat                   ; translate first
                xchg       al,ah       ; first in ah
 scanloop2:     xlat                   ; translate second
                dec        cx          ; count down
                jz         done1       ; no more bytes left
                cmp        ax,0101h    ; check if two chars
                je         scanloop1   ; go for next two bytes
                inc        di          ; increase wordcount
                cmp        al,01h      ; check if new word started
                je         scanloop1   ; locate end of word
 ;              locate the begin of a word
 scanloop4:     lodsw                     ; get two chars
                xlat                      ; translate first
                xchg       al,ah          ; first in ah
 scanloop5:     xlat                      ; translate second
                dec        cx             ; count down
                jz         done2          ; no more bytes left
                cmp        ax,0           ; check if word started
                je         scanloop4      ; if not, locate begin
                cmp        al,01h         ; check one-letter word
                je         scanloop1      ; if not, locate end of word
                inc        di             ; increase wordcount
                jmp        scanloop4      ; locate begin of next word
 done1:         cmp        ax,0101h       ; check if end-of-word
                je         done           ; if not, we have finished
                inc        di             ; increase wordcount
                jmp        done
 done2:         cmp        ax,0100h       ; check for one-letter word
                jne        done           ; if not, we have finished
                inc        di             ; increase wordcount
 done:          mov        si,[bp+charflag]
                mov        [si],al
                mov        bx,[bp+wordcount]
                mov        ax,[bx]
                mov        dx,[bx+2]
                add        di,ax
                adc        dx,0
                mov        [bx],di
                mov        [bx+2],dx
                pop        di
                pop        si
                pop        bp
 _ScanBuffer    endp

Level 2: A New Perspective

The second level of optimization is one of breaking out of the mode of thinking established by my original code. Some entrants clearly did exactly that. They stepped back, thought about what the code actually needed to do, rather than just improving how it already worked, and implemented code that sprang from that new perspective.

You can see one example of this in Listing 16.6, where Willem uses CMP AX,0101H to check two bytes at once. While you might think of this as nothing more than a doubling up of tests, it’s a little more than that, especially when taken together with the use of two loops. This is a break with the serial nature of the C code, a recognition that word counting is really nothing more than a state machine that transitions from the “in word” state to the “not in word” state and back, counting a word on one but not both of those transitions. Willem says, in effect, “We’re in a word; if the next two bytes are non-separators, then we’re still in a word, else we’re not in a word, so count and change to the appropriate state.” That’s really quite different from saying, as I originally did, “If the last byte was a non-separator, then if the current byte is a separator, then count a word.” Willem has moved away from the all-in-one approach, splitting the code up into state-specific chunks that are more efficient because each does only the work required in a particular state.

Another example of coming at the code from a new perspective is counting a word as soon as a non-separator follows a separator (at the start of the word), rather than waiting for a separator following a non-separator (at the end of the word). My friend Dan Illowsky describes the thought process leading to this approach thusly:

“I try to code as closely as possible to the real world nature of those things my program models. It seems somehow wrong to me to count the end of a word as you do when you look for a transition from a word to a non-word. A word is not a transition, it is the presence of a group of characters. Thought of this way, the code would have counted the word when it first detected the group. Had you done this, your main program would not have needed to look for the possible last transition or deal with the semantics of the value in CharValue.”

John Richardson, of New York, contributed a good example of the benefits of a different perspective (in this case, a hardware perspective). John eliminated all branches used for detecting word edges; the inner loop of his code is shown in Listing 16.7. As John explains it:

“My next shot was to get rid of all the branches in the loop. To do that, I reached back to my college hardware courses. I noticed that we were really looking at an edge triggered device we want to count each time the I’m a character state goes from one to zero. Remembering that XOR on two single-bit values will always return whether the bits are different or the same, I implemented a transition counter. The counter triggers every time a word begins or ends.”

Previous Table of Contents Next

Graphics Programming Black Book © 2001 Michael Abrash