Previous Table of Contents Next


Listing 16.2 is Listing 16.1 modified to call a function that scans each block for words, and Listing 16.3 contains an assembly function that counts words. Used together, Listings 16.2 and 16.3 are just about twice as fast as Listing 16.1, a good return for a little assembly language. Listing 16.3 is a pretty straightforward translation from C to assembly; the new code makes good use of registers, but the key code—determining whether each byte is a character or not—is still done with the same multiple-sequential-tests approach used by the code that the C compiler generates.

LISTING 16.2 L16-2.C

 /* Word-counting program incorporating assembly language. Tested
    with Borland C++ in C compilation mode & the small model. */
 
 #include <stdio.h>
 #include <fcntl.h>
 #include <sys\stat.h>
 #include <stdlib.h>
 #include <io.h>
 
 #define BUFFER_SIZE  0x8000   /* largest chunk of file worked
                                  with at any one time */
 int main(int, char **);
 void ScanBuffer(char *, unsigned int, char *, unsigned long *);
 
 int main(int argc, char **argv) {
    int Handle;
    unsigned int BlockSize;
    long FileSize;
    unsigned long WordCount = 0;
    char *Buffer, CharFlag = 0;
 
    if (argc != 2) {
       printf(“usage: wc <filename>\n”);
       exit(1);
    }
 
    if ((Buffer = malloc(BUFFER_SIZE)) == NULL) {
       printf(“Can’t allocate adequate memory\n”);
       exit(1);
    }
 
    if ((Handle = open(argv[1], O_RDONLY | O_BINARY)) == -1) {
       printf(“Can’t open file %s\n”, argv[1]);
       exit(1);
    }
 
    if ((FileSize = filelength(Handle)) == -1) {
       printf(“Error sizing file %s\n”, argv[1]);
       exit(1);
    }
 
    CharFlag = 0;
    while (FileSize > 0) {
       FileSize -= (BlockSize = min(FileSize, BUFFER_SIZE));
       if (read(Handle, Buffer, BlockSize) == -1) {
          printf(“Error reading file %s\n”, argv[1]);
          exit(1);
       }
       ScanBuffer(Buffer, BlockSize, &CharFlag, &WordCount);
    }
 
    /* Catch the last word, if any */
    if (CharFlag) {
       WordCount++;
    }
    printf(“\nTotal words in file: %lu\n”, WordCount);
    return(0);
 }

LISTING 16.3 L16-3.ASM

 ; Assembly subroutine for Listing 16.2. Scans through Buffer, of
 ; length BufferLength, counting words and updating WordCount as
 ; appropriate. BufferLength must be > 0. *CharFlag and *WordCount
 ; should equal 0 on the first call. Tested with TASM.
 ; C near-callable as:
 ; void ScanBuffer(char *Buffer, unsigned int BufferLength,
 ; char *CharFlag, unsigned long *WordCount);
 
 parms   struc
         dw      2 dup(?)        ;pushed return address & BP
 Buffer  dw      ?               ;buffer to scan
 BufferLength dw ?               ;length of buffer to scan
 CharFlag dw     ?               ;pointer to flag for state of last
                                 ; char processed on entry (0 on
                                 ; initial call). Updated on exit
 WordCount dw    ?               ;pointer to 32-bit count of words
                                 ; found (0 on initial call)
 parms   ends
 
         .model  small
         .code
         public  _ScanBuffer
 _ScanBuffer     proc    near
         push    bp              ;preserve caller’s stack frame
         mov     bp,sp           ;set up local stack frame
         push    si              ;preserve caller’s register vars
         push    di
 
         mov     si,[bp+Buffer]  ;point to buffer to scan
         mov     bx,[bp+WordCount]
         mov     cx,[bx]         ;get current 32-bit word count
         mov     dx,[bx+2]
         mov     bx,[bp+CharFlag]
         mov     bl,[bx]            ;get current CharFlag
         mov     di,[bp+BufferLength];get # of bytes to scan
 ScanLoop:
         mov     bh,bl           ;PredCharFlag = CharFlag;
         lodsb                   ;Ch = *BufferPtr++ & 0x7F;
         and     al,7fh          ;strip high bit for word processors
                                 ; that set it as an internal flag
         mov     bl,1            ;assume this is a char; CharFlag = 1;
         cmp     al,‘a’          ;it is a char if between a and z
         jb      CheckAZ
         cmp     al,‘z’
         jna     IsAChar
 CheckAZ:
         cmp     al,‘A’          ;it is a char if between A and Z
         jb      Check09
         cmp     al,‘Z’
         jna     IsAChar
 Check09:
         cmp     al,‘0’          ;it is a char if between 0 and 9
         jb      CheckApostrophe
         cmp     al,‘9’
         jna     IsAChar
 CheckApostrophe:
         cmp      al,27h           ;it is a char if an apostrophe 
         jz      IsAChar
         sub     bl,bl           ;not a char; CharFlag = 0;
         and     bh,bh
         jz      ScanLoopBottom  ;if ((!CharFlag) && PredCharFlag) {
         add     cx,1            ;    (WordCount)++;
         adc     dx,0            ;}
 IsAChar:
 ScanLoopBottom:
         dec     di              ;} while (—BufferLength);
         jnz     ScanLoop
 
         mov     si,[bp+CharFlag]
         mov     [si],bl         ;set new CharFlag
         mov     bx,[bp+WordCount]
         mov     [bx],cx         ;set new word count
         mov     [bx+2],dx
 
         pop     di              ;restore caller’s register vars
         pop     si
         pop     bp              ;restore caller’s stack frame
         ret
 _ScanBuffer     endp
         end

Which Way to Go from Here?

We could rearrange the tests in light of the nature of the data being scanned; for example, we could perform the tests more efficiently by taking advantage of the knowledge that if a byte is less than ‘0,’ it’s either an apostrophe or not a character at all. However, that sort of fine-tuning is typically good for speedups of only 10 to 20 percent, and I’ve intentionally refrained from implementing this in Listing 16.3 to avoid pointing you down the wrong path; what we need is a different tack altogether. Ponder this. What we really want to know is nothing more than whether a byte is a character, not what sort of character it is. For each byte value, we want a yes/no status, and nothing else—and that description practically begs for a lookup table. Listing 16.4 uses a lookup table approach to boost performance another 50 percent, to three times the performance of the original C code. On a 20 MHz 386, this represents a change from 4.6 to 1.6 seconds, which could be significant—who likes to wait? On an 8088, the improvement in word-counting a large file could easily be 10 or 20 seconds, which is definitely significant.


Previous Table of Contents Next

Graphics Programming Black Book © 2001 Michael Abrash