Previous Table of Contents Next


Chapter 16
There Ain’t No Such Thing as the Fastest Code

Lessons Learned in the Pursuit of the Ultimate Word Counter

I remember reading an overview of C++ development tools for Windows in a past issue of PC Week. In the lower left corner was the familiar box listing the 10 leading concerns of corporate buyers when it comes to C++. Boiled down, the list looked like this, in order of descending importance to buyers:

1.  Debugging
2.  Documentation
3.  Windows development tools
4.  High-level Windows support
5.  Class library
6.  Development cycle efficiency
7.  Object-oriented development aids
8.  Programming management aids
9.  Online help
10.  Windows development cycle automation

Is something missing here? You bet your maximum gluteus something’s missing—nowhere on that list is there so much as one word about how fast the compiled code runs! I’m not saying that performance is everything, but optimization isn’t even down there at number 10, below online help! Ye gods and little fishes! We are talking here about people who would take a bus from LA to New York instead of a plane because it had a cleaner bathroom; who would choose a painting from a Holiday Inn over a Matisse because it had a fancier frame; who would buy a Yugo instead of—well, hell, anything—because it had a nice owner’s manual and particularly attractive keys. We are talking about people who are focusing on means, and have forgotten about ends. We are talking about people with no programming souls.

Counting Words in a Hurry

What are we to make of this? At the very least, we can safely guess that very few corporate buyers ever enter optimization contests. Most of my readers do, however; in fact, far more than I thought ever would, but that gladdens me to no end. I issued my first optimization challenge in a “Pushing the Envelope” column in PC TECHNIQUES back in 1991, and was deluged by respondents who, one might also gather, do not live by PC Week.

That initial challenge was sparked by a column David Gerrold wrote (also in PC TECHNIQUES ) concerning the matter of counting the number of words in a document; David turned up some pretty interesting optimization issues along the way. David did all his coding in Pascal, pointing out that while an assembly language version would probably be faster, his Pascal utility worked properly and was fast enough for him.

It wasn’t, however, fast enough for me. The logical starting place for speeding up word counting would be David’s original Pascal code, but I’m much more comfortable with C, so Listing 16.1 is a loose approximation of David’s word count program, translated to C. I left out a few details, such as handling comment blocks, partly because I don’t use such blocks myself, and partly so we can focus on optimizing the core word-counting code. As Table 16.1 indicates, Listing 16.1 counts the words in a 104,448-word file in 4.6 seconds. The file was stored on a RAM disk, and Listing 16.1 was compiled with Borland C++ with all optimization enabled. A RAM disk was used partly because it returns consistent times—no seek times, rotational latency, or cache to muddy the waters—and partly to highlight word-counting speed rather than disk access speed.


Listing Time to Count Words

16.1 (C) 4.6 seconds
16.2 & 16.3 (C+ASM) 2.4 seconds
16.2 & 16.4 (C+ASM w/lookup) 1.6 seconds
These are the times taken to search a file containing 104,448 words, timed from a RAM disk on a 20 MHz 386.
Table 16.1 Word count timings.

LISTING 16.1 L16-1.C

 /* Word-counting program. Tested with Borland C++ in C
    compilation mode and the small model. */
 
 #include <stdio.h>
 #include <fcntl.h>
 #include <sys\stat.h>
 #include <stdlib.h>
 #include <io.h>
 
 #define  B UFFER_SIZE  0x8000   /* largest chunk of file worked 
                                  with at any one time */
 int main(int, char **);
 
 int main(int argc, char **argv) {
    int Handle;
    unsigned int BlockSize;
    long FileSize;
    unsigned long WordCount = 0;
    char *Buffer, CharFlag = 0, PredCharFlag, *BufferPtr, Ch;
 
    if (argc != 2) {
       printf(“usage: wc <filename>\n”);
       exit(1);
    }
 
    if ((Buffer = malloc(BUFFER_SIZE)) == NULL) {
       printf(“Can’t allocate adequate memory\n”);
       exit(1);
    }
 
    if ((Handle = open(argv[1], O_RDONLY | O_BINARY)) == -1) {
       printf(“Can’t open file %s\n”, argv[1]);
       exit(1);
    }
 
    if ((FileSize = filelength(Handle)) == -1) {
       printf(“Error sizing file %s\n”, argv[1]);
       exit(1);
    }
 
    /* Process the file in chunks */
    while (FileSize > 0) {
       /* Get the next chunk */
       FileSize -= (BlockSize = min(FileSize, BUFFER_SIZE));
       if (read(Handle, Buffer, BlockSize) == -1) {
          printf(“Error reading file %s\n”, argv[1]);
          exit(1);
       }
       /* Count words in the chunk */
       BufferPtr = Buffer;
       do {
          PredCharFlag = CharFlag;
          Ch = *BufferPtr++ & 0x7F; /* strip high bit, which some
                                       word processors set as an
                                       internal flag */
          CharFlag = ((Ch >= ‘a’) && (Ch <= ‘z’)) ||
                     ((Ch >= ‘A’) && (Ch <= ‘Z’)) ||
                     ((Ch >= ‘0’) && (Ch <= ‘9’)) ||
                     (Ch == ‘\’’);
          if ((!CharFlag) && PredCharFlag) {
             WordCo u nt++; 
          }
       } while (—BlockSize);
    }
 
    /*  Catch the last word, if any */ 
    if (CharFlag) {
       WordCount++;
    }
    printf(“\nTotal words in file: %lu\n”, WordCount);
    return(0);
 }
 


Previous Table of Contents Next

Graphics Programming Black Book © 2001 Michael Abrash