Previous Table of Contents Next

Rules for Building High-Performance Code

We’ve got the following rules for creating high-performance software:

  Know where you’re going (understand the objective of the software).
  Make a big map (have an overall program design firmly in mind, so the various parts of the program and the data structures work well together).
  Make lots of little maps (design an algorithm for each separate part of the overall design).
  Know the territory (understand exactly how the computer carries out each task).
  Know when it matters (identify the portions of your programs where performance matters, and don’t waste your time optimizing the rest).
  Always consider the alternatives (don’t get stuck on a single approach; odds are there’s a better way, if you’re clever and inventive enough).
  Know how to turn on the juice (optimize the code as best you know how when it does matter).

Making rules is easy; the hard part is figuring out how to apply them in the real world. For my money, examining some actual working code is always a good way to get a handle on programming concepts, so let’s look at some of the performance rules in action.

Know Where You’re Going

If we’re going to create high-performance code, first we have to know what that code is going to do. As an example, let’s write a program that generates a 16-bit checksum of the bytes in a file. In other words, the program will add each byte in a specified file in turn into a 16-bit value. This checksum value might be used to make sure that a file hasn’t been corrupted, as might occur during transmission over a modem or if a Trojan horse virus rears its ugly head. We’re not going to do anything with the checksum value other than print it out, however; right now we’re only interested in generating that checksum value as rapidly as possible.

Make a Big Map

How are we going to generate a checksum value for a specified file? The logical approach is to get the file name, open the file, read the bytes out of the file, add them together, and print the result. Most of those actions are straightforward; the only tricky part lies in reading the bytes and adding them together.

Make Lots of Little Maps

Actually, we’re only going to make one little map, because we only have one program section that requires much thought—the section that reads the bytes and adds them up. What’s the best way to do this?

It would be convenient to load the entire file into memory and then sum the bytes in one loop. Unfortunately, there’s no guarantee that any particular file will fit in the available memory; in fact, it’s a sure thing that many files won’t fit into memory, so that approach is out.

Well, if the whole file won’t fit into memory, one byte surely will. If we read the file one byte at a time, adding each byte to the checksum value before reading the next byte, we’ll minimize memory requirements and be able to handle any size file at all.

Sounds good, eh? Listing 1.1 shows an implementation of this approach. Listing 1.1 uses C’s read() function to read a single byte, adds the byte into the checksum value, and loops back to handle the next byte until the end of the file is reached. The code is compact, easy to write, and functions perfectly—with one slight hitch:

It’s slow.

LISTING 1.1 L1-1.C

* Program to calculate the 16-bit checksum of all bytes in the
* specified file. Obtains the bytes one at a time via read(),
* letting DOS perform all data buffering.
#include <stdio.h>
#include <fcntl.h>

main(int argc, char *argv[]) {
     int Handle;
     unsigned char Byte;
     unsigned int Checksum;
     int ReadLength;

     if ( argc != 2 ) {
          printf(“usage: checksum filename\n”);
     if ( (Handle = open(argv[1], O_RDONLY | O_BINARY)) == -1 ) {
          printf(“Can’t open file: %s\n”, argv[1]);

     /* Initialize the checksum accumulator */
     Checksum = 0;

     /* Add each byte in turn into the checksum accumulator */
     while ( (ReadLength = read(Handle, &Byte, sizeof(Byte))) > 0 ) {
          Checksum += (unsigned int) Byte;
     if ( ReadLength == -1 ) {
          printf(“Error reading file %s\n”, argv[1]);

     /* Report the result */
     printf(“The checksum is: %u\n”, Checksum);

Table 1.1 shows the time taken for Listing 1.1 to generate a checksum of the WordPerfect version 4.2 thesaurus file, TH.WP (362,293 bytes in size), on a 10 MHz AT machine of no special parentage. Execution times are given for Listing 1.1 compiled with Borland and Microsoft compilers, with optimization both on and off; all four times are pretty much the same, however, and all are much too slow to be acceptable. Listing 1.1 requires over two and one-half minutes to checksum one file!

Listings 1.2 and 1.3 form the C/assembly equivalent to Listing 1.1, and Listings 1.6 and 1.7 form the C/assembly equivalent to Listing 1.5.

These results make it clear that it’s folly to rely on your compiler’s optimization to make your programs fast. Listing 1.1 is simply poorly designed, and no amount of compiler optimization will compensate for that failing. To drive home the point, conListings 1.2 and 1.3, which together are equivalent to Listing 1.1 except that the entire checksum loop is written in tight assembly code. The assembly language implementation is indeed faster than any of the C versions, as shown in Table 1.1, but it’s less than 10 percent faster, and it’s still unacceptably slow.

Previous Table of Contents Next

Graphics Programming Black Book © 2001 Michael Abrash