Previous Table of Contents Next


The answer varies considerably depending on what display adapter and what display mode we’re talking about. The display adapter cycle-eater is worst with the Enhanced Graphics Adapter (EGA) and the original Video Graphics Array (VGA). (Many VGAs, especially newer ones, insert many fewer wait states than IBM’s original VGA. On the other hand, Super VGAs have more bytes of display memory to be accessed in high-resolution mode.) While the Color/Graphics Adapter (CGA), Monochrome Display Adapter (MDA), and Hercules Graphics Card (HGC) all suffer from the display adapter cycle-eater as well, they suffer to a lesser degree. Since the VGA represents the base standard for PC graphics now and for the foreseeable future, and since it is the hardest graphics adapter to wring performance from, we’ll restrict our discussion to the VGA (and its close relative, the EGA) for the remainder of this chapter.

The Impact of the Display Adapter Cycle-Eater

Even on the EGA and VGA, the effect of the display adapter cycle-eater depends on the display mode selected. In text mode, the display adapter cycle-eater is rarely a major factor. It’s not that the cycle-eater isn’t present; however, a mere 4,000 bytes control the entire text mode display, and even with the display adapter cycle-eater it just doesn’t take that long to manipulate 4,000 bytes. Even if the display adapter cycle-eater were to cause the 8088 to take as much as 5µs per display memory access—more than five times normal—it would still take only 4,000× 2× 5µs, or 40 µs, to read and write every byte of display memory. That’s a lot of time as measured in 8088 cycles, but it’s less than the blink of an eye in human time, and video performance only matters in human time. After all, the whole point of drawing graphics is to convey visual information, and if that information can be presented faster than the eye can see, that is by definition fast enough.

That’s not to say that the display adapter cycle-eater can’t matter in text mode. In Chapter 3, I recounted the story of a debate among letter-writers to a magazine about exactly how quickly characters could be written to display memory without causing snow. The writers carefully added up Intel’s instruction cycle times to see how many writes to display memory they could squeeze into a single horizontal retrace interval. (On a CGA, it’s only during the short horizontal retrace interval and the longer vertical retrace interval that display memory can be accessed in 80-column text mode without causing snow.) Of course, now we know that their cardinal sin was to ignore the prefetch queue; even if there were no wait states, their calculations would have been overly optimistic. There are display memory wait states as well, however, so the calculations were not just optimistic but wildly optimistic.

Text mode situations such as the above notwithstanding, where the display adapter cycle-eater really kicks in is in graphics mode, and most especially in the high-resolution graphics modes of the EGA and VGA. The problem here is not that there are necessarily more wait states per access in highgraphics modes (that varies from adapter to adapter and mode to mode). Rather, the problem is simply that are many more bytes of display memory per screen in these modes than in lower-resolution graphics modes and in text modes, so many more display memory accesses—each incurring its share of display memory wait states—are required in order to draw an image of a given size. When accessing the many thousands of bytes used in the high-resolution graphics modes, the cumulative effects of display memory wait states can seriously impact code performance, even as measured in human time.

For example, if we assume the same 5 µs per display memory access for the EGA’s high-resolution graphics mode that we assumed for text mode, it would take 26,000 × 2 × 5 µs, or 260 µs, to scroll the screen once in the EGA’s high-resolution graphics mode, mode 10H. That’s more than one-quarter of a second—noticeable by human standards, an eternity by computer standards.

That sounds pretty serious, but we did make an unfounded assumption about memory access speed. Let’s get some hard numbers. Listing 4.11 accesses display memory at the 8088’s maximum speed, by way of a REP MOVSW with display memory as both source and destination. The code in Listing 4.11 executes in 3.18 µs per access to display memory—not as long as we had assumed, but a long time nonetheless.

LISTING 4.11 LST4-11.ASM

; Times speed of memory access to Enhanced Graphics
; Adapter graphics mode display memory at A000:0000.
;
     mov  ax,0010h
     int  10h;        select hi-res EGA graphics
                      ; mode 10 hex (AH=0 selects
                      ; BIOS set mode function,
                      ; with AL=mode to select)
;
     mov  ax,0a000h
     mov  ds,ax
     mov  es,ax       ;move to & from same segment
     sub  si,si       ;move to & from same offset
     mov  di,si
     mov  cx,800h     ;move 2K words
     cld
     call ZTimerOn
     rep  movsw       ;simply read each of the first
                      ; 2K words of the destination segment,
                      ; writing each byte immediately back
                      ; to the same address. No memory
                      ; locations are actually altered; this
                      ; is just to measure memory access
                      ; times
     call ZTimerOff
;
     mov  ax,0003h
     int  10h         ;return to text mode

For comparison, let’s see how long the same code takes when accessing normal system RAM instead of display memory. The code in Listing 4.12, which performs a REP MOVSW from the code segment to the code segment, executes in 1.39 µs per display memory access. That means that on average, 1.79 µs (more than 8 cycles!) are lost to the display adapter cycle-eater on each access. In other words, the display adapter cycle-eater can more than double the execution time of 8088 code!

LISTING 4.12 LST4-12.ASM

; Times speed of memory access to normal system
; memory.
;
     mov  ax,ds
     mov  es,ax       ;move to & from same segment
     sub  si,si       ;move to & from same offset
     mov  di,si
     mov  cx,800h     ;move 2K words
     cld
     call ZTimerOn
     rep  movsw       ;simply read each of the first
                      ; 2K words of the destination segment,
                      ; writing each byte immediately back
                      ; to the same address. No memory
                      ; locations are actually altered; this
                      ; is just to measure memory access
                      ; times
     call ZTimerOff

Bear in mind that we’re talking about a worst case here; the impact of the display adapter cycle-eater is proportional to the percent of time a given code sequence spends accessing display memory.


Previous Table of Contents Next

Graphics Programming Black Book © 2001 Michael Abrash