Tuesday, March 19, 2013

Beyond Bresenham: New Algorithms for Drawing Line Segments

Download Source Code

Introduction


J. E. Bresenham, an employee of IBM, published an algorithm (source [1]) for approximating line segments on a raster output device in 1965, and his work in this area is still widely taught and used. Unlike the algorithms dedicated to, for example, achieving mutual exclusion (Dijkstra's algorithm, Dekker's algorithm, Peterson's algorithm, etc.), Bresenham's algorithm has not really been augmented by many successor algorithms.

At the same time, though, raster output devices have progressed considerably beyond the plotter that Bresenham was targeting with his actual code. Modern hardware is typically not constrained to plotting one pixel at a time, as Bresenham's printer was. Rather, the low-level interface of modern hardware almost invariably offers, in one form or another, a facility by which multiple pixels are plotted (e.g. written into the display buffer, or printed onto the physical medium of a printer or plotter) in a single action. 

I refer to these modes-of-operation as "burst mode" in the subsequent discussion. The performance impact of using single-pixel operations on burst mode hardware will vary; but on many otherwise different systems, single-pixel drawing operations are fundamentally wasteful. 

On computers where single SIMD instructions can be used to write large groups of pixels into the display buffer, for example, relying on single-pixel operations may present performance issues. Similarly, while the target platform for which practical application code is provided with this article does not have SIMD instructions, it nevertheless can draw groups of pixels much more efficiently than it can draw single pixels.

This single-pixel assumption is one way in which Bresenham not only offered up a clever line-drawing algorithm, but also shaped our perception of the line-drawing problem. Another, more subtle example of this influence is the way in which Bresenham defined the set of segments to be approximated, and in particular the location of their endpoints.

The work presented below revisits such assumptions, and ultimately yields a new, general line-drawing algorithm. After that, this article presents a practical application designed as a proof-of-concept of the general algorithm. The target platform selected is an old one: the Tandy / Radio Shack Color Computer ("CoCo").

The Color Computer has an attractive but flawed burst drawing facility. Its 8-color "semigraphics" mode can produce attractive output, but also demands some clever workarounds of the developer. To fully exploit the capabilities of this early microcomputer is thus a difficult test for the new algorithm, which was originally constructed for a very different target platform. Its adaptability to this platform bodes well for its future utility, and stands in contrast with the relative inability of Bresenham's algorithm - for all its merits - to accurately describe what an optimized line-drawing routine actually does on this real target platform. 

Whether or not the algorithm succeeds in outperforming that of Bresenham must remain a platform-specific question. Bresenham's algorithm is not only integer-based, it can be written without using division, and with multiplications by 2 only. My own algorithm adds a single division operation per call, along with one multiplication by an arbitrary integer. Nevertheless, on contemporary hardware, it is quite likely that something like my new algorithm will perform better. This is ensured by the presence of CPU-level support for the necessary integer operations, along with display device support for burst mode transfers.

Also, it should be noted that Bresenham's algorithm has proven generally useful in the area of computational geometry, and that my own algorithm cannot supplant it in this respect. It is trivial, for instance, to adapt Bresenham's algorithm to the approximation of circles, and that is not an application of the new algorithm presented below. More generally, Bresenham pioneered the technique of replacing division with the iterative calculation of error via addition, a clever conceit which has become widespread. This technique remains central to all of the algorithms presented here.

Finally, please realize that this article deals with the drawing of non-antialiased line segments of a single color. Some very clever follow-ups to Bresenham's original work have shown how to plot smoothed-out line segments using multiple shades (see [2]). This article does not explore such techniques.

Background  

The problem of approximating a two-dimensional line segment on a raster output device is not a trivial one. A raster line-drawing routine's job consists of choosing the appropriate device pixels to simulate a real, continuous line segment. The segment to be modeled typically extends from the center of one device pixel to the center of a second pixel. For a segment that is more nearly horizontal in slope than vertical, the correct approximation consists of groups of horizontally adjacent pixels (which I call “pixel groups” in the discussion below). Each group differs from its neighbor group by ±1 in the Y-dimension. Conversely, a segment that is more nearly vertical in slope is approximated by several groups of vertically adjacent pixels, each of which differs from its neighbor group by ±1 in the X-dimension. 

Bresenham’s line rasterization algorithm is typically presented as the state of the art (e.g. in sources [3, 4, 5]). Bresenham based his algorithm on an algebraic attempt to minimize the distance between the real line segment and the centers of the approximating raster pixels. A simple version of his algorithm maintains an error term E, which, at any point during the execution of the algorithm, equals the distance between the segment and the center of the last approximating pixel drawn, at the point where they are closest.  

Figure 1 demonstrates the nature of the term E. It shows a nearly horizontal line segment passing through two pixels in the same pixel group. The two large squares represent two device pixels. The black dots represent pixel centers. The thick line represents part of the line segment being modeled, and the thinner, perfectly vertical lines span the distance E for each pixel. As shown in the figure, E is a distance in the dimension with the smaller absolute change between endpoints. In the other dimension, the segment and the pixel centers are at the same coordinate at the points where E is calculated. 

Figure 1: Bresenham’s Error Term  

In an unoptimized version of Bresenham’s algorithm, E begins the algorithm at 0; both the segment and its approximation begin at an exact pixel center (this part of the segment is not shown in figure 1). After each pixel is drawn, the slope or slope reciprocal (whichever is fractional) is added to the error term. This is done because no shift in the dimension with the smaller absolute change is evident in the pixel (which resides at the center of a device row or column), whereas the real segment has shifted by the slope m, or by 1/m. Before the algorithm plots each pixel, it checks this error term to see if it is necessary to start a new pixel group. New pixel groups are started when the error term exceeds ½, i.e. when the real segment is closer to the next row or column than it is to the row or column being used for the current pixel group. When a new pixel group is started, 1 is subtracted from the error. Whatever the error was before the new group was started, the approximation is now 1 pixel closer to reality.    

Clearly, much of Bresenham’s algorithm depends on which dimension exhibits the larger absolute change from one endpoint to the other. A practical implementation of the algorithm should thus compare the two dimensional changes once, near its beginning. In the implementation of the algorithm presented below as algorithm 1, a variable called interx is set to true for more nearly vertical segments, and a variable called DX holds the absolute dimensional change of greater magnitude (which is not necessarily the change in the x-dimension).

If we denote the endpoints as (X1,Y1) and (X2,Y2), a full-fledged line-drawing algorithm must also deal with cases in which X2-X1 and/or Y2-Y1 are negative. Typically, implementations of Bresenham’s algorithm (e.g that in source [4]) handle this issue by maintaining the magnitude of each dimension’s change and the sign of each change in separate variables. This allows the algorithm to, for example, store |X2-X1| in DX, and then loop from 1 to DX to draw a segment where |X2-X1| > |Y2-Y1|. The sign variables are used within the loop to plot the line segment in the correct direction. This is the approach taken by algorithm 1 below, in which variables S1 and S2 hold the sign of the change in the X-dimension and that of the change in the Y-dimension, respectively.

One optimization to Bresenham’s algorithm as outlined above consists of starting the error term at -½ instead of 0. This allows the algorithm to check only the sign of the error. More importantly, the error term (and the terms to which it is added and compared) can be scaled up by twice the larger absolute dimensional change (or 2DX), which eliminates need to deal with non-integer values.

After algebraic simplification, the optimizations outlined in the last paragraph mean that the error term E will be initialized to the larger of -|Y2-Y1| and -|X2-X1|. Since the coordinates of the first pixel are known, the error term need not be checked before the first pixel is plotted. The term E can actually be checked after each pixel is plotted, versus before. As such, an optimal implementation of Bresenham’s algorithm will incorporate the error component of the first pixel into the initial value of E, which will thus equal

-|X2-X1| + [|Y2-Y1|/|X2-X1|] · [2|X2-X1|] 

if the S-dimension change is larger in magnitude, and

-|Y2-Y1| + [|X2-X1|/|Y2-Y1|] · [2|Y2-Y1|]  

respectively. These are the expressions Bresenham uses to initialize the error term, and the result of the optimizations is a completely integer-based algorithm.   

Algorithm 1 below takes the approach outlined by Bresenham in his publication. It assumes the line segment has color C. It also assumes the availability of a subroutine plot, with C-language prototype plot(int x, int y, int c). This subroutine is device-specific and plots a single pixel at position (x,y) with color c. The rest of the algorithm is given in portable C1. A cursory main function is provided in the interest of completeness: 
#define _ABS(a) ((absval=(a))<0?(-absval):(absval)) 
 
void bres(int X1,int Y1,int X2,int Y2, int C){
 
 int X,Y,DX,DY,S1,S2,interx,temp,E,i,absval;
 
 X=X1;
 Y=Y1;
 DX=_ABS(X2-X1);
 DY=_ABS(Y2-Y1);
 S1=(X2-X1)<0?-1:1;
 S2=(Y2-Y1)<0?-1:1;
 
 if(DY>DX){
  temp=DX;
  DX=DY;
  DY=temp;
  interx=1;
 }else{
  interx=0;
 }
 
 E=2*DY-DX;
 
 for(i=1;i<=DX;i++){
  plot(X,Y,C);
 
  if(E>=0){
   if(interx)
    X+=S1;
   else
    Y+=S2;
   E-=2*DX;
  }
 
  if(interx)
   Y+=S2;
  else
   X+=S1;
 
  E+=2*DY;
 }
}
 
int main(){
 bres(1,1,5,22,1);
 return 0;
}
  
Algorithm 1 : Bresenham's Algorithm  

The output of the code shown above is shown in figure 2, in conjunction with the real line segment being modeled. 

Note that the Bresenham approximation excludes one end pixel from its approximation. In referring to end pixels, I mean the pixels that have the same coordinate numbers as the real segment’s endpoints, and contain these endpoints. Were one of these pixels not excluded, the approximation would be longer than the real segment. The exclusion of the lower end pixel in the approximation shown in figure 1 is thus a necessary but arbitrary aspect of the approximation. It is arbitrary in the sense that there is no particular reason the upper end pixel is not excluded instead. In the treatment of end pixel inclusion given later, a technique for eliminating this arbitrary decision is explored. 


Figure 2: Bresenham’s Algorithm Output  

Multiple Pixel Plot Operations 

Realistically, many modern display systems can move 32 or even 64 bits to the frame buffer in a single transfer operation. A single pixel will not fully utilize such a transfer on most systems, even those having a true (24-bit) color depth. Furthermore, many CPUs offer block transfer operations that can be used to move large blocks of memory in a single operation. Intel’s MMX and AMD’s 3DNOW! are well-advertised examples of such capabilities, but even the original IBM PC’s 8088 chip offered REP, a CPU-level iterative loop [6]. In fact, at the bottom of this article, some burst mode code written in Motorola 6809 assembly language is presented. This early 1980s hardware plots as many as four pixels in the graphics mode selected for my demo.

The desire to exploit burst modes led me to develop line rasterization algorithms whose outer loop iterates once per contiguous group of approximating pixels. At the top of such an outer loop, the number of contiguous pixels to be drawn for a particular iteration is explicitly available in a variable named W. This aspect of these algorithms makes them amenable to the exploitation of burst mode.

Noting that the pixel groups in the Bresenham approximation end when E becomes nonnegative, and that E is incremented by 2•DY with each single pixel plotted, I determined that, at the top of the iterative loop for each pixel group, W equals -FLOOR(E/2DY)+1, where the division in the calculation is done on a floating-point basis (and where the function FLOOR returns the integer number closest to its real number parameter, but not greater than it). The constant +1 term in the expression reflects the fact that E already includes the error attributable to the next pixel about to be plotted at the top of the loop. 

I went on to build a full-fledged version of Bresenham’s algorithm around this W calculation. The outer, iterative loops were replaced with loops based not on the dimension with the larger absolute change, but on that with the smaller absolute change (because the number of groups in the approximation equals the smaller of |X2-X1| and|Y2-Y1|). The calls to plot were replaced with calls to HLIN or VLIN (which are assumed to draw pixel rows and columns, respectively). The quantity added to E at the bottom of the loop (which is still central to the algorithm) was multiplied by W, to account for the multiple pixels drawn by each iteration.

Also, some outer, conditional statements were introduced to account for the assumption that HLIN and VLIN plot in the positive screen direction. In situations where the dimension with the larger absolute change exhibits a negative change between (X1,Y1) and (X2,Y2), special action must be taken to ensure the calls to HLIN or VLIN contain the appropriate starting coordinate for each group. In such situations, negative direction HLIN and VLIN plots are simulated via simple subtraction, followed by a positive-direction plot of W pixels. 

In the final algorithm, I was able to eliminate reliance on floating-point arithmetic. The original C assignment statement for W looked like this:

W=-floor((float)E/dyy)+1;

(The variable dyy equals 2*DY.) I was able to simulate this using integer math:

W=1+((-E-1+dyy)/dyy)

This expression simulates the floor function because of the way in which integer division works in the C language (it drops any remainder). For an integer division with a negative result (such as the one above), the behavior dictated by the C standard is tantamount to calling a “ceiling” function (that is, a function that returns the integer number closest to its real number parameter, but not less than it). In the integer expression above, the addition of dyy-1 to the numerator has the effect of adding –1 to the division’s result, except in cases where no remainder exists. The result is equivalent to calling a “floor” function.

One final complication had to be faced in the development of the W-based version of Bresenham’s algorithm. The technique used to calculate W does not always work for the final pixel group. Specifically, this final pixel group may end before E becomes nonnegative. This issue was resolved by virtue of the fact that a certain number of pixels, and no more, will comprise the correct approximation. This number equals |X2-X1| for more nearly horizontal line segments, and equals |Y2-Y1| otherwise. Therefore, I handled the final pixel group outside of the iterative loop of my W-based algorithm, by simply determining the number of pixels remaining to be plotted and plotting them. In some cases, the final pixel group ends exactly where E becomes nonnegative, and this results in a call to HLIN or VLIN with W equal to 0. It is assumed in the code below that HLIN and VLIN tolerate this situation (and return without action).  

Algorithm 2, below this paragraph, is the algorithm I developed. Again, it is given in portable C, except for the calls to HLIN and VLIN. The output of this code is the same as the approximation shown in figure 2. It should be noted that this algorithm is intended as an intermediate step between Bresenham's algorithm and the final version of the new algorithm developed for this article. It may be useful for some applications, but it has not been optimized quite so fully as the final algorithm presented later in this article.

#define _ABS(a) ((absval=(a))<0?(-absval):(absval))
#define SETW W=1+((-E-1+dyy)/dyy)
 
void fast_bres(int X1,int Y1,int X2,int Y2, int C){
 
 int X,Y,DX,DY,s1,s2,interx,temp,E,i,dxx,dyy,W,absval;
 
 X=X1;
 Y=Y1;
 DX=_ABS (X2-X1);
 DY=_ABS (Y2-Y1);
 s1=(X2-X1)<0?-1:1;
 s2=(Y2-Y1)<0?-1:1;
 
 if(DY>DX){
  temp=DX;
  DX=DY;
  DY=temp;
  interx=1;
 }else{
  interx=0;
 }
 
 dyy=2*DY;
 dxx=2*DX;
 E=dyy-DX;
 
 if(s1>=0 && !interx){
  for(i=1;i<=DY;i++){
   SETW;
   HLIN(X,Y,W,C);
   X+=W;
   E=E-dxx+W*dyy;
   Y=Y+s2;
  }
  HLIN(X,Y,(DX-(X-X1)),C);
 }
 if(s2<0 && interx){
  for(i=1;i<=DY;i++){
   SETW;
   VLIN(X,Y-W+1,W,C);
   Y-=W;
   E=E-dxx+W*dyy;
   X=X+s1;
  }
  VLIN(X,Y2+1,DX-(Y1-Y),C);
 }
 if(s1<0 && !interx){
  for(i=1;i<=DY;i++){
   SETW;
   HLIN(X-W+1,Y,W,C);
   X-=W;
   E=E-dxx+W*dyy;
   Y=Y+s2;
  }
  HLIN(X2+1,Y,DX-(X1-X),C);
 }
 if(s2>=0 && interx){
  for(i=1;i<=DY;i++){
   SETW;
   VLIN(X,Y,W,C);
   Y+=W;
   E=E-dxx+W*dyy;
   X=X+s1;
  }
  VLIN(X,Y,(DX-(Y-Y1)),C);
 }
}
 
int main(){
 fast_bres(1,1,5,22,1);
 return 0;
}  

Algorithm 2 : Bresenham's Algorithm with Burst Mode Support 

The second, third, fourth, and fifth if statements in algorithm 2 surround four essentially similar versions of the algorithm’s key iterative loop. These are loops that iterate once per pixel group. The existence of four versions of the loop reflects the need to call either HLIN or VLIN (depending on whether the line is more nearly horizontal or vertical) as well as the need to compensate for the situation where the dimension with the larger absolute change exhibits a negative change between (X1,Y1) and (X2,Y2).

A New Segment Set     

The discussion thus far implies a logical model of the raster device consisting of a grid, formed by lines intersecting at 90ยบ angles. Each rectangle formed by the grid approximates a pixel. This is the logical model shown in figure 2. Concurrent with the work discussed thus far, I was investigating the feasibility of developing a line rasterization algorithm that would approximate line segments having endpoints at the corners of these rectangles, instead of at their centers. Instead of having endpoint (0,0) refer to the center of the top left pixel (as has been customary), it could refer to the top left corner of that pixel’s approximating rectangle. On a system with 320 vertical columns, the extreme valid X-dimension inputs for the line-drawing routine would then be 0 and 320, versus 0 and 319 when pixel centers are used. Such an algorithm could potentially offer a gain in apparent resolution, I surmised. 

Also, simple expediency dictates that it is desirable, all else held equal, to make use of the grid formed by pixel edges for modeling purposes, rather than superimpose another grid for the line segment endpoints. This second grid is shown in gray in figure 3. This figure seems rather cluttered in comparison with the simple grid shown in figure 2, for instance, where the grid consists only of line segments intersecting at pixel corners.


Figure 3: The Extra Grid of Pixel Centers 

Bresenham’s figures show a single grid, formed by lines that intersect at pixel centers. No grid outlining the pixels is shown. Had he shown such a grid in his figures, they would have appeared similar to figure 3.  

Approximations of segments ending at pixel corners will also match their segments’ endpoints more closely than approximations of segments ending at pixel centers do. Better hardware typically has a smaller dot pitch, which implies larger pixels that more fully fill their approximating rectangles. These pixels almost certainly disappear from the screen or paper closer to the corner of their approximating rectangles than to their centers. Indeed, a device that did not exhibit that characteristic would produce very grainy output. On modern hardware, an algorithm that approximates line segments ending at pixel corners thus promises to approximate these segments’ endpoints with greater fidelity than an algorithm that approximates line segments ending at pixel centers. As discussed below, such an algorithm also does not suffer from the need to arbitrarily drop an end pixel (as does Bresenham’s algorithm). 

In some sense, segments ending at pixel corners are simpler to model than those ending at pixel centers. Figure 4 shows a real line segment that is similar to the line segment in figure 2,  except that its endpoints have been extended out to the corners of its end pixels. Note that the cardinality of the pixel groups in this approximation ranges from 4 to 5. Contrast this with the approximation in figure 2, whose groups range in cardinality from 3 to 6. Note also that the slope of the real segment being modeled is between 4 and 5. The reason the beginning / ending pixel groups in the Bresenham approximation are smaller than the middle groups is that the interior pixel groups begin and end where the real line segment crosses into a new row / column of raster pixels, whereas the beginning and ending pixel columns begin and end, respectively, at the center of a raster pixel row / column. The line segments with endpoints at pixel corners will not have any pixel groups that begin where the real line segment is at the center of the pixel column, since their endpoints do not reside at pixel centers. 


Figure 4: Approximation of a Corner-Defined Segment 

The key ratio for segments ending at pixel corners is actually not the slope, but the ratio of the number of rows occupied by the segment to columns occupied (for segments where |Y2- Y1| > |X2-X1|), or vice-versa (for other segments). If this ratio is exactly 2, then the pixel groups will all have cardinality of 2. If the ratio is between 2 and 3, some pixel groups will have a cardinality of 2, and others 3. The remainder of the ratio’s division determines how often the maximum pixel group cardinality applies compared to the minimum cardinality. In fact, another ratio - the ratio of this remainder to the denominator of the first ratio - equals the percentage of time that the maximum pixel group cardinality should be used. For instance, if a real segment occupies 13 columns and 4 rows, then the minimum pixel group cardinality will be 4 and the minimum 3 (since 13/4 is between 3 and 4). The remainder is 1, and as such, the maximum pixel group cardinality will be employed one-fourth of the time. Because the denominator of this fraction is equal to the number of pixel groups, the correct maximum-to-minimum proportion will always be achieved exactly.

One difficulty in modeling segments ending at pixel corners arises from the fact that the numbering system for line segment endpoints no longer coincides with the numbering system for the device pixels. Consider figure 5, which shows a representation of a real line segment from point (0,0) to point (2,6), in the new endpoint numbering system2. The two shaded pixels are the pixels with the same nominal coordinates as the line segment’s end points. In this case, the pixel numbered (2,6) is extraneous to the set of pixels that would best approximate the line segment. The pixel at coordinates (2,6) intercepts the line segment at coordinates (2,6), but that is the only point at which pixel (2,6) intersects the real line segment.  



Figure 5: End Pixel Determination Issue 



Ultimately, I decided on an approach that first determined the appropriate end pixels (vs. end points) for the approximation. The selection of end pixels is simplified by enforcing the condition X1<X2. This condition is not required of the user’s parameters, but is ensured by swapping endpoints, if necessary, at the start of the algorithm. Figure 6 is a graphical aid that was used to formulate the logic for end pixel selection. The left part of the figure deals with the endpoint (X1,Y1) and the right part deals with (X2,Y2). The gray pixels on both sides will not be part of the set of approximating pixels because of the condition X1<X2 (the line segment will always proceed in the positive X direction from the first endpoint). The pixel denoted by the real endpoint is outlined more darkly in the figure.



On each side, a model line segment of positive slope and another of negative slope are shown. On the left side, the positively sloped line segment intercepts the pixel with coordinates (X1,Y1), while the negatively sloped line segment intercepts (X1,Y1-1). On the right side, it is guaranteed that the end pixel will not be at X2, but at X2-1. If the slope is positive, the end pixel is at (X2-1, Y1-1), otherwise it is at (X2-1, Y1). The rules for these model line segments can be generalized into rules for all negatively / positively sloped line segments. Line segments with slope zero are not closer to either of the potential pixels’ center, and either of the potential pixels can be used.



Figure 6: End Pixel Determination Aid  

Another positive aspect of using pixel corners for endpoints deserves discussion. If the segments being modeled extend between pixel centers, it is impossible to say whether each of the line segment’s end pixels should be included in the approximation. For more nearly vertical segments, the real segment will occupy only the top half of one end pixel. At the other end pixel, it will occupy only the bottom half. For more nearly horizontal line segments, the segment will occupy only the left half of one end pixel and only the right half of the other. Therefore, segments terminating at pixel centers are ambiguous with respect to end pixel inclusion/exclusion. Bresenham’s algorithm includes pixel (X1,Y1) in the approximation but not (X2,Y2). When the segments being modeled terminate at pixel corners, though, this ambiguous situation does not apply. The real segment will either completely cross (X1,Y1) or it will completely avoid it. The same is true for (X2,Y2)

As in Bresenham’s algorithm, it is logical for an algorithm that approximates segments ending at pixel corners to plot contiguous groups in sequence from one endpoint to the other. With the number and cardinalities of these pixel groups known, along with the start/end pixels for the approximation, all that remains to be determined is exactly when (i.e. for which groups) the larger pixel group cardinality should be employed, and when the smaller cardinality should be used. 

Figure 7 models the situation where such a determination must be made. The segment being modeled is closer to horizontal than to vertical, and the ratio 1/m lies somewhere between 2 and 3. The correct approximation will thus consist of some groups (rows) of 3 pixels, and some of 2 pixels. The squares outlined in darker lines represent pixels that must clearly be plotted, in order to ensure the correct end pixels are plotted and the minimum group cardinality of 2 is respected. On the other hand, only one of the two squares in the center column of the grid should be plotted. The correct pixel is the one for which the checkered area is larger, i.e. the pixel which contains more of the real segment. If the point labeled A lies to the left of the center of the two pixels in the center column, then the correct pixel in the middle column will be the lower one. Otherwise, the upper pixel should be plotted. 



Figure 7: Pixel Group Cardinality Determination Aid  

Furthermore, if the distance labeled E in the figure is greater that ½ the slope, then point A will lie right of the pixel column’s center, and the appropriate pixel will be the upper one. This is true because the change in the vertical dimension is m pixels per each pixel in the horizontal dimension. Per half pixel, the change is thus m/2. At the center of the pixel column, the segment will lie m/2 pixels away (in the vertical dimension) from where it was located when it entered the central, ambiguous column. If E exceeds this distance, then the real segment will still lie within the upper pixel group at the pixels’ centers. This indicates that, for more nearly horizontal lines, the larger pixel group should be plotted in cases where E>m/2.  

For the first pixel group plotted (here, the leftmost), the term E can be calculated as 1, minus m times the group cardinality. If the segment were perfectly horizontal, then E would equal the entire height of the pixel, i.e. 1. But, the segment is not perfectly horizontal. It grows m pixels closer to the bottom of the row with each pixel. Thus, E=1-mW, where W is the minimum pixel group cardinality. After each group is plotted, E is incremented by 1 to reflect the shift to the next row. If the maximum cardinality is used for the group, an additional m is subtracted from E to account for the extra pixel.

The logic outlined in the preceding two paragraphs works for all lines that are more nearly horizontal in slope than vertical. This logic can be generalized by replacing m with 1/m for lines where |Y2-Y1| > |X2-X1|. Moreover, these calculations can be scaled up to rely solely on integers, by multiplying all terms by twice the larger dimensional change. This is the approach taken by the implementation shown below. 

One final consideration deals with perfectly horizontal and vertical segments. For segments ending at pixel centers, the correct approximations for such segments are obvious. For segments ending at pixel corners, an ambiguous situation exists. The real segment to be modeled straddles the precise location between two device rows or columns. The best that can be done is to arbitrarily select one or the other, and to make this choice consistently (e.g. to always choose the device column to the right of perfectly vertical segments).

The New Algorithm

Algorithm 3, below, shows an implementation of the techniques described above. Again, availability of HLIN and VLIN is assumed. The output for this code is as shown in figure 4:

#define _ABS(a) ((absval=(a))<0?(-absval):(absval))

void alg3(int X1, int Y1, int X2, int Y2, int C){

 int temp,W,DY,DX,s2W,S,absval;

 int s1w,SignY1,SignY2,s1,s2,E;

 if(X2==X1)X2++;
 if(Y2==Y1)Y2++;

 if(X2<X1)
 {
  alg3(X2,Y2,X1,Y1,C);
  return;
 }

 X2--;

 if((Y2-Y1)<0){
  s2=-1;
  Y1--;
  DY=Y1-Y2+1;
 }else{
  s2=1;
  Y2--;
  DY=Y2-Y1+1;
 }

 s1=(X2-X1)<0?-1:1;
 DX=_ABS(X2-X1)+1;

 if(DY>DX){
  temp=DX;
  DX=DY;
  DY=temp;
  W=DX / DY;
  s2W=s2*W;
  S=0;
  E=2*DX;

  if(s2>0){
   SignY1=0;
   SignY2=0;
  }else{
   SignY1=W;
   SignY2=W-1;
  }

  while(S<DY){
   E-=2*W*DY;
   if(E>DY){
    E-=2*DY;
    VLIN(X1,Y1-(SignY1),W+1,C);
    Y1+=(s2W+s2);
   }else{
    VLIN(X1,Y1-(SignY2),W,C);
    Y1+=s2W;
   }

   E+=2*DX;
   S++;
   X1+=s1;
  }

 }else{

  W=DX / DY;
  s1w=s1*W;
  S=0;
  E=2*DX;

  while(S<DY){
   E-=2*W*DY;
   if(E>DY){
    E-=(2*DY);
    HLIN(X1,Y1,W+1,C);
    X1+=(s1w+s1);
   }else{
    HLIN(X1,Y1,W,C);
    X1+=s1w;
   }
 
   E+=2*DX;
   S++;
   Y1+=s2;
  }
 }
}

int main(){
 alg3(1,1,6,23,1);
 return 0;
} 

Algorithm 3 : New Line-Drawing Algorithm 

After the initial int declarations, the first two if statements account for situations where the line segment is perfectly horizontal or vertical. The endpoints are altered slightly, so that the segment crosses from one side of a device row or column to the other, resulting in a new model segment residing entirely within that row or column. The next if statement ensures that X2>X1, which eliminates the need to maintain two sign variables. Next, X2 is decremented, in keeping with the end pixel determination logic outlined previously. The subsequent conditional structure completes the process of determining the end pixels. It also sets up the variable DY, which holds the change in the y-dimension, and sets up two variables used to account for the fact that VLIN plots in the positive direction. These are named SignY1 and SignY2.

With these setup tasks complete, the main line rasterization loop can begin. Different loops are used for more nearly horizontal lines versus more vertical ones; this is the purpose of the next if statement. In either case, DX holds the larger change and DY the smaller, and each of these may hold either a change in the X-dimension or a change in the Y-dimension, variable name notwithstanding. 

Algorithm 3 exhibits one key performance advantage compared to Bresenham's algorithm: its loop executes once per pixel group, as opposed to once per pixel. On hardware where drawing one pixel and drawing several adjacent pixels are equally costly options, this is a big advantage.

Algorithm 3 does make more extensive use of multiplication and division operations than Bresenham's algorithm does, which may be a disadvantage on some platforms. Algorithm 3 uses a single division near its start; Bresenham's algorithm uses none. Inside of its main loop, algorithm 3 only uses addition, subtraction, and multiplication. In this respect, it is similar to Bresenham's algorithm, although it is worth noting that all of the multiplications in Bresenham's algorithm are by 2. Algorithm 3 has a single multiplication that can involve any byte value.

How these tradeoffs pan out on any given platform depends on the sort of facilities available. In the example application described in the next section, none of the multiplication operations proved problematic. All of them were effected using single shift operations, or, in one case, using a CPU multiplication opcode. The single division operation per call did end up requiring repetitive subtraction; many more contemporary devices will not even require this.

Color Computer Application 

The Color Computer offers a single 8-color mode, along with several other modes with fewer colors. The 8-color mode has a resolution of 64x32, and is actually text-based. It is really a 32x16 character mode, where the character set includes a series of graphical characters that divide the character into 4 pixels [7]. Examples are shown in figure 8 below. 

The hybrid nature of CoCo "semigraphics" mode offers some interesting possibilities. The combination of text and graphics that is available seems to facilitate the construction of a more sophisticated and graphical user interface. 

Also, under this design, two pixels can be drawn in a single character write operation. (Using the graphical characters with more than two pixels on would cause clumps in the output for a line segment.) Moreover, the 6809 CPU used for the CoCo has a 16-bit addressing mode, which allows two horizontally contiguous characters (or up to four pixels) to be drawn at once [8]. There is no possibility of drawing three or four vertically adjacent pixels in a single operation in this way. This means that horizontal lines can be draw more quickly than vertical ones on the CoCo, if the program is optimal. Because of their unique nature, the CoCo application given here is limited to lines having a slope less than or equal to 1.
Figure 8: Color Computer Graphics Characters

This "semigraphics" mode, though, is a challenge to program. Its burst capability must be exploited in order to develop code that is efficient, and Bresenham's algorithm offers little hope in this area. Furthermore, the fact that each memory byte affects multiple pixels implies that there is nothing like the plot() function used in algorithm 1 above will be available on the CoCo. Simply writing out a character with just one of its four pixels illuminated might turn off other, previously illuminated pixels. 

Fortunately, the CoCo architecture provides a way around this issue of undesirable interaction, and algorithms 2 and 3 above offer techniques for dealing with burst most. The methods available for preventing pixel plot operations from interfering with previous, similar operations are discussed first.  

The CoCo uses 8-bit characters, and the graphics characters occupy positions 128-255 of the CoCo character set. Each graphic character displays some combination of its four pixels in one of the eight CoCo foreground colors, with the other pixels shown in black. 

Each nibble of these single-byte graphics characters has a distinct significance. The top nibble determines the foreground color, and the bottom nibble determines which pixels are illuminated. This bottom nibble is populated such that performing a bitwise OR operation between two graphics character bytes will result in a combination of each pixel's bytes. This property is very useful when drawing line segments, and was achieved by Tandy / Radio Shack by assigning each constituent pixel a power of 2. 

This works neatly for the bottom nibble of the graphics character bytes. The top byte, though, is not a bit mask, but a simple enumeration of the 8 CoCo colors. It therefore remains susceptible to corruption in cases where pixels of different colors are being drawn in close proximity to each other, and this is a fundamental limitation of the CoCo's display hardware.

For example, consider what happens when a drawing operation consists of an OR operation between a character in the display buffer having its top right pixel set and a new character with its bottom right pixel set. In the bottom nibble of the resultant character, 1 (bottom right pixel) and 4 (top right) are arguments for an OR operation that yields 5, which is the character nibble that means that both right side pixels are set. 

This is desirable, but the top nibble of the result is not so easy to calculate. If the two color nibbles are identical, then the result's color nibble will be the same as its two inputs. This is the nature of the OR operation. 

In cases where the two graphics character bytes being combined are not of the same color, though, the resultant color is basically arbitrary. Often, the entire character that results consists of pixels in one color or the other. Green is color 8 and purple is color A (hexadecimal), for example; since all bits set in 8 are set in A, the result has color nibble A, and all 1-4 pixels turned on will have purple color.

Of course, this is not always the correct result, but it is, inevitably, the CoCo result. There is simply no way to ensure that this bleeding of color between adjacent pixels will not occur in an environment where groups of four adjacent pixels must, by design, share the same color. The programmer must work around this limitation.  

Design Principles 

The specific design principles used for the CoCo implementation encompass the general line-drawing principles already described, but must augment these with some principles that are unique to the CoCo hardware. This is inevitable; a general line drawing algorithm should not encompass the idiosyncrasies of any specific computer.

The CoCo implementation is necessarily much closer to algorithm 3 than it is to algorithm 1 (Bresenham's algorithm). To rely on something like plot() on the CoCo would be wasteful. At least, this would be the case in the "semigraphics" mode under discussion.

For illustration, consider what it takes to plot four horizontally adjacent pixels in CoCo "semigraphics" mode. If one were to write a plot() operation to set single pixels, each execution of this function would require a bitwise OR operation, to preserve any previously plotted pixels, along with a load operation and a store operation.

As an alternative to executing this laborious sequence four times, consider that it is possible to write four pixels in to the CoCo "semigraphics" mode display buffer using the 6809 assembly language snippet shown below, which assumes only that the X register is pointing to the correct position in the display buffer:

 LDD #$A3A3
 STD ,X++

The OR operation is not necessary here. It is necessary when drawing a line segment using single-pixel plot operations, because otherwise each plot operation could potentially overwrite the prior plot operation's output. Here, though, we are dealing with all four pixels en masse, and there is no overwriting to worry about. In addition, we only need a single load and a single store, since four pixels, in the best case, will represent two contiguous bytes. Finally, the considerable overhead associated with setting X properly is amortized over multiple pixels if one draws four pixels at once. 

If there is some device in existence where such single-pixel operations are still optimal, algorithm 3 can be augmented with an innermost iterative loop to implement the HLIN and VLIN primitives. This seems more logical than enforcing Bresenham's (single-pixel) algorithm on the vast array of multi-pixel devices available. 

Like these devices, the CoCo can do better than plot(). The calculation of variable W, as shown in algorithm 3, provides a good foundation for fully exploiting the CoCo's four-pixel burst capabilities.  

The relatively full suite of operations available on the 6809 (versus the 6800 or the 6502, for example) also influences the CoCo application's design. In summary, the main design principles followed are enumerated below: 

* Direct Page mode is used wherever possible. In this mode, memory access within a programmable 256-byte page is faster, and the required opcodes are shorter. This is a signature feature of the 6809; its ancestor CPU, the 6800, provided a similar but non-programmable facility for bytes 0-255 only. In the CoCo code provided here, theDP register is saved at the start of the drawing routine, and is then set to a value that allows for fast access to the routine's parameters and variables. Before returning to the caller, the routine restores the DP register. 

* 16-bit operations are favored. While the 6809 is usually categorized as an 8-bit processor, examination of its datasheet [8] reveals that it is really a 16-bit device that happens to have an 8-bit data bus. In this respect, it is similar to the Intel 8088 (traditionally categorized as a 16-bit CPU despite its 8-bit data bus) or i386SX (a device with a 16-bit data bus that is nevertheless considered a 32-bit CPU). For example, the 6809's 16-bit load operations (LDXLDU, etc.) take 5 clock cycles for a direct page mode operation, just one more cycle than the equivalent 8-bit operations (LDALDB, and so on). Examination of detailed sources about the CoCo revealed no mention of any additional wait state ([9]). This single extra clock cycle is presumably mandatory to move the extra data, since the 6809 has only 8 data pins, but other than that, the 16-bit operation is just as fast as its 8-bit analog. This pattern continues across the whole 6809 instruction set, and as a result, full employment of 16-bit operations is necessary for optimal 6809 performance. 

* Algorithm 3 is implemented. Algorithm 1 does not support burst mode adequately. The gain in resolution compared to algorithm 2 was considered desirable. The consistent pixel group cardinality evident in this algorithm's output promises to better exploit burst mode compared to Bresenham's algorithm. Bresenham's algorithm more readily inserts smaller (e.g. one or two-pixel groups) into its approximation output, and these do not lend themselves quite so well to burst mode drawing.

Other compound and high-level 6809 operations are fully employed. For instance, shift operations are used to effect multiplication, but the MUL opcode is also used where this is not practical. This is a 16-bit multiply operation that completes in 11 clock cycles, i.e. about as much time as two 8-bit load or store operations would take. The 6809 also offers post-increment store operations, and these are utilized where applicable, e.g. when drawing long sequences of horizontally contiguous pixels. 

Earlier in the article, it was mentioned that repeated subtraction is used for the 16-bit division necessitated by algorithm 3. This design choice was weighed against two different shift-based implementations, using the "restoring" and "non-restoring" algorithms. However, in empirical testing using a practical application (which will be presented in a future article), the shift-based approaches offered no speed improvement, and came with considerable costs in terms of code length and difficulty.

This result relates, in part, to the nature of the test application used. The execution time of the repeated subtraction logic is directly proportional to the quotient it returns; a high quotient implies that many subtractions proved necessary. A line-drawing application with many nearly horizontal or nearly vertical lines might benefit from a shift-based implementation (assuming the developer could find room for it in RAM).

Overall, the implementation actually presented is fast, but could probably still be optimized. Some other, CoCo-specific algorithm constructed without reference to a general algorithm like the ones shown earlier might be faster. However, if one begins with the premise of starting from a general line-drawing algorithm and proceeding to a specific implementation, it is difficult to find fault with the CoCo implementation given here. It remains faithful to the general algorithm without leaving too much on the proverbial table performance-wise.  

CoCo Development Environment  

The decision to use the CoCo as an example platform for this article was influenced by the openness and simplicity of its architecture, and by the availability of free tools for CoCo development on a standard Windows PC.

Out-of-the-box, any Tandy / Radio Shack Color Computer offers the ability to write programs in Microsoft BASIC with 6809 machine language routines. The BASIC interpreter that loads at startup facilitates the loading and execution of machine language subroutines.


In particular, CoCo BASIC includes a DATA statement, which emits a series of numeric arguments directly into a BASIC program data segment.  Here, these numeric values are machine language for the 6809 CPU. These are read into a variable using the READ command, and then placed into memory at a designated location using the POKE command. Once this is done, the rest of the BASIC program can invoke the resultant machine language routine using the EXEC command. 

The combination of 6809 machine language and Microsoft BASIC is an attractive one. It offers the full power of the 6809 CPU, but allows it to be scripted in the accessible BASIC language. However, the development of machine language subroutines is tedious if some additional steps are not taken. It is much easier to develop in 6809 assembly language than in machine language, and Motorola has released a PC-hosted assembler into the public domain that allows for this to happen. 

This Motorola AS11 assembler targets a wide variety of processors, including the 6800, 6809, and 68HC11. "As9.exe" is the name of the actual executable used to build 6809 assembly language. On the author's development PC, the contents of the provided AS11 archive were unzipped to a folder named "C:\coco\as11."

The code provided with the article resides under a single top-level "coco" folder. This folder can be relocated as needed by the reader; all that is necessary is to use different cd commands in the Command Prompt sessions used in the build process described below. These commands will need to change the working directory to the reader's preferred folder. The use of "C:\coco\," "c:\coco\as11," etc. are assumed in the discussion below.

To assemble the provided assembly language code ("line.asm") on a development PC configured in this way, a Command Prompt window must be opened. There, the cd command (cd\coco\as11) should be used to go to this folder. 

Now, "as9.exe" can be invoked directly. After its name, the command line must contain a single parameter, which is the path to the input assembly language file. For this application, "line.asm" is stored in the parent folder of the "c:\coco\as11" folder, i.e. in "c:\coco" itself. So, the necessary command was as9.exe ..\line.asm.

The output printed to stdout by this command contains the sequence of numbers that must ultimately be loaded using BASIC language DATA commands. Unfortunately, these are not in a very convenient format. They are interspersed with the overall assembly language listing, including mnemonics and addresses, and are also expressed in hexadecimal. (Microsoft BASIC's DATA statement accepts base 10 by default.)

In addition to the text written to stdout, the assembler also creates file "M.OUT." On the author's development computer, this file gets created in folder "C:\coco\as11." It is in Motorola SREC format, and contains the object code generated by the assembler, in a special hexadecimal format.

A utility that converts SREC format into a CoCo BASIC loader program is provided in the article's source code archive. The default folder location for this utility is "C:\coco\coco_obj2bas" and it consists of a single C++ source file, "coco_obj2bas.cpp." The location and name of its input and output files are determined by preprocessor directives at the top of this C++ file:
#define INPUT_FILE "c:/coco/as11/M.OUT"
#define OUTPUT_FILE "c:/coco/as11/line.bas"

So, as provided, this utility program will accept input file "c:\coco\as11\m.out" (in SREC format) and output a CoCo BASIC program. This program consists of DATA statements holding the contents of the SREC file, along with a loader loop that takes this data and loads it into RAM. This is done starting at address 37AC (hexadecimal). This is an address selected to work well on most CoCos with at least 16 kilobytes of RAM.

The source code archive provided with the article includes a built version of the coco_obj2bas utility, named "coco_obj2bas.exe." This can be run directly from the "C:\coco\coco_obj2bas" folder.

A portion of the BASIC program generated by the utility is shown below. Most of the DATA statements have been omitted for the sake of brevity. This mass of integers would not have much significance to most readers anyway. Later, the assembly language code that underlies this machine language routine is discussed:
130 DATA 128,12,150,150,156,151,158,57,205
2000 FOR I=0 TO 392
2010 READ K
2020 POKE 14252+I,K
2030 NEXT I

Most of the code shown above relates to the loading of the machine language routine into memory at the designated location. The number 14,252 is the decimal equivalent of 37AC hexadecimal. The range of the main FOR loop is set by the coco_obj2bas utility. The READ statement, executed in succession as it is above, gets successive elements out of the DATAlist at the top of the program.

The BASIC program outputted by coco_obj2bas.exe, and partially shown in the code fragment above, is not sufficient to do any actual rendering to the display. All it does is load the line drawing routine into memory at address 37AC. To actually call the routine, an EXEC 14252 statement is necessary. Before this statement is executed, though, parameters must be placed in locations 14,229 through 14,234 (decimal). Counting up from 14,229, these parameters are the starting X coordinate (2 bytes), starting Y coordinate (1 byte), ending X coordinate (1 byte), ending Y coordinate (1 byte), and color (1 byte). The starting X coordinate is a 16-bit number because the algorithm implementation does 16-bit arithmetic on it. The color byte takes the form A0, B0, C0, etc.; it consists of the top nibble corresponding to the desired color, and will be combined into the ultimate graphical character values used using bitwise OR operations.

An example of an actual call into the machine language routine is shown below. Provided BASIC program "line.customized.bas", which is a modification of the output of the utility program just described, ends with several such calls:
2055 POKE 14229,0
2060 POKE 14230,11
2065 POKE 14231,2
2070 POKE 14232,2
2080 POKE 14233,4
2085 POKE 14234,224
2090 EXEC 14252

The code shown above draws a line segment from (11,2) to (2,4). The color byte is 224, or hex E0, which is purple. That is, all of the characters in the CoCo character set with E as their top nibble are purple graphics characters. 

In summary, these are the steps used to build the CoCo demonstration:
  • Assemble "line.asm" to SREC format object code using an AS9 command, e.g.
c:\coco\as11>as9 ..\line.asm
  • Use the coco_obj2bas utility to convert the SREC file into a BASIC loader program, e.g.
 c:\coco\coco_obj2bas>coco_obj2bas
  • Add calls to the line drawing routine, and other program control code, to the bottom of "c:\coco\as11\line.bas." A CLEAR statement should also be added, to set up an area for the machine language routine. Provided file "line.customized.bas" already contains all of the necessary code.
At the end of this step, you will have a complete BASIC program ready to be typed directly into any CoCo with at least 16KB of RAM is created3. Initially, this is exactly how I did things, typing my program manually into the Color Computer 2 shown in figure 9 below. but later on I discovered Jeff Vavasour's CoCo emulator, which has been released into the public domain. This emulator is provided with the article source code. Its main executable is located at "c:\coco\coco.exe."


Figure 9: Color Computer 2 (Later Production Example)

It is possible to paste a BASIC program into the emulator using the Windows clipboard. To do this, perform the following steps:
  1. Open CoCo.exe (in full screen mode).
  2. Press ESC to clear the initial dialog.
  3. Press ALT+TAB to return to Windows.
  4. Put the program text on the clipboard, e.g. by highlighting it in a text editor and selecting "Copy."
  5. Right-click CoCo.exe in the taskbar, and choose "Edit," then "Paste".
  6. Double-click CoCo.exe in the taskbar.
  7. Wait while the program gets pasted into the CoCo.exe session. This will take several seconds. If the text that got pasted did not end with a newline (or if you cannot tell), press ENTER.
  8. Type RUN at the CoCo command prompt to see actual line segment rendering. The demo supplied in "line.customized.bas" will show several lines and then exit. To continue, the user must press ENTER after each line is displayed.
Figure 10 below shows the CoCo demonstration program after it has exited and returned the user to the BASIC environment. This picture demonstrates the intermingling of text and graphics possible in "semigraphics" mode. Note that, before ENTER was pressed for the final time to exit the program, the same lines shown in figure 10 were evident against a plain black background.




Figure 10: The CoCo Demonstration After Completion

This procedure has been tested on Windows XP; it has not been tested on subsequent versions of Windows, because of the difficulty in finding a newer system that supports full screen mode. Also, on some systems, it may be advisable to paste the program in using multiple operations. Sometimes, for particularly long programs, characters can get omitted. Using multiple paste operations will involve repeating steps 3-7.

The last section covered the details of building a 6809 machine language / Microsoft BASIC line-drawing program for the CoCo on a Windows PC. This section describes the actual assembly language program behind the machine language line-drawing routine. This program is located in file "coco\line.asm" in the source code archive provided. This program, which is located in file "coco\line.asm," begins as shown below:

* Graphical characters
TOPLEFT EQU $08
BTMLEFT EQU $02
TOPRIGHT EQU $04
BTMRIGHT EQU $01
TWOTOP EQU $0C
TWOBTM EQU $03

 ORG $3795

*Parameters
HPARMX1 RMB $1
LPARMX1 RMB $1
PARMY1 RMB $1
PARMX2 RMB $1
PARMY2 RMB $1
COLPARM RMB $1

The code shown above begins with the declaration of constants. All of the constants declared here relate to the lower nibble of the graphics characters used by the line drawing routine. The zero in the top nibble of each constant value will get replaced with the color parameter passed into the routine, using a bitwise OR operation. The names used for these constants reflect their appearance, e.g. use of TWOTOP results in a character with both of its top pixels illuminated in the foreground color.

After that, the ORG directive positions the start of the parameters at location 3795 (hexadecimal). This is the same address as decimal location 14,229, which was evident in the BASIC call shown above. This location plus the length of all parameters and variables used by the line drawing routine, add up to the value 14,252 (hexadecimal 37AC), which is the entry point of the routine.

Next come the parameter declarations. For maximum flexibility, these are all given individual identifiers. Nevertheless, recall that the first X coordinate is passed as a 16-bit value. This is passed in big-endian order, as is expected by the 16-bit operations eventually performed on this value.

After these parameter declarations, the code continues with the declaration of variables:

*Variables
SAVEDPVAR RMB 1
SAVEREGVAR RMB 1
TEMPVAR RMB 1
WVAR RMB 1
DYVAR RMB 1
DXVAR RMB 1
S2VAR RMB 1
SVAR RMB 1
S1WVAR RMB 1
SIGNY1VAR RMB 1
SIGNY2VAR RMB 1
S1VAR RMB 1
S2WVAR RMB 1
EVAR RMB 1
WHCHRVAR RMB 1
WHCHLVAR RMB 1
TPRBTMVAR RMB 1

Some of these are related to low-level implementation details. SAVEDPVAR is an example; it is used to save the contents of the direct page register and restore them after the routine completes. However, many of the variables shown above relate to variables shown in algorithm 3 above. EVAR corresponds to variable e in algorithm 3, S1VAR tos1, and so on.
The CoCo implementation continues as shown below:

*SAVE DP
LINE TFR DP,A
 STA SAVEDPVAR
 LDA #$37
 TFR A,DP
*IF X2 EQ X1 INC X2
 LDA <PARMX2
 CMPA <LPARMX1
 BNE ITRR9
 INC <PARMX2
ITRR9

The segment of code shown above begins by saving the DP register, so that the line drawing routine can change this register's value without disrupting the caller. Then, the actual implementation of algorithm 3 begins, with the condition related to cases where the two X parameters are equal. This proceeds in intuitive fashion; the left comparand is loaded inth the A register, and then a CMPA operation is executed, with the right comparand as its single argument. Then, a branch is executed if the condition from the original if in algorithm 3 is not met. Finally, we have the code that executes if the if condition is true. After the value of DP has been set, all of the memory-based operands use direct page mode syntax, which means that they are prefixed by character <. 

The code goes on in similar fashion, sticking closely to the procedure outlined in algorithm 3. Once the algorithm'sHLIN call is reached, though, some very platform-specific logic becomes necessary. The beginning of the CoCo HLINroutine is shown below:

HLIN STA <SAVEREGVAR
 LDA <LPARMX1
 STA <TEMPVAR
*LOAD ADDRESSING REGISTER
*SET D TO 32 MUL PARMY1 DIV 2
STT LDB <PARMY1
 ASRB
 LDA #$20
 MUL
*SET X TO D PLUS 1024 PLUS PARMX1 DIV 2
 LDU <HPARMX1
 ASR <LPARMX1
EVNA ADDD <HPARMX1
 STU <HPARMX1
 ADDD #1024
 TFR D,X

This latest snippet begins with some code to preserve the caller's context. HLIN is destructive to WVAR, so special steps are taken to preserve and restore it. After that comes the code to initialize the X pointer with the correct 16-bit pointer into the display buffer. The X register serves as an index into the display buffer throughout the elaboration of each HLIN call. In the original nomenclature of algorithm 3, the value being calculated to initialize X is:

1024 + (Y1/2*32) + X1/2

This calculation reflects the fact that the CoCo display buffer begins at location 1,024 (decimal), with two rows and two columns per screen character (byte), and 32 pixels per row. This expression is full of potential difficulties for an 8-bit processor, in its 16-bit literal integer and its multiple division and multiplication operations. It actually works out fairly easily on the 6809, though.

Each of the arithmetic operations above ends up equating to just one 6809 opcode. The code benefits from the fact that both divisions are by powers of 2, and can therefore be implemented using shift operations.

The multiplication operation does not exhibit this characteristic, but fortunately the 6809 MUL opcode suits the purposes of algorithm 3 very well. This multiply opcode takes 8-bit registers A and B and multiplies them, leaving the result in the combined A:B register (also known as D). Here, we are multiplying two 8-bit quantities (the number 32 and a CoCo coordinate) to yield what will eventually be a 16-bit quantity (a 6809 pointer), so this opcode fits well.

Some other 16-bit operations are present in the code above. Near the end of the code, the 16-bit pairHPARMX1:LPARMX1 is shifted right to yield the value of (again, using the original nomenclature of algorithm 3) (X1)/2. This value is then added back into the main 16-bit accumulator D. This is a convenient way of doing things, but it is also destructive to HPARMX1:LPARMX1. So, the original value of this two-byte RAM variable is stored in / restored from the 16-bit U register4.

The CoCo HLIN routine proceeds as shown below:

 LDB <PARMY1
 ANDB #1
 BEQ ITAB1T
* CURRENT Y IS ODD
 LDA #TWOBTM
 ORA <COLPARM
 STA <TPRBTMVAR
 LDA #BTMRIGHT
 ORA <COLPARM
 STA <WHCHRVAR
 LDA #BTMLEFT
 ORA <COLPARM
 STA <WHCHLVAR
 BRA ITAB2T
* CURRENT Y IS EVEN
ITAB1T LDA #TWOTOP
 ORA <COLPARM
 STA <TPRBTMVAR
 LDA #TOPRIGHT
 ORA <COLPARM
 STA <WHCHRVAR
 LDA #TOPLEFT 
 ORA <COLPARM
 STA <WHCHLVAR
ITAB2T LDB <LPARMX1
 ANDB #1
 BEQ ITBB2T

Here, we see the implementation of a large if/else structure; the initial conditional branch selects between a top clause that executes when the current Y coordinate (i.e. the coordinate where the next plot will occur) as odd, and a bottom clause that executes otherwise. Each clause loads some variables with graphical characters, for future use in the remainder of the HLIN routine. If Y is odd, then these characters all have their top pixel(s) illuminated, otherwise it is the equivalent bottom pixel(s) that are illuminated.

Finally, at the very bottom of this latest code snippet, a similar conditional branch is executed. This is the start of another, similar if/else structure based on the current X coordinate. Here, though, the resultant clauses that get executed do not load variables; rather, they perform actual drawing. At this point, the address to which drawing will initially occur is known, along with whether or not X and Y are even/odd.

This is sufficient information to draw the first graphical character. If the current X position is odd, and the Y position is even, for instance, then it is the top right pixel of the character located at the address in register X that needs to be turned on. If the current X position is odd, and the Y position is also odd, then it is the bottom right pixel that needs to be turned on. The next segment of code shown below deals with either of these cases:

 BEQ ITBB2T
* CURRENT X IS ODD DRAW SINGLE RT SQUARE
 LDA <WHCHRVAR
 ORA ,X
 STA ,X+
 DEC <WVAR
 INC <LPARMX1
ITBB2T

This latest portion of the HLIN code demonstrates several important aspects of the CoCo implementation. First, it uses WHCHRVAR instead of a literal graphical character number, since this code is shared by both the case where the currentY coordinate is odd and the case where it is even. This is a pattern which repeats itself throughout the HLIN code, for all of the drawing operations.

Also evident above (in STA ,X+) is the use of post-increment addressing to move our pointer, X, through the display buffer. However, the code shown above does not use a 16-bit transfer to the display buffer, since it is only drawing a single pixel. Once it has taken care of this single pixel, necessitated by our input X coordinate, the HLIN code will move on to looking for opportunities to draw using burst mode.

One more aspect of the code shown above that is used throughout HLIN is the use of bitwise OR to draw to the display buffer. Specifically, the display buffer byte being written to is first loaded into the main accumulator, then ORed with the new pixel's appearance byte, and then written back out to the same location.

Earlier in the article, it was mentioned that it is not strictly necessary for a CoCo "semigraphics" line-drawing routine to use OR in this way. Because the four-pixel plot operation and the two-pixel plot operation handle the display buffer bytes that they affect fully, they can be omitted without affecting the integrity of the line segment approximation. These operations can still disrupt (remove) pixels plotted by previous operations that lie near the line segment, if OR is omitted.

In some cases, this overwriting behavior may actually be desirable. This might be the case if lines are being rendered in Z-dimension order, with nearer lines rendered last. In other applications, the omission of pixels caused by the overwriting behavior will be preferable to the discoloration that can occur if OR is used with multiple colors. So, comments are present in the provided 6809 assembly language code that describe how to remove these OR operations.

Some OR operations are still necessary, e.g. to build the cache of graphics character in the selected color, and to build parts of the line segment approximation where diagonal characters, like the one shown in the bottom left corner of figure 8, are required. Diagonal characters are never explicitly used by the implementation code. Rather, they are generated as a result of OR operations, and the simplified decision-making logic associated with this design yields good performance.

At is end, the last snippet of code shown above increments the X position variable and decrements WVAR, to updateHLIN's state. HLIN uses WVAR as a running count of the number of pixels that still need to be drawn, and similarly maintains the current X position in memory location HPARMX1:LPARMX1. (HPARMX1 is, of course, 0 throughout these operations, but the 16-bit storage format facilitates certain fast 16-bit operations.)

At this point, the HLIN code can rely on the fact that the current X coordinate for drawing is even. So, here we can check for the opportunity to draw four pixels at once, or two pixels at once. Three-pixel operations are not employed; it is doubtful that they would offer much performance benefit, because of the extra setup and decision logic that would be required. A considerable number of 16-bit graphical character variables would have to be initialized, for example, and this would have to be done based on additional conditional operations.

The next portion of the HLIN code checks to see if WVAR is at least four. So long as this condition holds, four-pixel plot operations are performed:

* WHILE W GT 3
XSTLT LDA <WVAR
 CMPA #3
 BLE TLTLT
*DRAW QUAD
 LDD ,X
 ORA <TPRBTMVAR
 ORB <TPRBTMVAR
 STD ,X++
 LDA <WVAR
 SUBA #4
 STA <WVAR
 LDA <LPARMX1
 ADDA #4
 STA <LPARMX1
* END WHILE
 BRA XSTLT

Most of this loop is similar to the 6809 code already shown. The while structure is similar to an if, but with a BRA at the bottom that returns to the loop's condition. The 16-bit operations are also new. The OR operation is done using two 8-bit operations, since there is no OR operation for the D register. Two plus signs are used instead of one when performing post-decrement addressing; this reflects the 16-bit nature of the transfer. The display buffer pointer X needs to move forward by two bytes (four pixels). Finally, the HLIN state is update as it was in the last drawing operation shown, but here we must account for four pixels, not just one.

After the code shown above, HLIN checks to see if a two-pixel draw can be performed. This is possible if WVAR is at least two. At this point in the code, the current X plot position is known to be even; we arrived here either because the current X position was even but no four-pixel draw was possible, or because it was even and then all possible four-pixel operations were performed. So, any remaining two-pixel (or three-pixel) block needing to be drawn will necessarily start with a single two-pixel character, and the decision logic at this point in the code is thus simplified.

Finally, HLIN ends with a check to see if a single pixel needs to be drawn after all four-pixel and two-pixel operations have completed. Again, the current X coordinate at this point is known the be even. For this final plot operation, this implies that if a single, final pixel is drawn, it will be in the left half of the graphics character, i.e. will involve graphics character WHICHLVAR.

A Symmetrical Algorithm

The work of B. Wyvil (who drew inspiration from the early work of X. Wu) deserves mention here, as it offers two major innovations compared to that of Bresenham [10]. First, Wyvil draws the line segment from both ends simultaneously, and thus cuts in half the number of outer loop iterations, and simplifies the necessary calculations. Second, each iteration of Wyvil's loop determines and plots the next three pixels on each side of the midpoint (for a total of six per iteration).

The concept of drawing a line segment symmetrically from its two endpoints is definitely an attractive one. Superficially, it promises to cut the CPU’s calculations in half. The chief drawback to this technique is the existence of real line segments whose correct approximations are not naturally symmetrical.

Wyvil works around this obstacle neatly, but his approach to the symmetry problem is tightly coupled to the three-pixel plot logic he developed. And this logic is incompatible with the scheme described here for exploiting burst mode. The idea of a contiguous row or column, which can be drawn in one operation using optimized primitives, is central to my work.

Wyvil’s algorithm does not make use of horizontally or vertically contiguous pixel groups; rather, it always calculates three pixels at a time. If two or more of these pixels happen to be contiguous in the same dimension (they constitute a pixel group), then in that case it is possible that an optimized version of Wyvil’s algorithm could exploit burst mode. But the algorithm has no ability to handle pixel groups of cardinality greater than three in an optimal fashion.

Conclusion

Bresenham’s work leaves open the question of how to exploit burst mode. There is a niche for algorithms dedicated to the fast line-drawing task on burst mode output devices, and algorithms 2 and 3 above attempt to fill this niche. 

Like algorithms 2 and 3 presented in this work, Wyvil’s algorithm does, in some sense, plan to plot more than one pixel at once. Ultimately, though, it does not exploit the multi-pixel burst mode most prevalent on modern hardware.

One way to build an algorithm that does exploit burst mode is to work with primitives like HLIN and VLIN. This is a realistic model of many raster display devices.  

Finally, one must allow for the possibility that Bresenham's definition of the set of segments to be modeled may not always be ideal. As raster devices improve in quality and achieve lower dot pitch numbers, the error exhibited at the endpoints of the Bresenham approximation will only grow larger. Modeling segments ending at pixel corners (vs. pixel centers) corrects this issue, and also offers increased simplicity and apparent resolution.

History 

This is the third major version of the article. In each version, the clarity and completeness of some of the explanatory passages was improved.  Some corrections to spelling and grammar have also been made. The third version added a CLEAR statement to the CoCo BASIC code, which is necessary for technical correctness. Otherwise, the code has remained unchanged. 

Footnotes 

1. All three algorithms presented assume the existence of system-specific graphical routines. The following code fragment is provided to allow the diligent reader to actually run the algorithms on any system. It should be inserted into the source text above the algorithm to be run. Rather than displaying graphics, these routines output the correct outcome in self-explanatory English:

#include <stdio.h> 
 
void plot(int x, int y, int c){ 
 printf( 
  “Pixel of color %d at (%d,%d)\n”, 
   c,x,y 
 );  
} 
   
void HLIN(int x, int y, int w, int c){ 
 printf( 
  “Pixel row of width %d at (%d,%d) of color %d\n”, 
    w,x,y,c 
 ); 
} 
 
void VLIN(int x, int y, int w, int c){ 
 printf( 
  “Pixel col. of height %d at (%d,%d) of color %d\n”, 
   w,x,y,c 
 ); 
} 

2. The topmost, leftmost pixel is designated (0,0) for all purposes in this article.

3. The CoCo can be placed into high-speed mode at any time by running the command POKE 65495,0. This doubles the clock speed at which the 6809 runs. This does not make much of a difference in performance for the brief demonstration program described here.

4. The reader should consider the possibility that using U in this way could disrupt the code calling into the line drawing routine. It did not seem to disrupt the CoCo BASIC environment.

References

  1. J. E. Bresenham, Algorithm for Computer Control of a Digital Plotter, IBM Systems Journal 4(1), (1965) 25-30.
  2. X. Wu, An efficient antialiasing technique, Computer Graphics 25(4), (1991) 143–152.
  3. E. Angel, Interactive Computer Graphics: A Top-Down Approach with OpenGL (Boston, Addison-Wesley, 2000). 
  4. D. F. Rogers, Procedural Elements for Computer Graphics (New York, McGraw-Hill, 1985).
  5. F. S. Hill, Computer Graphics (New York, MacMillan, 1990).
  6. K. R. Irvine, Assembly Language Programming for the IBM PC, 2nd Edition (New York, MacMillan, 1993).
  7. Anonymous, Color Computer 3 Extended Basic (Fort Worth, Tandy Corporation, 1986).
  8. Anonymous, MC6809 / MC6809E Microprocessor Programming Manual (Schaumburg, Motorola Inc., 1981).
  9. T. Ahrens, J. Brown, and H. Scales, What's Inside Radio Shack's Color Computer?, in: BYTE 6 (3), (1981).
  10. B. Wyvil, Symmetric double step line algorithm, in: A.S. Glassner, ed., Graphics Gems, ed. (San Diego, AP Professional Press, 1993) 101-104.

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)