Saturday, February 14, 2015

Fast, Windowed 3D Graphics on the Tandy/Radio Shack Color Computer

 
Figure 1: Composite of four instances of the wireframe pyramid demo running atop the startup (BASIC) shell. The pyramid rotates about the Y axis and is rendered at approximately 45 frames-per-second.
 
 
 
Figure 2: The wireframe cube demo after completion. The user's Microsoft BASIC session has resumed.
 

Introduction       

Superficially, the Tandy/Radio Shack Color Computer is not a device whose graphical rendering capabilities inspire much awe. Even when the original "CoCo" was released, 894 kilohertz was not an impressive clock speed. Its competitors almost universally offered a speed rating of at least 1 megahertz. The auxiliary display hardware surrounding the CoCo's CPU is also comparatively simple. Contemporary offerings from Commodore, Texas Instruments, and Apple had higher resolution and better color depth.
 
Unlike the Commodore 64 or the TI-99/4A, the CoCo's display device uses a simple frame buffer. The CoCo's rendering hardware offers no support for anything like a sprite, for example, and rendering activities require the CPU to move values into the frame buffer without assistance from any support device. Initial versions of the CoCo also had a calculator-like keyboard that made serious usage difficult.
 
Ultimately, though, there are some aspects of the CoCo's design that make it a very appealing computing device, and one that I find myself still using over 30 years after its initial release, and using instead of my TI-99/4A. Here, I take up the challenge of constructing 3D wireframe applications for the Color Computer. The demonstrations provided run on all versions of the CoCo, even the most primitive, so long as at least 16KB of RAM is installed.
 
Each of the two demo programs discussed here renders a rotating 3D solid, projected using vanishing point perspective. The second of these programs makes extensive use of assembly language, and runs at approximately 45 frames-per-second. Its 3D rendering occupies a window which takes up most of the screen, but leaves the rest of it undisturbed. The solid used is a pyramid with a square base. It is rendered in any of 8 colors against a black background. 
 
During the execution of this program, the Color BASIC session from which the program is executed remains visible and undisturbed. It resumes when the user exits the demo by pressing the "Reset" button on the back of the CoCo. This provides for a high level of integration between text and graphics, and foreshadows some windowing techniques that are typical of much more advanced hardware.  
 
The screen resolution for the demo is 64x32 pixels / 32x16 characters. This is the only CoCo mode in which all nine colors (8 foreground colors plus a mandatory black background) are available. The CoCo 3D demo was prototyped using HTML5 Canvas, and this code is also discussed below. Like the CoCo demo, it uses a 64x32 resolution; this results in a very small HTML5 display compared to the ultimate CoCo output.
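
The rotation and vanishing-point projection described above can be sketched in a few lines. The following Python is a minimal illustration, not the article's HTML5 prototype or its CoCo code; the model coordinates, `viewer_distance`, and `scale` are hypothetical values chosen only so that the pyramid stays inside the 64x32 grid.

```python
import math

WIDTH, HEIGHT = 64, 32  # Semigraphics 4 pixel grid used by the demos

def rotate_y(point, angle):
    """Rotate an (x, y, z) point about the Y axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (x * c + z * s, y, -x * s + z * c)

def project(point, viewer_distance=4.0, scale=10.0):
    """Vanishing-point projection onto the 64x32 grid.

    viewer_distance and scale are illustrative assumptions, not values
    taken from the demos.
    """
    x, y, z = point
    # Points farther from the viewer (larger z) shrink toward the centre.
    factor = viewer_distance / (viewer_distance + z)
    px = WIDTH / 2 + x * factor * scale
    py = HEIGHT / 2 - y * factor * scale  # screen Y grows downward
    return (round(px), round(py))

# Hypothetical model: apex followed by the four corners of a square base.
PYRAMID = [(0, 1, 0), (-1, -1, -1), (1, -1, -1), (1, -1, 1), (-1, -1, 1)]
EDGES = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (2, 3), (3, 4), (4, 1)]

def frame(angle):
    """Return the eight projected line segments for one animation frame."""
    pts = [project(rotate_y(p, angle)) for p in PYRAMID]
    return [(pts[a], pts[b]) for a, b in EDGES]

if __name__ == "__main__":
    for segment in frame(math.radians(30)):
        print(segment)
```

Each frame reduces to eight pairs of integer endpoints, which is the form a line-drawing routine needs as input.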
 

Background    

Whether one could reasonably transform such techniques into a more complete application is a challenge for the expert; however, others have already undertaken similar efforts. Tandy gave us Dungeons of Daggorath, a well-received CoCo game built around animated 3D wireframe graphics, and one of the first consumer games to attempt to present a real time 3D simulation. So, the plausibility of a real-time 3D game on the CoCo has been established. 
 
Dungeons of Daggorath used a higher-resolution display mode than is used for this article, with only 2 colors and no hardware support for text rendering. Looking at Figure 2 above, though, it is not difficult to imagine how the "Semigraphics 4" mode employed in this article might enable all sorts of interesting visual effects, especially in an adventure game like Dungeons of Daggorath. The prospect of rendering things like dialog, score, and other textual messages on top of or in conjunction with 9-color 3D graphics (and doing so one or two characters at a time instead of pixel by pixel) is an intriguing one.
 
This prospect hints at the way text and graphics were eventually combined on future Motorola-based computers like the Amiga, Atari ST, and Macintosh. By that time, the seamless combination of text and graphics evident on the display reflected the fact that the computer was now fast enough to draw the entire display (text and all) as a raster bitmap. Some CoCo titles used similar raster GUIs; Tandy's Desktop Publisher was such a title.  
 
Later, the CoCo 3 ran a port of Microsoft Flight Simulator (SubLogic's Flight Simulator II), and this was a high-quality application. Its color depth was superior to most IBM-derived machines of the time.    
 
The techniques described here offer one key advantage over both Dungeons of Daggorath and Flight Simulator II: higher frame rate. A whole screen occupies just 512 bytes in Semigraphics 4 mode, and the CoCo can address these in 16-bit chunks. These factors combine to allow for a refresh rate, and a degree of realistic movement, that is just not possible in higher-resolution modes.
 
It should be noted that this article is a sequel to another of my articles, Beyond Bresenham: New Algorithms for Drawing Line Segments. There, the line-drawing algorithm used here is first laid out in general terms. Then, in the second half of the article, a CoCo implementation is described in depth.
 
Much of the work necessary to do the CoCo 3D wireframe modelling for this article was therefore already done. Readers who are mostly interested in fast CoCo line-drawing (or line drawing in general) are advised to start with my first article.     

The additions necessary to build a line-drawing routine into a 3D wireframe demo are several:
  • Page flipping; the line-drawing demo in my first article presented a static rendering, and thus drew directly into the display buffer. Here, such an approach would result in severe flicker. Instead, page flipping is used, and the attendant issues of memory allocation, timing, etc. are amply discussed below.
  • 3D rotation and projection operations; these were prototyped using an HTML5 application (specifically the HTML5 Canvas type) before being algebraically simplified and ported to the CoCo.
  • Window management infrastructure; methods for clearing a display page, and for clearing a designated graphics window, are described in this latest article. These operations represent potential performance bottlenecks; the CoCo hardware has no special facility to help clear the screen or draw rectangles. Rather, many consecutive write operations into the display buffer are necessary, and it is important that these be optimal.

Finally, note that the line-drawing routine from Beyond Bresenham: New Algorithms for Drawing Line Segments did not support the rendering of lines with slopes having a magnitude greater than one. That article was more focused on exploiting some burst mode pixel-drawing features of the CoCo, which are most useful for lines with a more horizontal slope. For this article, though, support for line segments of all types is included.
 

A Motorola Personal Computer   

If analyzed in greater detail, the CoCo design reveals a level of cohesiveness and openness that ultimately derives from its design's origin: perhaps more than any other microcomputer, the Color Computer reflects Motorola's vision for these devices, at one particular point in time. There are, of course, many microcomputers with Motorola CPUs, and these are mostly well-regarded. In the CoCo's case, though, the entire motherboard consists of a few highly-integrated Motorola chips, and the design of the overall device thus seems to have been largely anticipated inside of Motorola and then manufactured by Radio Shack with only the most basic of additions.
 
Tandy's role consisted of marketing the CoCo, but also of ensuring that it would have software, and an expansion path, and would continue to be supported. Radio Shack sold versions of the CoCo for about a decade, even after it had quickly become clear that IBM and Apple were the major platforms. This allowed for the development of a lot of software, and of peripherals. These are welcome additions to a design that was sound from the start.  
 
Figure 3: The author's Color Computer 2. This is a later production example, and its badging thus omits the "TRS-80" designation.
 

The 6809 Processor   

At the center of the design resides the 6809 CPU. Like the more popular 6502, this is a derivative of the original Motorola 6800. The 6809 really represents a third generation of the family, though. The 6502 was created as a quick rework of the 6800, by a company hoping to undercut Motorola's price for the 6800. The 6809, on the other hand, was developed within Motorola, and represented Motorola's well-organized attempt to build on the success of the 6800. To a great extent, it was designed by people who had seen - and competed with - the 6502. 
 
Motorola seems to have tried to address issues associated with developing software for the 6800 in its design for the 6809. The 6809 includes a set of features designed to facilitate the creation of position-independent code. Code created according to this model is very robust with respect to design changes, and variations in the target platform.
 
While the end result present in this article's source code archive is basically static in the way it allocates the CoCo's RAM, code was moved around frequently during development, in an effort to determine the optimal location for the display buffers, code and data segments, etc., and the position-independence provided by the 6809 helped with such chores. Some more information about this is given further below, in the "RAM Allocation" section. More so than any of its predecessors, I believe the 6809 to offer a well-documented and rational plan for software development. 
 
A more immediate benefit is paid by the 6809's many 16-bit features, which are basically absent on the 6800 and the 6502. The 6809 includes 16-bit arithmetic and move operations. Here, these move operations are used to actually draw and clear the display buffers, resulting in faster rendering. A single 16-bit move operation can draw up to 8 "Semigraphics 4" pixels, or two text characters. Beyond Bresenham: New Algorithms for Drawing Line Segments discusses the machinations necessary to fully exploit these modes of operation.
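
To make the "8 pixels per 16-bit move" figure concrete, here is a small Python model of a Semigraphics 4 page. It is a sketch under stated assumptions rather than code from the demos: the bit layout (bit 7 set, bits 6-4 for the color, bits 3-0 for the four quadrants, with bit 3 as upper-left) follows the usual description of the 6847, and the helper names are hypothetical.

```python
COLS, ROWS = 32, 16     # character cells per 512-byte screen page
EMPTY_CELL = 0x80       # 128 decimal: a black cell with no pixels lit

def blank_page():
    """Return a cleared 512-byte Semigraphics 4 page."""
    return bytearray([EMPTY_CELL]) * (COLS * ROWS)

def set_pixel(page, x, y, color):
    """Light pixel (x, y) on the 64x32 grid in one of 8 colors (0-7).

    Assumed layout: bit 3 = upper-left, bit 2 = upper-right,
    bit 1 = lower-left, bit 0 = lower-right quadrant of the cell.
    """
    cell = (y // 2) * COLS + (x // 2)
    bit = 8 >> ((y % 2) * 2 + (x % 2))
    lit = (page[cell] & 0x0F) | bit
    # All lit quadrants in a cell share one color, so the whole cell
    # takes on the most recently requested color.
    page[cell] = 0x80 | (color << 4) | lit

def word_at(page, cell):
    """Two adjacent cells (8 pixels) as one big-endian 16-bit value,
    the unit a single 16-bit store on the 6809 would write."""
    return (page[cell] << 8) | page[cell + 1]

if __name__ == "__main__":
    page = blank_page()
    set_pixel(page, 0, 0, 0)
    print(hex(word_at(page, 0)))
```

Because each byte covers a 2x2 block of pixels, a 16-bit value spans a 4x2 block, which is why horizontal runs benefit most from 16-bit stores.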
 
The 16-bit arithmetic instructions are used in various places as well. Several of the multiplications evident in the C code for algorithm 3 use the 16-bit MUL instruction. This multiplies two 8-bit accumulators, yielding a 16-bit result. There is also some 16-bit arithmetic involving pointers into video buffers.
 
Importantly, most of these operations are not inherently slower than their 8-bit counterparts. Typically,¹ only a single extra clock cycle is necessary, which seems mandatory given the need to multiplex 16 bits onto an 8-pin data bus. Just as the Motorola 68000 was a 32-bit CPU on a 16-bit bus, its compatriot the 6809 is really a 16-bit processor on an 8-bit bus, and this frees developers of software for these processors to think at a higher level of abstraction.
 

Color BASIC  

In the CoCo, this powerful and flexible programming model is combined with a ROM-resident implementation of Microsoft BASIC, branded as Color BASIC. Importantly, this language includes support for machine language routines. TI-99/4A BASIC lacks this feature, requiring the developer to acquire TI Extended BASIC (or an assembler package) to do serious development. 
 
Another practical development consideration is the ease with which the CoCo interfaces with peripherals. The CoCo has an RS-232C serial port, along with two analog ports. The latter were originally intended for joysticks, but are adaptable to a wide variety of control applications. The joystick ports and the serial port are both well-supported by routines in the CoCo ROM. These are essentially part of the ROM-resident BASIC, but can also be used by machine language routines and by programs written in other languages.  
 
The demo application presented here performs real number calculations using Color BASIC, and also uses BASIC to load its machine language routines into RAM. Color BASIC arrays are used to hold pre-calculated coordinate values. Taken as a whole, Color BASIC and 6809 machine language complement each other well. The latter is logical and powerful, but benefits from the richer set of data types provided by BASIC, and from its ROM-resident I/O facilities. Ultimately, BASIC ends up being a sort of scripting language for select drawing routines in the demo code provided with this article.
 

The 6847 and 6883   

The 6847 is a Motorola video display generator (VDG). Analyzed by itself, this is a fairly unexciting piece of hardware. It creates a simple frame buffer with support for some fairly limited color graphics modes. It has a maximum resolution of 256x192, and, in low resolution mode, can display 9 colors at once. The frame buffer abstraction is simple; the 6847 has no inherent concept of back buffers or any other sort of buffer other than the frame buffer itself.

In fact, the 6847 itself operates without reference to any larger machine address space. It expects to have access to a pool of RAM sufficient for the current display mode, and it expects this to be addressable from zero to 6143 decimal (for the most advanced display modes) on its 13-bit address bus.

The 6883 synchronous address multiplexer (SAM) exists largely to add support in areas where the VDG design is lacking. The 6883 takes care of rudiments like placing the VDG's address system into the context of the overall CoCo address space, arbitrating access to the memory bus, and generating all sorts of necessary timing signals. These signals connect variously to the CoCo's RAM and ROM chips, and the SAM is configured to properly distinguish between RAM and ROM as delineated in the CoCo memory map.

More excitingly, the way the SAM maps the VDG's framebuffer into the CPU's address space is very flexible. The SAM allows for the VDG's frame buffer to be quickly relocated throughout the entire address space of the CPU, opening the door for advanced techniques like fast scrolling, and page-flipping. The demo application developed here uses SAM page flipping to do flicker-free animation using the 6847 VDG.
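
As a rough illustration of how such a relocation is requested, the sketch below computes which SAM control addresses would be written to point the display at a given buffer. It assumes the commonly documented CoCo 1/2 arrangement, which is not spelled out in this article: seven display-offset bits F0-F6, each with a "clear" address (even) and a "set" address (odd) starting at $FFC6, and an offset measured in 512-byte units. The function names are hypothetical, and this is not the code of the demo's SHOW routine.

```python
SAM_DISPLAY_OFFSET_BASE = 0xFFC6  # assumed: F0 clear; F0 set is +1, F1 clear is +2, ...
PAGE_SIZE = 512                   # display offset granularity in bytes

def sam_offset_writes(buffer_address):
    """Return the SAM addresses to write (any value) so that the display
    starts at buffer_address. One address per offset bit F0..F6."""
    if buffer_address % PAGE_SIZE:
        raise ValueError("display buffers must start on a 512-byte boundary")
    offset = buffer_address // PAGE_SIZE
    writes = []
    for bit in range(7):
        clear_address = SAM_DISPLAY_OFFSET_BASE + 2 * bit
        # Writing the odd address sets the bit, the even address clears it.
        writes.append(clear_address + ((offset >> bit) & 1))
    return writes

def flip_writes(old_address, new_address):
    """Only the bits that differ between two buffers need to be rewritten."""
    old = sam_offset_writes(old_address)
    new = sam_offset_writes(new_address)
    return [n for o, n in zip(old, new) if o != n]

if __name__ == "__main__":
    # Default text page at $0400 versus a second page at $3A00.
    print([hex(a) for a in flip_writes(0x0400, 0x3A00)])
```

Under these assumptions a page flip costs only a handful of store instructions, independent of the size of the frame buffer.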
 

CoCo Fast ROM Access 

One final benefit of the SAM's programmable nature bears mention: it can be configured to take advantage of the inherently faster nature of the CoCo's ROM compared to the CoCo's RAM. 

The CoCo was developed during a time when RAM was expensive, sufficiently so that it largely dictated a device's cost. Microcomputers were largely categorized by their RAM size, e.g. the Commodore 64, and the "64KB" and "128KB" badges Radio Shack placed atop high-end Color Computers.
 
The SAM helped in this RAM competition by providing three different modes designed for varying types of memory. In one mode, the CPU operates at about 894 kilohertz. In another, the CPU operates at twice that speed, i.e. approximately 1,800 kilohertz. The latter mode requires faster memory chips to work correctly, in addition, of course, to a CPU rated for at least 1.8 megahertz. 
 
In the third SAM mode, the CPU operates at 894 kilohertz when RAM is being accessed, and at 1,800 kilohertz when retrieving data from ROM. The CoCo's ROM is much faster than its RAM (since it does not require a refresh cycle), and as a result this third mode is one that can be used on the CoCo.
 
When powered up, the SAM is configured such that the CoCo always runs at approximately 894 kilohertz.  A "high speed poke" (or equivalent operation) is required to enable the faster SAM mode (POKE 65495,0 on the CoCo / CoCo 2).

Wiring their system this way presumably allowed Tandy to leave open the possibility of using slower 6809s. Motorola produced versions of the CPU that were rated at 1.0 megahertz and 1.5 megahertz, and running either of these at 1.8 megahertz would represent overclocking. However, I am not aware of any evidence that anyone has experienced issues with using the faster SAM mode, or that Tandy ever used the slower 6809s in the CoCo.
 

Software Architecture

As provided by Radio Shack, any Color Computer offers the ability to write programs in Microsoft BASIC with 6809 machine language routines. The BASIC interpreter that loads at computer start facilitates the loading and execution of machine language subroutines.
 
Specifically, CoCo BASIC includes a DATA statement, which emits a series of numeric arguments directly into a BASIC program data segment. In this article's code, these numeric values are machine language for the 6809 CPU. These are read into a variable using the READ command, and then placed into memory at a designated location using the POKE command. Once this is complete, the rest of the BASIC program can invoke the resultant machine language routine using the EXEC command.
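
The DATA/READ/POKE/EXEC pattern just described can be illustrated by generating such a loader from a list of bytes. This Python sketch is only a model of the idea: the line numbering, the variable names `A` and `B`, and the decision to EXEC the load address itself are assumptions made for the example, not a description of the loaders shipped with the article.

```python
def basic_loader(code, load_address, first_line=10, step=10, per_line=8):
    """Build a small BASIC program that POKEs `code` into RAM starting
    at `load_address` and then transfers control to it with EXEC."""
    lines = []
    number = first_line
    last_address = load_address + len(code) - 1
    # READ pulls the next DATA value; POKE stores it at address A.
    lines.append(f"{number} FOR A={load_address} TO {last_address}:"
                 "READ B:POKE A,B:NEXT A")
    number += step
    for i in range(0, len(code), per_line):
        # DATA values are written in decimal.
        values = ",".join(str(b) for b in code[i:i + per_line])
        lines.append(f"{number} DATA {values}")
        number += step
    # Assumption for this sketch: the entry point is the load address.
    lines.append(f"{number} EXEC {load_address}")
    return lines

if __name__ == "__main__":
    # Two arbitrary bytes, loaded at hexadecimal 3C18.
    for line in basic_loader(bytes([0x1F, 0xB8]), 0x3C18):
        print(line)
```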
 
Pairing 6809 machine language with Microsoft BASIC yields an attractive overall result. It offers the full power of the 6809 CPU, but allows it to be scripted in the accessible BASIC language. However, the development of machine language subroutines is tedious if some additional steps are not taken. It is much easier to develop in 6809 assembly language than in machine language, and Motorola has released a PC-hosted assembler into the public domain that allows for this to happen.
 

Using the Code 

This Motorola AS11 assembler targets several 8-bit processors, including the 6800, 6809, and 68HC11. "AS9.exe" is the name of the actual executable used to build 6809 assembly language. On my development PC, the contents of the provided AS11 archive were unzipped to a folder named "C:\coco\as11."
 
The code provided with the article resides under a single top-level "coco" folder. This folder can be relocated as needed; all that is necessary is to use different cd commands in the Command Prompt sessions used in the build process described below, and adjust some C preprocessor macros discussed later in this section.
  
For this article, two assembly language source code files (and two complete demos) are provided. The demo with file names like "wirecube.asm," "wirecube.bas," etc., which I shall refer to as the "wirecube" demo, renders a full-screen wireframe cube rotating about the Y axis. The other demo, which uses file names like "pure.asm," "pure.bas," etc., is the windowed pyramid demo.
 
The "pure" aspect of this demo refers to the fact that its main animation loop contains no BASIC, only executable code written in 6809 assembly. The animation for this demo is therefore much faster than that of the "wirecube" demo.
 
The remainder of this section discusses the process of building the "wirecube" demo. To adapt the discussion below to the "pure" demo, all that is necessary is to replace the word "wirecube" with the word "pure" in all of the file paths given in the remainder of this section.
 
To assemble the provided assembly language code (e.g. "wirecube.asm") on a development PC configured in this way, a Command Prompt window must be opened. There, the cd command (cd\coco\as11) should be used to go to the correct folder where "AS9.exe" is located.
 
Then, "AS9.exe" can be invoked directly. This executable accepts a single parameter, which is the path to the input assembly language file. For this application, "wirecube.asm" is stored in the parent folder of the "c:\coco\as11" folder, i.e. in "c:\coco" itself. So, the necessary command was "as9.exe ..\wirecube.asm".
  
The output printed to stdout by this command contains the sequence of numbers that must ultimately be loaded using BASIC language DATA commands. Unfortunately, these are not in a very convenient format. They are interspersed with the overall assembly language listing, including mnemonics and addresses, and are also expressed in hexadecimal. (Microsoft BASIC's DATA statement accepts base 10 by default.)
  
Fortunately, in addition to the text written to stdout, the assembler also creates file "M.OUT." On the author's development computer, this file gets created in folder "C:\coco\as11." It is in Motorola SREC format, and contains the object code generated by the assembler, in a special hexadecimal format.
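
For readers unfamiliar with SREC, the sketch below decodes one data record. It is a minimal Python illustration and assumes the file uses S1 records (16-bit addresses), which is the usual form for 8-bit targets; it is not the C++ utility supplied with the article, and the function names are hypothetical.

```python
def parse_s1(record):
    """Decode one S1 record into (load_address, data_bytes).

    Layout: 'S1', byte count, 16-bit address, data, checksum, all in hex.
    The checksum is the ones' complement of the low byte of the sum of
    the count, address, and data bytes.
    """
    record = record.strip()
    if not record.startswith("S1"):
        raise ValueError("not an S1 data record")
    raw = bytes.fromhex(record[2:])
    count, body, checksum = raw[0], raw[1:-1], raw[-1]
    if count != len(raw) - 1:
        raise ValueError("byte count does not match record length")
    if (~(count + sum(body))) & 0xFF != checksum:
        raise ValueError("checksum mismatch")
    address = (body[0] << 8) | body[1]
    return address, body[2:]

def load_image(lines):
    """Collect every S1 record into an {address: byte} map; other
    record types (such as the terminating S9) are ignored."""
    image = {}
    for line in lines:
        if line.startswith("S1"):
            address, data = parse_s1(line)
            for i, value in enumerate(data):
                image[address + i] = value
    return image

if __name__ == "__main__":
    # A hand-built record: two arbitrary bytes at hexadecimal 3C18.
    print(parse_s1("S1053C181FB8CF"))
```

Once the bytes and their load addresses are recovered, emitting them as decimal DATA statements is straightforward.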
   
A utility that converts SREC format into a CoCo BASIC loader program is provided in the article's source code archive. There is one utility per demo project provided. The default folder location for this utility is "C:\coco\coco_obj2bas" and it consists of a single C++ source file. For the two demos provided with this article, these files are named "wirecube_coco_obj2bas.cpp" and "pure_coco_obj2bas.cpp." In both cases, the location and name of the input and output files are determined by preprocessor directives at the top of this C++ file:
  
#define INPUT_FILE "c:/coco/as11/M.OUT"
#define OUTPUT_FILE "c:/coco/as11/wirecube.bas" 

So, as provided, this utility program will accept input file "c:\coco\as11\m.out" (in SREC format) and output a CoCo BASIC program named "wirecube.bas." This program consists of DATA statements holding the contents of the SREC file, along with a loader loop that takes this data and loads it into RAM. This is done starting at address 3C18 (hex). On a 16KB CoCo, there is just enough room between location 3C18 and the end of RAM for the necessary assembly language routines.

Of course, "wirecube.bas" consists solely of the machine language object code (in the form of DATA statements) and the program to load this machine language code into RAM. This generated BASIC must be augmented by the application developer's own BASIC code. At a minimum, this will consist of an EXEC call to transfer execution to a machine language routine. Generally, the high-speed POKE already described (POKE 65495,0 on the CoCo / CoCo 2) should be added as well.
 
In each case, therefore, another BASIC file is provided, e.g. "wirecube.customized.bas." This is the file that must be typed into the CoCo and executed in order to actually view the demo.

The "pure" demo is larger in size than the "wirecube" demo. This necessitates some special steps during program entry. These steps are well-commented inside of "pure.customized.bas." In summary, the loader program must be entered into memory, and then executed using RUN. Then, the DATA statements containing the machine language portion of the demo must be removed from the program (freeing up their storage in RAM). The remainder of the BASIC program is then entered, and can be executed (using RUN) to run the demo.

Emulators  

A word about emulators is in order. During development, I used two emulators in addition to my actual CoCo 2: Jeff Vavasour's CoCo 2 emulator, and the Multi Emulator Super System ("MESS"). The latter is included in the archive provided with this article, and some very detailed instructions about using it are included in Beyond Bresenham: New Algorithms for Drawing Line Segments.

MESS was introduced as a development tool in order to test compatibility across a wide variety of CoCo devices. MESS can emulate a wide array of Color Computer variants, with varying memory amounts, different versions of BASIC, and so on.

MESS offers a menu item that allows BASIC code to be pasted into the CoCo emulator user's BASIC session. In fact, since MESS is not an MS-DOS program, but a true Windows program, this is easier; the special workarounds used to get text into a Vavasour emulator session are not necessary. One simply uses the "paste" option from the "edit" menu.

However, MESS also seems to only support the transfer of a limited amount of text at one time. In the author's experience, it is necessary to paste in lines 1-350 of "pure.customized.bas," for example, and then paste lines 360-2030. Then, the user must RUN the program and wait for its completion before proceeding. Attempting to paste the entire program will result in only part of line 360 getting entered into the CoCo emulator, followed by the termination of the paste operation. Attempting to paste after a RUN (e.g. the first of the two RUNs in "pure.customized.bas") can cause program corruption.

"Wirecube" Assembly Language Core  

The assembly language core used by the "wirecube" demo exposes three functions: LINE, FCLS, and SHOW. The first of these is the fast line-drawing function. The second is a page-clearing routine. The third is used to swap the display page.

Clearing a display page is a potentially expensive operation. There is no inherent hardware support for such an operation; rather, it is necessary for the CPU to simply write cleared values to the entire buffer. The fastest way to do this is to effect all of the necessary store operations in series, using 16-bit operations, and using no looping, since this would require comparison operations, counter variable modification, and such. These move operations should use direct page mode, which will reduce both execution time and the size of the necessary object code. There will be 256 move operations required to clear the entire screen (512 bytes per screen page, divided by 2 bytes per move operation).
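
A back-of-the-envelope comparison shows why the unrolled form is attractive. The cycle counts in this Python sketch are assumptions recalled from the 6809 instruction timing tables (5 cycles for a direct-page STD, and 8 + 4 + 3 for a hypothetical `STD ,X++` / `CMPX` / `BNE` loop); they should be checked against the processor documentation before being relied upon.

```python
CPU_HZ = 894_000        # default CoCo clock rate quoted in this article

# Assumed cycle counts (verify against the 6809 data sheet):
STD_DIRECT = 5          # STD <addr, direct page addressing
STD_INDEXED_INC2 = 8    # STD ,X++
CMPX_IMMEDIATE = 4      # CMPX #end
BNE = 3                 # BNE loop

def unrolled_cycles(nbytes):
    """One direct-page STD per 16-bit word, no loop overhead."""
    return (nbytes // 2) * STD_DIRECT

def looped_cycles(nbytes):
    """A conventional store/compare/branch loop, one word per pass."""
    return (nbytes // 2) * (STD_INDEXED_INC2 + CMPX_IMMEDIATE + BNE)

def milliseconds(cycles, hz=CPU_HZ):
    return 1000.0 * cycles / hz

if __name__ == "__main__":
    for name, fn in (("unrolled", unrolled_cycles), ("looped", looped_cycles)):
        cycles = fn(512)
        print(f"{name}: {cycles} cycles, {milliseconds(cycles):.2f} ms")
```

Under these assumptions, clearing a full 512-byte page takes roughly 1.4 ms unrolled versus roughly 4.3 ms looped, measured against a budget of about 22 ms per frame at 45 frames per second.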

It is this fastest approach that FCLS takes. However, to support applications that combine graphics with text, and in an effort to provide a faster alternative to clearing the whole screen, the FCLS function works in half screen pages. That is, it can clear the top of the default screen page located in addresses 400-4FF, or it can clear the bottom of this page located in addresses 500-5FF. If called twice, it can clear both, i.e. clear the whole display page. This opens up the possibility of windowed, mixed-mode text/graphics applications, where raster-style line graphics are combined with text drawn at the whole-character level.

FCLS can also clear the top and/or bottom half of the second display page, which happens to be located at locations 3A00-3BFF for the "wirecube" demo. FCLS therefore accepts a page parameter that ranges from 0 to 3, with values 0 and 1 meaning the top and bottom halves of display page zero, respectively, and values of 2 and 3 representing the top and bottom half of the other display page. In the actual "wirecube" demo, two calls to FCLS per frame are used, and the demo output uses the full screen.

Source code file "wirecube.asm" begins with code that is basically the same as the code from the last article. The first major deviation is seen in the declaration of memory storage for the machine language core's routines' parameters:
 
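*Parameters
PAGEPARM RMB $1     display page selector
HPARMX1 RMB $1      starting X, high byte
LPARMX1 RMB $1      starting X, low byte
PARMY1 RMB $1       starting Y
PARMX2 RMB $1       ending X
PARMY2 RMB $1       ending Y
COLPARM RMB $1      color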


As before, we have parameters related to start/end coordinates and color. (Also as before, the starting X coordinate is a 16-bit parameter, and this is done because the line-drawing code does 16-bit arithmetic with this value.)

Before all of these parameters, though, is a new page parameter. For calls to LINE and SHOW, this parameter can either be 0 (indicating the default display page used by the startup BASIC shell) or 1 (indicating another display page allocated specifically by the demo code).  Calls to FCLS use values of 0 through 3, as previously discussed.

The next mention of this page parameter comes at the top of the screen-clearing routine:
 
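FCLS    TFR DP,A        save the caller's DP register
        STA SAVEDPVAR
        LDA PAGEPARM    page parameter, 0 through 3
        CMPA #0
        BNE ZPHH
        LDA #4          0: top half of page 0 ($0400-$04FF)
        BRA CLD1N
ZPHH    CMPA #1
        BNE ZPHG
        LDA #5          1: bottom half of page 0 ($0500-$05FF)
        BRA CLD1N
ZPHG    CMPA #2
        BNE ZPHF
        LDA #$3A        2: top half of page 1 ($3A00-$3AFF)
        BRA CLD1N
ZPHF    LDA #$3B        3: bottom half of page 1 ($3B00-$3BFF)
CLD1N   TFR A,DP        DP now selects the 256-byte half page
        LDD #$8080      two empty black cells
        STD <0          first of 128 direct-page stores
        STD <2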
 
This code begins by saving the DP register. The "wirecube" assembly code is a general graphics library designed for many callers, so this is appropriate. (The application-specific code in "pure.asm" can assume a specific caller, and therefore a specific value for the DP register, and does not save the register's value in this way.)

Then, the code shown above continues by loading the DP register to hold a display page value. A decision lattice that will load this register with one of four possible values (4 for page 0 top, 5 for page 0 bottom, 3A for page 1 top, or 3B for page 1 bottom) is then in evidence.

Finally, we see the heart of the code used to actually clear the requisite area of RAM. The 16-bit "D" register is loaded with two byte values of 128 (decimal), which equates to a black square with no pixels set. Then, D is written out to the selected 256-byte page of RAM (half display page) using 128 16-bit store operations. The first two of these operations are evident in the snippet shown above; all 128 are present in "wirecube.asm."
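
The behavior of FCLS as described above can be modeled in a few lines of Python. This is a sketch for checking one's understanding of the page parameter, not a translation of "wirecube.asm"; the `ram` bytearray standing in for the CoCo's 16KB of memory is an assumption of the example.

```python
# Page parameter to DP value, as in the decision chain shown above.
DP_FOR_PAGE = {0: 0x04, 1: 0x05, 2: 0x3A}
DP_FALLTHROUGH = 0x3B      # the final case: anything that is not 0, 1 or 2
EMPTY_CELLS = 0x8080       # two black cells with no pixels set

def fcls(ram, page_parm):
    """Clear one 256-byte half page and return its base address."""
    dp = DP_FOR_PAGE.get(page_parm, DP_FALLTHROUGH)
    base = dp << 8                       # DP supplies the high address byte
    for offset in range(0, 256, 2):      # 128 16-bit stores
        ram[base + offset] = EMPTY_CELLS >> 8
        ram[base + offset + 1] = EMPTY_CELLS & 0xFF
    return base

if __name__ == "__main__":
    ram = bytearray(0x4000)              # a 16KB machine
    for parm in range(4):
        print(parm, hex(fcls(ram, parm)))
```

Calling the model with parameters 0 and 1 clears the whole default page, and 2 and 3 clear the whole second page, matching the two-calls-per-frame usage in the "wirecube" demo.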

All of this executable code begins after a series of variable declarations. These storage locations are used by the assembly language code, e.g. for the elaboration of the fast line-drawing routine. Before these declarations, a series of parameter declarations is present. These parameter declarations begin at location 3C00 for the "wirecube" demo. The actual entry points of the LINE, FCLS, and SHOW operations are therefore all at locations greater than 3C00.

The second display page occupies the 512 bytes immediately before the parameters, at addresses 3A00 through 3BFF. The highest address available to BASIC is thus 39FF, or 14,847 decimal. This is the value passed to the BASIC CLEAR command early in "wirecube.customized.bas."

"Pure" Assembly Language Core  

The assembly language core used by the "pure" demo begins at location 3700 (hex). The highest location usable by Microsoft BASIC is 36FF, or 14,079 decimal, and this is the value in the BASIC CLEAR statement for this demo.

Less memory is left for BASIC in the "pure" demo. This relates to two things. First, there is more assembly language than there is for the "wirecube" demo, since the main animation loop is written in assembly language in the "pure" demo. Second, a 512-byte coordinate list is built for the "pure" demo. This data structure contains each coordinate used for the demo's line segments, in order.  It is loaded by the BASIC code, using the BASIC floating point and trigonometric operations (but, at the same time, must be excluded from the area used by the BASIC runtime).

This 512-byte coordinate array begins at location 3C00 and ends at location 3DFF. Then, at location 3E00, another 512-byte data structure begins. This is the second display page; note that it is located at a different address for this second demo.

The API exposed by the "pure" demo to BASIC is also much different. The only assembly language routine called directly by BASIC is MAIN3D, which is the assembly-language-based animation loop.

The display-clearing logic used by the "pure" demo is also very different from that of the "wirecube" demo. Rather than clearing the first or second half of the whole "Semigraphics 4" page, an application-specific window is cleared. This lies near the center of the display, and occupies the majority of the display, while leaving a large, peripheral area unaffected.

In contrast to the "wirecube" FCLS function, which sets the DP register once and then clears a single 256-byte half page, the "pure" demo's clear function must change the DP register partway through each call. This is necessary because the demo's window straddles the page boundary at the middle of the display.

Note that the "pure" demo window appears atop the existing BASIC session. This requires the contents of the session display to be copied to the second display page before entering the main animation loop. Otherwise, the demo would not end up writing anything to certain areas of the second display page at all, and a random display would get shown in the periphery of every other frame. Because this copy operation is outside the main animation loop, it can be written in BASIC without incurring a performance penalty.
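
The following Python model illustrates both points: the window spans two 256-byte pages, and the periphery of the second display page must be seeded from the first. The window bounds here are hypothetical (the real ones are defined in "pure.asm"), and the `ram` bytearray is a stand-in for the CoCo's memory.

```python
PAGE0, PAGE1 = 0x0400, 0x3E00   # display pages used by the "pure" demo
COLS, PAGE_BYTES = 32, 512
EMPTY_CELL = 0x80

# Hypothetical window bounds in character cells (right/bottom exclusive).
WIN_LEFT, WIN_TOP, WIN_RIGHT, WIN_BOTTOM = 4, 2, 28, 14

def copy_text_page(ram):
    """Seed the second display page with the current BASIC screen."""
    ram[PAGE1:PAGE1 + PAGE_BYTES] = ram[PAGE0:PAGE0 + PAGE_BYTES]

def clear_window(ram, base):
    """Blank only the window's cells, leaving the periphery alone."""
    width = WIN_RIGHT - WIN_LEFT
    for row in range(WIN_TOP, WIN_BOTTOM):
        start = base + row * COLS + WIN_LEFT
        ram[start:start + width] = bytes([EMPTY_CELL]) * width

def dp_values_needed(base):
    """High address bytes (DP values) touched while clearing the window."""
    return sorted({(base + row * COLS + WIN_LEFT) >> 8
                   for row in range(WIN_TOP, WIN_BOTTOM)})

if __name__ == "__main__":
    ram = bytearray(0x4000)
    copy_text_page(ram)
    clear_window(ram, PAGE1)
    print([hex(dp) for dp in dp_values_needed(PAGE0)])
```

Any window that covers rows on both sides of the eighth character row needs two DP values, which is why a single DP setting per call is not enough.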

RAM Allocation 

Each demo uses its own RAM layout because I could not make the programs fit into memory otherwise. The Color BASIC program occupies space from address 0 to the address in the CLEAR statement. If the CLEAR statement from the "pure" demo were used for the "wirecube" demo, it would not leave enough room for the "wirecube" BASIC program.

Similarly, if the CLEAR statement from the "wirecube" demo were used for the "pure" demo, there would not be enough room at the high end of memory for all of the assembly language code and data needing to be loaded into RAM. So, the more assembly-based "pure" demo and the more BASIC-based "wirecube" demo must lay out their code and data differently.

Application Development 

The code in "wirecube.asm" can serve as a library for the development of wireframe-based or mixed-mode text/wireframe applications written in BASIC or assembly language. The half-screen clear routines, the SHOW and line-drawing functions, etc., provide a good, general basis for a wide variety of applications. Mixed-mode applications benefit from being able to render text to the display one character at a time.

If a client application is written in BASIC, some modifications are in order if the program will require a minimum RAM size greater than the 16KB required by the article demo programs. Specifically, the second page of display RAM will need to be moved higher, to allow BASIC programs to grow beyond the 16KB limit. CLEAR must provide an upper bound to the BASIC runtime, beyond which it will not use RAM; and if the second display page is left at location 3A00, BASIC will have to stop at that address and will not continue to the addresses located after the display page.

Relocating the second display page requires some logic in the SHOW, LINE, and FCLS functions to be changed. Reference [1] gives a good overview of the SAM programming required, which will differ depending on where the second display page ends up residing in a given application.

Similarly, the assembly language code and data in "wirecube.asm" will need to be relocated. This is not too difficult, though. The ORG directive needs to be changed, and the new parameter addresses will need to be calculated and incorporated into the client calls. The assembly language code itself is designed to be position-independent.

Assembly language applications do not suffer from these issues, which relate to BASIC's CLEAR statement. Assembly language applications can locate their data freely around the API implementation code and data, and this will typically be adequate. Such applications can incorporate "wirecube.asm" into their own code without modification.

Rendering 

The code dealing with 3D projection and transformation of the demo solids was prototyped using HTML5 / JavaScript before the CoCo implementation was developed.  HTML5 provides a 2D Canvas control, but no inherent 3D support.

The HTML5 Canvas is therefore fundamentally similar to the CoCo in its capabilities. Like the CoCo implementation, the HTML5 implementation contains its own projection logic. Both implementations use a left-handed coordinate system, with the origin located at the center of the monitor.



Figure 4: The HTML5 "pyramid" demo in action; the zoom level has been maximized because of the small size (64x32) of the area being rendered to.

Coordinates are not specified in pixels in this ad hoc 3D coordinate system. Rather, 1.0 unit is defined as equal to 32 display pixels. The "Z" dimension extends through the monitor, with the camera pointing in the positive direction. The "X" and "Y" dimensions are laid out the same as they are in the pixel-based raster coordinate system of the display, e.g. positive "Y" movement is from the top of the monitor to its bottom. Counterclockwise rotation about an axis is treated as positive; clockwise rotation is negative.

To emulate the CoCo as quickly as possible, the HTML5 code renders into a 64x32 pixel area. This results in a somewhat small display in a typical browser.

The JavaScript rendering code used by the HTML5 implementation begins by establishing starting (i.e. pre-rotation) coordinates in the 3D coordinate system. This is the code from the pyramid HTML5 demo, "pyramid.html":

  var worldxa=0.5; var worldya=0.5; var worldza=0.5;
  var worldxb=0.5; var worldyb=0.5; var worldzb=-0.5;
  var worldxc=-0.5; var worldyc=0.5; var worldzc=-0.5;
  var worldxd=-0.5; var worldyd=0.5; var worldzd=0.5;
  var worldxe=0.0; var worldye=-0.5; var worldze=0.0;


Above, there is a pyramid vertex at (worldxa,worldya,worldza), another at (worldxb,worldyb,worldzb), and so on. The top vertex is at (0.0,-0.5,0.0). The pyramid base is a square with a side width of 1.0, and the height of the pyramid is also 1.0 (from Y = 0.5 at the base to Y = -0.5 at the top), i.e. 32 pixels. The JavaScript continues as shown below:

  //Rotate a,b,c,d and e about the Y axis  
  var eworldxa=worldxa*Math.cos(theta)+worldza*Math.sin(theta);
  var eworldza=-worldxa*Math.sin(theta)+worldza*Math.cos(theta);
 
  var eworldxb=worldxb*Math.cos(theta)+worldzb*Math.sin(theta);
  var eworldzb=-worldxb*Math.sin(theta)+worldzb*Math.cos(theta);
 
  var eworldxc=worldxc*Math.cos(theta)+worldzc*Math.sin(theta);
  var eworldzc=-worldxc*Math.sin(theta)+worldzc*Math.cos(theta);
 
  var eworldxd=worldxd*Math.cos(theta)+worldzd*Math.sin(theta);
  var eworldzd=-worldxd*Math.sin(theta)+worldzd*Math.cos(theta);
 
  //NO need to rotate "E"  

  var eworldxe=worldxe;
  var eworldze=worldze;

  //Rotation about Y axis doesn't change Y coordinate  

  var eworldya=worldya;
  var eworldyb=worldyb;
  var eworldyc=worldyc;
  var eworldyd=worldyd;
  var eworldye=worldye;



The first eight statements above calculate the rotated position of the pyramid base in the 3D coordinate space. The sine / cosine based formulas used to effect the actual rotation can be found in any graphics text, e.g. source [2] or source [3].

The transformed coordinates have names like eworldxa, versus worldxa for the original coordinate. Note that the Y coordinates are not changed by the rotation operation, so there are calculations for the X and Z coordinates only. Variable eworldya thus ends up being a simple copy of worldya. Similarly, the top of the pyramid's coordinates are unchanged by the rotation operation.
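The per-vertex statements above all apply the same two formulas, so they can be folded into one function. The following is a minimal sketch, not code from "pyramid.html"; rotateY and the vertices array are hypothetical names introduced here for illustration.

```javascript
// Rotate a point about the Y axis using the same formulas as the listing.
// rotateY is a hypothetical helper, not a function from "pyramid.html".
function rotateY(p, theta) {
  const c = Math.cos(theta);
  const s = Math.sin(theta);
  return {
    x: p.x * c + p.z * s,
    y: p.y,                // unchanged by rotation about Y
    z: -p.x * s + p.z * c
  };
}

// The five pyramid vertices from the listing above.
const vertices = [
  { x: 0.5, y: 0.5, z: 0.5 },   // a
  { x: 0.5, y: 0.5, z: -0.5 },  // b
  { x: -0.5, y: 0.5, z: -0.5 }, // c
  { x: -0.5, y: 0.5, z: 0.5 },  // d
  { x: 0.0, y: -0.5, z: 0.0 }   // e (top vertex, on the axis of rotation)
];

// Usage: rotate every vertex by 30 degrees.
const rotated = vertices.map(function (v) {
  return rotateY(v, Math.PI / 6);
});
```

Because vertex "e" has X and Z both equal to zero, the function returns it unchanged, which is why the original listing simply copies its coordinates.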

Finally, the JavaScript code must project these vertices, which still reside in the virtual 3D coordinate space, into the real 2D coordinate system of the HTML5 Canvas. Most of the necessary work is handled by the next snippet of code shown below:

  var axprime=eworldxa*camdist/(camdist+eworldza);
  var ayprime=eworldya*camdist/(camdist+eworldza);
  var bxprime=eworldxb*camdist/(camdist+eworldzb);
  var byprime=eworldyb*camdist/(camdist+eworldzb);
  var cxprime=eworldxc*camdist/(camdist+eworldzc);
  var cyprime=eworldyc*camdist/(camdist+eworldzc);
  var dxprime=eworldxd*camdist/(camdist+eworldzd);
  var dyprime=eworldyd*camdist/(camdist+eworldzd);
  var exprime=eworldxe*camdist/(camdist+eworldze);
  var eyprime=eworldye*camdist/(camdist+eworldze);


The calculations above are very typical examples of vanishing point perspective. In essence, each X and Y coordinate is reduced in proportion to its distance in the Z dimension. This effect is attenuated by camdist, the camera distance, in that camdist is included in both the numerator and the denominator of the perspective ratio. 
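The ten statements above share a single perspective ratio per vertex. As a rough sketch (project is a hypothetical helper, and the camdist value of 2.0 is chosen here only for illustration), the same calculation can be written once and applied to any point:

```javascript
// Perspective projection using the ratio camdist / (camdist + z).
// project is a hypothetical helper mirroring the listing above.
function project(p, camdist) {
  const scale = camdist / (camdist + p.z);
  return { x: p.x * scale, y: p.y * scale };
}

// Usage: the same X and Y at two depths. camdist = 2.0 is an assumed value.
const near = project({ x: 0.5, y: 0.5, z: -0.5 }, 2.0); // scale = 2 / 1.5
const far = project({ x: 0.5, y: 0.5, z: 0.5 }, 2.0);   // scale = 2 / 2.5
```

The nearer point is drawn farther from the screen center than the distant one. As camdist grows, the ratio approaches 1 for every point, so the perspective effect weakens.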

The coordinates just calculated, in axprime, ayprime, etc., are now 2D, but are still expressed using the unit system of the original 3D coordinates. They must be translated from this system (which also has its origin at the monitor center) to the pixel-based system actually used by the hardware (which has its origin at the display's top left corner). So, the line-drawing code at the end of the rendering contains some additional translation and scaling:

  ctx.beginPath();
  ctx.moveTo(axprime*32.0+32.0,ayprime*16.0+16.0);
  ctx.lineTo(bxprime*32.0+32.0,byprime*16.0+16.0);
  ctx.closePath();
  ctx.lineJoin = "round";
  ctx.strokeStyle = "black";
  ctx.lineWidth = 2;
  ctx.stroke();


The code shown above renders a single pyramid edge: the base edge running from vertex A to vertex B.
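The remaining edges follow the same pattern. One way to express all eight at once is sketched below; toPixel, edges, and drawEdges are hypothetical names, and the styling calls from the listing above are omitted for brevity.

```javascript
// Map projected coordinates (origin at screen center) to pixel coordinates
// (origin at top left) for the 64x32 target. toPixel is a hypothetical helper.
function toPixel(p) {
  return { x: p.x * 32.0 + 32.0, y: p.y * 16.0 + 16.0 };
}

// The eight pyramid edges as pairs of vertex names: four base edges, then
// four edges running up to the top vertex "e".
const edges = [
  ["a", "b"], ["b", "c"], ["c", "d"], ["d", "a"],
  ["a", "e"], ["b", "e"], ["c", "e"], ["d", "e"]
];

// Draw every edge. ctx is any object offering the Canvas 2D path methods;
// projected maps vertex names to projected {x, y} points.
function drawEdges(ctx, projected) {
  ctx.beginPath();
  edges.forEach(function (edge) {
    const from = toPixel(projected[edge[0]]);
    const to = toPixel(projected[edge[1]]);
    ctx.moveTo(from.x, from.y);
    ctx.lineTo(to.x, to.y);
  });
  ctx.stroke();
}
```

For example, toPixel({ x: 0.0, y: -0.5 }) yields pixel position (32, 8), which is where the top vertex lands when it lies at Z = 0.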

Before moving this implementation to the CoCo, some algebraic simplification was performed. Expressions like Math.sin(theta) are repeated several times for each frame in the JavaScript shown above, and this is not acceptable in the CoCo implementation, where resources are much scarcer.

While these algebraic simplifications were straightforward in their execution, the result is a series of statements that no longer resemble those discussed above. The algebraic simplifications were applied to the JavaScript before the CoCo implementation was begun; the results are in files "pyramid.fast.html" and "wirecube.fast.html."

The CoCo implementation, fortunately, can be in BASIC. In the "pure" demo, all of the coordinates used in the demo are pre-calculated, so use of BASIC does not reduce the frame rate. The data structure in which these coordinates are stored is a 512-byte array (512 = 16 frames * 8 lines * 2 endpoints per line * 2 dimensions) at address 3C00. The main animation loop pulls endpoint coordinates from this array in sequence, from start to finish, in an infinite loop.
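The size arithmetic can be checked with a small address calculation. This is a sketch under stated assumptions: endpointAddress is a hypothetical helper, and the field order (frame, then line, then endpoint, with X stored before Y) is inferred from the sequential layout rather than taken from "pure.asm".

```javascript
const BASE = 0x3C00; // start of the coordinate array
const LINES = 8;     // lines per frame

// Address of one coordinate byte within the 512-byte array.
// frame: 0-15, line: 0-7, endpoint: 0-1, dim: 0 for X, 1 for Y.
// endpointAddress is a hypothetical helper; the field order is an assumption.
function endpointAddress(frame, line, endpoint, dim) {
  return BASE + ((frame * LINES + line) * 2 + endpoint) * 2 + dim;
}

// Usage: the first and last bytes of the array.
const firstByte = endpointAddress(0, 0, 0, 0);
const lastByte = endpointAddress(15, 7, 1, 1);
```

The first byte falls at 3C00 and the last at 3DFF, matching the 512-byte extent of the array.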

The BASIC code that loads this vertex list begins as shown below:

2100 FOR TINDX=0 TO 15
2105 THETA=TINDX/((16+1)/(6.28318531/4))
2110 MCOST=0.5*SIN(THETA+1.5708)
2120 MSINT=0.5*SIN(THETA)
2125 EXTRACT=(MCOST+MSINT)*64.0
2126 NXTRACT=(MCOST-MSINT)*64.0
2127 ACRIT=(2.0-MSINT+MCOST) : BCRIT=(2.0-MSINT-MCOST)
2128 CCRIT=(2.0+MSINT-MCOST) : DCRIT=(2.0+MSINT+MCOST)
2130 PA(TINDX)=INT(((EXTRACT)/ACRIT)+32.0)
2140 QA(TINDX)=INT((16.0/ACRIT)+16.0)
2150 PB(TINDX)=INT(((NXTRACT)/BCRIT)+32.0)
2160 QB(TINDX)=INT((16.0/BCRIT)+16.0)
2170 PC(TINDX)=INT(((-EXTRACT)/CCRIT)+32.0)
2180 QC(TINDX)=INT((16.0/CCRIT)+16.0)
2190 PD(TINDX)=INT(((-NXTRACT)/DCRIT)+32.0)
2200 QD(TINDX)=INT((16.0/DCRIT)+16.0)
2300 NEXT TINDX


Above, eight 16-element coordinate arrays are constructed. These hold the coordinates for the base vertices of the pyramid at 16 different rotation angles, representing 90 degrees of pyramid rotation. These arrays are allocated in addition to the 512-byte endpoint list ultimately used, and therefore do represent a concession to legibility over optimality. This design also allows for more code to be held in common with "wirecube.customized.bas." The actual calculations are modified (i.e. post-simplification) versions of the JavaScript calculations explored earlier in this section.
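To see how the BASIC relates to the earlier JavaScript, the loop can be ported back line for line and compared against the unsimplified formulas. The sketch below is not taken from "pyramid.fast.html"; buildTables and referenceA are hypothetical names. The camera distance of 2.0 used in referenceA is inferred from the constants in the BASIC (the 2.0 terms and the factor 64.0 = 2.0 * 32.0), not stated by the original code.

```javascript
// Hypothetical JavaScript port of BASIC lines 2100-2300.
function buildTables() {
  const t = { PA: [], QA: [], PB: [], QB: [], PC: [], QC: [], PD: [], QD: [] };
  for (let tindx = 0; tindx <= 15; tindx++) {
    const theta = tindx / ((16 + 1) / (6.28318531 / 4));
    const mcost = 0.5 * Math.sin(theta + 1.5708); // 0.5 * cos(theta)
    const msint = 0.5 * Math.sin(theta);
    const extract = (mcost + msint) * 64.0;
    const nxtract = (mcost - msint) * 64.0;
    const acrit = 2.0 - msint + mcost;
    const bcrit = 2.0 - msint - mcost;
    const ccrit = 2.0 + msint - mcost;
    const dcrit = 2.0 + msint + mcost;
    // Math.floor plays the role of BASIC's INT.
    t.PA.push(Math.floor(extract / acrit + 32.0));
    t.QA.push(Math.floor(16.0 / acrit + 16.0));
    t.PB.push(Math.floor(nxtract / bcrit + 32.0));
    t.QB.push(Math.floor(16.0 / bcrit + 16.0));
    t.PC.push(Math.floor(-extract / ccrit + 32.0));
    t.QC.push(Math.floor(16.0 / ccrit + 16.0));
    t.PD.push(Math.floor(-nxtract / dcrit + 32.0));
    t.QD.push(Math.floor(16.0 / dcrit + 16.0));
  }
  return t;
}

// Unsimplified pixel position of vertex A (0.5, 0.5, 0.5), following the
// rotate / project / scale steps shown earlier. camdist = 2.0 is inferred.
function referenceA(theta) {
  const camdist = 2.0;
  const ex = 0.5 * Math.cos(theta) + 0.5 * Math.sin(theta);
  const ez = -0.5 * Math.sin(theta) + 0.5 * Math.cos(theta);
  return {
    x: ex * camdist / (camdist + ez) * 32.0 + 32.0,
    y: 0.5 * camdist / (camdist + ez) * 16.0 + 16.0
  };
}

const tables = buildTables();
```

For vertex A, the rotated X is MCOST+MSINT and the rotated Z is MCOST-MSINT, so EXTRACT/ACRIT+32.0 is the same quantity as axprime*32.0+32.0. The other three base vertices differ only in their signs.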

In the main animation loop, this 16-frame sequence is repeated ad infinitum. Because a square-based pyramid looks the same after each quarter turn, repeating 90 degrees of rotation creates the effect of full 360-degree rotation. In the next section of code, these array values are combined with constants for the pyramid's top vertex, which sits at screen coordinate (32,8) (values which were pre-calculated by the author and then coded as literals). These data loads are done in the proper order to create the line endpoint list:

3000 PT1R=15360
3010 FOR TINDX=0 TO 15
3011 POKE PT1R,PA(TINDX) : PT1R=PT1R+1
3012 POKE PT1R,QA(TINDX) : PT1R=PT1R+1
3013 POKE PT1R,PB(TINDX) : PT1R=PT1R+1
3014 POKE PT1R,QB(TINDX) : PT1R=PT1R+1
3015 POKE PT1R,PB(TINDX) : PT1R=PT1R+1
3016 POKE PT1R,QB(TINDX) : PT1R=PT1R+1
3017 POKE PT1R,PC(TINDX) : PT1R=PT1R+1
3018 POKE PT1R,QC(TINDX) : PT1R=PT1R+1
3019 POKE PT1R,PC(TINDX) : PT1R=PT1R+1
3020 POKE PT1R,QC(TINDX) : PT1R=PT1R+1
3021 POKE PT1R,PD(TINDX) : PT1R=PT1R+1
3022 POKE PT1R,QD(TINDX) : PT1R=PT1R+1
3023 POKE PT1R,PD(TINDX) : PT1R=PT1R+1
3024 POKE PT1R,QD(TINDX) : PT1R=PT1R+1
3025 POKE PT1R,PA(TINDX) : PT1R=PT1R+1
3026 POKE PT1R,QA(TINDX) : PT1R=PT1R+1
3027 POKE PT1R,PA(TINDX) : PT1R=PT1R+1
3028 POKE PT1R,QA(TINDX) : PT1R=PT1R+1
3029 POKE PT1R,32 : PT1R=PT1R+1
3030 POKE PT1R,8 : PT1R=PT1R+1
3031 POKE PT1R,PB(TINDX) : PT1R=PT1R+1
3032 POKE PT1R,QB(TINDX) : PT1R=PT1R+1
3033 POKE PT1R,32 : PT1R=PT1R+1
3034 POKE PT1R,8 : PT1R=PT1R+1
3035 POKE PT1R,PC(TINDX) : PT1R=PT1R+1
3036 POKE PT1R,QC(TINDX) : PT1R=PT1R+1
3037 POKE PT1R,32 : PT1R=PT1R+1
3038 POKE PT1R,8 : PT1R=PT1R+1
3039 POKE PT1R,PD(TINDX) : PT1R=PT1R+1
3040 POKE PT1R,QD(TINDX) : PT1R=PT1R+1
3041 POKE PT1R,32 : PT1R=PT1R+1
3042 POKE PT1R,8 : PT1R=PT1R+1
3043 NEXT TINDX


Above, note that 15,360 decimal is 3C00 hexadecimal, the starting location for the coordinate list. At the very start of the sequence shown above, variable PT1R is initialized to this value, and then it is incremented throughout the remaining code, with each addition to the array. Lines 3011, 3012, etc. are similar in effect to the C statements *pt1r++=pa[tindx];*pt1r++=qa[tindx];, etc. albeit not nearly as fast.

Each iteration of the FOR loop begins by loading the endpoints for the four edges of the base, then loads the endpoints necessary to draw the four slanted edges of the pyramid, each of which extends from a base vertex to (32,8).
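The same ordering can be expressed more compactly when arrays of points are available. The following sketch is a hypothetical JavaScript equivalent of BASIC lines 3000-3043, writing into a Uint8Array instead of POKEing RAM; buildEndpointList and the sample table values are illustrative only.

```javascript
// Build the 512-byte endpoint list from the eight per-frame tables.
// buildEndpointList is a hypothetical helper; tables must hold 16-element
// arrays named PA, QA, PB, QB, PC, QC, PD and QD.
function buildEndpointList(tables) {
  const list = new Uint8Array(512);
  const top = [32, 8]; // literal screen position of the top vertex
  let ptr = 0;
  for (let f = 0; f < 16; f++) {
    const a = [tables.PA[f], tables.QA[f]];
    const b = [tables.PB[f], tables.QB[f]];
    const c = [tables.PC[f], tables.QC[f]];
    const d = [tables.PD[f], tables.QD[f]];
    // Four base edges first, then the four edges up to the top vertex.
    const lines = [
      [a, b], [b, c], [c, d], [d, a],
      [a, top], [b, top], [c, top], [d, top]
    ];
    lines.forEach(function (line) {
      line.forEach(function (point) {
        list[ptr++] = point[0]; // X
        list[ptr++] = point[1]; // Y
      });
    });
  }
  return list;
}

// Usage with made-up table values (frame number added so frames differ).
const sample = {};
["PA", "QA", "PB", "QB", "PC", "QC", "PD", "QD"].forEach(function (name, i) {
  sample[name] = [];
  for (let f = 0; f < 16; f++) sample[name].push(10 + i + f);
});
const endpointList = buildEndpointList(sample);
```

Each frame contributes 32 bytes, so the sixteen frames fill the array exactly.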

The main animation loop is entered shortly after the code shown above. This is done by way of an EXEC call which never returns.

The "wirecube" demo uses a BASIC main animation loop. Also, it uses arrays PA, QA, etc. directly, as opposed to loading an endpoint coordinate list. However, the code used to effect the necessary real number calculations is very similar to the code shown above.

Color Selection 

The color of the pyramid in the "pure" demo can be changed by altering line 3200 of "pure.customized.bas." As provided, a value of 64 (decimal) is passed to POKE on this line. Replacing this with another multiple of 16, from 16 up to 240, will result in the pyramid being drawn in a different color.

In "wirecube.customized.bas," a similar role is played by line 17085.

Timing  

As described thus far, the CoCo "pure" demo still has one major problem: it is too fast. In particular, the main animation loop ends each iteration by flipping the currently visible display page. Then, at the top of the next iteration, the display page that is not visible is cleared. If this sequence of events happens too quickly, though, the resultant display will exhibit flicker and screen tearing.

The problem is that the page change is not actually completed instantaneously simply because the associated SAM registers are altered. The display hardware is rendering video frames at a fixed rate of 60 hertz, and when the page change register write operation is executed, it could very well be right in the middle of rendering the visible area of the screen. This is in fact quite likely, and in such a situation, the display hardware will complete the current frame before switching to the new display page requested by the SAM register change(s).

So, what is necessary is to make sure that the page change action takes place in between frames, during a short, designated period of time called the "vertical blanking" interval. In fact, it is necessary to make sure that the page change takes place as close to the beginning of the vertical blanking interval as possible. Otherwise, the application will be susceptible to some form of flicker and/or tearing. These are not CoCo-specific issues; rather, they are very typical of first-generation (NTSC/PAL) television-based hardware designs in general.

Like many early home computers and video game systems, the CoCo application development environment contains facilities that allow the developer to detect the start of the "vertical blanking" interval. These facilities rely on the interrupt capabilities of the 6809, in particular on polling an interrupt request bit associated with the vertical blanking interval.

Some versions of CoCo BASIC handle this interrupt themselves, i.e. rely on a transfer of control to an interrupt handler, and this behavior is disabled by line 3110 of "pure.customized.bas." In addition, a flag is set in "pure.asm," near the top of the MAIN3D function, that makes the interrupt associated with the vertical blanking interval happen on the "leading edge" of the vertical blanking signal, i.e. at its start.

Other than these configuration items, all that is required to achieve proper timing is some throttling code. This is the code that actually delays the page change operation, and it is present in the SHOW function:

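        LDA   $FF02     ; read FF02 to clear any pending request bit
HS2NK   LDA   $FF03     ; read the register holding the request bit
        ANDA  #$80      ; isolate bit 7
        BEQ   HS2NK     ; not set yet, so keep polling
        LDA   $FF02     ; clear the request bit before the page change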


This code begins by clearing any pending interrupt request bit. This is done to allow any ongoing vertical blanking interval to complete (since we must wait for the top of a full interval before proceeding with the page change). The request bit is cleared by reading location FF02.

Then, with the bit cleared, the code shown above enters a polling loop which waits for the bit to turn on again. This indicates another vertical blanking interval interrupt request. At that point, all that is necessary is to clear the interrupt request bit again, and proceed with the page change.
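The clear, poll, clear sequence can also be traced in a small software model. This is an illustration only, written in JavaScript like the earlier prototypes: makePia and waitForVsync are hypothetical names, and the model captures just the two behaviors described above (bit 7 of FF03 reports the request, and reading FF02 clears it).

```javascript
// Hypothetical model of the two locations used by the throttling code.
function makePia() {
  let flag = false;
  return {
    raiseVsync: function () { flag = true; },                  // hardware event
    readControl: function () { return flag ? 0x80 : 0x00; },   // LDA $FF03
    readData: function () { flag = false; return 0xff; }       // LDA $FF02
  };
}

// Same sequence as the assembly: clear, poll until bit 7 is set, clear.
// tick() stands in for the passage of time between polls.
function waitForVsync(pia, tick) {
  pia.readData();                            // discard any stale request
  let polls = 0;
  while ((pia.readControl() & 0x80) === 0) { // ANDA #$80 / BEQ HS2NK
    tick();
    polls++;
  }
  pia.readData();                            // clear before the page change
  return polls;
}

// Usage: a stale request is pending on entry; a fresh one arrives on tick 3.
const pia = makePia();
pia.raiseVsync();
let ticks = 0;
const polls = waitForVsync(pia, function () {
  ticks++;
  if (ticks === 3) pia.raiseVsync();
});
```

Without the initial read, the loop would exit immediately on the stale request, and the page change could land partway through a blanking interval instead of at its start.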

Conclusion  

It must be left to readers to determine for themselves whether these CoCo demonstrations are awe-inspiring or ordinary; but I have basically realized what I had hoped was possible from a well-designed CoCo 3D program. The ability to simulate reality at a more ambitious level than the static, 2D imagery typical of real computer games of the early 1980s is at least hinted at.

I also think that the programming environment offered by the CoCo was vindicated by these activities. The 6809, its Motorola support chips, and Microsoft BASIC combine to create a platform for application development that is rational, flexible, and open.

References  

1. T. Ahrens, J. Brown, and H. Scales, What's Inside Radio Shack's Color Computer?, in: BYTE 6 (3), (1981). 
2. M. Smithwick and M. Verma, Pro OpenGL ES for Android (New York, Apress, 2012).
3. F. S. Hill, Computer Graphics (New York, Macmillan, 1990).

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)