Coco Project

I started a while back writing an assembler for 6809 and 6309 CPUs. While there are lots of decent assemblers for those CPUs available, none of them behave quite the way I want them to. The ones that handle forward references in a useful manner don’t support macros. The ones that support macros and forward references target a specific operating system. Many suffer phasing errors when forward references are present which is, to say the least, annoying. The one I had been using seems to handle most of the situations well enough but it doesn’t have macros and it is not open source so I can’t add them.

The upshot of all of that is that I decided to write my own assembler that will support the features I want. Today, I have it assembling a real source file from one of my projects almost correctly, with only two bytes wrong (on indexed instructions) which will, of course, require debugging. But the rest of the code is completely correct.

There’s still a long way to go to get it to be truly useful, including adding a LOADM file format for the output and implementing all the pseudo operations correctly. But, hey, it produces semi-pretty program listings.

Writing an Assembler

My first real experience with computers at any level was with the TRS80 Color Computer back in the 1980s. That machine has a Motorola 6809 CPU in it (which is an 8 bit cpu with a 16 bit address space). It was on that machine I first learned about assembly language programming. (NOTE: it is incorrect to call it “assembler programming” or “programming in assembler”.) As a result, I have something of a soft spot for that old machine and it has become something of a hobby.

Over the years, a great deal of work has been put into new hardware and software for that system. This includes a port of the Donkey Kong arcade game as recently as 2007. Yes, that’s right. This year. There has also been a replacement CPU from Hitachi (the 6309) which is a drop in pin-compatible and instruction compatible chip. It has a few extensions, however, which make it somewhat interesting to play with.

Recently, I thought it would be interesting to write an operating system for that system. I wanted a somewhat unix-like system. So I geared up with a cross-assembler (that’s an assembler that runs on a different platform than the one it assembles code for) and other useful tools and got to work. I soon found that the assembler I was using had a few things that I didn’t like, including a lack of macros and throwing phasing errors (more on those later) in cases where I thought it shouldn’t. Oh, I could make do with that assembler and make everything work, but I thought, why? So I undertook to write my own assembler and I decided I was going to make it work really hard to avoid phasing errors.

Now there are a number of types of assembler. Most assemblers are two pass assemblers. That is, the first pass resolves the addresses of all the symbols in the program and the second pass actually generates the code. Now this works really well until you have a forward reference (referencing a symbol that is defined later). On the first pass, the assembler will have to assume the largest possible size for the addressing mode used to refer to the symbol. On the second pass, however, it may generate a smaller addressing mode which will then cause the symbol to have a different address on the second pass than it did on the first pass. This is known as a phasing error. Phasing errors can be completely avoided by always using the largest possible addressing mode when there is any doubt. However, this is not ideal. (Note that the EDTASM+ native assembler for the TRS80 Color Computer (CoCo for short) does just that.)

Now I figured it must be possible for the assembler to resolve many forward references to the smallest possible addressing mode without causing phasing errors if an intermediate representation of the entire program is maintained and then multiple passes through it are performed to resolve the symbol location and instruction mode problems. In reality, this means injecting an optimization pass between the first and second passes of the assembler. Let’s call this pass 1.5. Note that in the case of my assembler, it will read the source code exactly once. Once it has done so, the entire program is stored in memory in some form and all subsequent processing occurs in memory. This is not ideal and can cause a very large working set. However, it is unlikely that a program for the 6809 which has only a 64K address space will generate a working set that is too large for main memory on a modern Linux system.

Pass 1 is relatively straight forward. It simply involves reading all the source lines, figuring out the opcodes instruction sizes and assigning addresses to symbols. However, because we don’t know the correct size of an instruction at this stage in the face of a forward reference, the assembler must maintain a range of addresses for each instruction and a range of values for each symbol. Interestingly enough, it is often possible to determine the precise size of an instruction even in the face of an uncertain symbol value and and uncertain address for the instruction and in any case where this is feasible, the assembler will do so.

Pass 2 still generates the code. However, since pass 2 runs after pass 1.5 has resolved the phasing problems, there cannot be any phasing errors at this point and the code can be generated without trouble (unless, of course, there are other problems with the source code.)

Pass 1.5 is where the real complexity and trouble appears. Many phasing problems can be resolved by a second pass through the code now that symbol locations are approximately known. However, this will often not be enough. As a result, pass 1.5 actually makes multiple sweeps through the code until it finds no phasing problems that have resolved themselves. Once this happens, it makes another sweep through the code and looks for the first ambiguous instruction. If it finds none, it knows that the phase correction process is done. If it does find one, it forces it to the maximum adressing mode size and then starts pass 1.5 over again to see if this single change allows correcting other remaining problems.

Now this will not solve all addressing mode problems in the most optimal way and it is still useful to receive a warning about addressing modes that are larger than they need to be. However, most of the simpler situations should be resolved by this method and more complex situations really do need a human touch to get right anyway. After all, if the code is that complex, maybe there’s something else wrong with it anyway?

In any event, I plan to release this assembler as open source once I have it functional. Hopefully I will be able to finish it in the next week or so but it is a non-trivial undertaking. It is further complicated by the fact that the 6309 extensions to the 6809 instruction set are not particularly regular and add additional types of operand and additional confusion to the whole situation. In other words, there is a great deal of work involved in supporting all features of the processor.

My advice to anyone who wants to write an assembler for their favourite cpu is to think carefully about whether you really need to. The 6309 is a relatively simple cpu compared to, say, the 80386 or the AMD Opteron. If things are this complex for a relatively simple 8 bit cpu, imagine the modern 64 bit processors that support legacy 16 and 32 bit instruction sets! So if you don’t have a good reason to write an assembler, you probably shouldn’t.

‘Fest Trip, Day 5

It’s nearing the end of day 2 of the CoCoFest and we’ve had a good discussion about a virtual CoCo4 created using MESS and a first-timer won one of the door prizes. Couldn’t have gone to a better guy what with his luck the other day. We’re now into the auction and when that will finish, the ‘fest will be basically finished.

It’s been a blast this year. There’s been much better turnout than last year and there might be some renewed interest in some of these things. it will be interesting to see how things go next year.

I think maybe I’ll bring my CoCo setup down next year and rent a table. This year’s rates for a table were not out of reach so I think it can be manageed. That could really make the ‘fest interesting if a whole bunch of folks bring their setup down and get it running.

Well, I’m going to sign off now and pay more attention to the auction.

‘Fest Trip, Day 4

I’m writing this from the CoCoFest floor. Turnout has been quite good this year and there are a lot of folks who showed up for the first time or the first time in a long time this year. In particular are such notables as Mary Kramer (CoCoNutz newsletter) and Steve Bjork (programmer extra-ordinaire) are in attendance this year.

We had a couple of interesting presentations today along with a panel discussion of the life and times of the CoCo. The Cloud 9 guys did a good commercial on their main products (and they have some good ones).

The real cool thing is the Sockmaster has unleashed his latest secret project on the world today. He has ported Donkey Kong to the CoCo. Now that’s really cool. Seeing the CoCo running Donkey Kong brought back childhood memories. He got it exactly right. Of course, he had original Z80 code to work from so I suppose it isn’t surprising that he managed to get such a close correspondence. Even so, he does admit it was a difficult task.

Well, it’s time to sign off as the auction is getting underway. I wonder if there’s anything I need.

Update at 9:55PM CDT

I didn’t buy anything at the auction and supper was quite good even though the service was terrible initially. The real run was finding out about twenty minutes ago that a tornado watch is in effect for the area until about midnight. That’s always a great thing to hear about. Of course it doesn’t mean a twister will appear but it’s never a good sign. That’s two out of three years I’ve come down here that a twister or something resembling a twister has been involved in the trip. Interesting.

Now it’s off to relax until bed so I can take on the day tomorrow with renewed vigour.