Allen Huffman has been posting a series of articles on optimizing Color Basic as found on the TRS80 Color Computer. This is a response to his most recent (as of this writing) entry. Note that I’m not criticizing his article or the conclusions he reaches in it. Instead, this is intended to provide some more detailed background information. It also provides a good starting point to discuss some points about optimization, so even if you aren’t into programming on the Coco, you might find it interesting.
In his article, Allen talks about using a trick involving INSTR() and INKEY$ to optimize a key press handling loop. I won’t go into details on that aspect of it. Suffice to say that the trick works nicely and is something that should have occurred to many people back in the day but apparently did not. For a detailed analysis of the trick, head on over to Allen’s article.
After explaining the trick, he goes on to discuss two possible variants which take advantage of either ON…GOTO or ON…GOSUB. He goes through a number of benchmarks and comes up with some results. However, he was a bit surprised by some of his benchmark results. I am going to run through some implementation details for GOTO and GOSUB in Color Basic. Then I am going to examine how that affects his benchmarks.
Under the Hood
The first implementation detail that needs explaining is finding the destination line in the program. This affects GOTO and GOSUB equally. To understand this, we need to know how programs are stored in memory. Each program line have a 4 byte header which consists of a 16 bit line number and a 16 bit pointer to the start of the next line. Each line also ends with a single NUL byte. Lines are stored sequentially in memory in line number order. The next line pointer makes scanning for a particular line number fairly efficient since it is a constant time operation to find the start of the next line. However, it also means that the worst case scenario means that we must examine every single line in the program to find the destination line.
Because line number lookups are so common, there are a couple of optimizations to the process. If the requested line number is higher than the number of the line currently executing, the search starts from the next line in the program rather than the start of the program. Also, the search stops as soon as a line number that is higher (or equal, obviously) to the requested line number is encountered. For the sake of optimization, the not found case doesn’t matter. However, the practical upshot is that a GOTO to the next line of the program has the same real cost as a GOTO to the first line of the program. In fact, a goto N lines ahead in the program is equivalent to a GOTO to the Nth line of the overall program (unless the Nth line is ahead of the GOTO statement).
Another point is that both GOTO and GOSUB will skip to the end of the current line so that the line search can happen using the line headers described above.
The final important bit is how GOSUB actually works. GOSUB saves the current line number and execution pointer on the stack and then processes the remainder of the command as a GOTO. Then, when RETURN happens, the line number and execution pointer are restored from the stack and execution resumes as though a DATA command is executing. That means any extra stuff at the end of the GOSUB command is skipped. This is necessary for ON…GOSUB to work. As a matter of fact, the actual transfer of control is handled by exactly the same code as a plain GOTO or GOSUB.
Finally, we should look at the stack situation. Both FOR/NEXT and GOSUB/RETURN store information on the stack. Each record on the stack is prefixed a byte that indicates the entry type. For FOR/NEXT, this yields 18 bytes per entry. For GOSUB, this yields 5 bytes per entry. The stack search looks for the first entry that isn’t a FOR/NEXT record. If that happens to be something saved by GOSUB, then the stack is reset and control resumes at the calling point. Note that this means that any open FOR/NEXT loops will also be purged, which means that RETURN in the middle of a FOR/NEXT loop behaves as expected.
Implications
The implications of all of those details are interesting. There are four possible cases arising from Allen’s trick. First, we have the GOTO/GOTO option which simply restarts the INKEY loop:
100 ON INSTR(" ABC", INKEY$) GOTO 100,200,300,400:GOTO 100
200 REM STUFF
210 GOTO 100
300 REM STUFF
310 GOTO 100
400 REM STUFF
410 GOTO 100
Then we have the GOTO/GOTO option that resumes AFTER the INKEY loop (note the dummy handler branches to the line following the INKEY check):
100 ON INSTR(" ABC", INKEY$) GOTO 110, 200, 300, 400
110 REM STUFF
120 GOTO 100
200 REM STUFF
210 GOTO 110
300 REM STUFF
310 GOTO 110
400 REM STUFF
410 GOTO 110
Then we have the GOSUB/RETURN that simply restarts the INKEY loop (note the dummy handler branches to a RETURN statement):
100 ON INSTR(" ABC", INKEY$) GOSUB 210, 200, 300, 400:GOTO 100
200 REM STUFF
210 RETURN
300 REM STUFF
310 RETURN
400 REM STUFF
410 RETURN
And the GOSUB/RETURN variant that continues execution after the INKEY check (note the dummy handler branches to a RETURN statement):
100 ON INSTR(" ABC", INKEY$) GOSUB 210, 200, 300, 400
110 REM STUFF
120 GOTO 100
200 REM STUFF
210 RETURN
300 REM STUFF
310 RETURN
400 REM STUFF
410 RETURN
Allen’s benchmarking examples all have the INKEY check wrapped in a FOR loop in order to run enough times to get a useful benchmark using the 60Hz TIMER feature of Extended Basic. That means he is not testing the case where the INKEY loop is simply restarted after processing a key press.
He did benchmark a simple GOTO/GOTO setup and a simple GOSUB/RETURN setup to determine whether GOTO/GOTO or GOSUB/RETURN was, in fact, faster. His result showed the GOSUB/RETURN was faster than GOTO/GOTO. This is expected because there is only one line number search in the GOSUB/RETURN case while the GOTO/GOTO case has two, one of which must start at the beginning of the program because at least one has to jump backwards. He did some additional tinkering to determine that the size of the program affects the line number lookup time which is expected based on the internals described above. His result showed the GOTO/GOTO was about 7% slower than GOSUB/RETURN.
He also discovered that ON…GOTO and ON…GOSUB were markedly slower than plain GOTO/GOSUB by testing variants using “ON 1” with a single destination line number. This would be exactly equivalent to a plain GOTO or GOSUB as far as execution result goes. That these variants were markedly slower makes sense when you think about it because the ON variant has to evaluate an expression to determine which of the destination line number options to use. He was surprised to find no actual difference between ON…GOTO and ON…GOSUB in this test, though. His tests were using 1000 iterations using a 60Hz timer. A measurable difference might appear using a larger iteration count. However, that doesn’t explain why there is no measurable difference between GOSUB and GOTO in this case when in the base case, the difference was measurable at 7%.
Let’s consider what is going on. In the ON case, the following steps occur. First, an expression is evaluated. Then a check for GOTO or GOSUB is performed. Then the line number options are skipped until there are no more or the correct one is found. Then control is transferred to the base GOTO/GOSUB code. What is likely happening is that the cost of expression evaluation is swamping all other variables in this particular situation.
Because that seemed odd, I tried the same benchmarks. I got some variation between runs of the same benchmark. Taking rough averages of the results across multiple runs, I found that bare GOTO/GOTO was about 6% slower than bare GOSUB. I also found no difference between the “ON 1” variants. This was using mess to run the benchmarks so some of the variation between runs probably came from there. However, it does seem to confirm Allen’s results.
So what does this tell us about the overhead of GOSUB/RETURN vs GOTO/GOTO? I think it tells us that in a trivial program, the overheads are not significantly different between RETURN followed by a “skip to the end of the statement” compared to a “look up a line number and transfer control directly” operation. I think you will find that the actual cost in instructions executed is relatively close in the case of a fairly trivial program. For a much larger program, I would expect the GOSUB/RETURN scenario to be better.
Allen also tested the results with more representative program structure (like those in the examples above) and with some extra lines. His result was that the GOTO/GOTO situation depends on factors like program size while the GOSUB/RETURN is less impacted by that. Based on the internals described above, that would be the expected result.
So far, this only examines the case where additional processing is going to occur after the INKEY check. In the case where the INKEY check is simply going to run until a key is pressed, the key handler is dispatched, and then the INKEY loop is restarted will always be somewhat faster using the GOTO/GOTO method even though optimizing that case is pretty pointless. This simply must be the case because you must always jump back to the start of the INKEY loop, regardless what happens as a result. If you are using GOSUB/RETURN, then there must be a RETURN statement prior to that necessary GOTO. If, instead, you simply use GOTO directly, you can skip all the stack manipulation. Either way, the GOTO has to start from the beginning of the program and will be going to the same destination. That means it will take the same length of time to execute that GOTO regardless of its proximity to its destination.
Essentially, based on these results and Allen’s results, it seems that there is no real benefit to using the GOTO/GOTO option. However, there is a good chance that the benchmark structure we have used is introducing enough noise in the results that we are missing some critical detail. It would probably be worth running equivalent benchmarks with 10,000 or 100,000 iterations for each test and running the tests a couple of dozen times each in order to see if there is any signfiicant difference between the ON…GOTO and ON…GOSUB cases. However, that starts getting into statistical analyisis and is probably not worth it unless your program really depends on squeezing the absolute maximum performance out of what you are benchmarking.
Additional Comments
There are a couple of additional comments I want to make before ending this.
First, you may recall that during a line number search, the actual cost of depends only on the number of lines to search. The length of the lines has no impact. That would suggest that compacting the program into as few lines as possible would be a total win. That will often be the case, of course. However, executing either a GOTO or GOSUB (or an implied GOTO as part of THEN or ELSE) requires scanning byte by byte to the end of the line in order to determine if a forward scan would be more beneficial. That means if the GOTO or GOSUB is near the start of a long line, any speed benefit from compacting the program into as few lines as possible may be negated.
Next, you should always benchmark any change you make to improve performance. As seen above in the benchmarks, sometimes something that should make things faster doesn’t have a measurable impact. You would expect from the tests with GOSUB/RETRN vs GOTO/GOTO, that ON…GOTO/GOTO would be slower and ON…GOSUB/RETURN and yet the particular benchmark used by both Allen and me shows no measurable difference. That is not to say it wouldn’t show a difference in a different coding environment. However, you shouldn’t assume that just because one has more overhead than the other that it is that obvious overhead that is the important factor. In fact, the overhead of executing GOSUB vs GOTO is pretty small when you look at the actual ROM code so it’s relatively easy to understand why that would disappear in the noise when combined with a much more expensive operation like evaluating an expression.
Basically, the overall point is that just because it seems logical that something should be faster, it doesn’t mean that is, or that the speed increase will be enough to be worth it.
Finally, you need to consider whether your benchmarking process is actually affecting the results of the benchmark. If you can think of multiple ways to benchmark something, do it. That way, you will have a better idea of the real impact of a change. Also related to this, you should consider whether your benchmark yielded a real difference or not, or if the lack of difference is real. This may require statistical analysis and, so, may not be worth it. In this case, for instance, it would likely be overkill.