Optimizing Color Basic – ON GOTO vs ON GOSUB

Allen Huffman has been posting a series of articles on optimizing Color Basic as found on the TRS80 Color Computer. This is a response to his most recent (as of this writing) entry. Note that I’m not criticizing his article or the conclusions he reaches in it. Instead, this is intended to provide some more detailed background information. It also provides a good starting point to discuss some points about optimization, so even if you aren’t into programming on the Coco, you might find it interesting.

In his article, Allen talks about using a trick involving INSTR() and INKEY$ to optimize a key press handling loop. I won’t go into details on that aspect of it. Suffice to say that the trick works nicely and is something that should have occurred to many people back in the day but apparently did not. For a detailed analysis of the trick, head on over to Allen’s article.

After explaining the trick, he goes on to discuss two possible variants which take advantage of either ON…GOTO or ON…GOSUB. He goes through a number of benchmarks and comes up with some results. However, he was a bit surprised by some of his benchmark results. I am going to run through some implementation details for GOTO and GOSUB in Color Basic. Then I am going to examine how that affects his benchmarks.

Under the Hood

The first implementation detail that needs explaining is finding the destination line in the program. This affects GOTO and GOSUB equally. To understand this, we need to know how programs are stored in memory. Each program line have a 4 byte header which consists of a 16 bit line number and a 16 bit pointer to the start of the next line. Each line also ends with a single NUL byte. Lines are stored sequentially in memory in line number order. The next line pointer makes scanning for a particular line number fairly efficient since it is a constant time operation to find the start of the next line. However, it also means that the worst case scenario means that we must examine every single line in the program to find the destination line.

Because line number lookups are so common, there are a couple of optimizations to the process. If the requested line number is higher than the number of the line currently executing, the search starts from the next line in the program rather than the start of the program. Also, the search stops as soon as a line number that is higher (or equal, obviously) to the requested line number is encountered. For the sake of optimization, the not found case doesn’t matter. However, the practical upshot is that a GOTO to the next line of the program has the same real cost as a GOTO to the first line of the program. In fact, a goto N lines ahead in the program is equivalent to a GOTO to the Nth line of the overall program (unless the Nth line is ahead of the GOTO statement).

Another point is that both GOTO and GOSUB will skip to the end of the current line so that the line search can happen using the line headers described above.

The final important bit is how GOSUB actually works. GOSUB saves the current line number and execution pointer on the stack and then processes the remainder of the command as a GOTO. Then, when RETURN happens, the line number and execution pointer are restored from the stack and execution resumes as though a DATA command is executing. That means any extra stuff at the end of the GOSUB command is skipped. This is necessary for ON…GOSUB to work. As a matter of fact, the actual transfer of control is handled by exactly the same code as a plain GOTO or GOSUB.

Finally, we should look at the stack situation. Both FOR/NEXT and GOSUB/RETURN store information on the stack. Each record on the stack is prefixed a byte that indicates the entry type. For FOR/NEXT, this yields 18 bytes per entry. For GOSUB, this yields 5 bytes per entry. The stack search looks for the first entry that isn’t a FOR/NEXT record. If that happens to be something saved by GOSUB, then the stack is reset and control resumes at the calling point. Note that this means that any open FOR/NEXT loops will also be purged, which means that RETURN in the middle of a FOR/NEXT loop behaves as expected.

Implications

The implications of all of those details are interesting. There are four possible cases arising from Allen’s trick. First, we have the GOTO/GOTO option which simply restarts the INKEY loop:

100 ON INSTR(" ABC", INKEY$) GOTO 100,200,300,400:GOTO 100
200 REM STUFF
210 GOTO 100
300 REM STUFF
310 GOTO 100
400 REM STUFF
410 GOTO 100

Then we have the GOTO/GOTO option that resumes AFTER the INKEY loop (note the dummy handler branches to the line following the INKEY check):

100 ON INSTR(" ABC", INKEY$) GOTO 110, 200, 300, 400
110 REM STUFF
120 GOTO 100
200 REM STUFF
210 GOTO 110
300 REM STUFF
310 GOTO 110
400 REM STUFF
410 GOTO 110

Then we have the GOSUB/RETURN that simply restarts the INKEY loop (note the dummy handler branches to a RETURN statement):

100 ON INSTR(" ABC", INKEY$) GOSUB 210, 200, 300, 400:GOTO 100
200 REM STUFF
210 RETURN
300 REM STUFF
310 RETURN
400 REM STUFF
410 RETURN

And the GOSUB/RETURN variant that continues execution after the INKEY check (note the dummy handler branches to a RETURN statement):

100 ON INSTR(" ABC", INKEY$) GOSUB 210, 200, 300, 400
110 REM STUFF
120 GOTO 100
200 REM STUFF
210 RETURN
300 REM STUFF
310 RETURN
400 REM STUFF
410 RETURN

Allen’s benchmarking examples all have the INKEY check wrapped in a FOR loop in order to run enough times to get a useful benchmark using the 60Hz TIMER feature of Extended Basic. That means he is not testing the case where the INKEY loop is simply restarted after processing a key press.

He did benchmark a simple GOTO/GOTO setup and a simple GOSUB/RETURN setup to determine whether GOTO/GOTO or GOSUB/RETURN was, in fact, faster. His result showed the GOSUB/RETURN was faster than GOTO/GOTO. This is expected because there is only one line number search in the GOSUB/RETURN case while the GOTO/GOTO case has two, one of which must start at the beginning of the program because at least one has to jump backwards. He did some additional tinkering to determine that the size of the program affects the line number lookup time which is expected based on the internals described above. His result showed the GOTO/GOTO was about 7% slower than GOSUB/RETURN.

He also discovered that ON…GOTO and ON…GOSUB were markedly slower than plain GOTO/GOSUB by testing variants using “ON 1” with a single destination line number. This would be exactly equivalent to a plain GOTO or GOSUB as far as execution result goes. That these variants were markedly slower makes sense when you think about it because the ON variant has to evaluate an expression to determine which of the destination line number options to use. He was surprised to find no actual difference between ON…GOTO and ON…GOSUB in this test, though. His tests were using 1000 iterations using a 60Hz timer. A measurable difference might appear using a larger iteration count. However, that doesn’t explain why there is no measurable difference between GOSUB and GOTO in this case when in the base case, the difference was measurable at 7%.

Let’s consider what is going on. In the ON case, the following steps occur. First, an expression is evaluated. Then a check for GOTO or GOSUB is performed. Then the line number options are skipped until there are no more or the correct one is found. Then control is transferred to the base GOTO/GOSUB code. What is likely happening is that the cost of expression evaluation is swamping all other variables in this particular situation.

Because that seemed odd, I tried the same benchmarks. I got some variation between runs of the same benchmark. Taking rough averages of the results across multiple runs, I found that bare GOTO/GOTO was about 6% slower than bare GOSUB. I also found no difference between the “ON 1” variants. This was using mess to run the benchmarks so some of the variation between runs probably came from there. However, it does seem to confirm Allen’s results.

So what does this tell us about the overhead of GOSUB/RETURN vs GOTO/GOTO? I think it tells us that in a trivial program, the overheads are not significantly different between RETURN followed by a “skip to the end of the statement” compared to a “look up a line number and transfer control directly” operation. I think you will find that the actual cost in instructions executed is relatively close in the case of a fairly trivial program. For a much larger program, I would expect the GOSUB/RETURN scenario to be better.

Allen also tested the results with more representative program structure (like those in the examples above) and with some extra lines. His result was that the GOTO/GOTO situation depends on factors like program size while the GOSUB/RETURN is less impacted by that. Based on the internals described above, that would be the expected result.

So far, this only examines the case where additional processing is going to occur after the INKEY check. In the case where the INKEY check is simply going to run until a key is pressed, the key handler is dispatched, and then the INKEY loop is restarted will always be somewhat faster using the GOTO/GOTO method even though optimizing that case is pretty pointless. This simply must be the case because you must always jump back to the start of the INKEY loop, regardless what happens as a result. If you are using GOSUB/RETURN, then there must be a RETURN statement prior to that necessary GOTO. If, instead, you simply use GOTO directly, you can skip all the stack manipulation. Either way, the GOTO has to start from the beginning of the program and will be going to the same destination. That means it will take the same length of time to execute that GOTO regardless of its proximity to its destination.

Essentially, based on these results and Allen’s results, it seems that there is no real benefit to using the GOTO/GOTO option. However, there is a good chance that the benchmark structure we have used is introducing enough noise in the results that we are missing some critical detail. It would probably be worth running equivalent benchmarks with 10,000 or 100,000 iterations for each test and running the tests a couple of dozen times each in order to see if there is any signfiicant difference between the ON…GOTO and ON…GOSUB cases. However, that starts getting into statistical analyisis and is probably not worth it unless your program really depends on squeezing the absolute maximum performance out of what you are benchmarking.

Additional Comments

There are a couple of additional comments I want to make before ending this.

First, you may recall that during a line number search, the actual cost of depends only on the number of lines to search. The length of the lines has no impact. That would suggest that compacting the program into as few lines as possible would be a total win. That will often be the case, of course. However, executing either a GOTO or GOSUB (or an implied GOTO as part of THEN or ELSE) requires scanning byte by byte to the end of the line in order to determine if a forward scan would be more beneficial. That means if the GOTO or GOSUB is near the start of a long line, any speed benefit from compacting the program into as few lines as possible may be negated.

Next, you should always benchmark any change you make to improve performance. As seen above in the benchmarks, sometimes something that should make things faster doesn’t have a measurable impact. You would expect from the tests with GOSUB/RETRN vs GOTO/GOTO, that ON…GOTO/GOTO would be slower and ON…GOSUB/RETURN and yet the particular benchmark used by both Allen and me shows no measurable difference. That is not to say it wouldn’t show a difference in a different coding environment. However, you shouldn’t assume that just because one has more overhead than the other that it is that obvious overhead that is the important factor. In fact, the overhead of executing GOSUB vs GOTO is pretty small when you look at the actual ROM code so it’s relatively easy to understand why that would disappear in the noise when combined with a much more expensive operation like evaluating an expression.

Basically, the overall point is that just because it seems logical that something should be faster, it doesn’t mean that is, or that the speed increase will be enough to be worth it.

Finally, you need to consider whether your benchmarking process is actually affecting the results of the benchmark. If you can think of multiple ways to benchmark something, do it. That way, you will have a better idea of the real impact of a change. Also related to this, you should consider whether your benchmark yielded a real difference or not, or if the lack of difference is real. This may require statistical analysis and, so, may not be worth it. In this case, for instance, it would likely be overkill.

13 thoughts on “Optimizing Color Basic – ON GOTO vs ON GOSUB”

  1. Awesome work, William! I would suspect that GOSUB/RETURN should win provided that the time it takes to search to find the return GOTO is less than the time it takes to push the GOSUB return value, and retrieve it and return there. I’m going to spin out a 1000 line program and see if that shows it better.

    990 GOTO 1000 – fast
    1000 GOTO 990 – should be slow

    990 GOSUB 1000 – fast but is doing more work so a tad slower
    1000 RETURN – should be faster than scanning lines 10-980 that GOTO does

    In the case of INKEY, for a game the background operation would continue so in both cases, an empty INKEY should keep processing. But, if it were going to wait until there is a key, I wonder how to do this:

    100 ON INSTR(“*ABC”,INKEY$) GOTO 100,200,300,400

    …using GOSUB/RETURN. But, if my 1000 line test doesn’t show GOTO is much slower than RETURN, it doesn’t matter 😉

    But I sure would love to avoid using GOTO as much as possible.

    Thanks for looking at this! I was also considering doing an outer FOR/NEXT to average out the inside results. I also may have some bad results because I was emulating in a default PAL mode, and the 50mhz through off the TIMER results. That was a head scratcher (“why is this suddenly faster?”) until I realized I was in PAL mode!

    1. Actually, a 50Hz timer wouldn’t make any difference to the validity as long as all the tests used the same setting. Now if you were comparing 60Hz results to 50Hz results, that would invalidate comparisons. However, my results agreed with yours and I know I didn’t mess with the emulation mode.

      Your “wait for a key” variant needs to have the “idle” case GOSUB to a RETURN statement (like one of my four examples). That’s going to be slower in the idle case but if you’re waiting for a key press, it’s not likely to be problematic for two reasons. First, people aren’t going to type that fast. Second, between statements, Color Basic checks for key presses to detect BREAK and SHIFT-@. It caches the result and if there is a saved key, INKEY$ returns that instead of polling the keyboard.

      1. I think one of my original tests was for the “” case to GOSUB to a RETURN. If GOSUB is faster for programs with more lines, then that seems a reasonable solution. Thanks! (Poor, poor interpreter.)

  2. What means this?

    “Then, when RETURN happens, the line number and execution pointer are restored from the stack and execution resumes as though a DATA command is executing.”

    1. When the interpreter encounters a DATA statement, it executes it by skipping to the end of the statement while respecting string delimiters. Basically, it skips to either the end of the line or a colon unless the colon is in the middle of a string. That is as opposed to REM (or ELSE) which skips to the end of the line unconditionally.

      Because the execution pointer stored on the stack during GOSUB is immediately after the destination line number, there may be additional line numbers that need to be skipped when the RETURN happens (in the ON…GOSUB case) so it has to skip to the end of the statement before resuming execution to prevent a spurious syntax error. Because GOSUB and ON…GOSUB share code, it also has the side effect that you can put random stuff after a plain GOSUB statement without triggering a syntax error. (For instance, “GOSUB 500 COMMENT HERE:REM more commands”) Likely they did it this way to save code space in the ROM even though it would be more correct to handle saving the return address differently between the ON…GOSUB and GOSUB commands.

      1. “GOSUB 500 COMMENT HERE:REM more commands”

        Wow. That’s wild. I just trying that and was surprised, but it would make sense if they did give an ?SN ERROR there anyway.

        If I understand, when I do something like this:

        100 GOSUB 1000:PRINT”BACK AGAIN!”:FORA=1TO5000

        …when the GOSUB is processed, if it skips stuff after it, are you saying it scans forward until the end of line or colon before storing the return pointer?

        Thus, this foolish code would be slower to get to 1000 because of the extra useless scanning:

        100 GOSUB 1000 I AM JUST TRYING TO SLOW THINGS DOWN:PRINT”BACK AGAIN!”:FORA=1TO5000

        This is cool. It makes me want to write my own BASIC.

        1. Wow. I just did a test. 1482 versus 160 just by packing a full line of junk directly after the line number in the GOSUB. Interesting.

          I did a test with

          100 GOSUB100 JUNK TO THE END OF THE BUFFER
          ~1515

          100 GOSUB100:REM JUNK TO THE END OF THE BUFFER
          ~1525

          100 GOSUB100
          110 REM JUNK TO THE END OF THE BUFFER
          ~867

          Thus, GOSUB is faster if it’s the last thing on a line. I will add this to my list!

          Thanks, William!

          1. That is the expected result based on the ROM code. Nice to know the universe is behaving predictably in this case….

            Actually, it stores the return pointer immediately such that the return pointer is actually to the byte immediately following the line number. Then it scans to the end of the entire line so it can start looking for lines using the 4 byte line headers (which contain the line number and the pointer to the next line).

            Upon RETURN, the input pointer is restored to point to the end of the original destination line number at which point it has to skip anything following that to avoid throwing a syntax error in the case of ON…GOSUB. However, in that case, it only has to skip to the end of the statement (end of line or colon).

            Thus, GOSUB will always trigger a forward scan to the end of the line. GOTO will have the same effect. So will “THEN” or “ELSE” followed by a line number if I recall correctly. Either way, it is definitely faster to have GOTO or GOSUB at the end of the line compared to the start of the line. However, if you don’t have any extra junk after the line number in a GOSUB, the scan to the end of the statement will be nearly instant.

        2. On writing your own Basic: yeah, that can be fun. You very quickly understand why the Color Basic ROMs do some of the things they do.

  3. 10 REM

    1000 REM
    2000 TIMER=0:TM=TIMER
    2010 FOR A=1 TO 1000
    2020 GOSUB 3000 /vs/ GOTO 3000
    2030 NEXT
    2040 PRINT TIMER-TM:END
    3000 RETURN /vs/ GOTO 2030

    GOSUB … 170
    GOTO … 372

  4. I am going to try something like this for averaging out benchmark runs. I try to snapshot the time as close to the outer FOR as possible, and immediately after the NEXT (not that it would matter). Not sure about the order of the DIM variables, since none of those are done inside the timing loop.

    5 DIM TE,TM,B,A,TT
    10 FORA=0TO4:TIMER=0:TM=TIMER
    20 FORB=0TO1000
    30 REM
    40 REM PUT CODE TO BENCHMARK
    50 REM HERE.
    60 REM
    70 NEXT:TE=TIMER-TM
    80 TT=TT+TE:PRINTA,TE
    90 NEXT:PRINTTT/A:END

    1. There isn’t much point pre-declaring variables in this case. You don’t save much, if anything, unless there are arrays in play. However, if you are going to do that, you want to put them in the order of most referenced to least referenced. So a variable that is referenced once every time through a loop would come before a variable that is referenced only once. Variables are added to the variable table in the order they are encountered (outside of expression evaluation) and the variable search starts at the beginning of the table.

  5. Try this:
    0 GOTO100
    1 REM STUFF
    2 NEXT:GOTO110
    3 REM STUFF
    4 NEXT:GOTO110
    5 REM STUFF
    6 NEXT:GOTO110
    7 I=.:NEXT:GOTO110
    100 FORI=0TO1STEP0:ONINSTR(” ABC”, INKEY$)GOTO7,1,3,5:NEXT
    110 REM CONTINUE PROGRAM EXECUTION

Leave a Reply

Your email address will not be published. Required fields are marked *