MasterCard Enables Fraud

Yes, the headline is clickbait. However, it is also accurate.

So I had some fraudulent charges on my MasterCard back in June. That did not unduly alarm me. I knew I needed to call my card issuer and disput the charges. I did so and they reversed them, cancelled the card, and issued a new one. All was well with the world. This is what should happen, after all. Alas….

TL;DR: Cancelling a card and getting a replacement after a fraudulent doesn’t necessarily stop the fraudulent charges due to some fuckwit at MasterCard thinking that “force billing” (allowing a merchant to obtain the new card number) is a good idea. My conclusion: “force billing” should be illegal.

Continue reading “MasterCard Enables Fraud”

SourceCop Redux

Quite some time back, I mentioned SourceCop in a diatribe on source obfuscation. Today someone apparently representing SourceCop wrote a comment on that post which reads very much like a commercial for their product. I did not approve the comment because my blog is not a sales platform and also because it was quite long. I have, however, chosen to reproduce most of it here and address the points it makes. You may want to read the previous article for context. Continue reading “SourceCop Redux”

Color Basic and String Handling

In his recent article on Color Basic String Theory, Allen Huffman raises some interesting points. It turns out that Color Basic (on the TRS-80 Color Computer) has a very straight forward way of handling strings but there are some details that arise from it that may prove a bit surprising. Additionally, there are requirements and limitations that drive some of the design of the string handling system. I’m going to go into detail on just how Color Basic handles strings behind the scenes.

Continue reading “Color Basic and String Handling”

Optimizing Color Basic – ON GOTO vs ON GOSUB

Allen Huffman has been posting a series of articles on optimizing Color Basic as found on the TRS80 Color Computer. This is a response to his most recent (as of this writing) entry. Note that I’m not criticizing his article or the conclusions he reaches in it. Instead, this is intended to provide some more detailed background information. It also provides a good starting point to discuss some points about optimization, so even if you aren’t into programming on the Coco, you might find it interesting.

In his article, Allen talks about using a trick involving INSTR() and INKEY$ to optimize a key press handling loop. I won’t go into details on that aspect of it. Suffice to say that the trick works nicely and is something that should have occurred to many people back in the day but apparently did not. For a detailed analysis of the trick, head on over to Allen’s article.

After explaining the trick, he goes on to discuss two possible variants which take advantage of either ON…GOTO or ON…GOSUB. He goes through a number of benchmarks and comes up with some results. However, he was a bit surprised by some of his benchmark results. I am going to run through some implementation details for GOTO and GOSUB in Color Basic. Then I am going to examine how that affects his benchmarks.

Under the Hood

The first implementation detail that needs explaining is finding the destination line in the program. This affects GOTO and GOSUB equally. To understand this, we need to know how programs are stored in memory. Each program line have a 4 byte header which consists of a 16 bit line number and a 16 bit pointer to the start of the next line. Each line also ends with a single NUL byte. Lines are stored sequentially in memory in line number order. The next line pointer makes scanning for a particular line number fairly efficient since it is a constant time operation to find the start of the next line. However, it also means that the worst case scenario means that we must examine every single line in the program to find the destination line.

Because line number lookups are so common, there are a couple of optimizations to the process. If the requested line number is higher than the number of the line currently executing, the search starts from the next line in the program rather than the start of the program. Also, the search stops as soon as a line number that is higher (or equal, obviously) to the requested line number is encountered. For the sake of optimization, the not found case doesn’t matter. However, the practical upshot is that a GOTO to the next line of the program has the same real cost as a GOTO to the first line of the program. In fact, a goto N lines ahead in the program is equivalent to a GOTO to the Nth line of the overall program (unless the Nth line is ahead of the GOTO statement).

Another point is that both GOTO and GOSUB will skip to the end of the current line so that the line search can happen using the line headers described above.

The final important bit is how GOSUB actually works. GOSUB saves the current line number and execution pointer on the stack and then processes the remainder of the command as a GOTO. Then, when RETURN happens, the line number and execution pointer are restored from the stack and execution resumes as though a DATA command is executing. That means any extra stuff at the end of the GOSUB command is skipped. This is necessary for ON…GOSUB to work. As a matter of fact, the actual transfer of control is handled by exactly the same code as a plain GOTO or GOSUB.

Finally, we should look at the stack situation. Both FOR/NEXT and GOSUB/RETURN store information on the stack. Each record on the stack is prefixed a byte that indicates the entry type. For FOR/NEXT, this yields 18 bytes per entry. For GOSUB, this yields 5 bytes per entry. The stack search looks for the first entry that isn’t a FOR/NEXT record. If that happens to be something saved by GOSUB, then the stack is reset and control resumes at the calling point. Note that this means that any open FOR/NEXT loops will also be purged, which means that RETURN in the middle of a FOR/NEXT loop behaves as expected.

Implications

The implications of all of those details are interesting. There are four possible cases arising from Allen’s trick. First, we have the GOTO/GOTO option which simply restarts the INKEY loop:

100 ON INSTR(" ABC", INKEY$) GOTO 100,200,300,400:GOTO 100
200 REM STUFF
210 GOTO 100
300 REM STUFF
310 GOTO 100
400 REM STUFF
410 GOTO 100

Then we have the GOTO/GOTO option that resumes AFTER the INKEY loop (note the dummy handler branches to the line following the INKEY check):

100 ON INSTR(" ABC", INKEY$) GOTO 110, 200, 300, 400
110 REM STUFF
120 GOTO 100
200 REM STUFF
210 GOTO 110
300 REM STUFF
310 GOTO 110
400 REM STUFF
410 GOTO 110

Then we have the GOSUB/RETURN that simply restarts the INKEY loop (note the dummy handler branches to a RETURN statement):

100 ON INSTR(" ABC", INKEY$) GOSUB 210, 200, 300, 400:GOTO 100
200 REM STUFF
210 RETURN
300 REM STUFF
310 RETURN
400 REM STUFF
410 RETURN

And the GOSUB/RETURN variant that continues execution after the INKEY check (note the dummy handler branches to a RETURN statement):

100 ON INSTR(" ABC", INKEY$) GOSUB 210, 200, 300, 400
110 REM STUFF
120 GOTO 100
200 REM STUFF
210 RETURN
300 REM STUFF
310 RETURN
400 REM STUFF
410 RETURN

Allen’s benchmarking examples all have the INKEY check wrapped in a FOR loop in order to run enough times to get a useful benchmark using the 60Hz TIMER feature of Extended Basic. That means he is not testing the case where the INKEY loop is simply restarted after processing a key press.

He did benchmark a simple GOTO/GOTO setup and a simple GOSUB/RETURN setup to determine whether GOTO/GOTO or GOSUB/RETURN was, in fact, faster. His result showed the GOSUB/RETURN was faster than GOTO/GOTO. This is expected because there is only one line number search in the GOSUB/RETURN case while the GOTO/GOTO case has two, one of which must start at the beginning of the program because at least one has to jump backwards. He did some additional tinkering to determine that the size of the program affects the line number lookup time which is expected based on the internals described above. His result showed the GOTO/GOTO was about 7% slower than GOSUB/RETURN.

He also discovered that ON…GOTO and ON…GOSUB were markedly slower than plain GOTO/GOSUB by testing variants using “ON 1” with a single destination line number. This would be exactly equivalent to a plain GOTO or GOSUB as far as execution result goes. That these variants were markedly slower makes sense when you think about it because the ON variant has to evaluate an expression to determine which of the destination line number options to use. He was surprised to find no actual difference between ON…GOTO and ON…GOSUB in this test, though. His tests were using 1000 iterations using a 60Hz timer. A measurable difference might appear using a larger iteration count. However, that doesn’t explain why there is no measurable difference between GOSUB and GOTO in this case when in the base case, the difference was measurable at 7%.

Let’s consider what is going on. In the ON case, the following steps occur. First, an expression is evaluated. Then a check for GOTO or GOSUB is performed. Then the line number options are skipped until there are no more or the correct one is found. Then control is transferred to the base GOTO/GOSUB code. What is likely happening is that the cost of expression evaluation is swamping all other variables in this particular situation.

Because that seemed odd, I tried the same benchmarks. I got some variation between runs of the same benchmark. Taking rough averages of the results across multiple runs, I found that bare GOTO/GOTO was about 6% slower than bare GOSUB. I also found no difference between the “ON 1” variants. This was using mess to run the benchmarks so some of the variation between runs probably came from there. However, it does seem to confirm Allen’s results.

So what does this tell us about the overhead of GOSUB/RETURN vs GOTO/GOTO? I think it tells us that in a trivial program, the overheads are not significantly different between RETURN followed by a “skip to the end of the statement” compared to a “look up a line number and transfer control directly” operation. I think you will find that the actual cost in instructions executed is relatively close in the case of a fairly trivial program. For a much larger program, I would expect the GOSUB/RETURN scenario to be better.

Allen also tested the results with more representative program structure (like those in the examples above) and with some extra lines. His result was that the GOTO/GOTO situation depends on factors like program size while the GOSUB/RETURN is less impacted by that. Based on the internals described above, that would be the expected result.

So far, this only examines the case where additional processing is going to occur after the INKEY check. In the case where the INKEY check is simply going to run until a key is pressed, the key handler is dispatched, and then the INKEY loop is restarted will always be somewhat faster using the GOTO/GOTO method even though optimizing that case is pretty pointless. This simply must be the case because you must always jump back to the start of the INKEY loop, regardless what happens as a result. If you are using GOSUB/RETURN, then there must be a RETURN statement prior to that necessary GOTO. If, instead, you simply use GOTO directly, you can skip all the stack manipulation. Either way, the GOTO has to start from the beginning of the program and will be going to the same destination. That means it will take the same length of time to execute that GOTO regardless of its proximity to its destination.

Essentially, based on these results and Allen’s results, it seems that there is no real benefit to using the GOTO/GOTO option. However, there is a good chance that the benchmark structure we have used is introducing enough noise in the results that we are missing some critical detail. It would probably be worth running equivalent benchmarks with 10,000 or 100,000 iterations for each test and running the tests a couple of dozen times each in order to see if there is any signfiicant difference between the ON…GOTO and ON…GOSUB cases. However, that starts getting into statistical analyisis and is probably not worth it unless your program really depends on squeezing the absolute maximum performance out of what you are benchmarking.

Additional Comments

There are a couple of additional comments I want to make before ending this.

First, you may recall that during a line number search, the actual cost of depends only on the number of lines to search. The length of the lines has no impact. That would suggest that compacting the program into as few lines as possible would be a total win. That will often be the case, of course. However, executing either a GOTO or GOSUB (or an implied GOTO as part of THEN or ELSE) requires scanning byte by byte to the end of the line in order to determine if a forward scan would be more beneficial. That means if the GOTO or GOSUB is near the start of a long line, any speed benefit from compacting the program into as few lines as possible may be negated.

Next, you should always benchmark any change you make to improve performance. As seen above in the benchmarks, sometimes something that should make things faster doesn’t have a measurable impact. You would expect from the tests with GOSUB/RETRN vs GOTO/GOTO, that ON…GOTO/GOTO would be slower and ON…GOSUB/RETURN and yet the particular benchmark used by both Allen and me shows no measurable difference. That is not to say it wouldn’t show a difference in a different coding environment. However, you shouldn’t assume that just because one has more overhead than the other that it is that obvious overhead that is the important factor. In fact, the overhead of executing GOSUB vs GOTO is pretty small when you look at the actual ROM code so it’s relatively easy to understand why that would disappear in the noise when combined with a much more expensive operation like evaluating an expression.

Basically, the overall point is that just because it seems logical that something should be faster, it doesn’t mean that is, or that the speed increase will be enough to be worth it.

Finally, you need to consider whether your benchmarking process is actually affecting the results of the benchmark. If you can think of multiple ways to benchmark something, do it. That way, you will have a better idea of the real impact of a change. Also related to this, you should consider whether your benchmark yielded a real difference or not, or if the lack of difference is real. This may require statistical analysis and, so, may not be worth it. In this case, for instance, it would likely be overkill.

Leap Second on Dec 31. Sigh.

Yet again, we have a leap second being added to UTC to further complicate everyone’s lives. Well, that might be overstating it, but it sure complicates the lives of server and network administrators, among others. The notion is that leap seconds are required to keep UTC in sync with Earth’s rotation and to prevent our clocks eventually being so far out of sync that solar noon will be at midnight. That notion is wrongheaded in the extreme, though.We would simply use some adjustment to get from UTC to local time once it started getting far enough out of sync. Local time would still continue to be approximately related to mean solar time. Continue reading “Leap Second on Dec 31. Sigh.”

The “St. Ives” Riddle

I’m sure almost everyone has heard the “St. Ives” riddle in one form or another. It goes as follows:

As I was going to St. Ives,
I met a man with seven wives,
Each wife had seven sacks,
Each sack had seven cats,
Each cat had seven kits:
Kits, cats, sacks, and wives,
How many were there going to St. Ives?

There are a few variations to the above. This discussion is based on the above text as written so any criticism bringing in other versions or what have you is not relevant.

There are a few different answers for it. The general consensus seems to be that the correct answer is one. My assertion is that the general consensus is wrong. My reasoning generally parallels the reasoning that leads to the conclusion that one is the correct answer but the final conclusion differs.

First, the narrator is going to St. Ives. Normally, if you meet someone on the road, it’s because they are going in a different direction or are not going anywhere at all. (It seems that “meet” had a much tighter definition in the time when the riddle was first framed so this is actually reasonable.) In either of those cases, the only mentioned person or thing going to St. Ives would be the narrator. Thus, the answer is one, correct? After all, we know the narrator is going to St. Ives. Except that doesn’t fit. The rhyme specifically calls out “kits, cats, sacks, and wives” in the question. Note that it does not include the man or the narrator! That means that neither the narrator nor the man with the wives can be included in the answer.

The other assumptions I made to arrive at the above are:

  • The narrator is not a wife. That is a reasonable assumption but there is no actual evidence to support it. If the narrator happens to be a wife, that allows you to justify an answer of one. However, bringing in unstated information is generally not considered valid for a riddle since that would allow any random answers to be justified.
  • The second to last line is not there for mere decoration or to fill out the rhyme. That is a reasonable assumption since doing anything other than considering the entire text is cherry picking and that can be used to defend all manner of answers.

You could argue that everyone is going to St. Ives depending how you interpret “met”. Considering the age of this particular riddle, it’s reasonable to assume that “met” refers to oncoming traffic. If, however, we apply a looser modern interpretation of “met”, perhaps the narrator caught up with the man’s party, which is not unreasonable if he is travelling with seven wives. That would mean everyone is going to St. Ives. In that case, you would have to do the calculation and arrive at 2800 (the total number of kits, cats, sacks, and wives). Again, one suggested answer for this circumstance is 2801 but that’s not defensible at all, even if you do interpret things to include the narrator and the man. In that case, the answer would be 2802. However, as noted above, the question specifically enumerates the kits, cats, sacks, and wives so the man and the narrator should not be included. That means only 2800.

I should note at this point that there is a variation where the narrator only meets the seven wives and there is no mention of the man. In that case, if you count everyone and everything, then 2801 would be valid. However, as long as the man is mentioned, 2801 cannot be defended.

Civilization V – Venice

With the impending Civilization VI release, I thought it would be amusing to do a few posts about Civilization V, particularly because it looks like I won’t be able to purchase Civilization VI when it is released because all signs point to there being no Linux port. In fact, rumour has it there won’t ever be one and the reported reasons for that are complete BS. But that’s beside the point.

Just recently, I decided to finally play a game as Venice. I set it up as the one city challenge since, why not. Venice is only allowed to found one city anyway. The only difference for Venice in the one city challenge is that it is not possible to control additional cities (puppets). Continue reading “Civilization V – Venice”

Apache mod_proxy_fcgi and php-fpm

To anyone who’s been following developments in the web hosting world, the existence of FastCGI shouldn’t come as a surprise. Neither should the fact that the PHP folks provide a FastCGI process manager called php-fpm. Apache 2.4 comes with mod_proxy_fcgi which makes connecting all of that to Apache relatively easy (for sufficiently recent versions of Apache 2.4). What isn’t easy to find is a nice, simple, recipe for plumbing it all together without undesirable side effects. Continue reading “Apache mod_proxy_fcgi and php-fpm”