Multiplayer Games and Floating Point – William Astle's General Clutter

Multiplayer games are quite popular. That statement is not likely to be controversial. What might be controversial is my assertion that game developers are implementing network based multiplayer incorrectly. The sheer number of desynchronization bugs, especially between players on different platforms, in games I’ve been familiar with over the years leads me to believe this is a much harder problem for many developers than it would seem on the surface. Here, I’m going to discuss one major source of problems: floating point numbers.

I suppose it is worth a brief explanation of what floating point numbers are. If you think back to your dimly remembered grade school days, you may remember encountering something about a thing called scientific notation. In scientific notation, you have a mantissa, which is usually a relatively small number of digits with the decimal point after the first digit, followed by multiplication by a power of 10. These numbers look something like 1.74×10⁵ or 9.3201×10³². The advantage of this system is that it can represent numbers that would otherwise have dozens or hundreds of digits in a compact form. The tradeoff is loss of precision. Scientific notation comes along with the notion of significant digits, which means some number of digits are considered to be significant, or important, and numbers are rounded to that number of digits. For many uses of scientific notation, that is useful and sensible, but it does introduce errors in calculations relying on the rounded values, and those errors accumulate as more steps are introduced.

Floating point numbers in computers are represented exactly the same way except the mantissa is in binary and it is multiplied by a power of 2. A fixed number of bits is allocated to the mantissa, giving a built in limit to significant digits. A standard 32 bit IEEE 754 floating point value has 8 bits for the binary exponent, 23 bits for the mantissa and 1 bit to represent the sign. Due to some technical details (the leading bit of a normalized binary mantissa is always 1, so it does not need to be stored), that gives 24 bits of mantissa precision. Because that leading bit is fixed, there are only 2²³ possible sets of digits that can be represented for each possible binary exponent. I won’t go into the details here, however. The critical point is that there are limited values possible due to the limited number of bits.
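To make the layout concrete, here is a minimal sketch in Python that pulls the three fields out of a 32 bit float. The helper names `float32_fields` and `to_float32` are my own inventions for illustration; only the standard `struct` module is used. It also shows the limit in action: every integer up to 2²⁴ survives a trip through a 32 bit float, but 2²⁴ + 1 does not.

```python
import struct

def float32_fields(x):
    """Return (sign, biased exponent, stored mantissa) of x as a 32 bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", float(x)))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF       # 23 stored bits; the leading 1 is implied
    return sign, exponent, mantissa

def to_float32(x):
    """Round a Python float (64 bit) to the nearest 32 bit float value."""
    return struct.unpack(">f", struct.pack(">f", float(x)))[0]

print(float32_fields(1.0))                  # (0, 127, 0)
print(float32_fields(-0.75))                # (1, 126, 4194304)
print(to_float32(2**24) == 2**24)           # True: still exact
print(to_float32(2**24 + 1) == 2**24 + 1)   # False: rounded to 2**24
```

Python’s own floats are 64 bit, which is why the sketch goes through `struct` to force the 32 bit rounding.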

To further complicate things with floating point, decimal fractions that we expect to be representable nicely as terminating decimals end up being repeating binaries. For instance, 1/10 ends up being 0.00011001100110011… with the 0011 part repeating forever. Depending on the number of bits available for the fraction in a floating point, the end result could be slightly high or slightly low. Normally, printing out floating point numbers rounds them to a useful number of digits which tends to hide this type of weirdness. However, this contributes to accumulated errors over time in calculations relying on such numbers.
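A quick way to see this for yourself, sketched in Python (whose floats are 64 bit rather than 32 bit, so the exact digits differ from the 32 bit case, but the effect is the same):

```python
from decimal import Decimal
from fractions import Fraction

# The value actually stored for 0.1 is not one tenth.
print(Decimal(0.1))                        # prints the exact stored value
print(Fraction(0.1) > Fraction(1, 10))     # True: this one lands slightly high

# Adding it ten times does not give exactly 1.0.
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)                        # False

# Counting in integer tenths has no such problem.
tenths = 0
for _ in range(10):
    tenths += 1
print(tenths == 10)                        # True
```

Ten additions are already enough to miss the expected result. A game simulation that does this every tick has far more opportunities to drift.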

There is one final wrinkle to add in. Different platforms may use different floating point hardware with different behaviour for edge cases. It may even behave slightly differently for such mundane things as converting floating point values to and from text, or exactly how rounding happens. And that isn’t including bugs in floating point implementations, compiler variances such as generating different bit patterns for the same floating point constant, and so on.

Okay, so I’ve gone on for some length about floating point numbers. They are not the root of all evil or anything like that. They are extremely useful, and even necessary in a lot of cases. However, when you need a deterministic result across multiple platforms over a network, floating point should be avoided. That’s not to say that floating point should never be used, but when used naïvely, it will almost certainly lead to problems or hard to trace bugs.

Now let me look at several multiplayer designs that I have seen in use.

Central Server With Thin Clients. In this case, the main game controller is on a central server and the game client is just serving as an interface with the user. In this case, go ahead and use floating point all you want when building the central server since nothing needs to arrive at the same result independently. However, using floating point in the network communication protocol is asking for trouble that is mysterious and very difficult to debug.
Server With Heavy Clients. In this case, there will be a server that arbitrates the multiplayer game (which might be one of the players) but each player’s game program runs the game for itself at some level. This is usually done to reduce the amount of traffic on the network or to reduce the processing load on the server. However, it does require all the different client software to arrive at the same state from the same initial conditions with no drift due to accumulated calculation errors. This model is used by such games as OpenTTD and Civilization VI. When things go wrong, you get desynchronizations and other mysterious inconsistencies that are nearly impossible to debug. In fact, in the case of Civilization VI, simply using a different compiler to build the game for different platforms led to massive incompatibilities for cross platform multiplayer.
Shared Save Game. You would think it wouldn’t be an issue here at all, but if you serialize any data in the save game in floating point, you could end up with some subtle variations. However, it will generally be less of a problem as long as it doesn’t introduce outright bugs in game play. Even so, writing floating point out to the save game is asking for trouble if it’s anything important to the game mechanics.
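As a small illustration of how a save game can change a value, here is a sketch in Python. The save format is hypothetical (a value written as text with six decimal places); the point is only that the number read back is not the number that was written, while an integer survives the same trip untouched.

```python
# Hypothetical save format: a value written as text with six decimal places.
value = 0.1 + 0.2                  # stored as 0.30000000000000004
saved = f"{value:.6f}"             # the text that lands in the save file
loaded = float(saved)
print(saved, loaded == value)      # 0.300000 False

# The same quantity kept as an integer count of tenths.
tenths = 1 + 2
saved_int = str(tenths)
loaded_int = int(saved_int)
print(saved_int, loaded_int == tenths)   # 3 True
```

If the player who saved keeps running with the original value and everyone else loads the rounded one, the two copies of the game no longer agree.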

Anyway, my point here is that using floating point for everything is a bad idea, especially when you need predictability across multiple possible platforms, which can include different compilers or different versions of the same operating system. Make sure you understand floating point before you use it for something important, especially when an integer data type will do instead.

This is not helped by the fact that Lua, a very popular embedded scripting language used by a lot of games, historically had only floating point numbers (an integer subtype only arrived in Lua 5.3). The same is true for a lot of game engines which are built entirely around floating point. With some careful programming, however, you can avoid most of the pitfalls even if you’re forced to use it or choose to for whatever reason. If you keep to integer values that fit in a signed 24 bit value, you probably won’t run into trouble there since those values are exactly representable even in a 32 bit float. Otherwise, carefully round values in a predictable way, possibly to powers of two to avoid non-terminating values, or keep your internal values multiplied by, say, 100 to represent two decimal places, and convert them back on display. Other things may also occur to you. And, above all, do not ever dump the internal binary floating point representation to data files, save games, or network protocols. Instead, use a format suitable for your needs where the reading and writing is very carefully implemented such that it is as independent as possible from the underlying binary implementation of floating point.
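Here is a minimal sketch, in Python, of the “multiply by 100” approach together with an explicit wire format. Everything in it is an assumption for illustration: the names `to_fixed`, `format_fixed`, `encode` and `decode` are made up, and the choice of a signed 64 bit little-endian integer is just one reasonable option, not a standard.

```python
import struct

SCALE = 100  # internal values are integer hundredths (two decimal places)

def to_fixed(text):
    """Parse a decimal string such as '12.34' into integer hundredths.
    Extra fractional digits are truncated, never rounded."""
    sign = -1 if text.startswith("-") else 1
    whole, _, frac = text.lstrip("+-").partition(".")
    frac = (frac + "00")[:2]   # pad or cut to exactly two digits
    return sign * (int(whole or "0") * SCALE + int(frac))

def format_fixed(value):
    """Convert integer hundredths back to text for display."""
    sign = "-" if value < 0 else ""
    whole, frac = divmod(abs(value), SCALE)
    return f"{sign}{whole}.{frac:02d}"

def encode(value):
    """Serialize as a signed 64 bit little-endian integer (assumed format)."""
    return struct.pack("<q", value)

def decode(data):
    """Inverse of encode()."""
    return struct.unpack("<q", data)[0]

# Ten additions of 0.10 give exactly 1.00 on every platform.
total = sum(to_fixed("0.10") for _ in range(10))
print(format_fixed(total))               # 1.00
print(decode(encode(total)) == total)    # True
```

No floating point value is ever created, so there is nothing for different hardware or compilers to disagree about. Spelling out the byte order in the format string also means the bytes written do not depend on the machine doing the writing.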

So, TL;DR, don’t use floating point if you can avoid it. Period. That way lies madness.
