Wednesday, July 28, 2010

Galloping errors

kw: observations, analysis, mathematics

I had occasion to consider the Logistic Equation today, recalling an analysis I began to do some years ago. At the time I had a DEC VAX 7600 available to me, and its FORTRAN language supported quad precision calculations. I am presently confined to ordinary "double precision", the 48-bit-plus-exponent reals used in Excel and most implementations of C or Perl. So I took a new approach.

I was interested in error propagation. Forty-eight bits of precision seems like a lot, and it approximates 15 decimal digits. But equations like the Logistic Equation can be pathological because you are subtracting quantities in the range [0,1] from 1.0, over and over. The equation has the form:
Xn+1 = r Xn (1 - Xn)
As long as you choose a starting X between 0 and 1 and an r between 0 and 4, the equation won't blow up, because X(1-X) will be of the order of 0.25 or less.

This equation is famous for displaying chaotic behavior when r exceeds 3.57. For smaller values, it has various "attractors" that either settle down to an asymptote, or jump between two, four, or more numbers. But in the chaotic region, it never settles down (See this Wikipedia article for a full explanation).

When I began this analysis, I was wondering to what extent the chaotic behavior is a function of rounding error. As I wrote above, fifteen digits of precision is a lot. But in a recursive process, rounding errors accumulate. To see how quickly they might show up, and how quickly they would dominate the process, I started by calculating 100 recursions using r = 3.625 and X0 = 0.5, using Excel 2010. Then I did the same thing using the ROUND function, first rounding every calculation to seven digits, then rounding to twelve digits. An example function, for those who want to check, is
=$B$6*ROUND(E12,12)*ROUND(1-E12,12)
B6 is where r was, and E12 was the prior X. I subtracted the results from the "full precision" column to produce this chart:
I had originally planned to increase the table size as needed, but found that 100 rows was plenty. The 7-digit-precision rounding errors become 1% of the data values at the 55th iteration, and the 12-digit-precision errors get that large at the 98th iteration. By the 66th iteration, rounding error has taken over the 7-digit calculation, and there is little doubt that the 12-digit calculation will be essentially taken over at about the 110th iteration.

Can there be any doubt that the 15-digit calculation has little chance of remaining reasonably error-free as long as 150 iterations? Yet I frequently see graphs showing the logistic equation's results for thousands of iterations. The majority of the results on such charts are clearly bogus. We need better ways of handling such calculations. I chose 0.5 and 3.625 carefully, since both are exactly representable in binary real variables. Yet the equations were swamped by rounding rather quickly. I suspect even a 16x precision (~480 bits) calculation using traditional styles of "real" arithmetic will not survive to the 1000th iteration.

I am thinking about ways to resolve the dilemma, but for the moment, I can do little more than present it.

No comments: