Sunday, October 06, 2013

Looking too hard, and not looking

kw: book reviews, nonfiction, forecasting, prediction, statistics

We are remarkably good at cutting through the clutter in many situations. For example, we can talk to someone at a crowded party and pick out what they are saying in spite of the noise all around; and we can often spot a familiar face in a crowd. However, we sometimes see (or hear, etc.) things that are not there. When I was a child we would look for faces or other shapes in clouds. In a few minutes of looking, something suggestive is bound to appear. And there is a painting by my father of waves breaking on a rocky seashore. One of the big rocks looks like a leopard's head, and once I'd seen it, I have seen that leopard's head every time I glance at the painting.

My father had no intention of hiding faces in his paintings. Seeing the leopard's head is an example of a Type 1 error. If my father did actually hide faces in all his paintings, and I have noticed only this one (I have several of his other paintings), then missing the faces that are there would be Type 2 errors. If I become so rapt in searching clouds for faces that I don't notice a friend approaching until he taps me on the shoulder, I have fallen victim to both kinds of error! We lazy, sedentary Westerners tend to do this frequently. Not so someone living hand-to-mouth in the woods.

For nearly everyone, through all the one or two million years of our evolution as brainy apes, hyper-alertness was required. Where it matters most, a Type 1 error does no harm, but a Type 2 error might be fatal. Running from a rock that looks like a leopard can make you look silly, but not running from a leopard that looks like a rock will probably get you eaten. Strangely, though we have kept our strong propensity to make Type 1 errors, as the risk of not noticing a real leopard has fallen, we are more and more likely to make Type 2 errors. In our modern world, in which we increasingly rely on forecasts and predictions, this leads to trouble.

Nate Silver, in his new book The Signal and the Noise: Why So Many Predictions Fail – But Some Don't, presents a number of similar examples that display our modern tendency to pick faces out of clouds while ignoring the approaching friend (or foe). I'll simplify matters and mention that he finds successful forecasting in only two areas: weather and baseball. Politics and stock picking and a number of other areas come in for a drubbing.

This simple diagram tells me all I need to know about "technical analysis" of stock prices. The data are the day-to-day percent changes in the price of DuPont stock, from 1962 to mid September of this year. That's just over 13,000 data points. The X axis is the change on any particular day, and the Y axis is the change on the following day. This diagram shows essentially zero correlation! It is a 2-D bell curve, though with thicker tails than a Gaussian bell curve.
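
Out of curiosity, here is a minimal sketch of the computation behind such a diagram, assuming you have a series of daily closing prices. I fake one with a random walk just to make the sketch runnable; the real test would use the actual DD closes:

    import numpy as np

    # Stand-in data: a 13,000-day random walk. Replace with real closing prices.
    rng = np.random.default_rng(0)
    closes = 100 * np.cumprod(1 + rng.normal(0.0003, 0.015, 13_000))

    returns = 100 * np.diff(closes) / closes[:-1]   # day-to-day percent changes
    x, y = returns[:-1], returns[1:]                # change on day n vs. day n+1
    print("lag-1 correlation:", np.corrcoef(x, y)[0, 1])

For a true random walk the printed correlation hovers near zero, which is just what the DuPont scatter shows.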

During those 51 years, the stock rose nearly 4,200%. That averages out to 7.7% per year but only about 0.03% daily. Someone who bought $1,000 of DD stock in early January 1962 would have $43,000 today. Now, there's been a lot of inflation. That $1,000 in 1962 had the buying power of $7,740 today. So a half-century of waiting produced an effective multiplier of 5.5. That's about 3% yearly after adjusting for inflation. Better than the bank.
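
Those annualized figures can be checked in a few lines, using only the dollar amounts quoted above:

    nominal = (43_000 / 1_000) ** (1 / 51) - 1      # ~7.7% per year
    real = (43_000 / 7_740) ** (1 / 51) - 1         # ~3% per year after inflation
    daily = (43_000 / 1_000) ** (1 / 13_000) - 1    # ~0.03% per trading day
    print(f"{nominal:.1%}  {real:.1%}  {daily:.3%}")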

The most extreme daily jumps are -20% and +10%. Stock speculators, particularly day traders, dream of taking advantage of the many days that a stock's price changes more than a percent or two. And such days are more common than if the distribution were strictly Gaussian. DuPont stock moves up at least 2.5% in a day about 5% of the time, and downward with similar frequency. That means, if you could pick just those up days, about 12 days each year, you could earn at least a 20% return yearly. That's 2-3 times what a buy-and-hold strategy will earn. Then, look at this:


The chart shows the historical record of DuPont stock, adjusted for splits. Focus on late 1974, late 1987, and late 2008 to early 2009. These show DD following the herd during market crashes, and represent downturns of 50%, 41% and 65%, respectively. If you could have avoided them, by selling just at the peak and buying back in at the bottom, your final value would be 9.69 times as great, a total of about $416,000! Adjusted for inflation, that's over 8% return yearly (12.5% dollar-for-dollar yearly return).
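
Both figures are easy to check. A sketch using the numbers above (the three crash depths; the last line also rechecks the earlier "12 good days" arithmetic):

    multiplier = 1.0
    for drop in (0.50, 0.41, 0.65):
        multiplier /= 1 - drop          # dodging a drop multiplies your stake by 1/(1-drop)
    print(multiplier)                   # ~9.69
    print(43_000 * multiplier)          # ~$416,000
    print((1 + 0.025) ** 12 - 1)        # twelve +2.5% days compound to ~34% in a year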

Such figures stoke the dreams of day traders. But the first chart, showing no day-to-day correlation, dashes those dreams. Day traders work very hard for little return, and most lose. Some lose, big time, and some gain, but it is by accident either way. There are millions of day traders and other stock speculators. As Churchill wrote, "Even a fool is right once in a while."

Now we must differentiate prediction from forecasting. A prediction is a flat statement that a specific happening will or will not occur at some time or in some time horizon. For example, "There will be a magnitude 7 earthquake in Fremont within the coming year." A proper forecast includes the forecaster's uncertainty and is stated in probabilistic terms, as, "Projecting the trend of earthquakes in Fremont indicates that an earthquake of magnitude 7 or greater occurs about 3 times every 200 years." [Fremont was the imaginary State in the novel Space by James A. Michener]. One might add to such a forecast a hybrid statement such as, "Fremont has not experienced an earthquake of magnitude greater than 6 in the past 100 years," which implies that "the big one" may be overdue. Or it may indicate that conditions deep down are changing.

Earthquake prediction is the poster child of unpredictable phenomena. Intense study and research over decades, even centuries, have failed to yield a single valid prediction. Sports betting is close behind, except in the arena of baseball. Nate Silver created a system called PECOTA, which rates the strength of teams against one another according to the past statistics of their players, and a well-known "aging curve" of the way performance changes over a player's career. Because baseball has such a rich data set, going back a century, and the principles needed to make useful forecasts are also well known, PECOTA and similar systems can evaluate players and teams at a level nearly equal to the best scouts. The computer can't quite replicate the humans, but it does give 'em a run for the money!

Why are forecasting and prediction so hard? Even though we have randomness at the deepest level of atomic phenomena, that randomness is constrained by the statistics of large numbers, and physics works very accurately to predict many systems, such as planetary orbits. Thus, though the path of an electron after passing through a hole may be uncertain, the center of the distribution of the paths of trillions of electrons (say, a millionth of an ampere for 0.1 second or so) will be very sharply defined and can be accurately measured, and the shape of the distribution tells you additional facts: the hole's size and shape. The much larger "distribution" consisting of the atoms making up a baseball means that its flight, once thrown or batted, is easily predicted.
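
A toy illustration of that sharpening, with random numbers standing in for electron paths (nothing here is real physics, just the statistics of large numbers):

    import numpy as np

    rng = np.random.default_rng(1)
    for n in (100, 10_000, 1_000_000):
        hits = rng.normal(0.0, 1.0, n)     # where each "electron" lands
        # the spread of individual hits stays ~1.0, but the estimated
        # center of the distribution tightens roughly as 1/sqrt(n)
        print(n, round(hits.std(), 3), round(hits.mean(), 5))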

The geological setting of an earthquake is not as simple as an electron's. Perhaps this year an earthquake will occur large enough that the two sides of a fault slip past each other by half a meter. That may be enough to bring two kinds of rock into contact that were not touching before, which changes the likelihood of the next earthquake.

What about the weather? Air is in constant motion; its humidity and temperature, and thus its density, change constantly. How can anyone make a useful weather forecast? In some ways, we are still dependent on the "signs in the sky" that Jesus mentioned. In modern (18th Century) terms, "Red sky at morning, sailor take warning. Red sky at night, sailor's delight." Lore such as this is a compilation of patterns that happen over and over, so that generations of our ancestors took note and remembered. Yet now we can get a forecast up to a week or two ahead, complete with expected high and low, precipitation chances and intensity, and wind strength.

It's all done in a computer. Air may have complex behavior, but the physics of air motion and how it changes with temperature, pressure and humidity are well known. The 3D-gridded-cell models that run in supercomputers use surprisingly simple physics to determine how a cell is influenced by the 6 cells it shares a face with, and the 8 cells at its corners. The reason supercomputers are used is that Earth is big. The surface area of the planet is 4πr², where r is 6,370 km: about 510 million km². Cells half a km on a side and 0.1 km thick (up to 12 km altitude) result in a Global Circulation Model (you'll see the acronym GCM on some weather web sites) with 1/4 trillion cells. It takes a lot of calculation to determine what will happen in the next quarter hour. There are 96 quarter hours in a day, and 672 in a week. To do all those trillions and quadrillions of calculations in only an hour or two requires today's largest computers. And the forecasters' computer gurus don't run the model once; they run it several times with very small variations (the formal practice of selecting the variations is called Design of Experiments), to test the stability and sensitivity of the forecast to perturbations.
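
The cell-count arithmetic is easy to verify (these grid dimensions are the round numbers quoted above, not those of any particular model):

    import math

    area_km2 = 4 * math.pi * 6_370 ** 2    # ~5.1e8 km^2 of surface
    columns = area_km2 / (0.5 * 0.5)       # one column per half-km square
    layers = 12 / 0.1                      # 12 km of atmosphere in 0.1 km slabs
    cells = columns * layers               # ~2.4e11: about 1/4 trillion
    steps = 96 * 7                         # quarter-hour steps in a week
    print(f"{cells:.2e} cells, {cells * steps:.2e} cell-updates per simulated week")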

Weather forecasters have an incentive to get it right that others don't have. The reality is going to arrive tomorrow or the next day, it is visible to all, and it is no fun getting a call such as, "I have ten inches of 'partly cloudy' that I need to shovel off my driveway. Want to come over and help?" They also get a ton of research money from the Dept. of Defense, because good forecasts are crucial to military activities. Earth dynamic studies are different. Students of earthquakes can't observe the day-to-day conditions of a fault line. Its active zone is typically 8-15 km deep, deeper than all but a handful of boreholes ever drilled. Earthquakes are also rare. Sure, there are thousands of little ones, at the bottom of "measurable", every day, but weather events happen by the millions around the globe, every few minutes.

Mr. Silver entertains us with many, many stories of the vagaries of forecasts of all types. In the end, most phenomena are too difficult to forecast appropriately. Some involve living things. The cardinal rule of animal studies is, "Given any particular set of temperature, lighting, food availability and ambient noise, the rat will do whatever the rat wants to do." And this is in spite of lab rats being so inbred that their genetics are practically identical. The statistics of playing poker yield a few big winners, who work hard for the kind of edge they need to beat their fellow experts. But they love to be in a game that is well supplied with "fish": overconfident amateurs. A well-written computer package might tell a poker player the optimum betting strategy, but only if it is betting against other computers. The social aspects of the game, bluffing and speed or slowness of a bet for example, often provide a lot more of an edge than the math does. Carefully crafted intimidation works wonders. I don't expect a computer to master these aspects of the game for a number of decades (that's my forecast!).

The book's final example is the climate, particularly "global warming" or "climate change" or "greenhouse effect" or whatever the next buzzword will be. Climate is not weather. It is the setting in which weather happens. Climate changes unfold over decades or centuries or millennia. Weather changes take seconds. In numerical analysis, this separation of time scales is known as stiffness. When something changes suddenly, it takes time for the effects to either move elsewhere or to die down. If you are interested in something with a 5-year cycle, such as El Niño (part of the ENSO cycle), the exact location and timing of today's sudden thundershower will not matter one tiny bit. If your interest is in human-induced greenhouse warming that began in the late 1700s, ENSO is a mere irritation. In fact, weather and medium-scale cycles such as ENSO are "noise" in the context of this book's thesis. Another researcher, cited later in the book, makes a different view clear: noise is really signal too, just signal about things you aren't interested in at the moment.

This is like the crystal radio I made as a kid. It initially consisted of a long wire running to a treetop; a piece of germanium crystal; a "whisker", a wire that formed a diode with the germanium; and earphones attached to the whisker and the ground connection on the back of the germanium crystal. The diode "detected" the audio signal by separating it out of the radio frequency "hash". There was just one strong station nearby, so I could hear it pretty clearly. But later, as more stations came on the air (this was the 1950s), I could hear all of them at once. So, following a diagram in Mechanix Illustrated, I made a coil and paid a dime for a small capacitor and a piece of copper, to make a rough tuner. It could be tuned to resonate with one AM station at a time, so I could "tune out" the "noise" of the other stations. They were actually signals, just signals I didn't want right then.

The global greenhouse has warmed about 0.5°C (0.9°F) in a century, and perhaps 1°C (1.8°F) since 1750. Some of that may be warming since the Little Ice Age, which some consider a regional phenomenon, not a global one. But the current "ForecastFox for Mozilla" forecast for the next 24 hours indicates we'll have a 20°F swing tomorrow, from 75 in midafternoon to 55 overnight. You have to average out a lot of daily temperatures to see a change of a degree over 250 years. When you want weather, that is your signal. When you want climate, weather is noise, and lots of it.
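
A simulation makes the point vividly. The sketch below generates a century of synthetic daily temperatures with big seasonal and day-to-day swings plus a buried half-degree-per-century trend (all the numbers are made up for illustration), then averages the "weather" away:

    import numpy as np

    rng = np.random.default_rng(2)
    days = np.arange(365 * 100)                            # a century of days
    temps = (15
             + 10 * np.sin(2 * np.pi * days / 365)         # seasonal swing
             + 5 * rng.normal(size=days.size)              # day-to-day weather noise
             + 0.005 * days / 365)                         # 0.5 C per century trend
    annual = temps.reshape(100, 365).mean(axis=1)          # average out the weather
    slope = np.polyfit(np.arange(100), annual, 1)[0] * 100
    print(f"recovered trend: {slope:.2f} C per century")   # close to the 0.5 put in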

The science of greenhouse warming is partly very well known, and partly not so well known. I learned to replicate Arrhenius's calculations, now more than a century old, when I was a pre-teen. Actual warming since his day has been about twice what he expected, because there seem to be amplifying factors. These are very poorly known. Does more cloud cover cool the atmosphere by reflecting more sunlight, or warm it by acting as a further thermal blanket? Or does it do one thing at a certain latitude and another elsewhere? If we do have a further warming by 2 to 4°C, will it shift the Hadley Cell north, or south, or not at all? (The northern edge of the Hadley Cell is a range of latitudes characterized by dry, descending air, which forms most of the world's great deserts.) I've thought of buying land in central Canada that is currently too cold to farm. Perhaps in 20 years it will be arable…unless the Hadley Cell shifts north and dries out Canada. Then maybe the Mojave would become a tropical paradise!

Y'know how to make a complex system into a positively unsolvable mess? Make it political. Both sides of the Climate debate are so politicized that they can only talk past each other. The tiniest proposal to set any policy is vigorously fought by every vested interest, even those who might benefit (the devil you know…). Heaven help us if weather forecasting ever gets politicized! It is already true that most forecasters err on the wet side: a 20% chance of rain is reported as a 40% or even 50% chance, because the ones rained on are less likely to complain, and those who aren't will feel they dodged a bullet. What if some "weather outcomes" become more politically correct than others?

By the way, I take issue with Silver's description of statistical rain forecasts. He writes that if 40% of the computer models indicate rain in Chicago, and the rest don't, it is reported as a 40% chance of rain. Sounds logical, but it is quite different from that. The "chance of rain" has different meanings in spring (plus summer) and autumn (plus winter). Spring and summer squall lines pass through areas that are well predicted by most GCM programs. But a squall line is not a solid front of rain. It is a line of thunderstorms. A light squall line may have storms half a mile wide, spaced 2-3 miles apart, giving 20% of the area a 100% chance of rain. The forecasters just don't know which 20%, so the whole area is given a 20% chance of rain. A heavy squall line will have larger storms with closer spacing, and maxes out at about 80% coverage (though this will probably be reported as "near certain"). Fall and early winter storms tend to be solid and widespread, but subject to ripples several miles wide in the upper atmosphere. As a system rides up a ripple, it drops rain along a solid band dozens to hundreds of miles long but only about a mile wide. As it rides down, it dries out. The height of the ripples determines whether the overall chance of rain is 30% or 70% or somewhere between. The ripples drift along as system after system rides through, so it is very hard to tell exactly where the rain will fall. Timing is everything. Then, a lower-level storm that just dumps (ignoring the ripples) leads to those 100% forecasts, which are generally accurate.
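
The squall-line reading boils down to coverage times confidence, which is in fact how the National Weather Service explains its probability-of-precipitation numbers. A sketch:

    def pop(confidence, coverage):
        """Chance a given point gets wet: P(precipitation develops) x fraction of area covered."""
        return confidence * coverage

    print(pop(1.0, 0.20))   # light squall line: storms certain, 20% coverage -> "20% chance"
    print(pop(1.0, 0.80))   # heavy squall line: ~80% coverage, reported as "near certain"
    print(pop(0.5, 1.00))   # solid but uncertain system: 50% chance it rains everywhere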

In most arenas, Silver advocates Bayesian analysis over "frequentist" simulations and estimations, because it allows individualized forecasts for particular cases. An example is the probability of breast cancer in a woman in her 40s, who has just had the unwelcome news that a mammogram is "positive". The factors of a Bayesian calculation are:
  • x - the Prior Estimate: the chance that the proposition is true, before the new data arrive.
  • y - the true-positive rate: the chance the new data say "Yes" when the proposition really is true (missing such a case would be a Type 2 error).
  • z - the false-positive rate: the chance the new data say "Yes" even though the proposition is false (a Type 1 error).
The data are usually noted as percents. The formula for a new estimate (a new x) is xy/(xy + z(1-x)). For this example, take x = 1.4% (the incidence of breast cancer among women in their 40s), y = 75% (the chance a mammogram catches a cancer that is really there) and z = 10% (the chance of a false positive). Then the new estimate is (0.014 × 0.75)/(0.014 × 0.75 + 0.10 × 0.986), or about 9.6%: still long odds against, even after a positive mammogram.

In this case, the woman may wish for a needle biopsy, but a bit of blood chemistry may be in order first. Enzymes in the blood can indicate whether a new cancer is likely to be slow growing, or faster. Is it slower (the most likely case)? She can wait a year for another mammogram. If the next mammogram is positive, re-do the analysis, replacing the 1.4% with 9.6%. Now the "new x" is just over 44%, and at the very least a biopsy is indicated. Most other forecasting methods don't use multi-step refinement. And by the way, if the next mammogram is negative (and no palpation can detect a lump, or any growth in an earlier lump), the negative-result form of the update (misses occur at rate 1-y, true negatives at rate 1-z) brings the "new x" most of the way back down, to about 2.9%.
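
The whole chain of refinements fits in a few lines. A sketch using the formula above, with the example's figures (1.4% prior, 75% true-positive rate, 10% false-positive rate):

    def update_yes(x, y, z):
        """New estimate after a positive result."""
        return x * y / (x * y + z * (1 - x))

    def update_no(x, y, z):
        """New estimate after a negative result: misses occur at rate 1-y, true negatives at 1-z."""
        return x * (1 - y) / (x * (1 - y) + (1 - z) * (1 - x))

    x = update_yes(0.014, 0.75, 0.10)          # first positive mammogram
    print(f"{x:.1%}")                          # ~9.6%
    print(f"{update_yes(x, 0.75, 0.10):.1%}")  # a second positive -> just over 44%
    print(f"{update_no(x, 0.75, 0.10):.1%}")   # a negative instead -> ~2.9%, back toward the prior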

Those who follow this blog may wonder why it took me 3 weeks to read such a fascinating book. The writing is good and the examples are interesting, so that didn't slow me down. We have a lot going on, however, so I have had much less time for reading than usual. Retirement has been good to me so far, but I have to be careful not to take on too many projects at once. I completed a Real Estate course and passed the test in July. However, I will probably not seek a license or become a Realtor®, because there are simply too many other things I'd prefer to do. The changed style and reduced frequency of my posts are a similar effect. I used to post almost every lunch hour, doing research in off hours. I think I am working longer days now than when I worked! Better busy than bored. Since retiring in February, I have put 24 items in my "job jar" file. Half of them, mostly the bigger ones, have been completed. One major item is awaiting an event that is at least a year in the future, but the preparations are nearly all complete. Others are smaller, so I can take an odd half day to perform one. All things in their own time. In the meantime, I read when I can, and report what I read.
