Thursday, January 3, 2013

The big D

This article focuses on an interesting and much overlooked issue that has occasionally come up as a concern at my lab's meetings. It doesn't specifically touch on our main research concern (that large amounts of data can contain patterns that are merely artifacts of randomness). However, it nicely illustrates a point that the layman (and the undiscriminating scientist) forgets: patterns are not a conclusion, and numbers are not a guaranteed forecast.

Are We All Being Fooled by Big Data?

When, on a summer Sunday morning in 1987, three hundred thousand people crammed onto the central span of San Francisco’s Golden Gate Bridge, they came perilously close to participating in the largest accident in American history. The bridge's engineers had made copious calculations and had designed it to sway nearly 28 feet and shoulder the burden of hundreds of vehicles. But nobody had ever predicted that a gigantic crowd of pedestrians, attracted by the fiftieth anniversary of its opening, would be stuck between its towering pylons unable to move in any direction. As a result, the bridge flattened out and came within whiskers of straining every last fiber of its vermilion superstructure.
The consequences of faulty data, wonky forecasts, ill-conceived opinions, loose predictions, incorrect assumptions and, in the case of the Golden Gate Bridge, an improbable event form the backbone of Nate Silver’s absorbing new book, The Signal and the Noise: Why Most Predictions Fail but Some Don’t. This book, written by the voice behind the popular election forecasting blog, FiveThirtyEight, now licensed by the New York Times, is a reminder that while data doesn’t lie, it does allow people to deceive themselves and others. In some cases it's a question of the bigger the data, the grander the deception.
These days our entire lives revolve around predictions. Government departments project the cost of health exchanges, the rate of economic growth, next year’s crop yields, the future birth rate and the arms buildup of unfriendly countries. Websites and retailers anticipate what we want to find and buy; oil companies gauge the best sites for drilling; pharmaceutical companies assess the probable efficacy of molecules on a disease; while, in the background, the bobble-heads on television incessantly spew out largely irrelevant and inaccurate forecasts. In the meantime, we busy ourselves with personal projections. How long will our commute take? When will the turkey be golden? How much will the price of a stock rise? What will the future value be of a law degree?
Some of these forecasts are surprisingly accurate while others are shockingly dismal. Silver, who has become the Woody Allen of statisticians, explains the reasons. Like many others, the 34-year-old Silver became fascinated with numbers because of a boyhood devotion to baseball. Unlike his peers, Silver – after a brief and frustrating spell as a consultant – instinctively returned to the challenges of numbers. He took up internet poker (only to eventually discover that the odds were not in his favor) and also started to unravel the riddles presented by data.
There are events that – at least on the surface – defy forecast: things that are so outlandish or improbable that, for most people at one time or another, they seem inconceivable. Think of Pearl Harbor, 9/11, Fukushima, a black President of the United States or Apple as the world’s most valuable company. Yet all, to varying extents, were possible to predict if people had been able to separate the important from the trivial (a.k.a. the signal from the noise) and make the giant leap of faith which converts the improbable into the possible. While Silver provides a supple assessment of the reasons we struggle to comprehend these sorts of possibilities, the majority of his book is devoted to an often-hilarious account of how we deal with more mundane challenges.
About ten years ago, Silver developed a system for predicting the performance of pitchers and hitters for Baseball Prospectus. The exercise helped him develop his approach to predictions. It is no coincidence that Silver fastened on both baseball and politics. In each pursuit there is an enormous trove of accurate, historical information. The baseball fiend can immerse himself in minutiae such as hits, on-base percentages and pitches thrown, while the political junkie can stare at votes recorded, demographic shifts and polling results. Silver gradually discovered that in baseball the data, while essential, could be made richer with the judicious application of human judgment. This must have come as a reassuring endorsement for baseball scouts whose usefulness had been much maligned in the years following the publication of Moneyball, Michael Lewis’ much-read book about the way data had helped Billy Beane transform the Oakland A’s. After all, it is difficult for a machine to measure the determination, pluck, grit (and wandering eye or fondness for drink) of a baseball player.
The same goes for politics, the field in which Silver made his reputation with his accurate predictions about the 2008 races (which he subsequently burnished in 2012). Here he bases many of his predictions on the averages of poll results conducted by others. This, he has discovered, provides more accurate forecasts for election nights than reliance on a single pollster, no matter how sterling the reputation. When Silver does stray from the received wisdom, he does so with caution and says, “The further I move away from consensus, the stronger my evidence has got to be … that I have things right.” This is an observation worth dwelling upon because it helps explain why most people have such trouble making the correct decision about an unconventional selection or the path less trodden. Making a decision frowned upon by a committee or a popular opinion is a lonely place to be.
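The arithmetic behind averaging polls can be sketched in a few lines of Python. This is not Silver's model, only the textbook sampling formula, and it rests on loud assumptions: every poll is an unbiased simple random sample, the polls are independent of one another, and they are pooled by sample size. The function names and the sample sizes are made up for illustration.

```python
import math

def margin_of_error(sample_size, share=0.5, z=1.96):
    """Approximate 95% margin of error for one simple random sample poll.

    share is the candidate's assumed true vote share (0.5 is the worst case).
    """
    return z * math.sqrt(share * (1 - share) / sample_size)

def pooled_margin_of_error(sample_sizes, share=0.5, z=1.96):
    """Margin of error when independent, unbiased polls are pooled.

    Assumption: pooling n independent polls behaves like one poll whose
    sample size is the sum of the individual sample sizes.
    """
    return margin_of_error(sum(sample_sizes), share, z)

# Hypothetical numbers: one poll of 1,000 people versus five such polls.
single = margin_of_error(1000)
pooled = pooled_margin_of_error([1000] * 5)
print(f"single poll: +/-{single:.1%}, five pooled polls: +/-{pooled:.1%}")
```

Under these assumptions the error shrinks with the square root of the number of polls. Real polls share biases, so the gain in practice is smaller than this sketch suggests.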
Accurate information married with human judgment is the best ally for the prognosticator. This explains why some forecasts, such as those for hurricanes, are so good and others, such as economic predictions, are so poor. Thanks to a knowledge of past catastrophes, satellite photography, weather balloons and airplanes that fly into the eye of the storms, the National Hurricane Center can predict the path and severity of hurricanes with remarkable certainty several days in advance of when they collide with land. This information, enhanced by the analysis of scientists, has improved the National Hurricane Center’s forecasting accuracy by 350% in the past 25 years. The fact that 1,833 people died when Hurricane Katrina swamped New Orleans is not because of faulty forecasting, but mainly because the city’s Mayoral office hesitated about ordering a compulsory emergency evacuation until it was too late. According to Silver, weather forecasting for the subsequent two or three days, at least as promulgated by the National Weather Service (before it falls into the buffoonish hands of the local TV weathermen for whom ratings are more important than accuracy), is also something that can be counted on.
Economic forecasting is another matter. Part of the reason that predictions about hurricanes and the weather have improved is that scientists, mathematicians and programmers can build computer models from accurate molecular data of cloud formations. The same is not true for the economy where attempts to capture every calorie of economic endeavor are much harder. Even the U.S. government – irrespective of whether a Democrat or Republican is at the helm – has proved woefully inept at forecasting overall GDP growth let alone more refined measures. It’s not uncommon for economic forecasters to fail to predict recessions even after they are already underway. It’s a wonder that any bank or company bothers to keep an economist on the payroll. They all might be better off employing the descendants of Carnac the Magnificent, the soothsayer from the East once played by Johnny Carson.
While economists have plenty of excuses, the same does not go for the rating agencies that, prior to the housing collapse, so conspicuously labeled the thousands of mortgages they bundled together as relatively riskless. Even if you are prepared to accept that officials at S&P, Moody’s and Fitch were merely guilty of a failure of judgment – as opposed to criminal collusion – they made the colossal mistake of not recognizing the consequences of uncertainty (a risk that is hard to measure): the close correlation between all these mortgages. They did not understand that they had designed a monstrous, nationwide pileup of concrete, glass and wood. It’s no coincidence that these same rating agencies are today all involved in designing a future economic calamity: the implosion of municipal, state and corporate pension obligations. In this case they are even more culpable because they are willfully ignoring copious amounts of stock market data which, if heeded, would instantly catapult these pension systems into default.
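A toy calculation shows why correlation matters so much here. The Python sketch below compares two extreme assumptions about a bundle of mortgages: defaults that are completely independent, and defaults that are perfectly correlated (if one fails, all fail). The 5% default rate and the bundle of five loans are illustrative assumptions, not figures from the rating agencies, and the function names are hypothetical.

```python
def prob_all_default_independent(p, n):
    """Chance that all n loans default if each defaults independently
    with probability p."""
    return p ** n

def prob_all_default_perfectly_correlated(p, n):
    """Chance that all n loans default if they rise and fall together:
    one default implies every default, so n no longer matters."""
    return p

# Illustrative assumption: 5 mortgages, each with a 5% chance of default.
p, n = 0.05, 5
independent = prob_all_default_independent(p, n)
correlated = prob_all_default_perfectly_correlated(p, n)
print(f"independent: {independent:.2e}, perfectly correlated: {correlated:.2f}")
print(f"risk understated by a factor of {correlated / independent:,.0f}")
```

Reality sits somewhere between the two extremes, but treating closely linked loans as independent can understate the risk of a total wipeout by several orders of magnitude.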
If an economist might deserve some pity, it’s a teaspoonful compared to what should be given to those charged with making an accurate prediction of the timing and strength of an earthquake. These hapless devils don’t have accurate pictures of geological formations dozens of miles below the earth’s crust or reams of data supplied by probes latched to different striations. Nonetheless, in our data-drenched age, the geologist is still somehow expected to provide certainty about a cataclysmic event that may last a matter of seconds. That’s especially true in Italy, where, in the wake of the 2009 quake that killed over 300 people in the central Italian town of L’Aquila, six scientists and a government official were found guilty of manslaughter and sentenced to six years in jail for not protecting their neighborhood. The sentencing magistrates, like the Japanese in the ninth century, must just believe that earthquakes can be accurately predicted from the behavior of catfish.
