by Judith Curry
Update: see new cartoon by Josh at the bottom of the post
The massive amounts of data necessary to deal with complex phenomena exceed any single brain’s ability to grasp, yet networked science rolls on.
I haven’t read the book, but David Weinberger gives a perspective on it in an essay he wrote for The Atlantic entitled “To Know But Not Understand: David Weinberger on Big Data.” Some excerpts:
In 1963, Bernard K. Forscher of the Mayo Clinic complained in a now famous letter printed in the prestigious journal Science that scientists were generating too many facts. Titled “Chaos in the Brickyard,” the letter warned that the new generation of scientists was too busy churning out bricks — facts — without regard to how they go together. Brickmaking, Forscher feared, had become an end in itself. “And so it happened that the land became flooded with bricks. … It became difficult to find the proper bricks for a task because one had to hunt among so many. … It became difficult to complete a useful edifice because, as soon as the foundations were discernible, they were buried under an avalanche of random bricks.”
There are three basic reasons scientific data has increased to the point that the brickyard metaphor now looks 19th century. First, the economics of deletion have changed. We used to throw out most of the photos we took with our pathetic old film cameras because, even though they were far more expensive to create than today’s digital images, photo albums were expensive, took up space, and required us to invest considerable time in deciding which photos would make the cut. Now, it’s often less expensive to store them all on our hard drive (or at some website) than it is to weed through them.
Second, the economics of sharing have changed. The Library of Congress has tens of millions of items in storage because physics makes it hard to display and preserve, much less to share, physical objects. The Internet makes it far easier to share what’s in our digital basements. When the datasets are so large that they become unwieldy even for the Internet, innovators are spurred to invent new forms of sharing. The ability to access and share over the Net further enhances the new economics of deletion; data that otherwise would not have been worth storing have new potential value because people can find and share them.
Third, computers have become exponentially smarter. John Wilbanks, vice president for Science at Creative Commons (formerly called Science Commons), notes that “[i]t used to take a year to map a gene. Now you can do thirty thousand on your desktop computer in a day.”
The brickyard has grown to galactic size, but the news gets even worse for Dr. Forscher. It’s not simply that there are too many brick-facts and not enough edifice-theories. Rather, the creation of data galaxies has led us to science that sometimes is too rich and complex for reduction into theories. As science has gotten too big to know, we’ve adopted different ideas about what it means to know at all.
The problem — or at least the change — is that we humans cannot understand systems even as complex as that of a simple cell. It’s not that we’re awaiting some elegant theory that will snap all the details into place. The theory is well established already: Cellular systems consist of a set of detailed interactions that can be thought of as signals and responses. But those interactions surpass in quantity and complexity the human brain’s ability to comprehend them. The science of such systems requires computers to store all the details and to see how they interact. Systems biologists build computer models that replicate in software what happens when the millions of pieces interact. It’s a bit like predicting the weather, but with far more dependency on particular events and fewer general principles.
Models this complex — whether of cellular biology, the weather, the economy, even highway traffic — often fail us, because the world is more complex than our models can capture. But sometimes they can predict accurately how the system will behave. At their most complex these are sciences of emergence and complexity, studying properties of systems that cannot be seen by looking only at the parts, and cannot be well predicted except by looking at what happens.
Aiming at universals is a simplifying tactic within our broader traditional strategy for dealing with a world that is too big to know: reducing knowledge to what our brains and our technology enable us to deal with.
With the new database-based science, there is often no moment when the complex becomes simple enough for us to understand it. The model does not reduce to an equation that lets us then throw away the model. You have to run the simulation to see what emerges. We can model these and perhaps know how they work without understanding them. They are so complex that only our artificial brains can manage the amount of data and the number of interactions involved.
No one says that having an answer that humans cannot understand is very satisfying. The world’s complexity may simply outrun our brain’s capacity to understand it.
Model-based knowing has many well-documented difficulties, especially when we are attempting to predict real-world events subject to the vagaries of history; a Cretaceous-era model of that era’s ecology would not have included the arrival of a giant asteroid in its data, and no one expects a black swan. Nevertheless, models can have the predictive power demanded of scientific hypotheses. We have a new form of knowing.
This new knowledge requires not just giant computers but a network to connect them, to feed them, and to make their work accessible. It exists at the network level, not in the heads of individual human beings.
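Weinberger’s point that “you have to run the simulation to see what emerges” can be illustrated with a toy example of my own (it is not from the essay): Conway’s Game of Life. The update rule below is a few lines, yet nothing in it obviously predicts the “glider,” a five-cell pattern that crawls diagonally across the grid. You only discover that behavior by running the rule.

```python
from collections import Counter

def step(cells):
    """One Game of Life generation; cells is a set of live (x, y) cells."""
    # Count the live neighbors of every cell adjacent to a live cell.
    counts = Counter((x + dx, y + dy)
                     for (x, y) in cells
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    # A cell is live next generation if it has 3 live neighbors,
    # or 2 live neighbors and is already live.
    return {c for c, n in counts.items()
            if n == 3 or (n == 2 and c in cells)}

glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
g = glider
for _ in range(4):
    g = step(g)

# After 4 generations the glider reappears shifted one cell diagonally.
shifted = {(x + 1, y + 1) for (x, y) in glider}
print(g == shifted)  # True
```

That diagonal drift is an emergent property: it is nowhere stated in the rule, and for systems far larger than this toy, running the model is the only way to see what such properties are.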
JC comment: I’ve mentioned some of these general ideas in previous posts in the context of the complexity of climate science, but I think this article explains the challenge very well. The climate community has responded to this challenge in two ways: building ever more complex models, and trying to reduce the system to something that an individual can understand (simple energy balance models and feedback analysis). IMO, more intellectual effort needs to be spent on the epistemology of “too big to know,” where much of our knowledge resides in complex computer models, so that we can better assess our level of knowledge of the climate system and our confidence in conclusions drawn from enormous data sets and complex system models.
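As a sketch of the simple end of that spectrum, here is a minimal zero-dimensional energy balance model, C dT/dt = F − λT, where T is the temperature anomaly, F a radiative forcing, λ the net feedback parameter, and C an effective heat capacity. The parameter values below are illustrative assumptions (roughly a 200 m ocean mixed layer, the commonly cited 3.7 W/m² forcing for doubled CO2, and an assumed feedback parameter), not values drawn from any particular study.

```python
C = 8.37e8    # heat capacity of ~200 m ocean mixed layer, J m^-2 K^-1 (assumed)
F = 3.7       # radiative forcing for doubled CO2, W m^-2 (commonly cited value)
lam = 1.25    # net feedback parameter, W m^-2 K^-1 (assumed for illustration)

# Forward Euler integration of C dT/dt = F - lam * T with a one-day step.
dt = 86400.0
T = 0.0
for _ in range(365 * 200):   # integrate for ~200 years
    T += dt * (F - lam * T) / C

print(round(T, 2))  # 2.96, essentially the equilibrium warming F / lam
```

The whole model collapses to one equation, and its equilibrium (F/λ = 2.96 K here) can be read off without running anything; that transparency is exactly what is lost, as Weinberger argues, when the model grows to millions of interacting pieces.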