Plato’s Cave, Ignorability, and Learning from Shadows

Plato’s Allegory of the Cave is usually interpreted as a story about the distinction between appearance and reality. The prisoners, chained inside the cave, see only shadows cast on a wall. Mistaking these shadows for reality itself, they remain ignorant of the world that lies behind them. The philosopher, according to Plato, is the person who turns away from the shadows, leaves the cave, and discovers the true nature of things.

This interpretation has dominated philosophical discussions for more than two millennia. Yet there is another aspect of the allegory that deserves attention. Before the prisoner can leave the cave, he faces a more immediate problem. He must somehow make sense of the world using only the information available to him. The shadows may be incomplete and distorted, but they are not entirely devoid of structure. They contain patterns, regularities, and relationships. The question is therefore not only whether the shadows differ from reality, but also whether anything meaningful can be learned from them.

This question has a surprisingly modern character. In many areas of contemporary science, direct access to the underlying reality is impossible. Scientists frequently investigate quantities that cannot be observed directly and, in some cases, cannot be observed even in principle. Instead, they rely on indirect observations, proxies, indicators, and traces. The challenge is not to escape from the world of appearances, but rather to determine how much can be learned from appearances alone.

Consider the case of Alzheimer’s disease. Suppose a scientist wishes to estimate a patient’s cognitive ability before the onset of the disease. In many instances, this quantity is no longer observable. The disease has altered the patient’s cognitive state, and the original level of functioning may never have been measured. In statistical language, the quantity of interest has become a missing value.

At first glance, this situation appears hopeless. If the original cognitive ability cannot be observed, how can it possibly be estimated? Yet scientists routinely make such assessments. They do so not by relying on a single observation, but by exploiting a network of correlated information. Educational attainment, occupational history, medical records, and long-term behavioral observations all contribute evidence. Particularly informative are close relatives. Spouses often resemble one another with respect to educational attainment, intellectual interests, socioeconomic status, and many predictors of cognitive performance. Children provide information as well, owing to shared genetic inheritance and family environment. By combining these various sources of information, one can form a reasonable estimate of a quantity that is itself unobservable.

The important point is that the scientist is not attempting to reconstruct reality from a single shadow. Rather, multiple shadows are available, and each contains partial information about the hidden quantity. The hidden reality remains inaccessible, but it leaves traces throughout a larger system of observable variables. Statistical inference exploits these traces.
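The value of having several shadows rather than one can be illustrated with a small simulation. The sketch below is only a toy under stated assumptions: the hidden quantity is standard normal, each proxy equals the hidden quantity plus independent noise, and a calibration sample in which the hidden quantity is known is available for fitting. The function name `proxy_rmse` and all of its parameters are hypothetical and chosen for illustration.

```python
import numpy as np


def proxy_rmse(n_proxies, n=20_000, noise_sd=1.0, seed=0):
    """Error made when predicting a hidden quantity from noisy proxies.

    Assumptions (illustrative only): the hidden quantity is N(0, 1) and
    every proxy is the hidden quantity plus independent Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    hidden = rng.normal(size=n)
    noise = rng.normal(scale=noise_sd, size=(n, n_proxies))
    proxies = hidden[:, None] + noise

    # Design matrix with an intercept column.
    X = np.column_stack([np.ones(n), proxies])

    # First half: calibration sample where the hidden value is known.
    # Second half: the hidden value is used only to score the prediction.
    half = n // 2
    coef, *_ = np.linalg.lstsq(X[:half], hidden[:half], rcond=None)
    predicted = X[half:] @ coef
    return float(np.sqrt(np.mean((predicted - hidden[half:]) ** 2)))


if __name__ == "__main__":
    for k in (1, 4, 8):
        print(k, "proxies -> RMSE", round(proxy_rmse(k), 3))
```

Under these assumptions the prediction error shrinks as proxies are added, although it never reaches zero: the hidden quantity stays hidden, but it is pinned down more tightly.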

This observation leads naturally to one of the most important concepts in modern statistical theory: ignorability. The term was introduced in the foundational work of Little (1976) and Rubin (1976) on missing data. Despite its name, ignorability does not mean that missing information is unimportant. Instead, it refers to situations in which valid likelihood-based or Bayesian inference can proceed without explicitly modelling the process that generated the missingness. Roughly speaking, this is possible when the missingness depends only on quantities that have been observed. The missing quantity remains hidden, yet the observed data contain sufficient information to support reliable conclusions.
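For readers who prefer symbols, the idea can be written compactly. What follows is a standard textbook formulation in the spirit of Rubin (1976), given as a sketch; the notation is an assumption introduced here and does not appear in the papers discussed in this essay in exactly this form.

```latex
% Notation (assumed here for illustration):
%   Y = (Y_obs, Y_mis)  complete data, split into observed and missing parts
%   R                   indicators recording which values are missing
%   \theta              parameters of the data model
%   \phi                parameters of the missingness mechanism

% Missing at random: the mechanism depends on the data only through Y_obs.
\[
  p(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \phi)
    = p(R \mid Y_{\mathrm{obs}}, \phi)
  \quad \text{for all } Y_{\mathrm{mis}} .
\]

% Under this condition the observed-data likelihood factorizes:
\begin{align*}
  p(Y_{\mathrm{obs}}, R \mid \theta, \phi)
    &= \int p(Y_{\mathrm{obs}}, Y_{\mathrm{mis}} \mid \theta)\,
            p(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \phi)\,
            dY_{\mathrm{mis}} \\
    &= p(R \mid Y_{\mathrm{obs}}, \phi)\, p(Y_{\mathrm{obs}} \mid \theta).
\end{align*}
```

If, in addition, the two sets of parameters are distinct (in a Bayesian analysis, independent a priori), the first factor carries no information about the parameters of the data model, and inference can rest on the second factor alone. That is the sense in which the missingness mechanism can be ignored.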

The significance of this idea became apparent to me many years ago while working with Daniel Gianola on problems arising in quantitative genetics. In a study published in 2003, we examined the conditions under which selection could be regarded as ignorable when inferring unobserved genetic quantities. Our work was based on ideas developed by Sorensen, Fernando, and Gianola (2001), who had proposed a Bayesian framework for inferring the trajectory of genetic variance during a selection process. A central requirement of their approach was that the selection process be ignorable in the technical sense established by Rubin’s missing-data theory.

The problem was fundamentally epistemological. Breeding values cannot be observed directly. Mendelian sampling effects cannot be observed directly. Genetic variances cannot be observed directly. Nevertheless, these quantities influence observable outcomes and leave traces in the data. The question was whether the available information was sufficient to permit reliable inference about these hidden variables. In some situations the answer was affirmative. In others, omitted information, incomplete pedigrees, or inappropriate modelling assumptions led to biased conclusions. The hidden reality could not always be recovered successfully from the available shadows.
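A toy simulation shows how omitted information can produce the kind of bias described above. It is not the Bayesian procedure of Sorensen, Fernando, and Gianola (2001), nor the test examined in our 2003 report; it is a minimal sketch assuming two correlated, normally distributed records, where the second record is obtained only for individuals selected on the first. The function name `selection_demo`, the correlation, and the threshold are hypothetical.

```python
import numpy as np


def selection_demo(n=200_000, rho=0.6, threshold=0.5, seed=1):
    """Estimate the mean of a second record that exists only for selected individuals.

    Assumptions (illustrative only): both records are standard normal with
    correlation rho, so the true mean of the second record is 0.
    """
    rng = np.random.default_rng(seed)
    y1 = rng.normal(size=n)  # first record, observed on everyone
    y2 = rho * y1 + np.sqrt(1 - rho**2) * rng.normal(size=n)

    # Selection is based on the first record only.
    selected = y1 > threshold

    # Analysis that omits the record on which selection was based.
    naive = y2[selected].mean()

    # Analysis that includes it: regress y2 on y1 among the selected,
    # then average the predictions over all individuals.
    slope, intercept = np.polyfit(y1[selected], y2[selected], 1)
    adjusted = (intercept + slope * y1).mean()

    return {"true": 0.0, "naive": float(naive), "adjusted": float(adjusted)}


if __name__ == "__main__":
    print(selection_demo())
```

In this sketch the naive estimate is biased upward, whereas the estimate that uses the record on which selection was based lands close to the true value. The correction works only because the relationship between the two records is linear here and because everything that drove the selection is included in the analysis.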

Seen from this perspective, Plato’s Cave acquires an unexpected contemporary relevance. Plato assumed that shadows were epistemically deficient because they concealed the true objects behind them. Modern statistics does not dispute the existence of hidden realities, but it approaches the problem differently. The crucial question is not whether the shadows are incomplete. Of course they are. The crucial question is whether the shadows preserve enough information about reality to permit useful inference.

The same issue appears in modern causal inference. The Rubin Causal Model begins with the recognition that every individual possesses multiple potential outcomes, only one of which can ever be observed. If a patient receives Treatment A, we observe the outcome under Treatment A, but we never observe what would have happened under Treatment B. The missing outcome is not merely difficult to obtain; it is fundamentally unavailable. Yet causal inference attempts to learn about these unseen possibilities by studying systematic patterns in observed data. Once again, scientists find themselves reasoning from shadows.
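The logic of potential outcomes can also be made concrete. The sketch below simulates both outcomes for every individual, something possible only in a simulation, and then reveals just one of them according to a coin flip. The numbers, the function name `randomized_trial`, and the assumption of purely random treatment assignment are illustrative choices, not part of any study cited here.

```python
import numpy as np


def randomized_trial(n=100_000, seed=2):
    """Compare the true average effect with an estimate from observed data.

    Assumption (illustrative only): treatment is assigned by a fair coin,
    independently of both potential outcomes.
    """
    rng = np.random.default_rng(seed)

    # Both potential outcomes exist for everyone, but only in the simulation.
    y_a = rng.normal(loc=1.0, size=n)
    y_b = y_a - 0.5 + rng.normal(scale=0.5, size=n)
    true_effect = float(np.mean(y_a - y_b))

    # Each individual reveals exactly one outcome.
    gets_a = rng.random(n) < 0.5
    observed = np.where(gets_a, y_a, y_b)

    # Difference in observed group means.
    estimate = float(observed[gets_a].mean() - observed[~gets_a].mean())
    return true_effect, estimate


if __name__ == "__main__":
    true_effect, estimate = randomized_trial()
    print("true average effect:", round(true_effect, 3))
    print("estimate from observed outcomes:", round(estimate, 3))
```

No individual effect is ever observed, yet under random assignment the difference in group means tracks the average effect closely. Without randomization, or without some other assumption that makes the assignment ignorable, the same calculation offers no such guarantee.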

What is remarkable is that modern statistical theory often reaches a conclusion opposite to Plato’s. Plato regarded escape from the cave as the necessary path to knowledge. Statistical science suggests that under certain conditions, complete escape may be unnecessary. Reliable knowledge can sometimes be extracted from indirect observations alone. The hidden reality remains hidden, yet the information preserved in observable variables may be sufficient for meaningful inference.

This does not imply that all shadows are equally informative. The work of Rubin, Little, Sorensen, Gianola, and many others demonstrates that successful inference depends on strong structural conditions, for example that the missingness depends only on information that was observed and included in the analysis. Some forms of missingness are benign; others are not. Some hidden quantities can be estimated with considerable accuracy; others remain stubbornly elusive. In the language of Plato’s allegory, some caves are rich in information, while others are poor.

For this reason, I suspect that the deepest connection between Plato’s Cave and modern statistics is not induction, Bayesianism, or frequentism. It is the problem of learning from incomplete information. Plato recognized that human beings often encounter reality only indirectly. Modern statistical science accepts this predicament and asks a further question: under what conditions do the shadows contain enough information to support reliable knowledge?

The prisoners in Plato’s cave lacked probability theory, missing-data models, causal inference, and computational statistics. We possess these tools. Yet our intellectual situation may not be as different from theirs as we imagine. Like the prisoners, we rarely observe reality directly. Unlike them, we have learned that shadows can be surprisingly informative.

Jorjani, H., & Gianola, D. (2003). A Test of Ignorability of Selection under the Infinitesimal Model: A Preliminary Report. Interbull Bulletin, 31, 37–44. https://journal.interbull.org/index.php/ib/article/view/968

Little, R. J. A. (1976). Inferences about means from incomplete multivariate data. Biometrika, 63(3), 593–604. https://doi.org/10.1093/biomet/63.3.593

Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592. https://doi.org/10.1093/biomet/63.3.581

Sorensen, D., Fernando, R., & Gianola, D. (2001). Inferring the trajectory of genetic variance in the course of artificial selection. Genetical Research, 77(1), 83–94. https://doi.org/10.1017/S0016672300004845
