The multiple-comparisons problem crops up whenever a researcher sifts through a large set of possible patterns. Take, for example, a randomized trial in which vitamins are given to some primary-school children and placebos to others. Do the vitamins work? That depends entirely on what we mean by "work." The researchers measure the children's height, weight, tooth cavities, classroom behavior, and test scores. Then there are subgroups to check: do the vitamins affect the poorer kids, the wealthier kids, the boys, the girls?
Test enough different correlations and chance findings will drown out the real discoveries.
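A minimal simulation makes the point concrete. Suppose the vitamins do nothing at all, and we test many independent outcomes at the usual 5% significance level; roughly 5% of them will come out "significant" by chance alone. The test function, names, and parameters below are illustrative assumptions, not taken from any real trial:

```python
import math
import random

random.seed(42)

def two_sample_z(a, b):
    """Two-sided z-test for equal means, assuming unit variance in both groups."""
    n, m = len(a), len(b)
    z = (sum(a) / n - sum(b) / m) / math.sqrt(1 / n + 1 / m)
    # two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

N_OUTCOMES = 200  # height, weight, cavities, ... crossed with subgroups
ALPHA = 0.05

false_positives = 0
for _ in range(N_OUTCOMES):
    vitamins = [random.gauss(0, 1) for _ in range(50)]  # no real effect
    placebo = [random.gauss(0, 1) for _ in range(50)]   # same distribution
    if two_sample_z(vitamins, placebo) < ALPHA:
        false_positives += 1

print(false_positives)  # expect roughly ALPHA * N_OUTCOMES, i.e. about 10
```

A standard remedy is to tighten the threshold in proportion to the number of tests (a Bonferroni-style correction, `ALPHA / N_OUTCOMES`), at the cost of missing smaller real effects.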
There are ways to deal with this, but the problem gets worse in large data sets, because there are vastly more possible comparisons than there are data points to compare. Without careful analysis, the ratio of genuine patterns to bogus patterns quickly approaches zero.
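To see why the ratio collapses, consider a data set of pure noise with more variables than observations: the number of variable pairs grows quadratically, so "strong" correlations appear by chance alone. This is a hypothetical sketch with made-up sizes and thresholds:

```python
import math
import random

random.seed(1)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

N_ROWS, N_VARS = 20, 60  # far more variables than data points
data = [[random.gauss(0, 1) for _ in range(N_ROWS)] for _ in range(N_VARS)]

# count variable pairs whose sample correlation looks "strong"
strong = sum(
    1
    for i in range(N_VARS)
    for j in range(i + 1, N_VARS)
    if abs(pearson(data[i], data[j])) > 0.5
)
n_pairs = N_VARS * (N_VARS - 1) // 2  # 1770 pairs from only 20 observations
print(strong, "of", n_pairs, "pairs correlate 'strongly' by chance")
```

Every one of those hits is bogus by construction; in a real data set of this shape, any genuine pattern would be hidden among dozens of equally impressive accidents.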
Worse still, one of the remedies for the multiple-comparisons problem is transparency: letting other analysts know how many hypotheses were tested and how many negative results are sitting in file drawers because they didn't seem interesting enough to publish. But cloud companies aren't about to share their data with you any time soon.