The wisdom of crowds is not a claim that groups are smart. It is a claim about how a system can produce accurate answers from many imperfect inputs. Start with a real but unknown truth, like the number of items in a jar, the probability of a recession, or the value of a house. Each person who estimates that truth has limited information and an imperfect way of reasoning, so each estimate equals the truth plus an error. Error is expected. The question is whether those errors cancel out or pile up.
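The truth-plus-error framing can be sketched as a small simulation. All numbers here are invented for illustration: a hypothetical jar count and an assumed noise level.

```python
import random

random.seed(42)

truth = 850  # hypothetical: the true number of items in the jar

# Each person's estimate is the truth plus a personal error,
# modeled here as independent Gaussian noise.
estimates = [truth + random.gauss(0, 120) for _ in range(50)]
errors = [e - truth for e in estimates]

avg_abs_error = sum(abs(e) for e in errors) / len(errors)
crowd_error = abs(sum(errors) / len(errors))

print(f"typical individual error: {avg_abs_error:.1f}")
print(f"error of the average:     {crowd_error:.1f}")
```

With independent noise, individual errors are large but point in both directions, so the average typically lands much closer to the truth than a typical individual does.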
Crowds become accurate when people make mistakes in different ways. What matters is not simply that people disagree, but that their errors point in different directions and come from different blind spots. This is what “diversity” really means in this context: variation in what people get wrong. Two upstream conditions create that variation. First, independence of judgment: people form their estimates without copying others. If people influence each other early, they start making the same mistake, and averaging no longer helps. Second, decentralized information: different people see different facts because they live in different places, have different experiences, or face different constraints. That naturally produces different errors. Independence keeps errors from becoming the same; decentralization supplies different inputs so errors differ in the first place.
If those conditions hold, aggregation can work. Different methods combine estimates in different ways. A simple average treats all estimates equally and cancels errors that balance around the truth. A median resists extreme outliers by finding the middle estimate. Weighted averages can give more influence to people who have been accurate before or who express higher confidence. Prediction markets use prices to aggregate information continuously as people buy and sell based on what they know. The right method depends on the problem, but all work by the same principle: because the crowd error is the average of individual errors, and because the errors differ, positive and negative errors tend to offset. Random noise shrinks as the number of estimates grows, though the gains diminish as crowds get very large, and even small crowds can outperform the best individual when conditions are reasonable. What remains after the noise cancels is the shared signal across many imperfect views. The crowd result is not a compromise or a consensus. It is the center of many independent attempts to measure the same thing.
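These aggregation rules, and the shrinking of noise with crowd size, can be sketched as follows. The truth, the noise level, and the "track record" weights are all invented for illustration.

```python
import random
import statistics

random.seed(0)
TRUTH, NOISE = 100.0, 30.0

def simulate(n):
    """n independent estimates: truth plus individual error."""
    return [TRUTH + random.gauss(0, NOISE) for _ in range(n)]

estimates = simulate(200)

simple_mean = statistics.mean(estimates)  # treats everyone equally
middle = statistics.median(estimates)     # resists extreme outliers

# Weighted average, with invented "track record" weights.
weights = [random.uniform(0.5, 1.5) for _ in estimates]
weighted = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

def avg_crowd_error(n, trials=200):
    """Average error of the crowd mean over many repeated crowds."""
    return sum(abs(statistics.mean(simulate(n)) - TRUTH)
               for _ in range(trials)) / trials

# Error shrinks roughly as 1/sqrt(n), with diminishing returns.
for n in (5, 50, 500):
    print(f"n={n:3d}  average crowd error = {avg_crowd_error(n):.2f}")
```

A hundredfold increase in crowd size only cuts the average error by about a factor of ten, which is the diminishing return the text describes.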
Incentives matter in two separate ways. First, they affect whether people report what they really believe. Even if someone has useful information, they may hide it or shade it if the reward is social approval, group membership, or appearing confident. That makes errors line up and ruins cancellation. Incentives tied to accuracy reduce this problem by rewarding honest estimates, including unpopular ones, and by discouraging careless guessing. Second, incentives affect whether anyone does the work needed to improve what can be known. Many truths are not reachable without effort: collecting data, testing ideas, building tools, and checking results. If there is little reward for doing that work, people stay inside the current understanding and the crowd can only average guesses. When incentives reward discovery, they push people to produce new information and better methods. Over time that turns previously unknown facts into observable facts, which then spread unevenly through the population and give the crowd something real to aggregate.
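One standard way to tie incentives to accuracy is a proper scoring rule. The Brier score, sketched below, penalizes the squared gap between a reported probability and the 0/1 outcome; its defining property is that a forecaster minimizes their expected penalty by reporting what they actually believe. The 0.7 belief is an arbitrary example value.

```python
def brier(report, outcome):
    """Squared-error penalty; lower is better. outcome is 0 or 1."""
    return (report - outcome) ** 2

def expected_penalty(report, belief):
    """Expected penalty when the event truly occurs with
    probability `belief` but the forecaster reports `report`."""
    return belief * brier(report, 1) + (1 - belief) * brier(report, 0)

belief = 0.7
honest = expected_penalty(0.7, belief)   # report the true belief
shaded = expected_penalty(0.9, belief)   # overstate confidence
hedged = expected_penalty(0.5, belief)   # retreat to the safe middle

print(f"honest: {honest:.3f}  shaded: {shaded:.3f}  hedged: {hedged:.3f}")
```

Honest reporting gives the lowest expected penalty; both exaggerating confidence and hedging toward the middle cost more, which is exactly the property that protects honest, even unpopular, estimates.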
This also explains why crowds in the distant past could be wrong about scientific questions. The problem was not that they lacked aggregation. The problem was that the needed observations and concepts did not exist. When a whole population shares the same deep blind spot, errors are not balanced around the truth. They are biased in the same direction. Averaging then produces a confident wrong answer. Crowds can refine and combine information that exists in fragments, but they cannot produce correct answers when the relevant evidence or tools are missing.
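The shared-blind-spot failure can be sketched by giving every estimate the same bias on top of individual noise. The truth, bias, and noise level are invented.

```python
import random
import statistics

random.seed(1)

truth = 100.0
shared_bias = -40.0  # invented: a blind spot the whole population shares

# Every estimate carries the same bias plus independent noise.
estimates = [truth + shared_bias + random.gauss(0, 10) for _ in range(1000)]

crowd = statistics.mean(estimates)
stderr = statistics.stdev(estimates) / len(estimates) ** 0.5

# Averaging cancels the noise but not the shared bias:
# the crowd is precise (small standard error) and wrong.
print(f"crowd estimate: {crowd:.1f} +/- {stderr:.2f}   truth: {truth}")
```

The standard error shrinks with crowd size just as before, but the estimate converges confidently on the biased value rather than the truth.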
Recognizing when conditions are met is harder than describing them. Independence violations are often invisible: people may think they are reasoning alone while actually echoing the same sources, reacting to the same frames, or deferring to the same authorities. Shared blind spots are even harder to detect from inside, because they feel like common sense rather than bias. This is why crowds can fail quietly, producing narrow confidence intervals around wrong answers. External checks help: comparing crowd estimates to ground truth when available, testing whether new information changes the distribution of estimates in expected ways, and watching for signs that estimates are more correlated than the underlying information justifies.
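The last check, looking for excess correlation, can be sketched by comparing two estimators across many questions: when both lean on the same shared source, their errors correlate far more than independent reasoning would produce. All distributions here are invented.

```python
import random
import statistics

random.seed(2)

truths = [random.uniform(50, 150) for _ in range(100)]  # many questions

def correlation(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = (sum((x - mx) ** 2 for x in xs) *
           sum((y - my) ** 2 for y in ys)) ** 0.5
    return cov / var

# Independent estimators: private noise only.
a_ind = [t + random.gauss(0, 20) for t in truths]
b_ind = [t + random.gauss(0, 20) for t in truths]

# Echoing estimators: both absorb the same shared-source error.
shared = [random.gauss(0, 20) for _ in truths]
a_echo = [t + s + random.gauss(0, 5) for t, s in zip(truths, shared)]
b_echo = [t + s + random.gauss(0, 5) for t, s in zip(truths, shared)]

def errs(est):
    # Correlate errors, not raw estimates: the truths themselves vary.
    return [e - t for e, t in zip(est, truths)]

print(f"independent error correlation: "
      f"{correlation(errs(a_ind), errs(b_ind)):.2f}")
print(f"echoing error correlation:     "
      f"{correlation(errs(a_echo), errs(b_echo)):.2f}")
```

Note that the check must be run on errors rather than raw estimates: the estimates of any two competent people correlate simply because the truths vary from question to question.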
So the useful domain of wisdom of crowds is broad but not unlimited. It works best for estimation and forecasting problems where many people each have partial information and can make independent judgments, and where there is a clear way to combine their estimates. It works poorly when people are copying one another, when everyone is relying on the same source, when incentives reward conformity, or when the truth is not yet reachable because the evidence or methods do not exist. In short, crowds become accurate when independence and decentralized information produce different errors, aggregation cancels those errors, and incentives both protect honest reporting and encourage the work that expands what can be known.