Why a Rare Event Can Feel Ordinary, and a Likely One Can Stun You
Surprise is not how rare an event is but how far it moves the model in your head, which is why a one-in-a-million event can bore you and a likely one can floor you.
Surprise is not the improbability of an event but the gap between your predictive model and what occurs. The load-bearing case is a violated prediction: an outcome the model could represent but had assigned a low probability, which a Bayesian update then absorbs. This is why Bayesian surprise (how much an observation shifts your beliefs) predicts human attention better than Shannon rarity (how improbable the event was on its own), and why it should predict learning better as well. An astronomically rare event that moves no belief passes unnoticed, while a likely one that overturns a strong belief astonishes.
Roll a fair die and get a four. The odds were one in six, and you feel nothing. Now watch a coin you have flipped a thousand times come up the way it always has, except this time you had quietly bet your house it would land the other way. The likely outcome stuns you. The unlikely one bores you. So surprise is not simply rarity. It is something about the relationship between what happened and the model you were carrying.
Surprise lives in the model, not in the event
The cleanest way to see this comes from work on what grabs human attention. Laurent Itti and Pierre Baldi (2009) drew a sharp line between two things people often blur. One is Shannon's notion: how improbable an outcome was, on its own. The other they called Bayesian surprise: how much an observation shifts your beliefs, measured as the distance between what you believed before and what you believe after. Their result is the useful part. What pulls human eyes toward a spot on a screen is the second quantity, not the first. An event can be astronomically rare and move your beliefs not at all, and you will skip right over it. An event can be ordinary in raw probability and overturn your picture, and you will stare.
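To make the two quantities concrete, here is a minimal sketch over a small discrete set of hypotheses. The numbers and function names are illustrative assumptions of mine, not values from Itti and Baldi, and the belief shift is taken as the Kullback-Leibler divergence from prior to posterior, which is one standard way to measure the "distance" described above.

```python
import math

def posterior(prior, likelihoods):
    """Bayes' rule over a discrete list of hypotheses."""
    joint = [p * l for p, l in zip(prior, likelihoods)]
    evidence = sum(joint)  # probability the model as a whole gave the observation
    return [j / evidence for j in joint], evidence

def kl_bits(post, prior):
    """KL divergence D(post || prior) in bits: the size of the belief shift."""
    return sum(q * math.log2(q / p) for q, p in zip(post, prior) if q > 0)

def surprises(prior, likelihoods):
    """Return (Shannon surprise, Bayesian surprise) in bits for one observation."""
    post, evidence = posterior(prior, likelihoods)
    return -math.log2(evidence), kl_bits(post, prior)

# Case A (assumed numbers): a one-in-a-million observation that every
# hypothesis predicts equally badly. It is rare, but it cannot tell the
# hypotheses apart, so the posterior equals the prior.
rare_shannon, rare_bayes = surprises([0.5, 0.5], [1e-6, 1e-6])

# Case B (assumed numbers): a far less rare observation that the favored
# hypothesis predicts poorly and the disfavored one predicts well.
ordinary_shannon, ordinary_bayes = surprises([0.9, 0.1], [0.1, 0.9])

print(f"A: rarity={rare_shannon:.2f} bits, belief shift={rare_bayes:.2f} bits")
print(f"B: rarity={ordinary_shannon:.2f} bits, belief shift={ordinary_bayes:.2f} bits")
```

The point of the sketch is the ordering, not the particular values: case A is much rarer than case B, yet only case B moves the beliefs, so the two measures rank the same pair of events in opposite orders.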
This fits the larger account of the brain as a prediction engine. On that view perception is a running match between incoming signal and top-down prediction, and the brain works to minimize the mismatch, the prediction error (Clark, 2013). Surprise, in the precise sense, is that mismatch: the gap between the model's expectation and what arrived. There is a sibling formalization worth keeping separate from the belief-shift measure above: the improbability of the outcome under your model, the negative log of the probability your model assigned, the quantity Karl Friston's framework places at the center of self-organizing systems (Friston, 2010). These are two different quantities. One measures how far an observation moves your beliefs; the other measures how unlikely the observation was under the model you held. The attention result is what adjudicates between them, and it favors the belief-shift measure, which is why that is the measure we lean on. The load-bearing words in both are "under your model." Surprise is always relative to the predictor.
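Written out, the two formalizations look like this. The notation is my own assumption for illustration: x is the observation, θ ranges over a discrete set of hypotheses inside the model, and the belief shift is written as a Kullback-Leibler divergence from prior to posterior, which is my reading of the convention rather than a quotation from either source.

```latex
\begin{align}
  % Improbability under the model: negative log of the probability assigned to x
  h(x) &= -\log p(x), \qquad p(x) = \sum_{\theta} p(x \mid \theta)\, p(\theta) \\
  % Belief shift: how far the posterior over hypotheses sits from the prior
  S(x) &= D_{\mathrm{KL}}\big(p(\theta \mid x) \,\|\, p(\theta)\big)
        = \sum_{\theta} p(\theta \mid x)\, \log \frac{p(\theta \mid x)}{p(\theta)}
\end{align}
```

Both expressions are built from the same prior p(θ) and the same likelihoods p(x | θ), which is the formal sense in which each is defined under your model. They come apart because h(x) looks only at how much probability the model gave the observation, while S(x) looks at how unevenly the hypotheses gave it.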
What most surprise really is: a broken prediction
That gives the first and primary category. Most surprise is a violated prediction about a regularity. Your model expected the pattern to continue, and it broke. The die you thought was loaded comes up fair. The friend you thought reliable forgets. The market you modeled as calm jumps. In every case the event lives inside the space of things your model could have predicted, you had assigned it a low probability, and reality handed you the low-probability branch. This is surprise that a Bayesian update can absorb: you were wrong about the odds, you adjust the odds, and next time the same event surprises you less. Everything measurable about surprise, the attention it pulls, the learning it drives, the speed at which it fades, lives here.
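The fading can be sketched with the reliable friend. This is a toy model under assumptions of my own: two hypotheses, a reliable friend who forgets 5% of the time and a flaky one who forgets half the time, and a prior of 0.9 on "reliable". None of these numbers come from the sources.

```python
import math

# Assumed chance of forgetting under each hypothesis (illustrative only).
P_FORGET = {"reliable": 0.05, "flaky": 0.5}

def update(belief, likelihood):
    """One Bayesian update; returns the posterior and the predicted probability."""
    joint = {h: belief[h] * likelihood[h] for h in belief}
    evidence = sum(joint.values())
    return {h: j / evidence for h, j in joint.items()}, evidence

def kl_bits(post, prior):
    """Belief shift D(post || prior) in bits."""
    return sum(post[h] * math.log2(post[h] / prior[h]) for h in post if post[h] > 0)

def surprise_trace(belief, n_events):
    """Feed the same event ("friend forgets") n_events times, recording both surprises."""
    trace = []
    for _ in range(n_events):
        post, evidence = update(belief, P_FORGET)
        trace.append((-math.log2(evidence), kl_bits(post, belief)))
        belief = post  # today's posterior is tomorrow's prior
    return trace

trace = surprise_trace({"reliable": 0.9, "flaky": 0.1}, 4)
for i, (rarity, shift) in enumerate(trace, 1):
    print(f"forget #{i}: improbability={rarity:.2f} bits, belief shift={shift:.3f} bits")
```

With these assumed numbers, each repetition of the same event is both less improbable under the updated model and moves the beliefs less than the one before. That is the sense in which a Bayesian update absorbs this kind of surprise.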
The second kind, sketched and held loosely
There seems to be a second kind, and I want to mark it as the more speculative half of this picture rather than lean the argument on it. Sometimes what astonishes is not a low-probability outcome inside your model, but an encounter with something your model had no frame for at all. The difference is between "I gave this a one-in-a-thousand chance and it happened" and "I had no scale on which to place this." The first is a wrong number. The second is the absence of a number.
I will not push this past what it can carry. The honest claim is only that the two feel structurally different, and that the second resists the tidy treatment the first gets: you cannot absorb it by nudging a probability, because there was no probability to nudge. Whether that difference is real or is just the first kind in unfamiliar clothes is an open problem, and everything below depends only on the first kind.
Is surprise about probability or about your expectation? A bet you can settle
The checkable claim sits in the first category, and it is already half-tested. If surprise is a model-relative gap, then a measure of how much an observation shifts a model (Bayesian surprise) should predict attention and learning better than a measure of bare rarity (Shannon surprise). Itti and Baldi found exactly this for where people look. The extension has to target things you can measure separately from the belief shift itself, or the test is circular: across learning tasks, the size of the belief shift, not the raw improbability, should predict how long the memory lasts, whether the lesson transfers to a related task, and how much the learner adjusts on the next trial. If raw rarity predicts these just as well once you control for the belief shift, the model-relative framing adds nothing and should be dropped. If it does not, surprise is a property of the gap, and the event alone never tells you how surprising it is.
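One way to run the "control for the belief shift" step is a partial correlation, sketched below. The data here are synthetic placeholders, generated so that the outcome depends only on the belief shift; they are not results and say nothing about how a real experiment would come out. The variable names (belief_shift, rarity, retention) and the choice of partial correlation over, say, a multiple regression are my assumptions.

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def partial_corr(x, y, z):
    """Correlation between x and y after linearly controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# SYNTHETIC placeholder data in arbitrary units, one value per trial.
# Rarity is made to correlate with belief shift; retention is made to
# depend on belief shift alone. A real study would measure all three.
rng = random.Random(0)
belief_shift = [rng.uniform(0, 2) for _ in range(2000)]
rarity = [s + rng.gauss(0, 0.5) for s in belief_shift]
retention = [s + rng.gauss(0, 0.5) for s in belief_shift]

rarity_given_shift = partial_corr(rarity, retention, belief_shift)
shift_given_rarity = partial_corr(belief_shift, retention, rarity)

print(f"raw rarity vs retention:           {pearson(rarity, retention):.2f}")
print(f"rarity vs retention, given shift:  {rarity_given_shift:.2f}")
print(f"shift vs retention, given rarity:  {shift_given_rarity:.2f}")
```

In this constructed world, rarity correlates with retention on its own, which is how the two measures get blurred, but that correlation collapses once the belief shift is held fixed. The bet in the text is about whether real data look like this or whether rarity keeps its predictive power after the control.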
Sources
- L. Itti & P. Baldi, "Bayesian surprise attracts human attention," Vision Research 49(10), 1295–1306 (2009). doi:10.1016/j.visres.2008.09.007.
- A. Clark, "Whatever next? Predictive brains, situated agents, and the future of cognitive science," Behavioral and Brain Sciences 36(3), 181–204 (2013). doi:10.1017/S0140525X12000477.
- K. Friston, "The free-energy principle: a unified brain theory?" Nature Reviews Neuroscience 11, 127–138 (2010). doi:10.1038/nrn2787. Source for surprise as negative log model evidence.