"I judge wisely, as if nothing ever surprise me." – 4th Chamber, GZA
Errors, errors everywhere
The previous two posts on surprise are geared more towards a lay audience. The following essay digs further into the concept and its implications for AI and human cognition. The question we'll consider here is: what function does surprise serve? The feelings and responses humans are endowed with have purpose. We don't know why we laugh, for instance, or dream, but their study is guided by the knowledge that these phenomena have a function. The same must be the case for surprise. Its relationship to inference and learning means that deciphering its function is of some importance to AI research.
Alarm, sympathetic nervous system, fight or flight
In humans and other living organisms, surprise clearly serves as an alarm. The unexpected is treated as a threat, even if it turns out to be glad tidings. The surprise response therefore engages mechanisms in the body meant to confront a threat: increased heart rate and respiration, slowed digestion, dilated pupils, etc. As an alarm bell, surprise can be a lifesaver, prompting us to run or strike. This function alone would justify the existence of surprise, and it is easily formalized by Shannon surprise. But is there more to the usefulness of surprising events?
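Shannon surprise (surprisal) is simple to state: the surprise of an event is the negative log of its probability, so rarer events ring a louder alarm. A minimal sketch:

```python
import math

def shannon_surprise(p):
    """Shannon surprise (surprisal) of an event with probability p, in bits."""
    return -math.log2(p)

# A fair coin landing heads (p = 0.5) is mildly surprising: 1 bit.
# A 1-in-1000 event carries far more surprisal: ~9.97 bits.
print(shannon_surprise(0.5))    # 1.0
print(shannon_surprise(0.001))  # ~9.97
```

Note that this quantity only measures how unexpected an event was; it says nothing about whether the event taught you anything.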
The computer scientist’s view
Artificial intelligence efforts focus on learning algorithms, so questions of prediction, inference, and uncertainty naturally arise in the quest for ideal simulation. What happens when a simulated agent encounters the unexpected? Or rather, what should we tell it to do when the unexpected happens? Well, we can tell it to make some noise, a warning. In some contexts, we should tell it to flee. But some researchers have reckoned it should really just learn from this prediction error1: that surprise is only that which can alter, refine, or update a previously held model of the environment. And if that sounds familiar, it's because such a surprise has been formalized as a Bayesian update, an information gain captured by the Kullback-Leibler divergence. Do human brains make use of surprise this way?
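Bayesian surprise measures the distance between what the agent believed before an observation and what it believes after: the Kullback-Leibler divergence from prior to posterior. A minimal sketch over a discrete belief:

```python
import math

def kl_divergence(posterior, prior):
    """D_KL(posterior || prior) in bits: the information gained by a Bayesian update."""
    return sum(q * math.log2(q / p) for q, p in zip(posterior, prior) if q > 0)

# An agent's prior belief over two hypotheses, and its posterior after an observation.
prior = [0.5, 0.5]
posterior = [0.9, 0.1]
print(kl_divergence(posterior, prior))  # ~0.531 bits gained
```

If the observation leaves the belief unchanged, the divergence is zero: on this account, an event that teaches you nothing is not surprising, no matter how rare it was.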
Surprise in the human brain – two jobs
Well, it’s been found that they do. An MRI study using a learning task found one response correlating with Shannon surprise – the alerting surprise – and another with information gain, the “Bayesian” surprise2. Beautiful work. But the problem with Bayesian surprise is that it relies exclusively on a probabilistic framework, with the (likely correct) assumption that Learning is Forever, or at least lifelong. But what about when we are surprised at an outcome that has no possibility of being altered – when a surprising event comes out of a discrete action, when the model space is reduced to one round, with no clear expectation of the future? Do we still spend brain money on gaining information from it?
Certainty in humans
Is there such a thing as certainty? According to Cromwell’s rule3, probability assignments should exclude 0 and 1, and it is true that a real terminal state is hard to honestly imagine (except for death, which is even harder to imagine). Even at the outcome, there could be a mistake4,5. Bayesian accounts of information gain should therefore work, because they always call on a posterior probability. But humans operate under the guise of certainty. We don’t doubt the nightly cover of darkness, for instance. We postulated that the conscious human therefore constructs certainty, because not doing so is more taxing for the person and her brain. We have enough conscious decisions to make without the burden of checking for highly improbable astronomical events. So what happens to a posterior probability that violates Cromwell’s rule? Well, we fiddled with the math to get some account of information gain in the case of certainty, or a deterministic outcome, allowing us to circumvent the DKL (for those of you who are more interested, it resembles a state prediction error). And it turns out that the brain will expend neural cash on surprise and on this information gain even when there is nothing explicit to gain from the latter – that is, even when there’s no learning to be done4.
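The paper's exact formulation isn't reproduced here, but one way to see the issue is this: when the posterior collapses to certainty on a single outcome, the KL divergence from the prior reduces to the Shannon surprisal of that outcome, while a state-prediction-error-style quantity simply measures how unexpected the observed state was. A hedged sketch (the function names and the 1 − p form are illustrative, not the paper's definitions):

```python
import math

def info_gain_at_certainty(prior, outcome):
    """When the posterior is a point mass on `outcome`, D_KL(posterior || prior)
    collapses to the Shannon surprisal of that outcome: -log2 p(outcome)."""
    return -math.log2(prior[outcome])

def state_prediction_error(prior, outcome):
    """A state-prediction-error-style alternative (illustrative only):
    how unexpected the observed state was under the prior."""
    return 1.0 - prior[outcome]

prior = [0.7, 0.2, 0.1]
print(info_gain_at_certainty(prior, 2))   # ~3.32 bits
print(state_prediction_error(prior, 2))   # ~0.9
```

Both quantities stay finite at a deterministic outcome, which is the point: there is still something to measure even when there is no posterior left to update.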
It’s as if the brain hoards information, like money in a bank account, just in case.
When you’re no longer surprised
One may cease to be surprised after seeing it all, perhaps with age and wisdom, like GZA6 at the top of this post. With time, there is less to learn, less information to gain. With time, rare events become more frequent, flattening any presumed Gaussian curve in your priors. So we’d expect less surprise as we get older. But if you don’t see brain signals of surprise, it could also be a sign of neural dysfunction. Alternatively, an exaggerated neural response to surprise – in both the alarm and the information update – could also be a harbinger of an underlying neural disorder. Functional brain imaging is still not a standard diagnostic procedure, but it could be in the future, and the profile of someone’s surprise signals could tell us something about an overt behavioral disorder, such as addiction, gambling, or anxiety.
Implications for AI
And what do the neuro-inspired AI researchers think of this? I don’t know. Parsimony is a good thing in simulation projects, and there is no clear reason why a machine might want to hoard information “just in case”. Similarly, with the alarm function of surprise, it might be worthwhile to focus only on threatening surprises (such as an unexpected obstacle on the road) – that is, surprise with a negative valence, rather than surprise that comes from human joy, amazement, ecstasy, or awe. It’s a fascinating problem to think about: what should we endow our machines with, given our aims for the machine and the trade-off in energy costs?
tl;dr: In addition to figuring out how surprising and how rewarding an event is, the brain will also gain information from that event, even when it doesn’t obviously need to. This neural response to information gain absent a need to learn suggests the brain is curious, an explorer that values information even when there’s no immediate use for it.
- Baldi, P., & Itti, L. (2010). Of bits and wows: A Bayesian theory of surprise with applications to attention. Neural Networks, 23(5), 649-666.
- O’Reilly, J. X., Schüffelgen, U., Cuell, S. F., Behrens, T. E., Mars, R. B., & Rushworth, M. F. (2013). Dissociable effects of surprise and model update in parietal and anterior cingulate cortex. Proceedings of the National Academy of Sciences, 110(38), E3660-E3669.
- https://en.wikipedia.org/wiki/Cromwell%27s_rule
- Loued-Khenissi, L., & Preuschoff, K. (2020). Information theoretic characterization of uncertainty distinguishes surprise from accuracy signals in the brain. Frontiers in Artificial Intelligence, 3.
- https://www.youtube.com/watch?v=Svd4BnLlN9M
- https://www.youtube.com/watch?v=3Fe3y8RLBgU