A Butterfly's View of Probability
Thanks to Alex Cai and Haneul Shin for discussing these ideas with me. Thanks to Eric Neyman for feedback.

What do we really mean when we use the word “probability”? Assuming that the universe obeys deterministic physical laws, an event must occur with either probability 0 or 1. The future positions of every atom in the universe are completely determined by their initial conditions. So what do we mean when we make statements like “Trump has a 25% chance of winning in 2024”? It will either happen, or it won’t. From an objective, omniscient standpoint, there’s no room for doubt.
One response is to reject determinism. Maybe in Newton’s day we believed the universe was deterministic, but now we know about wave functions and Heisenberg uncertainty and all of that stuff. If we accept that there is true randomness occurring on the quantum level, then the outcome of the next election isn’t predetermined – it will depend on all of the quantum interactions that occur between now and 2024. With this view, it makes complete sense to assign fractional probabilities.
But quantum randomness is a highly non-obvious property of physics… Is there a way to make sense of probability without relying on it? In this post, I hope to outline a new way of defining the probability of a future event in a deterministic system. In contrast with the Bayesian view – in which uncertainty about an event comes from incomplete information – and the frequentist view – which relies on an experiment being repeatable – this “Butterfly’s View of probability” draws its randomness from chaos theory.
Bayesianism and Frequentism
Let’s go over these two existing views, which are by far the most commonly accepted interpretations of probability.
The Bayesian view of probability is that randomness arises from incomplete information about the world. It is impossible to be aware of the current position of every atom in the universe at once. A Bayesian reasoner embraces this uncertainty by considering probability distributions over all possible universes consistent with his observations. At the core of this philosophy is Bayesian updating: starting with a prior probability distribution, upon seeing new evidence about the world a Bayesian reasoner will update this distribution according to Bayes’ rule. The probability of an event is the proportion of universes in this distribution in which the event occurs.
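Concretely, for a hypothesis $H$ and observed evidence $D$, Bayes' rule says

$$P(H \mid D) \;=\; \frac{P(D \mid H)\,P(H)}{P(D)},$$

so the posterior is just the prior reweighted by how well each hypothesis predicted the evidence.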
Bayesianism has a lot going for it. It works great in theory, serving as the fundamental mathematical law by which a rational agent’s knowledge about the world must interact with itself. It also works great in practice, making accurate predictions as the cornerstone of modern statistics.
But there is one thing that Bayesianism lacks: objectivity. This is not to say that it is unmathematical, but rather that Bayesian probability is inherently subjective. The probability of an event is only defined relative to an agent. Alice and Bob, while both being perfect Bayesian reasoners, can assign different probabilities to Trump winning simply because they have different observations or priors. Because of this, Bayesian probability is often thought of as a “degree of personal belief” rather than an objective probability. In the real world, this is more of a feature than a bug – nobody has perfect information, and if we did then we wouldn’t care about probability in the first place. But in this post, our goal is to find an interpretation of probability that still makes sense from an objective, omniscient standpoint. The omniscient Bayesian would have a posterior probability distribution that places 100% credence on a single possible universe, eliminating all fractional probabilities – so Bayesianism falls short of this goal.
The alternative to Bayesianism is called frequentism. A frequentist defines the probability of an event to be the limit of its relative frequency as you repeat it more and more times. A coin has a 50% chance of landing heads because if you flip it 100 times, close to 50 of the flips will be heads. In contrast with Bayesianism, the frequentist view is perfectly objective: the limit of a ratio will be the same no matter who observes it.
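In symbols: if an event $A$ occurs $n_A$ times in $n$ trials of the experiment, the frequentist defines

$$P(A) \;=\; \lim_{n \to \infty} \frac{n_A}{n}.$$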
But the problem with frequentism is that it only makes sense when you’re talking about a well-defined repeatable random experiment like a coin flip. How would a frequentist define the probability that Trump wins the election? It’s not like we can just run the election 100 times in a row and take the average – by definition, the 2024 election is a non-repeatable historical event. We could consider simulating the same election over and over, but what initial conditions do we use for each trial? Frequentism doesn’t give us a recipe for how to define these simulations. This post will be my attempt to generalize the frequentist view by providing this recipe.
Bayesianism is rooted in uncertainty, so it is inherently subjective. Frequentism only applies to black-boxed repeatable experiments, so it struggles at describing events in the physical universe. Now I present a third view of probability that solves these two problems. I call this the Butterfly’s View.
The Butterfly Effect
On a perfect pool table, it is only possible to predict nine collisions before you have to take into account the gravitational force of a person standing in the room.[1] Even an imperceptible change in the initial conditions becomes noticeable after just a few seconds. This is known as the “Butterfly Effect” – the idea that if you make a tiny change to a complex deterministic system, that change will propagate and compound at an exponential rate. This makes it extremely hard (though not impossible in theory) to predict the state of a chaotic physical system, even over short time periods.

Imperceptible changes to the initial conditions of a double pendulum quickly become noticeable, stolen straight from Wikipedia.
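To get a feel for the numbers, here is a minimal sketch of the pool-table calculation from footnote [1]. The ball spacing, ball radius, and initial angle error below are illustrative values I chose, not the ones from Berry's paper:

```python
# Exponential error growth on an idealized pool table: each collision
# multiplies an angular error by roughly (spacing / ball radius).
# All numbers here are illustrative assumptions, not measured values.

spacing = 1.0       # assumed meters traveled between collisions
radius = 0.028      # ball radius in meters (roughly a pool ball)
amplification = spacing / radius

error = 1e-15       # assumed initial angle error in radians (imperceptibly tiny)
for collision in range(1, 12):
    error *= amplification
    print(f"after collision {collision:2d}: angle error ~ {error:.1e} rad")
    if error > 1.0:  # error is of order a full radian: prediction has broken down
        print("trajectory is now unpredictable")
        break
```

With these made-up numbers the error passes a full radian after about ten collisions, matching the order of magnitude of the nine-collision claim.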
I believe that almost every aspect of our physical universe has the same chaotic properties as the pool table. The Brownian motion of air molecules, the complex firing patterns of neurons in our brain, and the turbulent flow of ocean currents are all extremely sensitive to changes. One tiny nudge could completely change the course of history.
How might this happen? Consider the consequences of adding a single electron at the edge of the observable universe. The gravitational pull of this electron is enough to disrupt the trajectories of all air molecules on Earth after only 50 collisions… a fraction of a microsecond. This changes the atmospheric noise that random.org uses to seed its random number generators, which changes the order in which my Spotify playlist gets shuffled[2], which subtly affects my current mental state, which causes me to write this sentence with a different word order, and so on. In a matter of minutes, human events are unfolding in a measurably different fashion than they would have had that electron never existed.
The Formalization
We can use the random-seeming chaos generated by the Butterfly Effect to define a new notion of probability in a deterministic system. Informally, the “Butterfly Probability” of an event is the percentage of small perturbations to the current universe that result in that event occurring. To be more precise, I’ve come up with the following formalization.
Let $\mathcal{U}$ be the set of all possible universes, where a “universe” $U \in \mathcal{U}$ is a complete specification of the position and velocity of every particle.
Since we’re assuming a deterministic version of physics, we have some transition function $f_t : \mathcal{U} \to \mathcal{U}$, which maps any universe to the universe it will have evolved into after $t$ seconds.[3]
Now we define a distance metric on $\mathcal{U}$, which measures how large a perturbation is.
In one operation, you can pay $c$ to translate one atom/particle by $c$ meters or to change the velocity of an atom/particle by $c$ m/s. You can also pay $m$ to add or delete an atom/particle of mass $m$. Then $d(U_1, U_2)$ is defined to be the minimum cost of any series of operations that transforms $U_1$ into $U_2$.
The exact details of $d$ don’t matter much – any reasonable way of pricing small physical perturbations would serve our purposes equally well.
Now we’re ready to define probability! Say we have some predicate $E : \mathcal{U} \to \{0, 1\}$ that determines whether an event happens in a given universe (for example, whether Trump wins the 2024 election). Writing $B_\varepsilon(U)$ for the ball of universes within distance $\varepsilon$ of our current universe $U$, define

$$P_{\varepsilon, t}(E) \;=\; \Pr_{U' \sim B_\varepsilon(U)}\!\left[\, E\bigl(f_t(U')\bigr) = 1 \,\right],$$

the probability that $E$ holds after $t$ seconds when we start from a universe $U'$ chosen uniformly at random[4] from the ball.[5]
Notice the subscripts: this quantity depends both on the time horizon $t$ and on the perturbation size $\varepsilon$.
But we’re not done yet. Our function is still parameterized by $\varepsilon$, and it’s not obvious what to do with it.
Consider the following graph, which shows how $P_{\varepsilon, t}(E)$ might vary as a function of $\varepsilon$ for a fixed event and time horizon.

Let’s focus on the left part of this graph first. As we decrease $\varepsilon$ toward zero, the perturbations eventually become too small for the Butterfly Effect to amplify into a macroscopic difference within $t$ seconds, and $P_{\varepsilon, t}(E)$ collapses to $0$ or $1$ – whichever outcome the unperturbed universe was already headed toward.
Assuming that the Butterfly Effect acts exponentially, this would require $\varepsilon$ to be exponentially small: a perturbation of size $\varepsilon$ grows to roughly $\varepsilon e^{\lambda t}$ after $t$ seconds (for some rate $\lambda$), so it fails to blow up only when $\varepsilon \lesssim e^{-\lambda t}$.
But now look at the behavior of $P_{\varepsilon, t}(E)$ on the right side of the graph, where $\varepsilon$ grows large. Here the permitted perturbations are big enough to change the present universe in macroscopically relevant ways – rearranging voters, not just air molecules – and the value of $P_{\varepsilon, t}(E)$ drifts accordingly.
Why do we need $\varepsilon$ to be small in the first place? Because we want the probability of the event in our universe, so we should only be averaging over universes that are imperceptibly different from our own.

For a while – across many orders of magnitude of $\varepsilon$ – the graph plateaus at some intermediate value, sandwiched between the collapse on the far left and the drift on the far right.
So how do we formally define the Butterfly Probability of $E$? We take the height of that plateau:
Def. The Butterfly’s Probability of $E$ occurring within time $t$ is the value that $P_{\varepsilon, t}(E)$ converges to as $\varepsilon \to 0$, before it collapses to 0 or 1.
I admit that the final caveat in the definition is imprecise, but I am at a loss for how to mathematically formulate this notion of “double convergence”. However, I conjecture that in almost every example of a real-world probability, it will be abundantly clear what this almost-asymptote value should be by simply looking at the graph of $P_{\varepsilon, t}(E)$.
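Though the real quantity is uncomputable, the shape of this graph is easy to reproduce in a toy deterministic system. Here is a minimal sketch – the chaotic logistic map stands in for the universe’s transition function, and the map, the predicate, and all parameters are my own illustrative choices, not anything from the formalism above:

```python
import random

# Toy stand-in for the universe: the logistic map x -> 4x(1-x) is chaotic,
# so it plays the role of the transition function f_t, and the "event" E is
# the arbitrary predicate x_t > 0.5. We estimate P_{eps,t}(E) by sampling
# perturbations of size eps around the current state x0.

def f_t(x, t):
    for _ in range(t):
        x = 4.0 * x * (1.0 - x)
    return x

def estimate_probability(x0, t, eps, samples=20_000):
    hits = 0
    for _ in range(samples):
        x = x0 + random.uniform(-eps, eps)
        x = min(max(x, 0.0), 1.0)   # keep the state inside [0, 1]
        hits += f_t(x, t) > 0.5
    return hits / samples

x0, t = 0.123456789, 30             # current "universe" and time horizon
for eps in [1e-13, 1e-12, 1e-11, 1e-9, 1e-6, 1e-3]:
    p = estimate_probability(x0, t, eps)
    print(f"eps = {eps:.0e}: P ~ {p:.3f}")
```

For tiny $\varepsilon$ the estimate sticks at $0$ or $1$ (the red region of the graph), and once $\varepsilon$ clears the chaos threshold it plateaus near the same intermediate value across many orders of magnitude (the blue region). This toy system has no green region: unlike the real universe, the logistic map has nothing analogous to “macroscopically different presents”, so the plateau simply extends to the right.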
The Intuition
Essentially, I claim that the graph of $P_{\varepsilon, t}(E)$ as a function of $\varepsilon$ will generically look like this, with three distinct regions:

The behavior in each of these three regions is dominated by a different phenomenon. In the red region, the perturbations are too small for the Butterfly Effect to produce a noticeable difference in the universe over the given timescale. In the green region, the perturbations are so large that they alter the present universe in macroscopically relevant ways, so we are no longer measuring the future of a universe like ours. In the blue region in between, the perturbations are large enough to be chaotically amplified but too small to macroscopically change the present – this is the plateau, and its height is the probability we are after.
Our definition of Butterfly Probability relies on the existence of a clear “phase shift” between each of the regions. If the cutoffs between the regions were less stark, then it might be ambiguous which value in the blue region we should count as the true probability. So why do I think that there is an obvious blue region?
My intuition for this is that the Butterfly Effect is so chaotic and sensitive that, once $\varepsilon$ crosses the tiny threshold past which perturbations blow up within time $t$, the universes in $B_\varepsilon(U)$ are already thoroughly mixed with respect to the event. Increasing $\varepsilon$ further just adds more universes of the same well-mixed character, so the fraction in which $E$ occurs stays essentially constant until $\varepsilon$ reaches macroscopic scales.
The best way to think about this is by imagining the density of a gas. Fix some point $x$ inside a container of gas, and try to measure the density at $x$ by counting the mass inside a ball of radius $r$ around $x$ and dividing by the ball’s volume. If $r$ is on the atomic scale, the answer swings wildly depending on whether the ball happens to contain a molecule. If $r$ is on the scale of the whole container, the answer blurs together regions of genuinely different density. But across the wide range of scales in between, the measurement is essentially constant – and that stable value is what we mean by “the” density at $x$.
My intuition is that the distribution of black and white universes in $\mathcal{U}$ behaves the same way – color a universe black if $E$ occurs within time $t$ and white otherwise. At the scales of the blue region, the black universes have a well-defined local density in the neighborhood of our universe $U$, and that local density is the Butterfly’s Probability.
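This plateau-of-scales picture is easy to check numerically. A minimal sketch (mine, with made-up parameters): scatter “molecules” uniformly in a unit square and measure the apparent density in disks of growing radius. Tiny disks give wildly unstable answers, while intermediate ones settle on the true density; in a real gas whose density varied across the container, very large disks would then drift away again, like the green region of the graph.

```python
import math
import random

# Measure the "density at a point" of a toy 2-D gas at different scales.
# 200,000 molecules scattered uniformly in a unit square -> true density 200,000.
random.seed(0)
molecules = [(random.random(), random.random()) for _ in range(200_000)]
cx, cy = 0.5, 0.5  # the point at which we probe the density

for r in [1e-3, 3e-3, 1e-2, 5e-2, 2e-1]:
    inside = sum((x - cx) ** 2 + (y - cy) ** 2 < r * r for x, y in molecules)
    density = inside / (math.pi * r * r)  # molecules per unit area
    print(f"r = {r:.0e}: measured density ~ {density:,.0f}")
```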
Conclusion
To say that properly calculating the Butterfly’s Probability of an event is computationally intractable would be the understatement of the century. Calculating even a single probability would require knowing the exact positions of all matter in the universe and the ability to simulate it with near-perfect accuracy. In fact, if the computer you are using to simulate the universe is itself part of the universe, this leads to paradoxes. Because of this, the value of the Butterfly View formalism developed in this blog post is mostly theoretical. It gives us a way to understand what probability would mean from the perspective of God (someone who is completely omniscient and computationally unbounded) without actually being able to carry it out in practice.
However, any time a political scientist or meteorologist builds a big model to predict the future, they are in some sense running an approximation algorithm for a Butterfly’s Probability. In doing so, they make the implicit assumption that the blue region in the graph is large enough that lots of irrelevant information can be left out of the model without much effect on the local density of black universes.
I will conclude with what I believe to be the strengths and weaknesses of the Butterfly View as a theoretical framework for understanding probability.
Strengths:
- It combines features of the Bayesian and frequentist views: we can talk about the probabilities of one-off events like the 2024 election without the need for an epistemic reference frame or a prior distribution.
- It can be applied to any deterministic system without the need for built-in randomness, as long as the system is chaotic enough to exhibit the Butterfly Effect.
- It accurately captures what people mean with the colloquial use of the word “random.” When someone says “the stock market is hard to predict… it’s so random,” they probably don’t mean that market volatility is caused by quantum randomness. Instead, it seems to me that they’re trying to describe how there are too many sensitive moving parts for its behavior to be predicted with confidence.
- It has a cool name.
Weaknesses:
- As mentioned before, it is computationally intractable.
- It cannot be adapted to deal with logical uncertainty (e.g. “What’s the probability that the millionth digit of $\pi$ is a $7$?”). All of the “randomness” in the Butterfly View stems from physical uncertainty. But the decimal expansion of $\pi$ will always be the same no matter how atoms are perturbed, so the Butterfly’s Probability of a mathematical statement is always either $0$ or $1$.
- It is time-dependent. Over short timeframes (“What’s the probability that this coin flips heads in the next second?”), the Butterfly Effect might not have enough time to make much of a difference, so the probability will either be $0$ or $1$. Also, it is impossible to talk about probabilities of past events (unless you plug in a snapshot of the universe from before the event occurred).
As you can see, the Butterfly’s View of probability has one more strength than it has weaknesses, making it a good theory!
Thank you for reading all of these words :).

1. This observation is credited to the physicist Michael Berry (1978) and the calculations are explained in this paper. The idea is that, given some tiny error $\Delta\theta$ in the angle of a trajectory, the next collision will have an angle error of about $\frac{\ell}{r}\Delta\theta$, then the next will have an error of about $\left(\frac{\ell}{r}\right)^2 \Delta\theta$, and so on (where $\ell$ is the distance traveled between collisions and $r$ is the radius of each ball). So even though $\Delta\theta$ might be vanishingly small, the error becomes quite large after only a few collisions. ↩
2. OK fine, Spotify doesn’t use random.org to shuffle its playlists, but I’m just trying to give an illustrative example. ↩
3. If you prefer an interpretation of physics in which time is discretized (as it is in a cellular automaton), you can instead use a single-step transition function $f : \mathcal{U} \to \mathcal{U}$. Then you can think of $f_t$ as $f^t$, where $f$ is iterated $t$ times. ↩
4. We technically haven’t defined a preferred probability distribution on $B_\varepsilon(U)$ for which we can invoke the phrase “uniformly at random”. I suppose one way you could do this would be to think of $\mathcal{U}$ as $\mathbb{R}^{6N}$ (three spatial components and three velocity components for each particle), where $N$ is the number of particles in the universe, and weight your probability distribution by $6N$-dimensional volume. Or you could think of $\mathcal{U}$ as being discretized by choosing some super small “precision level” at which to encode positions and velocities. But at this point we’re just getting silly – it really doesn’t matter. ↩
5. Don’t let it bother you that this definition involves a probability distribution. We’re not being circular because we’re only constructing this definition for physical-world probabilities – we’re allowed to assume that the mathematical theory of probability rests on solid ground. ↩