hn-classics/_stories/2007/8377680.md

---
created_at: '2014-09-27T17:52:33.000Z'
title: Quantum mechanics as a generalization of probability (2007)
url: http://www.scottaaronson.com/democritus/lec9.html
author: ascertain
points: 209
story_text: ''
comment_text:
num_comments: 79
story_id:
story_title:
story_url:
parent_id:
created_at_i: 1411840353
_tags:
- story
- author_ascertain
- story_8377680
objectID: '8377680'

---
[PHYS771](default.html) Lecture 9: Quantum

[Scott Aaronson](http://www.scottaaronson.com)

There are two ways to teach quantum mechanics. The first way -- which
for most physicists today is still the only way -- follows the
historical order in which the ideas were discovered. So, you start with
classical mechanics and electrodynamics, solving lots of grueling
differential equations at every step. Then you learn about the
"blackbody paradox" and various strange experimental results, and the
great crisis these things posed for physics. Next you learn a
complicated patchwork of ideas that physicists invented between 1900 and
1926 to try to make the crisis go away. Then, if you're lucky, after
years of study you finally get around to the central conceptual point:
that nature is described not by probabilities (which are always
nonnegative), but by numbers called amplitudes that can be positive,
negative, or even complex.

Today, in the quantum information age, the fact that all the physicists
had to learn quantum this way seems increasingly humorous. For example,
I've had experts in quantum field theory -- people who've spent years
calculating path integrals of mind-boggling complexity -- ask me to
explain the Bell inequality to them. That's like Andrew Wiles asking me
to explain the Pythagorean Theorem.

As a direct result of this "QWERTY" approach to explaining quantum
mechanics - which you can see reflected in almost every popular book and
article, down to the present -- the subject acquired an undeserved
reputation for being hard. Educated people memorized the slogans --
"light is both a wave and a particle," "the cat is neither dead nor
alive until you look," "you can ask about the position or the momentum,
but not both," "one particle instantly learns the spin of the other
through spooky action-at-a-distance," etc. -- and also learned that they
shouldn't even try to understand such things without years of
painstaking work.

The second way to teach quantum mechanics leaves a blow-by-blow account
of its discovery to the historians, and instead starts directly from the
conceptual core -- namely, a certain generalization of probability
theory to allow minus signs. Once you know what the theory is actually
about, you can then sprinkle in physics to taste, and calculate the
spectrum of whatever atom you want. This second approach is the one I'll
be following here.

So, what is quantum mechanics? Even though it was discovered by
physicists, it's not a physical theory in the same sense as
electromagnetism or general relativity. In the usual "hierarchy of
sciences" -- with biology at the top, then chemistry, then physics, then
math -- quantum mechanics sits at a level between math and physics that
I don't know a good name for. Basically, quantum mechanics is the
operating system that other physical theories run on as application
software (with the exception of general relativity, which hasn't yet
been successfully ported to this particular OS). There's even a word for
taking a physical theory and porting it to this OS: "to quantize."

But if quantum mechanics isn't physics in the usual sense -- if it's not
about matter, or energy, or waves, or particles -- then what is it
about? From my perspective, it's about information and probabilities and
observables, and how they relate to each other.

My contention in this lecture is the following: Quantum mechanics is
what you would inevitably come up with if you started from probability
theory, and then said, let's try to generalize it so that the numbers we
used to call "probabilities" can be negative numbers. As such, the
theory could have been invented by mathematicians in the 19th century
without any input from experiment. It wasn't, but it could have been.

**A Less Than 0% Chance**

Alright, so what would it mean to have "probability theory" with
negative numbers? Well, there's a reason you never hear the weather
forecaster talk about a -20% chance of rain tomorrow -- it really does
make as little sense as it sounds. But I'd like you to set any qualms
aside, and just think abstractly about an event with N possible
outcomes. We can express the probabilities of those events by a vector
of N real numbers:

(p1,....,pN),

Mathematically, what can we say about this vector? Well, the
probabilities had better be nonnegative, and they'd better sum to 1. We
can express the latter fact by saying that the 1-norm of the probability
vector has to be 1. (The 1-norm just means the sum of the absolute
values of the entries.)

But the 1-norm is not the only norm in the world -- it's not the only
way we know to define the "size" of a vector. There are other ways, and
one of the recurring favorites since the days of Pythagoras has been the
2-norm or Euclidean norm. Formally, the Euclidean norm means the square
root of the sum of the squares of the entries. Informally, it means
you're late for class, so instead of going this way and then that way,
you cut across the grass.

Now, what happens if you try to come up with a theory that's like
probability theory, but based on the 2-norm instead of the 1-norm? I'm
going to try to convince you that quantum mechanics is what inevitably
results.

Let's consider a single bit. In probability theory, we can describe a
bit as having a probability p of being 0, and a probability 1-p of being
1. But if we switch from the 1-norm to the 2-norm, now we no longer want
two numbers that sum to 1, we want two numbers whose squares sum to 1.
(I'm assuming we're still talking about real numbers.) In other words,
we now want a vector (α,β) where  α2 + β2 = 1. Of course, the set of all
such vectors forms a circle:

![](circle.gif)

The theory we're inventing will somehow have to connect to observation.
So, suppose we have a bit that's described by this vector (α,β). Then
we'll need to specify what happens if we look at the bit. Well, since it
is a bit, we should see either 0 or 1\! Furthermore, the probability of
seeing 0 and the probability of seeing 1 had better add up to 1. Now,
starting from the vector (α,β), how can we get two numbers that add up
to 1? Simple: we can let α2 be the probability of a 0 outcome, and let
β2 be the probability of a 1 outcome.

But in that case, why not forget about α and β, and just describe the
bit directly in terms of probabilities? Ahhhhh. The difference comes in
how the vector changes when we apply an operation to it. In probability
theory, if we have a bit that's represented by the vector (p,1-p), then
we can represent any operation on the bit by a stochastic matrix: that
is, a matrix of nonnegative real numbers where every column adds up to
1. So for example, the "bit flip" operation -- which changes the
probability of a 1 outcome from p to 1-p -- can be represented as
follows:

![](/cgi-bin/mimetex.cgi?%5Cleft\(%20%5Cbegin%7Barray%7D0%20&%201%5C%5C1%20&%200%5Cend%7Barray%7D%20%5Cright\)%5Cleft\(%5Cbegin%7Barray%7D%20p%5C%5C1-p%5Cend%7Barray%7D%5Cright\)=%5Cleft\(%5Cbegin%7Barray%7D1-p%5C%5Cp%5Cend%7Barray%7D%5Cright\))

Indeed, it turns out that a stochastic matrix is the most general sort
of matrix that always maps a probability vector to another probability
vector.

**Exercise 1 for the Non-Lazy Reader:** Prove this.

But now that we've switched from the 1-norm to the 2-norm, we have to
ask: what's the most general sort of matrix that always maps a unit
vector in the 2-norm to another unit vector in the 2-norm?

Well, we call such a matrix a unitary matrix -- indeed, that's one way
to define what a unitary matrix is\! (Oh, all right. As long as we're
only talking about real numbers, it's called an orthogonal matrix. But
same difference.) Another way to define a unitary matrix, again in the
case of real numbers, is as a matrix whose inverse equals its transpose.

**Exercise 2 for the Non-Lazy Reader:** Prove that these two definitions
are equivalent.

This "2-norm bit" that we've defined has a name, which as you know is
qubit. Physicists like to represent qubits using what they call "Dirac
ket notation," in which the vector (α,β) becomes
![](/cgi-bin/mimetex.cgi?%5Calpha%20%7C0%5Crangle%20+%20%5Cbeta%20%7C1%5Crangle).
Here α is the amplitude of outcome |0〉, and β is the amplitude of
outcome |1〉.

This notation usually drives computer scientists up a wall when they
first see it -- especially because of the asymmetric brackets\! But if
you stick with it, you see that it's really not so bad. As an example,
instead of writing out a vector like (0,0,3/5,0,0,0,4/5,0,0), you can
simply write
![](/cgi-bin/mimetex.cgi?%5Cfrac%7B3%7D%7B5%7D%20%7C3%5Crangle%20+%20%5Cfrac%7B4%7D%7B5%7D%20%7C7%5Crangle),
omitting all of the 0 entries.

So given a qubit, we can transform it by applying any 2-by-2 unitary
matrix -- and that leads already to the famous effect of quantum
interference. For example, consider the unitary
matrix

![](/cgi-bin/mimetex.cgi?%20%5Cleft\(%20%5Cbegin%7Barray%7D%5Cfrac%7B1%7D%7B%5Csqrt%7B2%7D%7D%20&%20-%5Cfrac%7B1%7D%7B%5Csqrt%7B2%7D%7D%5C%5C%20%5Cfrac%7B1%7D%7B%5Csqrt%7B2%7D%7D%20&%20%5Cfrac%7B1%7D%7B%5Csqrt%7B2%7D%7D%20%5Cend%7Barray%7D%20%5Cright\))

which takes a vector in the plane and rotates it by 45 degrees
counterclockwise. Now consider the state |0〉. If we apply U once to this
state, we'll get
![](/cgi-bin/mimetex.cgi?%5Cfrac%7B1%7D%7B%5Csqrt%7B2%7D%7D%20%5C\(%20%7C0%5Crangle%20+%20%7C1%5Crangle%20%5C\))
-- it's like taking a coin and flipping it. But then, if we apply the
same operation U a second time, we'll get
|1〉:

![](/cgi-bin/mimetex.cgi?%5Cleft\(%5Cbegin%7Barray%7D%5Cfrac%7B1%7D%7B%5Csqrt%7B2%7D%7D%20&%20-%5Cfrac%7B1%7D%7B%5Csqrt%7B2%7D%7D%5C%5C%20%5Cfrac%7B1%7D%7B%5Csqrt%7B2%7D%7D%20&%20%5Cfrac%7B1%7D%7B%5Csqrt%7B2%7D%7D%5Cend%7Barray%7D%5Cright\)%20%20%5Cleft\(%20%5Cbegin%7Barray%7D%20%5Cfrac%7B1%7D%7B%5Csqrt%7B2%7D%7D%20%5C%5C%20%5Cfrac%7B1%7D%7B%5Csqrt%7B2%7D%7D%20%5Cend%7Barray%7D%5Cright\)=%5Cleft\(%5Cbegin%7Barray%7D0%5C%5C1%5Cend%7Barray%7D%5Cright\)%20)

So in other words, applying a "randomizing" operation to a "random"
state produces a deterministic outcome\! Intuitively, even though there
are two "paths" that lead to the outcome |0〉, one of those paths has
positive amplitude and the other has negative amplitude. As a result,
the two paths interfere destructively and cancel each other out. By
contrast, the two paths leading to the outcome |1〉 both have positive
amplitude, and therefore interfere constructively.

![](interfere.gif)

The reason you never see this sort of interference in the classical
world is that probabilities can't be negative. So, cancellation between
positive and negative amplitudes can be seen as the source of all
"quantum weirdness" -- the one thing that makes quantum mechanics
different from classical probability theory. How I wish someone had told
me that when I first heard the word "quantum"\!

**Mixed States**

Once we have these quantum states, one thing we can always do is to take
classical probability theory and "layer it on top." In other words, we
can always ask, what if we don't know which quantum state we have? For
example, what if we have a 1/2 probability of
![](/cgi-bin/mimetex.cgi?%5Cfrac%7B1%7D%7B%5Csqrt%7B2%7D%7D%20%5C\(%20%7C0%5Crangle%20+%20%7C1%5Crangle%20%5C\))
and a 1/2 probability of
![](/cgi-bin/mimetex.cgi?%5Cfrac%7B1%7D%7B%5Csqrt%7B2%7D%7D%20%5C\(%20%7C0%5Crangle%20-%20%7C1%5Crangle%20%5C\))?
This gives us what's called a mixed state, which is the most general
kind of state in quantum mechanics.

Mathematically, we represent a mixed state by an object called a density
matrix. Here's how it works: say you have this vector of N amplitudes,
(α1,...,αN). Then you compute the outer product of the vector with
itself -- that is, an N-by-N matrix whose (i,j) entry is αiαj (again in
the case of real numbers). Then, if you have a probability distribution
over several such vectors, you just take a linear combination of the
resulting matrices. So for example, if you have probability p of some
vector and probability 1-p of a different vector, then it's p times the
one matrix plus 1-p times the other.

The density matrix encodes all the information that could ever be
obtained from some probability distribution over quantum states, by
first applying a unitary operation and then measuring.

**Exercise 3 for the Non-Lazy Reader:** Prove this.

This implies that if two distributions give rise to the same density
matrix, then those distributions are empirically indistinguishable, or
in other words are the same mixed state. As an example, let's say you
have the state
![](/cgi-bin/mimetex.cgi?%5Cfrac%7B1%7D%7B%5Csqrt%7B2%7D%7D%20%5C\(%20%7C0%5Crangle%20+%20%7C1%5Crangle%20%5C\))
with 1/2 probability, and
![](/cgi-bin/mimetex.cgi?%5Cfrac%7B1%7D%7B%5Csqrt%7B2%7D%7D%20%5C\(%20%7C0%5Crangle%20-%20%7C1%5Crangle%20%5C\))
with 1/2 probability. Then the density matrix that describes your
knowledge
is

![](/cgi-bin/mimetex.cgi?%20%5Cfrac%7B1%7D%7B2%7D%20%5Cleft\(%20%5Cbegin%7Barray%7D%5Cfrac%7B1%7D%7B2%7D%20&%20%5Cfrac%7B1%7D%7B2%7D%5C%5C%20%5Cfrac%7B1%7D%7B2%7D%20&%20%5Cfrac%7B1%7D%7B2%7D%5Cend%7Barray%7D%20%5Cright\)%20+%20%5Cfrac%7B1%7D%7B2%7D%20%5Cleft\(%20%5Cbegin%7Barray%7D%5Cfrac%7B1%7D%7B2%7D%20&%20-%5Cfrac%7B1%7D%7B2%7D%5C%5C%20-%5Cfrac%7B1%7D%7B2%7D%20&%20%5Cfrac%7B1%7D%7B2%7D%5Cend%7Barray%7D%20%5Cright\)%20%20=%20%5Cleft\(%20%5Cbegin%7Barray%7D%5Cfrac%7B1%7D%7B2%7D%20&%200%5C%5C%200%20&%20%5Cfrac%7B1%7D%7B2%7D%5Cend%7Barray%7D%20%5Cright\)%20)

It follows, then, that no measurement you can ever perform will
distinguish this mixture from a 1/2 probability of |0〉 and a 1/2
probability of |1〉.

**The Squaring Rule**

Now let's talk about the question Gus raised, which is, why do we square
the amplitudes instead of cubing them or raising them to the fourth
power or whatever?

Alright, I can give you a couple of arguments for why God decided to
square the amplitudes.

The first argument is a famous result called Gleason's Theorem from the
1950's. Gleason's Theorem lets us assume part of quantum mechanics and
then get out the rest of it\! More concretely, suppose we have some
procedure that takes as input a unit vector of real numbers, and that
spits out the probability of an event. Formally, we have a function f
that maps a unit vector ![](/cgi-bin/mimetex.cgi?v%20%5Cin%20%5CRe%5EN)
to the unit interval \[0,1\]. And let's suppose N=3 -- the theorem
actually works in any number of dimensions three or greater (but
interestingly, not in two dimensions). Then the key requirement we
impose is that, whenever three vectors v1,v2,v3 are all orthogonal to
each other,

f(v1) + f(v2) + f(v3) = 1.

Intuitively, if these three vectors represent "orthogonal ways" of
measuring a quantum state, then they should correspond to
mutually-exclusive events. Crucially, we don't need any assumption other
than that -- no continuity, no differentiability, no nuthin'.

So, that's the setup. The amazing conclusion of the theorem is that, for
any such f, there exists a mixed state such that f arises by measuring
that state according to the standard measurement rule of quantum
mechanics. I won't be able prove this theorem here, since it's pretty
hard. But it's one way that you can "derive" the squaring rule without
exactly having to put it in at the outset.

**Exercise 4 for the Non-Lazy Reader:** Why does Gleason's Theorem not
work in two dimensions?

If you like, I can give you a much more elementary argument. This is
something I put it in [one of my
papers](http://www.scottaaronson.com/papers/island.pdf), though I'm sure
many others knew it before.

Let's say we want to invent a theory that's not based on the 1-norm like
classical probability theory, or on the 2-norm like quantum mechanics,
but instead on the p-norm for some
![](/cgi-bin/mimetex.cgi?p%20%5Cnot%5Cin%20%5C%7B1,2%5C%7D). Call
(v1,...,vN) a unit vector in the p-norm if

|v1|p+...+|vN|p = 1.

Then we'll need some "nice" set of linear transformations that map any
unit vector in the p-norm to another unit vector in the p-norm.

It's clear that for any p we choose, there will be some linear
transformations that preserve the p-norm. Which ones? Well, we can
permute the basis elements, shuffle them around. That'll preserve the
p-norm. And we can stick in minus signs if we want. That'll preserve the
p-norm too. But here's the little observation I made: if there are any
linear transformations other than these trivial ones that preserve the
p-norm, then either p=1 or p=2. If p=1 we get classical probability
theory, while if p=2 we get quantum mechanics.

**Exercise 5 for the Non-Lazy Reader**: Prove my little observation.

Alright, to get you started, let me give some intuition about why my
observation might be true. Let's assume, for simplicity, that everything
is real and that p is a positive even integer (though the observation
also works with complex numbers and with any real p≥0). Then for a
linear transformation A=(aij) to preserve the p-norm means
that

![](/cgi-bin/mimetex.cgi?w_1%5Ep+...+w_N%5Ep=v_1%5Ep+...+v_N%5Ep)

whenever

![](/cgi-bin/mimetex.cgi?%5Cleft\(%5Cbegin%7Barray%7Dw_%7B1%7D%5C%5C%20:%5C%5Cw_%7BN%7D%5Cend%7Barray%7D%5Cright\)=%5Cleft\(%5Cbegin%7Barray%7Da_%7B11%7D&...&a_%7B1N%7D%5C%5C:%20&%20%20&%20:%5C%5C%20a_%7BN1%7D%20&%20...%20&%20a_%7BNN%7D%5Cend%7Barray%7D%20%5Cright\)%20%20%5Cleft\(%20%5Cbegin%7Barray%7D%20v_%7B1%7D%5C%5C%20:%5C%5C%20v_%7BN%7D%5Cend%7Barray%7D%5Cright\))

Now we can ask: how many constraints are imposed on the matrix A by the
requirement that this be true for every v1,...,vN? If we work it out, in
the case p=2 we'll find that there are
![](/cgi-bin/mimetex.cgi?N+%5Cleft\(%5Cbegin%7Barray%7DN%5C%5C2%5Cend%7Barray%7D%5Cright\))
constraints. But since we're trying to pick an N-by-N matrix, that still
leaves us N(N-1)/2 degrees of freedom to play with.

On the other hand, if (say) p=4, then the number of constraints grows
like
![](/cgi-bin/mimetex.cgi?%5Cleft\(%5Cbegin%7Barray%7DN%5C%5C4%5Cend%7Barray%7D%5Cright\)),
which is greater than N2 (the number of variables in the matrix). That
suggests that it will be hard to find a nontrivial linear transformation
that preserves 4-norm. Of course it doesn't prove that no such
transformation exists -- that's left as a puzzle for you.

Incidentally, this isn't the only case where we find that the 1-norm and
2-norm are "more special" than other p-norms. So for example, have you
ever seen the following equation?

xn + yn = zn

There's a cute little fact -- unfortunately I won't have time to prove
it in class -- that the above equation has nontrivial integer solutions
when n=1 or n=2, but not for any larger integers n. Clearly, then, if we
use the 1-norm and the 2-norm more than other vector norms, it's not
some arbitrary whim -- these really are God's favorite norms\! (And we
didn't even need an experiment to tell us that.)

**Real vs. Complex Numbers**

Even after we've decided to base our theory on the 2-norm, we still have
at least two choices: we could let our amplitudes be real numbers, or we
could let them be complex numbers. We know the solution God chose:
amplitudes in quantum mechanics are complex numbers. This means that you
can't just square an amplitude to get a probability; first you have to
take the absolute value, and then you square that. In other words, if
the amplitude for some measurement outcome is α = β + γi, where β and γ
are real, then the probability of seeing the outcome is |α|2 = β2 + γ2.

Why did God go with the complex numbers and not the real numbers?

Years ago, at Berkeley, I was hanging out with some math grad students
-- I fell in with the wrong crowd -- and I asked them that exact
question. The mathematicians just snickered. "Give us a break -- the
complex numbers are algebraically closed\!" To them it wasn't a mystery
at all.

But to me it is sort of strange. I mean, complex numbers were seen for
centuries as fictitious entities that human beings made up, in order
that every quadratic equation should have a root. (That's why we talk
about their "imaginary" parts.) So why should Nature, at its most
fundamental level, run on something that we invented for our
convenience?

Alright, yeah: suppose we require that, for every linear transformation
U that we can apply to a state, there must be another transformation V
such that V2 = U. This is basically a continuity assumption: we're
saying that, if it makes sense to apply an operation for one second,
then it ought to make sense to apply that same operation for only half a
second.

Can we get that with only real amplitudes? Well, consider the following
linear
transformation:

![](/cgi-bin/mimetex.cgi?%5Cleft\(%20%5Cbegin%7Barray%7D1%20&%200%5C%5C%200%20&%20-1%5Cend%7Barray%7D%20%5Cright\)%20)

This transformation is just a mirror reversal of the plane. That is, it
takes a two-dimensional Flatland creature and flips it over like a
pancake, sending its heart to the other side of its two-dimensional
body. But how do you apply half of a mirror reversal without leaving the
plane? You can't\! If you want to flip a pancake by a continuous motion,
then you need to go into ... dum dum dum ... THE THIRD DIMENSION.

More generally, if you want to flip over an N-dimensional object by a
continuous motion, then you need to go into the (N+1)st dimension.

**Exercise 6 for the Non-Lazy:** Prove that any norm-preserving linear
transformation in N dimensions can be implemented by a continuous motion
in N+1 dimensions.

But what if you want every linear transformation to have a square root
in the same number of dimensions? Well, in that case, you have to allow
complex numbers. So that's one reason God might have made the choice She
did.

Alright, I can give you two other reasons why amplitudes should be
complex numbers.

The first comes from asking, how many independent real parameters are
there in an N-dimensional mixed state? As it turns out, the answer is
exactly N2 -- provided we assume, for convenience, that the state
doesn't have to be normalized (i.e., that the probabilities can add up
to less than 1). Why? Well, an N-dimensional mixed state is represented
mathematically by a N-by-N
[Hermitian](http://en.wikipedia.org/wiki/Hermitian_matrix) matrix with
positive eigenvalues. Since we're not normalizing, we've got N
independent real numbers along the main diagonal. Below the main
diagonal, we've got N(N-1)/2 independent complex numbers, which means
N(N-1) real numbers. Since the matrix is Hermitian, the complex numbers
below the main diagonal determine the ones above the main diagonal. So
the total number of independent real parameters is N + N(N-1) = N2.

Now we bring in an aspect of quantum mechanics that I didn't mention
before. If we know the states of two quantum systems individually, then
how do we write their combined state? Well, we just form what's called
the tensor product. So for example, the tensor product of two qubits,
α|0〉+β|1〉 and γ|0〉+δ|1〉, is given
by

![](/cgi-bin/mimetex.cgi?%5Cleft\(%5Calpha%7C0%5Crangle+%5Cbeta%20%7C1%5Crangle%5Cright\)%5Cotimes%5Cleft\(%5Cgamma%7C0%5Crangle+%5Cdelta%7C1%5Crangle%5Cright\)%20%20=%5Calpha%5Cgamma%20%7C00%5Crangle%20+%5Calpha%20%5Cdelta%20%7C01%5Crangle%20+%5Cbeta%5Cgamma%20%7C10%5Crangle+%5Cbeta%5Cdelta%20%7C11%5Crangle)

Again one can ask: did God have to use the tensor product? Could She
have chosen some other way of combining quantum states into bigger ones?
Well, maybe someone else can say something useful about this question --
I have trouble even wrapping my head around it\! For me, saying we take
the tensor product is almost what we mean when we say we're putting
together two systems that exist independently of each other.

As you all know, there are two-qubit states that can't be written as the
tensor product of one-qubit states. The most famous of these is the EPR
(Einstein-Podolsky-Rosen)
pair:

![](/cgi-bin/mimetex.cgi?%5Cfrac%7B%7C00%5Crangle%20+%7C11%5Crangle%20%7D%7B%5Csqrt%7B2%7D%7D)

Given a mixed state ρ on two subsystems A and B, if ρ can be written as
a probability distribution over tensor product states
![](/cgi-bin/mimetex.cgi?%7C%5Cpsi_A%5Crangle%20%5Cotimes%20%7C%5Cpsi_B%5Crangle),
then we say ρ is separable. Otherwise we say ρ is entangled.

Now let's come back to the question of how many real parameters are
needed to describe a mixed state. Suppose we have a (possibly-entangled)
composite system AB. Then intuitively, it seems like the number of
parameters needed to describe AB -- which I'll call dAB -- should equal
the product of the number of parameters needed to describe A and the
number of parameters needed to describe B:

dAB = dA dB.

If amplitudes are complex numbers, then happily this is true\! Letting
NA and NB be the number of dimensions of A and B respectively, we have

dAB = (NA NB)2 = NA2 NB2 = dA dB.

But what if the amplitudes are real numbers? In that case, in an N-by-N
density matrix, we'd only have N(N+1)/2 independent real parameters. And
it's not the case that if N = NA NB
then

![](/cgi-bin/mimetex.cgi?%5Cfrac%7BN%5Cleft\(N+1%5Cright\)%20%7D%7B2%7D=%5Cfrac%7BN_%7BA%7D%5Cleft\(%20%20N_%7BA%7D+1%5Cright\)%20%20%7D%7B2%7D%5Ccdot%5Cfrac%7BN_%7BB%7D%5Cleft\(%20N_%7BB%7D+1%5Cright\)%20%7D%7B2%7D)

There's actually another phenomenon with the same "Goldilocks" flavor,
which was observed by Bill Wootters -- and this leads to my third reason
why amplitudes should be complex numbers. Let's say we choose a quantum
state

![](/cgi-bin/mimetex.cgi?%5Csum_%7Bi=1%7D%5E%7BN%7D%5Calpha_%7Bi%7D%20%7Ci%5Crangle)

uniformly at random (if you're a mathematician, under the Haar measure).
And then we measure it, obtaining outcome |i〉 with probability |αi|2.
The question is, will the resulting probability vector also be
distributed uniformly at random in the probability simplex? It turns out
that if the amplitudes are complex numbers, then the answer is yes. But
if the amplitudes are real numbers or quaternions, then the answer is
no\! (I used to think this fact was just a curiosity, but now I'm
actually using it in a paper I'm working on...)

**Linearity**

We've talked about why the amplitudes should be complex numbers, and why
the rule for converting amplitudes to probabilities should be a squaring
rule. But all this time, the elephant of linearity has been sitting
there undisturbed. Why would God have decided, in the first place, that
quantum states should evolve to other quantum states by means of linear
transformations?

**Exercise 7 for the Non-Lazy Reader:** Prove that if quantum mechanics
were nonlinear, then not only could you solve **NP**-complete problems
in polynomial time, you could also use EPR pairs to transmit information
faster than the speed of light.

**Further Reading**

See [this](http://www.arxiv.org/abs/quant-ph/0101012) paper by Lucien
Hardy for a "derivation" of quantum mechanics that's closely related to
the arguments I gave, but much, much more serious and careful. Also see
pretty much anything [Chris
Fuchs](http://netlib.bell-labs.com/who/cafuchs/) has written (and
especially [this](http://www.arxiv.org/abs/quant-ph/0104088) paper by
Caves, Fuchs, and Schack, which discusses why amplitudes should be
complex numbers rather than reals or quaternions).

[\[Discussion of this lecture on
blog\]](http://scottaaronson.com/blog/?p=188)

[\[← Previous lecture](lec8.html) | [Next lecture →\]](lec10.html)

[\[Return to PHYS771 home page\]](default.html)