Special Relativity without assumptions about the speed of light


Since the Special Theory of Relativity seems to contradict common sense, it remains a somewhat magical topic for most people. Its consequences seem so far removed from everyday life that it is quite hard to accept them as a correct description of the surrounding reality.

Most people have their first contact with SR at school, and its introduction there looks somewhat like this: near the end of the 19th century people discovered electromagnetic waves. The equations describing these waves imply a specific speed of propagation, denoted c and equal to about 300 000 km/s. This was quite interesting, since nothing in the equations singled out a frame of reference for this speed. Since all known waves required a medium to propagate, it was assumed that electromagnetic waves were no different and travelled in something called the aether, and that the speed arising from the equations was relative to the aether.

Once people decided that the aether should exist, the logical next step was to try to detect it. One of the ideas was to measure the speed of the Earth relative to the aether. Some attempts were made, but the results were unexpected - it seemed that the Earth was not moving through the aether. This was strange, especially considering that the Earth changes its velocity as it moves around the Sun, so even if it did stand still in the aether at one point, it shouldn't at another - but the measured speed was always 0. People then tried to modify the concept of the aether to explain the results and started performing more sensitive experiments. One of these was the famous Michelson-Morley experiment, which, just like the earlier attempts, failed to detect the motion of the Earth.

The scientists were rather confused by these results. It seemed that the speed of light was constant regardless of the motion of the observer, which was quite extraordinary. To better illustrate what is so strange about this situation, let us imagine that we are in a car standing at an intersection, and that there is another car in front of us. Once the traffic light turns green, the car in front of us starts moving and accelerates to 15 m/s, so its distance from us starts to grow by 15 meters every second. We start moving shortly afterwards. Once we are moving at 5 m/s, we expect the car ahead to be pulling away from us by 10 m every second, but when we check, we are surprised to discover that the distance is still growing by 15 m every second. We accelerate to 10 m/s - and the distance is still growing by 15 m every second. We accelerate more and more, but we can't seem to start catching up with the car in front, even though our friend, a policeman, was standing with a radar near the road and told us that the speed of that car was always just 15 m/s. Light seemed to behave just like such a weird car.

The 20th century came and various people proposed different explanations - among them Lorentz, Poincaré, and eventually Einstein. In 1905, Einstein presented a theory known today as Special Relativity, which was based on three assumptions:

  1. Space(time) is homogeneous and isotropic, i.e. there are no special points or directions in the Universe.
  2. There are no special inertial frames of reference, the laws of physics are the same in all of them - this is the so-called Galilean relativity principle.
  3. The speed of light is the same in all inertial frames of reference - this was a conclusion from the Michelson-Morley experiment.

Thus the aether became unnecessary - from that moment on, c was just a universal speed, independent of who is measuring it. This also has some unusual consequences, such as time passing slower for moving observers, or contraction of moving objects.

There is still a loophole, though. One could argue - and some people do - that the third assumption is not adequately proven. The Michelson-Morley experiment might not have been sensitive enough, or it could give a null result under some specific circumstances even though the speed of light is not really constant. Thus, SR could be (and, according to some, simply is) wrong.

This is all true, but not many people are aware that this third assumption isn't actually needed to obtain SR. I'm going to show here how this is possible.

The derivation of SR

I'll just note here that the derivation below is heavily inspired by a lecture by prof. Andrzej Szymacha, which I actually attended during my first year of studies. He showed us a reasoning that is almost identical to what I'm going to present, but a bit more complex in my opinion, so I decided to make small modifications.

Let us outline the situation, then. Imagine that we have two observers, whom we will denote O and O'. Both of them assign their own coordinates to the events in spacetime - they are (t, x, y, z) for O, and (t', x', y', z') for O'. Each of them sits at the point with spatial coordinates equal to 0 in their own coordinate system, that is, we have x=y=z=0 for O, and x'=y'=z'=0 for O'. We also assume that both observers met at a single point at time t=t'=0 and that O' is moving at a speed v in the x direction in O's frame of reference, so in O's frame the coordinates of O' satisfy x=vt.

Since we are only really interested in two directions - one temporal and one spatial - we will forget about y, y', z, z'. They play no role in the conclusions, and it will simplify the reasoning a lot.

The situation under consideration. Top: relative to O, bottom: relative to O'

Another huge simplification will be to assume that the axes x and x' point in opposite directions. This way the situations of O and O' are perfectly symmetrical - O' moves away from O in the positive x direction, and O moves away from O' in the positive x' direction. This perfect symmetry lets us immediately conclude that the speed of O in the frame of O' also has to be v, as everything looks exactly the same regardless of which observer is marked as O, and which one as O'.

Let us move on to some more mathematical issues. For starters, let us note that the homogeneity and isotropy of spacetime mean that the transformation between the frames of reference must be linear, i.e. x' and t' can depend on at most the first powers of x and t. Why? If there were higher exponents in the equations, they would change their form under translations, that is, if we changed the choice of the point denoted as (0,0). We couldn't declare all points as equally good then, at least one would stand out - and we are assuming that it isn't so.
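The argument can be illustrated numerically. The sketch below is only an illustration: both maps are made up (the coefficients and the quadratic term are arbitrary assumptions, not part of the derivation). It takes a pair of events, moves both by the same shift, and checks whether the transformed separation between them stays the same.

```python
def linear(t, x):
    # An arbitrary linear map; the coefficients are illustrative only.
    return (2.0 * t + 0.5 * x, 0.3 * t + 1.5 * x)

def nonlinear(t, x):
    # A hypothetical map containing a second power of x.
    return (t, x + 0.1 * x * x)

def transformed_separation(f, event_a, event_b):
    """Separation (dt', dx') between two events after applying the map f."""
    ta, xa = f(*event_a)
    tb, xb = f(*event_b)
    return (tb - ta, xb - xa)

def shift(event, dt, dx):
    """Move an event by (dt, dx), i.e. choose a different origin."""
    return (event[0] + dt, event[1] + dx)

a, b = (0.0, 0.0), (1.0, 2.0)
a_shifted, b_shifted = shift(a, 5.0, 7.0), shift(b, 5.0, 7.0)

lin_here = transformed_separation(linear, a, b)
lin_there = transformed_separation(linear, a_shifted, b_shifted)
non_here = transformed_separation(nonlinear, a, b)
non_there = transformed_separation(nonlinear, a_shifted, b_shifted)

print("linear:   ", lin_here, lin_there)   # the same pair of numbers twice
print("nonlinear:", non_here, non_there)   # the spatial part differs
```

With the linear map the result does not care where the pair of events is located; with the quadratic one it does, so some point of spacetime would be distinguished.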

Linear transformations have the pleasant property that they can be written as matrices. We will therefore represent our transformation from O to O' this way:

 \left[ \begin{array}{c} t' \\ x' \end{array} \right] = \left[ \begin{array}{cc} A(v) & B(v) \\ C(v) & D(v)\end{array} \right] \left[ \begin{array}{c} t \\ x \end{array} \right]

For the people not familiar with matrices - the notation above means exactly the same as this one:

 t' = A(v)t + B(v)x \\ x' = C(v)t + D(v)x

Let us consider what we can deduce about the coefficients A, B, C, D.

First, since we know that the situation is symmetrical, we can immediately write:

 \left[ \begin{array}{c} t \\ x \end{array} \right] = \left[ \begin{array}{cc} A(v) & B(v) \\ C(v) & D(v)\end{array} \right] \left[ \begin{array}{c} t' \\ x' \end{array} \right]

The transformation from O' to O has to be exactly the same as the one from O to O', because, as we mentioned, switching the observers' places changes nothing in the situation. Hence, we can write:

 \left[ \begin{array}{c} t \\ x \end{array} \right] = \left[ \begin{array}{cc} A(v) & B(v) \\ C(v) & D(v)\end{array} \right] \left[ \begin{array}{cc} A(v) & B(v) \\ C(v) & D(v)\end{array} \right] \left[ \begin{array}{c} t \\ x \end{array} \right]

This simplifies to:

 \left[ \begin{array}{c} t \\ x \end{array} \right] = \left[ \begin{array}{cc} A(v)^2 + B(v)C(v) & A(v)B(v) + B(v)D(v) \\ A(v)C(v) + C(v)D(v) & B(v)C(v) + D(v)^2 \end{array} \right] \left[ \begin{array}{c} t \\ x \end{array} \right]

In order for everything to fit, the following must hold:

 A(v)^2 + B(v)C(v) = 1 \\ A(v)B(v) + B(v)D(v) = 0 \\ A(v)C(v) + C(v)D(v) = 0 \\ B(v)C(v) + D(v)^2 = 1

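Equations 2 and 3 can be written as B(v)(A(v) + D(v)) = 0 and C(v)(A(v) + D(v)) = 0, so either A(v) = -D(v), or B(v) = C(v) = 0. The second option would give x' = D(v)x, so x' = 0 would correspond to x = 0 instead of x = vt, which is only possible when nobody is moving. Hence A(v) = -D(v), and the first and the fourth equations are then equivalent.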

Denoting the transformation matrix as L_1(v), we get:

 L_1(v) = \left[ \begin{array}{cc} A(v) & B(v) \\ C(v) & -A(v)\end{array} \right] \\ A(v)^2 + B(v)C(v) = 1

What's next? Recall that O' sits at x' = 0 and that in O's frame this point satisfies x = vt. Since we can read x' = C(v)t - A(v)x from the matrix, we get:

 0 = C(v)t - A(v)vt

Dividing by t, we will get C(v) = vA(v). We can substitute this into 1 = A(v)^2 + B(v)C(v) and extract B(v):

 1 = A(v)^2 + vB(v)A(v)

 B(v) = \frac{1 - A(v)^2}{vA(v)}

The transformation then takes the form:

 L_1(v) = \left[ \begin{array}{cc} A(v) & \frac{1 - A(v)^2}{vA(v)} \\ vA(v) & -A(v)\end{array} \right]
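A quick sanity check of this form, as a sketch: whatever value we plug in for A(v), the matrix should be its own inverse (that was the symmetry requirement), and events on the path x = vt should land at x' = 0. The trial values of A(v) below are arbitrary assumptions chosen only for the test.

```python
def l1(a, v):
    """Symmetric-configuration matrix L_1(v), given a trial value a for A(v)."""
    b = (1.0 - a * a) / (v * a)
    return [[a, b], [v * a, -a]]

def matmul(m, n):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transform(m, t, x):
    """Apply the matrix m to the event (t, x)."""
    return (m[0][0] * t + m[0][1] * x, m[1][0] * t + m[1][1] * x)

v = 0.6
for a in (0.5, 1.0, 1.25, 3.0):              # arbitrary trial values of A(v)
    m = l1(a, v)
    square = matmul(m, m)                     # should be the identity matrix
    _, x_prime = transform(m, 2.0, v * 2.0)   # an event on the path x = vt
    print(a, square, x_prime)
```

The fact that every trial value passes is exactly why the symmetry argument alone cannot fix A(v).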

This is a lot already, but we still don't know what A(v) is. In order to find out, we have to introduce some more complications.

First of all, let us give up on symmetry. The transformation in SR is usually written under the assumption that the axes x and x' face the same direction. To achieve this, it is enough to flip the sign of x'. How do we do that?

Since we assumed x' = C(v)t + D(v)x, after flipping the sign we will get -x' = -C(v)t - D(v)x. So, in order to get the transformation with axes facing the same direction, it is enough to flip the signs of the bottom coefficients in the matrix. We will denote this "flipped" matrix as L(v):

 L(v) = \left[ \begin{array}{cc} A(v) & \frac{1 - A(v)^2}{vA(v)} \\ -vA(v) & A(v)\end{array} \right]

Let us also note that if we change the sign of the velocity (i.e. it is -v instead of v), we get t' = A(-v)t + B(-v)x. If we now also flip the sign of x, we are back in the same situation in O's frame (opposite speed and opposite axis, so the observer is moving away in the positive x direction again). So t' can't change. This means:

 A(v)t + B(v)x = A(-v)t - B(-v)x

 A(v)t + \frac{1 - A(v)^2}{vA(v)}x = A(-v)t + \frac{1 - A(-v)^2}{vA(-v)}x

From this we can conclude that A(v) = A(-v).

The introduction of the third observer. Top: the situation relative to O', bottom: relative to O

The second part of the whole ordeal is introducing a third observer. We will call him O'' and we will say that he moves at a speed u relative to O', i.e. we have x' = ut' for O''. What is his speed relative to O? Let us denote it by V, which means x = Vt. In order to go from O' to O, we need to make a transformation by -v:

 \left[ \begin{array}{c} t \\ Vt \end{array} \right] = \left[ \begin{array}{cc} A(v) & \frac{1 - A(v)^2}{-vA(v)} \\ vA(v) & A(v)\end{array} \right] \left[ \begin{array}{c} t' \\ ut' \end{array} \right]

From this we get:

 t = A(v)t' - \frac{1 - A(v)^2}{A(v)}\frac{u}{v}t' \\ Vt = vA(v)t' + A(v)ut'

We substitute t from the first equation to the second and we get:

 V = \frac{u+v}{1 + \frac{A(v)^2 - 1}{A(v)^2}\frac{u}{v}}

The situation with the third observer relative to O''

Still with me? So now the other way round: O moves at -V relative to O'', and at -v relative to O', so we transform by u from O' to O'':

 \left[ \begin{array}{c} t'' \\ -Vt'' \end{array} \right] = \left[ \begin{array}{cc} A(u) & \frac{1 - A(u)^2}{uA(u)} \\ -uA(u) & A(u)\end{array} \right] \left[ \begin{array}{c} t' \\ -vt' \end{array} \right]


 t'' = A(u)t' - \frac{1-A(u)^2}{A(u)}\frac{v}{u}t' \\ -Vt'' = -uA(u)t' - A(u)vt'

This gives:

 V = \frac{u+v}{1 + \frac{A(u)^2 - 1}{A(u)^2}\frac{v}{u}}

Phew. We calculated V in two ways. However, it is still the same V, so both results must be the same. The denominators must be the same, then:

 1 + \frac{A(u)^2 - 1}{A(u)^2}\frac{v}{u} = 1 + \frac{A(v)^2 - 1}{A(v)^2}\frac{u}{v}

After subtracting 1 and dividing by uv we get:

 \frac{A(u)^2 - 1}{u^2A(u)^2} = \frac{A(v)^2 - 1}{v^2A(v)^2}

So now we are reaching the climax. The left-hand side only depends on u, and the right-hand side only on v, which are two independent parameters. If we set a specific u, the left-hand side will be determined, but v is still subject to change. Despite that, the right-hand side cannot change, because it must still be equal to the other one. This means that both sides must be constant, equal to a number we will call \alpha:

 \alpha = \frac{A(v)^2 - 1}{v^2A(v)^2}

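Solving this for A(v), and picking the positive root so that A(0) = 1 (no relative motion means no change of coordinates), leads to the result: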

 A(v) = \frac{1}{\sqrt{1 - \alpha v^2}}
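To see that this result really makes the two expressions for V agree, here is a small numerical sketch. The value of \alpha and the "wrong" trial function are arbitrary assumptions picked for illustration; the two functions computing V are the two formulas derived above.

```python
import math

ALPHA = 0.25  # arbitrary illustrative value, in units of 1/speed^2

def composed_speed_via_v(a_func, u, v):
    """V obtained by going from O' to O (the denominator contains A(v))."""
    a = a_func(v)
    return (u + v) / (1.0 + (a * a - 1.0) / (a * a) * u / v)

def composed_speed_via_u(a_func, u, v):
    """V obtained by going from O' to O'' (the denominator contains A(u))."""
    a = a_func(u)
    return (u + v) / (1.0 + (a * a - 1.0) / (a * a) * v / u)

def a_solution(v):
    """The derived form A(v) = 1 / sqrt(1 - alpha v^2)."""
    return 1.0 / math.sqrt(1.0 - ALPHA * v * v)

def a_wrong(v):
    """A hypothetical even function with A(0) = 1 that is not of the derived form."""
    return 1.0 + v * v

u, v = 0.3, 0.8
print(composed_speed_via_v(a_solution, u, v), composed_speed_via_u(a_solution, u, v))
print(composed_speed_via_v(a_wrong, u, v), composed_speed_via_u(a_wrong, u, v))
```

The first line prints the same number twice, while the second one shows two different values of V, so a function like `a_wrong` would make the theory contradict itself.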

We can thus write the final transformation:

 L(v) = \left[ \begin{array}{cc} \frac{1}{\sqrt{1 - \alpha v^2}} & \frac{-\alpha v}{\sqrt{1 - \alpha v^2}} \\ \frac{-v}{\sqrt{1 - \alpha v^2}} & \frac{1}{\sqrt{1 - \alpha v^2}}\end{array} \right]
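One property worth checking at this point is that two such transformations in a row are again a transformation of the same form, with the combined speed V = (u + v)/(1 + \alpha uv) that follows from the formulas above. The sketch below verifies this numerically for a positive, a zero and a negative \alpha; the particular numbers are arbitrary assumptions.

```python
import math

def lorentz(v, alpha):
    """The matrix L(v) for a given value of the constant alpha."""
    g = 1.0 / math.sqrt(1.0 - alpha * v * v)
    return [[g, -alpha * v * g], [-v * g, g]]

def matmul(m, n):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add_velocities(u, v, alpha):
    """Combined speed V for relative speeds u and v."""
    return (u + v) / (1.0 + alpha * u * v)

u, v = 0.3, 0.8
for alpha in (0.25, 0.0, -0.25):   # arbitrary sample values of alpha
    two_steps = matmul(lorentz(u, alpha), lorentz(v, alpha))
    one_step = lorentz(add_velocities(u, v, alpha), alpha)
    there_and_back = matmul(lorentz(v, alpha), lorentz(-v, alpha))
    print(alpha, two_steps, one_step, there_and_back)
```

For each \alpha the first two matrices printed are equal up to rounding, and transforming by v and then by -v gives the identity matrix.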

All is great, but what exactly is \alpha...?

Let us first consider the consequences of various possible values of \alpha.

Zero value

This case is the simplest one. When \alpha = 0, the transformation boils down to:

 t' = t \\ x' = x - vt

This is nothing other than the Galilean transformation! So if the constant turns out to be zero, it will mean that people have known the correct transformation since the 17th century.

Negative value

This is also an interesting case. When the value of \alpha is negative, we can assume that it's \alpha = -\frac{1}{k^2}. The transformation looks like this, then:

 L(v) = \left[ \begin{array}{cc} \frac{1}{\sqrt{1 + \frac{v^2}{k^2}}} & \frac{v}{k^2 \sqrt{1 + \frac{v^2}{k^2}}} \\ \frac{-v}{\sqrt{1 + \frac{v^2}{k^2}}} & \frac{1}{\sqrt{1 + \frac{v^2}{k^2}}} \end{array} \right]

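Let us introduce new variables: y' = kt' and y = kt (this y is just time expressed in units of length, not the spatial coordinate y that we dropped at the beginning). We then get: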

 \left[ \begin{array}{c} y' \\ x' \end{array} \right] = \left[ \begin{array}{cc} \frac{1}{\sqrt{1 + \frac{v^2}{k^2}}} & \frac{\frac{v}{k}}{\sqrt{1 + \frac{v^2}{k^2}}} \\ \frac{-\frac{v}{k}}{\sqrt{1 + \frac{v^2}{k^2}}} & \frac{1}{\sqrt{1 + \frac{v^2}{k^2}}}\end{array} \right] \left[ \begin{array}{c} y \\ x \end{array} \right]

Let us define an angle \varphi such that \tan \varphi = \frac{v}{k}. This reduces the transformation to:

 L(\varphi) = \left[ \begin{array}{cc} \frac{1}{\sqrt{1 + \tan^2 \varphi}} & \frac{\tan \varphi}{\sqrt{1 + \tan^2 \varphi}} \\ \frac{-\tan \varphi}{\sqrt{1 + \tan^2 \varphi}} & \frac{1}{\sqrt{1 + \tan^2 \varphi}} \end{array} \right]

But! We know from trigonometry that:

 \cos \varphi = \frac{1}{\sqrt{1 + \tan^2 \varphi}} \\ \sin \varphi = \frac{\tan \varphi}{\sqrt{1 + \tan^2 \varphi}}

So, we get:

 L(\varphi) = \left[ \begin{array}{cc} \cos \varphi & \sin \varphi \\ -\sin \varphi & \cos \varphi \end{array} \right]

This is just a rotation matrix for the angle \varphi! So, in the case of a negative \alpha, time is just another spatial direction, and changing the velocity by v is a rotation by the angle of \arctan (\sqrt{-\alpha}v).
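As a sketch, the claim can be checked numerically: the matrix written in terms of v and k should coincide with the rotation matrix for \varphi = \arctan(v/k), and adding the angles of two successive transformations should reproduce V = (u + v)/(1 + \alpha uv) with \alpha = -1/k^2. The value of k below is an arbitrary assumption.

```python
import math

K = 2.0  # hypothetical value of the constant k, in units of speed

def boost_negative_alpha(v, k=K):
    """Matrix acting on (y, x), with y = k t, for alpha = -1/k^2."""
    s = math.sqrt(1.0 + v * v / (k * k))
    return [[1.0 / s, (v / k) / s], [-(v / k) / s, 1.0 / s]]

def rotation(phi):
    """The rotation matrix in the same sign convention as in the text."""
    return [[math.cos(phi), math.sin(phi)], [-math.sin(phi), math.cos(phi)]]

def speed_from_angles(u, v, k=K):
    """Combined speed obtained by adding the two rotation angles."""
    return k * math.tan(math.atan(u / k) + math.atan(v / k))

def speed_from_formula(u, v, k=K):
    """V = (u + v) / (1 + alpha u v) with alpha = -1/k^2."""
    return (u + v) / (1.0 - u * v / (k * k))

print(boost_negative_alpha(1.5))
print(rotation(math.atan(1.5 / K)))
print(speed_from_angles(0.5, 1.5), speed_from_formula(0.5, 1.5))
```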

Positive value

As it turns out, \alpha is actually positive in reality (I will tell you in a moment how we know). We can then denote \alpha = \frac{1}{v_0^2}, where v_0 is some constant in units of velocity. This constant has a special property. In order to see what it is, let us revisit the transformations of velocities.

We already transformed velocities when deriving the matrix coefficients. Let us do it again, then - assume that an object moves at a speed u relative to O' (so it satisfies x' = ut') and see how it moves relative to O:

 \left[ \begin{array}{c} t \\ Vt \end{array} \right] = L(-v) \left[ \begin{array}{c} t' \\ ut' \end{array} \right]

Let us write L(-v) again:

 \left[ \begin{array}{c} t \\ Vt \end{array} \right] = \left[ \begin{array}{cc} \frac{1}{\sqrt{1 - \frac{v^2}{v_0^2}}} & \frac{\frac{v}{v_0^2}}{\sqrt{1 - \frac{v^2}{v_0^2}}} \\ \frac{v}{\sqrt{1 - \frac{v^2}{v_0^2}}} & \frac{1}{\sqrt{1 - \frac{v^2}{v_0^2}}}\end{array} \right] \left[ \begin{array}{c} t' \\ ut' \end{array} \right]

We get:

 t = \frac{t' + \frac{uv}{v_0^2}t'}{\sqrt{1 - \frac{v^2}{v_0^2}}} \\ Vt = \frac{vt' + ut'}{\sqrt{1 - \frac{v^2}{v_0^2}}}

Dividing side by side, we get:

 V = \frac{u + v}{1 + \frac{uv}{v_0^2}}

Let us see what happens when u = v_0:

 V = \frac{v_0 + v}{1 + \frac{vv_0}{v_0^2}} = \frac{v_0 + v}{1 + \frac{v}{v_0}} = v_0

So, if an object moves at a speed v_0 relative to O', it is also moving at v_0 relative to O, regardless of what the relative speed of O and O' is. v_0 is then a kind of a universal speed, independent of the frame of reference.
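The same thing in a few lines of code, as a minimal sketch. Speeds are expressed in units in which v_0 = 1, which is an assumption made only to keep the numbers readable.

```python
def add_velocities(u, v, v0):
    """Speed relative to O of an object moving at u relative to O',
    when O' moves at v relative to O."""
    return (u + v) / (1.0 + u * v / (v0 * v0))

V0 = 1.0  # the universal speed, in units where it equals 1

# An object moving at v0 relative to O' moves at v0 relative to O as well,
# whatever the relative speed of the two observers is.
for v in (0.0, 0.1, 0.5, 0.9, -0.9):
    print(v, add_velocities(V0, v, V0))

# Two speeds below v0 never add up to more than v0.
print(add_velocities(0.9, 0.9, V0))
```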

In the positive case we can also do a trick like what we did in the negative case, and introduce a value \eta called "rapidity" such that: \tanh \eta = \frac{v}{v_0}. Introducing, analogously, y = v_0t, we get:

 L(\eta) = \left[ \begin{array}{cc} \cosh \eta & -\sinh \eta \\ -\sinh \eta & \cosh \eta \end{array} \right]

It is a matrix of a transformation analogous to a rotation, but in a so-called Minkowski spacetime. I won't go into details here, but this idea turns out to be very useful in SR.
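Just as rotation angles add up in the negative case, rapidities add up here: combining speeds u and v corresponds to adding their rapidities. The sketch below checks this numerically, together with the identification \cosh \eta = 1/\sqrt{1 - v^2/v_0^2}; the function names and sample speeds are assumptions made for the illustration.

```python
import math

def rapidity(v, v0):
    """eta such that tanh(eta) = v / v0."""
    return math.atanh(v / v0)

def add_velocities(u, v, v0):
    """Combined speed V = (u + v) / (1 + u v / v0^2)."""
    return (u + v) / (1.0 + u * v / (v0 * v0))

def boost_from_rapidity(eta):
    """The matrix L(eta) acting on (y, x), with y = v0 t."""
    return [[math.cosh(eta), -math.sinh(eta)], [-math.sinh(eta), math.cosh(eta)]]

def boost_from_speed(v, v0):
    """The same matrix written directly in terms of v."""
    g = 1.0 / math.sqrt(1.0 - v * v / (v0 * v0))
    return [[g, -(v / v0) * g], [-(v / v0) * g, g]]

V0 = 1.0
u, v = 0.3, 0.8
print(rapidity(u, V0) + rapidity(v, V0), rapidity(add_velocities(u, v, V0), V0))
print(boost_from_rapidity(rapidity(v, V0)))
print(boost_from_speed(v, V0))
```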

Measuring \alpha

Now we know what different values of \alpha mean, but we still don't know what its value is in reality. We do have a nice description of the phenomena that should be happening for various values of \alpha, though, so we can try to measure it. Specifically, we know how to add velocities:

 V = \frac{u + v}{1 + \alpha uv}

Light in flowing water

One of the first measurements of \alpha was done in 1851 by the French physicist Armand Fizeau, although he didn't know back then that such a constant could exist, nor that it could be deduced from his measurements ;) What he did was measure the speed of light in air, in water and in flowing water.

The speed of light in water is \frac{c}{n}, where n is the refractive index of water. He expected to get a value of \frac{c}{n} + v in water flowing with speed v, according to Galileo's transformation, but he actually got \frac{c}{n} + v \left( 1 - \frac{1}{n^2} \right). Let us see what we can deduce about \alpha from this.

If we assume that \alpha uv is small, we can approximate the formula for adding velocities:

 V = (u+v)(1 - \alpha uv + ...)

where "..." stands for higher powers of \alpha uv, so numbers that are even smaller.

When u = \frac{c}{n}, we get:

 V \approx (\frac{c}{n} + v)\left(1 - \alpha \frac{c}{n} v \right) = \frac{c}{n} + v - \alpha \frac{c^2}{n^2}v - \alpha \frac{c}{n} v^2

Since \frac{c}{n} was a lot larger than v in Fizeau's setup, this approximately equals:

 V \approx \frac{c}{n} + v \left(1 - \alpha \frac{c^2}{n^2} \right)

For this to agree with Fizeau's results, it must be that \alpha \approx \frac{1}{c^2}. Unsurprisingly, the assumption that the speed of light is universal gives exactly \alpha = \frac{1}{c^2}.
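To get a feeling for the numbers, here is a rough sketch. The refractive index is the usual approximate value for water, and the flow speed is an assumed round figure of a few meters per second used only for illustration, not Fizeau's actual data.

```python
C = 299_792_458.0   # speed of light in vacuum, m/s
N_WATER = 1.333     # approximate refractive index of water
V_FLOW = 7.0        # m/s, assumed illustrative flow speed

def add_velocities(u, v, alpha):
    """Exact formula V = (u + v) / (1 + alpha u v)."""
    return (u + v) / (1.0 + alpha * u * v)

def alpha_from_drag(drag, n):
    """Invert V ~ c/n + v (1 - alpha c^2 / n^2) for alpha, given the
    measured coefficient 'drag' that multiplies v."""
    return (1.0 - drag) * n * n / (C * C)

u = C / N_WATER                                   # light in still water
exact = add_velocities(u, V_FLOW, 1.0 / C**2)     # alpha = 1/c^2
galilean = add_velocities(u, V_FLOW, 0.0)         # alpha = 0

drag_exact = (exact - u) / V_FLOW                 # fraction of v that light picks up
drag_fresnel = 1.0 - 1.0 / N_WATER**2             # the coefficient Fizeau found
drag_galilean = (galilean - u) / V_FLOW           # would be 1 for alpha = 0

print(drag_exact, drag_fresnel, drag_galilean)
print(alpha_from_drag(drag_fresnel, N_WATER) * C**2)   # alpha in units of 1/c^2
```

With \alpha = \frac{1}{c^2} light picks up a bit less than half of the water's speed, while \alpha = 0 would mean picking up all of it, so the two cases are easy to tell apart even though v is tiny compared to c.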


We got the Special Theory of Relativity without assuming that the speed of light is constant. To be precise, we got a result that there is a universal speed, which is approximately equal to the speed of light - but all experiments performed so far agree with the theory in which it is exactly the speed of light.

So the constancy of the speed of light does not have to be put in by hand. Nevertheless, the experiments indicate that there indeed is a universal speed in nature and that it equals the speed of light to very high accuracy (the original Fizeau experiment might not have been that precise, but 150 years have passed since then and we have much more precise results now). So even if it did turn out that the speed of light can depend on the frame of reference - which isn't entirely out of the question - it would change nothing about the phenomena that bother SR's opponents so much, like time dilation, Lorentz contraction, or the existence of a universal speed. These phenomena arise from something much more general than just a constant speed of light, and significantly changing their interpretation would take a discovery much bigger than mere variability of the speed of light.

It might be good to remember this the next time you encounter someone trying hard to convince you that SR is a scientific conspiracy ;)