Black Hole Merging and Gravitational Waves

  • Neil J. Cornish
Part of the Saas-Fee Advanced Course book series (SAASFEE, volume 48)


I was tasked with covering a wide swath of gravitational wave astronomy—including theory, observation, and data analysis—and with describing the detection techniques used to span the gravitational wave spectrum—pulsar timing, ground-based interferometers and their future space-based counterparts. For good measure, I was also asked to include an introduction to general relativity and black holes. Distilling all this material into nine lectures was quite a challenge. The end result is a highly condensed set of lecture notes that can be consumed in a few hours, but may take weeks to digest.

1 Introduction

In writing up these lecture notes I have mostly followed the order in which the material was presented in Saas Fee, with the exception of the discussion of the detectors, which have been grouped here in a single section. My goal is not to write a textbook on each topic—many excellent texts and review articles on general relativity and gravitational wave astronomy already exist (see e.g. [11, 16, 37, 40, 50]). Rather, I try to highlight the key concepts and techniques that underpin each topic. I also strive to provide a unified picture that emphasizes the similarities between pulsar timing, ground-based detectors and space-based detectors, and commonalities in how the data is analyzed across the spectrum and across source types.

2 General Relativity

The historical course that led Einstein to develop the general theory of relativity had many twists and turns, but as Einstein reflected in 1922, one of his primary goals was to understand the equivalence between inertial mass and gravitational mass: “It was most unsatisfactory to me that, although the relation between inertia and energy is so beautifully derived [in Special Relativity], there is no relation between inertia and weight. I suspected that this relationship was inexplicable by means of Special Relativity” [43]. Einstein found the resolution to this conundrum by adopting a geometrical picture that generalized Minkowski’s description of special relativity to allow for spacetime curvature.

2.1 Special Relativity

Minkowski showed that phenomena such as time dilation and length contraction followed naturally as a consequence of space and time being combined into a single spacetime geometry with distances measured by the invariant interval
$$\begin{aligned} ds^2 = -c^2 d t^2 + dx^2 + dy^2 + dz^2 \, . \end{aligned}$$
The \(t=\mathrm{const.}\) spatial section of this geometry is ordinary three dimensional Euclidean space, which is invariant under translations and rotations, and can be described by the special Euclidean group \(\mathrm{E}(3) = \mathrm{SO}(3) \ltimes \mathrm{T}(3)\). The full Minkowski spacetime is invariant under the Poincaré group, the semidirect product of translations in time and space, \(\mathrm{T}(1,3)\), with rotations in time and space, \(\mathrm{SO}(1,3)\), otherwise known as the Lorentz group. The Lorentz group includes regular spatial rotations \(\mathrm{SO}(3)\), and boosts, which can be thought of as hyperbolic rotations in a plane that includes a time-like direction.
Fig. 1

A spacetime diagram shown in the rest frame of observer \(\mathcal{O}\). Observer \(\mathcal{O}'\) is moving at velocity v with respect to \(\mathcal{O}\) in the x direction

The key results of special relativity can be derived by considering motion that is restricted to the \((1+1)\) dimensional sub-manifold spanned by coordinates \((t, x)\) with invariant interval \(ds^2 = -c^2 d t^2 + dx^2\). Rotations in this two-dimensional Minkowski space leave fixed hyperbolae, \(x^2 - c^2 t^2 = \pm a^2\), just as rotations in two dimensional Euclidean space leave fixed circles, \(x^2 + y^2 = a^2\). The coordinates \((t, x)\) define a frame of reference \(\mathcal{O}\). An observer at rest in this coordinate system will follow the trajectory \(x=\mathrm{const.}\): in other words, a line parallel to the t-axis. A particle moving at velocity v in the positive x-direction will follow the trajectory (worldline) \(x = v t + \mathrm{const.}\) We can perform a boost to a new reference frame \(\mathcal{O}'\), with coordinates \((t',x')\), where the particle is at rest: \(x' = \mathrm{const.}\) This implies that the two coordinate systems are related: \(x' = \gamma (x-vt)\), where \(\gamma \) is a constant. Objects at rest in frame \(\mathcal{O}\) will be moving at velocity \(-v\) in the \(x'\) direction in frame \(\mathcal{O}'\), so it follows that \(x = \gamma (x'+vt')\). Solving for \(t'\) we find \(t'=\gamma (t + (1-\gamma ^2)/(v\gamma ^2) x)\). Invariance of the interval \({x'}^2 - c^2 {t'}^2 = x^2 - c^2 t^2\) fixes the constant to be \(\gamma = 1/\sqrt{1-\beta ^2}\), where \(\beta =v/c\). Thus we have derived the following coordinate transformation for a boost with velocity v in the positive x direction:
$$\begin{aligned} ct'= & {} \gamma (c t - \beta x) \nonumber \\ x'= & {} \gamma (x-vt) \, . \end{aligned}$$
The transformation can be viewed as a hyperbolic rotation with \(\cosh \eta = \gamma \) and \(\sinh \eta = \beta \gamma \), where the “angle” \(\eta = \mathrm{arctanh}\, \beta \) is called the rapidity. Lines of simultaneity in \(\mathcal{O}'\) have \(t'=0\), and lie parallel to the line \(x = ct/\beta \) in frame \(\mathcal{O}\). Figure 1 displays a spacetime diagram illustrating the relationship between the two reference frames.
Classic results such as time dilation and length contraction follow directly from the spacetime geometry of Minkowski space. For example, consider two events \(\mathrm{A,B}\) along the worldline of observer \(\mathcal{O}'\). The proper time elapsed as measured by a clock carried by observer \(\mathcal{O}'\) is \(T'=\varDelta t'\), while the time elapsed as measured by a clock carried by observer \(\mathcal{O}\) is \(T=\varDelta t\). The invariant interval is
$$\begin{aligned} \varDelta s_\mathrm{AB}^2 = -(c\varDelta t')^2 = -(c\varDelta t)^2 + \varDelta x^2 \, . \end{aligned}$$
Since \(\varDelta x = v \varDelta t\) we find that
$$\begin{aligned} T = \gamma T' \, , \end{aligned}$$
so that the time elapsed in the static frame is greater than the time elapsed in the moving frame. Next consider a rod of proper length \(L'\) moving at velocity v relative to observer \(\mathcal{O}\). At a given instant, the ends of the rod are at events \(\mathrm{D,F}\) in frame \(\mathcal{O}\) and at events \(\mathrm{D,E}\) in frame \(\mathcal{O}'\). Thus the length of the rod in frame \(\mathcal{O}\) is \(L = \varDelta s_\mathrm{DF}\), while the length of the rod in frame \(\mathcal{O}'\) is \(L' = \varDelta s_\mathrm{DE}\). Using the invariance of the interval we have
$$\begin{aligned} \varDelta s_\mathrm{DF}^2= & {} L^2 =\varDelta x^2 = \varDelta x'^2 -(c\varDelta t')^2 \nonumber \\ \varDelta s_\mathrm{DE}^2= & {} L'^2 = \varDelta x'^2 = ( \varDelta x +v \varDelta t)^2 -(c\varDelta t)^2 \, . \end{aligned}$$
Incorporating the time dilation found earlier, \(\varDelta t = \gamma \varDelta t'\), we find the lengths to be related:
$$\begin{aligned} L = L' /\gamma \, , \end{aligned}$$
so that the rod appears shorter in the static frame. The spacetime diagram makes it clear that this discrepancy is due to the two frames having different lines of simultaneity (Fig. 2).
Fig. 2

Spacetime diagrams illustrating time dilation (left) and length contraction (right)
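As a quick numerical aside, the boost (2) can be checked directly. The sketch below (my own illustration, not from the lectures; NumPy, in units where \(c=1\), with arbitrary sample velocities) verifies that the boost matrix preserves the Minkowski interval, and that composing two boosts adds their rapidities, which recovers the standard relativistic velocity addition law:

```python
import numpy as np

def boost(beta):
    """2x2 Lorentz boost acting on (ct, x), with c = 1: Eq. (2) in matrix form."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    return np.array([[gamma, -gamma * beta],
                     [-gamma * beta, gamma]])

eta = np.diag([-1.0, 1.0])      # two-dimensional Minkowski metric
B = boost(0.6)

# invariance of the interval: B^T eta B = eta
print(np.allclose(B.T @ eta @ B, eta))                        # True

# composing boosts adds rapidities eta_i = arctanh(beta_i)
b1, b2 = 0.5, 0.5
B12 = boost(b1) @ boost(b2)
beta_combined = -B12[0, 1] / B12[0, 0]                        # (beta*gamma)/gamma
print(np.isclose(beta_combined,
                 np.tanh(np.arctanh(b1) + np.arctanh(b2))))   # True
print(np.isclose(beta_combined, (b1 + b2) / (1 + b1 * b2)))   # True
```

Reading off \(\beta \) from the composed matrix shows two successive boosts at \(\beta =0.5\) combine to \(\beta =0.8\), not 1.0.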

2.2 The Equivalence Principle

Einstein, like Newton before him, was struck by the equivalence of the inertial mass that appears in the relation between force and acceleration \(\mathbf{F} = m_I \mathbf{a}\) and the gravitational charge, or mass, that appears in Newton’s gravitational force law \(\mathbf{F}_G= - G m_G M \mathbf{r}/r^3\). He also noted that inertial frames of reference play a special role in both Newtonian mechanics and special relativity. In Newtonian mechanics, objects in a non-inertial frame that is uniformly accelerating and rotating experience “pseudo-forces” of the form
$$\begin{aligned} \mathbf{F}_\mathrm{P} = - m_I \mathbf{a} - 2 m_I \varvec{\omega } \times \mathbf {v} - m_I \varvec{\omega } \times (\varvec{\omega } \times \mathbf{x}) \, . \end{aligned}$$
Since these forces are a coordinate effect, they must scale with the inertial mass. The first term in the above expression is referred to as a rectilinear force, the second term is called the Coriolis force, while the third term is called the centrifugal force. The “pseudo” moniker is perhaps a little misplaced—hurricanes that get stirred up by the Coriolis force due to the rotation of the Earth are real enough. Einstein suggested that the rectilinear term be identified with a uniform gravitational field. “I was sitting in a chair in the patent office in Bern when all of a sudden a thought occurred to me: if a person falls freely he will not feel his own weight. I was startled. This simple thought made a deep impression upon me. It impelled me towards a theory of gravitation” [43]. In other words, the acceleration \(g \simeq 9.8\,\mathrm{m}\,\mathrm{s}^{-2}\) that we experience while sitting or standing on the surface of the Earth is due to us being in a non-inertial frame of reference. Take away the ground, say by jumping into a mineshaft, and the “force of gravity” goes away. The equivalence of inertial and gravitational mass follows naturally from the equivalence of uniform gravitational fields and uniform accelerations.
Fig. 3

Path of light as seen in a uniformly accelerated reference frame. In an inertial frame the light follows the straight dotted path, while in the accelerated frame of the rocket it appears to follow the curved path shown here as a solid line

Einstein set out to incorporate this insight into a modification of special relativity that could account for gravitational effects. The connection to coordinate transformations suggested a geometrical approach, which caused Einstein to pay more attention to Minkowski’s geometrical formulation of special relativity. Einstein began by showing that the path of light seen by a uniformly accelerated observer could be interpreted in terms of spacetime geometry where the speed of light depends on position.

Consider the picture in Fig. 3 where a rocket accelerates from rest with uniform acceleration a in the positive z direction, and a photon traveling in the positive x direction enters a window in the rocket at time t. The photon follows the path \(x=ct, z=0\), shown as a horizontal dotted line, in the inertial frame where the rocket was originally at rest. At first the velocity of the rocket \(v_z= at\) is much less than the speed of light, and the coordinates in the two frames are related such that \(t'=t\), \(x'=x\) and \(z'= z - \frac{1}{2} a t^2\). Thus, in the non-inertial frame of the rocket, the photon follows the parabola \(z'= - a x'^2/ (2 c^2)\). Einstein showed that this “bending of light” could be derived from the line element for a uniformly accelerated observer, which to leading order in a has the form [18]
$$\begin{aligned} ds^2 = -c^2 \left( 1 +\frac{2 a z'}{c^2}\right) d t'^2 + dx'^2 + dy'^2 + dz'^2 \, . \end{aligned}$$
To confirm that light paths in this geometry are indeed parabolas, we need to derive the geodesic equation, which describes the shortest/straightest paths in spacetime.
Using the short-hand notation \(\mathbf {x} \rightarrow \{ x^\mu \} = \{c t, x, y, z\}\) and \(ds^2 = g_{\mu \nu }(\mathbf {x}) dx^\mu dx^\nu \), with summation implied on repeated indices, the path length between events A, B, is given by
$$\begin{aligned} S = \int _A^B \sqrt{-ds^2} = \int _{\lambda _A}^{\lambda _B} \left( - g_{\mu \nu } \frac{ d x^\mu }{d\lambda } \frac{ d x^\nu }{d\lambda } \right) ^{1/2} d\lambda \equiv \int _{\lambda _A}^{\lambda _B} L\left( x^\mu , \frac{d x^\mu }{d\lambda }\right) d\lambda . \end{aligned}$$
Holding the end points fixed and extremizing the path length yields the Euler–Lagrange equations
$$\begin{aligned} \frac{d}{d \lambda } \left( \frac{\partial L}{\partial (d x^\alpha /d\lambda )} \right) = \frac{\partial L}{\partial x^\alpha } \end{aligned}$$
which evaluate to
$$\begin{aligned} \frac{d ^2 x^\alpha }{d\lambda ^2} = -\frac{1}{2} g^{\alpha \beta } ( g_{\beta \mu ,\nu } + g_{\beta \nu ,\mu } - g_{\mu \nu ,\beta }) \frac{d x^\mu }{d\lambda } \frac{d x^\nu }{d\lambda } \equiv - \varGamma ^\alpha _{\mu \nu } \frac{d x^\mu }{d\lambda } \frac{d x^\nu }{d\lambda } \, . \end{aligned}$$
Here commas denote partial derivatives \(h_{, \mu } = \partial h/\partial x^\mu \), and the collection of metric derivatives appearing on the right-hand side of the above equation are referred to as the Christoffel symbol \(\varGamma ^\alpha _{\mu \nu }\). Note that \(g^{\alpha \beta }\) denotes the components of the inverse metric tensor, so that \(g^{\alpha \beta } g_{\beta \kappa } = \delta ^\alpha _\kappa \). The geodesic equation can be simplified by introducing the notation \(u^\alpha = dx^\alpha /d\lambda \) for the 4-velocity and \(\nabla _\beta u^\alpha = u^\alpha _{\; , \beta } + u^\nu \, \varGamma ^\alpha _{\beta \nu }\) for the covariant derivative. With these definitions we have \(d/d\lambda = u^\alpha \nabla _\alpha \), and the geodesic equation takes the more compact form
$$\begin{aligned} u^\beta \nabla _\beta u^\alpha = 0 \, . \end{aligned}$$
Returning to the metric for a uniformly accelerated observer, we find \(\varGamma ^z_{tt} = a\) and all others zero. For a photon with initial velocity \(\mathbf {U} \rightarrow (1, c, 0, 0)\) the geodesic equations integrate to give \(t'= \lambda \), \(x'=c t'\) and \(z' = -\frac{1}{2} a t'^2\). This confirms that photons do indeed follow parabolic paths in the \(x'-z'\) spacetime with line element given in Eq. (8). The result can be generalized to describe uniform acceleration in any direction and uniform rotation about any axis. The line element for this non-inertial frame is given by (dropping the primes to simplify the notation)
$$\begin{aligned} ds^2 = -\left( \left( c+\frac{\mathbf{a}\cdot \mathbf{x}}{c}\right) ^2 - (\varvec{\omega } \times \mathbf{x})^2\right) dt^2 + 2 (\varvec{\omega } \times \mathbf{x})_i dx^i dt +dx^2 +dy^2 + dz^2\, . \end{aligned}$$
The convention being used here is that Roman indices run over spatial coordinates, while Greek indices run over time and space coordinates. To leading order in \(\mathbf{a}\) and \(\varvec{\omega }\) the non-vanishing Christoffel symbols are:
$$\begin{aligned} \varGamma _ {tt}^i\simeq & {} -\frac{1}{2}\, g_{tt,i} = a^i + (\varvec{\omega } \times (\varvec{\omega }\times \mathbf{x}))^i \nonumber \\ \varGamma ^i_{tj}\simeq & {} \frac{1}{2} ( g_{ti,j}- g_{tj,i}) = -\epsilon _{ijk} \omega ^k \, , \end{aligned}$$
and the geodesic equations yield
$$\begin{aligned} \frac{d^2 \mathbf{x}}{dt^2} = - \mathbf{a} - 2\, \varvec{\omega } \times \mathbf {v} - \varvec{\omega } \times (\varvec{\omega } \times \mathbf{x}) \, , \end{aligned}$$
which recovers the form of the acceleration attributed to pseudo forces in a non-inertial frame. Turning this around, it is always possible to find a coordinate transformation to an inertial frame where the pseudo forces vanish. But this does not mean that we can simply transform gravity away.
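As a sanity check on this machinery, the geodesic equation can be integrated numerically. The sketch below (my own illustration, not from the lectures; NumPy, units where \(c=1\), with an arbitrary small acceleration \(a=0.01\)) builds the Christoffel symbols of the uniformly accelerated metric (8) by finite differencing and integrates a horizontally launched photon with a Runge–Kutta stepper, recovering the parabola \(z=-ax^2/2\):

```python
import numpy as np

A = 0.01  # uniform acceleration, in units where c = 1 (illustrative value)

def metric(x):
    """Metric of Eq. (8) with c = 1: g_tt = -(1 + 2 a z), flat spatial part."""
    return np.diag([-(1.0 + 2.0 * A * x[3]), 1.0, 1.0, 1.0])

def christoffel(x, eps=1e-6):
    """Gamma^a_mn = (1/2) g^{ab} (g_{bm,n} + g_{bn,m} - g_{mn,b}),
    with metric derivatives taken by central differences."""
    ginv = np.linalg.inv(metric(x))
    dg = np.zeros((4, 4, 4))          # dg[b] = d g_{mn} / d x^b
    for b in range(4):
        dx = np.zeros(4)
        dx[b] = eps
        dg[b] = (metric(x + dx) - metric(x - dx)) / (2.0 * eps)
    Gamma = np.zeros((4, 4, 4))
    for a in range(4):
        for m in range(4):
            for n in range(4):
                Gamma[a, m, n] = 0.5 * sum(
                    ginv[a, b] * (dg[n][b, m] + dg[m][b, n] - dg[b][m, n])
                    for b in range(4))
    return Gamma

def rhs(state):
    """Geodesic equation: dx/dlam = u, du/dlam = -Gamma u u."""
    x, u = state[:4], state[4:]
    du = -np.einsum('amn,m,n->a', christoffel(x), u, u)
    return np.concatenate([u, du])

# photon entering horizontally: x^mu = (t, x, y, z), u^mu = (1, 1, 0, 0)
state = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0])
dlam = 0.01
for _ in range(100):                  # RK4 steps out to lambda = 1
    k1 = rhs(state)
    k2 = rhs(state + 0.5 * dlam * k1)
    k3 = rhs(state + 0.5 * dlam * k2)
    k4 = rhs(state + dlam * k3)
    state = state + (dlam / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

x_final, z_final = state[1], state[3]
print(z_final, -0.5 * A * x_final**2)  # both ~ -0.005: the parabola z = -a x^2/2
```

Since the integrator only needs the metric as input, the same sketch applies to any of the line elements discussed in this section.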

2.3 Tides and Curvature

Einstein’s trick for making gravity vanish only works in a small region of spacetime. It is impossible to remove the tidal forces that manifest over larger regions.
Fig. 4

Tidal forces in an Earth bound laboratory

Suppose we do an experiment in a laboratory on the surface of the Earth as shown in Fig. 4. We can set up a coordinate system where the z axis points in the outward radial direction at the center of the lab, and the xy directions span the floor of the lab. Now suppose that we release two masses from near the ceiling of the lab, the first with position vector relative to the center of the Earth given by \(\mathbf{r}_1 = \mathbf{R}+\mathbf{x}_1\), where \(\mathbf{R}\) is a vector connecting the center of the Earth to the floor of the lab, and the second with \(\mathbf{r}_2 = \mathbf{R}+\mathbf{x}_2\). Initially the distance between the two masses is \(L=|\varDelta x| = |\mathbf{x}_2 - \mathbf{x}_1| = y_2-y_1\). To leading order in \(L/ R_{\oplus }\) Newton’s theory of gravity predicts a constant acceleration in the \(-z\) direction:
$$\begin{aligned} \frac{ d^2 \mathbf{x_{1,2}}}{dt^2} = -\frac{G M_{\oplus }}{R_{\oplus }^2} \hat{z} = - g \hat{z} \, . \end{aligned}$$
This part of the gravitational field can be transformed away by adopting a freely falling reference frame. Continuing the expansion of the Newtonian equations of motion to next order we encounter tidal forces in the y direction:
$$\begin{aligned} \frac{d^2{\varvec{\Delta }}{} \mathbf{x}}{dt^2} = -\frac{G M_{\oplus } L }{R_{\oplus }^3} \hat{y} \, . \end{aligned}$$
These non-uniform accelerations cannot be transformed away. Similarly, if one were to consider the motion of a single mass over an extended period of time, the local value of the acceleration g would change with time, and this change in acceleration cannot be transformed away. In summary, the effects of gravity can only be transformed away across small regions of space for a short period of time.
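The magnitude of this tidal acceleration is easy to check numerically. A minimal sketch (the rounded Earth values and the 10 m separation are illustrative choices of mine, not from the text):

```python
import numpy as np

GM = 3.986e14     # Earth's GM in m^3 s^-2 (rounded)
R = 6.371e6       # Earth's radius in m
L = 10.0          # transverse separation of the released masses in m

def acc(pos):
    """Newtonian acceleration -GM r / |r|^3 toward the Earth's center."""
    r = np.linalg.norm(pos)
    return -GM * pos / r**3

p1 = np.array([0.0, -L / 2.0, R])   # the two release points, separated in y
p2 = np.array([0.0, +L / 2.0, R])
da = acc(p2) - acc(p1)              # relative (tidal) acceleration

print(da[1], -GM * L / R**3)        # both ~ -1.5e-5 m s^-2: the masses converge
```

The two masses drift together at roughly \(10^{-5}\,\mathrm{m}\,\mathrm{s}^{-2}\) per 10 m of separation—small, but not removable by any choice of frame.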
Looking at the geodesic equation (11), we see that transforming away local acceleration terms is equivalent to setting the first derivatives of the metric equal to zero. It turns out that we always have enough coordinate freedom to set the components of the metric in the neighborhood of an event equal to the Minkowski metric \(\eta _{\mu \nu } = \mathrm{diag}(-1,1,1,1)\), and to set the first derivatives equal to zero. However, there is not enough coordinate freedom to remove the second and higher derivatives. As shown by Riemann, the second derivatives of the metric describe the curvature of the spacetime. The components of the Riemann curvature tensor are given by
$$\begin{aligned} R^\kappa _{\mu \lambda \nu } = \varGamma ^{\kappa }_{\mu \nu ,\lambda } - \varGamma ^{\kappa }_{\mu \lambda ,\nu } + \varGamma ^\alpha _{\mu \nu } \varGamma ^\kappa _{\alpha \lambda } - \varGamma ^\alpha _{\mu \lambda } \varGamma ^\kappa _{\alpha \nu } \, . \end{aligned}$$
The Christoffel symbols \(\varGamma ^\alpha _{\mu \nu }\), defined in Eq. (11), involve first derivatives of the metric and can be made to vanish at any point in spacetime; however, their derivatives cannot. In local free fall coordinates the metric components take the form
$$\begin{aligned} g_{\mu \nu } = \eta _{\mu \nu } - \frac{1}{3}R_{\mu \alpha \nu \beta } \varDelta x^\alpha \varDelta x^\beta + \cdots \end{aligned}$$
This is nothing other than a Taylor series expansion about a point P, where the coordinates have been chosen such that \(g_{\mu \nu ,\lambda }|_P = 0\). These coordinates are variously referred to as free fall, locally Lorentzian or Riemann normal coordinates. These coordinates can be extended along the worldline of a particle to yield a locally non-rotating inertial frame defined by a set of four orthogonal basis vectors \(\mathbf{e}_{(\gamma )}\). Here \(\gamma \) labels the basis vectors and should not be confused with the components of the basis vector, which are labelled by a superscript: \(e^\alpha _{(\gamma )}\). The locally non-rotating frame is carried along the worldline of the particle by Fermi–Walker transport:
$$\begin{aligned} u^\beta \nabla _\beta e^\alpha _{(k)} = e^\alpha _{(t)} g_{\mu \nu } e^{\mu }_{(k)} u^\beta \nabla _\beta e^\nu _{(t)} \, . \end{aligned}$$
Fermi–Walker transport keeps the basis vector \(\mathbf{e}_{(t)}\) tangent to the worldline, as shown in Fig. 5.
While Fermi–Walker transport can eliminate “pseudo forces” along the worldline of a single particle, spacetime curvature prevents us from extending this inertial coordinate system globally. Spacetime curvature manifests as a tidal force that causes initially parallel geodesics to converge, diverge or twist about one another. Considering the geodesic equation for two nearby geodesics with separation vector \(\varvec{\xi }\), we find that spacetime curvature causes a relative acceleration:
$$\begin{aligned} \frac{d^2 \xi ^\mu }{d\lambda ^2} = u^\alpha \nabla _\alpha (u^\beta \nabla _\beta \xi ^\mu ) = -R^\mu _{\alpha \beta \gamma }\, u^\alpha \xi ^\beta u^\gamma \, . \end{aligned}$$
In the slow motion limit where \(u^t \gg u^i\) the equation for geodesic deviation becomes
$$\begin{aligned} \frac{ d^2 \xi ^i}{dt^2} = -c^2 R^i_{0 j 0}\, \xi ^j = -R^i_{t j t}\, \xi ^j\, . \end{aligned}$$
Comparing with the expression (17), the tidal gravitational field of the Earth implies a spacetime curvature with \(R^y_{tyt} = G M_{\oplus }/R_{\oplus }^3\). Shortly we will see that this is exactly what is predicted by Einstein’s general theory of relativity.
Fig. 5

Fermi–Walker transport of a locally inertial coordinate system along the worldline of a particle \(x^\mu (\tau )\) parametrized by proper time \(\tau \)

2.4 Newtonian Gravity in Geometric Form

Before moving to the full Einstein equations, it is interesting to cast Newtonian gravity in geometrical terms. Consider a spacetime with line element
$$\begin{aligned} ds^2 = -(c^2 + 2 \varPhi (\mathbf{x})) dt^2 + dx^2 + dy^2 + dz^2 \, , \end{aligned}$$
where \(\varPhi \ll c^2\) is the Newtonian gravitational potential derived from Poisson’s equation \(\nabla ^2 \varPhi = 4 \pi G \rho \), where \(\rho \) is the mass density. The non-vanishing components of the Christoffel symbol are:
$$\begin{aligned} \varGamma ^i_{tt} = \varPhi _{,i} \quad \quad \varGamma ^t_{ti} = \varGamma ^t_{it} = \frac{\varPhi _{,i}}{(c^2 + 2 \varPhi (\mathbf{x}))} \, . \end{aligned}$$
The geodesic equation reduces to the usual Newtonian force law
$$\begin{aligned} \frac{d^2 \mathbf{x}}{dt^2} = -\nabla \varPhi \, , \end{aligned}$$
and the non-vanishing components of the Riemann tensor are given by
$$\begin{aligned} R^i_{tjt} = \varGamma ^i_{tt, j} = \varPhi _{,ij} \, . \end{aligned}$$
Returning to the Earth bound lab example, we have \(\varPhi = -G M_\oplus /r\), \(r^2=R_\oplus ^2+y^2\) and
$$\begin{aligned} \varPhi _{,yy} = -\frac{2G M_\oplus }{r^3} (r_{,y})^2 + \frac{G M_\oplus }{r^2} (r_{,yy}) \approx \frac{G M_{\oplus }}{R_{\oplus }^3} \, , \end{aligned}$$
which recovers our earlier result. We see that in the Newtonian limit the Riemann tensor is nothing other than the tidal tensor of the Newtonian potential. The metric (23) accurately describes all aspects of Newtonian gravity, and corresponds to the weak field and slow motion approximation to general relativity. It cannot, therefore, be used to model relativistic effects such as the deflection of starlight passing near the Sun (the prediction using (23) turns out to be off by a factor of two).

2.5 Einstein Equations

The geometric treatment of Newtonian gravity has shown that curvature is related to the tidal field, and that in the weak field limit
$$\begin{aligned} R^j_{tjt} = g^{ij} \varPhi _{,ij} = \nabla ^2 \varPhi = 4 \pi G \rho . \end{aligned}$$
The quantity on the left-hand side of this equation is the tt component of the Ricci tensor \(R_{\mu \nu }=R^\kappa _{\mu \kappa \nu }\), while the right-hand side is proportional to the tt component of the stress-energy-momentum tensor \(T_{\mu \nu }\). We seek to generalize (28) using terms that involve at most second derivatives of the metric. Moreover, we seek a coordinate invariant expression rather than one that singles out specific components of a tensor. The most general expression of this form is
$$\begin{aligned} \mathbf{R} + \alpha R \mathbf{g} + \varLambda \mathbf{g} = \kappa \mathbf{T} \end{aligned}$$
where \(\alpha \), \(\varLambda \) and \(\kappa \) are constants, \(R= \mathrm{trace}(\mathbf{R})\) is the Ricci scalar and the bold-faced letters denote the Ricci, metric and energy-momentum tensors in coordinate-free notation. Conservation of energy-momentum \(\nabla \cdot \mathbf{T}=0\) requires that \(\alpha =-1/2\), and recovery of the Newtonian limit (28) fixes \(\kappa = 8 \pi G/c^4\). The Einstein equations in component form are then [19]
$$\begin{aligned} R_{\mu \nu } - \frac{1}{2} g_{\mu \nu } R + \varLambda g_{\mu \nu } = \frac{8 \pi G}{c^4} T_{\mu \nu } \, . \end{aligned}$$
The quantity \(\varLambda \) is the cosmological constant. A remarkable feature of Einstein’s theory is that the field equations can be used to derive the equations of motion of objects in spacetime [20, 21]. The derivation involves solving Einstein’s equations in the limit of a small concentration of mass moving in the geometry generated by a larger concentration of mass [44]. Work continues on this problem today as part of the self-force program, with the goal of deriving waveforms describing small compact objects spiraling into massive black holes—otherwise known as Extreme Mass Ratio Inspirals, or EMRIs [53]. The current state of the art yields equations of motion that are valid to first order in the mass ratio of the two bodies [25]:
$$\begin{aligned} u^\mu \nabla _\mu u^\nu = \frac{1}{2 M} R_{\alpha \beta \gamma }^{\;\;\;\; \; \nu }S^{\alpha \beta } u^\gamma - (g^{\nu \kappa } +u^\nu u^\kappa )\left( \nabla _\alpha h_{\kappa \gamma }^\mathrm{tail} -\frac{1}{2} \nabla _\kappa h_{\gamma \alpha }^\mathrm{tail}\right) u^\gamma u^\alpha \,. \end{aligned}$$
Here \(S^{\alpha \beta }\) is the spin tensor for the small body, and \(h_{\kappa \gamma }^\mathrm{tail}\) are the “tail terms” of the gravitational waves produced by the motion of the small body. The tail terms are proportional to the mass of the smaller body, and arise from the failure of Huygens’ principle in curved spacetime. The tail terms depend on the entire past history of the motion. To lowest order, ignoring the mass and spin of the small body, we see that small objects follow geodesics of the spacetime. The spin-curvature terms are named after Papapetrou and Dixon (though they were not the first to discover them). The self-force terms are called the MiSaTaQuWa equations, and were primarily derived by Mino et al. [39] and by Quinn and Wald [46].

2.6 Black Holes

The Einstein equations (30) applied to a fully general spacetime metric represent ten coupled, non-linear partial differential equations. Solving such equations is extremely challenging, even by numerical means. The equations become more tractable when applied to spacetimes with a high degree of symmetry. One of the earliest exact solutions to Einstein’s equations was found by Schwarzschild for the case of a static, spherically symmetric vacuum spacetime with line element
$$\begin{aligned} ds^2 = -U(r) c^2 dt^2 + V(r) dr^2 +r^2 d\theta ^2 + r^2 \sin ^2\theta d\phi ^2 \, . \end{aligned}$$
The tt and rr components of the vacuum Einstein equations become
$$\begin{aligned} \frac{V'}{rV^2} +\frac{1}{r^2} \left( 1 - \frac{1}{V}\right)= & {} 0 \end{aligned}$$
$$\begin{aligned} \frac{U'}{r U} -\frac{V}{r^2} \left( 1 - \frac{1}{V}\right)= & {} 0 \end{aligned}$$
where \(A' = d A/dr\). Equation (33) implies that \(V^{-1} = (1- R_s/r)\) where \(R_s\) is called the Schwarzschild radius. Combining rV times Eq. (33) with r times Eq. (34) yields \((\ln (UV))'=0\), so that \(U=V^{-1}\) (up to an arbitrary constant that can be absorbed by rescaling t). To recover the Newtonian limit (23) for large r we have to set the Schwarzschild radius proportional to the mass of the central object: \(R_s= 2GM/c^2\), yielding the Schwarzschild line element
$$\begin{aligned} ds^2 = -\left( 1 - \frac{2 G M}{ c^2 r}\right) c^2 dt^2 + \frac{dr^2}{\left( 1 - \frac{2 G M}{ c^2 r}\right) } +r^2 d\theta ^2 + r^2 \sin ^2\theta d\phi ^2 \, . \end{aligned}$$
For an object such as the Sun, the Schwarzschild radius \(R_{\odot ,s} = 2.95\) km would lie deep within the interior of the Sun, where the vacuum solution is no longer valid.
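As a quick numerical aside, evaluating \(R_s = 2GM/c^2\) with rounded physical constants (illustrative inputs of mine, not from the text) reproduces the solar value quoted above, and gives roughly 9 mm for the Earth:

```python
G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2 (rounded)
c = 2.998e8         # speed of light, m s^-1 (rounded)
M_sun = 1.989e30    # solar mass, kg
M_earth = 5.972e24  # Earth mass, kg

def schwarzschild_radius(M):
    """R_s = 2 G M / c^2."""
    return 2.0 * G * M / c**2

print(schwarzschild_radius(M_sun))    # ~2.95e3 m, i.e. ~2.95 km
print(schwarzschild_radius(M_earth))  # ~8.9e-3 m, i.e. ~9 mm
```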
For a pure vacuum spacetime, the metric (35) is singular at the Schwarzschild radius \(r= R_s\) and at \(r=0\). However, coordinate singularities are not physical, and the spacetime can look quite different in alternative coordinate systems. Note that under a general coordinate transformation the coordinate differentials transform as
$$\begin{aligned} dx^{\bar{\mu }} = \varLambda ^{\bar{\mu }}_{\nu } \, dx^\nu \, , \quad \quad \varLambda ^{\bar{\mu }}_{\nu } = \frac{\partial x^{\bar{\mu }}}{\partial x^{\nu }} \end{aligned}$$
with tensor components transforming as
$$\begin{aligned} A^{\bar{\mu }}_{\bar{\nu }} = \varLambda ^{\bar{\mu }}_{\alpha }\, \varLambda ^{\beta }_{\bar{\nu }} \, A^\alpha _\beta \end{aligned}$$
and similarly for higher rank tensors. The coordinate transformation \(r = r'(1+ R_s/(4r'))^2\) yields the metric in isotropic form, where the surface at \(r=R_s\) is mapped to \(r'= R_s/4\):
$$\begin{aligned} ds^2 = -\left( \frac{1 - \frac{R_s}{4r'}}{1 + \frac{R_s}{4r'}}\right) ^2 c^2 dt^2 +\left( 1 + \frac{R_s}{4r'}\right) ^4 \left[ d{r'}^2 +{r'}^2 d\theta ^2 + {r'}^2 \sin ^2\theta d\phi ^2\right] \, . \end{aligned}$$
Going a step further, the coordinate transformation \(r = (3(R-cT)/2)^{2/3} R_s^{1/3}\) yields the Lemaître form of the metric which is only singular at \(r=0\):
$$\begin{aligned} ds^2 = - c^2 dT^2 + \frac{R_s}{r} dR^2 + r^2 d\theta ^2 + r^2 \sin ^2\theta d\phi ^2 \,. \end{aligned}$$
The nature of the surface at \(r=R_s\) was not understood for many decades, and even today debate continues about what happens to quantum fields and superstrings in this spacetime, leading to the suggestion of exotic phenomena such as “firewalls” or “fuzzballs” [38]. What is now understood, at least for classical (non-quantum) objects, is that geodesics can be continued through the Schwarzschild surface, and the tidal forces are finite there. One way to see this is to compute the components of the curvature tensor using an orthonormal coordinate system, denoted by hats, \(\hat{\mu }\), where locally \(g_{\hat{\mu } \hat{\nu }} = \eta _{\mu \nu }\). The non-vanishing components of the Riemann tensor are then
$$\begin{aligned} R_{\hat{r}\hat{t}\hat{r}\hat{t}} = -R_{\hat{\theta }\hat{\phi }\hat{\theta }\hat{\phi }} = -2 R_{\hat{\theta }\hat{t}\hat{\theta }\hat{t}} = -2 R_{\hat{\phi }\hat{t}\hat{\phi }\hat{t}} = 2 R_{\hat{r}\hat{\phi }\hat{r}\hat{\phi }} = 2 R_{\hat{r}\hat{\theta }\hat{r}\hat{\theta }} = -\frac{R_s}{r^3} \, . \end{aligned}$$
We see that the curvature is finite at \(r=R_s\) and divergent at \(r=0\). While non-singular, the Schwarzschild surface does have many special properties. For example, the redshift of a photon sent from a static source at radius r to a distant observer is given by
$$\begin{aligned} z(r) = \frac{1}{\sqrt{1 - \frac{R_s}{r}}} - 1 \, . \end{aligned}$$
We see that \(r=R_s\) defines a surface of infinite redshift. Moreover, the force required to hold a mass m at a fixed radius r also diverges at \(r=R_s\):
$$\begin{aligned} F^{\hat{r}} = \frac{m c^2 R_s}{2 r^2 \sqrt{1 - \frac{R_s}{r}}} \, . \end{aligned}$$
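The divergence of the redshift at the horizon is easy to illustrate numerically (a sketch in units where \(R_s = 1\); the sample radii are arbitrary choices of mine):

```python
import numpy as np

def redshift(r, Rs=1.0):
    """Redshift z(r) of light from a static source at radius r (formula above)."""
    return 1.0 / np.sqrt(1.0 - Rs / r) - 1.0

print(redshift(2.0))          # sqrt(2) - 1 ~ 0.414
print(redshift(1.0 + 1e-9))   # ~3e4, growing without bound as r -> R_s
```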
Further insight can be gained by considering photon geodesics, which reveal the causal structure of the spacetime. In particular, radial null geodesics define the past and future light cones, as all other geodesics (null or time-like) lie within these cones. The condition \(\mathbf{u} \cdot \mathbf{u}=0\) for radial null geodesics yields the relation
$$\begin{aligned} \frac{dr }{d t} = \pm c \left( 1 - \frac{R_s}{r}\right) \, . \end{aligned}$$
Introducing the Regge–Wheeler radial coordinate \(r_* = r + R_s \ln |r/R_s-1|\) we find \(d r_* / dt = \pm c\), so the light cones are given by \(ct\pm r_* = \mathrm{const.}\) Adopting the new coordinate \(v = c t+r_*\), which is constant along ingoing null geodesics, we arrive at the Eddington–Finkelstein form of the metric
$$\begin{aligned} ds^2 = -\left( 1 - \frac{R_s}{r}\right) dv^2 + 2 dv dr + r^2 d\theta ^2 + r^2 \sin ^2\theta d\phi ^2 \, . \end{aligned}$$
Outgoing null geodesics have \(c t-r_* = \mathrm {const.}\), which in Eddington–Finkelstein coordinates becomes \(v = 2 r + 2R_s \ln |r/R_s-1| + \mathrm {const}.\) Introducing the new time coordinate \(t_* = v-r\), so that ingoing null geodesics are straight lines at \(-45^\circ \) to the r axis, we can plot the inward and outward null geodesics as shown in Fig. 6. We see that the ingoing radial null geodesics cross the event horizon and terminate on the central singularity. Outgoing null geodesics that start outside \(r=R_s\) continue to travel outward in r, while those inside the Schwarzschild radius are trapped, and destined to encounter the singularity at \(r=0\). Since null geodesics define the light cones for all time-like and null paths—geodesics or otherwise—we see that \(r=R_s\) defines a trapped surface—known as the event horizon—from which there is no escape.
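The trapped nature of the region \(r<R_s\) can be verified directly from the Eddington–Finkelstein metric: setting \(ds^2=0\) with \(d\theta =d\phi =0\) gives either \(dv=0\) (ingoing rays) or \(dr/dv = \frac{1}{2}(1-R_s/r)\) (outgoing rays). A short numerical sketch (units where \(R_s=1\); the starting radii are arbitrary choices of mine):

```python
R_s = 1.0  # units where R_s = 1

def outgoing_ray(r0, dv=1e-3, steps=500):
    """Euler-integrate dr/dv = (1 - R_s/r)/2 for an 'outgoing' null ray."""
    r = r0
    for _ in range(steps):
        r += dv * 0.5 * (1.0 - R_s / r)
    return r

print(outgoing_ray(1.1) > 1.1)   # True: a ray outside the horizon moves outward
print(outgoing_ray(0.9) < 0.9)   # True: even 'outgoing' rays inside move inward
```

Inside the horizon \(dr/dv\) is negative even for the outgoing branch, which is the statement that \(r=R_s\) is a trapped surface.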
Fig. 6

Ingoing (black) and outgoing (blue) null geodesics in the Schwarzschild spacetime. All geodesics (null and otherwise) inside \(r=R_s\) are unable to reach \(r > R_s\), and are destined to encounter the curvature singularity at \(r=0\) (shown in red)

Generic geodesics of the Schwarzschild spacetime can be derived by integrating the geodesic equations, but it is simpler to make use of the symmetries of the spacetime, which imply the existence of several conserved quantities. The metric is invariant along the integral curves of the four Killing vectors \(\varvec{\xi }_1 = \varvec{\partial }_t\), \(\varvec{\xi }_2 = \varvec{\partial }_\phi \), \(\varvec{\xi }_3 = \sin \phi \varvec{\partial }_\theta + \cot \theta \cos \phi \varvec{\partial }_\phi \) and \(\varvec{\xi }_4 = \cos \phi \varvec{\partial }_\theta - \cot \theta \sin \phi \varvec{\partial }_\phi \). The symmetries tell us that \(E=-\varvec{\xi }_1\cdot \mathbf{p}\), \(L_z = \varvec{\xi }_2\cdot \mathbf{p}\), \(L_3 = \varvec{\xi }_3\cdot \mathbf{p}\) and \(L_4 = \varvec{\xi }_4\cdot \mathbf{p}\) are conserved quantities, where \(\mathbf{p}\) is the four momentum. The four momentum also satisfies the normalization condition \(\mathbf{p} \cdot \mathbf{p}= - m^2 c^4\) for particles and \(\mathbf{p} \cdot \mathbf{p}= 0\) for photons. We can use the rotational symmetry to place all orbits in the equatorial plane \(\theta = \pi /2\), with \(L_3=L_4=0\). The normalization condition can then be expressed as
$$\begin{aligned} \frac{\tilde{E}^2 -1}{2}= & {} \frac{1}{2} \left( \frac{ dr }{d\tau }\right) ^2 + \tilde{V}(r) \quad (\mathrm{particles}) \nonumber \\ E^2= & {} \left( \frac{ dr }{d\lambda }\right) ^2 + V(r) \quad (\mathrm{photons}) \end{aligned}$$
$$\begin{aligned} \tilde{V}(r)= & {} - \frac{M}{r} + \frac{\tilde{L}_z^2}{2 r^2} - \frac{M \tilde{L}_z^2}{r^3}\, , \nonumber \\ V(r)= & {} \frac{L^2}{r^2} \left( 1 - \frac{2 M}{r} \right) \, . \end{aligned}$$
play the role of effective potentials. Here \(\tilde{E} = E/m\) and \(\tilde{L}_z = L_z/m\) are the energy and angular momentum per unit mass. The expression for the radial velocity of a particle is identical to its Newtonian counterpart aside from the term \(-M \tilde{L}_z^2/r^3\). This term has important consequences for tightly bound orbits, allowing trajectories with \(\tilde{L}_z \ne 0\) to reach \(r=0\), and giving rise to a collection of unstable circular orbits at \(r_{u} = (\tilde{L}_z^2/R_s)(1-\sqrt{1- 3 R_s^2/ \tilde{L}_z^2})\). There is an innermost (marginally) stable circular orbit (ISCO) at \(r = 3 R_s\) that plays an important role in understanding accretion discs around black holes and the gravitational waveforms of merging black holes. Photon orbits can be described in an analogous way, though there are no bound orbits other than an isolated unstable circular orbit at \(r= 3R_s/2\) (Fig. 7).
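The circular-orbit radii quoted above are easy to check numerically. Below is a minimal sketch (in geometric units \(G=c=1\), so \(R_s=2M\); the function names are my own, not from the text) that solves \(d\tilde{V}/dr=0\) and recovers both the unstable/stable pair of circular orbits and the ISCO:

```python
import math

def V_eff(r, M, L):
    # Effective potential per unit mass for Schwarzschild particle orbits,
    # V = -M/r + L^2/(2 r^2) - M L^2/r^3 (geometric units G = c = 1)
    return -M / r + L**2 / (2 * r**2) - M * L**2 / r**3

def circular_orbit_radii(M, L):
    # Extrema of V: dV/dr = 0  =>  M r^2 - L^2 r + 3 M L^2 = 0
    disc = L**4 - 12 * M**2 * L**2
    if disc < 0:
        return None  # L < 2*sqrt(3)*M: no circular orbits, the particle plunges
    r_unstable = (L**2 - math.sqrt(disc)) / (2 * M)
    r_stable = (L**2 + math.sqrt(disc)) / (2 * M)
    return r_unstable, r_stable

M = 1.0
print(circular_orbit_radii(M, 4.0))  # (4.0, 12.0): unstable and stable radii
# As L -> 2*sqrt(3)*M the two radii merge at the ISCO, r = 6M = 3 Rs:
print(circular_orbit_radii(M, 2 * math.sqrt(3) + 1e-6))
```

The quadratic here is just the condition \(d\tilde{V}/dr=0\); its smaller root reproduces the unstable-orbit radius quoted in the text.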
Fig. 7

Examples of particle geodesics in the Schwarzschild spacetime. The radial velocity can be read off from the difference in the height of the effective potential (shaded in grey) and the energy (horizontal lines)

The Schwarzschild solution was generalized by Kerr to describe the asymptotically flat, stationary, axisymmetric spacetime that we interpret as a rotating black hole. The derivation is much more difficult, and the resulting spacetime has a much richer phenomenology. The metric in Boyer–Lindquist coordinates has the line element
$$\begin{aligned} ds^2 = \frac{-\varDelta }{\varSigma } \left( c dt - a \sin ^2 \theta d\phi \right) ^2 +\frac{\sin ^2\theta }{\varSigma }\left( (r^2+a^2) d\phi - a \,c dt\right) ^2 + \frac{\varSigma }{\varDelta } dr^2 +\varSigma d\theta ^2 \end{aligned}$$
$$\begin{aligned} \varSigma= & {} r^2 + a^2 \cos ^2 \theta , \nonumber \\ \varDelta= & {} r^2 - R_s r + a^2\, , \end{aligned}$$
and \(a = S/(c M)\), where S is the spin angular momentum of the black hole and M the mass. The metric has coordinate singularities at \(r_\pm = R_s/2 \pm \sqrt{R_s^2/4-a^2}\). The surface at \(r=r_+\) is identified as the event horizon, as no trajectories from inside \(r_+\) can cross the surface. Additionally, there is a static-limit surface with \(r_\mathrm{SL} = R_s/2+ \sqrt{R_s^2/4-a^2\cos ^2\theta }\), interior to which it is impossible to stay fixed with respect to the distant stars—everything gets swept around in the swirling vortex of the black hole. Energy can be extracted from the black hole by scattering radiation or particles within the so-called ergo-region between \(r_+\) and \(r_\mathrm{SL}\). The spacetime has a curvature singularity along the ring \(r=0\), \(\theta =\pi /2\). The key features of a rotating black hole are illustrated in Fig. 8.
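As a quick numerical illustration of these surfaces (a sketch in geometric units \(G=c=1\), so \(R_s=2M\); function and variable names are mine), the ergo-region of a rapidly spinning hole can be mapped out directly:

```python
import math

def kerr_surfaces(M, a, theta):
    # Horizon radii r_+/- and static limit r_SL for a Kerr black hole,
    # geometric units G = c = 1 (so Rs = 2M); requires |a| <= M
    Rs = 2 * M
    root = math.sqrt(Rs**2 / 4 - a**2)
    r_plus, r_minus = Rs / 2 + root, Rs / 2 - root
    r_sl = Rs / 2 + math.sqrt(Rs**2 / 4 - a**2 * math.cos(theta)**2)
    return r_plus, r_minus, r_sl

M, a = 1.0, 0.9
rp, rm, rsl = kerr_surfaces(M, a, math.pi / 2)
print(rp, rsl)  # at the equator r_SL = Rs = 2M, so the ergo-region is widest
rp0, _, rsl0 = kerr_surfaces(M, a, 0.0)
print(rsl0 - rp0)  # at the poles the static limit touches the horizon
```

In the Schwarzschild limit \(a\rightarrow 0\) the two surfaces merge at \(r=R_s\) and the ergo-region disappears.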
Fig. 8

Cross section showing the key features of a Kerr black hole

Geodesics in the Kerr geometry are fully specified by four constants of the motion that arise from the symmetries of the spacetime. The constants are the energy \(E = -p_t\), the azimuthal angular momentum \(L_z = p_\phi \), the mass \(m^2 = -\mathbf{p} \cdot \mathbf{p}/c^4\) and the Carter constant \(C= p^2_\theta +p^2_\phi /\sin ^2\theta \). The existence of the latter was something of a surprise, and it guarantees that the orbits are regular (non-chaotic). Closed orbits in Kerr can be highly non-Keplerian, exhibiting rich structure due to the presence of three distinct orbital frequencies associated with the radial, azimuthal and polar motion [36]. Examples of some closed periodic orbits of Kerr are shown in Fig. 9.
Fig. 9

Examples of some of the highly non-Keplerian closed periodic orbits of the Kerr geometry

3 Gravitational Wave Theory

Gravitational waves are generated by flows of energy-momentum. When you wave to someone you are generating gravitational waves, though with amplitudes that are far too weak to be detected using existing technologies. The waves we can detect come from violent astrophysical events where large concentrations of mass move at close to the speed of light.

Gravitational waves are often described as “propagating ripples of curvature”, and while this description is apt, deciding which part of the spacetime curvature to call a wave and what part to associate with the background spacetime is a subtle question that was not settled until the late 1960s [29, 30]. Here I will consider the simpler case of gravitational waves in a background Minkowski spacetime, where the waves are wholly responsible for the curvature. The metric and its inverse take the form
$$\begin{aligned} g_{\mu \nu } = \eta _{\mu \nu } + h_{\mu \nu }, \quad \quad g^{\mu \nu } = \eta ^{\mu \nu } - h^{\mu \nu }\, , \end{aligned}$$
where we have assumed that \(|h_{\mu \nu } | \ll 1\), so that \(h^{\mu \nu } = \eta ^{\mu \kappa }\eta ^{\nu \lambda } h_{\kappa \lambda }\). Note that it only makes sense to say that \(|h_{\mu \nu }| \ll 1\) in a coordinate system where \(\varvec{\eta }= \mathrm{diag}(-1,1,1,1)\). For example, we may use Cartesian coordinates or, more generally, any orthonormal coordinate system. The linearized metric (49) can be used to describe more than just gravitational waves. It can also describe the solar system, which has \(|h_{\mu \nu } | < 10^{-5}\), and the Universe out to \(\sim \)1 Gpc (in an average sense, avoiding regions around black holes).
There are two classes of coordinate transformation that preserve the form of the linearized metric. The first are background Lorentz transformations \(x^{\bar{\mu }} = L^{\bar{\mu }}_{\nu } \, x^\nu \), and the second are infinitesimal coordinate transformations (also called gauge transformations) \(x^{\bar{\mu }} = x^\mu + \zeta ^\mu \) with \(| \zeta ^\mu | \ll 1\). Using (36) and (37) we find
$$\begin{aligned} h_{\bar{\mu }\bar{\nu }} = h_{\mu \nu } -\partial _\mu \zeta _\nu - \partial _\nu \zeta _\mu \, . \end{aligned}$$
It is a straightforward exercise to verify that the Riemann tensor, which has components
$$\begin{aligned} R_{\alpha \beta \mu \nu } = \frac{1}{2}( \partial _\beta \partial _\mu h_{\alpha \nu }+\partial _\alpha \partial _\nu h_{\beta \mu }-\partial _\beta \partial _\nu h_{\alpha \mu }-\partial _\alpha \partial _\mu h_{\beta \nu })\, , \end{aligned}$$
is invariant under these transformations. The linearized Einstein equations take the form
$$\begin{aligned} -\Box \bar{h}_{\mu \nu } +\partial _\nu \partial ^\alpha \bar{h}_{\mu \alpha } +\partial _\mu \partial ^\alpha \bar{h}_{\nu \alpha } -\eta _{\mu \nu }\partial ^\alpha \partial ^\beta \bar{h}_{\alpha \beta }= \frac{16\pi G}{c^4} T_{\mu \nu } \end{aligned}$$
where we have introduced the trace-reversed metric perturbation
$$\begin{aligned} \bar{h}_{\mu \nu } = h_{\mu \nu } - \frac{1}{2} h \, \eta _{\mu \nu }\, , \end{aligned}$$
(so-called because \(\bar{h}=-h\)), and the wave operator
$$\begin{aligned} \Box = \partial ^\alpha \partial _\alpha = -\frac{1}{c^2}\frac{\partial ^2}{\partial t^2} + \nabla ^2 \, . \end{aligned}$$
The linearized Einstein equations can be brought into a simpler form by utilizing some of the gauge freedom we have at our disposal. Under a gauge transformation the divergence of the trace-reverse metric transforms as
$$\begin{aligned} \partial ^{\bar{\alpha }} \bar{h}_{\bar{\mu } \bar{\alpha }} = \partial ^\alpha \bar{h}_{\mu \alpha } - \Box \zeta _\mu \, . \end{aligned}$$
We can choose \(\zeta _\mu \) such that the right hand side of the above equation vanishes, resulting in the Lorentz family of gauges with \(\partial ^{\bar{\alpha }} \bar{h}_{\bar{\mu } \bar{\alpha }} =0\). Note that some gauge freedom remains, as we can add to \(\zeta _\mu \) any homogeneous solution of the wave equation \(\Box \lambda _\mu = 0\) and still maintain the divergence free condition. Dropping the bars on the coordinate indices, the Einstein equations in the Lorentz gauge become
$$\begin{aligned} \Box \bar{h}_{\mu \nu } = - \frac{16\pi G}{c^4} T_{\mu \nu }\, , \quad \quad \partial ^\alpha \bar{h}_{\mu \alpha } = 0 \, . \end{aligned}$$
These equations are very similar to the Maxwell equations in the Lorentz gauge,
$$\begin{aligned} \Box A^{\mu } = \mu _0 J^{\mu }\, , \quad \quad \partial _\alpha A^{\alpha } = 0 \, , \end{aligned}$$
and can be solved using similar techniques (Green’s functions, expansion in orthogonal functions—a.k.a. the full Jackson [32]).

3.1 Newtonian Limit Redux

Before continuing with the discussion of gravitational waves, let us pause for a moment and consider the weak-field, slow motion limit of the linearized Einstein equations, which appears as the first step in the post-Newtonian expansion of Einstein’s equations. In the limit where \(\partial _t^2 \ll c^2 \nabla ^2\), \(|T_{tt}| \gg |T_{ti}| \gg |T_{ij}|\) and objects are moving at much less than the speed of light, the linearized field equations reduce to
$$\begin{aligned} \Box \bar{h}_{tt} = - \frac{16\pi G}{c^4} T_{tt} \quad \Rightarrow \quad \nabla ^2 \bar{h}_{tt} = -16 \pi G \rho \, . \end{aligned}$$
Making the identification \(\bar{h}_{tt} = -4 \varPhi \) we have \(\bar{h} = - h = 4 \varPhi \) and
$$\begin{aligned} ds^2 = -(c^2 + 2 \varPhi ) dt^2 +\left( 1 - \frac{2\varPhi }{c^2}\right) (dx^2 + dy^2 + dz^2) \, . \end{aligned}$$
This form for the metric is similar to what we encountered earlier in the geometrical description of Newtonian gravity (23), but includes an additional post-Newtonian term in the spatial metric. The normalization of 4-velocity \(\mathbf{u}\cdot \mathbf{u}=-c^2\) implies that
$$\begin{aligned} u^t = \frac{dt}{d\tau } = 1 + \frac{v^2}{2 c^2} - \frac{\varPhi }{c^2} \, , \end{aligned}$$
which includes the effects of relativistic time dilation and the slowing of clocks in a gravitational field.
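This clock-rate formula is what underlies the relativistic corrections applied in satellite navigation. A rough numerical sketch (the orbit parameters below are approximate illustrative values, not taken from the text):

```python
# Fractional clock rates from u^t = 1 + v^2/(2 c^2) - Phi/c^2, applied to
# a GPS-like orbit. Parameter values are approximate illustrative numbers.
c = 2.99792458e8       # speed of light, m/s
GM = 3.986004418e14    # Earth's GM, m^3/s^2

def fractional_rate(r, v):
    # d(tau)/dt - 1 for a clock at radius r moving at speed v: the clock
    # runs slow from motion (-v^2/2c^2) and from potential depth (Phi/c^2)
    phi = -GM / r
    return phi / c**2 - v**2 / (2 * c**2)

r_sat, v_sat = 2.6571e7, 3.874e3   # GPS-like orbit radius and speed
r_gnd, v_gnd = 6.371e6, 465.0      # clock on the rotating equator

excess = fractional_rate(r_sat, v_sat) - fractional_rate(r_gnd, v_gnd)
print(excess * 86400 * 1e6, "microseconds gained per day")  # roughly 38
```

The gravitational blueshift of the high-altitude clock outweighs its velocity time dilation, so the satellite clock runs fast relative to the ground.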

3.2 Waves in Vacuum

Consider a plane wave expansion of the vacuum \((T_{\mu \nu }=0)\) field equations by writing \(\bar{h}_{\mu \nu } = \mathfrak {R}\{ A_{\mu \nu } e^{i \mathbf {k} \cdot \mathbf {x}} \}\), where \(A_{\mu \nu }\) is a constant polarization tensor and \(\mathbf {k} \rightarrow ( \omega /c , k^i )\) is the wave vector. The linearized field Eqs. (57) yield the conditions
$$\begin{aligned} k^\alpha k_\alpha = -\frac{\omega ^2}{c^2} + k_i k^i = 0\, , \quad \quad A_{\mu \nu }k^\nu =0 \, . \end{aligned}$$
The first condition tells us that gravitational waves travel at the speed of light, while the second tells us that, in our chosen gauge, the oscillations are transverse to the direction of propagation. The polarization tensor is symmetric, so has ten components in a four dimensional spacetime. The transverse condition provides four constraints, so that six degrees of freedom remain. We can use the residual coordinate freedom \(\zeta ^\mu \rightarrow \zeta ^\mu + \lambda ^\mu \), where \(\lambda ^\mu \) is a homogeneous solution to the wave equation, \(\Box \lambda ^\mu = 0\), to eliminate a further four degrees of freedom. Writing \(\lambda ^\mu = i C^\mu e^{i \mathbf {k} \cdot \mathbf {x}}\) we have
$$\begin{aligned} ^{(\mathrm{new})}A_{\mu \nu } = \, ^{(\mathrm{old})}\!A_{\mu \nu } + C_\mu k_\nu + C_\nu k_\mu -\eta _{\mu \nu } k^\alpha C_\alpha \, . \end{aligned}$$
We can use this gauge freedom in many different ways. One popular choice is the Transverse-Traceless (TT) gauge, where \(C_\mu \) is chosen so as to make the polarization trace-free and orthogonal to the worldlines of timelike particles:
$$\begin{aligned} ^{(\mathrm{new})}A^\mu _\mu = 0\, \quad \quad ^{(\mathrm{new})}A_{\mu \nu } u^\mu = 0 \,. \end{aligned}$$
The trace-free condition sets one constraint, while the orthogonality condition sets three constraints, not four, since we already have that \( ^{(\mathrm{new})}\!A_{\mu \nu } u^\mu k^\nu = 0\) from the waves being transverse.
Consider the case of a plane gravitational wave propagating in the \(+z\) direction, \(\mathbf {k} \rightarrow ( \omega /c , 0, 0, \omega )\), as seen by a stationary observer, \(\mathbf {u} \rightarrow (c,0,0,0)\). The TT gauge conditions (63) imply that \(A_{t\mu }=A_{z\mu }=0\) and \(A_{yy}=-A_{xx}\). Writing \(h_{xx} = h_+\) and \(h_{xy}=h_\times \) we have
$$\begin{aligned} ds^2 = - c^2 dt^2 + (1+ h_+)dx^2 + (1-h_+)dy^2 + 2 h_\times dx dy +dz^2 \end{aligned}$$
$$\begin{aligned} h_+ = A_+ \cos (\omega (t-z/c)+\phi _+), \quad \quad h_\times = A_\times \cos (\omega (t-z/c)+\phi _\times ). \end{aligned}$$
The plane wave spacetime (64) has a number of interesting properties, the most surprising being that objects at rest remain at rest in this coordinate system. This follows immediately from the geodesic equation
$$\begin{aligned} \frac{d u^\alpha }{d\tau } = -\varGamma ^\alpha _{\mu \nu } u^\mu u^\nu = \frac{1}{2} \partial ^\alpha h_{\mu \nu } u^\mu u^\nu \, . \end{aligned}$$
An initially stationary object has \(u^\alpha = \delta ^\alpha _t\), and since \( \partial ^\alpha h_{tt}=0\), we see that the coordinate acceleration vanishes: \({d u^\alpha }/{d\tau } =0\). While objects stay at the same spatial coordinate locations, it does not mean that the waves have no measurable effect. For example, the curvature tensor has the non-vanishing components
$$\begin{aligned} R_{ytyt} = R_{yzyz} = R_{xtxz} = -R_{xtxt}=-R_{xzxz}=-R_{ytyz}= & {} \frac{1}{2} \ddot{h}_+ \nonumber \\ R_{xtyz} = R_{ytxz} = -R_{xzyz} = -R_{xtyt}=-R_{ytxt}=-R_{yzxz}= & {} \frac{1}{2} \ddot{h}_\times \, , \end{aligned}$$
which tells us that the proper separation between nearby particles will vary with time, even though their coordinate locations remain fixed in space. What has happened here is that the wave has been put into the coordinates when we added in a solution of the homogeneous wave equation.
The TT gauge is by far the simplest gauge to work in when computing things like the response of a gravitational wave detector, but it does have the unfortunate property that the wave motion has been hidden inside the coordinates. The physical properties are more readily seen by transforming to a locally inertial frame. Recall that in Fermi Normal (FN) coordinates the metric can be written as \(g_{\mu \nu } =\eta _{\mu \nu } -\frac{1}{3} R_{\mu \alpha \nu \beta } x^\alpha x^\beta \), which for the case at hand yields
$$\begin{aligned} ds^2\approx & {} - c^2 d\bar{t}^2 (1+ R_{titj} \bar{x}^i \bar{x}^j) - \frac{4}{3} d\bar{t} d\bar{x}^i (R_{tjik}\bar{x}^j \bar{x}^k) + d\bar{x}^i d\bar{x}^j\left( \delta _{ij} - \frac{1}{3} R_{ikjl} \bar{x}^k \bar{x}^l\right) \nonumber \\= & {} - c^2 d\bar{t}^2 + d\bar{x}^2 + d\bar{y}^2 + d\bar{z}^2+\left( \ddot{h}_\times \bar{x}\bar{y} +\frac{1}{2} \ddot{h}_+ (\bar{x}^2-\bar{y}^2)\right) (c d\bar{t} - d\bar{z})^2. \end{aligned}$$
In deriving this expression we have used the fact that the components of the Riemann tensor are unchanged to leading order in h by the coordinate transformation that takes us to the FN coordinate system. The coordinate transformation between TT and FN coordinates is, to leading order in h, given by Rakhmanov [47]
$$\begin{aligned} x= & {} \bar{x} -\frac{1}{2} h_+ \bar{x} - \frac{1}{2} h_\times \bar{y} -\frac{1}{2} \bar{z}(\bar{x} \dot{h}_+ + \bar{y} \dot{h}_\times ) \nonumber \\ y= & {} \bar{y} +\frac{1}{2} h_+ \bar{y} - \frac{1}{2} h_\times \bar{x} +\frac{1}{2} \bar{z}(\bar{y} \dot{h}_+ - \bar{x} \dot{h}_\times ) \nonumber \\ z= & {} \bar{z} +\frac{1}{4} (\bar{x}^2 - \bar{y}^2)\dot{h}_+ + \frac{1}{2} \bar{x}\bar{y} \dot{h}_\times \nonumber \\ t= & {} \bar{t} -\frac{1}{4} (\bar{x}^2 - \bar{y}^2)\dot{h}_+ - \frac{1}{2} \bar{x}\bar{y} \dot{h}_\times \, . \end{aligned}$$
With a little algebra it is easy to show that the line element (64) is transformed to the line element (68) under the coordinate transformation (69). Remarkably, while the construction of the FN coordinate system is usually only valid locally, the metric (68) turns out to be an exact solution of Einstein’s equations that is valid globally  [47], which is useful when considering detectors where the arm lengths are large compared to the wavelength of the gravitational wave (as is the case for pulsar timing).
Restricting our attention for now to the long wavelength limit, where the wavelength of the gravitational wave is much greater than the size of the detector: \(|\bar{x}|, |\bar{y}| \ll \lambda = 2\pi c/\omega \), the geodesic equation in the FN metric yields
$$\begin{aligned} \frac{d^2\bar{x}}{d\bar{t}^2}= & {} \frac{1}{2} \bar{x} \ddot{h}_+ + \frac{1}{2} \bar{y} \ddot{h}_\times \nonumber \\ \frac{d^2\bar{y}}{d\bar{t}^2}= & {} \frac{1}{2} \bar{x} \ddot{h}_\times - \frac{1}{2} \bar{y} \ddot{h}_+ \nonumber \\ \frac{d^2\bar{z}}{d\bar{t}^2}= & {} 0 \, . \end{aligned}$$
We see that particles oscillate back and forth in the plane orthogonal to the propagation direction. A ring of test particles that is initially at rest in the plane \(\bar{z}=0\) with coordinates \(\bar{x}(0) = L \cos \phi \), \(\bar{y}(0) = L\sin \phi \) will oscillate as
$$\begin{aligned} \bar{x} = L\left( \cos \phi + \frac{1}{2} (h_+ \cos \phi + h_\times \sin \phi )\right) \nonumber \\ \bar{y} = L\left( \sin \phi + \frac{1}{2} (h_\times \cos \phi - h_+ \sin \phi )\right) \end{aligned}$$
The motion is illustrated in Fig. 10.
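A minimal numerical sketch of this ring motion (the amplitude and the function name are mine; the snapshot is taken at an instant when \(h_+\) and \(h_\times \) have the given values):

```python
import math

def ring_position(phi, h_plus, h_cross, L=1.0):
    # Snapshot position of a test particle initially at angle phi on a ring
    # of radius L, displaced by plus/cross polarizations of amplitude h
    x = L * (math.cos(phi) + 0.5 * (h_plus * math.cos(phi) + h_cross * math.sin(phi)))
    y = L * (math.sin(phi) + 0.5 * (h_cross * math.cos(phi) - h_plus * math.sin(phi)))
    return x, y

h = 0.1  # wildly exaggerated amplitude so the distortion is visible
# Pure plus polarization: the ring is stretched along x and squeezed along y
print(ring_position(0.0, h, 0.0))           # (1.05, 0.0)
print(ring_position(math.pi / 2, h, 0.0))   # (~0.0, 0.95)
```

A pure cross polarization produces the same pattern rotated by \(45^\circ \), which is the origin of the two-panel figure.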

3.3 Making Waves

The linearized Einstein equations (56) can be formally solved using the Green’s function
$$\begin{aligned} G(\mathbf {x}-\mathbf {x}') = \frac{1}{4\pi | \mathbf {x}_s-\mathbf {x}'_s |} \delta (t_\mathrm{ret} - t'_\mathrm{ret}) \, , \end{aligned}$$
where \( \mathbf {x}_s\) denotes the spatial part of the 4-vector \(\mathbf {x}\) and \(t_\mathrm{ret} = t - |\mathbf {x}_s-\mathbf {x}'_s |/c\) is the retarded time. The Green’s function satisfies the equation \(\Box G(\mathbf {x}-\mathbf {x}') = \delta ^4(\mathbf {x}-\mathbf {x}')\). The formal solution is then
$$\begin{aligned} \bar{h}_{\mu \nu }(\mathbf {x})= & {} \frac{-16 \pi G}{c^4} \int d^4 x' \, G(\mathbf {x}-\mathbf {x}') T_{\mu \nu }(\mathbf {x}') \nonumber \\= & {} \frac{4 G}{c^4} \int d^3 x' \, \frac{T_{\mu \nu }(t_\mathrm{ret}, \mathbf {x}'_s)}{| \mathbf {x}_s-\mathbf {x}'_s |} \, . \end{aligned}$$
The general solution can be expressed in the TT gauge by applying the projection tensor \(P_{ijkl}\), which removes any longitudinal components and subtracts the trace: \(h_{ij}^\mathrm{TT} = P_{ijkl} \bar{h}^{kl}\). The projection tensor is defined:
$$\begin{aligned} P_{ijkl} = p_{ik} p_{jl} - \frac{1}{2} p_{ij}p_{kl}\, , \quad \mathrm{where}\quad p_{ij} = \delta _{ij} - n_i n_j, \end{aligned}$$
and \(\hat{n}\) is the unit vector in the direction of the source.
Fig. 10

The distortion of a ring of test particles caused by the plus and cross polarization states of a plane gravitational wave propagating in the z direction

The general solution (73) is not particularly illuminating. To arrive at an expression that can be used in practice we need to make some approximations. The first is that the size of the source region d is very much smaller than the distance r to the field point where we are evaluating the wave:
$$\begin{aligned} |\mathbf {x}_s-\mathbf {x}'_s | \approx r - \mathbf{x}'_s \cdot \hat{n} + \mathcal{O}\left( \frac{d^2}{r}\right) \, , \end{aligned}$$
so that
$$\begin{aligned} h_{ij}^\mathrm{TT} =\frac{4 G}{r c^4} P_{ij}^{\;\; kl} \int d^3 x' \, T_{kl}(t- r/c + \mathbf{x}'_s \cdot \hat{n}/c, \mathbf{x}'_s) \, . \end{aligned}$$
The second approximation we make is that the material is moving slowly compared to the speed of light, which allows us to Taylor expand the energy-momentum tensor:
$$\begin{aligned}&T_{kl}(t- r/c + \mathbf{x}'_s \cdot \hat{n}/c, \mathbf{x}'_s) = T_{kl}(t- r/c) + \frac{x'_i n^i}{c} \partial _t T_{kl}(t- r/c) \nonumber \\& +\,\frac{x'_ix'_j n^i n^j}{2 c^2} \, \partial ^2_t T_{kl}(t- r/c) + \cdots \end{aligned}$$
Define the multipole decomposition of the source:
$$\begin{aligned} S^{ij}(t)= & {} \int d^3 x' \, T^{ij}(t,\mathbf{x}'_s) \nonumber \\ S^{ijk}(t)= & {} \frac{1}{c} \int d^3 x' \, T^{ij}(t,\mathbf{x}'_s) x'^k \nonumber \\ S^{ijkl}(t)= & {} \frac{1}{c^2} \int d^3 x' \, T^{ij}(t,\mathbf{x}'_s) x'^k x'^l \end{aligned}$$
we can write
$$\begin{aligned} h_{ij}^\mathrm{TT} =\frac{4 G}{r c^4} P_{ij}^{\;\; kl} \left[ S_{kl}(t-r/c) + n_m \dot{S}_{kl}{}^{m}(t-r/c) + \frac{1}{2} n_m n_p \ddot{S}_{kl}{}^{mp}(t-r/c) + \cdots \right] \, . \end{aligned}$$
For our purposes it will be enough to consider the lowest order term in the multipole expansion, which turns out to be proportional to the second time derivative of the quadrupole moment: \(S^{ij}(t) = \ddot{Q}^{ij}(t)/(2 c^2)\). To establish this result we need to take time derivatives of the quadrupole moment and integrate by parts. For slow moving sources we have
$$\begin{aligned} Q^{ij} = \int d^3 x \, T_{tt}(t,\mathbf{x}_s) x^i x^j \approx \int d^3 x \rho (t,\mathbf{x}_s) x^i x^j \, . \end{aligned}$$
$$\begin{aligned} \dot{Q}^{ij}= & {} \int d^3 x \, \partial _t T_{tt} x^i x^j = c \int d^3 x \partial ^k T_{tk} x^i x^j \nonumber \\= & {} -c \int d^3 x (T^i_t x^j + T_t^j x^i) + c \oint T^{tk}x^i x^j \, n_k \, d^2 S \, . \end{aligned}$$
In the first line we have used conservation of energy to swap the time derivative for a spatial derivative. The surface integral on the second line vanishes since the surface is outside the source. Taking a second time derivative we have
$$\begin{aligned} \ddot{Q}^{ij}= & {} - \int d^3 x \, c^2 (\partial _k T^{ki} x^j + \partial _k T^{kj} x^i) \nonumber \\= & {} 2 c^2 \int d^3 x T^{ij} - c^2 \oint (T^{ki}x^j + T^{kj} x^i) n_k d^2 S \, . \end{aligned}$$
Once again the surface integral vanishes, and we arrive at the promised result. Putting everything together we arrive at the leading order, quadrupole formula for gravitational wave emission:
$$\begin{aligned} h_{ij}^\mathrm{TT} =\frac{2 G}{r c^6} \, P_{ijkl} \ddot{Q}^{kl}(t-r/c) \, . \end{aligned}$$
Applying this expression to a gravitational wave traveling in the z direction we find
$$\begin{aligned} h_+= & {} \frac{G}{r c^6} \left( \ddot{Q}_{xx}(t-r/c) - \ddot{Q}_{yy}(t-r/c) \right) \nonumber \\ h_\times= & {} \frac{2 G}{r c^6} \ddot{Q}_{xy}(t-r/c) \, . \end{aligned}$$
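Applying the quadrupole formula to a compact binary on a circular orbit leads to the standard strain-amplitude estimate \(h = (4/r)(G\mathcal{M}/c^2)^{5/3}(\pi f_\mathrm{gw}/c)^{2/3}\), where \(\mathcal{M}\) is the chirp mass. That step is not carried out in the text above, so treat the following as an illustrative sketch with rough, GW150914-like numbers of my own choosing:

```python
import math

G, c = 6.674e-11, 2.99792458e8
Msun, Mpc = 1.989e30, 3.086e22

def strain_amplitude(chirp_mass, f_gw, distance):
    # Quadrupole-order strain from a circular binary:
    # h = (4/r) (G Mc/c^2)^(5/3) (pi f/c)^(2/3)
    return (4.0 / distance) * (G * chirp_mass / c**2)**(5 / 3) \
        * (math.pi * f_gw / c)**(2 / 3)

# Roughly GW150914-like numbers: 30 Msun chirp mass, 100 Hz, 410 Mpc
h = strain_amplitude(30 * Msun, 100.0, 410 * Mpc)
print(h)  # ~2e-21: why km-scale arms must be read out at the 1e-18 m level
```

The \(1/r\) falloff and the \(\mathcal{M}^{5/3} f^{2/3}\) scaling are the features to take away; everything else is order-unity geometry.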

3.4 Energy and Momentum of a Gravitational Wave

Gravitational waves carry energy and momentum away from a source, and through the non-linearity of Einstein’s equations, become sources that modify the background geometry and even generate waves of their own. The energy and angular momentum carried by gravitational waves causes binary stars to spiral inward and eventually merge. The linear momentum carried by gravitational waves can lead to recoil kicks during black hole mergers that send the merged black hole racing away at thousands of kilometers per second. The calculation of the energy and momentum carried by gravitational waves raises several subtle issues that deserve a more careful treatment than can be squeezed into these lectures, so here I sketch out the main results.

The derivation begins by expanding the metric to next order:
$$\begin{aligned} g_{\mu \nu } =\eta _{\mu \nu } + h_{\mu \nu } + f_{\mu \nu } \end{aligned}$$
where \(|f_{\mu \nu }| \sim |h_{\mu \nu }|^2\). The Einstein equations are then expanded order-by-order:
$$\begin{aligned} G^{(1)}_{\mu \nu }(h) = \frac{8\pi G}{c^4} T_{\mu \nu } \quad \quad G^{(2)}_{\mu \nu }(f) = \frac{8\pi G}{c^4} \tau _{\mu \nu }(h^2) \, . \end{aligned}$$
The leading order equation is what we considered earlier. The second order equation is sourced by the energy momentum tensor for gravitational waves, which in the TT gauge has the form
$$\begin{aligned} \tau ^\mathrm{TT}_{\mu \nu } = \frac{c^4}{32 \pi G} \langle \partial _\mu h^\mathrm{TT}_{jk} \partial _\nu h_\mathrm{TT}^{jk} \rangle \, . \end{aligned}$$
The angle brackets denote an average over a region of spacetime that covers several wavelengths and wave cycles. The averaging is needed since gravitational energy cannot be localized. The gravitational wave energy momentum tensor is traceless \(\tau ^\mu _\mu =0\) and conserved \(\partial _\nu \tau ^{\mu \nu } = 0\).
In spherical coordinates, and working to quadrupole order we have
$$\begin{aligned} \tau ^\mathrm{TT}_{tt} = \tau ^\mathrm{TT}_{rr} = - \tau ^\mathrm{TT}_{tr} = \frac{c^4}{32 \pi G} \langle |\dot{h}^\mathrm{TT}_{ij}|^2 \rangle = \frac{c^2}{8 \pi r^2 G} \langle |\dddot{Q}^\mathrm{TT}_{ij}|^2 \rangle \, . \end{aligned}$$
The energy radiated by a source is given by
$$\begin{aligned} \frac{d E}{dt} = \oint \tau ^\mathrm{TT}_{tr} r^2 \sin \theta \,d\theta d \phi = \frac{1}{5} \langle |\dddot{Q}^\mathrm{TT}_{ij}(t-r)|^2 \rangle \, . \end{aligned}$$
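Evaluating this energy-loss formula for two point masses on a circular orbit gives the quadrupole (Peters) luminosity \(P = (32/5)(G^4/c^5)(m_1 m_2)^2(m_1+m_2)/a^5\), a standard result quoted here without derivation; the numbers below are illustrative:

```python
G, c = 6.674e-11, 2.99792458e8
Msun = 1.989e30

def gw_luminosity_circular(m1, m2, a):
    # Quadrupole-order power radiated by a circular binary of masses
    # m1, m2 (kg) at orbital separation a (m)
    return (32 / 5) * G**4 / c**5 * (m1 * m2)**2 * (m1 + m2) / a**5

m = 30 * Msun
a_isco = 6 * G * (2 * m) / c**2    # separation of order the ISCO scale
P = gw_luminosity_circular(m, m, a_isco)
print(P)               # ~1e48 W for a pair of 30 Msun black holes
print(P / (c**5 / G))  # a small fraction of the natural scale c^5/G
```

The steep \(a^{-5}\) dependence is why gravitational wave emission is negligible for widely separated binaries yet briefly outshines the electromagnetic output of the visible Universe at merger.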
The linear momentum radiated and the angular momentum radiated can be calculated in a similar fashion:
$$\begin{aligned} \frac{d P^k}{dt} =-\frac{r^2}{32 \pi } \oint d\varOmega \langle \dot{h}^\mathrm{TT}_{ij} \partial ^k h_\mathrm{TT}^{ij}\rangle \end{aligned}$$
$$\begin{aligned} \frac{d J^i}{dt} =\frac{r^2}{32 \pi } \oint d\varOmega \langle 2 \epsilon ^{ikl} \dot{h}^\mathrm{TT}_{al} \dot{h}^\mathrm{TT}_{ak} - \epsilon ^{ikl} \dot{h}^\mathrm{TT}_{ab} x_k \partial _l h^{ab}_\mathrm{TT}\rangle \, . \end{aligned}$$

4 Gravitational Wave Detection

There are three main techniques used to detect gravitational waves: acoustic detectors, time-of-flight detectors and astrometry. Acoustic, or bar, detectors seek to measure the tidal force imparted by a gravitational wave through the alternate stretching and compression of a mechanical oscillator, such as a large bar of aluminum. Acoustic detectors are sensitive in a narrow frequency band around the resonant frequency of the oscillator, and this limitation, along with practical challenges in achieving high sensitivity, has seen bar detectors abandoned in favor of wide-band laser interferometers. Time-of-flight detectors come in many forms, and include ground and space interferometers, spacecraft doppler tracking and pulsar timing. While the measurement techniques differ, the underlying observable is the same: the small changes in photon arrival times caused by gravitational waves. Astrometric detection is a relatively new approach that seeks to measure the apparent change in the arrival direction of starlight through gravitational lensing by gravitational waves. In these lectures I will only consider time-of-flight detectors, as they are currently the most sensitive and widely used.
Fig. 11

The measurement principle used by various time-of-flight gravitational wave detectors

4.1 Photon Timing

The time that it takes a photon to propagate between two points in space will be perturbed by the presence of gravitational waves. Figure 11 illustrates the measurement principle behind pulsar timing, spacecraft doppler tracking and laser interferometers. The pulsar timing approach to gravitational wave detection operates directly on this principle. The highly regular radio pulses from a millisecond pulsar will arrive a little earlier or a little later than they would if no gravitational waves were perturbing the spacetime geometry. Spacecraft doppler tracking measures changes in the frequency of radio signals sent from Earth and transponded back from a satellite. Here the measurement is proportional to the time derivative of the photon propagation time. Laser interferometers measure the phase shifts imparted on a laser signal that is sent down two paths and reflected or transponded back to a common point where the phase of the two beams can be compared. The phase shift is directly proportional to the difference in propagation time along the two paths. To calculate the response of each detector type we only need the general expression for the change in propagation time caused by gravitational waves for photons propagating between two points in space. The response is then found by combining the effects along the entire photon path, which amounts to a single pass for pulsar timing, two passes for spacecraft doppler tracking and four passes for a Michelson interferometer. More complicated measurement paths, such as a Michelson interferometer with Fabry–Perot cavities, or a space interferometer using time delay interferometry, just require additional single passes to be added together.

In most applications the distance to the gravitational wave source is much larger than the distance along the arms of the detector, which allows us to describe the perturbed spacetime using the plane wave metric (64) that we considered earlier. Making the coordinate change \(u = ct-z\), \(v=ct+z\) the line element takes the form
$$\begin{aligned} ds^2 = - dudv + (1+ h_+(u))dx^2 + (1-h_+(u))dy^2 + 2 h_\times (u) dx dy +dz^2 \, . \end{aligned}$$
Since we are working in the TT-gauge, particles at rest will remain at the same spatial coordinate location. Our goal is to compute the time it takes a photon to propagate from the spatial origin (0, 0, 0) to the point (x, y, z). The high degree of symmetry of the plane wave spacetime (92) makes computing geodesics simple (the derivation here follows [12, 22]). The spacetime is invariant along the Killing vector fields \({\partial }_x, {\partial }_y, {\partial }_v\). The associated conserved quantities, combined with the normalization condition \(u^\alpha u_\alpha = 0\), fully specify the four velocity of a photon. If no gravitational waves are present, the photon follows the trajectory \(x_0^\mu (\lambda )\) with
$$\begin{aligned} x_0(\lambda ) = x \frac{\lambda }{\varDelta \lambda }, \quad y_0(\lambda ) = y \frac{\lambda }{\varDelta \lambda }, \quad z_0(\lambda ) = z \frac{\lambda }{\varDelta \lambda }, \quad t_0(\lambda ) = \frac{L}{c} \frac{\lambda }{\varDelta \lambda } \end{aligned}$$
where \(L=\sqrt{x^2+y^2+z^2}\). The constants of the motion \(u_x=\alpha _x, u_y = \alpha _y, u_v= \alpha _v\) in this case are then
$$\begin{aligned}&\alpha _x^0 =u^x = \frac{d x_0}{d\lambda } = \frac{x}{\varDelta \lambda } \nonumber \\&\alpha _y^0 = u^y=\frac{d y_0}{d\lambda } = \frac{y}{\varDelta \lambda } \nonumber \\&\alpha _v^0 =-\frac{1}{2}u^u = -\frac{1}{2}\left( c\frac{d t_0}{d\lambda } - \frac{d z_0}{d\lambda }\right) = \frac{z-L}{2\varDelta \lambda } . \end{aligned}$$
When a gravitational wave is present the photon trajectory is modified: \(x^\mu (\lambda )=x_0^\mu (\lambda )+\delta x^\mu (\lambda )\), and we have to adjust the initial direction of the photon to arrive at the same point: \(\alpha _\mu = \alpha _\mu ^0+\delta \alpha _\mu \). Both \(\delta x^\mu \) and \(\delta \alpha _\mu \) are of order h. Note that the TT-gauge “gluing” condition ensures that the spatial coordinate location of the emitter and receiver are unperturbed by the gravitational wave, so that \(\delta x^\mu (0)=\delta x^\mu (\varDelta \lambda ) = 0\).
In the presence of a gravitational wave the constants of motion and the normalization condition become
$$\begin{aligned}&\alpha _x = (1+h_+) u^x + h_\times u^y \nonumber \\&\alpha _y = (1-h_+) u^y + h_\times u^x\nonumber \\&\alpha _v = -\frac{1}{2}u^u \nonumber \\&0 = \alpha _x u^x + \alpha _y u^y + 2 \alpha _v u^v \, . \end{aligned}$$
Next we linearize these equations in h and solve them. For example, the x component becomes
$$\begin{aligned} \delta \alpha _x = \frac{ d \delta x}{d\lambda } + u_0^x h_+ + u_0^y h_\times \, . \end{aligned}$$
Integrating along the photon path we have
$$\begin{aligned} \varDelta \lambda \, \delta \alpha _x = \delta x(\varDelta \lambda ) - \delta x(0) + u_0^x \int h_+ d\lambda + u_0^y \int h_\times d\lambda \, . \end{aligned}$$
The first two terms on the RHS vanish due to the gluing condition. Inserting the lowest order solution we have
$$\begin{aligned} \delta \alpha _x = \frac{x \int h_+(u) d\lambda + y \int h_\times (u) d\lambda }{\varDelta \lambda ^2}\, . \end{aligned}$$
Using the change of variable \(du = u^u d\lambda \approx d\lambda (L-z)/\varDelta \lambda \) we arrive at the solution
$$\begin{aligned} \delta \alpha _x = \frac{x H_+ + y H_\times }{(L-z)\varDelta \lambda } \end{aligned}$$
where \(H = \int _0^{L-z} h(u) du\). Repeating the same procedure for the y and v components we have
$$\begin{aligned}&\delta \alpha _y = \frac{x H_\times - y H_+ }{(L-z)\varDelta \lambda } \nonumber \\&\delta \alpha _v = \frac{c \delta T}{2\varDelta \lambda } \, . \end{aligned}$$
where
$$\begin{aligned} \delta T = \frac{ (x^2 - y^2) H_+ + 2 xy H_\times }{2 c L(L-z)} \, . \end{aligned}$$
The time shift can be cast into coordinate free form by writing the propagation direction as \(\hat{k}\) and by writing the vector connecting the two points as \(L\hat{a}\) so that
$$\begin{aligned} \delta T = \frac{(\hat{a}\otimes \hat{a}):\mathbf{H} }{2 c (1-\hat{k}\cdot \hat{a})} \, . \end{aligned}$$
Here \(\mathbf{H} = \int _{\xi _1}^{\xi _2} \mathbf{h} d\xi \) is the antiderivative of the gravitational wave tensor
$$\begin{aligned} \mathbf{h}(\xi ) = h_+(\xi ) \varvec{\epsilon }^+ + h_\times (\xi ) \varvec{\epsilon }^\times \, , \end{aligned}$$
\(\xi = ct -\hat{k}\cdot \mathbf {x}_s\), and \(\varvec{\epsilon }^+, \varvec{\epsilon }^\times \) are the polarization tensors
$$\begin{aligned}&\varvec{\epsilon }^+ = \hat{p} \otimes \hat{p} - \hat{q} \otimes \hat{q} \nonumber \\&\varvec{\epsilon }^\times = \hat{p} \otimes \hat{q} + \hat{q} \otimes \hat{p}\, , \end{aligned}$$
and \(\hat{p},\hat{q}\) are vectors that define the principal polarization directions. The colon denotes the double dot product \(\mathbf{A}:\mathbf{B}= A_{ij}B^{ij}\).
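The coordinate-free expression for the time shift can be checked numerically against the explicit coordinate form derived above. The following sketch (function name and numbers mine, purely illustrative) evaluates \(\delta T = (\hat{a}\otimes \hat{a}):\mathbf{H}/(2c(1-\hat{k}\cdot \hat{a}))\) for a wave propagating along \(\hat{z}\) with principal axes \(\hat{x},\hat{y}\):

```python
import numpy as np

C_LIGHT = 299792458.0  # speed of light, m/s

def time_shift(a_hat, k_hat, H_plus, H_cross, p_hat, q_hat):
    """Coordinate-free time shift dT = (a (x) a):H / (2c(1 - k.a)),
    with H the antiderivative of the wave tensor evaluated between
    emission and reception."""
    # build the polarization tensors from the principal axes
    eps_plus = np.outer(p_hat, p_hat) - np.outer(q_hat, q_hat)
    eps_cross = np.outer(p_hat, q_hat) + np.outer(q_hat, p_hat)
    H = H_plus * eps_plus + H_cross * eps_cross
    aa = np.outer(a_hat, a_hat)
    # the colon (double dot) product is an elementwise sum
    return np.sum(aa * H) / (2.0 * C_LIGHT * (1.0 - np.dot(k_hat, a_hat)))
```

For \(\hat{k} = \hat{z}\), \(\hat{p} = \hat{x}\), \(\hat{q} = \hat{y}\) this reproduces the coordinate expression \(\delta T = ((x^2-y^2)H_+ + 2xyH_\times)/(2cL(L-z))\).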
Fig. 12

The coordinate system used to describe the gravitational wave polarization states

The signal observed from a gravitational wave source in the \(\theta ,\phi \) direction \(\hat{n}=-\hat{k}\) can be described using the coordinate system shown in Fig. 12 with
$$\begin{aligned}&\hat{n} = \sin \theta \cos \phi \, \hat{x} + \sin \theta \sin \phi \, \hat{y} + \cos \theta \, \hat{z} \nonumber \\&\hat{u} = \cos \theta \cos \phi \, \hat{x} + \cos \theta \sin \phi \, \hat{y} - \sin \theta \, \hat{z} \nonumber \\&\hat{v} = \sin \phi \, \hat{x} - \cos \phi \, \hat{y} \, . \end{aligned}$$
The vectors \(\hat{u}, \hat{v}\) will generally be rotated relative to the principal polarization axes by an angle \(\psi \):
$$\begin{aligned}&\hat{p} = \cos \psi \, \hat{u} + \sin \psi \, \hat{v} \nonumber \\&\hat{q} = -\sin \psi \, \hat{u} + \cos \psi \, \hat{v} \, , \end{aligned}$$
so that
$$\begin{aligned}&\varvec{\epsilon }^+ = \cos (2\psi ) \varvec{\varepsilon }^+ + \sin (2\psi ) \varvec{\varepsilon }^\times \nonumber \\&\varvec{\epsilon }^\times = -\sin (2\psi ) \varvec{\varepsilon }^+ + \cos (2\psi ) \varvec{\varepsilon }^\times \end{aligned}$$
$$\begin{aligned}&\varvec{\varepsilon }^+ = \hat{u} \otimes \hat{u} - \hat{v} \otimes \hat{v} \nonumber \\&\varvec{\varepsilon }^\times = \hat{u} \otimes \hat{v} + \hat{v} \otimes \hat{u} \, . \end{aligned}$$
Fig. 13

Photon paths for a Michelson interferometer

As a concrete example, consider a Michelson interferometer. The photon paths we need to consider are shown in Fig. 13. The total time delay is given by
$$\begin{aligned} \varDelta T(t) = \delta T_{12}+ \delta T_{24} - \delta T_{13}- \delta T_{34} \,. \end{aligned}$$
A kilometer scale detector such as LIGO operating in the frequency band \(f\sim 10 \, \mathrm{Hz} \rightarrow 1000\,\mathrm{Hz}\) has \(fL \ll c\), or equivalently \(\lambda \gg L\), so we can simplify the calculation by working in the long wavelength limit. Using (102) we find the gravitational wave response to be given by
$$\begin{aligned} h(t)= & {} \frac{c \varDelta T(t)}{2 L} = \frac{1}{2} \left[ \hat{a}\otimes \hat{a} - \hat{b}\otimes \hat{b} \right] : ( h_+(t) \varvec{\epsilon }^+ + h_\times (t) \varvec{\epsilon }^\times ) \nonumber \\= & {} F^+ h_+(t) + F^\times h_\times (t) \, , \end{aligned}$$
where the antenna pattern factors are defined:
$$\begin{aligned} F^+= & {} \frac{1}{2} \left[ \hat{a}\otimes \hat{a} - \hat{b}\otimes \hat{b} \right] : \varvec{\epsilon }^+ \nonumber \\ F^\times= & {} \frac{1}{2} \left[ \hat{a}\otimes \hat{a} - \hat{b}\otimes \hat{b} \right] : \varvec{\epsilon }^\times \, . \end{aligned}$$
If we choose \(\hat{a}\) to lie along the x-axis of the coordinate system and \(\hat{b}\) to lie along the y-axis, then the various inner products become:
$$\begin{aligned}&(\hat{a} \otimes \hat{a}): \varvec{\varepsilon }^+ = \cos ^2\theta \cos ^2\phi - \sin ^2\phi \nonumber \\&(\hat{a} \otimes \hat{a}): \varvec{\varepsilon }^\times = \cos \theta \sin 2\phi \nonumber \\&(\hat{b} \otimes \hat{b}): \varvec{\varepsilon }^+ = \cos ^2\theta \sin ^2\phi - \cos ^2\phi \nonumber \\&(\hat{b} \otimes \hat{b}): \varvec{\varepsilon }^\times = -\cos \theta \sin 2\phi \end{aligned}$$
and (Fig. 14)
$$\begin{aligned} F^+= & {} \frac{1}{2} (1+\cos ^2\theta ) \cos (2\phi ) \cos (2\psi ) -\cos \theta \sin 2\phi \sin 2\psi \nonumber \\ F^\times= & {} \frac{1}{2} (1+\cos ^2\theta ) \cos (2\phi ) \sin (2\psi ) + \cos \theta \sin 2\phi \cos 2\psi \, . \end{aligned}$$
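These antenna patterns are simple to evaluate numerically. A minimal sketch (function name mine), using the sign conventions adopted above:

```python
import numpy as np

def antenna_patterns(theta, phi, psi):
    """Long-wavelength antenna patterns F+ and Fx for an interferometer
    with arms along the x and y axes."""
    amp = 0.5 * (1.0 + np.cos(theta) ** 2) * np.cos(2 * phi)
    cross = np.cos(theta) * np.sin(2 * phi)
    Fp = amp * np.cos(2 * psi) - cross * np.sin(2 * psi)
    Fc = amp * np.sin(2 * psi) + cross * np.cos(2 * psi)
    return Fp, Fc
```

An overhead, optimally oriented source (\(\theta = \phi = \psi = 0\)) gives \(F^+ = 1\), and the combination \((F^+)^2 + (F^\times)^2\) is independent of the polarization angle \(\psi\), as expected since \(\psi\) only rotates power between the two polarizations.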
Fig. 14

Antenna patterns in the long wavelength limit for an interferometer with arms along the x and y axes

As a second example, consider the time delays measured by pulsar timing. The relevant frequency band is then \(f\sim 10^{-9}\, \mathrm{Hz} \rightarrow 10^{-6}\,\mathrm{Hz}\), with the pulsars at \(L \sim 1\, \mathrm{kpc}\), so \(fL \gg c\) or \(\lambda \ll L\). Thus pulsar timing operates in the short wavelength limit, and we need to consider the full integral along the photon trajectory. For example, a continuous wave with \(\mathbf{h} = A \cos (\omega (t-\hat{k}\cdot \mathbf{x}_s)) \varvec{\epsilon }^+\) will produce a time-varying time delay from a pulsar in the \(\hat{a}\) direction that can be written as
$$\begin{aligned} \varDelta T = \frac{L}{2} \left\{ ((\hat{a} \otimes \hat{a}): \varvec{\epsilon }^+)\mathrm{sinc}\left[ \frac{\omega L}{2c} (1+\hat{k}\cdot \hat{a})\right] \right\} \cos \left[ \omega \left( t +\frac{L}{2 c}(1+\hat{k}\cdot \hat{a})\right) \right] \, . \end{aligned}$$
We identify the static term in curly brackets to be the wavelength-dependent antenna pattern \(F^+(L/\lambda )\), with a similar expression defining the cross polarization. In the pulsar timing literature it is conventional to fix the gravitational wave propagation direction to be \(\hat{k}\), and to consider pulsars at different sky locations \(\hat{a} \rightarrow (\theta ,\phi )\). Setting \(\hat{k}= - \hat{z}\) we have \(\hat{k}\cdot \hat{a} = -\cos \theta \), \((\hat{a} \otimes \hat{a}): \varvec{\epsilon }^+ = \cos 2\phi \sin ^2\theta \) and \((\hat{a} \otimes \hat{a}): \varvec{\epsilon }^\times = \sin 2\phi \sin ^2\theta \). The antenna patterns for the case \(L = 10\, \lambda \) are shown in Fig. 15. The time delays are largest for pulsars that are located in roughly the same direction as the source (though the response is zero for pulsars that are in exactly the same direction as the source due to the waves being transverse).
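The wavelength-dependent pattern can be coded directly as a cross-check. In this sketch (function name mine) note that NumPy defines \(\mathrm{sinc}(x) = \sin (\pi x)/(\pi x)\), so the argument must be rescaled:

```python
import numpy as np

def pulsar_pattern_plus(theta, phi, L_over_lambda):
    """Plus antenna pattern F+(L/lambda) for a pulsar at (theta, phi),
    with the gravitational wave propagating along k = -z.  The sinc
    argument is (omega L / 2c)(1 + k.a) = pi (L/lambda)(1 - cos(theta))."""
    arg = np.pi * L_over_lambda * (1.0 - np.cos(theta))
    return np.cos(2 * phi) * np.sin(theta) ** 2 * np.sinc(arg / np.pi)
```

In the long-wavelength limit \(L/\lambda \rightarrow 0\) this reduces to the quadrupolar pattern \(\cos 2\phi \sin ^2\theta\), and the response vanishes as \(\theta \rightarrow 0\) because the waves are transverse.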
Fig. 15

Antenna patterns as a function of the sky location of the pulsar for a fixed gravitational wave propagation direction \(\hat{k}\). The distance to pulsars was set at ten gravitational wavelengths

Fig. 16

Gravitational wave detection efforts cover many decades of the gravitational wave spectrum

5 Gravitational Wave Observatories

Gravitational wave detection efforts began in the 1950s with Joseph Weber’s development of acoustic “bar” detectors. Weber’s announcement of a detection in 1969, while ultimately discredited, spurred further interest, even prompting theorists such as Stephen Hawking and his student Gary Gibbons to try their hand at gravitational wave detection! The possibility of using laser interferometry as a detection method was proposed in 1963, and the first experimental studies of this approach occurred in 1971. The following year, Rainer Weiss published a landmark study that laid out the basic design for a practical laser interferometer gravitational wave detector, paying particular attention to the various sources of noise and how they might be mitigated. The idea of launching a laser interferometer into space appeared a few years later in a report by Weiss, Bender, Pound and Misner. At around the same time, Davies, Anderson, Estabrook and Wahlquist were developing the idea of using spacecraft Doppler tracking for gravitational wave detection, which gave Detweiler the idea of using pulsar timing to search for gravitational waves. By the end of the 1970s the basic ideas behind the three major detection techniques being pursued today were in place: ground and space based interferometers, and pulsar timing (Fig. 16).

5.1 Ground Based Laser Interferometers

Ground based laser interferometers operate in the audio frequency band \(f\sim [10,10^4]\, \mathrm{Hz}\), where the primary targets are stellar-remnant mergers of neutron stars and black holes. Other potential sources in the audio band include isolated distorted neutron stars, core-collapse supernovae, low mass X-ray binaries, collapsars and cosmic strings.

The design of a laser interferometer takes into account many factors, but two basic considerations set the overall parameters of the design—maximizing the response to a signal and mitigating laser frequency noise. Interferometers record the time delays due to gravitational waves, \(\varDelta T(t)\), as phase shifts between laser signals: \(\varDelta \varPhi (t) = 2 \pi \nu _0 \varDelta T(t)\), where \(\nu _0\) is the laser frequency. In principle it is possible to make a one-arm interferometer that compares the phase of the laser light returning from a round trip down one arm to a local phase reference. But in practice the laser frequency is not perfectly constant, and frequency fluctuations get multiplied by the overall light travel time to produce phase shifts that would swamp any gravitational wave signals. Using ultra-stable lasers can help mitigate the problem, but even the most stable lasers are still many orders of magnitude too noisy for a one-arm design. The solution is to adopt a Michelson interferometer topology, in which the input beam is split and sent along two paths of equal length, reflected, then recombined, so that the laser phase noise is cancelled in the differential arm-length readout. Figure 17 shows a schematic of the interferometer design used for the initial LIGO detectors. The basic Michelson design is augmented by using resonant Fabry–Perot cavities to amplify the signal in each arm. The amplification is achieved by bouncing the laser light back and forth multiple times, effectively increasing the arm-length of the detector.
Fig. 17

Basic layout of the initial LIGO interferometers. Two 4-km Fabry–Perot resonant cavities are combined together in a Michelson interferometer topology. Light from the two arms is mixed at the photodetector, yielding a differential measure of the light propagation time in the two arms

The overall size and shape of the detectors follow from some simple considerations. As to the shape, a right-angle configuration is chosen since it maximizes the differential arm-length change caused by a quadrupole radiation pattern, as is evident from Fig. 10. As to the size, the longer the arms the larger the response, at least until the armlength becomes comparable to the gravitational wavelength, a condition which defines the transfer frequency, \(f_* = c/(2\pi L)\), where L is the optical path length. For \(f < f_*\) we have \(\varDelta T \sim h L/c\), so to get a large time delay we need a large detector. If the detector is too large we go outside the low frequency limit and the response is diminished. Setting a maximum frequency of 1 kHz defines an optimal size of around 50 km. But building a detector this large would be very costly, so instead resonant cavities are used to fold the light so that the signal gets built up over multiple bounces using much shorter arms—4 km for the LIGO detectors and 3 km for the Virgo detector.
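The size estimate follows directly from the transfer frequency. A quick numerical check (function names mine):

```python
import math

C_LIGHT = 299792458.0  # m/s

def transfer_frequency(L):
    """Transfer frequency f* = c / (2 pi L) for optical path length L in meters."""
    return C_LIGHT / (2.0 * math.pi * L)

def length_for_max_frequency(f_max):
    """Arm length whose transfer frequency equals the target band edge."""
    return C_LIGHT / (2.0 * math.pi * f_max)
```

Setting \(f_* = 1\) kHz gives a length of about 48 km, i.e. the roughly 50 km quoted above, while the folded 4 km LIGO arms have a bare transfer frequency of about 12 kHz.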

The amplifying effect of a Fabry–Perot cavity can be computed using basic electromagnetic theory. The cavity mirrors can be characterized in terms of their transmissivity and reflectivity, and keeping track of the electric field transmission and reflection coefficients at each mirror while summing over the multiple bounces yields an expression for the phase shift due to a gravitational wave. In the long wavelength limit, and for a gravitational wave that is incident perpendicular to the cavity, the phase shift is given by
$$\begin{aligned} \vert \varPhi _\mathrm{FP}\vert = \left( \frac{\nu _0 h L}{c} \right) \left[ \frac{8 \mathcal{F}}{\sqrt{1+(f_\mathrm{gw}/f_p)^2}}\right] \end{aligned}$$
where
$$\begin{aligned} \mathcal{F} = \frac{\pi \sqrt{r_1 r_2}}{1-r_1 r_2} \end{aligned}$$
is the cavity finesse and
$$\begin{aligned} f_p = \frac{1}{4 \pi \tau _s} = \frac{c}{4 \mathcal{F} L} \end{aligned}$$
is the cavity pole frequency. Here \(r_1, r_2 \sim 1\) are the reflectivities of the cavity mirrors, and \(\tau _s\) is the light storage time in the cavity. Roughly speaking, the cavity boosts the effective length of the arm from L to \(2 \mathcal{F} L\).
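For illustration, the finesse and pole frequency can be computed from the mirror properties. In the sketch below the 1.4% input mirror power transmissivity is an assumed value chosen to give a finesse of roughly 450, which is the advanced LIGO arm cavity figure; with 4 km arms this reproduces the \(\sim \)42 Hz uncoupled-cavity pole discussed below:

```python
import math

C_LIGHT = 299792458.0  # m/s

def finesse(r1, r2):
    """Cavity finesse F = pi sqrt(r1 r2) / (1 - r1 r2)."""
    return math.pi * math.sqrt(r1 * r2) / (1.0 - r1 * r2)

def pole_frequency(F, L):
    """Cavity pole frequency f_p = c / (4 F L)."""
    return C_LIGHT / (4.0 * F * L)

# assumed parameters: 1.4% power transmission at the input mirror,
# a near-perfect end mirror, and 4 km arms
r1 = math.sqrt(1.0 - 0.014)
F = finesse(r1, 1.0)
fp = pole_frequency(F, 4000.0)
```

With these numbers \(\mathcal{F} \approx 450\) and \(f_p \approx 42\) Hz.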
Fig. 18

Optical layout of the advanced LIGO interferometers. Two 4-km Fabry–Perot resonant cavities are coupled together by the signal recycling mirror

To compute the full phase shift due to a gravitational wave for the advanced LIGO and Virgo instruments we need to take into account the fact that the two Fabry–Perot cavities are now coupled together due to the addition of a signal re-cycling mirror, shown in Fig. 18. The signal recycling mirror allows the cavity to be detuned from resonance in order to recycle the signal at some frequencies and resonantly extract the signal at other frequencies. This makes it possible to manipulate the optical response of the detector as a function of frequency. Without the coupling provided by the signal recycling mirror, the cavity pole frequency for advanced LIGO would sit at the low value of \(f_p=42\) Hz, resulting in poor sensitivity across the band. With the signal recycling mirror the pole frequency for the differential mode is shifted to \(f_p \simeq 350\) Hz in the so-called “zero detuned” (fully resonant) configuration, and the signal gain from folding and recycling is roughly a factor of 1,100. The full cavity transfer function for the differential mode is given by the complex function
$$\begin{aligned} C(f) = \frac{t_s e^{-i(2\pi (f+\nu _0) L_s/c)}}{1-r_s \left( \frac{r_i - e^{4\pi i f L/c}}{1-r_i e^{-4\pi i f L/c}}\right) e^{-i(4\pi (f+\nu _0) L_s/c)}} \end{aligned}$$
where \(t_s,r_s,r_i\) are the transmissivities and reflectivities of the signal recycling mirror and the input mirror, and \(L_s\) is the distance to the signal recycling mirror. By changing the phase advance \(\phi =4\pi (f+\nu _0) L_s/c\) it is possible to tune the response. Examples of how the tunings can modify the instrument sensitivity are shown in Fig. 19. Some of the tunings allow for higher sensitivity in certain narrow frequency bands, which may be useful when targeting particular types of signals.
Fig. 19

Examples of the sensitivities that can be achieved by using the placement of the signal recycling mirror to tune the response of the advanced LIGO detectors. The blue line shows the degraded sensitivity that would result if the signal recycling mirror were not used

To understand the sensitivity curves shown in Fig. 19 we need to consider the various noise sources that impact the phase measurement. The noise comes in two flavors, facility noise and fundamental noise. Facility noise includes seismic noise, gravity gradient noise, and various types of thermal noise in the mirrors, mirror coatings and suspensions. Fundamental noise has its origin in quantum mechanics and the Heisenberg uncertainty principle. Figure 20 shows the noise budget—the estimated contribution of each noise term—for the advanced LIGO design.
Fig. 20

Noise budget for the advanced LIGO detectors

The overall “U” shape of the noise curve is set by the fundamental quantum sensing noise and the cavity response. The accuracy with which the phase shifts can be measured scales inversely with the square root of the number of photons collected: the more photons the better. This photon counting noise is referred to as “shot noise”, and has an amplitude spectral density that scales as \(S_\mathrm{shot}^{1/2} \propto I_0^{-1/2}\), where \(I_0\) is the laser intensity. However, photons carry momentum, and the laser field exerts a fluctuating radiation pressure on the mirrors that scales as \(S_\mathrm{rp}^{1/2} \propto I_0^{1/2}/(M f^2)\), where M is the mass of the mirror. Thus we have a trade off between minimizing the shot noise by increasing the laser power and minimizing the disturbance to the mirrors by lowering the laser power. The cavity response makes the effective laser intensity frequency dependent: \(I(f) \simeq I_0/(1+(f/f_p)^2)\), resulting in a quadrature-sum quantum noise of
$$\begin{aligned} S_Q = \frac{\hbar c}{L^2} \left( \frac{I(f)}{\pi ^4 f^4 M^2 c^2} + \frac{1}{I(f)} \right) . \end{aligned}$$
The frequency at which the two contributions are equal defines what is known as the standard quantum limit. While “quantum limit” sounds rather insurmountable, it turns out not to be. The expression in Eq. (119) assumed that the shot noise and radiation pressure noise are uncorrelated, and so could be added in quadrature, but this does not have to be the case. It is possible to introduce correlations between the terms, effectively rotating the error ellipse to make the uncertainty in either position or momentum arbitrarily small. This kind of squeezing, or quantum non-demolition measurement, was first demonstrated using the GEO600 detector [1].
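The shot-noise/radiation-pressure trade-off in Eq. (119) can be made concrete with a short numerical sketch. Here the effective intensity \(I_0\) is in arbitrary units and all parameter values are illustrative, not actual advanced LIGO numbers:

```python
import math

HBAR = 1.054571817e-34   # J s
C_LIGHT = 299792458.0    # m/s

def quantum_noise(f, I0, M, L, f_p):
    """Quadrature-sum quantum noise S_Q of Eq. (119), with the
    cavity-filtered intensity I(f) = I0 / (1 + (f/f_p)^2)."""
    I = I0 / (1.0 + (f / f_p) ** 2)
    radiation_pressure = I / (math.pi ** 4 * f ** 4 * M ** 2 * C_LIGHT ** 2)
    shot = 1.0 / I
    return (HBAR * C_LIGHT / L ** 2) * (radiation_pressure + shot)
```

Raising the intensity lowers the noise at high frequencies, where shot noise dominates, but raises it at low frequencies, where radiation pressure dominates; this is exactly the trade-off that defines the standard quantum limit.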
To get down to the level set by the quantum sensing noise it is necessary to mitigate contributions to the facility noise. Left unattenuated, seismic noise would make gravitational wave detection impossible. The typical level for a ground level facility is of order
$$\begin{aligned} S_\mathrm{seis}^{1/2} \simeq 10^{-12} \left( \frac{ 10 \; \mathrm{Hz}}{f}\right) ^2 \, \mathrm{Hz}^{-1/2}\, . \end{aligned}$$
The seismic noise can be filtered by suspending the optics on pendula, each stage of which introduces a transfer function of the form \(|1-(f/f_\mathrm{pend})^2|^{-1}\). Setting the pendulum frequency well outside the sensitive band, \(f_\mathrm{pend} \sim 1\) Hz for advanced LIGO, and using a five stage pendulum yields a suppression of \(\sim \) \((f_\mathrm{pend}/ f)^{10}\), bringing the seismic noise below the quantum sensing noise across the measurement band.
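A quick sketch of the pendulum filtering (function name mine), modeling each stage as a simple pendulum transfer function:

```python
def seismic_transmission(f, f_pend=1.0, stages=5):
    """Ground-to-test-mass transmission of a chain of pendulums.  Each
    stage filters by 1/|1 - (f/f_pend)^2|, which falls off as
    (f_pend/f)^2 well above the pendulum frequency."""
    per_stage = 1.0 / abs(1.0 - (f / f_pend) ** 2)
    return per_stage ** stages
```

At 10 Hz the five-stage chain already suppresses ground motion by roughly ten orders of magnitude, consistent with the \((f_\mathrm{pend}/f)^{10}\) scaling quoted above.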
Thermal noise can be understood in terms of the fluctuation-dissipation theorem, which states that the power spectral density of the fluctuations of a system in equilibrium at temperature T is determined by the dissipative terms that return the system to equilibrium. This can be modeled in terms of an anelastic spring, with dynamics determined by the modified Hooke’s law \(F = -k x (1 +i \phi )\), where \(\phi \) is the loss angle. The thermal noise is then
$$\begin{aligned} S_\mathrm{T} = \frac{k_B T}{2 \pi ^3 M f} \frac{f_\mathrm{res}^2 \phi (f)}{[(f_\mathrm{res}^2 - f^2)^2 + f_\mathrm{res}^2 \phi ^2(f)]} \, . \end{aligned}$$
Here M is the mirror mass and \(f_\mathrm{res}\) is the resonant frequency of the oscillator. This one expression can be used to estimate mirror coating thermal noise and suspension thermal noise. Suspension thermal noise comes in two forms: pendulum thermal noise and violin mode thermal noise. For the mirror coating thermal noise the resonant frequencies are very high (tens of kHz), and the loss angles very small, and we find
$$\begin{aligned} S_\mathrm{MC}^{1/2} \simeq 2.5\times 10^{-24} \left( \frac{100\,\mathrm{Hz}}{f}\right) ^{1/2} \, \mathrm{Hz}^{-1/2}\, . \end{aligned}$$
For the pendulum noise we have \(f_\mathrm{res} = f_\mathrm{pend} \ll f\) and
$$\begin{aligned} S_\mathrm{pend}^{1/2} \simeq 3.5\times 10^{-25} \left( \frac{100\,\mathrm{Hz}}{f}\right) ^{5/2} \, \mathrm{Hz}^{-1/2}\, . \end{aligned}$$
The violin modes describe vibrations in the fused silica fibres used to suspend the optics. The resonant frequencies come in at harmonics of the fundamental mode, which has \(f_0 \simeq 500\) Hz. Expanding around the first harmonic we have
$$\begin{aligned} S_\mathrm{violin}^{1/2} \simeq \frac{3\times 10^{-24}}{1+(f^2_0-f^2)^2/\delta f^4} \, \mathrm{Hz}^{-1/2}\, , \end{aligned}$$
with a linewidth \(\delta f = f_0 \phi ^{1/2} \simeq 2\) Hz.
Local variations in the gravitational field are an inescapable noise source at low frequencies. For example, a person walking past one of the LIGO end stations exerts a time varying gravitational attraction on the mirrors, which changes the optical path length. A time varying mass distribution \(\delta \rho \) generates gravity gradient noise through the gravitational acceleration of the test masses:
$$\begin{aligned} \ddot{\mathbf{x}} = G \int \frac{\delta \rho (\mathbf{x}', t)}{|\mathbf{x}- \mathbf{x}'|^3} \, (\mathbf{x}-\mathbf{x}') \, d^3\mathbf{x}' \, . \end{aligned}$$
One of the largest contributions to gravity gradient noise comes from the density fluctuations caused by seismic surface waves. Fluctuations in the density of the atmosphere are also important. These disturbances can be reduced by placing the detectors deep underground. It may also be possible to use very accurate gravimeters to measure the disturbances and subtract them from the data, but ultimately the best way to escape gravity gradient noise is to put the detectors in deep space.

There is a lot more that could be said about the operation of the LIGO and Virgo detectors that goes well beyond the brief description given here. Topics of particular importance that are not covered here are how the control system keeps the interferometers in resonance, and how the output from this control loop is used to calculate the calibrated strain. To learn more, see Abbott et al. [2], Izumi and Sigg [31].

5.2 Space Based Laser Interferometers

To detect signals below \(f\sim 1\) Hz we need to get away from gravity gradient and seismic noise. This can be achieved by launching a detector into deep space. The frequency of a gravitational wave signal scales inversely with the size of the source, so space based detectors can detect much larger systems, such as massive black hole mergers and stellar binaries on wide orbits.

The same considerations that guide the design of a ground based interferometer also apply to space detectors, but with the added complication that it is much more difficult to control the distance between the mirrors. Two very different concepts have been proposed for space based interferometers. The first concept, which will be used by the Laser Interferometer Space Antenna [4] (LISA), employs three free-flying spacecraft to form a long baseline detector with synthetic interferometry, while the second concept, which will be used by the Deci-Hertz Interferometer Gravitational wave Observatory  [33] (DECIGO), employs precision formation flying, much shorter arms and a resonant Fabry–Perot cavity. The DECIGO concept is essentially LIGO in space, and requires precise control of the spacecraft separations to produce a resonant cavity. The signal would have to be extracted from the control system that maintains the inter-spacecraft separation. In what follows I will focus on the LISA mission since it has already been selected for launch, while DECIGO is at an earlier stage in its development.
Fig. 21

Proposed orbit for the LISA mission. The constellation cartwheels clockwise as the spacecraft orbit the Sun

The current LISA design calls for three spacecraft to be placed into slightly eccentric and slightly inclined orbits about the Sun, with the center of mass of the constellation trailing the Earth’s orbit by about 20\(^{\circ }\). The orbit is illustrated in Fig. 21. Each spacecraft follows a geodesic Keplerian orbit that is given to leading order in eccentricity by
$$\begin{aligned}&x_k = \mathrm{AU}\left( \cos \alpha _k +\frac{e}{2} (\cos (2\alpha _k-\beta _k)-3\cos \beta _k)\right) \nonumber \\&y_k = \mathrm{AU}\left( \sin \alpha _k +\frac{e}{2} (\sin (2\alpha _k-\beta _k)-3\sin \beta _k)\right) \nonumber \\&z_k = -\sqrt{3} e \mathrm{AU} \cos (\alpha _k-\beta _k) \end{aligned}$$
where \( \alpha _k = 2\pi t/\mathrm{yr} + \kappa \) is the orbital phase of the center of mass and \(\beta _k = 2 \pi k/3 + \lambda \) is the relative phase of the spacecraft in the constellation. The constants \(\kappa ,\lambda \) set the overall orientation and location of the array. To leading order in the eccentricity e the distance between each pair of spacecraft is given by \(|\mathbf{x}_k - \mathbf{x}_{k+1}| = L = 2\sqrt{3}\, e\, \mathrm{AU}\). The normal to the plane of the constellation makes an angle of 60\(^{\circ }\) with the ecliptic. Setting \(L=2.5\times 10^6\) km yields an orbital eccentricity of \(e=0.00483\). The armlengths are only constant to leading order in the eccentricity, and even ignoring three body effects from the Earth and other planets, the distance between the spacecraft will change by several meters per second.
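The orbit formulas can be verified numerically. The sketch below (function name mine, constants standard) checks that \(L=2.5\times 10^6\) km implies \(e \approx 0.00483\), and that the spacecraft separations computed from the leading-order orbits match the target armlength:

```python
import numpy as np

AU = 1.495978707e11       # astronomical unit, m
L_arm = 2.5e9             # target armlength, m
ecc = L_arm / (2.0 * np.sqrt(3.0) * AU)   # from L = 2 sqrt(3) e AU

def spacecraft_position(t_yr, k, kappa=0.0, lam=0.0):
    """Leading-order Keplerian orbit for LISA spacecraft k = 1, 2, 3."""
    a = 2.0 * np.pi * t_yr + kappa           # orbital phase alpha_k
    b = 2.0 * np.pi * k / 3.0 + lam          # constellation phase beta_k
    x = AU * (np.cos(a) + 0.5 * ecc * (np.cos(2 * a - b) - 3.0 * np.cos(b)))
    y = AU * (np.sin(a) + 0.5 * ecc * (np.sin(2 * a - b) - 3.0 * np.sin(b)))
    z = -np.sqrt(3.0) * ecc * AU * np.cos(a - b)
    return np.array([x, y, z])
```

The computed eccentricity is \(\approx 0.00483\), and the pairwise separations stay within a fraction of a percent of \(2.5\times 10^6\) km over the orbit, the residual flexing entering at higher order in e.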

The basic idea behind the LISA mission is to use laser interferometry to precisely track the distance between widely separated free flying proof masses. It is necessary to house the proof masses inside a spacecraft to protect them from non-gravitational disturbances such as solar radiation pressure and the solar wind. The spacecraft also provide the platform to house the lasers, optical benches and telescopes for the interferometry system. Figure 22 shows a schematic of the LISA design. The design employs two optical benches on each spacecraft, each equipped with a free floating gold/platinum cube and a \(\sim \)25 cm diameter telescope to transmit and receive the laser signal along each arm. Laser signals are transmitted from each optical bench, producing a total of six laser links. The gold/platinum proof masses are housed in metal cages, and capacitive sensing is used to track the distance between the cubes and the sides of the cage. Micro-Newton thrusters gently maneuver the spacecraft to maintain separation between the proof masses and the cages.

The large distances between the spacecraft and the constant changes in the spacecraft separation make it impossible to form a traditional interferometer. For one, the spread in the laser beams over these huge distances means that very little of the transmitted laser light is received, and even less would be reflected back. Secondly, the changes in the separation amount to millions of phase cycles per second, and even if a Michelson signal could be formed, the differences in the armlengths would lead to the readout being swamped by laser phase noise. The solution is to use synthetic interferometry, where the phase of the incoming laser light is compared to a local reference and recorded by a phasemeter, yielding a collection of phase readouts \(\phi _{ij}(t)\) that measure the phase of the signal from spacecraft i recorded at spacecraft j at time t. This phase readout will include contributions from laser phase noise \(C_i\), position noise \(n_{ij}^p\), acceleration noise \(\mathbf{n}^{a}_{ij}\) and gravitational waves \(\psi _{ij}\):
$$\begin{aligned} \phi _{ij}(t) = C_i(t-L_{ij}/c) -C_j(t) +n^p_{ij}(t) - \hat{\mathbf{x}}_{ij}\cdot (\mathbf{n}_{ij}^a(t)-\mathbf{n}_{ji}^a(t-L_{ij}/c)) +\psi _{ij}(t) \, , \end{aligned}$$
where \(L_{ij}\) is the instantaneous length of the arm connecting the two spacecraft and \(\hat{\mathbf{x}}_{ij}\) is a unit vector along the arm. The position noise includes a combination of effects in the optical metrology system, the principal one being shot noise in the phase measurement. The acceleration noise is due to non-gravitational forces pushing on the proof masses, for example, due to residual gas in the proof mass housing and feedback from the capacitive sensing and control system. The six phase readouts are combined in software with carefully chosen time delays to synthesize equal-arm length interferometry signals that are free of laser phase noise.
Fig. 22

The LISA measurement system. Laser signals are transmitted from each optical bench, producing a total of six laser links

Fig. 23

Synthetic time-delay interferometry works by combing the laser phase measurements along the two virtual paths shown here in red and blue. By going up and back along each arm we ensure that each path has the same total length, and that the total phase difference is therefore free of laser phase noise

The idea behind synthetic time-delay interferometry is illustrated in Fig. 23. The laser phase noise can be cancelled by following the virtual path shown in Fig. 23 to form the Michelson-like signal:
$$\begin{aligned}&X(t) = \phi _{12}(t-L_{31}/c-L_{13}/c-L_{21}/c)-\phi _{13}(t-L_{21}/c-L_{12}/c-L_{31}/c) \nonumber \\&\quad \quad +\,\phi _{21}(t-L_{31}/c-L_{13}/c) -\phi _{31}(t-L_{21}/c-L_{12}/c) \nonumber \\&\quad \quad +\,\phi _{13}(t-L_{31}/c)- \phi _{12}(t-L_{21}/c)+\phi _{31}(t)-\phi _{21}(t) \, . \end{aligned}$$
Similar signals Y(t), Z(t) can be extracted from vertices 2, 3, with expressions that can be found by permuting the indices 1, 2, 3 in the above expression for X(t).
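The laser-noise cancellation can be demonstrated with a toy discrete-time simulation. In the sketch below the arms are taken to be equal (a delay of d samples), only the laser phase noise terms are retained, and the index convention \(\phi _{ij}\) = signal from laser i received at spacecraft j follows the text; everything else is a simplifying assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 4096, 16                    # total samples; one-arm delay in samples
C = rng.standard_normal((4, N))    # C[1..3]: laser phase noise per spacecraft (C[0] unused)

def phase(i, j, n):
    """Phase readout phi_ij at sample n, keeping only laser noise:
    laser i delayed by one arm, beat against local laser j."""
    return C[i][n - d] - C[j][n]

def X(n):
    """Michelson-like TDI combination with all delays equal to d samples."""
    return (phase(1, 2, n - 3 * d) - phase(1, 3, n - 3 * d)
            + phase(2, 1, n - 2 * d) - phase(3, 1, n - 2 * d)
            + phase(1, 3, n - d) - phase(1, 2, n - d)
            + phase(3, 1, n) - phase(2, 1, n))

# residual laser noise after the TDI combination
residual = max(abs(X(n)) for n in range(5 * d, N))
```

The residual is zero to machine precision, even though each individual readout is dominated by the unit-variance laser noise. With unequal and time-varying arms the real LISA combinations require interpolated, arm-specific delays, but the cancellation principle is the same.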
The gravitational wave signal can be computed using the expression given in Eq. (102). The phase shift due to plane gravitational wave propagating in the \(-\hat{\varOmega }\) direction with frequency f is given by
$$\begin{aligned} \psi _{ij}(t) = \mathbf{h}(f, t-L_{ij}/c, \mathbf{x}_i):(\hat{\mathbf{x}}_{ij} \otimes \hat{\mathbf{x}}_{ij}) \mathcal{T}(\hat{\mathbf{x}}_{ij}, \hat{\varOmega }, f) \end{aligned}$$
$$\begin{aligned} \mathcal{T}(\hat{\mathbf{x}}_{ij}, \hat{\varOmega }, f) = \mathrm{sinc}\left[ \frac{f}{2 f_{ij}} (1-\hat{\mathbf{x}}_{ij}\cdot \hat{\varOmega })\right] e^{i f/(2 f_{ij})(1-\hat{\mathbf{x}}_{ij}\cdot \hat{\varOmega })} \end{aligned}$$
and \(f_{ij} = c/(2 \pi L_{ij})\) is the instantaneous transfer frequency. Above the transfer frequency the response diminishes as \(\sim 1/f\).
Fig. 24

The LISA sensitivity curve in terms of characteristic strain, \(\sqrt{f S_n}\) is compared to three types of signal: an equal mass black hole binary at \(z=3\) with source-frame total mass \(M = 10^6 \, M_{\odot }\); the galactic verification binary SDSS J0651+2844 observed for 4 years; and a signal similar to the first LIGO detection GW150914 if the LISA observation started 5 years prior to merger and continued for 4 years

The above expression for the gravitational wave response can be used with the expression for X(t) to compute the sky and polarization averaged response to a gravitational wave. This can be further combined with estimates for the position and acceleration noise to produce a LISA sensitivity curve. The current noise estimates are [27]:
$$\begin{aligned} P_\mathrm{pos} = (1.5 \times 10^{-11} \, \mathrm{m})^2 \, \mathrm{Hz}^{-1}\, , \end{aligned}$$
$$\begin{aligned} P_\mathrm{acc} = (3 \times 10^{-15} \, \mathrm{m}\, \mathrm{s}^{-2})^2 \left( 1+ \left( \frac{ 0.4\, \mathrm{mHz}}{f}\right) ^2 \right) \, \mathrm{Hz}^{-1}\, . \end{aligned}$$
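The two noise contributions quoted above are straightforward to evaluate; a minimal sketch (function names are mine):

```python
import numpy as np

def lisa_pos_noise():
    """Single-link position noise PSD, m^2/Hz (white), as quoted above."""
    return (1.5e-11) ** 2

def lisa_acc_noise(f):
    """Test-mass acceleration noise PSD, (m s^-2)^2/Hz, with the
    low-frequency reddening below ~0.4 mHz."""
    return (3e-15) ** 2 * (1 + (0.4e-3 / np.asarray(f, float)) ** 2)

# the acceleration noise is twice its high-frequency floor at f = 0.4 mHz
ratio = lisa_acc_noise(4e-4) / (3e-15) ** 2
```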
The resulting sensitivity curve, along with some representative LISA signals, is shown in Fig. 24. Details on how the various curves in the figure are calculated can be found in Robson et al. [48].

5.3 Pulsar Timing Arrays

Nature has provided us with extremely accurate galactic clocks in the form of pulsars—rapidly rotating neutron stars which emit beams of radio waves that sweep past the Earth as the neutron star rotates. The first pulsar, PSR B1919\(+\)21, was discovered by Jocelyn Bell and Antony Hewish in 1967. The number of pulsars known today is approaching 2,000. Most of these are so-called “classical” pulsars, with spin periods of 0.1–10 s. In 1982 the first millisecond pulsar was discovered [5]. Millisecond pulsars have rotational periods in the range of about 1–10 ms, and are thought to be classical pulsars that have been spun-up by accreting material from a binary companion. The accretion is also thought to bury the magnetic field, which reduces the pulsar winds and results in a slower spin down rate. Millisecond pulsars typically have more consistent pulse profiles and more regular pulse periods than classical pulsars, making them much better clocks to use for gravitational wave detection. There are currently 300 known millisecond pulsars, and several new ones are discovered each year.

The idea of using pulsars to detect gravitational waves was first considered by Sazhin in 1978, and in a more general context by Detweiler [17] in 1979, who suggested that cross-correlation of the signals from multiple pulsars could be used to separate noise disturbances from gravitational wave signals. In 1983 Hellings and Downs [26] computed the cross-correlation of the pulse arrival times for pairs of pulsars in the presence of an isotropic stochastic gravitational wave background. They found that the correlation followed the curve
$$\begin{aligned} C(\theta ) = 1 +\frac{3(1-\cos \theta )}{2}\left( \ln \left( \frac{1-\cos \theta }{2}\right) - \frac{1}{6} \right) \, , \end{aligned}$$
where \(\theta \) is the angle between the lines of sight to the two pulsars. With \(N_p\) pulsars we get \(N_p(N_p-1)/2\) measurements of \(C(\theta )\) across a wide range of angles. This correlation pattern is the smoking gun signature that we look for in pulsar timing array searches for gravitational waves. The idea of establishing a pulsar timing array—a collection of millisecond pulsars that are monitored on a regular basis—was first proposed by Foster and Backer in 1990. Since then, three pulsar timing efforts have been undertaken: the Parkes Pulsar Timing Array (PPTA) in Australia, the European Pulsar Timing Array (EPTA) in countries across Europe, and the North American NanoHertz Observatory for Gravitational Waves (NANOGrav) in the United States and Canada. The data from all three projects is now analyzed together under the auspices of the International Pulsar Timing Array (IPTA).
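The Hellings–Downs curve is simple to evaluate; a short sketch (normalized so \(C\rightarrow 1\) as \(\theta \rightarrow 0\), as in the expression above):

```python
import numpy as np

def hellings_downs(theta):
    """Hellings-Downs correlation, normalized so that C -> 1 as theta -> 0."""
    x = (1 - np.cos(theta)) / 2
    return 1 + 3 * x * (np.log(x) - 1 / 6)

def n_pairs(n_pulsars):
    """Number of distinct pulsar pairs sampling C(theta)."""
    return n_pulsars * (n_pulsars - 1) // 2

c90 = hellings_downs(np.pi / 2)   # pulsars 90 deg apart are anti-correlated
c180 = hellings_downs(np.pi)      # antipodal pulsars are positively correlated
```

The curve dips negative near \(90^\circ \) and rises to \(0.5\) at \(180^\circ \); it is this distinctive quadrupolar shape that distinguishes gravitational waves from clock or ephemeris errors.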

Using pulsars to detect gravitational waves sounds simple enough—gravitational waves should perturb the arrival time of the pulses and lead to the distinct correlation pattern predicted by Hellings and Downs—but in practice there are many challenges to overcome. First, both the pulsars and the radio receivers are in constant relative motion, so we have to account for a multitude of effects that contribute to changes in the light propagation time. Another issue is that the individual radio pulses have different shapes, and many thousands of pulses have to be stacked together to arrive at a consistent pulse profile that can be used for the timing. Telescope availability and constraints on observing schedules mean that the timing measurements are irregularly spaced, with gaps of one to two weeks between observations. The noise levels in each measurement can also vary widely.

The basic equation governing the sensitivity of a radio telescope is the radiometer equation, which leads to the following expression for the noise in the pulse arrival times:
$$\begin{aligned}&\sigma = \left( \frac{S_\mathrm{psr}}{\mathrm{mJy}}\right) ^{-1} \left( \frac{T_\mathrm{rec} + T_\mathrm{sky}}{\mathrm{Kelvin}}\right) \left( \frac{G}{\mathrm{K}\, \mathrm{Jy}^{-1}}\right) ^{-1} \left( \frac{\varDelta \nu }{\mathrm{MHz}}\right) ^{-1/2} \nonumber \\&\times \left( \frac{t_\mathrm{int}}{\mathrm{sec}}\right) ^{-1/2}\left( \frac{W}{P}\right) ^{3/2} \left( \frac{P}{\mathrm{ms}}\right) \, \mathrm{ns}\, . \end{aligned}$$
Here \(S_\mathrm{psr}\) is the pulsar flux density—brighter pulsars are better; \(T_\mathrm{rec},T_\mathrm{sky}\) are the receiver and sky temperatures—cool detectors are better; G is the antenna gain—bigger antennas are better; \(\varDelta \nu \) is the bandwidth—wide band systems are now standard; \(t_\mathrm{int}\) is the integration time—longer is better; W is the pulse width—the narrower the better; and P is the pulse period—the shorter the better. Using the world's largest radio dishes with wideband receivers and with the brightest millisecond pulsars, it is now possible to measure pulse arrival times with an accuracy of \(\sigma \sim 10\) ns.
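The scalings in the radiometer equation can be encoded directly; a minimal sketch (the overall normalization here is schematic and the parameter values are illustrative, not a real observing setup):

```python
def timing_noise_ns(S_mJy, T_K, G_K_per_Jy, bw_MHz, t_int_s, duty, P_ms):
    """Arrival-time noise following the radiometer scalings quoted above;
    duty = W/P is the pulse duty cycle. Only the parameter scalings are
    meaningful here, not the absolute normalization."""
    return (T_K / (S_mJy * G_K_per_Jy)
            * bw_MHz ** -0.5 * t_int_s ** -0.5
            * duty ** 1.5 * P_ms)

# doubling the bandwidth (or the integration time) reduces sigma by sqrt(2)
s1 = timing_noise_ns(10.0, 25.0, 2.0, 400.0, 1800.0, 0.05, 5.0)
s2 = timing_noise_ns(10.0, 25.0, 2.0, 800.0, 1800.0, 0.05, 5.0)
```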
Fig. 25

An illustration of some of the factors that complicate pulsar timing observations. Millisecond pulsars are often found in binary systems that are moving at high velocity through the galaxy. Interstellar dispersion spreads the pulse signals, with the higher frequency radio waves arriving before those at lower frequency. The radio telescopes are on the surface of the Earth, which rotates on its axis and orbits the solar barycenter in a complex orbit that depends on all the other bodies in the solar system

The pulse arrival times are impacted by many factors. The intrinsic pulse period changes slowly with time, and this can be modeled using a Taylor series expansion for the pulsar period:
$$\begin{aligned} P(t) = P_0 + \dot{P}_0 (t-t_0)+ \frac{1}{2}\ddot{P}_0 (t-t_0)^2 + \cdots \, . \end{aligned}$$
Usually three terms are enough to model the period changes over many decades. As shown in Fig. 25, there are also a host of propagation effects that need to be accounted for. Together with the model for the intrinsic period changes, these propagation effects define the timing model. The predicted pulse arrival time, \(t_\mathrm{pred}\) can be expanded:
$$\begin{aligned} t_\mathrm{pred} = t_\mathrm{PSR} + \varDelta _\odot + \varDelta _\mathrm{ISM} + \varDelta _\mathrm{B} \, , \end{aligned}$$
where \(t_\mathrm{PSR}\) is the emission time at the pulsar, which is modeled using Eq. (135); \(\varDelta _\odot \) maps the arrival time at the telescope to the arrival time at the solar barycenter; \(\varDelta _\mathrm{ISM}\) gives the propagation delay from the pulsar system barycenter to the solar barycenter; and \(\varDelta _\mathrm{B}\) maps the pulse emission time from the pulsar to the binary barycenter. The difference between the predicted pulse arrival time and the observed pulse arrival time is called a timing residual, and these residuals are what are used to search for gravitational waves. Each term in the timing model can be expanded into a collection of contributions. The mapping from the telescope to the solar barycenter can be expanded:
$$\begin{aligned} \varDelta _\odot = \varDelta _\mathrm{C} + \varDelta _\mathrm{A} + \varDelta _{\mathrm{R}_\odot } + \varDelta _{\mathrm{E}_\odot } + \varDelta _{\mathrm{S}_\odot } + \cdots \, , \end{aligned}$$
where \(\varDelta _\mathrm{C}\) are clock corrections; \(\varDelta _\mathrm{A}\) are atmospheric delays; \(\varDelta _{\mathrm{R}_\odot }\) is the Roemer delay due to the finite speed of light; \( \varDelta _{\mathrm{E}_\odot }\) is the Einstein delay due to time dilation for moving clocks and clocks running slower in strong gravitational fields; and \(\varDelta _{\mathrm{S}_\odot }\) is the Shapiro delay due to light propagation in a curved spacetime. The Roemer delay depends on the distance between the telescope and the solar barycenter, which is in turn derived from the solar system ephemeris model. Uncertainties in the location of the solar barycenter, which are of order hundreds of meters, have emerged as one of the major sources of uncertainty in the timing model. The mapping from the pulsar to the binary barycenter shares many of the same elements as the solar barycentering:
$$\begin{aligned} \varDelta _\mathrm{B} = \varDelta _{\mathrm{R}_B} + \varDelta _{\mathrm{E}_B} + \varDelta _{\mathrm{S}_B} + \varDelta _{\mathrm{A}_B} \dots \, . \end{aligned}$$
Again we have the Roemer delay, \(\varDelta _{\mathrm{R}_B}\); the Einstein delay \(\varDelta _{\mathrm{E}_B}\); and the Shapiro delay \(\varDelta _{\mathrm{S}_B}\); but in addition we have the aberration delay \( \varDelta _{\mathrm{A}_B}\) due to the apparent location of the pulsar being changed by its transverse velocity. Once again the Roemer delay includes the full orbital model, which includes relativistic post-Keplerian corrections due to the strong gravitational fields and high velocities encountered in these binaries. Finally there is the propagation delay from the pulsar system to the solar barycenter:
$$\begin{aligned} \varDelta _\mathrm{ISM} = \varDelta _{\mathrm{VP}} + \varDelta _{\mathrm{ISD}} + \varDelta _{\mathrm{E}_S} + \cdots \, , \end{aligned}$$
where \(\varDelta _{\mathrm{VP}}\) is the vacuum propagation delay—a Roemer-type delay that accounts for the changing distance to the binary; \(\varDelta _{\mathrm{ISD}} \sim D/\nu ^2\) is the delay due to interstellar dispersion, which scales inversely with the square of the radio frequency; and \( \varDelta _{\mathrm{E}_S} \) is the Einstein delay due to the special relativistic time dilation caused by the relative velocity of the binary barycenter. Examples of how some of the terms impact the timing model are shown in Fig. 26. Full details of the timing model can be found in Hobbs et al. [28].
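The dispersion delay is easy to estimate. A minimal sketch, using the standard cold-plasma dispersion constant of roughly 4.15 ms for a dispersion measure of 1 pc cm\(^{-3}\) at 1 GHz (the function name and parameter values are illustrative):

```python
def dispersion_delay_ms(dm, f_GHz):
    """Interstellar dispersion delay in ms for a dispersion measure dm in
    pc cm^-3 at radio frequency f_GHz. The ~4.15 ms GHz^2 constant is the
    standard cold-plasma value."""
    return 4.15 * dm / f_GHz ** 2

d_low = dispersion_delay_ms(30.0, 1.2)    # lower frequencies arrive later
d_high = dispersion_delay_ms(30.0, 1.6)
```

The strong \(\nu ^{-2}\) scaling is why wideband receivers are so valuable: measuring the delay across the band lets the dispersion measure, which varies with time, be fit and removed.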
Fig. 26

Examples of timing residuals when there are errors in the timing model. a An error in the pulse period \(P_0\); b an error in the pulse period derivative \(\dot{P}_0\); c an error in the sky location, resulting in an error in \(\varDelta _{\mathrm{VP}}\); d an error in the pulsar proper motion, resulting in errors in \(\varDelta _{\mathrm{VP}}\) and \(\varDelta _{\mathrm{E}_S}\)

Pulsar timing observations span decades and observations are made \(\sim \) weekly, resulting in a sensitivity to gravitational wave signals in the frequency band \(f\sim [10^{-9}, 10^{-6}]\) Hz. The main astrophysical source in this band is thought to be supermassive binary black holes with masses in the range \(M\sim 10^8 M_\odot - 10^{10} M_\odot \) that are tens of thousands of years from merger. The combined signal from many hundreds of systems is thought to produce an almost stochastic background that can be detected by cross-correlating the timing residuals from many pulsars. The current status of the search for the correlation pattern by the NANOGrav collaboration [3] is shown in Fig. 27. While the current uncertainties in the correlation measure look to be dauntingly large, simulations suggest that a detection should be possible within the next five years [34].
Fig. 27

The current status of attempts to measure the Hellings–Downs correlation curve (dotted red line) using data from the NANOGrav 11 year data release. While the measurement uncertainties are currently very large, they are expected to drop significantly in the next few years, opening the way for a detection

6 Gravitational Waves from Binary Systems

Binary systems are a prime target for gravitational wave detectors. Binaries made up of ordinary stars are not so promising since they glom together before reaching significant orbital velocities. Compact stellar remnants, such as white dwarfs, neutron stars and black holes are much better candidates, as are the supermassive black holes found near the centers of galaxies. Binary black hole systems are able to reach orbital velocities close to the speed of light before the individual black holes merge to form a larger black hole. The binary dynamics, for both black holes and neutron stars, is usually divided into three regions: inspiral, merger and ringdown. During the early inspiral the stars follow almost Keplerian orbits, but as the orbit shrinks due to the emission of gravitational waves the orbits become increasingly non-Keplerian, exhibiting interesting relativistic effects such as periastron precession and orbital plane precession. The merger is highly relativistic, and generates strong, dynamical gravitational fields. Neutron star mergers also involve high density material colliding at high velocities. Two black holes merge to form a single distorted black hole, which then sheds the distortions during the ringdown phase. The end state of a neutron star merger is more complicated, and may involve the formation of a single massive neutron star that later collapses to form a black hole. Both the massive neutron star and the final black hole produce ringdown radiation.

In Sect. 3.3 we saw how gravitational waves can be computed in the weak field, slow motion limit. Using more sophisticated techniques, it is possible to continue the weak field expansion of Einstein’s equations order-by-order in the orbital velocity v in what is known as the post-Newtonian (PN) expansion of Einstein’s equations [45]. The PN expansion breaks down close to merger, and other approximations to Einstein’s equations have been developed to cover highly relativistic systems. The self-force program considers the motion of a small compact body with mass m, about a much larger compact body with mass M, and employs an expansion of Einstein’s equations in the small mass ratio parameter \(q=m/M\) [44, 53]. Another approach is to solve the Einstein equations numerically. Solving Einstein’s equations on a computer is a very challenging task, which has required the reformulation of Einstein’s equations and the development of many ingenious numerical techniques  [35]. The current state-of-the-art numerical relativity simulations work well for moderate mass ratios and high velocities, but break down for systems with large mass ratios and low velocities. Several schemes have been developed to bridge the gap between the PN approximation and numerical relativity, and to provide analytic waveforms that are valid through inspiral, merger and ringdown. The leading approach recasts the dynamics in terms of an effective one body (EOB) metric, using a transformation similar to that used to solve the Kepler problem in Newtonian gravity [9]. The reformulation improves the convergence properties of the PN expansion. The EOB description is completed by modeling the merger in terms of a single distorted black hole. The current coverage of the binary parameter space, expressed in terms of the orbital velocity v and the mass ratio \(q=m/M\), is shown in Fig. 28. For a recent review see Buonanno and Sathyaprakash [10].
Fig. 28

The regions of the binary system parameter space covered by state-of-the-art Post-Newtonian (PN), Self Force (SF) and Numerical Relativity (NR) methods. The Effective One Body (EOB) approach seeks to extend the PN approximation to cover the final plunge, merger and ringdown

In these lectures I will give a brief introduction to the PN expansion, and I will skip the description of the self force program, EOB and numerical relativity. Excellent reviews of these other approaches, along with a far more thorough treatment of the PN approach, can be found in Poisson [44], Van de Meent [53], Poisson and Will [45], Buonanno and Sathyaprakash [10], Brügmann [8].

6.1 Post-Newtonian Expansion

The PN approach is an expansion of the Einstein field equations in powers of \(GM/r \sim v^2\) where M is the total mass of the system, r is the orbital separation and v is the orbital velocity. Several different approaches have been used to compute the PN expansion, including matched asymptotic expansions, solution of the “relaxed” field equations, and effective field theory techniques. For brevity I will skip the derivations and simply quote some of the key results. The relative acceleration of two bodies is expanded:
$$\begin{aligned} \mathbf{a} = \underbrace{ {\mathbf{a}_\mathrm{N}} }_\mathrm{0 PN}+ \underbrace{ \mathbf{a}_\mathrm{1 PN}}_\mathrm{1 PN} + \underbrace{\mathbf{a}_\mathrm{SO}}_\mathrm{1.5 PN} + \underbrace{\mathbf{a}_\mathrm{2 PN}}_\mathrm{2 PN} + \underbrace{\mathbf{a}_\mathrm{SS}}_\mathrm{2 PN}+ \underbrace{\mathbf{a}_\mathrm{RR}}_\mathrm{2.5 PN} + \cdots \, . \end{aligned}$$
The leading order term is just the usual Newtonian acceleration
$$\begin{aligned} \mathbf{a}_\mathrm{N} = - {G M \over r^2} {\hat{\mathbf{r}}} . \end{aligned}$$
The expressions for the higher-order corrections are quite lengthy, so to simplify the notation I will adopt natural units: \(G=c=1\). The first order correction comes in at order \(v^2\), and is responsible for periastron precession:
$$\begin{aligned} \mathbf{a}_\mathrm{1 PN}= - {M \over r^2} \left\{ { \hat{r}} \left[ (1+3\eta )v^2 - 2(2+\eta ){M \over r} - {3 \over 2} \eta \dot{r}^2 \right] -2(2-\eta ) {\dot{r}}{} \mathbf{v} \right\} . \end{aligned}$$
Here \(\eta = m_1 m_2/M^2 = \mu /M\) is the symmetric mass ratio, \(\mu = m_1 m_2/M\) is the reduced mass and \(M=m_1+m_2\) is the total mass. The next correction to the acceleration enters at order \(v^3\), and is due to spin-orbit coupling:
$$\begin{aligned} \mathbf{a}_\mathrm{SO} = {1 \over r^3} \biggl \{ 6 {\hat{r}} \left[ ({\hat{r}} \times \mathbf{v})\cdot \left( 2\mathbf{S} + {\delta m \over M}{\varvec{\Delta }}\right) \right] - \left[ \mathbf{v} \times \left( 7\mathbf{S} + 3 {\delta m \over M}{\varvec{\Delta }}\right) \right] + 3 \dot{r} \left[ {\hat{r}} \times \left( 3\mathbf{S} + {\delta m \over M}{\varvec{\Delta }}\right) \right] \biggr \} \, . \end{aligned}$$
Here \(\delta m \equiv m_1 - m_2\) is the mass difference, \(\mathbf{S} \equiv \mathbf{S_1}+\mathbf{S_2}\) is the total spin and \({\varvec{\Delta }} \equiv M(\mathbf{S_2}/m_2 -\mathbf{S_1}/m_1)\). Two effects enter at order \(v^4\). The first is due to gravitational self interaction (gravity gravitates)
$$\begin{aligned}&\mathbf{a}_\mathrm{2 PN} = - {M \over r^2} \biggl \{ { \hat{r}} \biggl [ {3 \over 4} (12+29\eta ) \left( {M \over r} \right) ^2 + \eta (3-4\eta )v^4 + {15 \over 8} \eta (1-3\eta )\dot{r}^4 \nonumber \\& - {3 \over 2} \eta (3-4\eta )v^2 \dot{r}^2 - {1 \over 2} \eta (13-4\eta ) {M \over r} v^2 - (2+25\eta +2\eta ^2) {M \over r} \dot{r}^2 \biggr ] \nonumber \\& - {1 \over 2} \dot{r} \mathbf{v} \left[ \eta (15+4\eta )v^2 - (4+41\eta +8\eta ^2) {M \over r} -3\eta (3+2\eta ) \dot{r}^2 \right] \biggr \} , \end{aligned}$$
while the second is due to spin-spin coupling:
$$\begin{aligned} \mathbf{a}_\mathrm{SS} = - {3 \over \mu r^4} \biggl \{ {\hat{r}} (\mathbf{S}_1 \cdot \mathbf{S}_2) + \mathbf{S}_1 (\hat{r} \cdot \mathbf{S}_2) + \mathbf{S}_2 ( \hat{r} \cdot \mathbf{S}_1) - 5 { \hat{r}} ({ \hat{r} \cdot \mathbf{S}_1})(\hat{r} \cdot \mathbf{S}_2) \biggr \} \, . \end{aligned}$$
All the terms considered so far are time reversal invariant and do not cause the energy and angular momentum of the orbit to evolve. The first non-time-reversal-invariant term arises at order \(v^5\) and is due to radiation reaction:
$$\begin{aligned} \mathbf{a}_\mathrm{RR} = {8 \over 5} \eta {M^2 \over r^3} \left\{ \dot{r} {\hat{\mathbf{r}}} \left[ 18v^2 + {2 \over 3} {M \over r} -25 \dot{r}^2 \right] - \mathbf{v} \left[ 6v^2 - 2 {M \over r} -15 \dot{r}^2 \right] \right\} . \end{aligned}$$
The radiation reaction term causes the orbit to decay. To see this, consider a circular orbit with \(\dot{r}=0\) and \(v^2 = M/r\). The torque due to the radiation reaction force is given by
$$\begin{aligned} \varvec{\tau }_{RR} = \mu \mathbf{r} \times \mathbf{a}_\mathrm{RR} = -\frac{32}{5} \frac{\eta }{M} v^8 \mathbf{L}_\mathrm{N} \end{aligned}$$
where \(\mathbf{L}_\mathrm{N} =\mu \mathbf{r} \times \mathbf{v}\) is the Newtonian orbital angular momentum. Because the radiation reaction torque is directed against the orbital angular momentum, it causes the orbit to decay.
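The torque expression can be verified directly from the radiation-reaction acceleration. A minimal numerical sketch (illustrative masses and separation, \(G=c=1\)):

```python
import numpy as np

def a_rr(r_vec, v_vec, m1, m2):
    """2.5PN radiation-reaction acceleration from the expression above (G=c=1)."""
    M = m1 + m2
    eta = m1 * m2 / M ** 2
    r = np.linalg.norm(r_vec)
    rhat = r_vec / r
    rdot = rhat @ v_vec
    v2 = v_vec @ v_vec
    pre = (8 / 5) * eta * M ** 2 / r ** 3
    return pre * (rdot * (18 * v2 + (2 / 3) * M / r - 25 * rdot ** 2) * rhat
                  - (6 * v2 - 2 * M / r - 15 * rdot ** 2) * v_vec)

# circular orbit: rdot = 0 and v^2 = M/r
m1, m2, r = 0.6, 0.4, 50.0
M = m1 + m2
mu, eta = m1 * m2 / M, m1 * m2 / M ** 2
r_vec = np.array([r, 0.0, 0.0])
v_vec = np.array([0.0, np.sqrt(M / r), 0.0])
torque = mu * np.cross(r_vec, a_rr(r_vec, v_vec, m1, m2))
L_N = mu * np.cross(r_vec, v_vec)
# expected torque, using v^8 = (M/r)^4 on a circular orbit
expected = -(32 / 5) * (eta / M) * (M / r) ** 4 * L_N
```

The computed torque is anti-parallel to \(\mathbf{L}_\mathrm{N}\) and matches the \(-(32/5)(\eta /M)v^8 \mathbf{L}_\mathrm{N}\) form quoted above.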
In addition to specifying the relative acceleration of the two masses, the PN equations of motion also describe the evolution of the spins of the two bodies, which evolve due to relativistic spin-orbit and spin-spin interactions. Up to 2 PN order the total angular momentum \(\mathbf{J} = \mathbf{L}+\mathbf{S}_1 + \mathbf{S}_2\) is conserved, and the spins and orbital angular momentum obey the precession equation
$$\begin{aligned}&\dot{\mathbf{L}}_\mathrm{N} = -{1 \over r^3} \biggl \{ \left[ \mathbf{L}_\mathrm{N }\times \left( {7 \over 2} \mathbf{S} + {3 \over 2}{\delta m \over M}{\varvec{\Delta }} \right) \right] + 3 ( \hat{r} \cdot \mathbf{S}_1)(\hat{r} \times \mathbf{S}_2) + 3( \hat{r} \cdot \mathbf{S}_2)( \hat{r} \times \mathbf{S}_1) \biggr \} , \nonumber \\&\dot{\mathbf{S}}_1 = {1 \over r^3} \biggl \{ (\mathbf{L}_\mathrm{N} \times \mathbf{S}_1)\left( 2+{3 \over 2} {m_2 \over m_1}\right) - \mathbf{S}_2 \times \mathbf{S}_1 + 3(\hat{r} \cdot \mathbf{S}_2) {{\hat{\mathbf{r}}} \times \mathbf{S}_1} \biggr \} , \nonumber \\&\dot{\mathbf{S}}_2 = {1 \over r^3} \biggl \{ (\mathbf{L}_\mathrm{N} \times \mathbf{S}_2)\left( 2+{3 \over 2} {m_1 \over m_2}\right) - \mathbf{S}_1 \times \mathbf{S}_2 + 3(\hat{r} \cdot \mathbf{S}_1) {{\hat{\mathbf{r}}} \times \mathbf{S}_2} \biggr \} . \end{aligned}$$
Figure 29 shows snapshots of how the individual spins \(\mathbf{S}_1\), \(\mathbf{S}_2\) and the orbital angular momentum \(\mathbf{L}\) evolve due to spin-orbit and spin-spin interactions at 2 PN order. At this order there is no dissipation and the magnitudes of the spins and the orbital angular momentum are constant. The total angular momentum \(\mathbf{J}\) remains constant at 2 PN order. When dissipation is included the magnitudes of \(\mathbf{L}\) and \(\mathbf{J}\) both decrease, but the orientation of the total angular momentum \(\hat{J}\) and the magnitudes of the spins remain constant to a high degree of accuracy. A complete analytic solution to the 2 PN order (non-dissipative) spin precession equations was only found quite recently. The motion can be described using Jacobi elliptic functions. The solution has since been extended to include 2.5 PN order dissipative effects.
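The stated conservation laws can be checked numerically from the precession equations: each \(\dot{\mathbf{S}}_i\) is a sum of cross products ending in \(\times \mathbf{S}_i\), so \(\mathbf{S}_i\cdot \dot{\mathbf{S}}_i = 0\), and \(\dot{\mathbf{L}}_\mathrm{N}+\dot{\mathbf{S}}_1+\dot{\mathbf{S}}_2 = 0\). A minimal sketch with random vectors (note the \(m_2/m_1\) ratio in the \(\dot{\mathbf{S}}_1\) equation, which is required for \(\dot{\mathbf{J}}=0\)):

```python
import numpy as np

def precession_rhs(rhat, L_N, S1, S2, m1, m2, r):
    """2PN precession right-hand sides (G=c=1). The mass ratio is m2/m1
    in the S1 equation and m1/m2 in the S2 equation."""
    M, dm = m1 + m2, m1 - m2
    S = S1 + S2
    Delta = M * (S2 / m2 - S1 / m1)
    dS1 = (np.cross(L_N, S1) * (2 + 1.5 * m2 / m1) - np.cross(S2, S1)
           + 3 * (rhat @ S2) * np.cross(rhat, S1)) / r ** 3
    dS2 = (np.cross(L_N, S2) * (2 + 1.5 * m1 / m2) - np.cross(S1, S2)
           + 3 * (rhat @ S1) * np.cross(rhat, S2)) / r ** 3
    dL = -(np.cross(L_N, 3.5 * S + 1.5 * (dm / M) * Delta)
           + 3 * (rhat @ S1) * np.cross(rhat, S2)
           + 3 * (rhat @ S2) * np.cross(rhat, S1)) / r ** 3
    return dL, dS1, dS2

rng = np.random.default_rng(0)
rhat = rng.normal(size=3)
rhat /= np.linalg.norm(rhat)
L_N, S1, S2 = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)
dL, dS1, dS2 = precession_rhs(rhat, L_N, S1, S2, 0.7, 0.3, 20.0)
```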
Fig. 29

Snapshots showing an example of how the individual spins \(\mathbf{S}_1\), \(\mathbf{S}_2\) and the orbital angular momentum \(\mathbf{L}\) evolve due to spin-orbit and spin-spin interactions at 2 PN order

6.2 Circular Newtonian Binary

It is instructive to consider the gravitational waves generated by the simplest binary system imaginable—a circular binary at Newtonian order. Writing the locations of the two masses as \(\mathbf{x}_1\) and \(\mathbf{x}_2\), the motion can be described using center-of-mass \(\mathbf{x}_\mathrm{COM} = (m_1 \mathbf{x}_1 + m_2 \mathbf{x}_2)/M\) and relative coordinates \(\mathbf{x} = \mathbf{x}_1 - \mathbf{x}_2 = r \, {\hat{r}}\). At leading Newtonian order \(\ddot{\mathbf{x}}_\mathrm{COM} = 0\) and \(\ddot{\mathbf{x}} = -(M/r^2) \hat{r}\). We can choose coordinates where the orbital motion is restricted to the xy plane and (Fig. 30)
$$\begin{aligned} \mathbf{x} = -r \sin (\omega t) \hat{x} + r \cos (\omega t) \hat{y} \end{aligned}$$
where the equations of motion demand \(\omega ^2 r^3 = M\). The individual masses follow the orbits
$$\begin{aligned} \mathbf{x}_1 = \frac{m_2}{M} \left( -r \sin (\omega t) \hat{x} + r \cos (\omega t) \hat{y}\right) \, , \quad \mathbf{x}_2 = \frac{m_1}{M} \left( r \sin (\omega t) \hat{x} - r \cos (\omega t) \hat{y}\right) \, . \end{aligned}$$
The mass quadrupole moment tensor is given by
$$\begin{aligned} Q^{ij} = \int d^3 x \, \rho (t,\mathbf{x}) \left( x^i x^j - \frac{1}{3} r^2 \delta ^{ij} \right) \end{aligned}$$
with mass density
$$\begin{aligned} \rho (t,\mathbf{x}) = m_1 \delta (\mathbf{x} - \mathbf{x}_1(t) ) + m_2 \delta (\mathbf{x} - \mathbf{x}_2(t) ) \, . \end{aligned}$$
The non-vanishing components of the mass quadrupole tensor are then given by
$$\begin{aligned}&Q^{xx} = \mu r^2 \left( \sin ^2(\omega t) -\frac{1}{3}\right) , \quad Q^{yy} = \mu r^2 \left( \cos ^2(\omega t) -\frac{1}{3}\right) , \nonumber \\&Q^{xy} = -\frac{1}{2}\mu r^2 \sin (2\omega t), \quad Q^{zz} = -\frac{1}{3} \mu r^2 . \end{aligned}$$
To compute the gravitational wave strain we need the second time derivatives of these quantities, which are given by
$$\begin{aligned}&\ddot{Q}^{xx} = \frac{ 2 \mu M}{r} \cos (2 \omega t), \quad \quad \ddot{Q}^{yy} = - \frac{ 2 \mu M}{r} \cos (2 \omega t)\nonumber \\&\ddot{Q}^{xy} = \frac{ 2 \mu M}{r} \sin (2\omega t), \quad \quad \ddot{Q}^{zz} = 0 . \end{aligned}$$
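The second derivatives are easy to cross-check by finite differencing the quadrupole of the two point masses directly; a minimal sketch with illustrative masses and separation (\(G=c=1\)):

```python
import numpy as np

# finite-difference check of Qddot^{xy} for the two point masses (G=c=1)
m1, m2, r = 1.4, 1.0, 100.0
M, mu = m1 + m2, m1 * m2 / (m1 + m2)
omega = np.sqrt(M / r ** 3)            # Kepler: omega^2 r^3 = M

def Q_xy(t):
    x = -r * np.sin(omega * t)         # relative orbit coordinates
    y = r * np.cos(omega * t)
    return mu * x * y                  # Q^{xy} for the delta-function density

t0, dt = 0.3 / omega, 1e-4 / omega
qddot = (Q_xy(t0 + dt) - 2 * Q_xy(t0) + Q_xy(t0 - dt)) / dt ** 2
analytic = 2 * mu * M / r * np.sin(2 * omega * t0)
```

The finite-difference value agrees with \((2\mu M/r)\sin (2\omega t)\) to high accuracy.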
Fig. 30

The coordinate systems we are using to describe a circular, Newtonian binary. Panel a is for a binary where the orbital angular momentum is aligned with the z axis of the coordinate system, while panel b is for the case where the orbit has a general orientation

Using the coordinate system defined in panel (a) of Fig. 30 and recalling Eq. (105), we find that the TT waveform \(h_{ij}^\mathrm{TT} =\frac{2}{R} \, P_{ijkl} \ddot{Q}^{kl}(t-R)\) has non-vanishing components given by
$$\begin{aligned}&h_{uu} = - h_{vv} = h_+ = \frac{ 2 \mu M}{r R} (1+\cos ^2\theta ) \cos (2 \omega t + 2\phi ),\nonumber \\&h_{uv} = h_{vu} = h_\times = \frac{ 4 \mu M}{r R} \cos \theta \sin (2 \omega t + 2\phi ) \, . \end{aligned}$$
For an orbit with a general orientation, as shown in panel (b) of Fig. 30 the waveform is given by
$$\begin{aligned}&h_{uu} = - h_{vv} =\cos (2\psi )h_+ + \sin (2\psi ) h_\times ,\nonumber \\&h_{uv} = h_{vu} = -\sin (2\psi )h_+ + \cos (2\psi ) h_\times \, , \end{aligned}$$
$$\begin{aligned}&h_+ = \frac{ 2 \mu M}{r R} (1+\cos ^2\iota ) \cos (2 \omega t),\nonumber \\&h_\times = \frac{ 4 \mu M}{r R} \cos \iota \sin (2 \omega t) \, . \end{aligned}$$
where \(\iota \) is the orbital inclination with respect to the line of sight to the binary and \(\psi \) is the polarization angle, which are given by
$$\begin{aligned}&\cos \iota = \hat{n}\cdot \hat{L}= -\hat{k}\cdot \hat{L},\nonumber \\&\tan \psi = \frac{\hat{v}\cdot (\hat{n}\times \hat{L})}{\hat{u}\cdot (\hat{n}\times \hat{L})} \, . \end{aligned}$$
Using the Kepler condition \(\omega ^2 r^3 = M\) we can re-express the gravitational wave amplitude in terms of the orbital angular frequency:
$$\begin{aligned}&h_+ = \frac{ 2 \mathcal{M}^{5/3} \omega ^{2/3} }{R} (1+\cos ^2\iota ) \cos (2 \omega t),\nonumber \\&h_\times = \frac{ 4 \mathcal{M}^{5/3} \omega ^{2/3}}{R} \cos \iota \sin (2 \omega t) \, . \end{aligned}$$
where \(\mathcal{M} = (m_1 m_2)^{3/5}/M^{1/5}\) is the chirp mass. Using Eq. (88) we can compute the energy radiated per unit solid angle averaged over a wave cycle:
$$\begin{aligned} \left( \frac{d P}{d \varOmega } \right) _\mathrm{quad} = \frac{R^2}{16 \pi } \langle \dot{h}_+^2 + \dot{h}_\times ^2 \rangle = \frac{2 \mu ^2 r^4 \omega ^6}{\pi } \left[ \left( \frac{1 + \cos ^2 \iota }{2}\right) ^2 + \cos ^2\iota \right] \,. \end{aligned}$$
Note that the emission depends on the inclination \(\iota \), but not on the polarization angle \(\psi \). Integrating over the full sky we find that the total energy flux is equal to
$$\begin{aligned} P_\mathrm{quad} = \frac{32 \mu ^2 r^4 \omega ^6}{5} = \frac{32}{5} (\mathcal{M} \omega )^{10/3} = \frac{32}{5} \eta ^2 v^{10}\,. \end{aligned}$$
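The angular integral can be checked numerically: integrating the power pattern over the sphere recovers the \(32/5\) coefficient. A minimal sketch (the parameter values are arbitrary, since they cancel in the ratio):

```python
import numpy as np

# integrate the quadrupole power pattern over the sky and compare with the
# closed-form total P = (32/5) mu^2 r^4 omega^6 (G=c=1)
mu, r, omega = 0.25, 10.0, 0.03
iota = np.linspace(0.0, np.pi, 20001)
dP = (2 * mu ** 2 * r ** 4 * omega ** 6 / np.pi
      * (((1 + np.cos(iota) ** 2) / 2) ** 2 + np.cos(iota) ** 2))
# the pattern is axisymmetric, so dOmega = 2 pi sin(iota) d iota;
# trapezoid rule written out by hand
integrand = dP * 2 * np.pi * np.sin(iota)
h_step = iota[1] - iota[0]
P_numeric = (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])) * h_step
P_exact = (32 / 5) * mu ** 2 * r ** 4 * omega ** 6
```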
We can now use an energy balance argument to compute the back-reaction on the orbit. The orbital energy in the Newtonian limit is given by
$$\begin{aligned} E = \frac{1}{2} \mu v^2 - \frac{\mu M}{r} = - \frac{\mu M}{2 r} = -\frac{1}{2} \mathcal{M}^{5/3} \omega ^{2/3} \,. \end{aligned}$$
Setting \(dE/dt = -P_\mathrm{quad}\) yields the balance equation
$$\begin{aligned} \dot{\omega }= \frac{96}{5} \mathcal{M}^{5/3} \omega ^{11/3} \,. \end{aligned}$$
Integrating the balance equation we find that the orbital frequency will grow with time:
$$\begin{aligned} \omega (t) = \frac{1}{\mathcal{M}} \left( \frac{5 \mathcal{M}}{256(t_c-t)}\right) ^{3/8} \, . \end{aligned}$$
Here \(t_c\) is a reference time where the orbital frequency formally becomes infinite. In reality the black holes or neutron stars will merge before this point, and the divergence in the frequency indicates a breakdown of the PN description. Note that the amplitude of the signal is proportional to \(\omega ^{2/3}\), so both the frequency and amplitude increase with time. Signals that increase in volume and pitch are called chirps, and since the combination of masses \(\mathcal{M}\) sets the timescale for the evolution it is called the chirp mass. Higher order terms in the PN expansion modify the expression for the chirp (164), but even the leading order expression does a very good job of modeling the time evolution of a neutron star merger, as can be seen in the comparison between the binary neutron star signal GW170817 seen in the LIGO Livingston detector and the leading order prediction for the frequency evolution with time shown in Fig. 31.
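Inverting the frequency evolution gives the time remaining until coalescence, \(t_c - t = 5\mathcal{M}/[256(\mathcal{M}\omega )^{8/3}]\). A minimal sketch (the function name and input values are illustrative; the chirp mass is converted to seconds via \(G\mathcal{M}/c^3\)):

```python
import numpy as np

G, c, Msun = 6.674e-11, 2.998e8, 1.989e30

def time_to_merger(mc_solar, f_gw):
    """Leading-order time to coalescence in seconds for a chirp mass of
    mc_solar solar masses, observed at gravitational wave frequency
    f_gw = omega/pi in Hz, from inverting omega(t)."""
    mc = mc_solar * Msun * G / c ** 3   # chirp mass in seconds
    omega = np.pi * f_gw                # orbital angular frequency
    return 5 * mc / (256 * (mc * omega) ** (8 / 3))

tau = time_to_merger(1.19, 30.0)        # GW170817-like chirp mass at 30 Hz
```

For a GW170817-like chirp mass the leading-order sweep from 30 Hz to merger lasts roughly a minute, consistent with the long chirp visible in Fig. 31.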
Fig. 31

A whitened spectrogram of the LIGO Livingston data showing the neutron star merger GW170817. The yellow line visible in the upper panel shows the frequency evolution of the observed signal, while the dashed white line overlaying the signal in the lower panel was generated using Eq. (164) with chirp mass \(\mathcal{M} = 1.19 M_\odot \)

Fig. 32

An example of a leading PN order chirp waveform

The orbital frequency evolution (164) can be integrated with respect to time to give the orbital phase evolution:
$$\begin{aligned} \varPhi (t) = \int \omega (t) dt = \varPhi _c - \frac{1}{32} \left( \frac{256(t_c-t)}{5 \mathcal{M}}\right) ^{5/8} \, , \end{aligned}$$
and hence the full chirp signal
$$\begin{aligned} h(t) = \frac{4 \mathcal{M}}{R} \left( \frac{5 \mathcal{M}}{256(t_c-t)}\right) ^{1/4} \cos 2 \varPhi (t) \, . \end{aligned}$$
An example of a chirp waveform is shown in Fig. 32.
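A waveform like the one in Fig. 32 is straightforward to generate from the expressions above. A minimal sketch, with the overall distance prefactor \(4\mathcal{M}/R\) scaled out and all times in seconds (\(G=c=1\)):

```python
import numpy as np

def chirp(t, t_c, mc, phi_c=0.0):
    """Leading-order chirp with the 4*Mc/R prefactor set to one: the
    amplitude grows as (t_c - t)^(-1/4) and the phase follows Phi(t)."""
    tau = t_c - t
    Phi = phi_c - (1 / 32) * (256 * tau / (5 * mc)) ** (5 / 8)
    return (5 * mc / (256 * tau)) ** 0.25 * np.cos(2 * Phi)

mc = 5.86e-6                        # ~1.19 solar-mass chirp mass, in seconds
t = np.linspace(0.0, 50.0, 200000)  # stop 5 s short of coalescence at t_c
h = chirp(t, t_c=55.0, mc=mc)
```

Both the amplitude envelope and the oscillation frequency visibly grow toward \(t_c\), which is the defining feature of the chirp.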

6.3 Stationary Phase Approximation

PN waveforms are computed in the time domain, but in many cases gravitational wave data analysis is carried out in the frequency domain. The time domain waveforms can be sampled and transformed to the frequency domain via a Fast Fourier Transform (FFT), but it is more efficient to analytically transform the waveforms using the stationary phase approximation (SPA). The SPA works very well for a wide range of PN waveforms.
Fig. 33

A graphical illustration of why the stationary phase approximation works. Near the stationary point \(t_*\) the integrand is slowly varying and gives a non-zero contribution. Away from the stationary point the integrand oscillates rapidly and averages to zero

The Fourier transform of a signal is defined:
$$\begin{aligned} \tilde{h}(f) = \int _{-\infty }^{\infty } h(t) e^{2\pi i ft} \, dt = \int _{-\infty }^{\infty } A(t) e^{-i \varphi (t)} e^{2\pi i ft} \, dt \, , \end{aligned}$$
where we have written h(t) in terms of the time dependent amplitude A(t) and gravitational wave phase \(\varphi (t)\). The SPA is computed at the stationary time \(t_*\) where \(\dot{\varphi }(t_*) = 2 \pi f\). Taylor expanding the phase about this point we have
$$\begin{aligned} \varphi (t) = \varphi (t_*) + 2 \pi f (t-t_*) + \frac{1}{2} \ddot{\varphi }(t_*) (t-t_*)^2 + \cdots \end{aligned}$$
The amplitude is assumed to be slowly varying and can be treated as constant near the stationary point (Fig. 33). The Fourier integral becomes
$$\begin{aligned} \tilde{h}(f)\simeq & {} A(t_*) e^{i(2\pi f t_*- \varphi (t_*))} \int _{-\infty }^{\infty } e^{-\frac{i}{2} \ddot{\varphi }(t_*) (t-t_*)^2} \, dt \nonumber \\= & {} A(t_*) e^{i(2\pi f t_*- \varphi (t_*) - \pi /4)} \left( \frac{2\pi }{|\ddot{\varphi }(t_*)|}\right) ^{1/2} \, . \end{aligned}$$
The SPA breaks down if A(t) varies too rapidly, or if \(\ddot{\varphi }(t_*)\) vanishes at the stationary point. The latter can occur for systems with spin precession. Both conditions are violated for black hole ringdowns. Applying the SPA to the leading order PN waveform (165) yields
$$\begin{aligned} \tilde{h}(f) \simeq \left( \frac{5}{6}\right) ^{1/2} \frac{\mathcal{M}^{5/6}}{R\pi ^{2/3}} \, f^{-7/6} \, e^{i(2\pi f t_c -\phi _c -\pi /4 +3/4 (8 \pi \mathcal{M} f)^{-5/3})} \, . \end{aligned}$$
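The accuracy of the SPA can be tested directly against an FFT of the time-domain chirp. A sketch, under several simplifying assumptions: the distance prefactor is set to one, the ends of the data are cosine-tapered to suppress spectral leakage, and the comparison is made at a single frequency well inside the swept band:

```python
import numpy as np

# FFT a leading-order chirp and compare with the one-sided SPA magnitude
# |h~(f)| = (A/2) sqrt(2 pi / |phi_ddot|), since h(t) is a real cosine
mc = 1.38e-4                        # ~28 solar-mass chirp mass, in seconds
t_c, dt = 5.0, 1.0 / 4096
t = np.arange(0.0, t_c - 3e-3, dt)  # stop just short of coalescence
tau = t_c - t
h = (5 * mc / (256 * tau)) ** 0.25 * np.cos(-(1 / 16) * (256 * tau / (5 * mc)) ** (5 / 8))
w = np.ones_like(t)                 # cosine tapers at each end
n0, n1 = int(1.0 / dt), int(0.01 / dt)
w[:n0] = 0.5 * (1 - np.cos(np.pi * np.arange(n0) / n0))
w[-n1:] = 0.5 * (1 + np.cos(np.pi * np.arange(n1) / n1))
htilde = np.abs(np.fft.rfft(h * w)) * dt
freqs = np.fft.rfftfreq(len(h), dt)

f0 = 40.0                           # compare at f = 40 Hz
omega = np.pi * f0                  # stationary orbital frequency
tau_star = 5 * mc / (256 * (mc * omega) ** (8 / 3))
A_star = (5 * mc / (256 * tau_star)) ** 0.25
phidd = 2 * (96 / 5) * mc ** (5 / 3) * omega ** (11 / 3)
spa = 0.5 * A_star * np.sqrt(2 * np.pi / phidd)
fft_val = htilde[np.argmin(np.abs(freqs - f0))]
```

For this slowly evolving chirp the two magnitudes agree to within a few percent.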

6.4 Eccentric Newtonian Binary

Orbital eccentricity introduces several interesting effects in the waveforms. At Newtonian order the orbital motion is given by the Kepler solution for the orbital phase \(\phi \) and radial separation r:
$$\begin{aligned}&\phi = \phi _0 + 2\, \mathrm{arctan}\left( \left( \frac{1+e}{1-e}\right) ^{1/2} \tan \frac{u}{2} \right) \nonumber \\&r = a(1-e\cos u) \nonumber \\&\omega t = u - e\sin u \,. \end{aligned}$$
The orbital angular frequency \(\omega \) and the semi-major axis a are related by Kepler's third law: \(\omega ^2 a^3 = M\). The orbital eccentricity can be expressed in terms of the energy and angular momentum as
$$\begin{aligned} e^2 = 1 + \frac{2 E L^2}{ M^2 \mu ^3} \, . \end{aligned}$$
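Kepler's equation \(\omega t = u - e\sin u\) is transcendental, but it is easily solved by Newton iteration. A minimal sketch in Python (the function names are my own, and the true anomaly is computed with the branch-safe atan2 form of the phase relation):

```python
import math

def eccentric_anomaly(mean_anomaly, e, tol=1e-12):
    """Solve Kepler's equation  M = u - e sin u  for u by Newton iteration."""
    u = mean_anomaly if e < 0.8 else math.pi   # common starting guesses
    for _ in range(100):
        du = (u - e * math.sin(u) - mean_anomaly) / (1.0 - e * math.cos(u))
        u -= du
        if abs(du) < tol:
            break
    return u

def orbit(t, omega, a, e, phi0=0.0):
    """Newtonian orbital phase and radial separation at time t (geometric units)."""
    u = eccentric_anomaly(omega * t, e)
    # true anomaly from the eccentric anomaly: tan(v) = sqrt(1-e^2) sin u / (cos u - e)
    phi = phi0 + math.atan2(math.sqrt(1 - e * e) * math.sin(u), math.cos(u) - e)
    r = a * (1 - e * math.cos(u))
    return phi, r

print(orbit(0.3, 2.0, 1.0, 0.5))   # (phase, separation) early in an e=0.5 orbit
```

The separation always stays within \(a(1-e) \le r \le a(1+e)\), which is a useful sanity check on the solver.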
The gravitational waveforms can be computed using the same steps as for the circular case, though the algebra gets considerably more involved. The final result for the plus and cross polarization states in the TT gauge shows that the waveforms now depend on the first, second and third harmonics of the orbital phase \(\phi \). But the harmonic structure is even richer than this, as the orbital phase \(\phi \) does not evolve linearly with time. Indeed, each harmonic of the orbital phase introduces an infinite collection of harmonics of the orbital frequency via the relation
$$\begin{aligned} \cos \phi = -e + \frac{2(1-e^2)}{e}\sum _{k=1}^{\infty } J_k(ke) \cos (k \omega t) \, , \end{aligned}$$
where \(J_k\) are Bessel functions of the first kind. In the limit \(e\rightarrow 0\) only the \(k=1\) term survives. Conversely, in the limit \(e\rightarrow 1\) it no longer makes sense to describe the waveform in terms of individual harmonics, and we find instead that the signal is better described in terms of discrete bursts of radiation at periapse. Figure 34 shows the plus and cross polarizations for systems with \(e=0.5\) and \(e=0.9\) viewed at an inclination angle of \(\iota =60^\circ \).
Fig. 34

Plus and cross waveforms for eccentric binaries at Newtonian order. The panel on the left has \(e=0.5\) while the panel on the right has \(e=0.9\). Systems with large eccentricities produce a burst of radiation at periapse
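The Bessel expansion can be verified numerically by comparing it with \(\cos \phi \) computed directly from the Kepler solution, using the closed-form relation \(\cos \phi = (\cos u - e)/(1-e\cos u)\). A sketch in pure Python, with \(J_k\) evaluated from its standard integral representation (all helper names are my own):

```python
import math

def bessel_j(k, x, n=2000):
    # J_k(x) via the integral representation (1/pi) int_0^pi cos(k th - x sin th) dth,
    # evaluated with the trapezoid rule
    h = math.pi / n
    s = 0.5 * (1.0 + math.cos(k * math.pi - x * math.sin(math.pi)))
    for i in range(1, n):
        th = i * h
        s += math.cos(k * th - x * math.sin(th))
    return s * h / math.pi

def cos_phase_kepler(M, e):
    # solve u - e sin u = M by Newton iteration, then cos(phi) = (cos u - e)/(1 - e cos u)
    u = M
    for _ in range(60):
        u -= (u - e * math.sin(u) - M) / (1 - e * math.cos(u))
    return (math.cos(u) - e) / (1 - e * math.cos(u))

def cos_phase_series(M, e, kmax=40):
    # truncated Bessel-harmonic expansion of cos(phi) in the mean anomaly M = omega t
    s = -e
    for k in range(1, kmax + 1):
        s += (2 * (1 - e * e) / e) * bessel_j(k, k * e) * math.cos(k * M)
    return s

e, M = 0.3, 0.7   # illustrative eccentricity and mean anomaly
print(cos_phase_kepler(M, e), cos_phase_series(M, e))   # nearly identical
```

The series converges geometrically for \(e<1\), but ever more slowly as \(e\rightarrow 1\), which is the quantitative version of the statement that highly eccentric signals are better described as bursts.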

The emission of gravitational waves takes away energy and angular momentum from the system. The energy and angular momentum loss can be computed using Eqs. (89) and (91):
$$\begin{aligned}&\frac{dE}{dt} = -\frac{32 \mu ^2 M^3}{5 a^5} \frac{1}{(1-e^2)^{7/2}} \left( 1 + \frac{73}{24} e^2 + \frac{37}{96} e^4 \right) \nonumber \\&\frac{dL}{dt} = -\frac{32 \mu ^2 M^{5/2}}{5 a^{7/2}} \frac{1}{(1-e^2)^{2}} \left( 1 + \frac{7}{8} e^2 \right) \, . \end{aligned}$$
Combining these equations with \(E=-M\mu /(2a)\), \(L^2 = \mu ^2 M a(1-e^2)\) and \(\omega ^2 a^3 = M\) yields adiabatic evolution equations for the orbital frequency and the eccentricity:
$$\begin{aligned}&\frac{d\omega }{dt} = \frac{96 \mathcal{M}^{5/3} \omega ^{11/3}}{5} \frac{1}{(1-e^2)^{7/2}} \left( 1 + \frac{73}{24} e^2 + \frac{37}{96} e^4 \right) \nonumber \\&\frac{de}{dt} = -\frac{304 \mathcal{M}^{5/3} \omega ^{8/3}}{15} \frac{e}{(1-e^2)^{5/2}} \left( 1 + \frac{121}{304} e^2 \right) \, . \end{aligned}$$
If we drop terms of order \(e^2\) and higher, the system can be recast as \(d\ln e /d\ln \omega \approx -19/18\), which tells us that systems lose roughly a decade in eccentricity for every decade in frequency. Typical binary systems are thought to form with low velocities and low orbital frequencies. Even if the binaries are initially very eccentric, the emission of gravitational waves will circularize the system before merger. There are, however, some exotic formation scenarios involving three body effects or gas discs that can result in systems entering the sensitive band of a gravitational wave detector with significant eccentricity, so it is important to include eccentricity in the waveform models so as to be able to detect systems from these alternative formation channels. Figure 35 shows the plus and cross waveforms when radiation reaction is included. The evolution of the eccentricity and orbital frequency is also shown.
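The coupled evolution equations are straightforward to integrate numerically. Since the chirp mass only sets the overall timescale, it is convenient to eliminate time and evolve \(e\) directly in \(x=\ln \omega \), using \(de/dx = -\tfrac{19}{18}\, e\, (1-e^2)(1+\tfrac{121}{304}e^2)/(1+\tfrac{73}{24}e^2+\tfrac{37}{96}e^4)\), which follows from the ratio of the two rates. A sketch using classical RK4 (function names are my own):

```python
import math

def dlne_dlnw(e):
    # d(ln e)/d(ln omega): the chirp mass cancels in the ratio (de/dt)/(domega/dt)
    return -(19.0 / 18.0) * (1 - e * e) * (1 + 121.0 / 304.0 * e * e) / (
        1 + 73.0 / 24.0 * e * e + 37.0 / 96.0 * e ** 4)

def evolve(e0, w0, w1, n=20000):
    """RK4 integration of e as a function of x = ln(omega) from w0 to w1."""
    h = (math.log(w1) - math.log(w0)) / n
    e = e0
    f = lambda ee: ee * dlne_dlnw(ee)   # autonomous ODE: de/dx depends only on e
    for _ in range(n):
        k1 = f(e)
        k2 = f(e + 0.5 * h * k1)
        k3 = f(e + 0.5 * h * k2)
        k4 = f(e + h * k3)
        e += h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return e

# eccentricity decays as the binary chirps upward in frequency
print(evolve(0.9, 1.0, 100.0))      # two decades in omega: e drops far below 0.9
# small-e check of the d(ln e)/d(ln omega) ~ -19/18 scaling
print(math.log10(0.01 / evolve(0.01, 1.0, 10.0)))   # close to 19/18 ~ 1.056
```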
Fig. 35

The panel on the left shows the leading order plus and cross waveforms for an eccentric binary. The panel on the right shows the evolution of the orbital frequency and eccentricity for the same system

Going beyond the leading order Newtonian description of the orbital motion introduces qualitatively new effects, the most important being periapse precession, which enters at 1-PN order.

6.5 Spinning Binaries

The spin of the component masses can have a significant impact on the gravitational wave signal. Spin-orbit effects first enter at 1.5 PN order, and spin-spin effects enter at 2-PN order. Spin impacts the waveforms in two distinct ways. First, the spins affect the orbital dynamics and gravitational wave emission, and hence the phasing of the waves. The phasing is modified regardless of the relative orientation of the spins and the orbital angular momentum. Second, if the spins and the orbital angular momentum are not aligned (or anti-aligned), the spins and the orbital angular momentum will precess according to Eqs. (148). The precession of the orbital plane causes the inclination angle and polarization angle to vary with time according to Eq. (158), resulting in a modulation of the amplitude of the signal. For spin-precessing systems the orbital phase also acquires oscillatory modulations, leading to oscillations in the frequency. Thus information about the spins is encoded in the amplitude modulation (AM) and frequency modulation (FM) of the signal. Note that these modulations lead to a breakdown of the stationary phase approximation, and more sophisticated methods have to be used to compute the Fourier transforms of the signals.
Fig. 36

Plus and cross waveforms at 2.5 PN order for a quasi-circular binary showing the effects of spin precession. The second set of panels zooms in on the signal close to merger. In this example the amplitude modulation is especially pronounced in the cross polarization

Figure 36 shows the plus and cross waveforms for a spinning binary computed at 2.5-PN order. The system shown had \(m_1= 20 M_\odot \), \(m_2= 15 M_\odot \), \(e=0\) and spin magnitudes \(\chi _1 = |\mathbf{S}_1|/m_1^2=0.7\) and \(\chi _2 = |\mathbf{S}_2|/m_2^2=0.5\). The spins were mis-aligned with the orbital angular momentum such that the configuration at \(t=0\) had \(\mathrm{acos}(\hat{L} \cdot \hat{S}_1) = 85^\circ \), \(\mathrm{acos}(\hat{L} \cdot \hat{S}_2) = 82^\circ \) and \(\mathrm{acos}(\hat{S}_1 \cdot \hat{S}_2) = 110^\circ \).

7 Science Data Analysis

The output from a gravitational wave detector is a time series d(t). Usually we have multiple detectors and hence multiple time series. The data can be aggregated into a vector \(\mathbf{d}\), with components that are labeled by detector name and a time stamp. Fundamentally, gravitational wave data analysis is time-series analysis, and many of the standard tools of time series analysis get applied, such as band-pass filters, windows, FFTs, spectral estimators, wavelet transforms etc. In these lectures I will gloss over these low level (yet essential) data processing steps and focus on the higher level aspects of the analysis.

The literature on gravitational wave data analysis can be befuddling. There is talk of “matched filtering”, detection statistics, false alarm rates, time-slides and parameter estimation. I will get to all of those topics, but I will start with a much simpler description in terms of Bayesian inference [23, 51], where everything we need to know is summarized in a single function, the posterior distribution. Bayesian inference requires just two ingredients: the likelihood function, which turns out to be the noise model, and a prior, which turns out to be the signal model. Once those are defined the rest of the process is mechanical.
Fig. 37

The basic principle behind all gravitational wave data analysis is that the residual (data minus signal) should be consistent with noise

The data will have contributions from detector noise \(\mathbf{n}\) and gravitational wave signals \(\mathfrak {h} \). The response of the detectors we will be considering is linear, so we may write
$$\begin{aligned} \mathbf{d} = \mathbf{n}+ \mathfrak {h} \, . \end{aligned}$$
Given \(\mathbf{d}\) we would like to infer \(\mathfrak {h}\). To do so we need models for the instrument noise and for the gravitational wave signals. For now I will assume that the instrument noise is stochastic, and can be described by some probability distribution \(p(\mathbf{n})\). The gravitational wave signal model \(\mathbf{h}\) may be deterministic or stochastic, and might be based on solutions to the Einstein field equations or some more generic model, such as a collection of wavelets. In all cases we demand that the residual \(\mathbf{r} = \mathbf{d} - \mathbf{h}\), given by the data minus the model, must be consistent with noise: \(\mathbf{r} \sim p(\mathbf{n})\). In other words, the noise model defines the likelihood function. This basic principle is illustrated in Fig. 37.
To give a concrete example, consider a Gaussian noise model. The likelihood function is then
$$\begin{aligned} p(\mathbf{d} | \mathbf{h}, \mathcal{H}) = \frac{1}{(2\pi \, \mathrm{det}{} \mathbf{C})^{1/2}} \, e^{- \frac{1}{2}(\mathbf{d} - \mathbf{h}) \cdot \mathbf{C}^{-1} \cdot (\mathbf{d} - \mathbf{h})}\, , \end{aligned}$$
where \(\mathbf{C}\) is the noise correlation matrix. (Recall the way to read notation such as \(p(\mathbf{d}|\mathbf{h}, \mathcal{H})\) is: “the probability of observing the data \(\mathbf{d}\) given the presence of a gravitational wave signal \(\mathbf{h}\) under model \(\mathcal{H}\)”). The exponent in the likelihood is proportional to the chi-squared of the model, and is given by a double sum over detectors I and data samples k:
$$\begin{aligned} \chi ^2(\mathbf{h}) = (\mathbf{d} - \mathbf{h}) \cdot \mathbf{C}^{-1} \cdot (\mathbf{d} - \mathbf{h}) = (d_{Ik} - h_{Ik}) C^{-1}_{(I k)(J m)} (d_{Jm} - h_{Jm}) \, . \end{aligned}$$
In most cases the noise is uncorrelated between detectors: \(C_{(Ik)(Jm)} = \delta _{IJ} S_{Ikm}\), with \(S_{Ikm}\) the noise correlation matrix for detector I. An important point is that the noise matrix is itself an a priori unknown quantity that has to be inferred from the data. If the noise is stationary the correlations only depend on the time lag between samples, and the correlation matrix can be diagonalized by transforming to the frequency domain where
$$\begin{aligned} S_{Ikm} = \delta _{km} \, S_I(f_k) \, . \end{aligned}$$
The noise modeling is then reduced to inferring the power spectrum \(S_I(f_k)\) for each detector. In some applications, such as pulsar timing, where the data is unevenly sampled in time and the noise is highly non-stationary, the likelihood has to be computed directly in the time domain, and special techniques have to be used to tame the computational cost associated with inverting the large noise correlation matrices.
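In the stationary, diagonalized case the Gaussian likelihood factorizes over independent bins. The toy sketch below (Python; the flat unit spectrum and sinusoidal "signal" are illustrative stand-ins, and the bins are schematically treated as independent real Gaussian variables, glossing over frequency-domain normalization conventions) evaluates the resulting diagonal Gaussian log-likelihood and confirms that the true template is favoured over the no-signal model:

```python
import math, random

def log_likelihood(d, h, S):
    """Schematic diagonal Gaussian log-likelihood: data d, template h,
    per-bin noise variance S (playing the role of the PSD S_I(f_k))."""
    r = [dk - hk for dk, hk in zip(d, h)]   # residual = data minus model
    return -0.5 * sum(rk * rk / Sk + math.log(2 * math.pi * Sk)
                      for rk, Sk in zip(r, S))

random.seed(1)
S = [1.0] * 256                                      # flat (white) spectrum
h = [math.sin(0.2 * k) for k in range(256)]          # toy "signal"
d = [hk + random.gauss(0.0, 1.0) for hk in h]        # signal + unit-variance noise

# The residual d - h is consistent with noise, so the true template gives a
# higher likelihood than no template at all
print(log_likelihood(d, h, S), log_likelihood(d, [0.0] * 256, S))
```

Inferring the per-bin variances S along with the signal, rather than fixing them, is exactly the noise-spectrum estimation step described above.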
To find the posterior distribution for the signal model \(p( \mathbf{h}, \mathcal{H}| \mathbf{d})\) we apply Bayes theorem:
$$\begin{aligned} p( \mathbf{h}, \mathcal{H}| \mathbf{d}) = \frac{p(\mathbf{d} | \mathbf{h}, \mathcal{H}) p(\mathbf{h}, \mathcal{H})}{p(\mathbf{d}, \mathcal{H})} \, , \end{aligned}$$
where \(p(\mathbf{h}, \mathcal{H})\) is the prior that defines our signal model and \(p(\mathbf{d}, \mathcal{H})\) is the normalization factor
$$\begin{aligned} p(\mathbf{d}, \mathcal{H}) = \int p(\mathbf{d} | \mathbf{h}, \mathcal{H}) p(\mathbf{h} ,\mathcal{H}) d\mathbf{h} \, . \end{aligned}$$
The normalization factor is variously known as the marginal likelihood or the model evidence. Figure 38 shows examples of waveform posteriors computed by the BayesWave [14] algorithm for a collection of LIGO/Virgo events.
In many instances we are less interested in the waveforms themselves and more interested in the parameters that define the signals. In Bayesian inference we marginalize over (integrate out) quantities we are not interested in. For example, suppose that we have a model for the signals \(p(\mathbf{h}, \varvec{\theta }, \mathcal{H})\) that is described by parameters \(\varvec{\theta }\). We can marginalize over \(\mathbf{h}\) to arrive at a new (marginal) likelihood that only involves the model parameters:
$$\begin{aligned} p(\mathbf{d}| \varvec{\theta }, \mathcal{H}) = \int p(\mathbf{d} | \mathbf{h}, \varvec{\theta }, \mathcal{H}) p(\mathbf{h} , \varvec{\theta }, \mathcal{H}) d\mathbf{h} \, . \end{aligned}$$
The posterior distribution for the model parameters follows from Bayes’ theorem:
$$\begin{aligned} p( \varvec{\theta } \vert \mathbf{d}, \mathcal{H}) = \frac{ p(\mathbf{d} \vert \mathcal{H}, \varvec{\theta }) p(\varvec{\theta } \vert \mathcal{H}) }{p(\mathbf{d} \vert \mathcal{H})} \, , \end{aligned}$$
where
$$\begin{aligned} p(\mathbf{d} \vert \mathcal{H}) = \int p(\mathbf{d} \vert \mathcal{H}, \varvec{\theta }) p(\varvec{\theta } \vert \mathcal{H}) d\varvec{\theta } \end{aligned}$$
is the model evidence and \(p(\varvec{\theta } \vert \mathcal{H})\) is the prior on the signal parameters.
Fig. 38

Waveform posteriors \(p( \mathbf{h}, \mathcal{H}| \mathbf{d})\) showing the 90% credible regions of the waveform reconstructions by the BayesWave algorithm for a collection of LIGO/Virgo detections

Examples of signal models that are used in gravitational wave analyses include theoretical waveform templates for binary mergers, wavelet based models for generic bursts and probabilistic models for stochastic signals. Both the template based models and the wavelet based models map the gravitational wave signal to a parameterized function \(\mathbf{h}(\theta )\):
$$\begin{aligned} p(\mathbf{h} \vert \mathcal{H}, \varvec{\theta }) = \delta (\mathbf{h} - \mathbf{h}(\varvec{\theta })) \, . \end{aligned}$$
For Gaussian detector noise the marginal likelihood is then given by
$$\begin{aligned} p(\mathbf{d}| \varvec{\theta }, \mathcal{H}) = \frac{1}{(2\pi \, \mathrm{det}{} \mathbf{C})^{1/2}} \, e^{- \frac{1}{2}(\mathbf{d} - \mathbf{h}(\varvec{\theta })) \cdot \mathbf{C}^{-1} \cdot (\mathbf{d} - \mathbf{h}(\varvec{\theta }))}\, . \end{aligned}$$
If the noise is stationary and uncorrelated between detectors, the \(\chi ^2\) term in the likelihood can be written as
$$\begin{aligned} \chi ^2(\varvec{\theta }) = (\mathbf{d} - \mathbf{h}(\varvec{\theta }) \, | \, \mathbf{d} - \mathbf{h}(\varvec{\theta })) \, , \end{aligned}$$
where we have introduced the noise weighted inner product
$$\begin{aligned} (\mathbf{a} | \mathbf{b} ) = \sum _I \int _0^\infty \frac{ 2( \tilde{a}_I(f) \tilde{b}_I^*(f) + \tilde{a}_I^*(f) \tilde{b}_I(f))}{S_I(f)} \, df \, . \end{aligned}$$
Here the inner product has been written in its conventional form in terms of an integral over frequency. In practice the data will always have a finite duration \(T_\mathrm{obs}\) and a finite sampling interval dt, and the integral gets replaced by a sum over frequencies \(f_k = k/T_\mathrm{obs}\) from \(k=0\) to \(k=T_\mathrm{obs}/(2 dt)\).
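A discrete version of this inner product is only a few lines of code. The sketch below (Python; names are my own) takes one-sided Fourier amplitudes sampled at \(f_k = k/T_\mathrm{obs}\), applies the replacement \(df \rightarrow 1/T_\mathrm{obs}\), and uses \(2(\tilde{a}\tilde{b}^* + \tilde{a}^*\tilde{b}) = 4\,\mathrm{Re}(\tilde{a}\tilde{b}^*)\):

```python
def inner_product(a_f, b_f, S, T_obs):
    """Discrete noise-weighted inner product for a single detector:
    (a|b) = 4 Re sum_k a~_k conj(b~_k) / S(f_k) * df, with df = 1/T_obs.
    a_f, b_f are complex Fourier amplitudes at the frequencies f_k."""
    df = 1.0 / T_obs
    return 4.0 * df * sum((ak * bk.conjugate()).real / Sk
                          for ak, bk, Sk in zip(a_f, b_f, S))

# toy example: SNR^2 = (h|h) for a flat spectrum and constant amplitudes
S = [1.0] * 10
h_f = [1 + 1j] * 10
print(inner_product(h_f, h_f, S, T_obs=10.0))   # -> 8.0
```

The multi-detector inner product in the text is the sum of these single-detector terms over I.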

For binary mergers the waveform templates \(\mathbf{h}(\varvec{\theta })\) are built from the \(h_+,h_\times \) polarization states computed using the techniques discussed in Sect. 6. The source frame waveforms have to be convolved with the instrument response as described in Sect. 4. For a fully general binary black hole system the parameter vector \(\varvec{\theta }\) will have 17 components. Seven of the parameters are time invariant: the two masses \(m_1,m_2\); the dimensionless spin magnitudes \(\chi _1 = |\mathbf{S}_1|/m_1^2\) and \(\chi _2 = |\mathbf{S}_2|/m_2^2\); the sky location \((\theta ,\phi )\); and the luminosity distance to the source \(D_L\). The other ten parameters have to be referenced to some particular orbital separation, which defines a reference time \(t_*\). The parameters defined at \(t_*\) are the four angles that fix the spin directions \(\hat{S}_1,\hat{S}_2\); the overall phase \(\phi _*\); the eccentricity e and periapse angle \(\phi _e\); and two angles that define the orientation of the total angular momentum vector \(\mathbf{J} = \mathbf{L} + \mathbf{S}_1 + \mathbf{S}_2\). There are many alternative ways to parameterize the signals. For example, the spin/orbit parameters can also be described in terms of the angles \((\theta _L, \phi _L)\), \((\theta _1,\phi _1)\) and \((\theta _2,\phi _2)\) between \(\mathbf{L}, \mathbf{S}_1,\mathbf{S}_2\) and \(\mathbf{J}\) at the reference time. The merger time \(t_c\) is often used to set the time reference, and the chirp mass \(\mathcal{M}\) and total mass M are often used in place of the individual masses \(m_1,m_2\). Priors can be placed on these parameters using information from past astronomical observations and from theoretical considerations. For example, we might assume that binary systems follow the distribution of galaxies on the sky, which at large distances goes over to a uniform distribution. The range of spin magnitudes can be limited to the region [0, 1] so as to avoid naked singularities.

With wavelet based models the templates are given by a sum of wavelets. For example, the original version of the BayesWave algorithm wrote the two polarization states as a sum of Morlet–Gabor continuous wavelets:
$$\begin{aligned}&h_+(\varvec{\theta })(t) = \sum _{i=1}^N A_i e^{-(t-t_i)^2/\tau _i^2} \cos (2 \pi f_i(t-t_i) + \phi _i) \nonumber \\&h_\times (\varvec{\theta })(t) = \varepsilon h_+(\varvec{\theta })(t) \, , \end{aligned}$$
where \(\varepsilon \) sets the ellipticity of the signal, the extremes being \(\varepsilon =\pm 1\) for circular polarization and \(\varepsilon =0\) for linear polarization. The number of wavelets, N, can be varied. The full template \(\mathbf{h}(\varvec{\theta })\) folds in the detector response, which brings in a dependence on the sky location and polarization angle. The full parameter vector \(\varvec{\theta }\) has dimension \(4+5N\).
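A minimal sketch of such a wavelet template in Python (the two-wavelet parameter values are hypothetical, chosen only to illustrate the \((A_i, t_i, \tau _i, f_i, \phi _i)\) parameterization):

```python
import math

def hplus(t, wavelets):
    """Sum of Morlet-Gabor wavelets; each wavelet is (A, t0, tau, f0, phi0)."""
    return sum(A * math.exp(-((t - t0) / tau) ** 2)
               * math.cos(2 * math.pi * f0 * (t - t0) + phi0)
               for (A, t0, tau, f0, phi0) in wavelets)

def hcross(t, wavelets, eps):
    """Elliptical polarization: h_x = eps * h_+ (eps = +-1 circular, 0 linear)."""
    return eps * hplus(t, wavelets)

# a hypothetical two-wavelet "burst" (illustrative numbers only)
wavelets = [(1.0, 0.50, 0.05, 100.0, 0.0),
            (0.5, 0.55, 0.03, 150.0, 1.0)]
print(hplus(0.5, wavelets), hcross(0.5, wavelets, 0.4))
```

With N = 2 wavelets this template carries 5N = 10 wavelet parameters plus the four extrinsic parameters once the detector response is folded in.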
For a stochastic gravitational wave signal, such as might be produced by inflation or other violent processes in the early Universe, the gravitational wave amplitudes \(h_+,h_\times \) are random variables. For a Gaussian stochastic signal we are interested in inferring the spectrum \(S_h(f)\). The prior on the signal model is then
$$\begin{aligned} p(\mathbf{h} \vert \mathcal{H}, \mathbf{S}_{h}) = \frac{1}{\sqrt{\mathrm{det}(2\pi \mathbf{S}_h)}} \, e^{-\frac{1}{2} \mathbf{h}^\dag \mathbf{S}_h^{-1} \mathbf{h}} \, . \end{aligned}$$
Theoretical models can provide priors on the shape and amplitude of the spectrum \(p(\mathbf{S}_h)\), and on the degree of anisotropy and polarization. For an un-polarized, statistically isotropic, stationary Gaussian stochastic background and un-correlated stationary Gaussian instrument noise, the integral in (183) yields the likelihood [15]
$$\begin{aligned} p(\mathbf{d}| \mathbf{S}_h) = \frac{1}{(2\pi \, \mathrm{det}{} \mathbf{G})^{1/2}} \, e^{- \frac{1}{2}(\mathbf{d} \cdot \mathbf{G}^{-1} \cdot \mathbf{d} )}\, , \end{aligned}$$
$$\begin{aligned} G_{(Ik),(Jm)}= \delta _{km}\left( \delta _{IJ} \, S_I(f_k) + \gamma _{IJ}(f_k) \,S_h(f_k)\right) \, . \end{aligned}$$
Here \( \gamma _{IJ}(f_k)\) describes how the common gravitational wave signal is correlated between detectors [49]. In the interferometry literature it is called the overlap reduction function, while in the pulsar timing literature it is called the Hellings–Downs curve. The stochastic signal can be separated from the stochastic noise since the signal is correlated between detectors, while the noise is not.

7.1 Posterior Distributions, Bayesian Learning and Model Evidence

The posterior distribution \(p( \varvec{\theta } \vert \mathbf{d}, \mathcal{H})\) and the model evidence \(p(\mathbf{d} \vert \mathcal{H})\) summarize everything about the model \(\mathcal{H}\) that can be gleaned from the data. The posterior distribution can be used to compute point estimates for each parameter such as the mean, median and mode, as well as credible intervals for one or more parameters. For example, the mean value for parameter \(\theta ^j\) is given by
$$\begin{aligned} \bar{\theta }^j = \int \theta ^j \, p( \varvec{\theta } \vert \mathbf{d}, \mathcal{H}) \, d\varvec{\theta } \, . \end{aligned}$$
The mode, or peak, of the posterior distribution defines the maximum a posteriori (MAP) parameter values \(\varvec{\theta }_\mathrm{MAP}\). In many instances the posterior distribution will be multi-modal, with multiple local maxima.
In typical gravitational wave analyses the parameter dimension is large, and it is not possible to plot the full posterior distribution. Instead, lower dimensional marginal distributions are shown, such as the probability distribution for a single parameter \(\theta ^j\):
$$\begin{aligned} p(\theta ^j )= \int p( \varvec{\theta } \vert \mathbf{d}, \mathcal{H}) \, \prod _{k\ne j} d \theta ^k \, , \end{aligned}$$
or the joint probability distribution for a pair of parameters \(\theta ^i,\theta ^j\):
$$\begin{aligned} p(\theta ^i,\theta ^j )= \int p( \varvec{\theta } \vert \mathbf{d}, \mathcal{H}) \, \prod _{k\ne i,j} d \theta ^k \, . \end{aligned}$$
It has become common practice to display posterior distributions using “corner plots” that show the joint distribution for each pair of parameters along with the marginal distribution for each individual parameter. An example of such a corner plot is shown in Fig. 39 derived from a simulation of a spinning binary black hole inspiral observed by the advanced LIGO/Virgo instruments. It is difficult to fit all \(d(d+1)/2=120\) corner plot panels for a \(d=15\) dimensional posterior into a single graph, so I have instead selected 11 of the more interesting parameters and shown them in two corner plots with 5 and 6 rows respectively.
Fig. 39

One and two dimensional marginalized posterior distributions for a simulated spinning binary black hole inspiral observed by the advanced LIGO/Virgo instruments. The colored regions in the two dimensional plots show the “Gaussian equivalent” 1-sigma, 2-sigma and 3-sigma regions. The shaded region in the one dimensional plots shows the 68% “1-sigma” credible region for each parameter

In Fig. 39 the posterior density is colored in terms of credible regions that contain a designated fraction of the total probability. There are many different ways to define credible regions. Two of the more popular choices are central credible regions and minimum volume credible regions. The credible regions shown in Fig. 39 are of the minimum volume variety. Figure 40 illustrates the central and minimum area credible regions for a one dimensional probability distribution. The boundaries of the central credible region, \(x_\mathrm{min}\) and \(x_\mathrm{max}\), that contain a fraction \(\alpha \) of the total probability are defined by
$$\begin{aligned} \int _{-\infty }^{x_\mathrm{min}} p(x) dx = \int _{x_\mathrm{max}}^{\infty } p(x) dx = \frac{1-\alpha }{2} \, . \end{aligned}$$
Central credible intervals are often used to quote the uncertainties on individual parameters. For example, the mass parameters for the example shown in Fig. 39 can be quoted in terms of the mean values and the boundaries of the 1-sigma equivalent credible region: \(m_1=(14.95^{+0.38}_{-0.44})M_\odot \), \(m_2=(10.04^{+0.31}_{-0.21})M_\odot \). Note that the credible intervals in this case are not symmetric about the mean values.
The minimum volume credible region can be made up of multiple disjoint parts. It is defined by a minimum density \(p_*\) such that
$$\begin{aligned} \int _{p(x) \ge p_*} p(x) \, dx = \alpha \, . \end{aligned}$$
In other words, only regions with densities above \(p_*\) are included in the minimum volume credible region. By construction, the volume (here the length of the region with \(p(x)\ge p_*\)) will always be less than or equal to that of any other credible region containing the same total probability.
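Both types of region are easy to construct from posterior samples, which is how they are obtained in practice from MCMC output. A one dimensional sketch (Python; for the minimum-width interval we simply scan for the shortest window of sorted samples containing a fraction \(\alpha \) of the points):

```python
import random

def central_interval(samples, alpha=0.68):
    """Central credible interval: cut (1-alpha)/2 of probability off each tail."""
    s = sorted(samples)
    n = len(s)
    lo = int(0.5 * (1 - alpha) * n)
    hi = int((0.5 * (1 - alpha) + alpha) * n) - 1
    return s[lo], s[hi]

def hpd_interval(samples, alpha=0.68):
    """Minimum-width (highest posterior density) interval from samples:
    the shortest window containing a fraction alpha of the points."""
    s = sorted(samples)
    n = len(s)
    k = int(alpha * n)
    best = min(range(n - k), key=lambda i: s[i + k] - s[i])
    return s[best], s[best + k]

random.seed(0)
draws = [random.gauss(0.0, 1.0) for _ in range(20000)]
print(central_interval(draws))   # roughly (-1, 1) for a unit Gaussian
print(hpd_interval(draws))       # similar here, since the density is symmetric
```

For a skewed or multi-modal posterior the two constructions differ, and the minimum-width interval can even break into disjoint pieces, which this simple single-window scan does not capture.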
Fig. 40

The shaded regions show two types of credible regions. Each contains the same fraction of the probability density. On the left we have the central credible region, and on the right we have the minimum area credible region

Bayesian inference is a pure expression of the scientific method. Bayes’ theorem describes how we learn from data, and the Bayesian odds ratio quantifies our belief in competing hypotheses. Bayes’ theorem describes how our prior belief \(p(\varvec{\theta })\) is updated to our posterior belief \(p(\varvec{\theta }|\mathbf{d}_1)\) after incorporating, via the likelihood function, the information contained in data \(\mathbf{d}_1\). The updated probability distribution for \(\varvec{\theta }\) becomes our new prior, which can then be updated by additional data \(\mathbf{d}_2\) to give the posterior distribution \(p(\varvec{\theta }|\mathbf{d}_1, \mathbf{d}_2)\). Note that we get the same result if we start with data \(\mathbf{d}_2\) to arrive at the posterior \(p(\varvec{\theta }|\mathbf{d}_2)\), which then serves as the prior when incorporating data \(\mathbf{d}_1\) to yield \(p(\varvec{\theta }|\mathbf{d}_2, \mathbf{d}_1)=p(\varvec{\theta }|\mathbf{d}_1, \mathbf{d}_2)\). This is true whether the data are independent or dependent. The amount we learn about a hypothesis \(\mathcal{H}\) from data \(\mathbf{d}\) can be quantified in terms of how different the posterior distribution is from the prior distribution, which we can measure using the Kullback–Leibler divergence:
$$\begin{aligned} I_\mathrm{KL} = \int p( \varvec{\theta } \vert \mathbf{d}, \mathcal{H}) \log _2 \left( \frac{p( \varvec{\theta } \vert \mathbf{d}, \mathcal{H})}{p( \varvec{\theta } \vert \mathcal{H})} \right) d\varvec{\theta } \quad (\mathrm{bits}). \end{aligned}$$
If the posterior distribution is much more concentrated than the prior distribution then the information gain is large. Conversely, if someone has very strong prior beliefs, they learn little, even when confronted with a vast amount of evidence.
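The updating rule and the information gain are simple to demonstrate with a conjugate Gaussian toy model: a Gaussian prior on a single parameter and Gaussian measurement noise (all numbers below are illustrative). The posterior after both data points is the same regardless of the order of updating, and the Kullback–Leibler divergence, evaluated here from the closed-form Gaussian expression, gives the information gained in bits:

```python
import math

def update(mu, var, datum, noise_var):
    """Conjugate Gaussian update: prior N(mu, var), likelihood N(theta, noise_var)."""
    post_var = 1.0 / (1.0 / var + 1.0 / noise_var)
    post_mu = post_var * (mu / var + datum / noise_var)
    return post_mu, post_var

def kl_bits(mu1, var1, mu0, var0):
    """KL divergence N(mu1, var1) || N(mu0, var0), converted from nats to bits."""
    nats = 0.5 * (math.log(var0 / var1) + (var1 + (mu1 - mu0) ** 2) / var0 - 1.0)
    return nats / math.log(2.0)

mu0, var0 = 0.0, 10.0            # broad prior
d1, d2, noise = 1.3, 0.9, 0.5    # two data points (illustrative numbers)

a = update(*update(mu0, var0, d1, noise), d2, noise)   # d1 then d2
b = update(*update(mu0, var0, d2, noise), d1, noise)   # d2 then d1
print(a, b)                       # agree: the order of updating is irrelevant
print(kl_bits(a[0], a[1], mu0, var0))   # information gained, in bits
```

Shrinking the prior variance var0 toward the posterior variance drives the information gain to zero, which is the "strong prior beliefs" limit described above.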
The probability that a model \(\mathcal{H}_k\) describes the data \(\mathbf{d}\) is given by Bayes’ theorem:
$$\begin{aligned} p(\mathcal{H}_k | \mathbf{d}) = \frac{ p(\mathbf{d} | \mathcal{H}_k) p( \mathcal{H}_k) }{ p(\mathbf{d})} \, , \qquad p(\mathbf{d}) = \sum _j p(\mathbf{d} | \mathcal{H}_j) p( \mathcal{H}_j) \, . \end{aligned}$$
Here \(p(\mathbf{d} | \mathcal{H}_k)\) is the marginal likelihood, or evidence, for model \(\mathcal{H}_k\) and \(p( \mathcal{H}_k)\) is our prior belief in the model. The denominator is a normalizing factor that sums (or integrates) over all possible models. In most applications it is impossible to write down all possible models, and we instead consider the Bayesian odds ratios between pairs of competing models:
$$\begin{aligned} \mathcal{O}_{ij} = \frac{ p(\mathcal{H}_i | \mathbf{d})}{p(\mathcal{H}_j | \mathbf{d}) }= \left( \frac{ p(\mathbf{d} | \mathcal{H}_i)}{p(\mathbf{d} | \mathcal{H}_j)} \right) \left( \frac{p( \mathcal{H}_i) }{p( \mathcal{H}_j)}\right) = \mathcal{B}_{ij} \mathcal{P}_{ij}\, . \end{aligned}$$
The unknown normalization factor cancels out in the odds ratio. The first term in parentheses is the evidence ratio, or Bayes factor \(\mathcal{B}_{ij}\), for the two models, while the second term in parentheses is the prior odds ratio \(\mathcal{P}_{ij}\). The Bayes factor can be used to measure the significance of an event by comparing the evidence for the noise-only model to the evidence for the signal+noise model. A word of caution, however: such Bayes factors are only as good as the models. For example, if you are using a Gaussian model for the noise, when in reality the noise is non-Gaussian, then the Bayes factors between the noise-only and signal\(+\)noise models will not be a reliable measure of whether an astrophysical signal is indeed present in the data.

7.2 Maximum Likelihood and the Fisher Information Matrix

Before continuing the discussion of signal detection and parameter estimation, it is instructive to digress a little and consider maximum likelihood parameter estimation and error forecasting using the Fisher Information Matrix. These topics are usually discussed in the classical, or frequentist, approach to gravitational wave detection, but they are also closely related to Taylor series expansion of Bayesian posterior distributions.

To simplify the discussion we will assume the noise is stationary and Gaussian with a known spectrum. The log likelihood is then proportional to the chi-squared given in Eq. (188), and local maxima of the likelihood can be found by setting the derivative with respect to the model parameters to zero. The maximum likelihood solution can be found by Taylor expanding the signal model about the true parameters \(\varvec{\theta }_T\):
$$\begin{aligned} \mathbf h ( \varvec{\theta }) = \mathbf h _T+\partial _i\mathbf h _T\varDelta \theta ^i + \frac{1}{2}\partial _i \partial _j\mathbf h _T\varDelta \theta ^i \varDelta \theta ^j+ \mathcal {O}\left( \varDelta \theta ^{3}\right) . \end{aligned}$$
Here we are using the shorthand notation \(\mathbf h _T = \mathbf h ( \varvec{\theta }_T)=\mathfrak {h}\) and \(\partial _i\mathbf h _T = \partial _{\theta ^i} \mathbf h ( \varvec{\theta })\vert _{\varvec{\theta } = \varvec{\theta }_T}\). The Taylor expansion of the chi-squared is then
$$\begin{aligned} \chi ^2( \varvec{\theta } ) = (\mathbf n |\mathbf n ) - 2(\mathbf n |\partial _i\mathbf h _T)\varDelta \theta ^i+\left[ (\partial _i\mathbf h _T| \partial _j\mathbf h _T) - (\mathbf n |\partial _i\partial _j\mathbf h _T)\right] \varDelta \theta ^i \varDelta \theta ^j + \mathcal{O}(\varDelta \theta ^3) \, . \end{aligned}$$
Setting \(\partial _{k} \chi ^2( \varvec{\theta } ) = 0\) we find
$$\begin{aligned} \varDelta \theta ^k_\mathrm{ML} = \varGamma ^{k l}(\mathbf n |\partial _l\mathbf h _T) \, , \end{aligned}$$
where \(\varGamma _{kl} = (\partial _k\mathbf h _T| \partial _l\mathbf h _T)\) is the Fisher information matrix, and \(\varGamma ^{kl} = (\varGamma ^{-1})_{kl}\) is its matrix inverse. In arriving at Eq. (204) we have dropped terms coming from the inner product \((\mathbf n |\partial _k\partial _l\mathbf h _T)\) since they are of order \(\varDelta \theta \), and keeping them generates higher order corrections. What Eq. (204) tells us is that the best-fit parameter values are perturbed away from the true parameter values by an amount that depends on how well variations in the signal can mimic the noise. Put another way, the maximum likelihood solution achieves a higher likelihood by fitting some of the noise with the signal model.
In the frequentist approach one considers multiple repetitions of the measurement and computes expectation values. Assuming zero mean, Gaussian noise we have \(\mathrm{E}[\tilde{n}(f)] = 0\) and \(\mathrm{E}[\tilde{n}(f) \tilde{n}^*(f')] = \frac{1}{2}S_n(f) \delta (f-f')\). Using these expressions it is easy to show that
$$\begin{aligned} \mathrm{E}[\varDelta \theta ^k_\mathrm{ML}] = 0 \, , \end{aligned}$$
$$\begin{aligned} \mathrm{E}[\varDelta \theta ^k_\mathrm{ML} \varDelta \theta ^l_\mathrm{ML}] = \varGamma ^{kl} \, . \end{aligned}$$
In other words, the parameter errors should follow a Gaussian distribution with zero mean, and with a covariance matrix given by the inverse of the Fisher information matrix.
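This frequentist statement can be checked with a Monte Carlo over noise realizations. The sketch below (Python; all names and numbers are illustrative) uses the simplest possible model, a template of known shape and unknown amplitude in white noise, for which the Fisher matrix reduces to the single number \(\varGamma = (\partial _A \mathbf{h}|\partial _A \mathbf{h})\):

```python
import math, random

random.seed(2)
N, sigma = 256, 1.0
g = [math.sin(0.1 * k) for k in range(N)]   # fixed template shape, g = dh/dA

def inner(a, b):
    # discrete white-noise inner product (a|b) = sum a_k b_k / sigma^2
    return sum(ak * bk for ak, bk in zip(a, b)) / sigma ** 2

gamma = inner(g, g)                  # one-parameter Fisher "matrix"
forecast = 1.0 / math.sqrt(gamma)    # predicted 1-sigma error on the amplitude

# Monte Carlo over noise realizations: scatter of the ML amplitude estimate,
# using Delta A_ML = (n|g) / Gamma from the maximum likelihood solution
trials = 2000
total = 0.0
for _ in range(trials):
    n = [random.gauss(0.0, sigma) for _ in range(N)]
    err = inner(n, g) / gamma
    total += err * err
scatter = math.sqrt(total / trials)
print(forecast, scatter)   # agree to a few percent
```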
The frequentist viewpoint is reasonable if you are interested in forecasting the performance of an observatory, but it makes no sense to consider multiple realizations of an actual observation: you cannot expect the Universe to repeat the same black hole merger multiple times. For actual observations it makes more sense to take a Bayesian approach. The best-fit parameters (in a maximum likelihood sense) are obtained in the same way, but now we interpret Eq. (204) to be the perturbation due to the actual noise realization present in the data. We can expand about the maximum likelihood solution by writing \(\varDelta \theta ^k = \varDelta \theta ^k_\mathrm{ML} + \delta \theta ^k\) and re-expanding the chi-squared:
$$\begin{aligned} \chi ^2( \varvec{\theta } ) = \chi ^2_\mathrm{ML} + \varGamma _{ij}\delta \theta ^i \delta \theta ^j + \mathcal{O}(\delta \theta ^3) \, , \end{aligned}$$
where \( \chi ^2_\mathrm{ML} = (\mathbf n |\mathbf n ) - \varGamma ^{ij} (\mathbf n |\partial _i\mathbf h _T)(\mathbf n |\partial _j\mathbf h _T)\) is the maximum likelihood value for the chi-squared. Rather than evaluate the derivatives in the Fisher matrix at the unknown true parameter values, we can evaluate them at the known maximum likelihood parameter values since the difference between the two is next order in the expansion. Thus we can approximate the likelihood in the vicinity of the maximum likelihood solution as
$$\begin{aligned} p(\mathbf{d} \vert \mathcal{H}, \varvec{\theta }) \simeq {\sqrt{\mathrm{det} ( \varvec{\varGamma }/2\pi )}} \, e^{-\frac{1}{2} \varGamma _{ij}\delta \theta ^i \delta \theta ^j} \, . \end{aligned}$$
We can perform a similar expansion of the posterior distribution about the MAP parameter values:
$$\begin{aligned} p( \varvec{\theta } \vert \mathbf{d}, \mathcal{H}) \simeq {\sqrt{\mathrm{det} ( \varvec{\varUpsilon }/2\pi )}} \, e^{-\frac{1}{2} \varUpsilon _{ij}\delta \theta ^i \delta \theta ^j} \, . \end{aligned}$$
$$\begin{aligned} \varUpsilon _{ij} = \varGamma _{ij} -\partial _i \partial _j \ln p( \varvec{\theta } \vert \mathcal{H}) \, . \end{aligned}$$
For flat priors the second term vanishes and the posterior matches the likelihood. The quadratic expansion of the posterior (209) is a multi-variate Gaussian distribution with covariance matrix \(\varvec{\varUpsilon }^{-1}\). The end result is that the Bayesian and frequentist approaches yield similar results in this instance, even though the philosophy behind the two approaches is very different.
The expansion about the MAP parameters can also be used to estimate the model evidence using the Laplace approximation for exponential integrals:
$$\begin{aligned} p(\mathbf{d} \vert \mathcal{H}) \approx p(\mathbf{d} \vert \mathcal{H}, \varvec{\theta }_\mathrm{MAP} )\frac{ p(\varvec{\theta }_\mathrm{MAP} \vert \mathcal{H})}{ \sqrt{\mathrm{det} ( \varvec{\varUpsilon }/2\pi )} }\, . \end{aligned}$$
The term \(1/\sqrt{\mathrm{det} (\varvec{\varUpsilon }/2\pi )}\) can be interpreted as the one-sigma posterior volume, \(V_\sigma \). For uniform priors, \(p(\varvec{\theta } \vert \mathcal{H}) = V^{-1}\), where V is the prior volume and \(p(\mathbf{d} \vert \mathcal{H}, \varvec{\theta }_\mathrm{MAP} )\) is equal to the maximum likelihood \(\mathcal{L}_\mathrm{max}\). Thus for uniform (or slowly varying) priors, we have
$$\begin{aligned} p(\mathbf{d} \vert \mathcal{H}) \approx \mathcal{L}_\mathrm{max} \left( \frac{ V_\sigma }{V}\right) \, . \end{aligned}$$
What this tells us is that the evidence is higher for models that fit the data better (higher \(\mathcal{L}_\mathrm{max}\)), but there is an Occam penalty, the factor \(V_\sigma /V\) in the above equation, that works against models with a large number of parameters, since each additional parameter further shrinks the posterior volume relative to the prior volume.
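The Occam factor can be made concrete with a toy one-parameter model (my own illustrative sketch, with made-up numbers): a single datum \(d = \theta + n\) with unit-variance Gaussian noise and a uniform prior \(\theta \in [-10,10]\), for which \(\varGamma = \varUpsilon = 1\) and \(V_\sigma = \sqrt{2\pi }\):

```python
import numpy as np

# Laplace/Occam sketch: d = theta + n, unit-variance noise,
# uniform prior theta ~ U[-10, 10].  Here Gamma = Upsilon = 1,
# so the one-sigma posterior volume is V_sigma = sqrt(2*pi).
d = 0.7
V = 20.0                                  # prior volume
L_max = 1.0 / np.sqrt(2.0 * np.pi)        # likelihood at theta = d
V_sigma = np.sqrt(2.0 * np.pi)

Z_laplace = L_max * V_sigma / V           # = 0.05

# Brute-force evidence integral for comparison (trapezoid rule)
theta = np.linspace(-10.0, 10.0, 200001)
like = np.exp(-0.5 * (d - theta) ** 2) / np.sqrt(2.0 * np.pi)
Z_exact = np.sum(0.5 * (like[1:] + like[:-1]) * np.diff(theta)) / V

print(Z_laplace, Z_exact)    # both ~ 0.05
```

The two agree essentially exactly here because the posterior really is Gaussian; for a model with more parameters the factor \(V_\sigma /V\) shrinks rapidly, which is the Occam penalty in action.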

7.3 Frequentist Detection Statistics

Suppose that we have two models, one where a signal is present, \(\mathcal{H}_1\), and one where there is only noise, \(\mathcal{H}_0\). Assuming uniform priors and using the Laplace approximation, the log Bayes factor between the models is given by
$$\begin{aligned} \log \mathcal{B}_{10} = \log \frac{p(\mathbf{d} \vert \mathcal{H}_1)}{p(\mathbf{d} \vert \mathcal{H}_0)}\approx & {} \log \left( \frac{p(\mathbf{d} \vert \mathcal{H}_1, \varvec{\theta }_\mathrm{ML})}{p(\mathbf{d} \vert \mathcal{H}_0)} \right) + \mathrm{Occam \; terms} \nonumber \\\approx & {} \log \mathcal{L}_\mathrm{max} + \mathrm{Occam \; terms} \end{aligned}$$
where \(\log \mathcal{L}_\mathrm{max}\) is the maximum of the log likelihood ratio. For Gaussian noise the log likelihood ratio is given by
$$\begin{aligned} \log \mathcal{L}= & {} \frac{1}{2}\left( (\mathbf{d} - \mathbf{h} | \mathbf{d} - \mathbf{h}) - (\mathbf{d}| \mathbf{d} ) \right) \nonumber \\= & {} (\mathbf{d} |\mathbf{h})-\frac{1}{2}(\mathbf{h} |\mathbf{h}) \, . \end{aligned}$$
A gravitational wave template \(\mathbf{h}\) can always be written in terms of an overall amplitude \(\rho \) such that \(\mathbf{h} = \rho \hat{\mathbf{h}}\) where \((\hat{\mathbf{h}} | \hat{\mathbf{h}}) = 1\). If we maximize the log likelihood ratio with respect to \(\rho \) by setting \(\partial _\rho \log \mathcal{L} =0\) we find
$$\begin{aligned} \rho = (\mathbf{d} | \hat{\mathbf{h}}) \, , \end{aligned}$$
$$\begin{aligned} \log \mathcal{L} = \frac{1}{2} \rho ^2 \, . \end{aligned}$$
The quantity \(\rho \) is referred to as the matched filter detection statistic, and can be used to define the matched filter signal-to-noise ratio. In the absence of the signal, \(\mathfrak {h} =0\), we have
$$\begin{aligned} \mathrm{E}[\rho _{\mathfrak {h} =0}]= & {} 0 \nonumber \\ \mathrm{Var}[\rho _{\mathfrak {h} =0}]= & {} \left( \mathrm{E}[\rho ^2_{\mathfrak {h} =0}] - \mathrm{E}^2[\rho _{\mathfrak {h} =0}]\right) = (\hat{\mathbf{h}} | \hat{\mathbf{h}}) = 1 \, . \end{aligned}$$
When a signal is present and it is matched by the filter, \(\mathbf{h}= \mathfrak {h}\), we have
$$\begin{aligned} \mathrm{E}[\rho _{\mathbf{h}=\mathfrak {h}}] = ({\mathfrak {h}} | {\mathfrak {h}})^{1/2} \end{aligned}$$
giving an expected signal-to-noise ratio of
$$\begin{aligned} \mathrm{SNR} = \frac{\mathrm{E}[\rho _{\mathbf{h}=\mathfrak {h}}] }{\mathrm{Var}[\rho _{\mathfrak {h} =0}]^{1/2} } = ({\mathfrak {h}} | {\mathfrak {h}})^{1/2} \, . \end{aligned}$$
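These properties of the matched filter are easy to verify numerically. Below is a hedged Python sketch of my own, assuming white unit-variance noise so the noise-weighted inner product reduces to a plain dot product:

```python
import numpy as np

rng = np.random.default_rng(1)

# White noise with unit variance so that (a|b) = sum(a*b).
t = np.linspace(0.0, 1.0, 1000)
h = 0.2 * np.sin(2.0 * np.pi * 30.0 * t)      # injected signal
hhat = h / np.sqrt(np.dot(h, h))              # unit-norm template
snr_expected = np.sqrt(np.dot(h, h))          # (h|h)^{1/2}

# Matched filter statistic rho = (d|hhat) over many noise draws
n_trials = 5000
d = h + rng.standard_normal((n_trials, t.size))
rho = d @ hhat

print(snr_expected)            # optimal SNR
print(np.mean(rho))            # ~ snr_expected
print(np.std(rho))             # ~ 1 (unit noise variance)
```

The sample mean of \(\rho \) recovers \((\mathfrak {h}|\mathfrak {h})^{1/2}\) and its standard deviation is unity, reproducing the expectation values above.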
The maximization of the likelihood can be performed algebraically with respect to an overall constant amplitude \(\rho \) and phase \(\phi _0\) by writing
$$\begin{aligned} \mathbf{h}(\varvec{\lambda }) = \rho \mathbf{h}_c(\varvec{\kappa }) \cos (\phi _0) + \rho \mathbf{h}_s(\varvec{\kappa }) \sin (\phi _0) \, \end{aligned}$$
where \((\mathbf{h}_c| \mathbf{h}_c)=(\mathbf{h}_s| \mathbf{h}_s)=1\), \((\mathbf{h}_c| \mathbf{h}_s)=0\) and \(\varvec{\kappa } = \varvec{\lambda }/\{\rho , \phi _0\}\). The log likelihood ratio then becomes
$$\begin{aligned} \log \mathcal{L}(\varvec{\lambda })= & {} \rho (\mathbf{d} | \mathbf{h}_c(\varvec{\kappa }))\cos (\phi _0) + \rho (\mathbf{d} | \mathbf{h}_s(\varvec{\kappa }))\sin (\phi _0) -\frac{1}{2} \rho ^2 \nonumber \\= & {} \rho \varrho (\varvec{\kappa }) \cos (\phi _0 - \varphi (\varvec{\kappa } )) -\frac{1}{2} \rho ^2 \, . \end{aligned}$$
$$\begin{aligned} \varrho (\varvec{\kappa }) = \sqrt{ (\mathbf{d} | \mathbf{h}_c(\varvec{\kappa }))^2 + (\mathbf{d} | \mathbf{h}_s(\varvec{\kappa }))^2 } \, , \end{aligned}$$
$$\begin{aligned} \varphi (\varvec{\kappa }) = \mathrm{arctan} \left( \frac{(\mathbf{d} | \mathbf{h}_s(\varvec{\kappa }))}{(\mathbf{d} | \mathbf{h}_c(\varvec{\kappa }))} \right) \, . \end{aligned}$$
The likelihood is maximized by setting \(\phi _0 = \varphi (\varvec{\kappa })\) and \(\rho = \varrho (\varvec{\kappa })\) so that \(\log \mathcal{L}_{\mathrm{max} \; \{\rho , \phi _0\}}= \varrho ^2(\varvec{\kappa }) /2\). The quantity \(\varLambda (\varvec{\kappa }) = 2 \log \mathcal{L}_{\mathrm{max} \; \{\rho , \phi _0\}}= \varrho ^2(\varvec{\kappa })\) can be shown to follow a non-central chi-squared distribution with two degrees of freedom, \(\varLambda \sim \chi ^2_2(\mathrm{SNR}^2)\), with non-centrality parameter equal to the signal-to-noise ratio squared \(\mathrm{SNR}^2 = ({\mathfrak {h}} | {\mathfrak {h}})\). Absent a signal, the distribution reduces to a central chi-squared distribution with two degrees of freedom (equivalently, the amplitude \(\varrho \) follows a Rayleigh distribution).
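The distribution of \(\varLambda \) can be checked by simulation. Here is a minimal Python sketch of my own, assuming white unit-variance noise and quadrature templates with an integer number of cycles (so the discrete cosine and sine are exactly orthogonal):

```python
import numpy as np

rng = np.random.default_rng(7)

# Orthogonal, unit-normalized quadrature templates in white
# unit-variance noise, where (a|b) reduces to a dot product.
N = 512
t = np.arange(N) / N
hc = np.cos(2 * np.pi * 40 * t)
hs = np.sin(2 * np.pi * 40 * t)
hc /= np.sqrt(np.dot(hc, hc))
hs /= np.sqrt(np.dot(hs, hs))

snr = 5.0
d = snr * hc + rng.standard_normal((10000, N))   # many noise draws

# Lambda = rho^2 = (d|hc)^2 + (d|hs)^2
Lambda = (d @ hc) ** 2 + (d @ hs) ** 2

# Non-central chi-squared, 2 dof, non-centrality SNR^2 = 25:
# mean = 2 + SNR^2 = 27, variance = 2(2 + 2 SNR^2) = 104.
print(np.mean(Lambda))
print(np.var(Lambda))
```

The sample mean and variance match the non-central \(\chi ^2_2(\mathrm{SNR}^2)\) moments, \(2+\mathrm{SNR}^2\) and \(2(2+2\,\mathrm{SNR}^2)\).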
Fig. 41

Probability distributions for the \(\varLambda \)-statistic in Gaussian noise for pure noise, and for data containing a \(\mathrm{SNR}=5\) signal

Figure 41 shows the theoretical probability distribution for the \(\varLambda \) detection statistic for pure noise, and for data containing a \(\mathrm{SNR}=5\) signal. A false alarm occurs when we incorrectly conclude there is a signal present. Setting a false alarm probability of <1% corresponds to requiring that \(\varLambda > 13.28\). A false dismissal occurs when we conclude there is no signal when one is indeed present. Using a \(1\%\) false alarm threshold implies that there is a \(4.38\%\) chance that we will falsely dismiss a \(\mathrm{SNR}=5\) signal. For the first detections of gravitational waves the LIGO and Virgo collaborations were very conservative, and demanded that the false alarm rate, i.e. the number of random events mistaken for gravitational wave signals per unit time, should be very small. Setting a false alarm rate (FAR) of one per 100,000 years over a one year stretch of observation corresponds to a false alarm probability of \(p_\mathrm{FA} = T_\mathrm{obs} \times \mathrm{FAR} = 10^{-5}\).

An important caveat to the preceding discussion is that the distribution shown for \(\varLambda (\varvec{\kappa })\) pertains to fixed values of the parameters \(\varvec{\kappa }\). In an actual search the correct parameter values are not known a priori, and they must be searched over to find the values that maximize the log likelihood. The probability distribution for the search statistic maximized over all parameters, \(\varLambda _\mathrm{max}\), no longer follows these simple chi-squared distributions, and in all but the simplest cases must be computed numerically.

7.4 Searches for Gravitational Waves

The approach to detecting gravitational waves varies between and within collaborations. For example, the Parkes Pulsar Timing Array (PPTA) collaboration has traditionally used frequentist techniques, as have groups within the LIGO and Virgo collaborations that search for compact binary mergers. In contrast, the North American Nanohertz Observatory for Gravitational Waves (NANOGrav) collaboration takes a predominantly Bayesian approach, as does the LIGO-Virgo BayesWave group. For strong signals the likelihood is highly peaked around the maximum value, and the Bayesian and frequentist approaches yield very similar results. To keep the discussion focused, I will restrict attention to the template based searches for compact binary mergers of the kind performed by the LIGO and Virgo collaborations. As we saw in the previous section, the Bayesian evidence for a signal being present in stationary, Gaussian data can be approximated by the maximum likelihood statistic \(\varLambda \). In practice the LIGO and Virgo data are not perfectly stationary and Gaussian, so slightly different search statistics are used, and the probability distributions for these statistics under the noise hypothesis are derived empirically from the data. Since it is not known a priori if a given stretch of data contains a signal, the noise properties are determined by first scrambling the data to remove the possibility of detecting a signal. This is done by introducing relative time shifts to the data that are greater than the light travel time between the detectors, ensuring that any signals present appear as noise fluctuations in the shifted data. Just a few weeks of data can be used to simulate millions of years of signal-free observation, making it possible to estimate the probability distribution for the noise down to very small false alarm probabilities (or false alarm rates).

To compute the maximum likelihood statistic \(\varLambda \) (or its equivalent), the likelihood has to be computed and maximized. For some parameters the maximization can be performed algebraically, using the procedure described in Eqs. (220)–(223), while for others a direct search has to be performed. The direct search is usually performed by discretizing the parameter space and performing a grid search with a bank of templates. The spacing of the grid is chosen such that adjacent templates have significant overlap to ensure that signals that lie between grid points are not missed. When the inner-products in the likelihood are computed in the Fourier domain, the \(\varLambda \) statistic can be maximized with respect to the overall time offset \(t_0\) by computing the complex time series
$$\begin{aligned} z(\varvec{\kappa }', t_0) = 4 \int _0^\infty \frac{ \tilde{d}(f) \tilde{h}^*_c(\varvec{\kappa }'; f)}{S_n(f)} \, e^{2 \pi i f t_0} df \, = \varrho (\varvec{\kappa }', t_0) e^{i \varphi (\varvec{\kappa }', t_0)} \end{aligned}$$
where \(\varvec{\kappa }' = \varvec{\kappa }/\{t_0\} = \varvec{\lambda }/\{\rho , \phi _0, t_0\}\). The complex time series can be efficiently computed by an inverse fast Fourier transform (iFFT), and the maximum value of \(\varrho (\varvec{\kappa }', t_0)\) read off directly from the iFFT. The maximization procedure can be applied to networks of detectors by extending the inner products to a sum over the detector network, and likewise generalizing Eq. (224). With a network of detectors the maximization over \(\rho ,\phi _0\) can be generalized to cover a larger set of parameters, including the inclination and polarization angles, using an extension of the method described here, yielding a quantity known as the F-statistic. The overall time shift \(t_0\) can still be maximized over, but the differences in arrival times between detectors form part of the collection of parameters \(\varvec{\kappa }'\) that still have to be searched over. In principle the method described here can be used to maximize \(t_0\) over the entire observation time, but in practice the data is broken up into smaller chunks and the maximization performed on each chunk separately. This is done because the data has occasional gaps when one or more instruments are off-line, and because the noise is not perfectly stationary, so the spectral densities \(S_n(f)\) change over time and have to be computed for each chunk. Another benefit of performing the search on short stretches of data is that signals can be picked up in near real time, allowing alerts to be sent out so that electromagnetic observatories can look for counterparts to the gravitational wave events.
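The iFFT trick can be sketched in a few lines of Python. This is my own toy construction (sine-Gaussian templates, white noise so the constant \(S_n\) drops out of the argmax): by the correlation theorem, a single inverse FFT evaluates the overlap at every circular time offset at once.

```python
import numpy as np

rng = np.random.default_rng(9)

# Unit-normalized quadrature templates with a Gaussian envelope
N = 4096
t = np.arange(N)
env = np.exp(-0.5 * ((t - N // 2) / 50.0) ** 2)
hc = env * np.cos(2 * np.pi * 0.05 * t)
hs = env * np.sin(2 * np.pi * 0.05 * t)
hc /= np.sqrt(np.dot(hc, hc))
hs /= np.sqrt(np.dot(hs, hs))

shift = 300                               # injected time offset (samples)
d = 100.0 * np.roll(hc, shift) + rng.standard_normal(N)

# Correlation theorem: one inverse FFT gives the overlap with each
# quadrature at every time offset; rho(t0) then follows Eq. (222).
zc = np.fft.ifft(np.fft.fft(d) * np.conj(np.fft.fft(hc))).real
zs = np.fft.ifft(np.fft.fft(d) * np.conj(np.fft.fft(hs))).real
rho = np.sqrt(zc ** 2 + zs ** 2)

t0_best = np.argmax(rho)
print(t0_best, rho[t0_best])    # offset near 300, rho near 100
```

The argmax of the iFFT output recovers the injected time offset, and the peak value of \(\varrho \) recovers the injected amplitude, without a separate likelihood evaluation per trial \(t_0\).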
Once the likelihood has been maximized with respect to amplitude, phase and arrival time, it remains to maximize over parameters that control the shape of the signal, such as the masses and spins of a binary system. This is typically done using a grid search over a bank of unit normalized waveform templates. Ideally the grid is laid out to provide uniform coverage of the signal space, with a spacing that ensures that only a very small fraction of potentially detectable signals are missed. In practice it is difficult to achieve perfectly uniform spacing, and the computational cost of evaluating the likelihood at each point on the grid requires a trade off between coverage and speed. A useful analogy is that of a trawler pulling a fishing net. If the holes in the net are too large then smaller fish will not get caught, but if the holes in the net are too small the drag will slow down the trawler and limit the number of fish that can be caught. The placement of the grid is guided by considering the overlap between two signals with parameters \(\varvec{\lambda }\) and \(\varvec{\theta }\), expressed in terms of the match
$$\begin{aligned} \mathrm{M}(\varvec{\lambda }, \varvec{\theta }) = \frac{ (\mathbf{h}(\varvec{\lambda }) | \mathbf{h}(\varvec{\theta }))}{\sqrt{(\mathbf{h}(\varvec{\lambda }) | \mathbf{h}(\varvec{\lambda })) (\mathbf{h}(\varvec{\theta }) | \mathbf{h}(\varvec{\theta }))} }\, . \end{aligned}$$
For nearby signals we can write \(\varvec{\theta } = \varvec{\lambda } + \varDelta \varvec{\lambda }\) and Taylor expand in \(\varDelta \lambda ^\mu \):
$$\begin{aligned} \mathrm{M}(\varvec{\lambda }, \varvec{\lambda } + \varDelta \varvec{\lambda }) = 1 - \frac{1}{2}\left( \frac{(h_{,\mu } | h_{,\nu })}{(h|h)} - \frac{(h| h_{,\mu })(h| h_{,\nu }) }{(h|h)^2} \right) \varDelta \lambda ^\mu \varDelta \lambda ^\nu + \cdots \, . \end{aligned}$$
The quantity in brackets is called the template metric \(g_{\mu \nu }\), which defines a distance measure in the Riemannian geometry associated with the inner product \((\mathbf{a}|\mathbf{b})\) [41, 42]. We recognize the first term in the template metric as the Fisher matrix divided by the signal-to-noise ratio squared. Using \(\mathbf{h} = \rho \hat{\mathbf{h}}\) we see that the second term is equal to \(\varGamma _{\rho \mu } \varGamma _{\rho \nu } /(\rho ^2 \varGamma _{\rho \rho })\) so that
$$\begin{aligned} g_{\mu \nu }= & {} \frac{(h_{,\mu } | h_{,\nu })}{(h|h)} - \frac{(h| h_{,\mu })(h| h_{,\nu })}{(h|h)^2} \nonumber \\= & {} \frac{1}{\rho ^2} \left( \varGamma _{\mu \nu } - \frac{\varGamma _{\rho \mu } \varGamma _{\rho \nu } }{\varGamma _{\rho \rho }}\right) \, . \end{aligned}$$
We recognize the term in brackets in the second line of the above equation to be the Fisher matrix projected onto a sub-space that is independent of the overall amplitude \(\rho \). Since the likelihood is directly maximized with respect to \(\phi _0\) and \(t_0\) these terms can be removed from the template metric using a sequence of projections:
$$\begin{aligned} g'_{\mu \nu } = g_{\mu \nu } - \frac{g_{\phi _0 \mu }g_{\phi _0 \nu }}{g_{\phi _0 \phi _0}} \end{aligned}$$
$$\begin{aligned} g''_{\mu \nu } = g'_{\mu \nu } - \frac{g'_{t_0 \mu }g'_{t_0 \nu }}{g'_{t_0 t_0}} \, . \end{aligned}$$
The match, maximized over amplitude, phase and time offset, is equal to the fitting factor
$$\begin{aligned} \mathrm{FF} = 1 - \frac{1}{2} g''_{\mu \nu } \varDelta \lambda ^\mu \varDelta \lambda ^\nu \, , \end{aligned}$$
which defines the fraction of the signal-to-noise ratio of the signal \(\mathbf{h}(\varvec{\lambda })\) that can be captured by the template \(\mathbf{h}(\varvec{\lambda }+\varDelta \varvec{\lambda })\). Since the signal-to-noise ratio scales inversely with distance, and since the volume of space grows as the cube of the distance, the fraction of detectable events captured by the grid search scales as \(\mathrm{FF}^3\). Demanding that at least 90% of events are detected sets a threshold of \(\mathrm{FF} \sim 0.97\). Placing the templates on a hyper-cubic lattice that guarantees a match of at least \(\mathrm{FF}\) between any signal and the nearest template yields cells with volume  [41, 42]
$$\begin{aligned} \varDelta V = 2^{d} \left( \frac{(1-\mathrm{FF})}{d}\right) ^{d/2} \end{aligned}$$
where \(d=\mathrm{dim}(\varvec{\kappa '})= \mathrm{dim}(\varvec{\lambda })-3=D-3\). The total number of templates required is equal to the total parameter volume \(V= \int \sqrt{ g''} \, d^d \kappa \) divided by the cell size \(\varDelta V\).
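The cost of a template bank can be estimated directly from this cell volume. The sketch below is my own, with a purely illustrative metric volume (a real bank would compute \(V= \int \sqrt{ g''} \, d^d \kappa \) from the template metric):

```python
import numpy as np

# Template count for a hyper-cubic lattice:
# N = V / Delta V, with Delta V = 2^d ((1 - FF)/d)^(d/2).
def n_templates(metric_volume, d, ff=0.97):
    cell = 2.0 ** d * ((1.0 - ff) / d) ** (d / 2.0)
    return metric_volume / cell

# Illustrative: fixed metric volume, increasing search dimension
for d in (2, 4, 8):
    print(d, n_templates(1.0, d))
```

Even at fixed metric volume the count grows steeply with dimension \(d\), which is why the algebraic maximizations over \(\rho \), \(\phi _0\) and \(t_0\) matter so much: every parameter removed from \(\varvec{\kappa }'\) shrinks the bank enormously.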

7.5 Bayesian Parameter Estimation

We have seen that Bayesian inference can be used to compute posterior distributions for the gravitational waveforms \(\mathbf{h}( \varvec{\theta })\) and the parameters \(\varvec{\theta }\) that describe the signal model, and additionally the model evidence. Bayes’ theorem tells us that once the signal model and likelihood are defined and the prior distributions specified, the calculation of the posterior distributions and evidence comes down to computing a challenging multi-dimensional integral. It is only in the last two decades that efficient computational techniques, coupled with an increase in micro-processor speed, have made it possible to carry out the necessary computations for real-world applications. Bayesian inference is now rapidly supplanting classical (frequentist) statistics in many branches of science, including gravitational wave astronomy. There are two main approaches used to carry out the Bayesian computation. The first is the Markov Chain Monte Carlo (MCMC) approach  [7, 24], which has its roots in statistical mechanics, and the second is Nested Sampling [52], which uses a stochastic Lebesgue integration technique. The MCMC approach produces samples from the posterior distribution without directly evaluating the evidence integral, while Nested Sampling computes the evidence without directly sampling the posterior distribution. With a little extra work the MCMC approach can be used to compute the evidence, and the posterior distributions can be recovered as a by-product of the Nested Sampling approach, so both methods provide a comprehensive framework in which to carry out Bayesian inference. In my own research I exclusively use the MCMC approach as I find it to be better suited to the kinds of models I work with, which are generally of the trans-dimensional variety. Trans-dimensional modeling expands the usual sampling of model parameters to sampling across models in a large model space. For these reasons, I will focus on the MCMC approach here.

A Markov process is a stochastic process where the next state depends only on the current state. A Markov process is uniquely defined by the transition probability \(p(\mathbf{y} | \mathbf{x})\) from state \(\mathbf{x}\) to state \(\mathbf{y}\), and is characterized by a unique stationary distribution \(\pi (\mathbf{x})\) if the transitions are reversible and satisfy detailed balance \(p(\mathbf{x}, \mathbf{y} ) = p(\mathbf{y} | \mathbf{x}) \pi (\mathbf{x}) = p(\mathbf{x} | \mathbf{y}) \pi (\mathbf{y})\), and are additionally aperiodic and positive recurrent (so that the return time to a given state is finite). The transition probability can be factored into the product of a proposal distribution \(q(\mathbf{y} | \mathbf{x})\) and an acceptance probability \(H(\mathbf{y} | \mathbf{x})\):
$$\begin{aligned} p(\mathbf{y} | \mathbf{x}) = q(\mathbf{y} | \mathbf{x}) H(\mathbf{y} | \mathbf{x}) \, . \end{aligned}$$
Substituting this expression into the detailed balance condition we have
$$\begin{aligned} \frac{H(\mathbf{y} | \mathbf{x})}{H(\mathbf{x} | \mathbf{y})} = \frac{ \pi (\mathbf{y}) q(\mathbf{x} | \mathbf{y})}{ \pi (\mathbf{x}) q(\mathbf{y} | \mathbf{x}) }\, . \end{aligned}$$
Metropolis and Hastings suggested the choice
$$\begin{aligned} H(\mathbf{y} | \mathbf{x})= \mathrm{min}\left( 1, \frac{ \pi (\mathbf{y}) q(\mathbf{x} | \mathbf{y})}{ \pi (\mathbf{x}) q(\mathbf{y} | \mathbf{x})} \right) \, , \end{aligned}$$
which automatically satisfies the detailed balance condition (233) since either \(H(\mathbf{y} | \mathbf{x})=1\) and \(H(\mathbf{x} | \mathbf{y}) = \pi (\mathbf{x}) q(\mathbf{y} | \mathbf{x})/(\pi (\mathbf{y}) q(\mathbf{x} | \mathbf{y}))\) or \(H(\mathbf{x} | \mathbf{y})=1\) and \(H(\mathbf{y} | \mathbf{x})= \pi (\mathbf{y}) q(\mathbf{x} | \mathbf{y})/(\pi (\mathbf{x}) q(\mathbf{y} | \mathbf{x}))\). In our application we want to use the Metropolis–Hastings algorithm to generate the posterior distribution so we set \(\pi (\mathbf{x}) = p(\mathbf{x} | \mathbf{d}, M)\) resulting in the acceptance probability for the state transition \(\mathbf{x} \rightarrow \mathbf{y}\):
$$\begin{aligned} H(\mathbf{y} | \mathbf{x})= \mathrm{min}\left( 1, \frac{ p(\mathbf{d} \vert \mathbf{y}, M) p(\mathbf{y} \vert M) q(\mathbf{x} | \mathbf{y})}{ p(\mathbf{d} \vert \mathbf{x}, M) p(\mathbf{x} \vert M) q(\mathbf{y} | \mathbf{x})} \right) \, . \end{aligned}$$
Note that the evidence \(p(\mathbf{d} \vert M)\) cancels in the Metropolis–Hastings (MH) ratio, so we only need the prior and likelihood, as illustrated in Fig. 42.
Fig. 42

How the sausage is made: the Metropolis–Hastings MCMC algorithm is a flexible approach for carrying out Bayesian inference

The MCMC algorithm proceeds by drawing some initial state \(\mathbf{x}_{1} \sim p(\mathbf{x} \vert M)\) from the prior, followed by the loop
  • propose a new state \(\mathbf{y} \sim q(\mathbf{y} | \mathbf{x}_i)\)

  • evaluate the MH ratio \(H(\mathbf{y} | \mathbf{x}_i)\)

  • draw a random deviate \(\alpha \sim U(0,1)\)

  • if \(H(\mathbf{y} | \mathbf{x}_i) > \alpha \) accept the new state, \(\mathbf{x}_{i+1}=\mathbf{y}\), otherwise \(\mathbf{x}_{i+1}=\mathbf{x}_i\)

  • increment \(i\rightarrow i+1\) and repeat
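The loop above translates almost line-for-line into code. Here is a minimal Python sketch of my own, targeting a toy one-dimensional Gaussian "posterior" with a symmetric proposal (for which the \(q\) factors cancel in the MH ratio):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target: a 1-d Gaussian posterior with mean 2 and unit variance
def log_post(x):
    return -0.5 * (x - 2.0) ** 2

n_steps = 50000
x = rng.normal()                     # initial state (near zero)
chain = np.empty(n_steps)
for i in range(n_steps):
    y = x + 0.8 * rng.standard_normal()     # symmetric proposal q(y|x)
    # For symmetric q the MH ratio reduces to a posterior ratio;
    # compare in log space to avoid under/overflow.
    if np.log(rng.uniform()) < log_post(y) - log_post(x):
        x = y                               # accept
    chain[i] = x                            # else keep the old state

burn = 1000                          # discard the burn-in phase
print(np.mean(chain[burn:]))         # ~ 2.0
print(np.var(chain[burn:]))          # ~ 1.0
```

After burn-in the chain's sample mean and variance recover the target's, illustrating that the accepted-and-repeated states together, not just the accepted jumps, constitute the posterior samples.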

Assuming the process has converged, the collection of samples \(\{ \mathbf{x}_1, \mathbf{x}_2, \ldots \}\) generated by this algorithm represent fair draws from the posterior distribution \(p(\mathbf{x} | \mathbf{d}, M)\). The posterior samples can be used to estimate credible intervals and other summary statistics. It is often necessary to discard some number of samples from the beginning of the chain since it can take many iterations before the chain locks onto the region of high posterior density (known as the burn-in phase). With efficient proposal distributions the burn-in phase can be kept very short. The number of iterations needed depends on several factors. One factor is the degree of correlation between successive samples. The MH procedure generates correlated samples, and the degree of correlation can be measured by computing, for example, the auto-correlation length for each parameter. The number of independent samples can be estimated by dividing the total number of samples by the auto-correlation length of the most highly correlated parameter. But then there is the question of how many independent samples are needed, to which the answer depends on what you want to compute, and to what accuracy. For example, it takes many more samples to estimate a \(95\%\) credible region to \(1\%\) relative error than it does to estimate a \(90\%\) credible region to \(10\%\) relative error. The cost also increases with dimension: for example, computing credible regions for the 2-d sky position of a source takes many more samples than computing the equivalent credible region for just the azimuthal angle.

The most important ingredient in a MCMC implementation is the proposal distribution. From (233) we see that the ideal proposal distribution would be the target distribution, \(q(\mathbf{x} | \mathbf{y}) = \pi (\mathbf{x})\), since then \(H(\mathbf{y} | \mathbf{x})= 1\) and every proposed jump would be accepted, and each sample would be independent. But if we knew the target distribution (in our case the posterior distribution), and how to draw from it, there would be no need to perform the MCMC! In lieu of using the posterior distribution, we can instead compute approximations to the posterior distribution and use those as proposal distributions. For example, we can approximate the posterior distribution in the neighborhood of a local maximum using multivariate normal distributions with covariance matrices given by the inverse of the Fisher information matrix, as was done in Eqs. (208) and (209). To do this we need to locate maxima of the likelihood, which can be done using the algebraically maximized log likelihood employed in the searches (see Sect. 7.4), and either a grid search, or more efficient maximization schemes such as random re-start hill climbers, particle swarms, or genetic algorithms. Finding maxima of the likelihood surface can be computationally challenging, especially when the model dimension is high and/or the likelihood is expensive to compute, making it necessary to reduce the parameter dimension by ignoring less important parameters, and by using approximations to the likelihood function that are less expensive to compute. These approximate maps of the likelihood surface make for good global proposal distributions that can help the MCMC explore all the modes of a multi-modal posterior distribution.

The Fisher matrix approximation (208) also serves as a good local proposal distribution [13] as it takes into account correlations between parameters. To draw from the multi-variate normal distribution (208) we first find the eigenvalues \(e_i\) and associated eigenvectors \(\mathbf{v}_i\) of the Fisher matrix, then propose jumps:
$$\begin{aligned} \mathbf{y} = \mathbf{x}_{i} + \frac{\beta }{\sqrt{e_j}}\, \mathbf{v}_j \, , \end{aligned}$$
where \(j \sim U[1,\mathrm{dim}(\mathbf{x})]\) and \(\beta \sim \mathcal{N}(0,1)\). The scaling by \(1/\sqrt{e_j}\) yields a \(68\%\) acceptance rate if the Fisher matrix provides a faithful description of the posterior distribution. In practice the acceptance rate is lower due to the approximation being imperfect.
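A minimal sketch of the eigen-directed Fisher jump, with an illustrative strongly correlated \(2\times 2\) Fisher matrix of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(5)

# Jump along a randomly chosen eigenvector of the Fisher matrix,
# scaled by 1/sqrt(eigenvalue), per the proposal above.
def fisher_jump(x, gamma, rng):
    evals, evecs = np.linalg.eigh(gamma)       # symmetric eigenproblem
    j = rng.integers(len(x))                   # j ~ U over directions
    beta = rng.standard_normal()               # beta ~ N(0, 1)
    return x + beta / np.sqrt(evals[j]) * evecs[:, j]

# Strongly correlated 2-d Fisher matrix (illustrative numbers):
gamma = np.array([[4.0, 3.5],
                  [3.5, 4.0]])
x = np.zeros(2)
y = fisher_jump(x, gamma, rng)
print(y)    # a jump along one of the two correlation eigen-directions
```

Jumping along eigenvectors, rather than coordinate axes, is what lets the proposal step along the narrow, correlated directions of the posterior without constant rejection.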
Another proposal distribution that is very effective at exploring correlated parameter spaces goes by the name differential evolution (DE) [6]. The procedure is very simple. First we collect some subset of \(N_h\) past samples from the Markov chain, called the history, \(\{ \mathbf{z} \}\), then propose a jump:
$$\begin{aligned} \mathbf{y} = \mathbf{x}_{i} + \gamma (\mathbf{z}_j - \mathbf{z}_k) \end{aligned}$$
where \(j ,k \sim U[1,N_h]\) and \(\gamma \sim \mathcal{N}(0,1.68/\sqrt{\mathrm{dim}(\mathbf{x})})\). Here the scaling of the jumps, \(\gamma \), is optimal for posteriors that follow a multi-variate normal distribution. The idea behind DE is that the vector \(\mathbf{z}_j - \mathbf{z}_k\) connecting past samples provides a good guess for the separation of future samples. In many applications DE has proven to be an incredibly effective proposal distribution, especially in situations where there are strong correlations between parameters. Some care has to be taken when employing DE as the use of past samples means that it is not strictly Markovian. The procedure can be shown to be asymptotically Markovian, meaning that if iterated long enough the samples will approach the stationary distribution. In practice this means having a sufficient number of independent posterior samples in the history. I typically keep \(10^3\) samples in a rolling history file by adding every 100th sample from the chain, and discarding the oldest sample from the history.
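The DE jump is only a few lines of code. The sketch below is my own, using a synthetic history drawn from an illustrative correlated 2-d Gaussian in place of actual past chain samples:

```python
import numpy as np

rng = np.random.default_rng(11)

# Differential evolution: jump along the difference of two randomly
# chosen history samples, with the 1.68/sqrt(dim) scaling from the text.
def de_jump(x, history, rng):
    j, k = rng.choice(len(history), size=2, replace=False)
    gamma = rng.normal(0.0, 1.68 / np.sqrt(len(x)))
    return x + gamma * (history[j] - history[k])

# Stand-in history: samples from a strongly correlated 2-d posterior
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
history = rng.multivariate_normal(np.zeros(2), cov, size=1000)

x = np.zeros(2)
y = de_jump(x, history, rng)
print(y)    # proposed jump, preferentially along the correlated direction
```

Because the history samples already trace the posterior's correlations, the difference vectors automatically point along the directions the chain needs to move, with no Fisher matrix required.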
Fig. 43

The standard MCMC recipe used by the Montana gravitational wave astronomy group

The final ingredient in my standard MCMC recipe, shown in Fig. 43, is to run multiple chains in parallel, and to allow exchanges between the chains. This procedure is variously called Parallel Tempering or Replica Exchange. The term tempering is taken from simulated tempering (also called simulated annealing), wherein the likelihood is flattened by raising it to a fractional power \(\beta \), known as the inverse temperature. The terminology is borrowed from statistical mechanics and metallurgy, with the log likelihood playing the role of the energy. Each chain explores the annealed posterior distribution
$$\begin{aligned} p( \varvec{\theta } \vert \mathbf{d}, \mathcal{H})_{\beta _i} = \frac{ p(\mathbf{d} \vert \mathcal{H}, \varvec{\theta })^{\beta _i} p(\varvec{\theta } \vert \mathcal{H}) }{p(\mathbf{d} \vert \mathcal{H})_{\beta _i}} \, . \end{aligned}$$
Only samples from the \(\beta _1=1\) “cold” chain can be used to define valid credible intervals etc, but the other chains serve an important purpose through parameter exchanges, and as a means to compute the model evidence. Periodically, swaps are proposed between the current state \(\mathbf{x}_i\) of the \(\beta _i\) chain and the current state \(\mathbf{x}_j\) of the \(\beta _j\) chain, and the swaps are accepted with probability
$$\begin{aligned} H_{ij} = \mathrm{min}\left( 1, \frac{ p(\mathbf{d} \vert \mathcal{H}, \mathbf{x}_j )^{\beta _i} p(\mathbf{d} \vert \mathcal{H}, \mathbf{x}_i )^{\beta _j} }{p(\mathbf{d} \vert \mathcal{H}, \mathbf{x}_i )^{\beta _i} p(\mathbf{d} \vert \mathcal{H}, \mathbf{x}_j)^{\beta _j}} \right) \, . \end{aligned}$$
Note that only the likelihoods appear in the exchange probability. Parallel tempering is very effective at exploring multi-modal posteriors, as the hot chains explore a much flatter likelihood landscape that allows for free movement between local maxima, while the cold chains tend to lock onto high probability solutions and serve as “memory” for the ensemble (Fig. 44).
Fig. 44

Parallel tempering employs multiple chains on an inverse temperature ladder. Chains with higher temperatures (smaller \(\beta \)’s) see a flatter likelihood surface (left panel) and are able to move more freely between local maxima. Exchanges between the chains (right panel) allow for the information to be shared and for good solutions to filter down to the cold chain, where the posterior samples are stored

The spacing of the temperature ladder and the temperature range covered must be carefully chosen. A good rule of thumb is that the hottest chain should have \(\beta \approx 1/\mathrm{SNR}^2\). The reasoning is that the \(\beta \) factor re-scales the noise weighted inner product such that the effective signal-to-noise is \(\mathrm{SNR}^2_\beta = \beta \, \mathrm{SNR}^2\), and we want \(\mathrm{SNR}^2_\beta \approx 1\) for the hottest chain, rendering the annealed likelihood sufficiently flat that the hot chain explores the full prior volume. The spacing of the chains has to be chosen such that chain exchanges are often accepted. If the chains are too widely spaced the inter-chain exchange probability gets very small, and the chains stop communicating. Conversely, if the chains are spaced too closely it takes a prohibitively large number of chains to cover the necessary temperature range. If the likelihood is well approximated by a multi-variate normal distribution it can be shown that the optimal spacing is geometric: \(\beta _{i+1} = c \, \beta _i\) for some constant c. For the more complicated likelihood surfaces encountered in real-world analyses it is often necessary to use adaptive schemes to find the optimal placement of the temperature ladder. For simple examples a geometric spacing with \(c=0.8\) is usually a good choice, which for a \(\mathrm{SNR}=20\) signal requires \(N_c = -2 \log (\mathrm{SNR})/\log c \approx 27\) chains for full coverage.
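The rule-of-thumb ladder can be computed in a couple of lines (a sketch of my own, following the geometric spacing and \(\beta _\mathrm{min} \approx 1/\mathrm{SNR}^2\) rule above):

```python
import numpy as np

# Geometric inverse-temperature ladder beta_{i+1} = c * beta_i,
# extending from beta = 1 down to beta ~ 1/SNR^2 for the hottest chain.
def temperature_ladder(snr, c=0.8):
    n = int(np.ceil(-2.0 * np.log(snr) / np.log(c)))
    return c ** np.arange(n + 1)     # beta = 1, c, c^2, ..., ~1/SNR^2

betas = temperature_ladder(20.0)
print(len(betas))        # 28 rungs: beta = 1 plus 27 tempered chains
print(betas[-1])         # ~ 1/400 = 1/SNR^2
```

For \(\mathrm{SNR}=20\) and \(c=0.8\) this reproduces the chain count quoted in the text, with the hottest chain sitting just below \(\beta = 1/\mathrm{SNR}^2\).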

A very useful by-product of parallel tempering is that it allows us to compute the model evidence \(p(\mathbf{d} \vert \mathcal{H})\). The calculation is modeled after the calculation of the partition function in statistical mechanics:
$$\begin{aligned} \log p(\mathbf{d} \vert \mathcal{H}) = \int _{0}^{1} \mathrm{E}[\log p(\mathbf{d} \vert \mathcal{H}, \varvec{\theta })]_{\beta }\, d\beta \end{aligned}$$
where the expectation value is computed with respect to the annealed posterior distribution \(p( \varvec{\theta } \vert \mathbf{d}, \mathcal{H})_{\beta }\). In practice we can approximate the above integral by the sum over the average log likelihood at each inverse temperature \(\beta _i\), multiplied by the temperature spacing \(\varDelta \beta = \beta _{i+1}-\beta _i\).
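In code, this thermodynamic-integration estimate is a simple quadrature over the ladder. A sketch (the trapezoid rule is my choice; `betas` and `mean_log_like` would come from the parallel tempered runs):

```python
import numpy as np

def log_evidence(betas, mean_log_like):
    """Approximate log p(d|H) = int_0^1 E_beta[log L] dbeta by the
    trapezoid rule over the inverse-temperature ladder."""
    b = np.asarray(betas, dtype=float)
    y = np.asarray(mean_log_like, dtype=float)
    order = np.argsort(b)       # integrate from smallest to largest beta
    b, y = b[order], y[order]
    return 0.5 * np.sum(np.diff(b) * (y[1:] + y[:-1]))
```

Note that the integral runs from \(\beta = 0\) to \(\beta = 1\), so the ladder should either be extended toward \(\beta = 0\) or the truncation error accounted for; here the hottest rung simply stands in for the lower endpoint.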

7.6 Worked Example—Sinusoidal Signal

As a simple example, consider the signal from a slowly evolving binary system observed by a single LIGO-type detector. To keep the analysis tractable we will assume that the observation time \(T_\mathrm{obs}\) is large compared to the orbital period \(T=2/f_0\), but small compared to the chirp timescale \(\tau = f/\dot{f}\). We will further assume that the observation time is short compared to the timescale over which the detector moves, so we can treat the detector as static. In other words, we are considering multiple cycles of a monochromatic signal with gravitational wave frequency \(f_0\). Combining Eqs. (110) and (159) yields the detector response
$$\begin{aligned} h(t)= & {} \frac{2 \mathcal{M}^{5/3} \omega ^{2/3} }{R} \left( F_+(\varvec{\varOmega }, \psi ) (1+\cos ^2 \iota ) \cos (2 \omega (t-t_0) +2 \varphi _0) \right. \nonumber \\&\left. +\, F_\times (\varvec{\varOmega }, \psi ) (2\cos \iota ) \sin (2 \omega (t-t_0)+2\varphi _0) \right) \nonumber \\= & {} A_0 \cos (2 \pi f_0 (t-t_0)+\phi _0) \end{aligned}$$
where \(\omega = \pi f_0\),
$$\begin{aligned} A_0= \frac{2 \mathcal{M}^{5/3} \omega ^{2/3} }{R} \sqrt{F^2_+(\varvec{\varOmega }, \psi ) (1+\cos ^2 \iota )^2 + 4 F^2_\times (\varvec{\varOmega }, \psi ) \cos ^2 \iota }\, , \end{aligned}$$
$$\begin{aligned} \phi _0 = \mathrm{arctan}\left( \frac{F_\times (\varvec{\varOmega }, \psi ) (2\cos \iota )}{F_+(\varvec{\varOmega }, \psi ) (1+\cos ^2 \iota )}\right) +2\varphi _0\, . \end{aligned}$$
With a single detector we are unable to separately measure the sky location \(\varvec{\varOmega }\), distance R, inclination and polarization angles \(\iota ,\psi \), and initial orbital phase \(\varphi _0\); we can only measure the combinations of these parameters that fix the overall amplitude \(A_0\) and gravitational wave phase \(\phi _0\) seen in the detector. And absent a measurable chirp timescale \(\tau \), we are unable to measure the chirp mass \(\mathcal{M}\).
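The mapping from the extrinsic parameters to the observable amplitude and phase can be packaged as follows (a sketch; `amp` stands in for the \(2 \mathcal{M}^{5/3} \omega ^{2/3}/R\) prefactor, and `atan2` is used in place of arctan to get the quadrant right):

```python
import math

def detector_amplitude_phase(f_plus, f_cross, iota, varphi0, amp=1.0):
    """Observable amplitude A0 and phase phi0 of the monochromatic
    detector response, from the antenna patterns, inclination, and
    initial orbital phase."""
    a_plus = f_plus * (1.0 + math.cos(iota)**2)  # plus-polarization weight
    a_cross = f_cross * 2.0 * math.cos(iota)     # cross-polarization weight
    a0 = amp * math.hypot(a_plus, a_cross)
    phi0 = math.atan2(a_cross, a_plus) + 2.0 * varphi0
    return a0, phi0
```

For a face-on binary (\(\iota = 0\)) with \(F_+ = F_\times = 0.5\) this returns \(A_0 = \sqrt{2}\,\mathrm{amp}\) and \(\phi_0 = \pi/4 + 2\varphi_0\); many different \((\varvec{\varOmega }, R, \iota , \psi , \varphi _0)\) combinations map to the same \((A_0, \phi_0)\) pair, which is the degeneracy described above.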
Figure 45 shows simulated data generated with white Gaussian noise co-added to a signal with amplitude \(A_0=1\), frequency \(f_0=1\) and initial phase \(\phi _0 = \pi \). The variance of the noise was adjusted to yield a matched-filter signal-to-noise ratio of \(\mathrm{SNR}=\sqrt{50} = 7.07\). While the signal looks to be buried in the noise when viewed in the time domain, it is readily apparent in the Fourier power spectrum.
Fig. 45

Simulated data with a \(\mathrm{SNR}=7.07\) sinusoidal signal embedded in white noise. On the left is the data in the time domain, and on the right is the power spectrum in the Fourier domain, where the signal is apparent as a spike at \(f=1\) Hz

Because the signal is monochromatic we can use Parseval’s theorem to compute the SNR directly in the time domain:
$$\begin{aligned} \mathrm{SNR^2} = (\mathbf{h}| \mathbf{h}) = 4 \int \frac{\tilde{h}(f) \tilde{h}^*(f)}{S_n(f)} \, df = \frac{2}{S_n(f_0)} \int _{0}^{T_\mathrm{obs}} h^2(t) dt = \frac{A_0^2}{S_n(f_0)} \, T_\mathrm{obs} \, . \end{aligned}$$
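This identity is easy to verify numerically. A sketch with illustrative values chosen to match the figure (\(A_0=1\), \(f_0=1\); setting \(S_n=1\) and \(T_\mathrm{obs}=50\) gives \(\mathrm{SNR}=\sqrt{50}\approx 7.07\)):

```python
import numpy as np

a0, f0, t_obs, sn = 1.0, 1.0, 50.0, 1.0
n_samp = 100000
dt = t_obs / n_samp
t = np.arange(n_samp) * dt
h = a0 * np.cos(2.0 * np.pi * f0 * t)

# time-domain inner product: SNR^2 = (2/S_n) * integral of h^2 dt
snr_td = np.sqrt(2.0 / sn * np.sum(h**2) * dt)

# closed-form result: SNR^2 = A0^2 T_obs / S_n
snr_analytic = np.sqrt(a0**2 * t_obs / sn)
```

The two agree because the average of \(\cos^2\) over many cycles is \(1/2\), so \(\int_0^{T_\mathrm{obs}} h^2 \, dt = A_0^2 T_\mathrm{obs}/2\).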
The elements of the Fisher information matrix can be evaluated in the same way as the SNR. Using the parameter set \((A_0, f_0, \phi _0, t_0)\) we find
$$\begin{aligned} \varvec{\varGamma }= \mathrm{SNR}^2 \begin{pmatrix} \frac{1}{A_0^{2}} &{} \frac{1}{2 f_0 A_0} &{} 0 &{} 0 \\ \frac{1}{2 f_0 A_0} &{} \frac{4 \pi ^2 T_\mathrm{obs}^2}{3} &{} \pi T_\mathrm{obs} &{} 2 \pi ^2 f_0 T_\mathrm{obs} \\ 0 &{} \pi T_\mathrm{obs} &{} 1 &{} -2 \pi f_0 \\ 0 &{} 2 \pi ^2 f_0 T_\mathrm{obs} &{} -2 \pi f_0 &{} 4 \pi ^2 f_0^2 \end{pmatrix} \end{aligned}$$
Before attempting to estimate the parameter correlation matrix by inverting the Fisher matrix, it is instructive to look at the correlation matrix \(\gamma _{ij} = \varGamma _{ij}/\sqrt{\varGamma _{ii} \varGamma _{jj} }\):
$$\begin{aligned} \varvec{\gamma }= \begin{pmatrix} 1 &{} \frac{\sqrt{3}}{4\pi f_0 T_\mathrm{obs}} &{} 0 &{} 0 \\ \frac{\sqrt{3}}{4\pi f_0 T_\mathrm{obs}} &{} 1 &{} \frac{\sqrt{3}}{2} &{} \frac{\sqrt{3}}{2} \\ 0 &{} \frac{\sqrt{3}}{2} &{} 1 &{} -1 \\ 0 &{} \frac{\sqrt{3}}{2} &{} -1 &{} 1 \end{pmatrix} \end{aligned}$$
A problem immediately becomes evident: the initial phase \(\phi _0\) and initial time \(t_0\) are fully anti-correlated and the \(\phi _0,t_0\) sub-matrix is singular, rendering the full Fisher matrix singular. Physically the degeneracy corresponds to keeping the combination \(\phi _0 - 2\pi f_0 t_0\) constant in the gravitational wave phase. We can avoid the degeneracy by eliminating one of the redundant parameters, in this case \(t_0\). The reduced Fisher matrix is then
$$\begin{aligned} \varvec{\varGamma }= \mathrm{SNR}^2 \begin{pmatrix} \frac{1}{A_0^2} &{} -\frac{1}{2 f_0 A_0} &{} 0 \\ -\frac{1}{2 f_0 A_0} &{} \frac{4 \pi ^2 T_\mathrm{obs}^2}{3} &{} \pi T_\mathrm{obs} \\ 0 &{} \pi T_\mathrm{obs} &{} 1 \end{pmatrix} \end{aligned}$$
with inverse
$$\begin{aligned} \varvec{\varGamma }^{-1} \approx \frac{1}{\mathrm{SNR}^2} \begin{pmatrix} A_{0}^2 &{} 0 &{} 0 \\ 0 &{} \frac{3}{ \pi ^2 T_\mathrm{obs}^2} &{} -\frac{3}{\pi T_\mathrm{obs}} \\ 0 &{} -\frac{3}{\pi T_\mathrm{obs}} &{} 4 \end{pmatrix} \end{aligned}$$
where we have used the fact that \(f_0 T_\mathrm{obs} \gg 1\) to simplify the final expression. The diagonal elements of \(\varvec{\varGamma }^{-1}\) yield estimates for the 1-sigma parameter uncertainties:
$$\begin{aligned} \sigma _{A_0}= & {} \frac{A_0}{\mathrm{SNR}} \nonumber \\ \sigma _{f_0}= & {} \frac{\sqrt{3}}{\mathrm{SNR} \, \pi T_\mathrm{obs}} \nonumber \\ \sigma _{\phi _0}= & {} \frac{2}{\mathrm{SNR}} \, . \end{aligned}$$
Note that the parameter uncertainties all scale inversely with the SNR, which grows as the square root of the observation time. The error in the frequency decreases even more quickly with time. In general, quantities that impact the evolution of the phase are better constrained than those that impact the amplitude. The off-diagonal elements in \(\varvec{\varGamma }^{-1}\) tell us about the correlations in the parameter uncertainties.
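These Fisher estimates are simple to reproduce numerically by inverting the reduced matrix directly (a sketch; the function name is my own):

```python
import numpy as np

def fisher_sigmas(snr, a0, f0, t_obs):
    """1-sigma uncertainties on (A0, f0, phi0) from the diagonal of the
    inverse of the reduced Fisher matrix for a monochromatic signal."""
    gamma = snr**2 * np.array([
        [1.0 / a0**2,            -1.0 / (2.0 * f0 * a0),           0.0],
        [-1.0 / (2.0 * f0 * a0),  4.0 * np.pi**2 * t_obs**2 / 3.0, np.pi * t_obs],
        [0.0,                     np.pi * t_obs,                   1.0],
    ])
    cov = np.linalg.inv(gamma)   # covariance matrix estimate
    return np.sqrt(np.diag(cov))

sig = fisher_sigmas(snr=np.sqrt(50.0), a0=1.0, f0=1.0, t_obs=50.0)
```

Since \(f_0 T_\mathrm{obs} \gg 1\), the numerical result agrees with the closed forms \(A_0/\mathrm{SNR}\), \(\sqrt{3}/(\mathrm{SNR}\,\pi T_\mathrm{obs})\) and \(2/\mathrm{SNR}\) to high accuracy.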

Signal Search

The template metric (227) defines the line element
$$\begin{aligned} ds^2 = g_{\mu \nu } d\lambda ^\mu d\lambda ^\nu = \frac{4 \pi ^2 T_\mathrm{obs}^2}{3} df_0^2 +2 \pi T_\mathrm{obs} df_0 d\phi _0 + d\phi _0^2 \end{aligned}$$
Maximizing with respect to the overall phase \(\phi _0\) using the procedure described in Sect. 7.3 reduces the search to just one parameter—the gravitational wave frequency \(f_0\). The template spacing is then given by the line element
$$\begin{aligned} ds^2 = g'_{\mu \nu } d\lambda ^\mu d\lambda ^\nu = \frac{ \pi ^2 T_\mathrm{obs}^2}{3} df_0^2 \, . \end{aligned}$$
The grid spacing to achieve a fitting factor \(\mathrm{FF}\) is then
$$\begin{aligned} \varDelta f_0 = \frac{2 \sqrt{3}}{ \pi T_\mathrm{obs}} \sqrt{1-\mathrm{FF}} \, . \end{aligned}$$
For example, setting \(\mathrm{FF} = 0.97\) yields a spacing of \(\varDelta f_0 \approx 0.2 /T_\mathrm{obs}\). If we want the search to cover signals that complete between 50 and 250 oscillations during the observation time, then the parameter volume is
$$\begin{aligned} V_{f_0}= \int _{50/T_\mathrm{obs}}^{250/T_\mathrm{obs}} \sqrt{g'} \, df_0 = \frac{200 \, \pi }{\sqrt{3}} \, . \end{aligned}$$
With \(d=1\) and \(\mathrm{FF} = 0.97\) the cell size is \(\varDelta V_{f_0} = 0.35\), and the number of templates in the bank is \(N= V_{f_0}/\varDelta V_{f_0} \approx 1000\). In this simple case with a one dimensional grid we can also compute the number of templates as the search range \(200/T_\mathrm{obs}\) divided by the template spacing \(\varDelta f_0\).
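The template count follows directly from these formulas. A minimal sketch (the function name is my own):

```python
import math

def template_grid(n_lo, n_hi, t_obs, ff=0.97):
    """Frequency spacing and template count for a 1-d grid in f0
    covering signals with between n_lo and n_hi cycles over T_obs."""
    # spacing for fitting factor FF: Delta f0 = 2 sqrt(3(1-FF)) / (pi T_obs)
    df0 = 2.0 * math.sqrt(3.0 * (1.0 - ff)) / (math.pi * t_obs)
    n_templates = math.ceil((n_hi - n_lo) / t_obs / df0)
    return df0, n_templates

df0, n = template_grid(50, 250, t_obs=1.0)
# df0 ~ 0.2 / T_obs and roughly a thousand templates, as in the text
```

The count is independent of \(T_\mathrm{obs}\): both the search range and the template spacing scale as \(1/T_\mathrm{obs}\).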
Fig. 46

Probability distributions for the \(\varLambda _\mathrm{max}\)-statistic in pure Gaussian noise, and for data containing a sinusoidal signal with \(\mathrm{SNR}=7\)

Fig. 47

The output of the grid search over \(f_0\) for data containing a sinusoidal signal with \(\mathrm{SNR}=7\). The panel on the left shows the matched filter statistic while the panel on the right shows the match between the best fit template at each frequency and the injected signal. The peak in the matched filter statistic exceeds the 1% false alarm probability detection threshold (shown as a dashed line)

The probability distribution for the search statistic, \(\varLambda _\mathrm{max}\), maximized over \(A_0,\phi _0\) and \(f_0\), was determined empirically by repeating the search using \(10^4\) simulated noise realizations. The distributions are displayed in Fig. 46 for pure noise, and for noise co-added to a signal with \(\mathrm{SNR}=7\). Setting a \(1\%\) false alarm rate yields a detection threshold of \(\varLambda _\mathrm{max} = 22\), and a false dismissal probability of \(0.5\%\) for signals with \(\mathrm{SNR} = 7\).
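The empirical background estimate can be sketched in a few lines, here using the loudest periodogram bin of white Gaussian noise as a stand-in for \(\varLambda _\mathrm{max}\) (my own simplified statistic; the normalization makes each frequency bin \(\chi ^2_2\)-distributed in pure noise):

```python
import numpy as np

def empirical_threshold(n_samples=1000, n_trials=2000, fap=0.01, seed=0):
    """Monte Carlo over noise-only realizations: record the loudest
    periodogram bin in each, then read off the detection threshold at
    the requested false alarm probability."""
    rng = np.random.default_rng(seed)
    maxima = np.empty(n_trials)
    for k in range(n_trials):
        noise = rng.standard_normal(n_samples)
        power = 2.0 * np.abs(np.fft.rfft(noise))**2 / n_samples
        maxima[k] = power[1:-1].max()   # skip the DC and Nyquist bins
    return np.quantile(maxima, 1.0 - fap)
```

With these settings the \(1\%\) threshold lands in the low twenties, of the same order as the \(\varLambda _\mathrm{max} = 22\) quoted above, since maximizing over \({\sim}500\) independent \(\chi^2_2\) bins gives a threshold near \(2\ln(500/0.01)\).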

The output of the grid search over \(f_0\) for the simulated data set is shown in Fig. 47. The maximum likelihood template had \(f_0=0.999\), \(A_0=0.898\) and \(\phi _0=2.75\). The match between this template and the injected signal was \(\mathrm{M}=0.98\). The false alarm probability is too low to be reliably estimated from the empirically derived probability distribution for the noise hypothesis shown in Fig. 46, but the significance is greater than \(3\sigma \) (Gaussian equivalent, \(p_\mathrm{FA} < 0.3\%\)).

Parameter Estimation

The posterior distribution for the source parameters can be derived using the MCMC recipe described in Sect. 7.5 and illustrated in Fig. 43. Uniform priors were assumed for all parameters with ranges \(A_0 \in [0,10]\), \(f_0 \in [0.5,2.5]\) and \(\phi _0 \in [0,2\pi ]\). A mixture of proposal distributions was used, made up of a global proposal, a multi-variate normal proposal built from the Fisher matrix of Eq. (247), and a differential evolution proposal. Parallel tempering was employed with 30 chains geometrically spaced by a factor of \(c=0.87\). The global proposal was constructed by normalizing the output of the matched filter search, \(\varLambda _\mathrm{max}(f_0)\), shown in Fig. 47, and drawing \(f_0\) from this distribution, while simultaneously drawing \(A_0\) and \(\phi _0\) from their prior distributions. Note that we could have drawn \(A_0\) and \(\phi _0\) from some distribution centered on their maximum likelihood values for each \(f_0\), but uniform draws were sufficient for this simple example.
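A minimal parallel-tempered Metropolis sampler captures the essential machinery (a toy sketch with Gaussian jumps and neighbour swaps only, not the full proposal mixture described above; the bimodal one-dimensional likelihood is purely illustrative):

```python
import math
import random

def pt_mcmc(log_like, x_start, betas, n_iter=30000, step=0.5, seed=1):
    """One chain per inverse temperature: each takes a Metropolis step on
    the annealed likelihood, then a random neighbouring pair attempts a
    swap with the likelihood-only exchange probability from the text."""
    rng = random.Random(seed)
    x = [x_start] * len(betas)
    ll = [log_like(x_start)] * len(betas)
    cold = []
    for _ in range(n_iter):
        for i, b in enumerate(betas):
            y = x[i] + rng.gauss(0.0, step / math.sqrt(b))  # hotter -> bigger jumps
            ly = log_like(y)
            if math.log(rng.random()) < b * (ly - ll[i]):
                x[i], ll[i] = y, ly
        i = rng.randrange(len(betas) - 1)
        if math.log(rng.random()) < (betas[i] - betas[i + 1]) * (ll[i + 1] - ll[i]):
            x[i], x[i + 1] = x[i + 1], x[i]
            ll[i], ll[i + 1] = ll[i + 1], ll[i]
        cold.append(x[0])   # posterior samples come from the beta = 1 chain
    return cold

def bimodal_ll(x):
    """Toy log likelihood with well-separated peaks at x = +/- 1.5."""
    a = -(x - 1.5)**2 / 0.32
    b = -(x + 1.5)**2 / 0.32
    m = max(a, b)                # log-sum-exp for numerical safety
    return m + math.log(math.exp(a - m) + math.exp(b - m))

samples = pt_mcmc(bimodal_ll, 1.5, [0.8**i for i in range(12)])
```

Although the cold chain starts in one mode and can rarely cross the likelihood barrier on its own, the hot chains cross freely and the swaps ferry both modes down to \(\beta = 1\), exactly the behaviour illustrated in Fig. 44.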
Fig. 48

The panel on the left displays a corner plot of the posterior distribution produced using the MCMC algorithm for the simulated data shown in Fig. 45 containing a \(\mathrm{SNR} = 7.07\) signal. The panel on the right shows the Fisher matrix approximation to the posterior distribution, centered on the maximum likelihood values for the parameters. Overall the agreement is good, but note that the 2-d probability contours for the MCMC derived posterior distributions are not perfectly elliptical

Figure 48 compares the posterior distribution derived by the MCMC algorithm to the predictions of the Fisher information matrix for the data shown in Fig. 45. The agreement is very good, but if you look closely you will see that the 2-d probability contours for the MCMC derived posterior distributions are not perfectly elliptical.
Fig. 49

The average log likelihood as a function of inverse temperature \(\beta \) for the signal model and the noise model. The area under these curves provides an estimate for the log of the model evidence. The area between the curves provides an estimate for the log Bayes factor

Since we are using parallel tempering it is also possible to compute the model evidence. On its own the evidence for the signal model is not very interesting. What we would like to do is to compute the Bayes factor between the signal model and the noise model. But our current noise model has no parameters, so its evidence is not defined. To remedy this we can treat the amplitude spectral density of the white noise, \(\sigma \), as a free parameter in the noise model. To be consistent, we also allow \(\sigma \) to vary in the signal model. Figure 49 shows the average log likelihood as a function of inverse temperature \(\beta \) from parallel tempered MCMC runs for the noise model and the signal model. The area under these curves provides an estimate for the log evidence via Eq. (240). The area between the curves provides an estimate for the log Bayes factor between the two models, which here gives \(\log B_{S/N} = 14.1\), showing strong evidence for a signal being present in the data.
Fig. 50

The full multi-modal posterior distribution when the amplitude range is extended to negative values. The combination of the global proposal and parallel tempering allows the MCMC algorithm to fully explore both modes

The posterior distributions for this idealized example were mono-modal and well approximated by a multi-variate normal distribution, so the full MCMC machinery we employed was not needed. We can however make the problem a little more challenging by widening the prior range on the amplitude to include negative values, \(A_0 \in [-10,10]\), which results in a multi-modal likelihood surface since solutions with parameters \((-A_0, \phi _0+\pi )\) produce identical likelihoods to those with parameters \((A_0,\phi _0)\). Figure 50 shows the posterior distributions for the noise and signal parameters when the amplitude is allowed to take negative values. The combination of the global proposal and parallel tempering allows the MCMC algorithm to fully explore both modes. Without parallel tempering or the global proposal the chain remains stuck on a single mode of the posterior.


References

  1. Abadie, J., et al.: A gravitational wave observatory operating beyond the quantum shot-noise limit: squeezed light in application. Nat. Phys. 7, 962–965 (2011). arXiv:1109.2295 [quant-ph]
  2. Abbott, B.P., et al.: Calibration of the advanced LIGO detectors for the discovery of the binary black-hole merger GW150914. Phys. Rev. D 95, 062003 (2017). arXiv:1602.03845 [gr-qc]
  3. Arzoumanian, Z., et al.: The NANOGrav 11-year data set: pulsar-timing constraints on the stochastic gravitational-wave background. Astrophys. J. 859, 47 (2018). arXiv:1801.02617 [astro-ph.HE]
  4. Audley, H., et al.: Laser interferometer space antenna (2017). arXiv:1702.00786 [astro-ph.IM]
  5. Backer, D.C., Kulkarni, S.R., Heiles, C., Davis, M.M., Goss, W.M.: A millisecond pulsar. Nature 300, 615–618 (1982)
  6. ter Braak, C.J.F.: A Markov Chain Monte Carlo version of the genetic algorithm differential evolution: easy Bayesian computing for real parameter spaces. Stat. Comput. 16, 239–249 (2006)
  7. Brooks, S., Gelman, A., Jones, G., Meng, X.: Handbook of Markov Chain Monte Carlo. CRC Press, Boca Raton (2011). ISBN:9781420079425
  8. Brügmann, B.: Fundamentals of numerical relativity for gravitational wave sources. Science 361, 366–371 (2018)
  9. Buonanno, A., Damour, T.: Effective one-body approach to general relativistic two-body dynamics. Phys. Rev. D 59, 084006 (1999). arXiv:gr-qc/9811091 [gr-qc]
  10. Buonanno, A., Sathyaprakash, B.S.: In: Ashtekar, A., Berger, B.K., Isenberg, J., MacCallum, M.E. (eds.) General Relativity and Gravitation: A Centennial Perspective, pp. 287–346. Cambridge University Press, Cambridge (2015)
  11. Carroll, S.: Spacetime and Geometry: An Introduction to General Relativity. Addison-Wesley, New York (2004). ISBN:9780805387322
  12. Cornish, N.J.: Alternative derivation of the response of interferometric gravitational wave detectors. Phys. Rev. D 80, 087101 (2009). arXiv:0910.4372 [gr-qc]
  13. Cornish, N.J., Crowder, J.: LISA data analysis using MCMC methods. Phys. Rev. D 72, 043005 (2005). arXiv:gr-qc/0506059 [gr-qc]
  14. Cornish, N.J., Littenberg, T.B.: BayesWave: Bayesian inference for gravitational wave bursts and instrument glitches. Class. Quantum Gravity 32, 135012 (2015). arXiv:1410.3835 [gr-qc]
  15. Cornish, N.J., Romano, J.D.: Towards a unified treatment of gravitational-wave data analysis. Phys. Rev. D 87, 122003 (2013). arXiv:1305.2934 [gr-qc]
  16. Creighton, J., Anderson, W.: Gravitational-Wave Physics and Astronomy: An Introduction to Theory, Experiment and Data Analysis. Wiley, New York (2012). ISBN:9783527636044
  17. Detweiler, S.: Pulsar timing measurements and the search for gravitational waves. Astrophys. J. 234, 1100–1104 (1979)
  18. Einstein, A.: Über das Relativitätsprinzip und die aus demselben gezogenen Folgerungen. (German) [On the relativity principle and the conclusions drawn from it]. Jahrbuch der Radioaktivität und Elektronik 4, 411–462 (1908)
  19. Einstein, A.: Die Feldgleichungen der Gravitation. Sitzungsberichte der Königlich Preußischen Akademie der Wissenschaften (Berlin), pp. 844–847 (1915)
  20. Einstein, A., Infeld, L., Hoffmann, B.: The gravitational equations and the problem of motion. Ann. Math. (2) 39, 65–100 (1938)
  21. Einstein, A., Infeld, L., Hoffmann, B.: The gravitational equations and the problem of motion. II. Ann. Math. (2) 41, 455–564 (1940)
  22. Estabrook, F.B., Wahlquist, H.D.: Response of Doppler spacecraft tracking to gravitational radiation. Gen. Relativ. Gravit. 6, 439–447 (1975)
  23. Gelman, A., et al.: Bayesian Data Analysis. CRC Press, Boca Raton (2013). ISBN:9781439898208
  24. Gilks, W., Richardson, S., Spiegelhalter, D.: Markov Chain Monte Carlo in Practice. Taylor & Francis, New York (1995). ISBN:9780412055515
  25. Gralla, S.E., Wald, R.M.: A rigorous derivation of gravitational self-force. Class. Quantum Gravity 25, 205009 (2008) [Erratum: Class. Quantum Gravity 28, 159501 (2011)]. arXiv:0806.3293 [gr-qc]
  26. Hellings, R.W., Downs, G.S.: Upper limits on the isotropic gravitational radiation background from pulsar timing analysis. Astrophys. J. 265, L39–L42 (1983)
  27. Hewitson, M.: LISA science study team, LISA science requirements document, Issue 1.0 (2018)
  28. Hobbs, G., Edwards, R., Manchester, R.: Tempo2, a new pulsar timing package. 1. Overview. Mon. Not. R. Astron. Soc. 369, 655–672 (2006). arXiv:astro-ph/0603381 [astro-ph]
  29. Isaacson, R.A.: Gravitational radiation in the limit of high frequency. II. Nonlinear terms and the effective stress tensor. Phys. Rev. 166, 1272–1279 (1968b)
  30. Isaacson, R.A.: Gravitational radiation in the limit of high frequency. I. The linear approximation and geometrical optics. Phys. Rev. 166, 1263–1271 (1968a)
  31. Izumi, K., Sigg, D.: Advanced LIGO: length sensing and control in a dual recycled interferometric gravitational wave antenna. Class. Quantum Gravity 34, 015001 (2017)
  32. Jackson, J.: Classical Electrodynamics. Wiley, New York (1975)
  33. Kawamura, S., et al.: The Japanese space gravitational wave antenna: DECIGO. Class. Quantum Gravity 28, 094011 (2011)
  34. Kelley, L.Z., Blecha, L., Hernquist, L., Sesana, A., Taylor, S.R.: The gravitational wave background from massive black hole binaries in Illustris: spectral features and time to detection with pulsar timing arrays. Mon. Not. R. Astron. Soc. 471, 4508–4526 (2017). arXiv:1702.02180 [astro-ph.HE]
  35. Lehner, L., Pretorius, F.: Numerical relativity and astrophysics. Annu. Rev. Astron. Astrophys. 52, 661–694 (2014)
  36. Levin, J., Perez-Giz, G.: A periodic table for black hole orbits. Phys. Rev. D 77, 103005 (2008). arXiv:0802.0459 [gr-qc]
  37. Maggiore, M.: Gravitational Waves. Volume 1, Theory and Experiments. Oxford University Press, Oxford (2007). ISBN:9780191717666
  38. Mathur, S.D.: What are fuzzballs, and do they have to behave as firewalls? In: Proceedings, 14th Marcel Grossmann Meeting on Recent Developments in Theoretical and Experimental General Relativity, Astrophysics, and Relativistic Field Theories (MG14) (In 4 Volumes): Rome, Italy, 12–18 July 2015, vol. 1, pp. 64–81 (2017)
  39. Mino, Y., Sasaki, M., Tanaka, T.: Gravitational radiation reaction to a particle motion. Phys. Rev. D 55, 3457–3476 (1997)
  40. Misner, C., Thorne, K., Wheeler, J.: Gravitation. W. H. Freeman and Company, San Francisco (1973)
  41. Owen, B.J.: Search templates for gravitational waves from inspiraling binaries: choice of template spacing. Phys. Rev. D 53, 6749–6761 (1996). arXiv:gr-qc/9511032 [gr-qc]
  42. Owen, B.J., Sathyaprakash, B.S.: Matched filtering of gravitational waves from inspiraling compact binaries: computational cost and template placement. Phys. Rev. D 60, 022002 (1999). arXiv:gr-qc/9808076 [gr-qc]
  43. Pais, A.: Subtle is the Lord: The Science and the Life of Albert Einstein. Oxford University Press, Oxford (2005). ISBN:9780192806727
  44. Poisson, E.: The motion of point particles in curved spacetime. Living Rev. Relativ. 7, 6 (2004)
  45. Poisson, E., Will, C.: Gravity: Newtonian, Post-Newtonian, Relativistic. Cambridge University Press, Cambridge (2014). ISBN:9781107032866
  46. Quinn, T.C., Wald, R.M.: Axiomatic approach to electromagnetic and gravitational radiation reaction of particles in curved spacetime. Phys. Rev. D 56, 3381–3394 (1997)
  47. Rakhmanov, M.: Fermi-normal, optical, and wave-synchronous coordinates for spacetime with a plane gravitational wave. Class. Quantum Gravity 31, 085006 (2014). arXiv:1409.4648 [gr-qc]
  48. Robson, T., Cornish, N., Liu, C.: The construction and use of LISA sensitivity curves. Class. Quantum Gravity 36, 105011 (2019). arXiv:1803.01944 [astro-ph.HE]
  49. Romano, J.D., Cornish, N.J.: Detection methods for stochastic gravitational-wave backgrounds: a unified treatment. Living Rev. Relativ. 20, 2 (2017). arXiv:1608.06889 [gr-qc]
  50. Schutz, B.: A First Course in General Relativity. Cambridge University Press, Cambridge (1985). ISBN:9780521277037
  51. Sivia, D., Skilling, J.: Data Analysis: A Bayesian Tutorial. Oxford University Press, Oxford (2006). ISBN:9780198568315
  52. Skilling, J.: Nested sampling for general Bayesian computation. Bayesian Anal. 1, 833–859 (2006)
  53. Van de Meent, M.: Modelling EMRIs with gravitational self-force: a status report. J. Phys.: Conf. Ser. 840, 012022 (2017)

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Physics, Montana State University, Bozeman, USA