Having spent quite a while investigating linear algebra and solving systems of linear equations, we now turn to a specific kind of linear puzzle: interpolating points with polynomials. This can be used to approximate functions, for example by interpolating a few values or measured data points. Polynomials are simple, so it can be helpful to use them as replacements for more complicated functions.
We have already studied one kind of polynomial approximation, the Taylor polynomials, which can be quite accurate in a small neighborhood of a point. More general polynomial interpolation can spread the accuracy out, although it too tends to give poor estimates at values far from the known points (which we will attempt to address with splines).
7.1 Existence of Interpolating Polynomials
A polynomial of degree at most \(n\) has \(n+1\) free parameters, its coefficients, so we should expect these to be determined by \(n+1\) values. In other words, if \(p\) is a polynomial defined over \(K\) and we are given \(n+1\) points \(x_0,...,x_n\) in \(K\), then we expect that the points \((x_i,p(x_i))\) determine \(p\).
So, a polynomial interpolation problem is the following: given \(n+1\) points \((x_i,y_i)\), can we find a polynomial of degree at most \(n\) such that \(y_i=p(x_i)\)? We will show that it is always possible to solve such a problem.
We translate this into a linear algebra question as follows. The collection \(\mathcal P_n\) of polynomials with coefficients in \(K\) and degree at most \(n\) is an \(n+1\) dimensional vector space. Given some \(a\in K\), the crucial point is that \(p\mapsto p(a)\) is a linear map from \(\mathcal P_n \to K\), and therefore \[ p \mapsto (p(x_0),...,p(x_n))\] is a linear map to \(K^{n+1}\), a linear map between two \(n+1\) dimensional vector spaces. This map is denoted \(\Phi_{x_0,...,x_n}\) or just \(\Phi\) when the \(x_i\) are understood from the context, and called the evaluation homomorphism.
The interpolation problem, therefore, turns into solving \[\Phi_{x_0,...,x_n} (p) = (y_0,...,y_n),\] and therefore for fixed \(x_i\) the interpolation problem is solvable for all choices of \(y_i\) precisely when \(\Phi\) is invertible. Since the map is linear, it actually suffices to prove that \(\Phi\) is injective.
Lemma 7.1 Let \(p(x) \in K[x]\) be a polynomial. The remainder upon long division of \(p(x)\) by \(x-a\) is \(p(a)\). Therefore, \(x-a\) divides \(p(x)\) if and only if \(p(a) = 0\). Moreover, a nonzero polynomial of degree \(n\) is divisible by at most \(n\) linear factors.
Proof. Induction on degree. Verifying the final claim is where we use the fact that \(K\) is a field.
Theorem 7.1 Given distinct values \(x_0,...,x_n\) in \(K\), the associated evaluation map \(\Phi\) is an isomorphism. In particular, every interpolation problem has a solution and that solution is unique.
Proof. Since \(\Phi\) is a linear map between vector spaces of equal dimension, it suffices to prove that it is injective, which is further reduced to showing that \(\Phi(p) = 0\) implies \(p=0\). If \(\Phi(p) = 0\), then \(p(x_i) = 0\) for \(0\leq i \leq n\). By Lemma 7.1, each \(x-x_i\) divides \(p(x)\), so \(p\) has \(n+1\) distinct linear factors; since a nonzero polynomial of degree at most \(n\) has at most \(n\), we must have \(p=0\).
It is worth noting that this can be improved to an interpolation even when the \(x_i\) are not distinct, as long as multiplicities are specified, and sufficiently many derivatives are specified at repeated points. The Taylor polynomial is the special case of interpolation at a single point with multiplicity \(n+1\).
7.2 Interpolation in Coordinates
There are many different ways to write polynomials, which largely come down to choices of basis on \(\mathcal P_n\). While it is good to know that the interpolation problem has a solution, we would like to write down the solution, solving \(\Phi p = y\) in coordinates.
7.2.1 Vandermonde
The standard basis is \(1,x,...,x^n\). The matrix representation of \(\Phi\) in these coordinates is the Vandermonde matrix: \[ V = \begin{pmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n\\ 1 & x_1 & x_1^2 & \cdots & x_1^n\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{pmatrix}. \]
While it is a natural place to start, this matrix tends to be difficult to invert. It is, though, quite simple to implement:
def vandermonde_matrix(nodes):
    # From the nodes x_i, constructs the Vandermonde matrix
    return matrix([[x^i for i in range(0, len(nodes))] for x in nodes])

def vandermonde_interpolation(points):
    # Takes points (xi, yi) and returns the interpolating polynomial.
    # The system of equations to solve is Ac = b; the vector c holds the coefficients.
    A = vandermonde_matrix([xi for (xi, yi) in points])
    b = vector([yi for (xi, yi) in points])
    # coefficients and polynomial
    c = A.inverse() * b
    x = var('x')
    return sum(c[i] * x^i for i in range(0, len(c)))

# vandermonde_interpolation([(-1,1),(1,1),(0,0)])
7.2.2 Newton
The standard basis is independent of the \(x_i\), but if we tune our basis to the chosen nodes, we might get something better. Consider the vectors \[\{1, (x - x_0), (x-x_0)(x-x_1), \dots, (x-x_0)\cdots(x-x_{n-1})\}.\] Let \(u_k\) be the \(k\)th vector in this list, starting from \(k=0\). The degree of \(u_k\) is \(k\), polynomials with distinct degrees are linearly independent, and we have \(n+1\) of them, so they must comprise a basis.
Elements of this basis have the property \(u_k(x_j) = 0\) when \(j < k\). Thus, the matrix representation for \(\Phi\) simplifies quite a bit: \[ [\Phi] = \begin{pmatrix} 1 & 0 & \cdots & 0\\ 1 & x_1 - x_0 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 1 & x_n - x_0 & \cdots & (x_n - x_0)\cdots(x_n - x_{n-1}) \end{pmatrix}. \]
This matrix representation is lower triangular, and so we expect the associated system to be easy to solve.
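As a concrete illustration, here is a small sketch in plain Python (rather than the Sage used elsewhere in this chapter); the helper names `newton_matrix` and `forward_substitution` are our own, not from any library. It builds the entries \(u_k(x_j)\) and then solves the lower triangular system from the top row down.

```python
def newton_matrix(nodes):
    # Entry (j, k) is u_k(x_j) = (x_j - x_0) * ... * (x_j - x_{k-1}).
    # The running product hits the factor (x_j - x_j) = 0 once k exceeds j,
    # so every entry above the diagonal is zero: the matrix is lower triangular.
    n = len(nodes)
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        prod = 1.0
        for k in range(n):
            A[j][k] = prod
            prod *= nodes[j] - nodes[k]
    return A

def forward_substitution(A, b):
    # Solve the lower triangular system A c = b one row at a time, top down.
    c = []
    for j in range(len(b)):
        s = b[j] - sum(A[j][k] * c[k] for k in range(j))
        c.append(s / A[j][j])
    return c
```

For example, the nodes \(-1, 0, 1\) with values \(1, 0, 1\) (sampled from \(x^2\)) give Newton coefficients \(1, -1, 1\), i.e. \(p(x) = 1 - (x+1) + (x+1)x = x^2\).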
7.2.3 Lagrange
The evaluation maps \(\phi_i\) given by \(\phi_i(p)=p(x_i)\) are homomorphisms from \(\mathcal P_n\) to \(K\). This makes them members of the dual, which brings to mind dual bases. If the evaluation maps can be realized as the duals associated to a nice basis, they’ll be extremely easy to understand with respect to that basis: when \(K^{n+1}\) is given its standard basis, the matrix representation of \(\Phi\) is the identity!
The basis \(\{q_0,...,q_n\}\) we seek should satisfy \[\phi_i(q_j) = \begin{cases} 1 & i=j,\\0 & i\neq j.\end{cases}\]
We define preliminary polynomials \[q_i(x) = \prod_{j\neq i} (x-x_j),\] and some auxiliary numbers \(s_i = q_i(x_i)\).
By construction \(q_i(x_i)\) is nonzero and \(q_i(x_j) = 0\) when \(j\neq i\). This is nearly the dual property. This means the polynomials \(l_i(x) = q_i(x)/s_i\) are dual to the evaluation maps. Among other things, this also guarantees that they’re a basis.
Duality gives rise to the formula \[p(x) = \sum_{i=0}^n p(x_i) l_i(x),\] and therefore the points \((x_i,y_i)\) are interpolated by \[\sum_{i=0}^n y_i l_i(x).\]
A small tweak improves the basis from a computational perspective. First, the constant \(1\) interpolates the points \((x_i,1)\) and therefore \[1 = \sum_{i=0}^n l_i(x).\]
Also, if we let \(Q(x) = \prod_{i=0}^n (x-x_i)\), then \[q_i(x) = \frac{Q(x)}{x-x_i}.\]
We can combine these to rewrite the interpolating polynomial in what is called the barycentric form: \[ p(x) = \frac{\displaystyle\sum_{i=0}^n \frac{y_i/s_i}{x - x_i}}{\displaystyle\sum_{i=0}^n \frac{1/s_i}{x - x_i}}. \]
For more information about this variation, see Barycentric Lagrange Interpolation by Jean-Paul Berrut and Lloyd N. Trefethen. As they point out, the barycentric form reduces the number of calculations required to evaluate the polynomial. The reason is easy to see: the common factor \(Q(x)\) contains many of those calculations, so by canceling it we save some computer time. Moreover, the most expensive quantities, the \(s_i\), depend only on the \(x\)-coordinates. Since those are usually considered fixed in advance, while we vary the \(y\)-coordinates, computing the \(s_i\) in advance and reusing them can save even more time.
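To make the two-stage structure concrete, here is a sketch in plain Python (the function names are ours); the weights \(w_i = 1/s_i\) are computed once from the nodes, then each evaluation is a ratio of two weighted sums.

```python
def barycentric_weights(nodes):
    # w_i = 1 / s_i, where s_i = prod over j != i of (x_i - x_j).
    # These depend only on the x-coordinates, so compute them once and reuse.
    ws = []
    for i, xi in enumerate(nodes):
        s = 1.0
        for j, xj in enumerate(nodes):
            if j != i:
                s *= xi - xj
        ws.append(1.0 / s)
    return ws

def barycentric_eval(nodes, values, ws, x):
    # The barycentric form: a ratio of two weighted sums. Dividing by the
    # interpolation of the constant 1 cancels the common factor Q(x).
    num = den = 0.0
    for xi, yi, wi in zip(nodes, values, ws):
        if x == xi:
            return yi  # exactly at a node, return the data value
        t = wi / (x - xi)
        num += t * yi
        den += t
    return num / den
```

With nodes \(-1, 0, 1\) and values \(1, 0, 1\) (from \(x^2\)), evaluating at \(x = 2\) recovers \(4\).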
7.3 Adding New Points
Suppose we’ve interpolated points \((x_0,y_0),...,(x_n,y_n)\), and then want to incorporate another point \((x_{n+1},y_{n+1})\). We’d like to take advantage of the existing interpolation to save time. Plus, this would give us a way to inductively interpolate a long list of points.
Adding a new point requires going up in dimension. For each of the coordinate systems above, this changes the basis: - the new Vandermonde basis is the old one plus a new monomial, - the new Newton basis is the old one plus a new polynomial, - the new Lagrange basis requires changing all of the old basis elements in a somewhat complicated way, along with adding a new element.
For the first two bases, where we just add a new element, it would be ideal if we could keep the old coefficients and just calculate one new one. In the Vandermonde case, however, it’s clear that this is too much to ask: you can’t just add a new leading term with a good coefficient and still preserve anything else. All the coefficients are expected to change. The Newton basis, however, will not have this issue.
7.3.1 Newton (again)
The starting basis is \[\{1, (x - x_0), (x-x_0)(x-x_1), \dots, (x-x_0)\cdots(x-x_{n-1})\},\] and the new basis element is \[t(x) = (x-x_0)\cdots(x-x_{n-1})(x-x_n).\]
This new element is zero on all \(x_i\) with \(i\leq n\), so if we add a multiple of it to any polynomial \(p(x)\) to get some polynomial \(q(x)\), then we will have \(q(x_i) = p(x_i)\) at those \(x_i\): the polynomial \(q(x)\) still interpolates the original points. The new element is nonzero at \(x_{n+1}\), and so by picking an appropriate coefficient, we can pick any value we want for \(q(x_{n+1})\). In particular, if we want \(q(x) = p(x) + c t(x)\) to include the new point \((x_{n+1},y_{n+1})\), we need to solve \[y_{n+1}=q(x_{n+1}) = p(x_{n+1}) + ct(x_{n+1})\] for \(c\), which is just \[c = \frac{y_{n+1} - p(x_{n+1})}{t(x_{n+1})}.\]
This gives us a simple recursive recipe for the coefficients in the Newton basis (one which is equivalent to solving the associated lower triangular system of equations by forward substitution). Namely, for points \((x_0,y_0),...,(x_n,y_n)\) we define coefficients \(c_i\) and polynomials \(p_i(x)\) as follows: - \(c_0 = y_0\), - \(p_0 = c_0 \cdot 1 = y_0\), - For \(k\geq 1\), \[\begin{align*}
c_k &= \frac{y_k - p_{k-1}(x_k)}{(x_k - x_0)\cdots(x_k - x_{k-1})},\\
p_k(x) &= p_{k-1}(x) + c_k(x - x_0)\cdots(x - x_{k-1}).
\end{align*}\]
We end with a final polynomial \(p_n(x)\) that interpolates the points: \[p_n(x) = c_0 + c_1(x-x_0) + c_2(x-x_0)(x-x_1) + ... + c_n(x-x_0)...(x-x_{n-1}).\]
It is worth noting that the quick-evaluation trick we employed when studying Neumann series also applies to a polynomial of this form: \[p_n(x) = c_0 + (x-x_0)(c_1 + (x-x_1)(c_2 + (x-x_2)(c_3 + ... + (x-x_{n-1})(c_n)... ))).\]
In other words, if we let \(f_k(u) = c_k + (x-x_k)u\), then we can write \(p(x)\) as the composition \[p(x) = (f_0 \circ f_1 \circ ... \circ f_{n-1})(c_n).\]
As before, this requires fewer arithmetic operations to evaluate.
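The nested evaluation can be sketched in plain Python as follows (the function name `newton_eval` is ours); it works from the innermost parenthesis outward, using one multiplication per node.

```python
def newton_eval(nodes, coeffs, x):
    # Evaluate c_0 + (x - x_0)(c_1 + (x - x_1)(... + (x - x_{n-1}) c_n ...))
    # from the inside out, Horner-style.
    result = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        result = coeffs[k] + (x - nodes[k]) * result
    return result
```

For example, with nodes \(-1, 0, 1\) and Newton coefficients \(1, -1, 1\) (which encode \(p(x) = x^2\)), evaluating at \(2\) gives \(4\).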
7.3.2 Combining Interpolations
In this variation, we start from two interpolations that we’d like to patch together, rather than a single interpolation and a point.
Suppose that \(p(x)\) interpolates \((x_0,y_0),...,(x_{n-1},y_{n-1})\) while \(q(x)\) interpolates \((x_1,y_1),...,(x_{n},y_{n})\). It seems reasonable to hope that a combination of the two would interpolate all the points. For example, their average \[\frac 12 p(x) + \frac 12 q(x)\] still interpolates the overlap \((x_1,y_1),...,(x_{n-1},y_{n-1})\).
Of course, we expect that the degree will increase, so a linear combination will not be enough, but a weighted average is still essentially the right form: let \[h(x) = \frac{x_n - x}{x_n - x_0}p(x) + \frac{x - x_0}{x_n - x_0}q(x).\]
By its construction as a weighted average, \(h(x_i) = y_i\) when \(1\leq i \leq n-1\) since \(p(x_i) = q(x_i)\) at these points. At the endpoints, the weights vanish appropriately, so \[h(x_0) = \frac{x_n - x_0}{x_n - x_0}p(x_0) + \frac{x_0 - x_0}{x_n - x_0}q(x_0) = p(x_0) = y_0,\] and \[h(x_n) = \frac{x_n - x_n}{x_n - x_0}p(x_n) + \frac{x_n - x_0}{x_n - x_0}q(x_n) = q(x_n) = y_n.\]
By inspection, \(h(x)\) is a polynomial of degree at most \(n\), so it’s the correct interpolating polynomial.
The function \(h(x)\) is essentially obtained by linearly sliding from \(p(x)\) to \(q(x)\). A version of this sort of procedure is called a homotopy, and is something we will study in detail during the root-finding unit.
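The sliding combination is short enough to sketch directly; here is a plain Python version (the name `combine` is ours), applied to hypothetical data where \(p\) interpolates \((0,1),(1,2)\) and \(q\) interpolates \((1,2),(2,5)\).

```python
def combine(p, q, x0, xn):
    # h slides linearly from p (exact at x0) to q (exact at xn);
    # on the shared interior nodes p and q agree, so h agrees too.
    def h(x):
        return ((xn - x) * p(x) + (x - x0) * q(x)) / (xn - x0)
    return h

# p interpolates (0,1),(1,2); q interpolates (1,2),(2,5).
p = lambda x: x + 1
q = lambda x: 3 * x - 1
h = combine(p, q, 0.0, 2.0)
```

Here \(h\) works out to \(x^2 + 1\), which indeed passes through all three points \((0,1)\), \((1,2)\), \((2,5)\).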
7.4 Divided Differences
Let’s focus on the case of interpolating values of a function: given some distinct \(x_0,...,x_n\) and a function \(f(x)\), we want to interpolate the points \((x_i,f(x_i))\).
Definition 7.1 To organize the bookkeeping, we write \(f[x_0,...,x_n]\) for the coefficient of \(x^n\) in the interpolating polynomial. This is called the \(n\)th divided difference of \(f\), with respect to the \(x_i\). It is well-defined because the interpolating polynomial is unique.
The divided differences have some interesting features:
If \(f\) is fixed, then we can view \(f[x_0,...,x_n]\) as a function of \(n+1\) variables. When \(n=0\), we have \(f[x_0] = f(x_0)\).
As a function, the divided difference is invariant under permutations of its arguments; the order of the \(x_i\) does not matter.
By definition, \(f[x_0,...,x_n]\) is the \(n+1\)th coordinate for the Vandermonde basis (the coefficient of the top degree term).
The divided difference is also the \(n+1\)th coordinate for the Newton basis, because in that version the interpolating polynomial has the form \[p_n(x) = \sum_{i = 0}^nc_i\prod_{j = 0}^{i-1}(x - x_j),\] and the only contribution to the \(x^{n}\) term is from \(c_n\prod_{j = 0}^{n-1}(x - x_j)\).
Inspecting the homotopy version, we obtain a recursive expression for the divided differences: \[f[x_0, ... , x_n] = \frac{f[x_1, \dots, x_n]-f[x_0, \dots, x_{n-1}]}{x_n - x_0}\] This expression is why it is called a divided difference of order \(n\): it is a quotient of a difference of \((n-1)\)th order divided differences. The first divided difference is the familiar difference quotient: \[f[x_0,x_1] = \frac{f[x_1]-f[x_0]}{x_1 - x_0} = \frac{f(x_1)-f(x_0)}{x_1 - x_0}.\]
The last observation tells us that if we take a limit \(x_0,x_1\to a\), then the divided differences converge to \(f'(a)\). This implies that the interpolating polynomials also converge, in a certain sense, to the best linear approximation to \(f(x)\) at \(x=a\). For higher order divided differences, careful study of the limit shows that the Taylor polynomial of degree \(n\) at \(x=a\) can be recovered as the limit as all \(x_i\) are taken to \(a\).
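The first-order claim is easy to check numerically; here is a quick plain Python demonstration (the helper name `dd1` and the sample point \(a = 0.3\) are our own choices).

```python
import math

def dd1(f, x0, x1):
    # First divided difference: the familiar difference quotient.
    return (f(x1) - f(x0)) / (x1 - x0)

# As x0, x1 -> a, the divided difference of sin approaches sin'(a) = cos(a).
a = 0.3
approx = dd1(math.sin, a - 1e-6, a + 1e-6)
```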
The recursion gives us a way to calculate divided differences:
Form an \((n+1)\)-tuple \(d_0 = (f[x_0], f[x_1],\dots,f[x_n])\).
Then, starting at the 2nd entry, make \(d_1\) from \(d_0\) as follows \[\begin{align*}
& (f[x_0], &&f[x_1], &&f[x_2], &&\dots, &&f[x_n]&)\\
-& (&&f[x_0], &&f[x_1], &&\dots, &&f[x_{n-1}]&)\\
\div& (&&(x_1 - x_0), &&(x_2- x_1), &&\dots, &&(x_n - x_{n-1})&)\\
=& (f[x_0],&&f[x_0, x_1], &&f[x_1, x_2], &&\dots, &&f[x_{n-1}, x_n]&).\\
\end{align*}\]
Carrying out the same process from the third entry, make \(d_2\), which will look like \[
d_2 = (f[x_0], f[x_0, x_1], f[x_0, x_1, x_2], \dots, f[x_{n-2}, x_{n-1}, x_n]).
\]
Repeat a total of \(n\) times to obtain the final \[
d_n = (f[x_0], f[x_0, x_1], f[x_0, x_1, x_2], \dots, f[x_0, x_1, \dots , x_n]).
\] Its coordinates are the coefficients of the Newton interpolation.
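The repeated difference-and-divide passes can be sketched in plain Python (the function name is ours), overwriting a single list in place; working from the bottom entry upward means each pass only uses values the next pass no longer needs.

```python
def divided_differences(nodes, values):
    # After pass k, entry i (for i >= k) holds f[x_{i-k}, ..., x_i];
    # entries before position k are already final. After n passes,
    # d[i] = f[x_0, ..., x_i]: the Newton coefficients.
    d = list(values)
    n = len(nodes)
    for k in range(1, n):
        for i in range(n - 1, k - 1, -1):
            d[i] = (d[i] - d[i - 1]) / (nodes[i] - nodes[i - k])
    return d
```

For nodes \(-1, 0, 1\) with values \(1, 0, 1\) (from \(x^2\)), this returns the coefficients \(1, -1, 1\) of the Newton form \(1 - (x+1) + (x+1)x\).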
7.5 Error Analysis
In the case we considered above, where we are trying to find a polynomial \(p(x)\) that interpolates the values of some function \(f(x)\), it would be useful to know how accurately \(p(x)\) approximates \(f(x)\). We don’t necessarily expect great accuracy – after you’ve worked out a few examples, you’ll notice that the graphs of interpolating polynomials can change dramatically when new points are added, and they tend to bounce up and down a great deal between points. In its defense, the interpolating polynomial does vary continuously as a function of the input points, so it responds reasonably well to perturbation.
Before we analyze the error in general, let’s return to one of our observations about divided differences. Namely, that the first order divided difference is the usual difference quotient: \[f[x_0,x_1] = \frac{f(x_1) - f(x_0)}{x_1-x_0},\] and more generally that the polynomial interpolation converges to the Taylor approximation at \(x=a\) as the \(x_i\) approach \(x=a\). This suggests that something like the error bound for Taylor approximations should apply.
So, to begin, we will show that the \(n\)th-derivative version of the Mean Value Theorem holds for the \(n\)th divided difference. First, a higher order version of Rolle’s Theorem.
Lemma 7.2 Let \(u\) be a function which is continuous on \([a,b]\) and \(n\)-times differentiable on \((a,b)\). Further assume that there are \(n+1\) distinct points \(x_0,...,x_n\) in \([a,b]\) at which \(u(x_i) = 0\). Then there is a point \(\xi\in (a,b)\) such that \(u^{(n)}(\xi) = 0\).
Proof. After possibly reordering, let’s assume that \(x_0<x_1<...<x_n\). If \(n=1\), this follows from the usual Rolle’s theorem applied to \([x_0,x_1]\subseteq [a,b]\). Suppose, then, that \(n>1\).
We can apply the usual Rolle’s theorem to each of the \(n\) intervals \([x_i,x_{i+1}]\) for \(0\leq i \leq n-1\). This tells us \(u'\) has \(n\) distinct zeros in \((a,b)\). Our assumptions on \(u\) ensure that \(u'\) is \(n-1\) times differentiable on \((a,b)\). Since \(n>1\), it is differentiable at least once, hence continuous. By the induction hypothesis, there is some \(\xi \in (a,b)\) such that \[0 = (u')^{(n-1)}(\xi) = u^{(n)}(\xi),\] as was to be shown.
Theorem 7.2 Suppose \(f\) is continuous and \(n\)-times differentiable on \([a,b]\). Let \(x_0,...,x_n\) be distinct points in \([a,b]\). Then there is some \(\xi \in [a,b]\) such that \[f^{(n)}(\xi) = n! f[x_0,...,x_n].\]
Proof. The claim is certainly true for a polynomial \(p(x)\) of degree at most \(n\); in fact, for such \(p\) the claim holds for any \(\xi\). A polynomial of degree at most \(n\) interpolates itself, so \(p[x_0,...,x_n]\) is precisely the coefficient of \(x^n\), and the \(n\)th derivative \(p^{(n)}(x)\) will be a constant: \(n!\) times that coefficient.
Now let \(p(x)\) be the polynomial which interpolates \(f(x)\) at the \(x_i\). Consider \(f(x)-p(x)\). It is continuous, \(n\)-times differentiable, and vanishes at the \(n+1\) points \(x_0,...,x_n\), so by Lemma 7.2 there is some \(\xi\) such that \[(f-p)^{(n)}(\xi) = 0,\] so after a little rearranging we get \[f^{(n)}(\xi) = p^{(n)}(\xi) = n! f[x_0,...,x_n].\]
Finally, we’d like to use this to bound the error: how large can \(|f(x)-p(x)|\) be on \([a,b]\) if \(p\) is the order \(n\) interpolation to \(f(x)\) at some \(x_0,...,x_n\)? The trick is to view \(x\) as an extra point to be interpolated.
Theorem 7.3 Let \(f\) be a continuous and \((n+1)\)-times differentiable function on \([a,b]\). Suppose that we have a bound \(|f^{(n+1)}(x)| < M\) in \([a,b]\). Let \(x_0,...,x_n\) be distinct points in \([a,b]\), and \(p(x)\) the polynomial which interpolates \(f(x)\) at those points. Then \[|f(x) - p(x)| \leq \frac{M}{(n+1)!} \prod_{i=0}^n |x-x_i|\] for all \(x\in [a,b]\).
Proof. Let \(p(x)\) interpolate \(f(x)\) at \(x_0,...,x_n\). First, we will verify the identity \[f(x) - p(x) = f[x_0,...,x_n,x]\prod_{i=0}^n (x-x_i).\]
Induction applied to the first term yields \(f(x) - q(x)\), where \(q(x)\) interpolates \(f(x)\) at \(x_0,...,x_{n-1}\). For the second term, note that it is exactly the coordinate (Newton form) of the new basis element when expanding an interpolation from \(x_0,...,x_{n-1}\) to include \(x_n\). Therefore, it combines with \(q(x)\) to form the interpolating polynomial \(p(x)\) over those points!
By the higher order MVT, Theorem 7.2, we can find some \(\xi\) in \([a,b]\) such that \[f^{(n+1)}(\xi) = (n+1)! f[x_0,...,x_n,x].\]
Therefore, we’re able to rewrite the identity as \[f(x) - p(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!} \prod_{i=0}^n (x-x_i),\] from which a bound on \(f^{(n+1)}(x)\) yields the claimed bound on \(|f(x) - p(x)|\).
The form of the bound suggests that well-chosen \(x_i\) can improve the error bound by minimizing the size of \(\prod_{i=0}^n |x-x_i|\). Interestingly, it turns out that evenly-spaced \(x_i\) are not the best choice in general. Instead, certain “Chebyshev nodes” do better at reducing this contribution to the error. These nodes are clustered somewhat more toward the endpoints of the interval.
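A rough comparison can be sketched in plain Python (the helper name and the grid-sampling estimate are our own): estimate the largest value of \(\prod_i |x - x_i|\) over \([-1,1]\) for equispaced nodes versus Chebyshev nodes \(x_i = \cos\left(\frac{(2i+1)\pi}{2n+2}\right)\).

```python
import math

def node_product_max(nodes, samples=2001):
    # Estimate the max over [-1, 1] of |prod_i (x - x_i)| by sampling a grid.
    best = 0.0
    for k in range(samples):
        x = -1.0 + 2.0 * k / (samples - 1)
        p = 1.0
        for xi in nodes:
            p *= x - xi
        best = max(best, abs(p))
    return best

n = 10
equispaced = [-1.0 + 2.0 * i / n for i in range(n + 1)]
chebyshev = [math.cos((2 * i + 1) * math.pi / (2 * n + 2)) for i in range(n + 1)]
```

For \(n = 10\) the equispaced product peaks near the ends of the interval at several times the Chebyshev value, which is bounded by \(2^{-n}\).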
7.6 Exercises
Exercise 7.1 For the following functions and \(x\)-coordinates, determine the interpolation at those coordinates and plot the function and the interpolating polynomial together.
\(\ln(x)\) at \(0.1,0.2,...,1.9,2.0\).
\(\sin(x)\) at \(-\pi,-\pi/2,0,\pi/2,\pi\).
\(e^x\) at \(-5,-1,0,1,2\).
7.7 Programming Problems
Exercise 7.2 Implement each interpolation method in three steps:
Given \(x_0,...,x_n\), construct the basis.
Express the evaluation map \(\Phi\) in terms of the basis.
Invert the matrix \([\Phi]\) and apply it to \(y_0,...,y_n\) to obtain the coefficients of the interpolating polynomial, and return it.