4  Norms and Spectra

4.1 Norms Under Linear Transformation

As was pointed out previously, the family of \(L_p\) norms implicitly depends on the basis. It is natural to wonder how changing a basis would change the norm. Though we proved that a change of basis does not affect the notion of convergence, we do sometimes care about the actual values.

To start, we construct some transformed norms:

Theorem 4.1 Let \(|\cdot|\) be a norm defined on a vector space \(V\), and let \(L:V\to V\) be an invertible linear transformation. The function \(|\cdot|_L:V\to \mathbb R\) given by \(|x|_L= |L^{-1}(x)|\) is also a norm.

Proof. By construction, \(|x|_L = |L^{-1}(x)|\) is nonnegative. If \(|x|_L\) is zero, then \(L^{-1}(x) = 0\) because \(|\cdot|\) is a norm; applying \(L\) to both sides gives \(x=0\).

If \(c\) is a scalar, we calculate: \[\begin{align*} |cx|_L &= |L^{-1}(cx)|,\\ &= |cL^{-1}(x)|,\\ &= |c||L^{-1}(x)|,\\ &= |c||x|_L. \end{align*}\]

As for the triangle inequality,

\[\begin{align*} |x+y|_L &= |L^{-1}(x+y)|,\\ &= |L^{-1}(x) + L^{-1}(y)|,\\ &\leq |L^{-1}(x)| + |L^{-1}(y)|,\\ &= |x|_L + |y|_L. \end{align*}\]

Suppose we’ve written \(x\) in coordinates and \(|\cdot|\) is an \(L_p\) norm for those coordinates. The matrix \(L\) can be viewed as changing the coordinates, in which case \(|\cdot|_L\) is an \(L_p\) norm for different coordinates. The induced norms behave as one might expect from this point of view.

Theorem 4.2 Let \(|\cdot|\) be a norm on \(V = \mathbb R^n\), let \(L:V\to V\) be an invertible linear map, and let \(|\cdot|_L\) be the transformed norm. These induce norms \(||\cdot ||\) and \(||\cdot||_L\) on linear maps, respectively. They are related by conjugacy, \[||T||_L = || L^{-1} T L ||,\] i.e. by change of basis.

Proof. Take any nonzero \(u\in V\) and set \(v = L^{-1}(u)\). Then,

\[\begin{align*} \frac{|Tu|_L}{|u|_L} &= \frac{|L^{-1}Tu|}{|L^{-1}u|},\\ &= \frac{|L^{-1}TLL^{-1}u|}{|L^{-1}u|},\\ &= \frac{|L^{-1}TLv|}{|v|}. \end{align*}\]

Since \(L\) is invertible, when \(u\) ranges over all nonzero vectors in \(\mathbb R^n\), so too does \(v\). Therefore,

\[\begin{align*} \|T\|_L &= \sup\left\{\frac{|Tu|_L}{|u|_L}: u\in\mathbb{R}^n,\ u\neq 0 \right\},\\ &= \sup\left\{\frac{|L^{-1}TLv|}{|v|} : v\in\mathbb{R}^n,\ v\neq 0 \right\},\\ &= ||L^{-1}TL||. \end{align*}\]

We can put this state of affairs in terms of some diagrams. On the left is the change of basis square, and on the right is its realization in coordinates after applying the “square bracket” operation. On the concrete side, the norm from \(K^n\) to \(\mathbb R\) is the same in both diagrams, but the representations differ, so the composite norms \(V\to K^n \to \mathbb R\) are different: the first one is our \(|\cdot |\) and the second is \(|\cdot|_L\). Because the square-bracket representation operation is well-behaved, it transforms norms the same way it transforms vectors, vector spaces, and functions.

Example 4.1 Recall that a matrix \(A\) is diagonalizable if there is an invertible matrix \(P\) such that \(P^{-1}AP = D\) is diagonal. In that case, the diagonal entries of \(D\) are the eigenvalues of \(A\). Now, the above theorem tells us that

\[ ||A||_P = ||D|| \]

Due to its structural simplicity, \(||D||\) is usually very easy to calculate. For example, if we use the infinity norm, then \(||D||\) is its largest absolute row sum, which is just the largest absolute value of a diagonal entry. That is the largest absolute value of an eigenvalue of \(A\), and so \(||A||_P = \rho(A)\).

4.2 Spectral Radius

The example suggests a connection between matrix norms and the spectral radius. Moreover, it seems like eigenvalues should be special for calculating matrix norms, because

\[T(v) = \lambda v\]

means that

\[\frac{|T(v)|}{|v|} = \frac{|\lambda v|}{|v|} = |\lambda|.\]

It follows that \(||T|| \geq \rho(T)\) for any induced matrix norm. More can be said.

Theorem 4.3 Let \(T\) be an invertible linear transformation. Then \[\rho(T) = \inf_{||\cdot||} \{||T||\}\] where the infimum is taken over induced norms. In other words, the spectral radius is precisely the greatest lower bound of the induced norms of \(T\).

Proof. The remark preceding the theorem tells us that \(||T|| \geq \rho(T)\) for all induced norms \(||\cdot ||\), and hence \(\rho(T)\) is at most the infimum. To conclude, we want to show that the infimum is not strictly greater than \(\rho(T)\). To that end, it suffices to prove that for all \(\epsilon > 0\), there is some induced norm \(||\cdot ||\) such that \(||T|| \leq \rho(T) + \epsilon\). In fact, it will turn out that a norm induced by a change of basis will do the job.

Note that the case where \(T\) is diagonalizable can be handled using Example 4.1, because it tells us that \(||T||_P = \rho(T)\) if \(P\) corresponds to the matrix diagonalizing \(T\).

Unfortunately, not all matrices are diagonalizable. However, we established a second best: every matrix is upper triangularizable. So pick a basis such that \(T\) corresponds to an upper triangular matrix \(U\). Let \(M\) be a diagonal matrix with diagonal entries \(c,c^2,\ldots,c^n\). One can verify that \(M^{-1}U M\) is also upper triangular with the same diagonal, but the off-diagonal entries are scaled: the diagonal above the main diagonal by \(c\), the one above that by \(c^2\), and so on. In particular, if we take \(c\) to be sufficiently small (depending on the entries of \(U\)), we may assume that every off-diagonal entry has absolute value at most \(\epsilon/n\). Conjugation by \(M\) corresponds to a different choice of basis for upper-triangularizing \(T\), so we may as well assume that \(U\) has this property (small off-diagonal entries) to begin with.

Now let \(||\cdot||\) be the induced \(L_\infty\) norm with respect to this basis. In the previous chapter, we proved that it was the largest absolute row sum. Each absolute row sum is of the form \[|\lambda| + \sum(\text{absolute values of some off-diagonal entries}),\] where \(\lambda\) is an eigenvalue of \(T\), and hence \(|\lambda| \leq \rho(T)\). Our choice of basis ensures that the off-diagonals are small, so that entire sum is at most \((n-1)\epsilon/n < \epsilon\). Therefore, all the row sums are at most \(\rho(T) + \epsilon\), as was to be shown.

4.3 Gershgorin Circle Theorem

The same ideas that appeared above let us bound the eigenvalues of a matrix in terms of its off-diagonal row sums.

Theorem 4.4 Let \(A\) be an \(n\times n\) matrix over \(\mathbb C\) and \(\lambda\) be an eigenvalue. Suppose \(x\) is an eigenvector associated with \(\lambda\) and \(i\) is the index at which \(|x_i|\) is maximized (in other words, the index achieving \(|x|_\infty\)). Then,

\[|\lambda - a_{ii}| \le \sum_{j = 1, j\neq i}^n|a_{ij}|.\]

Geometrically, this says that \(\lambda\) lies in the disk around the diagonal entry \(a_{ii}\) of radius \(\sum_{j = 1, j\neq i}^n|a_{ij}|\), the sum of the absolute values of the off-diagonal entries in row \(i\).

Proof. Scaling \(x\), we may assume that \(|x|_\infty = 1\); in other words, divide by \(x_i\). After doing so, we have \(|x_i| = 1\) and \(|x_j| \leq 1\) if \(j\neq i\). A multiple of an eigenvector is still an eigenvector with the same eigenvalue.

Expanding the left hand side of \(Ax = \lambda x\) and comparing the \(i\)th entry, we obtain

\[ \lambda x_i = \sum_{j = 1}^na_{ij}x_j. \]

Subtracting the term \(a_{ii}x_i\) from both sides yields

\[ (\lambda - a_{ii})x_i= \sum_{j = 1, j\neq i}^na_{ij}x_j. \]

Take the absolute value of both sides. Since \(|x_i| = 1\), the left hand side is \(|\lambda - a_{ii}|\), which we want to bound, and when we apply the triangle inequality and the fact that \(|x_j| \leq 1\) for \(j\neq i\), we obtain

\[\begin{align*} |\lambda - a_{ii}| &= \left|\sum_{j=1,j\neq i}^n a_{ij} x_j\right|,\\ &\leq \sum_{j = 1, j\neq i}^n|a_{ij}||x_j|,\\ &\leq \sum_{j = 1, j\neq i}^n|a_{ij}|, \end{align*}\]

which is precisely the desired inequality.

This is a useful result for making some initial restrictions on the eigenvalues of a given matrix. It produces the best results when the diagonal entries are large and the off-diagonal sums are small. Even when they aren’t, it is possible for a change of basis to improve the situation (indeed, the proof of Theorem 4.3 makes precisely this kind of choice).

Definition 4.1 A matrix \(A\) is called (strictly) diagonally dominant if, for all \(i\),

\[ |a_{ii}| > \sum_{j = 1, j\neq i}^n |a_{ij}|. \]

Corollary 4.1 Diagonally dominant matrices are invertible.

Proof. The diagonal dominance property tells us that none of the disks in Gershgorin’s theorem contains zero. Since every eigenvalue lies in one of these disks, zero is not an eigenvalue, and so the matrix must be invertible.

The following comparison result will be helpful in the following chapter. To motivate it, suppose we want to discuss how well a matrix \(Q\) approximates another matrix \(A\). At a first pass, one might use \(||A-Q||\), a kind of absolute matrix error. If \(Q\) is invertible, then we might use \(I-Q^{-1}A\), since \(Q^{-1}A\) should be near the identity if \(Q^{-1}\approx A^{-1}\), which might mean \(Q\approx A\).

Since the norm depends on various choices, we might like to eliminate it by taking the infimum over all induced norms, which by Theorem 4.3 is precisely the spectral radius. Having done so, \(\rho (I-Q^{-1}A)\) seems like a plausible way to measure how close the matrices are.

With that in mind, the following says that a diagonally dominant matrix is close to any “submatrix” obtained by deleting off-diagonal entries of \(A\).

Theorem 4.5 Let \(A = (a_{ij})\) be a diagonally dominant matrix and \(Q=(q_{ij})\) be any matrix such that

  • \(q_{ii} = a_{ii}\) for all \(i\),
  • if \(q_{ij} \neq 0\) then \(q_{ij} = a_{ij}\).

Then \(\rho(I - Q^{-1}A) < 1\).

Note that such a \(Q\) is itself diagonally dominant, since each of its off-diagonal entries is either zero or the corresponding entry of \(A\); by Corollary 4.1, \(Q\) is therefore invertible, so \(Q^{-1}\) makes sense. We will present two proofs.

Proof. Let \(\lambda\) be any eigenvalue of \(I - Q^{-1}A\) and \(v \neq 0\) be an associated eigenvector. Then,

\[ (I - Q^{-1}A)v = \lambda v \]

Multiplying both sides by \(Q\) and rearranging, we obtain

\[ (A - Q + \lambda Q)v =0 \]

Since \(v\neq 0\), the matrix \(A - Q + \lambda Q\) is not invertible and hence not diagonally dominant, by Corollary 4.1. However, since \(A\) is diagonally dominant and \(Q\) “is a part of” \(A\) that “contains” the diagonal, we see that \(A - Q + \lambda Q\) actually is diagonally dominant whenever \(|\lambda| \geq 1\): its diagonal entries are \(\lambda a_{ii}\), larger in absolute value by a factor of \(|\lambda|\), while each off-diagonal entry is either \(a_{ij}\) or \(\lambda a_{ij}\), so the off-diagonal row sums grow by at most a factor of \(|\lambda|\).

Therefore, \(|\lambda| < 1\) and since it was arbitrary, we have \(\rho(I- Q^{-1}A) < 1\).

Proof. Let \(\lambda\) be any eigenvalue of \(I - Q^{-1}A\) and \(v\) be an associated eigenvector. Without loss of generality, we can scale \(v\) so that \(|v|_\infty = 1\); that is, \(|v_i| = 1\) for some \(i\) and \(|v_j| \leq 1\) for all \(j\neq i\). We have

\[ (I - Q^{-1}A)v = \lambda v, \]

so \((Q-A)v = \lambda Qv\). Now focus on the \(i\)-th row of the above vector equation. Define

\[\begin{align*} S &= \{j: a_{ij} \neq q_{ij}\},\\ T &= \{j: a_{ij} = q_{ij} \text{ and } i\neq j\}. \end{align*}\]

In other words, \(S\) is the set of all \(j\) such that \(a_{ij} \neq 0\) but \(q_{ij} = 0\), whereas \(T\) is the set of all \(j\), besides the diagonal, such that \(q_{ij}\) is copied from \(a_{ij}\). Now, we have

\[ - \sum_{j\in S}a_{ij}v_j = \lambda \left(a_{ii}v_i + \sum_{j\in T}a_{ij}v_j \right), \]

which gives

\[ \lambda a_{ii}v_i = - \sum_{j\in S}a_{ij}v_j - \lambda\sum_{j\in T}a_{ij}v_j. \]

Taking absolute value and applying the triangle inequality, we get

\[ |\lambda ||a_{ii}||v_i| \leq \sum_{j\in S}|a_{ij}||v_j| + |\lambda|\sum_{j\in T}|a_{ij}||v_j|. \]

Remember that \(|v_i| = 1\) and \(|v_j| \leq 1\). Therefore,

\[ |\lambda|| a_{ii}| \leq \sum_{j\in S}|a_{ij}| + |\lambda| \sum_{j\in T}|a_{ij}|, \]

which gives

\[ |\lambda| \leq \frac{\sum_{j\in S}|a_{ij}|}{| a_{ii}| - \sum_{j\in T}|a_{ij}|} < 1. \]

The last inequality, and the fact that the denominator is positive (which justifies the division in the previous step), both follow from the fact that

\[ \sum_{j\in S}|a_{ij}| + \sum_{j\in T}|a_{ij}| \leq \sum_{j = 1, j\neq i}^n |a_{ij}| < |a_{ii}|. \]