2  Linear Algebra

2.1 Vector Spaces

You likely remember that linear algebra is the study of linear transformations of vector spaces. Vector spaces are collections of vectors with an addition and a compatible scalar multiplication. To start, we define fields, which is where scalars live.

Definition 2.1 A field, usually denoted by the letter \(K\), is a set with distinguished elements \(0\) and \(1\), as well as two binary operations, \(+\) and \(\times\), addition and multiplication, such that:

  • addition and multiplication are commutative and associative,
  • \(0+x = x\) and \(1\times x = x\) for all \(x\in K\),
  • for any \(x \in K\) there is some \(-x\) such that \(x+(-x) = 0\),
  • for any nonzero \(x\in K\), there is some \(1/x\) such that \(x\times(1/x) = 1\).

Usually, we just write \(xy\) for \(x\times y\).

Now we can define vector spaces:

Definition 2.2 A vector space \(V\) over a field \(K\), or \(K\)-vector space, is a set of vectors (usually denoted \(V\) as well) with a distinguished vector \(0\), a binary operation \(+\), and a scalar multiplication \(\cdot : K\times V \to V\) which satisfy the following:

  • addition is commutative and associative,
  • \(0+v = v\) for all \(v\in V\),
  • \(1\cdot v = v\) and \(0\cdot v = 0\) (the left \(0\) is from \(K\), the right \(0\) is from \(V\))
  • scalar multiplication is associative-ish and distributes over vector addition; to wit, for all \(a,b\in K\) and \(v,w\in V\), \[a\cdot(b\cdot v) = (a\times b) \cdot v,\]
    and \[a\cdot(v+w) = a\cdot v + a\cdot w.\]

As for fields, we generally suppress the \(\cdot\) and just write \(av\) to mean \(a\cdot v\).

Sometimes a subset of a vector space itself satisfies these conditions with those same operations. Such a subset is called a subspace of the original vector space.

Standard vector spaces, like the example of \(\mathbb R^2\), are quite easy to implement in Sage:

K = QQ # or RR or CC

V = VectorSpace(K,2)   # the 2 makes it K^2
show(V)

v = V([-1,2])          # cast a list of numbers into a vector
w = V([3,-3])

show(2*v)
show(v+w)
\(\displaystyle \newcommand{\Bold}[1]{\mathbf{#1}}\Bold{Q}^{2}\)
\(\displaystyle \left(-2,\,4\right)\)
\(\displaystyle \left(2,\,-1\right)\)

2.2 Bases and Dimension

From just a few vectors, you can produce many more by combining them with addition and scalar multiplication – sometimes, you even get the entire vector space. It’s helpful to pick out such sets of generators.

Definition 2.3 Let \(B\subseteq V\) be a subset of \(V\). The span of \(B\) is the set of all vectors in \(V\) which can be obtained as \(K\)-linear combinations of vectors in \(B\), i.e. those \(v\) in \(V\) for which there exist (finitely many) \(a_1,...,a_n\) in \(K\) and \(b_1,...,b_n\) in \(B\) such that \[v = \sum a_i b_i.\]

We say that \(B\) spans \(V\) if the span of \(B\) is \(V\).

Special among the spanning sets of \(V\) are its minimal spanning sets.

Definition 2.4 A set \(B\) that spans \(V\) is said to be a basis for \(V\) if it satisfies any of the following equivalent extra conditions:

  • no proper subset of \(B\) spans \(V\), or
  • there is no collection of distinct \(b_1,...,b_n\) in \(B\) and nonzero \(a_1,...,a_n\) in \(K\) such that \(\sum a_i b_i = 0\), or
  • the expressions \(v = \sum a_ib_i\) in the definition of spanning are unique, meaning that any two such expressions for the same \(v\) differ only by reordering the terms or by adding terms with coefficient \(0\).

Any set of vectors satisfying the 2nd and 3rd properties, but not necessarily spanning \(V\), is said to be linearly independent. A basis is a spanning set of linearly independent vectors.

Sage handles spans, bases, and dimension quite well.

K = QQ
V = VectorSpace(K,3)
B = [V([1,1,1]), V([-1,0,1]), V([2,2,1])]
W = V.span(B)
show(V == W)
\(\displaystyle \mathrm{True}\)
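In fact, the three vectors in B are not just a spanning set but a basis: they are linearly independent. A minimal way to check this in Sage, reusing B from the cell above, is to stack the vectors into a matrix and inspect its rank or determinant:

show(matrix(B).rank())   # 3: full rank, so the vectors are linearly independent
show(matrix(B).det())    # -1: a nonzero determinant says the same thing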

and also

L = RR
W = VectorSpace(L,5)
show(W)
show(W.basis())
\(\displaystyle \newcommand{\Bold}[1]{\mathbf{#1}}\Bold{R}^{5}\)
\(\displaystyle \left[\left(1.00000000000000,\,0.000000000000000,\,0.000000000000000,\,0.000000000000000,\,0.000000000000000\right), \left(0.000000000000000,\,1.00000000000000,\,0.000000000000000,\,0.000000000000000,\,0.000000000000000\right), \left(0.000000000000000,\,0.000000000000000,\,1.00000000000000,\,0.000000000000000,\,0.000000000000000\right), \left(0.000000000000000,\,0.000000000000000,\,0.000000000000000,\,1.00000000000000,\,0.000000000000000\right), \left(0.000000000000000,\,0.000000000000000,\,0.000000000000000,\,0.000000000000000,\,1.00000000000000\right)\right]\)

We will use the following standard facts about bases:

  1. All vector spaces have a basis (by AC, say, although even without choice, lots of interesting vector spaces still have bases).
  2. In fact, every spanning set contains a basis and any linearly independent subset of a vector space can be expanded to a basis.
  3. Any two bases for a vector space have the same cardinality. That cardinality is the dimension of the vector space.
  4. A vector space is largely determined by its dimension, in a sense to be made precise below.

The last fact boils down to the 3rd equivalent definition of a basis. When a \(K\)-vector space \(V\) has a basis \(B\), each \(v\) in \(V\) has a unique expression of the form \[v = \sum_{b\in B} a_bb.\] The identification of \(v\) with the coefficients \(a_b\) in \(K\) lets one view the vector space as \(K^{B}\), which amounts to \(K^{|B|}\). This respects its addition and scalar multiplication. What could be a rather wild object turns out to be no more complicated than a list of scalars. When \(B\) is finite, this is especially useful, as then each vector is specified by just a finite amount of data (relative to \(K\)). This is convenient for carrying out actual calculations.
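Sage can compute the coefficients \(a_b\) for you. Here is a minimal sketch; it assumes Sage's subspace_with_basis and coordinate_vector methods, and re-expresses a vector of \(\mathbb Q^3\) in terms of a non-standard basis:

V = VectorSpace(QQ,3)
B = [V([1,1,1]), V([-1,0,1]), V([2,2,1])]   # the basis from the span example above
U = V.subspace_with_basis(B)                # the same space, but remembering the basis B
c = U.coordinate_vector(V([1,2,3]))         # the coefficients a_b, here (2, 1, 0)
show(c)
show(c[0]*B[0] + c[1]*B[1] + c[2]*B[2])     # reassembles the original vector (1, 2, 3)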

2.3 Linear Maps

More interesting than vector spaces are functions between them, especially functions which preserve their algebraic structure.

Definition 2.5 Let \(K\) be a field and \(V\) and \(W\) vector spaces over \(K\). A linear transformation from \(V\) to \(W\) is a function \(T:V\to W\) of vectors which is compatible with the operations on \(V\) and \(W\) as follows:

  • For all \(x,y\in V\), we have \[T(x+y) = T(x) + T(y)\]

  • For all \(a\in K\) and \(x\in V\), we have \[T(ax) = a T(x)\]

2.3.1 Matrices

As indicated above, a choice of basis gives us a way of representing vectors by lists of scalars, which are more amenable to computation. We would like to carry out a similar process for linear transformations. The key is that the linearity properties satisfied by linear transformations tell us that they are determined by their values on a basis of the domain, and those values can be expressed in terms of a basis of the codomain.

Definition 2.6 Let \(V\) and \(W\) be vector spaces over \(K\) and \(T:V\to W\) a linear map. For simplicity, assume the spaces are finite dimensional, of dimensions \(m\) and \(n\), respectively. Given ordered bases \(A = (a_1,...,a_m)\) and \(B = (b_1,...,b_n)\) of \(V\) and \(W\), we can construct a table of scalars, the matrix of \(T\) with respect to \(A\) and \(B\), as follows: for each \(a_i\), find the scalars \(t_{ji}\) such that \[T(a_i) = \sum_{j=1}^n t_{ji} b_j,\] then organize them into the \(n\times m\) table \[\begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1m}\\ t_{21} & t_{22} & \cdots & t_{2m}\\ \vdots & \vdots & \ddots & \vdots\\ t_{n1} & t_{n2} & \cdots & t_{nm} \end{bmatrix}\]

One way to visualize it: think of each \(T(a_i) = \sum_j t_{ji} b_j\) as a column vector indexed by \(j\) from top to bottom, and lay out the column vectors consecutively, one for each \(i\).

This matrix succinctly describes \(T(a_i)\). If we think of \(a_i\) as a (column) vector with a \(1\) in index \(i\) and \(0\)s elsewhere, then when we multiply the matrix by this vector in the usual way, we obtain precisely the \(i\)th column, which has the coefficients \(t_{ji}\) that determine \(T(a_i)\).

Now, if you take a general \(v\) in \(V\) and write it as \(\sum c_i a_i\), then it can be thought of as a column vector of the \(c_i\). You can calculate \(T(v)\) as follows: \[\begin{align*} T(v) &= T(c_1a_1 + ... + c_ma_m) \\ &= c_1 T(a_1) + ... + c_m T(a_m)\\ &= c_1\left(\sum_j t_{j1} b_j\right) + ... + c_m \left(\sum_j t_{jm} b_j\right)\\ &= \sum_j c_1 t_{j1} b_j + ... + \sum_j c_m t_{jm} b_j\\ &= \left(\sum_i c_i t_{1i} \right) b_1 + ... + \left(\sum_i c_i t_{ni}\right) b_n \end{align*}\] If you compare with the matrix multiplication rule, the coefficients match!

It is not difficult to verify from these observations that,

Proposition 2.1 The usual matrix multiplication corresponds to composition of functions: if \(S:U\to V\) and \(T:V\to W\) are linear transformations among vector spaces \(U,V,W\), and we have ordered bases \(A,B,C\) of them, then \(T\circ S\) is a linear transformation from \(U\) to \(W\) and \[[T\circ S]^A_C = [T]^B_C [S]^A_B\]

Proof. Expand both sides and compare.
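As a quick sanity check of the proposition with small made-up matrices, applying \(S\) and then \(T\) to a vector gives the same result as applying the product matrix:

A = matrix(QQ,[[1,2],[3,4],[5,6]])   # represents T : Q^2 -> Q^3
B = matrix(QQ,[[0,1],[1,1]])         # represents S : Q^2 -> Q^2
v = vector(QQ,[1,-1])
show(A*(B*v) == (A*B)*v)             # True: composing the maps matches multiplying the matrices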

Sage is quite good at handling matrices as representations of linear maps.

M = matrix(QQ,[[1,2,3,4,5],[0,1,0,1,0]])
V = VectorSpace(QQ,2)
W = VectorSpace(QQ,5)
T = linear_transformation(V,W,M)
show(T)
\(\displaystyle \newcommand{\Bold}[1]{\mathbf{#1}}\text{vector space morphism from } \Bold{Q}^{2}\text{ to } \Bold{Q}^{5}\text{ represented by the matrix } \left(\begin{array}{rrrrr} 1 & 2 & 3 & 4 & 5 \\ 0 & 1 & 0 & 1 & 0 \end{array}\right)\)

Sage uses the opposite multiplication convention, and views linear transformations as acting from the right, meaning \(T(x) = xM\) when \(M\) is a matrix representing \(T\), whereas your linear algebra class, and this one too, uses \(T(x) = Mx\). This is why the matrix is \(2\times 5\) instead of the \(5\times 2\) that you might expect.
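Here is a minimal check of that convention, reusing the objects V, M, and T defined just above: applying the morphism agrees with multiplying by \(M\) on the right.

x = V([1,0])
show(T(x))    # apply the linear transformation
show(x*M)     # the same thing, via Sage's convention T(x) = xM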

Even if your vector spaces aren’t defined over the same field, sometimes it’s clear that there is a larger field over which they can both be defined. This is sometimes called extension of scalars or base extension. Sage will automatically extend matrices but it won’t automatically extend vector spaces. The first example will give an error, and the following three produce the same result:

M = matrix(RR,[[2,4,8,16,35],[-1,0,-1,0,-1]])  # a matrix over RR, while V and W are still over QQ
try:
    S = linear_transformation(V,W,M)
except Exception as err:
    display(err)
V = VectorSpace(QQ,2)
W = VectorSpace(QQ,5)
VV = V.base_extend(RR)
WW = W.base_extend(RR)
MM = matrix(RR,[[2,4,8,16,35],[-1,0,-1,0,-1]])  # note the RR instead of QQ
S = linear_transformation(VV,WW,MM)
show(S)
\(\displaystyle \newcommand{\Bold}[1]{\mathbf{#1}}\text{vector space morphism from } \Bold{R}^{2}\text{ to } \Bold{R}^{5}\text{ represented by the matrix } \left(\begin{array}{rrrrr} 2.00000000000000 & 4.00000000000000 & 8.00000000000000 & 16.0000000000000 & 35.0000000000000 \\ -1.00000000000000 & 0.000000000000000 & -1.00000000000000 & 0.000000000000000 & -1.00000000000000 \end{array}\right)\)
V = VectorSpace(RR,2)
W = VectorSpace(RR,5)
M = matrix(QQ,[[2,4,8,16,35],[-1,0,-1,0,-1]])  # note the QQ instead of RR
S = linear_transformation(V,W,M)
show(S)
\(\displaystyle \newcommand{\Bold}[1]{\mathbf{#1}}\text{vector space morphism from } \Bold{R}^{2}\text{ to } \Bold{R}^{5}\text{ represented by the matrix } \left(\begin{array}{rrrrr} 2.00000000000000 & 4.00000000000000 & 8.00000000000000 & 16.0000000000000 & 35.0000000000000 \\ -1.00000000000000 & 0.000000000000000 & -1.00000000000000 & 0.000000000000000 & -1.00000000000000 \end{array}\right)\)
V = VectorSpace(RR,2)
W = VectorSpace(RR,5)
M = matrix([[2,4,8,16,35],[-1,0,-1,0,-1]])  # when it's obvious enough, you don't even need to tell Sage the field for the matrix
S = linear_transformation(V,W,M)
show(S)
\(\displaystyle \newcommand{\Bold}[1]{\mathbf{#1}}\text{vector space morphism from } \Bold{R}^{2}\text{ to } \Bold{R}^{5}\text{ represented by the matrix } \left(\begin{array}{rrrrr} 2.00000000000000 & 4.00000000000000 & 8.00000000000000 & 16.0000000000000 & 35.0000000000000 \\ -1.00000000000000 & 0.000000000000000 & -1.00000000000000 & 0.000000000000000 & -1.00000000000000 \end{array}\right)\)

2.3.2 Change of Basis

Vector spaces have lots of different bases. Even though they describe the same space, some choices can be much easier (or harder!) to calculate with than others. For example, many linear maps have a special basis, an eigenbasis, with respect to which their matrix representation is extremely simple. When we get to polynomial approximation in Unit 2, we will see that it is all about calculating representations with respect to different bases.

The clearest way to approach change of bases is to view matrices not as representing linear maps from vector spaces to vector spaces, but instead as representing linear maps between pairs of vector spaces and ordered bases.

In other words, when we have \(T:V\to W\) and pick ordered bases \(A\) and \(B\) in order to write down \([T]^A_B\), we should really think about \(T\) as an arrow from the pair \((V,A)\) to the pair \((W,B)\) (which is not to say that \(T(a_i) = b_i\), or anything like that, just that we package the ordered basis with the vector space).

Then, from an arrow \[V,A \overset{T}\longrightarrow W,B\]

we have a representation operator denoted by brackets, which sends \(v\in V\) to \([v]_A \in K^m\) and \(w\in W\) to \([w]_B \in K^n\) by writing them in terms of bases, and the linear map \(T\) to \([T]^A_B\) so that \[[v]_A \overset{[T]^A_B}\longrightarrow [w]_B\] (meaning \([T]^A_B [v]_A = [w]_B\) for usual matrix multiplication) whenever \(T(v) = w\).

Here is how we fit a change of basis into the picture. We have \[V,A \overset{T}\longrightarrow W,B\] and already know its \([T]^A_B\). Suppose that \(W\) has another basis \(B'\) that we’d like to use instead. In other words, we want to figure out how \[V,A \overset{T}\longrightarrow W,B'\] is related to the original. What if we append an arrow \(W,B \to W,B'\) to the original, and compose? If that arrow is the identity map, then it doesn’t change the actual linear map, but it does change the basis.

In other words, consider the composition of functions \[V,A \overset{T}\longrightarrow W,B \overset{\mathop{\mathrm{id}}}\longrightarrow W,B'\]

The overall map is \(T\). We know that composition is the same as matrix multiplication, which means that the overall map is \[[\mathop{\mathrm{id}}\circ T]^A_{B'} = [\mathop{\mathrm{id}}]^B_{B'}[T]^A_{B}\] (functions compose right-to-left)

Therefore, a change of basis on the codomain is achieved by representing the identity map on the codomain in terms of the two bases. This seems pretty natural: a change of basis doesn’t actually do anything.

It’s easy to verify that a change of basis on the domain corresponds to a right multiplication by a matrix representation of the identity. Putting it together, simultaneously changing bases from \(A\) and \(B\) to \(A'\) and \(B'\) just comes from \[V,A' \overset{\mathop{\mathrm{id}}}\longrightarrow V,A \overset{T}\longrightarrow W,B \overset{\mathop{\mathrm{id}}}\longrightarrow W,B'\]

and hence \[[T]^{A'}_{B'} = [\mathop{\mathrm{id}}]^B_{B'}[T]^A_B [\mathop{\mathrm{id}}]^{A'}_A\]

Let \(T\) be differentiation from the space \(V\) of polynomials of degree at most \(2\) to the space \(W\) of polynomials of degree at most \(2\), with the basis \(1,x,x^2\) for \(V\) and the basis \(1,x,x^2\) for \(W\). You can quickly verify that the matrix is \[\begin{bmatrix} 0&1&0\\ 0&0&2\\ 0&0&0\\ \end{bmatrix}\] If we change the second basis to \(x^2,1-x,x\), what’s the new matrix? Well, the identity map sends \[\begin{align*} 1\mapsto 1&= 0(x^2) + 1(1-x) + 1(x)\\ x\mapsto x &= 0(x^2) + 0(1-x) + 1(x)\\ x^2\mapsto x^2 &= 1(x^2) + 0(1-x) + 0(x) \end{align*}\] which is the matrix \[\begin{bmatrix} 0&0&1\\ 1&0&0\\ 1&1&0\\ \end{bmatrix}\] And so you can calculate the matrix of \(T\) with respect to the bases \(1,x,x^2\) and \(x^2,1-x,x\) in Sage:

T1 = matrix([[0,1,0],
             [0,0,2],
             [0,0,0]])
I = matrix([[0,0,1],
             [1,0,0],
             [1,1,0]])

T2 = I*T1
show(T2)
\(\displaystyle \left(\begin{array}{rrr} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 2 \end{array}\right)\)

Try calculating directly and comparing!
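For instance, here is one such direct check. The polynomial \(3 + 2x + x^2\) has coordinates \((3,2,1)\) in the basis \(1,x,x^2\), and its derivative is \(2 + 2x = 0(x^2) + 2(1-x) + 4(x)\), with coordinates \((0,2,4)\) in the basis \(x^2,1-x,x\). The new matrix reproduces exactly this:

show(T2*vector([3,2,1]))   # gives (0, 2, 4)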

A helpful trick: sometimes a matrix is easier to calculate if you switch the order of the bases. For example, the identity matrix from \(x^2,1-x,x\) to \(1,x,x^2\) is easier to write down than the other way around (which is what we did above) because we actually wrote the second basis in terms of the first. When you go that way, the change of basis matrix is simply the column vectors of the alternate basis:

\[\begin{bmatrix} 0&1&0\\ 0&-1&1\\ 1&0&0\\ \end{bmatrix}\]

is the identity from \(x^2,1-x,x\) to \(1,x,x^2\). The inverse is the identity… but inversion switches the direction of the function! It will go between bases in the other order which is precisely the version of the identity we arrived at above:

I2 = matrix([[0,1,0],
             [0,-1,1],
             [1,0,0]])

show(I2.inverse())   # equals the "I" above
\(\displaystyle \left(\begin{array}{rrr} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \end{array}\right)\)

One special case, anticipated by the example, deserves a little extra attention. Suppose \(T\) is a map from \(V\) to itself (an endomorphism). If we want a matrix, it would make sense to use the same basis on both sides, and so we’d likely want to make the same kind of change of basis on both sides. That is, we’d have the following \[V,A' \overset{\mathop{\mathrm{id}}}\longrightarrow V,A \overset{T}\longrightarrow V,A \overset{\mathop{\mathrm{id}}}\longrightarrow V,A'\] and \[[T]^{A'}_{A'} = [\mathop{\mathrm{id}}]^A_{A'}[T]^A_A [\mathop{\mathrm{id}}]^{A'}_A.\] One might reasonably guess that the two identity matrices are related to each other. Indeed they are: \[V,A \overset{\mathop{\mathrm{id}}}\longrightarrow V,A' \overset{\mathop{\mathrm{id}}}\longrightarrow V,A\] compose into \[V,A \overset{\mathop{\mathrm{id}}}\longrightarrow V,A\] which is represented by the standard identity matrix (1s on the diagonal). The multiplication rule, then, tells us that \[[\mathop{\mathrm{id}}]^{A'}_A[\mathop{\mathrm{id}}]^A_{A'}= [\mathop{\mathrm{id}}]^A_A\] which means that the two matrices \([\mathop{\mathrm{id}}]^{A'}_A\) and \([\mathop{\mathrm{id}}]^A_{A'}\) are inverses as matrices – their matrix product is the usual identity matrix.

Therefore, the change of basis is accomplished by conjugation: abstract away the notation by setting \(M = [T]^A_A\), \(M' = [T]^{A'}_{A'}\), and \(Q=[\mathop{\mathrm{id}}]^A_{A'}\), and we arrive at \[M' = QMQ^{-1}.\]

It is a slightly subtle point that matrix representations of the identity map are not always the identity matrix; it makes sense, though, once you view the representation as going from a vector space paired with one choice of basis to the same space paired with another.

2.4 Row Reduction and LU Decomposition

You probably remember that matrices can be a convenient way to summarize a system of linear equations: you can always write such a system in the form \[Ax = b,\] and, when \(A\) is invertible, the solution is simply \[x = A^{-1} b.\]
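Here is a minimal Sage sketch of both points of view, on a small made-up system:

A = matrix(QQ,[[2,1],[1,3]])
b = vector(QQ,[3,5])
show(A.solve_right(b))   # solves Ax = b directly
show(A.inverse()*b)      # the same answer via the inverse (fine here, but slower and less stable in general)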

The first algorithm you learned for solving such a system is row reduction. Starting from a matrix \(A\), you perform a series of “elementary row operations” on both the matrix \(A\) and the vector \(b\). All these row operations correspond to left multiplication by invertible matrices. After performing enough of them, you end up with \(A=I\). If you combine the row operations by multiplying them together, you have really calculated \(A^{-1}\).

During this algorithm, there is a small issue involving “pivots” where you might have to permute some rows. Let’s imagine we are in the nice situation where this is never necessary. Also assume the matrix is square. Then the algorithm goes like this:

\[ \begin{bmatrix} a_{11} &*&*&*\\ *&*&*&*\\ *&*&*&*\\ *&*&*&* \end{bmatrix} \longrightarrow \begin{bmatrix} a_{11} &*&*&*\\ 0&a_{22}'&*&*\\ \vdots&*&*&*\\ 0&*&*&* \end{bmatrix} \longrightarrow \begin{bmatrix} a_{11} &*&*&*\\ 0&a_{22}'&*&*\\ \vdots&\ddots&*&*\\ 0&0&*&* \end{bmatrix} \longrightarrow ... \longrightarrow \begin{bmatrix} a_{11} &*&*&*\\ 0&a_{22}'&*&*\\ \vdots&\ddots&\ddots&*\\ 0&\dots&0&a_{nn}' \end{bmatrix} \]

Then, you reverse the direction of your elimination to get all zeros above the diagonal, and finally clean up the diagonals to get all \(1\)s. The “no pivot” assumption just means that \(a_{11}\neq 0\) and the subsequent \(a_{ii}' \neq 0\) as well.

What is nice about the pivot-free situation is that the matrices representing the elementary row operations are quite simple: eliminating lower rows only, and not permuting, means they will all be lower triangular! The product of lower triangular matrices is lower triangular (HW). If we think about the situation halfway through the algorithm, it tells us that we’ve found a lower-triangular matrix \(E\) such that \[EA = \begin{bmatrix} a_{11} &*&*&*\\ 0&a_{22}'&*&*\\ \vdots&\ddots&\ddots&*\\ 0&\dots&0&a_{nn}' \end{bmatrix}\] is upper triangular. Observe that the inverse of a lower-triangular matrix is lower triangular (HW), and so if we let \(L = E^{-1}\), we arrive at a factorization \[A = LU\] representing \(A\) as the product of a Lower triangular matrix and an Upper triangular matrix. This is called an LU decomposition of \(A\). It’s not entirely unique, since we could have scaled the diagonal arbitrarily.

These decompositions are useful because it is far faster to solve systems which are upper or lower triangular, and they are more numerically stable.

A = matrix([[1,2,3],[0,1,0],[2,4,3]])
LU_A = A.LU(pivot='nonzero')
show(LU_A)               # the first matrix is superfluous (it is the identity because no row swaps were needed)
show(LU_A[1]*LU_A[2])    # the next two matrices are L and U, respectively, multiplied here to recover A
\(\displaystyle \left(\left(\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right), \left(\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{array}\right), \left(\begin{array}{rrr} 1 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & -3 \end{array}\right)\right)\)
\(\displaystyle \left(\begin{array}{rrr} 1 & 2 & 3 \\ 0 & 1 & 0 \\ 2 & 4 & 3 \end{array}\right)\)
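Once you have the factors, solving \(Ax = b\) amounts to two triangular solves. A minimal sketch, reusing A and LU_A from above with a made-up right-hand side:

b = vector(QQ,[1,2,3])
P, L, U = LU_A
y = L.solve_right(b)          # solve Ly = b (conceptually, forward substitution)
x = U.solve_right(y)          # solve Ux = y (conceptually, back substitution)
show(x == A.solve_right(b))   # True: both approaches solve Ax = b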

Even if the pivoting assumption fails and permutations are required, it’s not too hard to show that all the permutations can be done at once, up front. In that case, you get a factorization \[A = PLU\] where \(P\) is just a permutation matrix. These decompositions are even less unique, because at each step you could pick any allowable row (nonzero in the appropriate column) as the pivot. This is just as useful for solving a system, though, because undoing a permutation of the coordinates requires very little work.

B = matrix([[1,2,3],[1,2,4],[1,3,5]])
PLU_B = B.LU(pivot='nonzero')
show('B = ',B)
show('PLU decomp =', PLU_B)
show('PLU =',PLU_B[0]*PLU_B[1]*PLU_B[2])
show('LU =',PLU_B[1]*PLU_B[2])   # just LU - the rows are out of order
\(\displaystyle \verb|B|\verb| |\verb|=| \left(\begin{array}{rrr} 1 & 2 & 3 \\ 1 & 2 & 4 \\ 1 & 3 & 5 \end{array}\right)\)
\(\displaystyle \verb|PLU|\verb| |\verb|decomp|\verb| |\verb|=| \left(\left(\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{array}\right), \left(\begin{array}{rrr} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{array}\right), \left(\begin{array}{rrr} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{array}\right)\right)\)
\(\displaystyle \verb|PLU|\verb| |\verb|=| \left(\begin{array}{rrr} 1 & 2 & 3 \\ 1 & 2 & 4 \\ 1 & 3 & 5 \end{array}\right)\)
\(\displaystyle \verb|LU|\verb| |\verb|=| \left(\begin{array}{rrr} 1 & 2 & 3 \\ 1 & 3 & 5 \\ 1 & 2 & 4 \end{array}\right)\)

In case pivots worry you, there is a sufficient (but not necessary) criterion for the existence of an LU decomposition.

Given a matrix \(A\), let \(A_k\) be the \(k\times k\) sub-matrix starting from the upper left corner of \(A\). This is called the \(k\)th principal minor of \(A\).

If \(A\) is a square \(n\times n\) matrix and all of its principal minors (including \(A = A_n\) itself) are invertible, then \(A\) has an LU decomposition.

Proof. There is nothing to do if \(n=1\). By induction, \(A_{n-1}\) has an LU decomposition, which also means we can find a lower triangular matrix \(T\) such that \(TA_{n-1} = U\) is upper-triangular. Then consider the product of block matrices:

\[ \begin{bmatrix} T & 0 \\0 & 1 \end{bmatrix} \begin{bmatrix} A_{n-1} & v \\w & a_{nn} \end{bmatrix} = \begin{bmatrix} U & Tv \\w & a_{nn} \end{bmatrix} \]

Since \(U\) is nonsingular, there is a vector \(w'\) such that \(w' U = w\) so if we further multiply by another lower triangular matrix, we get

\[ \begin{bmatrix} I_{n-1} & 0 \\-w' & 1 \end{bmatrix} \begin{bmatrix} T & 0 \\0 & 1 \end{bmatrix} \begin{bmatrix} A_{n-1} & v \\w & a_{nn} \end{bmatrix} = \begin{bmatrix} I_{n-1} & 0 \\-w' & 1 \end{bmatrix} \begin{bmatrix} U & Tv \\w & a_{nn} \end{bmatrix} = \begin{bmatrix} U & Tv \\0 & a_{nn} - w'Tv \end{bmatrix} \]

The RHS is upper-triangular, and the LHS, after grouping the first two matrices, is a lower-triangular matrix times the original matrix, thereby furnishing an LU decomposition.
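You can check the criterion in Sage by slicing out the principal minors. A minimal sketch, on the same matrix A as in the LU example above:

A = matrix(QQ,[[1,2,3],[0,1,0],[2,4,3]])
show([A[:k,:k].det() for k in range(1,4)])   # [1, 1, -3]: all nonzero, so an LU decomposition exists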

2.5 Eigenvectors and Eigenvalues

This is the most important piece of linear algebra in our class.

Given a vector space \(V\) with an endomorphism \(T\), there are sometimes vectors which \(T\) transforms in a very simple way.

Definition 2.7 Let \(T:V\rightarrow V\) be an endomorphism of a \(K\)-vector space \(V\). We say that \(e\in V\) is an eigenvector for \(T\) with eigenvalue \(\lambda\), a scalar, if \(e\) is nonzero and \(T(e) = \lambda e.\)

Suppose \(V\) has a basis \(e_1,...,e_n\) consisting entirely of eigenvectors, with eigenvalues \(\lambda_1,...,\lambda_n\). Then the property above means that the matrix for \(T\) with respect to this basis is simply the diagonal matrix \[\begin{bmatrix} \lambda_1 & 0 & 0 &\cdots & 0\\ 0& \lambda_2 & 0 &\cdots & 0\\ 0& 0 & \lambda_3 &\cdots & 0\\ \vdots & \vdots & \vdots & \ddots &\vdots\\ 0&0&0&\cdots &\lambda_n \end{bmatrix}\]

This is a super easy matrix to calculate with!

Most square matrices \(M\) come to us from a linear map \(T:V\rightarrow V\) with the same basis on both sides. If \(T\) has enough linearly independent eigenvectors, then the special case above and the change-of-basis discussion tell us that we can find a matrix \(Q\), whose columns are eigenvectors of \(T\) written in the starting basis, such that \(Q^{-1}MQ\) is diagonal, with the eigenvalues on the diagonal. This is diagonalization.

Linear maps generally have enough eigenvectors, although not always.

As usual, Sage is helpful for calculating:

M = matrix(QQ,[[2,1],[3,4]])

# diagonalization() returns both the diagonal matrix D and the matrix Q that does the
# diagonalizing; Sage's convention is M == Q*D*Q.inverse(), so Q.inverse()*M*Q is diagonal
D,Q = M.diagonalization()

show(Q.inverse()*M*Q)
\(\displaystyle \left(\begin{array}{rr} 5 & 0 \\ 0 & 1 \end{array}\right)\)

2.5.1 Linear Recurrences and Rabbits

Diagonalization is deceptively useful. Consider the Fibonacci numbers, defined by \(F_0 = 0, F_1 = 1\), and the recursion \[F_{n+1} = F_n + F_{n-1}\] The recursion can be obtained from a simple model for the growth of a rabbit population: group the rabbits into pairs of adults and pairs of babies, so that \(A_n\) is the adult pair population at time \(n\) and \(B_n\) is the baby pair population. During a unit time interval, each pair of adults produces a pair of babies, and every baby grows into an adult. No rabbits die. This means

\[A_{n+1} = A_n + B_n\] and \[B_{n+1} = A_n\]

The second implies \(B_n = A_{n-1}\). After replacing \(A\) by \(F\), one arrives at the Fibonacci recurrence above.

Sticking with the rabbit model, we can express it in terms of matrix multiplication because the RHS is a linear combination of \(A_n\)s and \(B_n\)s. \[\begin{bmatrix} A_{n+1}\\B_{n+1} \end{bmatrix}= \begin{bmatrix} 1 & 1\\ 1 & 0 \end{bmatrix} \begin{bmatrix} A_{n}\\B_{n} \end{bmatrix}\]

An initial population of \(A_0=0\) adult pairs and \(B_0=1\) baby pair corresponds to the vector \[\begin{bmatrix} 0\\1 \end{bmatrix}\]

If we let \(F\) be the matrix, then we inductively arrive at a nice matrix expression for the populations at time \(n\):

\[ F^n \begin{bmatrix} 0\\ 1 \end{bmatrix}= \begin{bmatrix} A_n\\ B_n \end{bmatrix} \]

Diagonalization makes it easy to calculate matrix powers: a power of a diagonal matrix is obtained by just raising the diagonal entries to the corresponding power. You can check (HW) that the eigenvalues of \(F\) are \(\phi\approx 1.62\) and \(\psi\approx -0.62\), the golden and (negative) silver ratios, the roots of \(x^2 -x - 1\). Then you can simply write

\[ F = Q \begin{bmatrix} \phi & 0 \\ 0 & \psi \end{bmatrix} Q^{-1} \ \ \ \ \Rightarrow \ \ \ \ F^n = Q \begin{bmatrix} \phi^n & 0 \\ 0 & \psi^n \end{bmatrix} Q^{-1}\]

Expanding with \(Q\) (HW) and applying to the vector \((0,1)\), you will obtain a classic expression for the Fibonacci numbers:

\[F_n = \frac{\phi^n - \psi^n}{\phi - \psi}\]

What makes this especially interesting is that \(|\psi| < 1\), and so \(|\psi^n|\rightarrow 0\) as \(n\rightarrow \infty\). Therefore, \(F_n \approx \phi^n/(\phi-\psi)\), which grows like \((1.62)^n\). In fact, the \(\psi^n\) term is always small enough (after division by \(\phi-\psi\)) to be a rounding error: \(F_n\) is the nearest integer to \(\phi^n/(\phi-\psi)\). This tracks with our intuition that unrestricted growth is exponential in nature.
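Here is a minimal Sage check of the model and of the closed form; fibonacci is Sage's built-in Fibonacci function:

F = matrix(QQ,[[1,1],[1,0]])
show(F^10 * vector([0,1]))    # (A_10, B_10) = (F_10, F_9) = (55, 34)

phi = (1+sqrt(5))/2
psi = (1-sqrt(5))/2
show([round(((phi^n - psi^n)/(phi - psi)).n()) for n in range(8)])   # 0, 1, 1, 2, 3, 5, 8, 13
show([fibonacci(n) for n in range(8)])                               # matches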

2.5.2 Left and Right Eigenvectors

Upon fixing a basis, you can work with matrices instead of linear maps; an eigenvector \(v\) then satisfies \(Mv = \lambda v\). This is a right eigenvector. One could just as easily consider left eigenvectors: row vectors \(w\) such that \(wM = \lambda w\).

Note that left eigenvectors are equivalent to eigenvectors of the transpose: \(wM = \lambda w\) exactly when \[M^T w^T = \lambda w^T.\] This matches the interpretation in terms of linear transformations and marked bases: left eigenvectors are eigenvectors for the dual \(T^*\) of \(T\). The matrix representation of \(T^*\) with respect to the dual basis is \(M^T\).

The two kinds of eigenvectors have essentially nothing to do with each other. But, remarkably, the resulting eigenvalues are the same, right down to multiplicity. Sometimes, it’s easier to calculate on the left than on the right. Sage handles both.

M = matrix(QQ,[[1,2],[3,4]])
# eigenvectors_right() returns a list of tuples (e, [v], m) where
#   e is an eigenvalue,
#   [v] is a list of eigenvectors with that eigenvalue,
#   m is the multiplicity.
right_evs = M.eigenvectors_right()
show(right_evs)

lambda_ = right_evs[0][0]
eigen_vec = right_evs[0][1][0]
show(M*eigen_vec)
show(lambda_*eigen_vec)
\(\displaystyle \left[\left(-0.3722813232690144?, \left[\left(1,\,-0.6861406616345072?\right)\right], 1\right), \left(5.372281323269015?, \left[\left(1,\,2.186140661634508?\right)\right], 1\right)\right]\)
\(\displaystyle \left(-0.3722813232690144?,\,0.2554373534619714?\right)\)
\(\displaystyle \left(-0.3722813232690144?,\,0.2554373534619714?\right)\)
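And the corresponding computation on the left, reusing M from above; note that the eigenvalues that show up are the same ones:

left_evs = M.eigenvectors_left()
show(left_evs)

w = left_evs[0][1][0]
show(w*M)                  # equals lambda*w for the first eigenvalue
show(left_evs[0][0]*w)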

2.6 Important Facts

Here is a smattering of other important facts and concepts from linear algebra. Historically, these seem to be more memorable than change of basis.

  1. Determinants.
    1. There is a function called the determinant from square matrices over \(K\) to \(K\) with the property that the determinant is nonzero if and only if the matrix has an inverse (also with entries in \(K\)). In particular, a linear map from \(V\) to \(V\) has an inverse if and only if one (resp. all) matrices associated to it have nonzero determinant.
    2. The determinant of an \(n\times n\) matrix is a homogeneous polynomial of degree \(n\) in the entries of the matrix. It is (multi)linear in the rows and columns, and alternating in the sense that switching two rows or two columns changes the determinant by a factor of \(-1\).
    3. There is a row-expansion way to calculate the determinant. It is tedious but straightforward (a job for computers).
    4. The determinant of a diagonal matrix is the product of its diagonal entries. In fact, the determinant of an upper triangular matrix is also the product of its diagonal entries.
  2. Some special subspaces.
    1. A linear map \(T:V\to W\) gives rise to distinguished subspaces of \(V\) and \(W\). These are the kernel, the set of \(v\in V\) such that \(T(v) = 0\), and the image, the set of \(w\in W\) such that \(w=T(v)\) for some \(v\in V\). These are usually denoted \(\ker T\) and \(\mathop{\mathrm{img}}T\).
    2. The first isomorphism theorem (sometimes called Part I of the rank-nullity theorem) says that \[\mathop{\mathrm{img}}T \cong V/\ker T\]
    3. The splitting lemma (sometimes called Part II of the rank-nullity theorem) says that \[V\cong \ker T \oplus \mathop{\mathrm{img}}T\] leading to the “numerical” rank-nullity theorem, \[\dim V = \dim \ker T + \dim \mathop{\mathrm{img}}T\] (suitably reinterpreted for infinite dimensions).
  3. Eigenvalues and Kernels.
    1. From the expression \[T(e) = \lambda e\] defining eigenvalues, we see that \[(T-\lambda \mathop{\mathrm{id}})(e) = 0.\] So eigenvectors are nonzero elements of the kernel of \(T - \lambda \mathop{\mathrm{id}}\). Running the algebra in reverse, we see that any nonzero element in the kernel of a linear map of the form \(T-\lambda \mathop{\mathrm{id}}\) is an eigenvector with eigenvalue \(\lambda\).
    2. The characteristic polynomial of a linear transformation is the determinant (a polynomial in \(x\)) of \(T- x\mathop{\mathrm{id}}\). We know it is a degree \(n\) polynomial by our determinant facts. The roots of the characteristic polynomial are exactly the eigenvalues of \(T\).
    3. The minimal polynomial of a linear transformation \(T\) is the monic polynomial \(P(x)\) of smallest degree such that \(P(T) = 0\) identically. If \(Q\) is any polynomial such that \(Q(T) = 0\), then \(P\) divides \(Q\). The roots of the minimal polynomial are also exactly the eigenvalues of \(T\).
    4. The determinant of \(T\) is the constant term of the characteristic polynomial.
    5. The product of the eigenvalues of \(T\), with the multiplicities from the characteristic polynomial, is its determinant. The sum is the trace.
    6. Eigenvectors with distinct eigenvalues are linearly independent. Therefore, if the characteristic polynomial has \(n\) distinct roots, there is an eigenbasis which diagonalizes \(T\). (the converse is not true!)
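Many of these quantities are one-liners in Sage. A minimal sketch on a small made-up matrix (note that Sage's charpoly uses the convention \(\det(xI - M)\)):

M = matrix(QQ,[[2,1],[3,4]])
show(M.charpoly())             # x^2 - 6x + 5, with roots 5 and 1
show(M.minpoly())
show(M.det())                  # 5 = 5*1, the product of the eigenvalues
show(M.trace())                # 6 = 5+1, their sum
show((M - 5).right_kernel())   # nonzero vectors in here are eigenvectors with eigenvalue 5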

2.7 Upper Triangularization

Diagonal matrices are extremely simple, and diagonalization means many matrices can be made quite simple after a change of basis. But not every matrix is diagonalizable. For example, \[\begin{bmatrix} 1 & 1\\ 0 & 1 \end{bmatrix} \qquad\text{and}\qquad \begin{bmatrix} -4 & 9\\ -4 & 8 \end{bmatrix}\] are not diagonalizable (see Exercise 2.7).

What do you do when a matrix isn’t diagonalizable? Well, another kind of matrix that’s fairly easy to calculate with is an upper-triangular matrix, like the ones that appeared in the LU decomposition. Upper-triangular matrices correspond to systems which are easy to solve by iteratively back-substituting. Computationally, they have lots of zeros, so they’re easy to multiply, and conceptually, they preserve a flag, so their action can be sliced up into smaller pieces.

Theorem 2.1 Let \(T:V\to V\) be a linear transformation, where \(V\) has finite dimension \(n\) over an algebraically closed field \(K\) (for example, \(K = \mathbb C\)). There is an ordered basis of \(V\) with respect to which the matrix of \(T\) is upper triangular. Equivalently, for any square matrix \(M\) over \(K\), there is an invertible matrix \(Q\) such that \(QMQ^{-1}\) is upper triangular.

Proof. We will proceed by induction on the dimension. The base case is trivial.

Let \(v_1=e\) be an eigenvector of \(T\) - since \(K\) is algebraically closed, the characteristic polynomial has a root, so every matrix has at least one eigenvector and \(e\) exists. Choose \(v_2,...,v_{n}\) such that \((v_1,v_2,...,v_n)\) is an ordered basis for \(V\). With respect to such a basis, \(T\) takes the form

\[M=\begin{bmatrix} \lambda & w\\ 0 & A \end{bmatrix}\]

where \(A\) is an \((n-1) \times (n-1)\) matrix and \(w\) is a row vector of length \(n-1\). The matrix \(A\) can be viewed as an endomorphism of an \((n-1)\)-dimensional space, so by induction there is a matrix \(R\) such that \(RAR^{-1} = U\) is upper triangular. It’s easy to see, then, that \[\begin{bmatrix} 1 & 0\\ 0 & R \end{bmatrix}\begin{bmatrix} \lambda & w\\ 0 & A \end{bmatrix}\begin{bmatrix} 1 & 0\\ 0 & R^{-1} \end{bmatrix}= \begin{bmatrix} \lambda & wR^{-1}\\ 0 & U \end{bmatrix}\] which is upper triangular. Therefore, if we set \[Q =\begin{bmatrix} 1 & 0\\ 0 & R \end{bmatrix}\] the matrix \(QMQ^{-1}\) is upper triangular. Every conjugation is a change of basis, so upon unraveling the choices, one obtains an ordered basis with respect to which \(T\) is upper triangular.
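Here is a concrete instance of the inductive step, with made-up numbers. The matrix M below has \((1,0,0)\) as an eigenvector with eigenvalue \(2\), and we borrow jordan_form (see the next section) purely as a convenient way to upper-triangularize the lower-right \(2\times 2\) block:

M = matrix(QQ,[[2, 5, 7],
               [0, 1, 1],
               [0,-1, 3]])
A = M[1:,1:]                                  # the lower-right block
J, P = A.jordan_form(transformation=True)     # P.inverse()*A*P == J is upper triangular
R = P.inverse()                               # so R*A*R.inverse() is upper triangular
Q = block_diagonal_matrix(identity_matrix(QQ,1), R)
show(Q*M*Q.inverse())                         # upper triangular, as in the proof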

2.8 *Jordan Normal Form

We introduced upper-triangularization as a second-best when diagonalization isn’t possible. While it is almost always sufficient for our purposes, one can push the idea much farther in order to obtain upper triangular matrices which are nearly as simple to calculate with as diagonal matrices.

Diagonal matrices are great – they’re a lot easier to calculate with. For example, if \(D\) is diagonal with \(\lambda_i\) along its diagonal, then \(D^n\) is also diagonal, with the powers \(\lambda_i^n\) along its diagonal.

The Jordan Normal Form is more canonical than a general upper-triangularization, and it has far more zeros. An example is a matrix like \[ \begin{bmatrix} \lambda_1 & 1 & 0 & 0 & 0 \\ 0 & \lambda_1 & 0 & 0 & 0 \\ 0 & 0 & \lambda_2 & 0 & 0 \\ 0 & 0 & 0 & \lambda_2 & 0 \\ 0 & 0 & 0 & 0 & \lambda_3 \end{bmatrix} \]

This \(5\times 5\) matrix has repeated eigenvalues. It is not diagonalizable, because the eigenvalue \(\lambda_1\) is defective – its eigenspace is only one-dimensional, smaller than its multiplicity – whereas \(\lambda_2\) is repeated but still has a two-dimensional eigenspace.

You could also have \[ \begin{bmatrix} \lambda_1 & 1 & 0 & 0 & 0 \\ 0 & \lambda_1 & 1 & 0 & 0 \\ 0 & 0 & \lambda_1 & 0 & 0 \\ 0 & 0 & 0 & \lambda_2 & 0 \\ 0 & 0 & 0 & 0 & \lambda_3 \end{bmatrix} \]

where the eigenspace for \(\lambda_1\) should be \(3\)-dimensional but is actually \(1\)-dimensional. The number of 1s above the diagonal reflects the size of the defect.

That is all Jordan normal form is: a matrix that is almost diagonal except for some \(1\)s above the diagonal (and each \(1\) comes together with two identical eigenvalues as neighbors). Every matrix over an algebraically closed field is conjugate to one in Jordan normal form – if a matrix is diagonalizable, its Jordan normal form is diagonal.

These matrices are almost as nice as diagonal matrices. You can write them as \[D + N\] where \(D\) is diagonal and \(N\) has the off-diagonal \(1\)s. It can be shown (HW) that \(N\) is “nilpotent”, meaning \(N^k = 0\) for some integer \(k\); note also that \(D\) and \(N\) commute. This means that large powers \((D+N)^n\) are still pretty simple – the binomial expansion starts from \(D^n + nD^{n-1}N + ...\) but stops after \(k\) terms, once the power of \(N\) is large enough.
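A quick check of this, on a single made-up \(3\times 3\) Jordan block, where \(N^3 = 0\):

D = diagonal_matrix(QQ,[2,2,2])
N = matrix(QQ,[[0,1,0],[0,0,1],[0,0,0]])
show(N^3)                                      # the zero matrix
show((D+N)^5 == D^5 + 5*D^4*N + 10*D^3*N^2)    # True: the binomial expansion stops after three terms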

Our first example of a non-diagonalizable matrix was already in Jordan Normal Form. We can use Sage to put the second one in Jordan Normal Form.

M = matrix([[-4, 9],[-4, 8]])
# The argument "transformation=True" is required for jordan_form to return the conjugating
# matrix Q as well. Sage's convention is M == Q*J*Q.inverse(), so conjugating by Q.inverse()
# on the left recovers the Jordan form, just as with the diagonalization above.
J,Q = M.jordan_form(transformation=True)

show(Q)
show(Q.inverse()*M*Q)
\(\displaystyle \left(\begin{array}{rr} -6 & 1 \\ -4 & 0 \end{array}\right)\)
\(\displaystyle \left(\begin{array}{rr} 2 & 1 \\ 0 & 2 \end{array}\right)\)

Instead of using eigenvectors for a basis, the Jordan normal form uses generalized eigenvectors. The difference is that generalized eigenspaces always have full dimension (equal to the multiplicity of the eigenvalue), while eigenspaces may not. The details are interesting (please see your linear algebra book!) but not crucial for this course.

Exercises

Exercise 2.1 Verify:

  • The set \(\mathbb Q\) of rational numbers is a field.
  • The set \(\mathbb Z\) is not a field.
  • The sets \(\mathbb R\) and \(\mathbb C\) of real and complex numbers are fields.
  • If you know some algebra or number theory, show that when \(p\) is a prime, the set \(\mathbb Z/p\mathbb Z\) of integers modulo \(p\) is a field.

Exercise 2.2 Verify:

  • The set \(\mathbb R^2\) of pairs of real numbers is a vector space over \(\mathbb R\) when endowed with the usual addition and scalar multiplication.
  • The complex numbers are a real vector space.
  • The set of all functions \(\mathbb Z \to \mathbb Q\) is a \(\mathbb Q\)-vector space.

Exercise 2.3 Show that the span of \(B\) is a subspace of \(V\). In fact, show that the span of \(B\) is the smallest subspace of \(V\) containing \(B\): prove that if \(W\subseteq V\) is a subspace and \(W\) contains \(B\), then \(W\) contains the span of \(B\).

Exercise 2.4 Let \(V\) be a \(K\)-vector space of finite dimension \(n\). A full flag in \(V\) is a collection of subspaces \[\{0\} = V_0 \subsetneq V_1 \subsetneq V_2 \subsetneq ... \subsetneq V_{n-1} \subsetneq V_n = V.\]

Prove that \(\dim V_k = k\). In fact, prove that if one chooses any collection \(v_1,...,v_n\) such that \(v_i \in V_i - V_{i-1}\), then \(v_1,...,v_k\) are linearly independent and span \(V_k\).

Conversely, show that from any ordered basis \((v_1,...,v_n)\) one can construct a full flag where \(V_k\) is the span of the first \(k\) vectors. This is the natural flag associated to an ordered basis.

Exercise 2.5 Let \(V\) be a finite-dimensional \(K\)-vector space of dimension \(n\) with a full flag \(V_1,...,V_n\). Suppose \(T:V\to V\) is a linear map such that \(T(V_k) \subseteq V_k\). We say that such a linear map respects this flag.

Show that with respect to any ordered basis \((v_1,...,v_n)\) where \(v_i\in V_i - V_{i-1}\) (c.f. previous problem) the matrix representation of \(T\) is upper triangular.

Conversely, suppose that \(V\) is a finite-dimensional vector space of dimension \(n\) and that it has some ordered basis \((v_1,...,v_n)\) with respect to which \(T\) is an upper triangular matrix. Show that there exists a full flag for \(V\) which is respected by \(T\).

In summary, upper triangular matrices correspond to linear maps from vector spaces with ordered bases that respect the natural flag for the basis.

Exercise 2.6 Prove that the product of two upper triangular matrices is upper triangular. Do so without writing down specific matrices or using matrix multiplication (use the previous problems).

Exercise 2.7 Show that

\[ \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \]

and

\[ \begin{bmatrix} -4 & 9 \\ -4 & 8 \end{bmatrix} \]

are not diagonalizable.