The Matrix Representation of a Linear Transformation

If you are diving into linear algebra, you have likely noticed a shift. We start with relatively comfortable grids of numbers (matrices), but suddenly we find ourselves in the world of “abstract vector spaces” (\(V\) and \(W\)). It can feel like floating in zero gravity. How do we anchor ourselves back to the concrete world of calculations? The secret lies in bases and coordinates. Today, we’ll break down three definitions that act as the bridge between abstract math and concrete numbers.

1. The Ordered Basis: Order Matters

In introductory math, sets are usually unordered collections. \(\{a, b\}\) is the same as \(\{b, a\}\). But in linear algebra, when we want to assign coordinates to vectors, order is important.

Definition

An ordered basis for a vector space \(V\) is a finite sequence of linearly independent vectors that generates (spans) \(V\). Unlike a standard set, the specific order of the vectors is fixed.

Simple Example

Consider the vector space \(\mathbb{F}^3\) (standard 3D space).

We usually work with the standard basis vectors: \(e_1 = (1,0,0)\), \(e_2 = (0,1,0)\), and \(e_3 = (0,0,1)\).

  • Basis \(\beta\): \(\beta = \{e_1, e_2, e_3\}\) is the standard ordered basis.
  • Basis \(\gamma\): \(\gamma = \{e_2, e_1, e_3\}\) is a different ordered basis.

Even though they contain the exact same vectors, \(\beta \neq \gamma\) as ordered bases. Why? Because if you swap the reference axes (putting \(y\) first instead of \(x\)), the coordinates of every point in space change.
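A quick preview in Python with NumPy (the point is my own illustrative choice, not from the text; Section 2 makes “coordinates” precise): reordering the basis vectors permutes the coordinates of every point.

```python
import numpy as np

beta  = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]   # e1, e2, e3
gamma = [(0, 1, 0), (1, 0, 0), (0, 0, 1)]   # e2, e1, e3 -- same vectors, new order

p = np.array([2.0, 5.0, 7.0])

# Coordinates of p relative to each ordered basis: solve B @ coords = p,
# where the columns of B are the basis vectors in order.
print(np.linalg.solve(np.column_stack(beta), p))    # [2. 5. 7.]
print(np.linalg.solve(np.column_stack(gamma), p))   # [5. 2. 7.]
```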

2. Coordinate Vector: The Address of a Vector

Once we have a fixed ordered basis, every abstract vector in the space gets a unique “address.” This address is the coordinate vector.

Definition

Let \(\beta = \{u_1, u_2, \dots, u_n\}\) be an ordered basis for \(V\).

Any vector \(x \in V\) can be written uniquely as a linear combination:

$$
x = a_1u_1 + a_2u_2 + \dots + a_nu_n
$$

The coordinate vector of \(x\) relative to \(\beta\), denoted as \([x]_\beta\), is the column vector formed by these coefficients:

$$
[x]_\beta = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
$$

Simple Example

Let’s look at the space of polynomials with degree \(\le 1\), denoted \(P_1(\mathbb{R})\).

Let our ordered basis be \(\beta = \{1, x\}\).

If we have a vector (polynomial) \(v = 3 + 2x\):

  • Here, \(u_1 = 1\) and coefficient \(a_1 = 3\).
  • Here, \(u_2 = x\) and coefficient \(a_2 = 2\).

Therefore, the coordinate vector is:

$$
[v]_\beta = \begin{pmatrix} 3 \\ 2 \end{pmatrix}
$$

Note: If we flipped the basis order to \(\{x, 1\}\), the coordinate vector would flip to \(\begin{pmatrix} 2 \\ 3 \end{pmatrix}\). This is why the order in definition 1 was so crucial!
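When the basis is not the standard one, you find \([x]_\beta\) by solving a small linear system: if \(B\) is the matrix whose columns are the basis vectors in order, then \([x]_\beta\) solves \(B[x]_\beta = x\). Here is a minimal Python/NumPy sketch (the basis and the vector are my own illustrative choices):

```python
import numpy as np

# Columns of B are an ordered basis of R^3 (not the standard one).
B = np.column_stack([(1, 0, 0), (1, 1, 0), (1, 1, 1)])

x = np.array([3.0, 2.0, 2.0])

# Solve B @ coords = x; coords is the coordinate vector [x]_beta.
coords = np.linalg.solve(B, x)
print(coords)   # [1. 0. 2.]  because x = 1*(1,0,0) + 0*(1,1,0) + 2*(1,1,1)
```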

3. The Matrix Representation of a Linear Transformation

The Logical Foundation: Why a Few Vectors Are Enough

To turn a linear transformation \(T\) into a matrix, we face a fundamental challenge: How do we capture a function’s behavior on infinite vectors within a finite grid of numbers?

The answer lies in the following Theorem. This theorem acts as our mathematical guarantee. It states that a linear transformation is completely determined by its action on the basis vectors alone. Based on this theorem, we can define a transformation that acts on the entire infinite space just by looking at a few basis vectors. The logic works like this:

If we know the outputs for the basis vectors—\(T(v_1), T(v_2), \dots, T(v_n)\)—we automatically know \(T(x)\) for every single \(x\) in the “universe” of \(V\). Why? Because the theorem proves that there exists exactly one linear transformation that satisfies those specific conditions. It is mathematically impossible for another linear transformation to behave the same way on the basis but act differently on some other vector \(x\). Because the transformation is locked in by the basis, we only need to record what happens to the basis to capture the whole picture.

Theorem 2.6: If \(V\) and \(W\) are vector spaces and \(\{v_1, \dots, v_n\}\) is a basis for \(V\), we are free to send these basis vectors to any vectors \(w_1, \dots, w_n\) in \(W\). Crucially, once we decide where the basis vectors go, there exists exactly one linear transformation \(T: V \to W\) that satisfies those conditions.

Corollary: Let \(V\) and \(W\) be vector spaces, and suppose that \(V\) has a finite basis \(\{v_1, v_2, \dots, v_n\}\). If \(\mathsf{U}, \mathsf{T}: V \to W\) are linear and \(\mathsf{U}(v_i) = \mathsf{T}(v_i)\) for \(i = 1, 2, \dots, n\), then \(\mathsf{U} = \mathsf{T}\).

Defining the Matrix Representation

Now that we have proven that \(T(v_j)\) is the only data that matters, we simply need to write that data down. This is where the Matrix Representation is born.

Let \(T: V \to W\) be a linear transformation.

Let \(\beta = \{v_1, v_2, \dots, v_n\}\) be an ordered basis for the input space \(V\).

Let \(\gamma = \{w_1, w_2, \dots, w_m\}\) be an ordered basis for the output space \(W\).

For each input basis vector \(v_j\), the result \(T(v_j)\) lands in the output space \(W\). Therefore, \(T(v_j)\) can be uniquely expressed as a linear combination of the output basis \(\gamma\):

$$
T(v_j) = \sum_{i=1}^{m} a_{ij} w_i
$$

These scalars \(a_{ij}\) describe exactly where the input basis vectors land.

We collect these scalars to form the matrix \([T]_\beta^\gamma\), where the \(j\)-th column consists of the scalars \(a_{1j}, a_{2j}, \dots, a_{mj}\).

The matrix representation of \(T\) with respect to the ordered bases \(\beta\) and \(\gamma\), denoted \([T]_\beta^\gamma\), is the \(m \times n\) matrix constructed by placing these columns side by side.

Specifically, the \(j\)-th column of the matrix is the coordinate vector of the transformed basis vector \(T(v_j)\) relative to \(\gamma\).

$$
[T]_\beta^\gamma = \begin{pmatrix} | & | & & | \\ {[T(v_1)]_\gamma} & {[T(v_2)]_\gamma} & \dots & {[T(v_n)]_\gamma} \\ | & | & & | \end{pmatrix}
$$
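This definition translates almost line for line into code. Below is a small Python/NumPy sketch (my own illustration for maps between coordinate spaces; the helper name `matrix_representation` is made up): apply \(T\) to each input basis vector, solve for its \(\gamma\)-coordinates, and use those as the columns.

```python
import numpy as np

def matrix_representation(T, beta, gamma):
    """Build [T]_beta^gamma for T: R^n -> R^m, where beta is a list of n
    basis vectors of R^n and gamma is a list of m basis vectors of R^m."""
    G = np.column_stack(gamma)                   # columns: the output basis
    columns = []
    for v in beta:
        Tv = T(np.asarray(v, dtype=float))       # where this basis vector lands
        columns.append(np.linalg.solve(G, Tv))   # its coordinates [T(v_j)]_gamma
    return np.column_stack(columns)              # j-th column is [T(v_j)]_gamma
```

Simple Example 2 below is exactly this recipe carried out by hand.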

Simple Example 1

Let’s use the polynomial space \(P_1(\mathbb{R})\) (polynomials of degree \(\le 1\)) and the standard ordered basis \(\beta = \{1, x\}\).

Let’s define a linear transformation \(T: P_1(\mathbb{R}) \to P_1(\mathbb{R})\) that represents the derivative. If you are not familiar with derivatives, that’s perfectly okay—just briefly read it and skip to the next example:

$$
T(f) = f'
$$

Since the input and output spaces are the same and we are using basis \(\beta\) for both, we are looking for \([T]_\beta\).

To find this matrix, we take the vectors in our basis \(\beta\) one by one, apply \(T\), and then rewrite the result in terms of \(\beta\):

  1. Input first basis vector (\(1\)):
    • Apply \(T\): \(T(1) = \frac{d}{dx}(1) = 0\).
    • Map to coordinates relative to \(\beta\): \(0 = \mathbf{0}\cdot(1) + \mathbf{0}\cdot(x)\).
    • Coordinate vector: \(\begin{pmatrix} 0 \\ 0 \end{pmatrix}\). This is our 1st column.
  2. Input second basis vector (\(x\)):
    • Apply \(T\): \(T(x) = \frac{d}{dx}(x) = 1\).
    • Map to coordinates relative to \(\beta\): \(1 = \mathbf{1}\cdot(1) + \mathbf{0}\cdot(x)\).
    • Coordinate vector: \(\begin{pmatrix} 1 \\ 0 \end{pmatrix}\). This is our 2nd column.

Putting the columns together, the matrix representation is:

$$
[T]_\beta = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
$$
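As a quick sanity check (my own illustration), multiply this matrix by the coordinate vector of \(v = 3 + 2x\) from earlier; the product should be the coordinate vector of \(T(v) = v' = 2\), which is \(\begin{pmatrix} 2 \\ 0 \end{pmatrix}\) relative to \(\beta\):

```python
import numpy as np

T_beta = np.array([[0.0, 1.0],
                   [0.0, 0.0]])    # [T]_beta for the derivative on P_1(R)

v_coords = np.array([3.0, 2.0])    # [v]_beta for v = 3 + 2x

print(T_beta @ v_coords)           # [2. 0.]  -> the polynomial 2 + 0x, i.e. v' = 2
```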

Simple Example 2

Let’s look at a transformation between standard coordinate spaces, which is easier to visualize than polynomials.

The Setup:

  • Transformation: Let \(T: \mathbb{R}^2 \to \mathbb{R}^3\) be defined by the formula: $$
    T(a_1, a_2) = (a_1 + 2a_2, ~a_1 - a_2, ~2a_1 - a_2)
    $$
  • Input Basis (\(\beta\)): The standard basis for \(\mathbb{R}^2\): \(e_1 = (1, 0)\) and \(e_2 = (0, 1)\).
  • Output Basis (\(\gamma\)): The standard basis for \(\mathbb{R}^3\): \(e_1, e_2, e_3\).

The Goal: Find \([T]_\beta^\gamma\).

Step 1: Transform the first basis vector (\(e_1\))

Plug \(e_1 = (1, 0)\) into our formula:

$$T(1, 0) = (1(1) + 2(0), ~1(1) - 1(0), ~2(1) - 1(0)) = (1, 1, 2)$$

Now, write this result as a coordinate vector relative to the output basis \(\gamma\):

$$
(1, 1, 2) = 1e_1 + 1e_2 + 2e_3 \Rightarrow \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}
$$

This is our first column.

Step 2: Transform the second basis vector (\(e_2\))

Plug \(e_2 = (0, 1)\) into our formula:

$$
T(0, 1) = (1(0) + 2(1), ~1(0) - 1(1), ~ 2(0) - 1(1)) = (2,~ -1,~ -1)
$$

Write this result as a coordinate vector relative to \(\gamma\):

$$
(2, -1, -1) = 2e_1 - 1e_2 - 1e_3 \Rightarrow \begin{pmatrix} 2 \\ -1 \\ -1 \end{pmatrix}
$$

This is our second column.

Step 3: Construct the Matrix

We simply place the two columns side-by-side:

$$
[T]_\beta^\gamma = \begin{pmatrix} 1 & 2 \\ 1 & -1 \\ 2 & -1 \end{pmatrix}
$$

Why this is useful:

Instead of calculating the messy formula \((a_1 + 2a_2, \dots)\) every time, we can now just multiply this simple matrix by any vector \(\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}\) to get the same result!
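Here is a quick numerical check of that claim (an illustrative sketch; the test vector is my own choice):

```python
import numpy as np

A = np.array([[1.0,  2.0],
              [1.0, -1.0],
              [2.0, -1.0]])      # [T]_beta^gamma from Step 3

def T(a1, a2):
    """The original formula for T."""
    return np.array([a1 + 2*a2, a1 - a2, 2*a1 - a2])

x = np.array([5.0, -3.0])        # any input vector (a1, a2)
print(A @ x)                     # [-1.  8. 13.]
print(T(*x))                     # [-1.  8. 13.]  -- the same result
```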

To ensure we don’t get lost in the symbols, here is a quick cheat sheet on the notation used in the definition:

  • \([T]_\beta^\gamma\): The general form. Used whenever we record both the input basis \(\beta\) and the output basis \(\gamma\), which is necessary when \(V\) and \(W\) (or their bases) differ.
  • \([T]_\beta\): The shorthand form. Used ONLY when \(V=W\) and the input/output bases are identical (\(\beta = \gamma\)).

The Difference

It is easy to confuse \([x]_\beta\) and \([T]_\beta^\gamma\) because they both result in arrays of numbers. Here is the key difference:

  • What it represents: \([x]_\beta\) is a specific object (a vector); \([T]_\beta^\gamma\) is a function (a mechanism that changes vectors).
  • Shape: \([x]_\beta\) is an \(n \times 1\) column (a vertical list); \([T]_\beta^\gamma\) is an \(m \times n\) rectangular grid.
  • Role: \([x]_\beta\) tells you “Where am I?” (location); \([T]_\beta^\gamma\) tells you “How do I move?” (transformation).
  • Analogy: \([x]_\beta\) is like the GPS coordinates of a specific car; \([T]_\beta^\gamma\) is like the instruction manual for the engine that drives the car.

4. Unlocking the “Space” of Functions: A Look at Linear Transformations

Now, let’s explore the fact that linear transformations can form a vector space and act like vectors themselves.

Defining Operations on Functions

Before we can treat functions as vectors, we need to define how to add them together and how to scale them. The first definition from the text establishes the “rules of engagement” for arbitrary functions.

Definition: Let \(T, U: V \to W\) be arbitrary functions.

  • Vector Addition: We define the sum \((T + U)\) by \((T + U)(x) = T(x) + U(x)\) for all \(x \in V\).
  • Scalar Multiplication: We define \((aT)\) by \((aT)(x) = aT(x)\) for all \(x \in V\) and scalar \(a \in F\).

Simple Example

Imagine \(V\) and \(W\) are just the real numbers (\(\mathbb{R}\)).

Let two functions be defined as:

  • \(T(x) = 2x\)
  • \(U(x) = 3x + 1\)

Using the definition above:

  • Sum: \((T + U)(x) = (2x) + (3x + 1) = 5x + 1\)
  • Scalar Multiply (let \(a=4\)): \((4T)(x) = 4(2x) = 8x\)
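These pointwise definitions are easy to mirror in code. A tiny Python sketch (my own illustration) that builds \(T + U\) and \(aT\) as new functions and reproduces the results above:

```python
def add(T, U):
    """Pointwise sum: (T + U)(x) = T(x) + U(x)."""
    return lambda x: T(x) + U(x)

def scale(a, T):
    """Pointwise scalar multiple: (aT)(x) = a * T(x)."""
    return lambda x: a * T(x)

def T(x):
    return 2 * x             # T(x) = 2x

def U(x):
    return 3 * x + 1         # U(x) = 3x + 1

print(add(T, U)(10))         # 51 = 5*10 + 1, i.e. (T + U)(x) = 5x + 1
print(scale(4, T)(10))       # 80 = 8*10,     i.e. (4T)(x) = 8x
```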

Linearity is Preserved

Theorem 2.7: Let \(V\) and \(W\) be vector spaces, and let \(T, U: V \to W\) be linear.

(a) For all scalars \(a \in F\), the map \(aT + U\) is linear.

(b) The collection of all linear transformations from \(V\) to \(W\) is itself a vector space over \(F\).

The Notation \(\mathcal{L}(V, W)\)

Now that we know this collection of maps acts like a vector space (it has addition, a zero element, inverses, etc.), we give it a special name.

Definition: We denote the vector space of all linear transformations from \(V\) into \(W\) by \(\mathcal{L}(V, W)\).

  • If the domain and codomain are the same (\(V = W\)), we simply write \(\mathcal{L}(V)\).

Simple Example & The Matrix Connection

If \(V = \mathbb{R}^2\) (2D space) and \(W = \mathbb{R}^3\) (3D space), then \(\mathcal{L}(\mathbb{R}^2, \mathbb{R}^3)\) is the set of all possible linear maps taking 2D vectors to 3D vectors.

Why is this useful?

There is a direct link between this space and matrices.

  • Once we fix the standard bases, a linear transformation from \(\mathbb{R}^2\) to \(\mathbb{R}^3\) is represented by a \(3 \times 2\) matrix.
  • Therefore, the vector space \(\mathcal{L}(\mathbb{R}^2, \mathbb{R}^3)\) behaves exactly like (is isomorphic to) the space of all \(3 \times 2\) matrices, \(M_{3 \times 2}\).
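To see the correspondence in action, here is a short sketch (my own illustrative matrices) showing that adding two maps in \(\mathcal{L}(\mathbb{R}^2, \mathbb{R}^3)\) matches adding their \(3 \times 2\) matrices relative to the standard bases:

```python
import numpy as np

A = np.array([[1.0,  2.0],
              [1.0, -1.0],
              [2.0, -1.0]])   # standard matrix of a map T in L(R^2, R^3)
B = np.array([[0.0,  1.0],
              [3.0,  0.0],
              [1.0,  1.0]])   # standard matrix of another map U

x = np.array([2.0, 5.0])

# (T + U)(x) computed two ways: add the outputs, or add the matrices first.
print(A @ x + B @ x)          # [17.  3.  6.]
print((A + B) @ x)            # [17.  3.  6.]  -- identical, as expected
```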

Appendix: Theorem 2.6 and Corollary

The Goal: This theorem tells us that to define a linear transformation on a whole vector space, you don’t need to specify where every vector goes. You only need to decide where the basis vectors go. Once you choose the destination for the basis vectors, the path for every other vector is mathematically locked in.

Proof

Step 1: Defining the Function

Proof. Let \(x \in V\). Then

$$
x = \sum_{i=1}^{n} a_i v_i
$$

where \(a_1, a_2, \dots, a_n\) are unique scalars.

First, we must create a candidate function. We rely on the definition of a basis. Because \(\{v_1, \dots, v_n\}\) is a basis, every vector \(x\) has a “DNA code”—a unique set of coordinates (\(a_i\)). If these scalars weren’t unique, our next step would be impossible because the function would be confused about which output to give.

Define

$$
\mathsf{T}: V \to W \quad \text{by} \quad \mathsf{T}(x) = \sum_{i=1}^{n} a_i w_i
$$

Here is the construction. We define the output \(\mathsf{T}(x)\) by taking the “DNA” of \(x\) (the scalars \(a_i\)) and attaching them to the new vectors \(w_i\). We have constructed a function, but we don’t know if it is linear yet.
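For readers who like to see the construction run, here is a small Python sketch (my own illustration, restricted to coordinate spaces; the helper name `build_T` is made up). Given a basis \(v_1, \dots, v_n\) of \(\mathbb{R}^n\) and chosen targets \(w_1, \dots, w_n\) in \(\mathbb{R}^m\), it recovers the scalars \(a_i\) for an input \(x\) and returns \(\sum_i a_i w_i\):

```python
import numpy as np

def build_T(basis, targets):
    """The unique linear map T with T(v_i) = w_i, built as in the proof:
    find the coordinates a_i of x in the basis, then output sum_i a_i * w_i."""
    V = np.column_stack(basis)     # columns v_1, ..., v_n (a basis of R^n)
    W = np.column_stack(targets)   # columns w_1, ..., w_n in R^m
    def T(x):
        a = np.linalg.solve(V, x)  # the unique scalars with x = sum_i a_i v_i
        return W @ a               # T(x) = sum_i a_i w_i
    return T

# Sending (1,0) -> (1,1,2) and (0,1) -> (2,-1,-1) recovers Simple Example 2.
T = build_T([(1.0, 0.0), (0.0, 1.0)], [(1.0, 1.0, 2.0), (2.0, -1.0, -1.0)])
print(T(np.array([1.0, 1.0])))     # [3. 0. 1.]
```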

Step 2: Proving Linearity

(a) \(\mathsf{T}\) is linear: Suppose that \(u, v \in V\) and \(d \in F\). Then we may write

$$
u = \sum_{i=1}^{n} b_i v_i \quad \text{and} \quad v = \sum_{i=1}^{n} c_i v_i
$$

for some scalars \(b_1, \dots, b_n, c_1, \dots, c_n\).

Thus

$$
du + v = \sum_{i=1}^{n} (d b_i + c_i)v_i
$$

To check linearity, we look at a linear combination of inputs: \(du + v\). We need to find the coordinates of this new vector. Because scaling and adding vectors combines their basis coefficients term by term, the coefficient of \(v_i\) in \(du + v\) is simply \(db_i + c_i\).

So

$$
\mathsf{T}(du + v) = \sum_{i=1}^{n} (d b_i + c_i)w_i
$$

$$
= d \sum_{i=1}^{n} b_i w_i + \sum_{i=1}^{n} c_i w_i
$$

$$
= d \mathsf{T}(u) + \mathsf{T}(v)
$$

Now we feed that new combined vector into our function \(\mathsf{T}\). By our definition, \(\mathsf{T}\) swaps the basis vectors \(v_i\) for the target vectors \(w_i\), but keeps the scalar coefficients exactly the same. We can then split the sum back apart. This proves the function preserves the structure of the space (linearity).

Step 3: Checking the Requirement

(b) Clearly

$$
\mathsf{T}(v_i) = w_i \quad \text{for } i = 1, 2, \dots, n
$$

We built this machine, but does it actually do what the theorem asked? We check the basis vectors.

For a specific basis vector \(v_1\), the coordinates are \(a_1=1\) and all other \(a_i=0\).

Therefore, \(\mathsf{T}(v_1) = 1 \cdot w_1 + 0 \cdot w_2 + \dots = w_1\). It works!

Step 4: Proving Uniqueness

(c) \(\mathsf{T}\) is unique: Suppose that \(\mathsf{U}: V \to W\) is linear and \(\mathsf{U}(v_i) = w_i\) for \(i = 1, 2, \dots, n\).

We know one such map exists (we just made it). But could there be another, different map \(\mathsf{U}\) that also sends \(v_i\) to \(w_i\)? To prove uniqueness, we assume there is a competitor \(\mathsf{U}\) and try to prove it is actually identical to \(\mathsf{T}\).

Then for \(x \in V\) with

$$
x = \sum_{i=1}^{n} a_i v_i
$$

we have

$$
\mathsf{U}(x) = \sum_{i=1}^{n} a_i \mathsf{U}(v_i) = \sum_{i=1}^{n} a_i w_i = \mathsf{T}(x)
$$

Because \(\mathsf{U}\) is linear, it must allow scalars to come outside and sums to split. This forces \(\mathsf{U}\) to determine the output of \(x\) based solely on the outputs of the basis vectors \(v_i\).

Since \(\mathsf{U}\) agrees with \(\mathsf{T}\) on the basis vectors, linearity forces it to agree with \(\mathsf{T}\) on every vector \(x\).

Hence \(\mathsf{U} = \mathsf{T}\).

Corollary

Let \(V\) and \(W\) be vector spaces, and suppose that \(V\) has a finite basis \(\{v_1, v_2, \dots, v_n\}\). If \(\mathsf{U}, \mathsf{T}: V \to W\) are linear and \(\mathsf{U}(v_i) = \mathsf{T}(v_i)\) for \(i = 1, 2, \dots, n\), then \(\mathsf{U} = \mathsf{T}\).

Proof

Let \(x\) be any vector in \(V\). Since \(\{v_1, \dots, v_n\}\) is a basis for \(V\), there exist unique scalars \(a_1, \dots, a_n\) such that:

$$
x = \sum_{i=1}^{n} a_i v_i
$$

Now, we calculate \(\mathsf{U}(x)\) using the linearity of \(\mathsf{U}\):

$$
\begin{aligned} \mathsf{U}(x) &= \mathsf{U}\left( \sum_{i=1}^{n} a_i v_i \right) \\ &= \sum_{i=1}^{n} a_i \mathsf{U}(v_i) \quad (\text{by linearity of } \mathsf{U}) \end{aligned}
$$

Next, we calculate \(\mathsf{T}(x)\) using the linearity of \(\mathsf{T}\):

$$
\begin{aligned} \mathsf{T}(x) &= \mathsf{T}\left( \sum_{i=1}^{n} a_i v_i \right) \\ &= \sum_{i=1}^{n} a_i \mathsf{T}(v_i) \quad (\text{by linearity of } \mathsf{T}) \end{aligned}
$$

We are given that \(\mathsf{U}(v_i) = \mathsf{T}(v_i)\) for all \(i\). Therefore:

$$
\sum_{i=1}^{n} a_i \mathsf{U}(v_i) = \sum_{i=1}^{n} a_i \mathsf{T}(v_i)
$$

This implies that \(\mathsf{U}(x) = \mathsf{T}(x)\). Since this holds for every \(x \in V\), we conclude that \(\mathsf{U} = \mathsf{T}\).

References & Further Reading

The theorem numbering in this post follows Linear Algebra (4th Edition) by Friedberg, Insel, and Spence. Some explanations and details here differ from the book. If you want a deeper and more rigorous treatment of linear algebra, this book is an excellent reference.
