Why is Matrix Multiplication Defined That Way?

Unlike simple addition, where we just add corresponding numbers, matrix multiplication involves a complex process of multiplying rows by columns and summing them up. Why make it so complicated? The answer lies in Linear Transformations. Matrix multiplication is not just an arbitrary arithmetic rule; it is the specific tool designed to represent the composition of linear functions. Today, we will walk through how combining linear transformations naturally leads us to the definition of matrix multiplication.
1. The Composition of Linear Transformations
First, let’s look at functions. If we have two linear transformations that run sequentially, is the combined result also a linear transformation?
Let \(V, W,\) and \(Z\) be vector spaces. Suppose we have a transformation \(\mathsf{T}\) that takes us from \(V\) to \(W\), and another transformation \(\mathsf{U}\) that takes us from \(W\) to \(Z\).
Theorem 2.9
Let \(V, W,\) and \(Z\) be vector spaces over the same field \(F\). Let \(\mathsf{T}: V \to W\) and \(\mathsf{U}: W \to Z\) be linear.
Then the composition \(\mathsf{UT}: V \to Z\) is linear.
Proof
We can prove this by checking the definition of linearity (preservation of addition and scalar multiplication). Let \(x, y \in V\) and \(a \in F\).
$$
\begin{aligned} \mathsf{UT}(ax + y) &= \mathsf{U}(\mathsf{T}(ax + y)) \quad &(\text{Definition of composition}) \\ &= \mathsf{U}(a\mathsf{T}(x) + \mathsf{T}(y)) \quad &(\text{Linearity of } \mathsf{T}) \\ &= a\mathsf{U}(\mathsf{T}(x)) + \mathsf{U}(\mathsf{T}(y)) \quad &(\text{Linearity of } \mathsf{U}) \\ &= a(\mathsf{UT})(x) + (\mathsf{UT})(y). \quad &(\text{Definition of composition}) \end{aligned}
$$
Since the composition preserves the linear structure, \(\mathsf{UT}\) is indeed a linear transformation (see the previous post on functions for the definition of composition).
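To make this concrete, here is a small numerical sanity check (not a proof). The functions `T` and `U` below are example linear maps from \(\mathbb{R}^2\) to \(\mathbb{R}^3\) and from \(\mathbb{R}^3\) to \(\mathbb{R}^2\), chosen purely for illustration; the script verifies that their composition satisfies \(\mathsf{UT}(ax + y) = a\,\mathsf{UT}(x) + \mathsf{UT}(y)\) for sample inputs.

```python
# Numerical sanity check of Theorem 2.9 (illustration only, not a proof).
# T and U are example linear maps chosen for this sketch.

def T(v):
    x, y = v
    return (x + 2 * y, 3 * y, -x)   # each output entry is a linear combination of the inputs

def U(w):
    p, q, r = w
    return (p - r, 2 * q + r)       # also linear

def UT(v):
    return U(T(v))                  # the composition "U after T"

a = 5.0
x = (1.0, -2.0)
y = (3.0, 4.0)

ax_plus_y = tuple(a * xi + yi for xi, yi in zip(x, y))

lhs = UT(ax_plus_y)                                   # UT(ax + y)
rhs = tuple(a * u + v for u, v in zip(UT(x), UT(y)))  # a*UT(x) + UT(y)

print(lhs, rhs)   # both print (4.0, -44.0), as Theorem 2.9 predicts
```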
2. Algebraic Properties of Composition
Now that we know the composition is linear, how does it behave? It turns out that composing linear transformations follows many of the same algebraic rules as multiplying numbers, such as the associative and distributive laws.
Theorem 2.10
Let \(V\) be a vector space. Let \(\mathsf{T}, \mathsf{U}_1, \mathsf{U}_2 \in \mathcal{L}(V)\) (linear operators on V). Then:
- Distributivity: \(\mathsf{T}(\mathsf{U}_1 + \mathsf{U}_2) = \mathsf{TU}_1 + \mathsf{TU}_2\) and \((\mathsf{U}_1 + \mathsf{U}_2)\mathsf{T} = \mathsf{U}_1\mathsf{T} + \mathsf{U}_2\mathsf{T}\)
- Associativity: \(\mathsf{T}(\mathsf{U}_1\mathsf{U}_2) = (\mathsf{TU}_1)\mathsf{U}_2\)
- Identity: \(\mathsf{TI} = \mathsf{IT} = \mathsf{T}\)
- Scalar Multiplication: \(a(\mathsf{U}_1\mathsf{U}_2) = (a\mathsf{U}_1)\mathsf{U}_2 = \mathsf{U}_1(a\mathsf{U}_2)\) for all scalars \(a\).
These properties allow us to manipulate complex chains of transformations algebraically.
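Since (as Theorem 2.11 below makes precise) composing operators corresponds to multiplying their matrices, one informal way to spot-check these identities is numerically, with arbitrary matrices. The sketch below does exactly that; the random matrices and the scalar are illustrative choices, and the checks are not a substitute for the proofs.

```python
# Informal numerical spot-check of Theorem 2.10 (illustration only, not a proof),
# using the correspondence between composition and matrix multiplication.
import numpy as np

rng = np.random.default_rng(0)
n = 4
T  = rng.random((n, n))
U1 = rng.random((n, n))
U2 = rng.random((n, n))
I  = np.eye(n)
a  = 2.5

print(np.allclose(T @ (U1 + U2), T @ U1 + T @ U2))      # left distributivity
print(np.allclose((U1 + U2) @ T, U1 @ T + U2 @ T))      # right distributivity
print(np.allclose(T @ (U1 @ U2), (T @ U1) @ U2))        # associativity
print(np.allclose(T @ I, T) and np.allclose(I @ T, T))  # identity
print(np.allclose(a * (U1 @ U2), (a * U1) @ U2) and
      np.allclose(a * (U1 @ U2), U1 @ (a * U2)))        # scalar multiplication
```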
3. Connecting Composition to Matrix Multiplication
This is the crucial step. We want to represent these abstract transformations using matrices.
Let’s define ordered bases for our spaces: \(\alpha\) for \(V\), \(\beta\) for \(W\), and \(\gamma\) for \(Z\).
- Let \(B = [\mathsf{T}]^\beta_\alpha\) (The matrix representing \(\mathsf{T}\))
- Let \(A = [\mathsf{U}]^\gamma_\beta\) (The matrix representing \(\mathsf{U}\))
Write \(\alpha = \{v_1, \dots, v_p\}\), \(\beta = \{w_1, \dots, w_n\}\), and \(\gamma = \{z_1, \dots, z_m\}\). To see where the rule comes from, compute the action of the composite transformation \(\mathsf{UT}\) on a basis vector \(v_j\): the input first passes through \(\mathsf{T}\) (with coefficients \(B_{kj}\)) and then through \(\mathsf{U}\) (with coefficients \(A_{ik}\)):
$$
\mathsf{UT}(v_j) = \mathsf{U}\left(\sum_{k=1}^{n} B_{kj}\, w_k\right) = \sum_{k=1}^{n} B_{kj}\, \mathsf{U}(w_k) = \sum_{k=1}^{n} B_{kj} \sum_{i=1}^{m} A_{ik}\, z_i = \sum_{i=1}^{m} \left(\sum_{k=1}^{n} A_{ik} B_{kj}\right) z_i.
$$
So the \((i, j)\) entry of the matrix representing \(\mathsf{UT}\) is \(\sum_{k=1}^{n} A_{ik} B_{kj}\). This formula motivates the standard definition of matrix multiplication.
Definition: Matrix Product
Let \(A\) be an \(m \times n\) matrix and \(B\) be an \(n \times p\) matrix. We define the product \(AB\) to be the \(m \times p\) matrix such that:
$$
(AB)_{ij} = \sum_{k=1}^{n} A_{ik}B_{kj}
$$
This formula essentially tells us to take the dot product of the \(i\)-th row of \(A\) and the \(j\)-th column of \(B\).
Example 1
To visualize this, let’s multiply two \(2 \times 2\) matrices.
Let \(A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\) and \(B = \begin{pmatrix} 2 & 0 \\ 1 & 2 \end{pmatrix}\).
To find the top-left entry (Row 1 of A \(\cdot\) Column 1 of B):
$$
(1 \times 2) + (2 \times 1) = 2 + 2 = 4
$$
To find the top-right entry (Row 1 of A \(\cdot\) Column 2 of B):
$$
(1 \times 0) + (2 \times 2) = 0 + 4 = 4
$$
Repeating this for the second row gives the full product:
$$
AB = \begin{pmatrix} 1\cdot2 + 2\cdot1 & 1\cdot0 + 2\cdot2 \\ 3\cdot2 + 4\cdot1 & 3\cdot0 + 4\cdot2 \end{pmatrix} = \begin{pmatrix} 4 & 4 \\ 10 & 8 \end{pmatrix}
$$
Example 2
Before we dive into our next example, let’s look at a quick but important concept: the Transpose of a matrix.
Definition: Transpose (\(A^t\))
The transpose of an \(m \times n\) matrix \(A\) is the \(n \times m\) matrix \(A^t\) obtained by interchanging the rows and columns. That is, \((A^t)_{ij} = A_{ji}\).
Now, we show that if \(A\) is an \(m \times n\) matrix and \(B\) is an \(n \times p\) matrix, then \((AB)^t = B^t A^t\).
Since
$$
(AB)^t_{ij} = (AB)_{ji} = \sum_{k=1}^{n} A_{jk} B_{ki}
$$
and
$$
(B^t A^t)_{ij} = \sum_{k=1}^{n} (B^t)_{ik} (A^t)_{kj} = \sum_{k=1}^{n} B_{ki} A_{jk} = \sum_{k=1}^{n} A_{jk} B_{ki},
$$
we are finished. Therefore, the transpose of a product is the product of the transposes in the opposite order.
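As a quick numerical illustration (the identity was proved above for all conforming \(A\) and \(B\); the specific matrices here are arbitrary examples):

```python
# Numerical check of (AB)^t = B^t A^t on example matrices.
import numpy as np

A = np.array([[1, 2, 0],
              [3, 4, 5]])     # 2 x 3
B = np.array([[2, 0],
              [1, 2],
              [4, 1]])        # 3 x 2

print(np.allclose((A @ B).T, B.T @ A.T))   # True
```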
4. The Fundamental Relationship (Theorem 2.11)
We defined matrix multiplication specifically to make the following theorem true. It connects the “world of functions” (transformations) to the “world of numbers” (matrices).
Theorem 2.11
Let \(V, W,\) and \(Z\) be finite-dimensional vector spaces with ordered bases \(\alpha, \beta,\) and \(\gamma\), respectively. Let \(\mathsf{T}: V \to W\) and \(\mathsf{U}: W \to Z\) be linear transformations. Then:
$$
[\mathsf{UT}]^\gamma_\alpha = [\mathsf{U}]^\gamma_\beta [\mathsf{T}]^\beta_\alpha
$$
In simple terms: The matrix of the composite transformation is the product of the matrices of the individual transformations. This is the “why” behind matrix multiplication.
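Here is a small sketch instantiating Theorem 2.11 with \(V = \mathbb{R}^2\), \(W = \mathbb{R}^3\), \(Z = \mathbb{R}^2\), their standard ordered bases, and the same illustrative maps \(\mathsf{T}\) and \(\mathsf{U}\) used in the Section 1 sketch. Each column of a representing matrix is the image of a basis vector, so the matrices can be built column by column and compared.

```python
# Concrete instance of Theorem 2.11 with standard bases (illustrative maps T and U).
import numpy as np

def T(v):                      # T: R^2 -> R^3
    x, y = v
    return np.array([x + 2 * y, 3 * y, -x])

def U(w):                      # U: R^3 -> R^2
    p, q, r = w
    return np.array([p - r, 2 * q + r])

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

mat_T  = np.column_stack([T(e1), T(e2)])              # [T]^beta_alpha, 3 x 2
mat_U  = np.column_stack([U(c) for c in np.eye(3)])   # [U]^gamma_beta, 2 x 3
mat_UT = np.column_stack([U(T(e1)), U(T(e2))])        # [UT]^gamma_alpha, built directly

print(np.allclose(mat_UT, mat_U @ mat_T))             # True: matrix of the composite
                                                      # equals the product of the matrices
```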
5. Left-multiplication transformation
We complete this section by introducing a concept that allows us to go in the “reverse” direction: defining a transformation based on a matrix. This serves as a critical bridge between the abstract world of functions and the concrete world of matrices.
Definition
Let \(A\) be an \(m \times n\) matrix with entries from a field \(F\). We define the left-multiplication transformation \(\mathsf{L}_A: \mathsf{F}^n \to \mathsf{F}^m\) by:
$$
\mathsf{L}_A(x) = Ax
$$
for every column vector \(x \in \mathsf{F}^n\).
In simpler terms, \(\mathsf{L}_A\) is the function that takes a vector \(x\) as input and produces a new vector by multiplying \(x\) on the left by the matrix \(A\).
Why is this important?
This transformation is one of the most important tools for transferring properties between transformations and matrices.
Because \(\mathsf{L}_A\) is a linear transformation, we can use it to prove algebraic properties of matrices by relying on the known properties of functions. For example, since the composition of functions is automatically associative (\(h \circ (g \circ f)\) and \((h \circ g) \circ f\) send every input to the same output), we can use \(\mathsf{L}_A\) to prove that matrix multiplication is associative without getting lost in messy summation indices, as the sketch below illustrates.
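A brief sketch of that argument in action: \(\mathsf{L}_A(\mathsf{L}_B(x)) = (AB)x\) for every \(x\), i.e. \(\mathsf{L}_A\mathsf{L}_B = \mathsf{L}_{AB}\), so \((AB)C\) and \(A(BC)\) induce the same left-multiplication transformation and must therefore be equal. The matrices and vector below are arbitrary illustrative choices, and the script only checks the identities numerically.

```python
# Sketch of the associativity argument via L_A (numerical illustration only).
import numpy as np

rng = np.random.default_rng(1)
A, B, C = (rng.random((3, 3)) for _ in range(3))
x = rng.random(3)

print(np.allclose(A @ (B @ x), (A @ B) @ x))               # L_A(L_B(x)) == L_{AB}(x)
print(np.allclose(((A @ B) @ C) @ x, (A @ (B @ C)) @ x))   # same function of x either way
```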
Example
Let’s see \(\mathsf{L}_A\) in action with a simple calculation.
Suppose we have a \(2 \times 2\) matrix \(A\) and a vector \(x\):
$$
A = \begin{pmatrix} 2 & 1 \\ -1 & 3 \end{pmatrix}, \quad x = \begin{pmatrix} 4 \\ 2 \end{pmatrix}
$$
The transformation \(\mathsf{L}_A\) maps the vector \(x\) to a new vector in \(\mathsf{F}^2\) by performing the matrix product \(Ax\):
$$
\mathsf{L}_A(x) = \begin{pmatrix} 2 & 1 \\ -1 & 3 \end{pmatrix} \begin{pmatrix} 4 \\ 2 \end{pmatrix}
$$
Calculating the product:
- Row 1: \((2 \times 4) + (1 \times 2) = 8 + 2 = 10\)
- Row 2: \((-1 \times 4) + (3 \times 2) = -4 + 6 = 2\)
$$
\mathsf{L}_A(x) = \begin{pmatrix} 10 \\ 2 \end{pmatrix}
$$
So, the transformation \(\mathsf{L}_A\) takes the input vector \(\begin{pmatrix} 4 \\ 2 \end{pmatrix}\) and maps it to the output vector \(\begin{pmatrix} 10 \\ 2 \end{pmatrix}\). By viewing matrices as these active “left-multiplication” functions, we can understand their behavior much more intuitively.
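For comparison, the same computation in NumPy reproduces the hand calculation above:

```python
# The L_A example from this section, computed with NumPy.
import numpy as np

A = np.array([[2, 1],
              [-1, 3]])
x = np.array([4, 2])

print(A @ x)   # [10  2]
```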
References
The theorem numbering in this post follows Linear Algebra (4th Edition) by Friedberg, Insel, and Spence. Some explanations and details here differ from the book.