
Inverse of a Matrix

The Inverse of a Matrix is one of the most powerful concepts in Linear Algebra, as it allows us to "undo" the effects of a matrix transformation and solve systems of linear equations.

1. What is the Matrix Inverse?

The inverse of a square matrix $\mathbf{A}$ is another square matrix, denoted $\mathbf{A}^{-1}$, such that when $\mathbf{A}$ is multiplied by $\mathbf{A}^{-1}$, the result is the Identity Matrix ($\mathbf{I}$).

The Definition

For a square matrix $\mathbf{A}$, its inverse $\mathbf{A}^{-1}$ satisfies the condition:

$$\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$$

The Identity Matrix ($\mathbf{I}$) acts like the number '1' in scalar multiplication (i.e., $a \cdot 1 = a$). When multiplied by $\mathbf{I}$, a matrix remains unchanged.

$$\mathbf{I} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad (\text{for a } 3 \times 3 \text{ matrix})$$
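
As a quick sanity check in code, multiplying any matrix by an appropriately sized identity matrix returns the matrix unchanged (a minimal NumPy sketch; the matrix below is an arbitrary example):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
I = np.eye(2)                      # 2x2 identity matrix

# Multiplying by I leaves A unchanged, just like a * 1 = a for scalars.
print(np.allclose(A @ I, A))       # True
print(np.allclose(I @ A, A))       # True
```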

2. Condition for Invertibility

As we learned in the section on determinants, a matrix $\mathbf{A}$ has an inverse $\mathbf{A}^{-1}$ if and only if $\mathbf{A}$ is non-singular.

Invertibility Rule

A matrix $\mathbf{A}$ is invertible if and only if its determinant is non-zero:

$$\det(\mathbf{A}) \ne 0$$

If $\det(\mathbf{A}) = 0$, the matrix is singular and $\mathbf{A}^{-1}$ does not exist.
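
In code, this condition is a one-line determinant check, for instance with NumPy (a minimal sketch; with floating-point arithmetic, a determinant extremely close to zero is already a warning sign):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
B = np.array([[1.0, 2.0],
              [2.0, 4.0]])         # second row is a multiple of the first

print(np.linalg.det(A))            # 10.0 -> non-zero, so A is invertible
print(np.linalg.det(B))            # 0.0  -> B is singular, no inverse exists
```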

3. Calculating the Inverse

Calculating the inverse for large matrices is computationally expensive and complex, but understanding the process for $2 \times 2$ matrices provides key intuition.

A. $2 \times 2$ Matrix Inverse

For a $2 \times 2$ matrix $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, the inverse is calculated as:

$$\mathbf{A}^{-1} = \frac{1}{\det(\mathbf{A})} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$

Notice that the inverse calculation requires dividing by the determinant. If $\det(\mathbf{A}) = 0$, the fraction is undefined, which is exactly why a singular matrix has no inverse.

Example: Inverting a 2x2 Matrix

Let $\mathbf{A} = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}$.

  1. Calculate Determinant: $\det(\mathbf{A}) = (4)(3) - (1)(2) = 12 - 2 = 10$.

  2. Calculate Inverse:

    $\mathbf{A}^{-1} = \frac{1}{10} \begin{bmatrix} 3 & -1 \\ -2 & 4 \end{bmatrix} = \begin{bmatrix} 0.3 & -0.1 \\ -0.2 & 0.4 \end{bmatrix}$
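
We can confirm the hand calculation with NumPy's `np.linalg.inv` (a minimal sketch):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

A_inv = np.linalg.inv(A)
print(A_inv)                                 # [[ 0.3 -0.1]
                                             #  [-0.2  0.4]]

# Multiplying A by its inverse recovers the identity matrix.
print(np.allclose(A @ A_inv, np.eye(2)))     # True
```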

B. General Case ($n \times n$)

For $n \times n$ matrices, the inverse is typically calculated using techniques like the Gauss-Jordan elimination method or the formula involving the adjoint matrix. In practice, ML libraries like NumPy or PyTorch use highly optimized numerical algorithms to compute the inverse (or pseudo-inverse) efficiently.
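
The sketch below shows the Gauss-Jordan idea in plain NumPy: augment $\mathbf{A}$ with $\mathbf{I}$, row-reduce until the left half becomes the identity, and read $\mathbf{A}^{-1}$ off the right half. It is intended for intuition only; production code should rely on routines like `np.linalg.inv`, or avoid the explicit inverse entirely.

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Invert a square matrix by row-reducing the augmented matrix [A | I].

    Illustrative sketch only; prefer np.linalg.inv / np.linalg.solve in practice.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    aug = np.hstack([A, np.eye(n)])                 # build [A | I]

    for col in range(n):
        # Partial pivoting: bring the largest remaining pivot into place.
        pivot = col + np.argmax(np.abs(aug[col:, col]))
        if np.isclose(aug[pivot, col], 0.0):
            raise ValueError("Matrix is singular; no inverse exists.")
        aug[[col, pivot]] = aug[[pivot, col]]

        aug[col] /= aug[col, col]                   # scale pivot row to 1
        for row in range(n):                        # clear the rest of the column
            if row != col:
                aug[row] -= aug[row, col] * aug[col]

    return aug[:, n:]                               # right half is now A^{-1}

A = np.array([[4.0, 1.0], [2.0, 3.0]])
print(gauss_jordan_inverse(A))                      # [[ 0.3 -0.1] [-0.2  0.4]]
print(np.linalg.inv(A))                             # NumPy agrees
```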

4. Inverse Matrix in Machine Learning

The primary use of the matrix inverse is to solve systems of linear equations, which forms the basis for many models.

A. Solving Linear Systems

Consider a system of linear equations represented by:

$$\mathbf{A}\mathbf{x} = \mathbf{b}$$

Where $\mathbf{A}$ is the matrix of coefficients, $\mathbf{x}$ is the vector of unknowns (the parameters we want to find), and $\mathbf{b}$ is the result vector.

To solve for $\mathbf{x}$, we multiply both sides on the left by $\mathbf{A}^{-1}$:

$$\mathbf{A}^{-1}\mathbf{A}\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$$

Since $\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$, and $\mathbf{I}\mathbf{x} = \mathbf{x}$:

$$\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$$
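
In NumPy this looks as follows (a minimal sketch; note that `np.linalg.solve` is generally preferred over forming $\mathbf{A}^{-1}$ explicitly, since it is faster and numerically more stable):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
b = np.array([9.0, 7.0])

# Textbook route: x = A^{-1} b
x_inv = np.linalg.inv(A) @ b

# Preferred route: solve Ax = b directly, without forming the inverse.
x = np.linalg.solve(A, b)

print(x_inv)                    # [2. 1.]
print(x)                        # [2. 1.]
print(np.allclose(A @ x, b))    # True
```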

B. The Normal Equation in Linear Regression

As mentioned earlier, the closed-form solution for the optimal weight vector ($\mathbf{w}$) in Linear Regression is the Normal Equation:

$$\mathbf{w} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$

The calculation of the inverse of $(\mathbf{X}^T\mathbf{X})$ is the most computationally intensive part of this method. For large datasets, directly calculating the inverse is often avoided in favor of iterative optimization algorithms like Gradient Descent.
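
As an illustration, here is the Normal Equation on a tiny synthetic dataset (the data is invented for this example; `np.linalg.solve` or `np.linalg.lstsq` is generally preferred over explicitly inverting $\mathbf{X}^T\mathbf{X}$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic regression problem: y = 1 + 2x plus noise (illustrative only).
x = rng.uniform(0.0, 10.0, size=50)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=50)

# Design matrix with a bias column of ones.
X = np.column_stack([np.ones_like(x), x])

# Normal Equation: w = (X^T X)^{-1} X^T y
w_explicit = np.linalg.inv(X.T @ X) @ X.T @ y

# Numerically safer: solve (X^T X) w = X^T y instead of inverting.
w_solve = np.linalg.solve(X.T @ X, X.T @ y)

print(w_explicit)    # roughly [1.0, 2.0] -> intercept and slope
print(w_solve)       # same values
```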


The inverse is crucial for understanding linear dependencies and closed-form solutions. We now move to the two concepts that unlock the power of dimensionality reduction and data compression: Eigenvalues and Eigenvectors.