
Inverse of a Matrix

The Inverse of a Matrix is one of the most powerful concepts in Linear Algebra, as it allows us to "undo" the effects of a matrix transformation and solve systems of linear equations.

1. What is the Matrix Inverse?

The inverse of a square matrix $\mathbf{A}$ is another square matrix, denoted $\mathbf{A}^{-1}$, such that when $\mathbf{A}$ is multiplied by $\mathbf{A}^{-1}$, the result is the Identity Matrix ($\mathbf{I}$).

The Definition

For a square matrix $\mathbf{A}$, its inverse $\mathbf{A}^{-1}$ satisfies the condition:

$$\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$$

The Identity Matrix ($\mathbf{I}$) acts like the number '1' in scalar multiplication (i.e., $a \cdot 1 = a$). When multiplied by $\mathbf{I}$, a matrix remains unchanged.

$$\mathbf{I} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad (\text{For a } 3 \times 3 \text{ matrix})$$
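As a quick sanity check, here is a minimal NumPy sketch (the matrix `M` below is an arbitrary example chosen purely for illustration) showing that multiplying by the identity leaves a matrix unchanged:

```python
import numpy as np

# The 3x3 identity matrix.
I = np.eye(3)

# An arbitrary example matrix (illustrative only).
M = np.array([[2., 1., 0.],
              [0., 3., 4.],
              [5., 0., 6.]])

# Multiplying by I leaves M unchanged, just like a * 1 = a for scalars.
print(np.allclose(M @ I, M))   # True
print(np.allclose(I @ M, M))   # True
```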

2. Condition for Invertibility

As we learned in the section on determinants, a matrix $\mathbf{A}$ has an inverse $\mathbf{A}^{-1}$ if and only if $\mathbf{A}$ is non-singular.

Invertibility Rule

A matrix $\mathbf{A}$ is invertible if and only if its determinant is non-zero:

$$\det(\mathbf{A}) \ne 0$$

If $\det(\mathbf{A}) = 0$, the matrix is singular and $\mathbf{A}^{-1}$ does not exist.
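In code, this condition translates to checking the determinant before attempting an inversion. A minimal NumPy sketch (the matrices below are made-up examples; in practice, a determinant very close to zero should also be treated as effectively singular):

```python
import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])   # det = 10 -> invertible
B = np.array([[1., 2.],
              [2., 4.]])   # second row is 2x the first -> det = 0 -> singular

print(np.linalg.det(A))    # approximately 10.0
print(np.linalg.det(B))    # approximately 0.0

# Attempting to invert a singular matrix raises an error.
try:
    np.linalg.inv(B)
except np.linalg.LinAlgError as err:
    print("Not invertible:", err)
```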

3. Calculating the Inverse

Calculating the inverse for large matrices is computationally expensive and complex, but understanding the process for $2 \times 2$ matrices provides key intuition.

A. $2 \times 2$ Matrix Inverse

For a $2 \times 2$ matrix $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, the inverse is calculated as:

$$\mathbf{A}^{-1} = \frac{1}{\det(\mathbf{A})} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$

Notice that the inverse calculation requires dividing by the determinant. If $\det(\mathbf{A}) = 0$, the fraction is undefined, which is exactly why a singular matrix has no inverse.

Example: Inverting a 2x2 Matrix

Let $\mathbf{A} = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}$.

  1. Calculate Determinant: $\det(\mathbf{A}) = (4)(3) - (1)(2) = 12 - 2 = 10$.

  2. Calculate Inverse:

    $$\mathbf{A}^{-1} = \frac{1}{10} \begin{bmatrix} 3 & -1 \\ -2 & 4 \end{bmatrix} = \begin{bmatrix} 0.3 & -0.1 \\ -0.2 & 0.4 \end{bmatrix}$$
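The same result can be verified numerically; a minimal NumPy sketch of the example above:

```python
import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])

A_inv = np.linalg.inv(A)
print(A_inv)
# [[ 0.3 -0.1]
#  [-0.2  0.4]]

# A multiplied by its inverse gives the identity (up to floating-point error).
print(np.allclose(A @ A_inv, np.eye(2)))   # True
```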

B. General Case ($n \times n$)

For $n \times n$ matrices, the inverse is typically calculated using techniques like the Gauss-Jordan elimination method or the formula involving the adjoint matrix. In practice, ML libraries like NumPy or PyTorch use highly optimized numerical algorithms to compute the inverse (or pseudo-inverse) efficiently.
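A minimal sketch of both routines in NumPy (the matrices are arbitrary examples chosen for illustration):

```python
import numpy as np

# A square, non-singular matrix: np.linalg.inv returns its inverse.
A = np.array([[2., 1., 1.],
              [1., 3., 2.],
              [1., 0., 0.]])
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(3)))   # True

# A rank-deficient, non-square matrix: only the Moore-Penrose
# pseudo-inverse is defined, computed via np.linalg.pinv.
B = np.array([[1., 2.],
              [2., 4.],
              [3., 6.]])
B_pinv = np.linalg.pinv(B)
print(B_pinv.shape)                        # (2, 3)
```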

4. Inverse Matrix in Machine Learning

The primary use of the matrix inverse is to solve systems of linear equations, which forms the basis for many models.

A. Solving Linear Systems

Consider a system of linear equations represented by:

$$\mathbf{A}\mathbf{x} = \mathbf{b}$$

Where $\mathbf{A}$ is the matrix of coefficients, $\mathbf{x}$ is the vector of unknowns (the parameters we want to find), and $\mathbf{b}$ is the result vector.

To solve for $\mathbf{x}$, we multiply both sides on the left by $\mathbf{A}^{-1}$:

$$\mathbf{A}^{-1} \mathbf{A}\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$$

Since $\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$, and $\mathbf{I}\mathbf{x} = \mathbf{x}$:

$$\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$$
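In practice, libraries solve the system directly rather than forming $\mathbf{A}^{-1}$ explicitly, which is faster and numerically more stable. A minimal NumPy sketch (the values of `A` and `b` are made up for illustration):

```python
import numpy as np

A = np.array([[3., 1.],
              [1., 2.]])
b = np.array([9., 8.])

# Textbook form: x = A^{-1} b
x_via_inverse = np.linalg.inv(A) @ b

# Preferred form: solve Ax = b directly without computing the inverse.
x_via_solve = np.linalg.solve(A, b)

print(x_via_inverse)                             # [2. 3.]
print(np.allclose(x_via_inverse, x_via_solve))   # True
```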

B. The Normal Equation in Linear Regression

As mentioned earlier, the closed-form solution for the optimal weight vector ($\mathbf{w}$) in Linear Regression is the Normal Equation:

$$\mathbf{w} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$

The calculation of the inverse of $(\mathbf{X}^T\mathbf{X})$ is the most computationally intensive part of this method. For large datasets, directly calculating the inverse is often avoided in favor of iterative optimization algorithms like Gradient Descent.
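A minimal sketch of the Normal Equation on synthetic data (the data generation and the true weights below are assumptions made purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 samples: a bias column of ones plus two random features.
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
true_w = np.array([1.0, 2.0, -3.0])     # assumed "true" weights
y = X @ true_w + 0.1 * rng.normal(size=100)

# Direct form: w = (X^T X)^{-1} X^T y
w_direct = np.linalg.inv(X.T @ X) @ X.T @ y

# Numerically preferable alternatives: solve the linear system,
# or use a dedicated least-squares routine.
w_solve = np.linalg.solve(X.T @ X, X.T @ y)
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_direct)   # all three estimates should be close to [1, 2, -3]
print(np.allclose(w_direct, w_solve) and np.allclose(w_solve, w_lstsq, atol=1e-6))
```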


The inverse is crucial for understanding linear dependencies and closed-form solutions. We now move to the two concepts that unlock the power of dimensionality reduction and data compression: Eigenvalues and Eigenvectors.