
Vectors - Data Representation

If a scalar is a single number, a Vector is an ordered list of scalars. Vectors are arguably the most important data structure in Machine Learning, as they are how we mathematically represent individual data samples.

1. What is a Vector?

A vector is a quantity that has both magnitude (length) and direction.

In Linear Algebra, a vector is typically represented as a column or row of numbers (scalars).

Notation and Representation

Vectors are usually denoted by bold, lowercase letters (e.g., $\mathbf{v}$, $\mathbf{x}$, $\mathbf{a}$).

A vector $\mathbf{x}$ in $n$-dimensional space ($\mathbb{R}^n$) is represented as:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
  • Components/Elements: The individual numbers $x_1, x_2, \ldots, x_n$ are the scalar components.
  • Dimensions: The number of components, $n$, is the dimension of the vector. A vector in $\mathbb{R}^2$ is a point in a 2D plane; a vector in $\mathbb{R}^{100}$ is a point in 100D space.
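
As a concrete sketch, here is this representation in NumPy (the component values are arbitrary examples):

```python
import numpy as np

# A vector in R^4: an ordered list of four scalar components
x = np.array([1.0, 2.0, 3.0, 4.0])

print(x[0])     # first component, x_1 -> 1.0
print(x.shape)  # (4,) -- the dimension n is 4
```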

2. Vectors in Machine Learning

In ML, every single data sample or feature set is treated as a vector.

A. Feature Vectors

Consider a dataset for predicting house prices. Each house is a single data sample, which is converted into a vector of features:

$$\mathbf{x}_{\text{house}} = \begin{bmatrix} \text{Area} \\ \text{Bedrooms} \\ \text{Age} \end{bmatrix}$$

If a specific house has an area of 2000 sq ft, 4 bedrooms, and is 5 years old, its vector representation is:

$$\mathbf{x}_1 = \begin{bmatrix} 2000 \\ 4 \\ 5 \end{bmatrix}$$

This vector $\mathbf{x}_1$ exists in $\mathbb{R}^3$.
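
A minimal NumPy sketch of this feature vector (the variable name `x1` and the feature ordering mirror the example above):

```python
import numpy as np

# Feature vector for one house: [Area (sq ft), Bedrooms, Age (years)]
x1 = np.array([2000, 4, 5])

print(x1)        # [2000    4    5]
print(x1.shape)  # (3,) -- this sample lives in R^3
```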

B. Embeddings

In Natural Language Processing (NLP), words or documents are represented as embeddings: dense vectors (often in $\mathbb{R}^{300}$ or $\mathbb{R}^{768}$) that capture semantic meaning.

3. Key Vector Properties

A. Magnitude (Length or $\ell_2$-Norm)

The magnitude of a vector, often called the $\ell_2$-norm (or Euclidean norm), measures its length from the origin.

For a vector $\mathbf{x} \in \mathbb{R}^n$, the magnitude is calculated as:

$$\Vert \mathbf{x} \Vert_2 = \sqrt{\sum_{i=1}^{n} x_i^2} = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2}$$

$\ell_2$-Norm in ML

The $\ell_2$-norm is used extensively:

  • Distance: The Euclidean distance between two data points $\mathbf{a}$ and $\mathbf{b}$ is $\Vert \mathbf{a} - \mathbf{b} \Vert_2$, the $\ell_2$-norm of their difference.
  • Regularization: In techniques like Ridge Regression, the $\ell_2$-norm of the weight vector is penalized to prevent overfitting.
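
A short sketch of both computations with NumPy, whose `np.linalg.norm` returns the $\ell_2$-norm by default (example values are arbitrary):

```python
import numpy as np

x = np.array([3.0, 4.0])

# l2-norm from the definition: sqrt(3^2 + 4^2) = 5.0
print(np.sqrt(np.sum(x ** 2)))  # 5.0
print(np.linalg.norm(x))        # 5.0 -- np.linalg.norm defaults to the l2-norm

# Euclidean distance between two points is the l2-norm of their difference
y = np.array([0.0, 1.0])
print(np.linalg.norm(x - y))    # sqrt(9 + 9) ~= 4.2426
```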

B. Direction

The direction of a vector is defined by the ratios of its components, independent of the vector's length. A Unit Vector ($\hat{\mathbf{x}}$) has a magnitude of 1 and points in the same direction as $\mathbf{x}$; it is obtained by dividing $\mathbf{x}$ by its magnitude: $\hat{\mathbf{x}} = \mathbf{x} / \Vert \mathbf{x} \Vert_2$.
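
A quick sketch of normalizing a vector to unit length (assuming $\mathbf{x}$ is not the zero vector, since division by a zero magnitude is undefined):

```python
import numpy as np

x = np.array([3.0, 4.0])

# Unit vector: divide every component by the magnitude
x_hat = x / np.linalg.norm(x)

print(x_hat)                  # [0.6 0.8] -- same direction as x
print(np.linalg.norm(x_hat))  # 1.0
```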

4. Fundamental Vector Operations

A. Vector Addition

Vectors can be added if they have the same dimension. Addition is done element-wise. Geometrically, vector addition follows the parallelogram rule.

$$\mathbf{a} + \mathbf{b} = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} a_1 + b_1 \\ a_2 + b_2 \end{bmatrix}$$
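
Element-wise addition maps directly onto NumPy's `+` operator (example values are arbitrary):

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

# Element-wise addition; both vectors must have the same dimension
print(a + b)  # [4. 1.]
```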

B. Scalar Multiplication

Multiplying a vector $\mathbf{v}$ by a scalar $c$ changes the vector's magnitude (length) but not its direction (unless $c$ is negative, in which case the direction is reversed).

$$c\mathbf{v} = c \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} c \cdot v_1 \\ c \cdot v_2 \end{bmatrix}$$
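
A quick sketch of scaling a vector, including a negative scalar flipping the direction:

```python
import numpy as np

v = np.array([1.0, 2.0])

print(3 * v)   # [3. 6.]   -- three times the magnitude, same direction
print(-1 * v)  # [-1. -2.] -- same magnitude, direction reversed
```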

C. Dot Product (Scalar Product)

The dot product of two vectors $\mathbf{a}$ and $\mathbf{b}$ (of the same dimension $n$) results in a scalar. It is calculated by multiplying corresponding elements and summing the results.

$$\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \dots + a_n b_n$$

The dot product also relates to the angle $\theta$ between the vectors:

$$\mathbf{a} \cdot \mathbf{b} = \Vert \mathbf{a} \Vert \Vert \mathbf{b} \Vert \cos(\theta)$$

Dot Product in ML

The dot product is the core calculation in almost every ML model:

  • Similarity: It is used to measure the similarity between two vectors (e.g., how similar two word embeddings are).
  • Weighted Sum: In a neural network, the input feature vector $\mathbf{x}$ is combined with the weight vector $\mathbf{w}$ using a dot product: $\mathbf{w} \cdot \mathbf{x}$.

Let $\mathbf{a} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} -1 \\ 3 \end{bmatrix}$.

$$\mathbf{a} \cdot \mathbf{b} = (2)(-1) + (1)(3) = -2 + 3 = 1$$
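
A sketch verifying this computation with NumPy and recovering the angle from the cosine formula above:

```python
import numpy as np

a = np.array([2.0, 1.0])
b = np.array([-1.0, 3.0])

# Dot product: (2)(-1) + (1)(3) = 1
print(np.dot(a, b))  # 1.0
print(a @ b)         # 1.0 -- equivalent operator syntax

# Recover the angle from a.b = ||a|| ||b|| cos(theta)
cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.degrees(np.arccos(cos_theta)))  # ~81.87 degrees
```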


A vector represents a single data sample. When we combine many data samples, we form a structured grid of numbers: the Matrix.