Vectors - Data Representation
If a scalar is a single number, a Vector is an ordered list of scalars. Vectors are arguably the most important data structure in Machine Learning, as they are how we mathematically represent individual data samples.
1. What is a Vector?
A vector is a quantity that has both magnitude (length) and direction.
In Linear Algebra, a vector is typically represented as a column or row of numbers (scalars).
Notation and Representation
Vectors are usually denoted by bold, lowercase letters (e.g., $\mathbf{v}$).
A vector in $n$-dimensional space ($\mathbb{R}^n$) is represented as:
$$
\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}
$$
- Components/Elements: The individual numbers $v_1, v_2, \dots, v_n$ are the scalar components.
- Dimensions: The number of components, $n$, is the dimension of the vector. A vector in $\mathbb{R}^2$ is a point in a 2D plane; a vector in $\mathbb{R}^{100}$ is a point in 100D space.
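As a minimal sketch, a vector can be modeled in plain Python as an ordered list of floats, with its dimension given by the number of components. The names `v` and `dimension` are illustrative only; in practice an array library is typically used instead of bare lists.

```python
# A vector in R^3 modeled as an ordered list of scalar components.
v = [2.0, -1.0, 0.5]


def dimension(vec):
    """Return the number of components n of a vector in R^n."""
    return len(vec)


print(dimension(v))  # 3
print(v[0])          # first component (Python indexes from 0): 2.0
```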
2. Vectors in Machine Learning
In ML, every single data sample or feature set is treated as a vector.
A. Feature Vectors
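Consider a dataset for predicting house prices. Each house is a single data sample, which is converted into a vector of features:
$$
\mathbf{x} = \begin{bmatrix} \text{area} \\ \text{bedrooms} \\ \text{age} \end{bmatrix}
$$
If a specific house has an area of $a$ sq ft, $b$ bedrooms, and is $c$ years old, its vector representation is:
$$
\mathbf{x} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}
$$
This vector exists in $\mathbb{R}^3$ space.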
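The conversion from a house record to a feature vector might be sketched as follows. The field names and the numeric values are hypothetical, chosen only for illustration, and are not taken from any real dataset.

```python
# Fixed feature order: every house must use the same ordering so that
# component i means the same thing in every vector.
FEATURE_ORDER = ("area_sqft", "bedrooms", "age_years")  # hypothetical names


def to_feature_vector(house):
    """Convert a dict describing one house into a list of floats."""
    return [float(house[name]) for name in FEATURE_ORDER]


# Hypothetical example values, for illustration only.
house = {"area_sqft": 1500, "bedrooms": 3, "age_years": 10}
x = to_feature_vector(house)

print(x)       # [1500.0, 3.0, 10.0]
print(len(x))  # 3 components, so x is a vector in R^3
```

Keeping the feature order in one constant is a deliberate choice here: a model can only compare samples meaningfully if each position in the vector always refers to the same feature.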
B. Embeddings
In Natural Language Processing (NLP), words or documents are represented as embeddings: dense vectors (often with hundreds of dimensions) that capture semantic meaning.
3. Key Vector Properties
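A. Magnitude (Length or $L_2$-Norm)
The magnitude of a vector, often called the $L_2$-norm (or Euclidean norm), measures its length from the origin.
For a vector $\mathbf{v} \in \mathbb{R}^n$, the magnitude is calculated as:
$$
\|\mathbf{v}\|_2 = \sqrt{v_1^2 + v_2^2 + \dots + v_n^2} = \sqrt{\sum_{i=1}^{n} v_i^2}
$$
The $L_2$-norm is used extensively: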
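- Distance: The standard (Euclidean) distance between two vectors (data points) $\mathbf{a}$ and $\mathbf{b}$ is the $L_2$-norm of their difference, $\|\mathbf{a} - \mathbf{b}\|_2$.
- Regularization: In techniques like Ridge Regression, the squared $L_2$-norm of the weight vector is added to the loss as a penalty, which discourages large weights and helps prevent overfitting.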
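The magnitude and its two uses can be sketched in a few lines of standard-library Python. The function names are illustrative, and the ridge penalty is written in its common squared form scaled by a strength parameter called `lam` here, which is an assumed name.

```python
import math


def l2_norm(v):
    """Euclidean length: square root of the sum of squared components."""
    return math.sqrt(sum(c * c for c in v))


def euclidean_distance(a, b):
    """Distance between two points, computed as the L2 norm of a - b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return l2_norm([ai - bi for ai, bi in zip(a, b)])


def ridge_penalty(weights, lam):
    """Squared L2 norm of the weights, scaled by the strength lam."""
    return lam * sum(w * w for w in weights)


print(l2_norm([3.0, 4.0]))                         # 5.0
print(euclidean_distance([1.0, 2.0], [4.0, 6.0]))  # 5.0
print(ridge_penalty([3.0, 4.0], 0.5))              # 12.5
```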
B. Direction
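The direction of a vector is defined by the ratios of its components, independent of the vector's length. A unit vector ($\hat{\mathbf{v}}$) has a magnitude of 1 and points in the same direction as $\mathbf{v}$; for a nonzero $\mathbf{v}$, it is obtained by dividing by the magnitude: $\hat{\mathbf{v}} = \mathbf{v} / \|\mathbf{v}\|_2$.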
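A unit vector can be computed by dividing each component by the vector's length. The sketch below assumes plain Python lists and uses an illustrative function name, `normalize`; the zero vector is rejected because it has no direction.

```python
import math


def normalize(v):
    """Return the unit vector pointing in the same direction as v."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("the zero vector has no direction")
    return [c / length for c in v]


u = normalize([3.0, 4.0])
print(u)  # [0.6, 0.8]

# Scaling a vector changes its length but not its direction,
# so both inputs normalize to the same unit vector.
print(normalize([6.0, 8.0]))  # [0.6, 0.8]
```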
4. Fundamental Vector Operations
A. Vector Addition
Vectors can be added if they have the same dimension. Addition is done element-wise. Geometrically, vector addition follows the parallelogram rule.
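Element-wise addition might be sketched as follows, again using plain Python lists. The function name `add` is illustrative, and the dimension check mirrors the requirement that both vectors have the same number of components.

```python
def add(a, b):
    """Element-wise sum of two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return [ai + bi for ai, bi in zip(a, b)]


print(add([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # [5.0, 7.0, 9.0]
```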