Skip to main content

Sets and Relations

Discrete Mathematics deals with distinct, separated values rather than continuous ranges. Set Theory is the language we use to group these values, and Relations describe how these groups interact. In Machine Learning, these concepts are vital for everything from defining probability spaces to building database schemas for training data.

1. Set Theory Fundamentalsโ€‹

A Set is an unordered collection of distinct objects, called elements.

Notationโ€‹

  • A={1,2,3}A = \{1, 2, 3\} : A set containing numbers 1, 2, and 3.
  • xโˆˆAx \in A : xx is an element of set AA.
  • โˆ…\emptyset : An empty set.
  • R,Z,N\mathbb{R}, \mathbb{Z}, \mathbb{N} : Sets of Real numbers, Integers, and Natural numbers.
Common Sets in ML
  • R\mathbb{R} (Real Numbers): Used for continuous features like height, price, or weight.
  • Z\mathbb{Z} (Integers): Used for count-based data (e.g., number of clicks).
  • {0,1}\{0, 1\} (Binary Set): The standard output set for binary classification.
  • {C1,C2,โ€ฆ,Ck}\{C_1, C_2, \dots, C_k\} (Categorical Set): The labels for multi-class classification.

Key Operationsโ€‹

The interaction between sets is often visualized using Venn Diagrams.

  • Union (AโˆชBA \cup B): Elements in AA, or BB, or both. (Equivalent to a logical OR).
  • Intersection (AโˆฉBA \cap B): Elements present in both AA and BB. (Equivalent to a logical AND).
  • Difference (Aโˆ–BA \setminus B): Elements in AA that are not in BB.
  • Complement (AcA^c): Everything in the universal set that is not in AA.
Sets in ML

In classification tasks, the Label Space is a set. For a cat/dog classifier, the set of possible outputs is Y={Cat,Dog}Y = \{\text{Cat}, \text{Dog}\}. When evaluating models, we often look at the Intersection of predicted labels and true labels to calculate accuracy.

2. Cartesian Productsโ€‹

The Cartesian Product of two sets AA and BB, denoted Aร—BA \times B, is the set of all possible ordered pairs (a,b)(a, b).

Aร—B={(a,b)โˆฃaโˆˆAย andย bโˆˆB}A \times B = \{ (a, b) \mid a \in A \text{ and } b \in B \}

If AA represents "Users" and BB represents "Movies," Aร—BA \times B represents every possible interaction between every user and every movie. This is the foundation of Utility Matrices in Recommender Systems.

3. Relationsโ€‹

A Relation RR from set AA to set BB is simply a subset of the Cartesian product Aร—BA \times B. It defines a relationship between elements of the two sets.

Types of Relationsโ€‹

In ML, we specifically look for certain properties in relations:

  • Reflexive: Every element is related to itself.
  • Symmetric: If aa is related to bb, then bb is related to aa (e.g., "Similarity" in clustering).
  • Transitive: If aโ†’ba \to b and bโ†’cb \to c, then aโ†’ca \to c.

Binary Relations and Graphsโ€‹

Relations are often represented as Directed Graphs. If (a,b)โˆˆR(a, b) \in R, we draw an arrow from node aa to node bb.

4. Why this matters in Machine Learningโ€‹

A. Data Preprocessingโ€‹

When we perform "One-Hot Encoding" or handle categorical variables, we are mapping elements from a discrete set of categories into a numerical space.

B. Knowledge Graphsโ€‹

Modern AI often uses Knowledge Graphs (like those powering Google Search). These are massive sets of entities connected by relations (e.g., (Paris, is_capital_of, France)).

C. Formal Logic in AIโ€‹

Sets and relations form the basis of predicate logic, which is used in "Symbolic AI" and for defining constraints in optimization problems.


Now that we can group objects into sets and relate them, we need to understand the logic that allows us to make valid inferences from these groups.