Graph Theory Basics
In Machine Learning, we often need to model relationships between entities. Graph Theory provides the mathematical framework for representing and analyzing networks of interconnected data, from the structure of a Neural Network to the relationships in a social media dataset.
1. What is a Graph?β
A graph consists of two sets:
- Vertices (Nodes) : The entities or points in the network (e.g., users, neurons, cities).
- Edges : The connections between the nodes (e.g., friendships, synaptic weights, roads).
2. Types of Graphsβ
The nature of the relationship determines the type of graph used:
| Type | Description | ML Application |
|---|---|---|
| Undirected | Edges have no direction (A-B is same as B-A). | Collaborative Filtering (User-Item similarity). |
| Directed (Digraph) | Edges point from one node to another (). | Neural Networks (Information flows forward). |
| Weighted | Edges have numerical values assigned to them. | Weights in a Neural Network or distances. |
| Cyclic / Acyclic | Whether you can follow edges to return to a start node. | Directed Acyclic Graphs (DAGs) are used in computation pipelines. |
3. Representing Graphs Mathematicallyβ
To process graphs with algorithms, we must represent them as numbers or matrices.
A. Adjacency Matrixβ
A square matrix where the entry is 1 if there is an edge between node and node , and 0 otherwise.
- Symmetric: For undirected graphs.
- Asymmetric: For directed graphs.
B. Adjacency Listβ
A collection of lists where each node has a list of the other nodes it is connected to. This is more memory-efficient for sparse graphs (where most nodes aren't connected).
4. Graph Theory in Machine Learningβ
A. Neural Network Architectureβ
A Neural Network is a directed, weighted graph. The neurons are nodes, and the weights are the directed edges. Training the network is essentially updating the weights (edge values) of this graph.
B. Knowledge Graphsβ
Companies like Google and Meta use Knowledge Graphs to store facts about the world. Entities (like "Albert Einstein") are nodes, and relationships (like "Born In") are edges. This allows for advanced reasoning and semantic search.
C. Graph Neural Networks (GNNs)β
GNNs are a class of deep learning models designed to perform inference on data described by graphs. They are used for:
- Molecular Discovery: Atoms are nodes, bonds are edges.
- Fraud Detection: Detecting suspicious patterns in transaction networks.
- Recommendation Systems: Predicting new edges (purchases) between User nodes and Item nodes.
5. Key Graph Algorithmsβ
- Shortest Path (Dijkstraβs): Used in routing and logistics.
- PageRank: The original algorithm behind Google Search, which ranks nodes based on their "importance" in the network.
- Community Detection: Finding clusters of nodes that are more densely connected to each other than to the rest of the network (Clustering).
This concludes our module on Discrete Mathematics. You now have the tools for logic, counting, and networking. Next, we enter the final and most critical pillar of ML mathematics: Probability and Statistics.