The Perceptron: The First Neural Network
The Perceptron is the simplest form of a neural network, introduced by Frank Rosenblatt in 1958. It is a mathematical model inspired by the biological neuron, designed to perform binary classification (predicting whether an input belongs to one of two categories).
1. Biological Inspiration
Just as a biological neuron receives signals through dendrites and "fires" an impulse through the axon once a threshold is reached, the artificial perceptron sums up its inputs and triggers an output based on a threshold.
2. Anatomy of a Perceptron
A perceptron consists of four main parts:
- Input Values ($x_1, x_2, \dots, x_n$): The features of your data.
- Weights ($w_1, w_2, \dots, w_n$): Values that represent the "strength" or importance of each input.
- Bias ($b$): An additional parameter that allows the model to shift the activation function left or right.
- Activation Function: A rule (usually a Step Function) that decides the final output.
3. The Mathematics of "Firing"
The perceptron calculates a weighted sum of the inputs and adds a bias. This result is passed through a Step Function.
The Weighted Sum ($z$):
$$z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b = \mathbf{w} \cdot \mathbf{x} + b$$
The Activation (Heaviside Step Function):
The output is determined by whether $z$ is non-negative or negative:
$$\hat{y} = H(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases}$$
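For example, with the AND-gate parameters used in the implementation below ($w_1 = w_2 = 1$, $b = -1.5$) and the input $x = (1, 1)$:
$$z = 1 \cdot 1 + 1 \cdot 1 - 1.5 = 0.5 \ge 0 \quad \Rightarrow \quad \hat{y} = 1$$
For the input $x = (0, 1)$, $z = 0 + 1 - 1.5 = -0.5 < 0$, so $\hat{y} = 0$.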
4. How the Perceptron Learns
The learning process is an iterative adjustment of the weights and bias. If the model makes a mistake, they are updated using the Perceptron Learning Rule (a minimal training sketch follows the symbol definitions below):
$$w_i \leftarrow w_i + \eta \,(y - \hat{y})\, x_i \qquad\qquad b \leftarrow b + \eta \,(y - \hat{y})$$
Where:
- $\eta$ (Learning Rate): A small value (e.g., 0.01) that controls how drastically we update the weights.
- $y$: The true label (0 or 1).
- $\hat{y}$: The output from the perceptron.
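As a sketch of how this rule plays out in code, the following NumPy loop trains a perceptron on the AND-gate data used in section 6. The function name train_perceptron, the zero initialization, and the hyperparameters (learning rate 0.1, 100 epochs) are illustrative choices, not part of the rule itself.

import numpy as np

def step_function(z):
    # Heaviside step: fire (1) when z >= 0, otherwise 0
    return 1 if z >= 0 else 0

def train_perceptron(X, y, learning_rate=0.1, epochs=100):
    # Start from zero weights and zero bias (any small values would do)
    weights = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in zip(X, y):
            # Forward pass: weighted sum plus bias, then step activation
            prediction = step_function(np.dot(inputs, weights) + bias)
            # Perceptron Learning Rule: an update only happens when the prediction is wrong
            error = label - prediction
            weights += learning_rate * error * inputs
            bias += learning_rate * error
    return weights, bias

# AND-gate training data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

weights, bias = train_perceptron(X, y)
for x in X:
    print(x, step_function(np.dot(x, weights) + bias))
# Prints the AND truth table: 0, 0, 0, 1

Because the AND data is linearly separable, this loop is guaranteed to stop making updates once it finds a separating line.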
5. The Critical Limitation: Linearity
The Perceptron can only solve problems that are Linearly Separable: the two classes must be divisible by a single straight line (or, in higher dimensions, a flat hyperplane). The AND gate above is a good example: the line $x_1 + x_2 = 1.5$ puts the point $(1, 1)$ on one side and the other three points on the other.
The XOR Problem
In 1969, Minsky and Papert proved that a single-layer perceptron cannot solve the XOR logic gate, because its four points cannot be separated by a single straight line. This result contributed to the first "AI Winter" for neural networks, which only ended when researchers began stacking perceptrons into Multi-Layer Perceptrons (MLPs) and found ways to train them.
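One way to see why, using the conventions above (output 1 when $z \ge 0$): suppose a single perceptron with weights $w_1, w_2$ and bias $b$ did compute XOR. It would need
$$b < 0, \qquad w_2 + b \ge 0, \qquad w_1 + b \ge 0, \qquad w_1 + w_2 + b < 0.$$
Adding the two middle inequalities gives $w_1 + w_2 + 2b \ge 0$, i.e. $w_1 + w_2 + b \ge -b > 0$, which contradicts the last inequality. No choice of weights and bias works, so no single straight line separates the XOR points.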
6. Implementation with NumPy
import numpy as np

def step_function(z):
    # Heaviside step: output 1 when z >= 0, otherwise 0
    return 1 if z >= 0 else 0

def perceptron(inputs, weights, bias):
    # Calculate z = (w1*x1 + w2*x2 + ... + wn*xn) + b
    z = np.dot(inputs, weights) + bias
    return step_function(z)
# Example: AND Gate logic
# Inputs: [0, 0], [0, 1], [1, 0], [1, 1]
weights = np.array([1, 1])
bias = -1.5
print(perceptron([1, 1], weights, bias)) # Output: 1 (True)
print(perceptron([0, 1], weights, bias)) # Output: 0 (False)
One neuron can't solve complex problems. But what happens when we connect thousands of them in layers?