Core Concept
Learn to classify by making mistakes and correcting them. Like teaching a child to sort toys: watch, correct errors, repeat until they get it right.
The Algorithm
Four-Step Loop:
- Guess: Start with random line/hyperplane
- Pick: Select a training example
- Check: Is point on correct side?
- Correct: If misclassified, nudge line toward correct position
Repeat until every point is correctly classified. If the data is not linearly separable, the loop never terminates, so in practice you also stop after a fixed number of passes.
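The four-step loop above can be sketched in NumPy. This is a minimal illustration, not a canonical implementation: the function name, the zero-vector start (a random start also works), and the `max_epochs` cap are choices of this sketch.

```python
import numpy as np

def perceptron_train(X, y, max_epochs=100):
    """Train a Perceptron on X (n x d) with labels y in {-1, +1}.

    Returns (theta, theta0). Converges only if the data is linearly
    separable; otherwise it stops after max_epochs passes.
    """
    n, d = X.shape
    theta = np.zeros(d)        # Guess: start from the zero vector
    theta0 = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):     # Pick: visit each training example
            # Check: is the point on the correct side of the boundary?
            if y[i] * (theta @ X[i] + theta0) <= 0:
                # Correct: nudge the boundary toward the right position
                theta = theta + y[i] * X[i]
                theta0 = theta0 + y[i]
                mistakes += 1
        if mistakes == 0:      # a full clean pass: all points classified
            break
    return theta, theta0
```

A clean pass through the data (zero mistakes) is the natural stopping test, since any later pass would make the same predictions.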
Decision Function
h(x; θ, θ0) = sign(θᵀx + θ0)
Components:
- x: Input feature vector (d-dimensional)
- θ: Weight vector (perpendicular to boundary)
- θ0: Bias term (offset from origin)
- sign(·): Returns +1 or -1
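The decision function translates directly into code. A minimal NumPy sketch; note that `np.sign` returns 0 exactly on the boundary, and mapping that case to +1 is a convention assumed here, not stated in the notes:

```python
import numpy as np

def h(x, theta, theta0):
    """Decision function: sign(theta^T x + theta0), returning +1 or -1.

    np.sign(0) is 0, so a point exactly on the boundary is mapped
    to +1 by convention, keeping the output in {-1, +1}.
    """
    s = np.sign(theta @ x + theta0)
    return 1 if s == 0 else int(s)
```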
Update Rule
When point (x(i), y(i)) is misclassified:
θ ← θ + y(i)x(i)
θ0 ← θ0 + y(i)
Geometric Intuition:
- If y = +1 misclassified: Add x to θ (rotate boundary toward point)
- If y = -1 misclassified: Subtract x from θ (push boundary away)
No calculus. No gradient descent. Pure geometry.
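The two update rules can be written as a one-step helper. A minimal sketch; the geometric claim above corresponds to a checkable property: after the update, the score y(i)(θᵀx(i) + θ0) for that point increases by ||x(i)||² + 1.

```python
import numpy as np

def perceptron_update(theta, theta0, x_i, y_i):
    """Apply the Perceptron update for one misclassified point (x_i, y_i):

        theta  <- theta  + y_i * x_i
        theta0 <- theta0 + y_i

    This moves the boundary toward x_i when y_i = +1 and away when
    y_i = -1, raising the point's score by ||x_i||^2 + 1.
    """
    return theta + y_i * x_i, theta0 + y_i
```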
Convergence Theorem
Guarantee: If data is linearly separable with margin γ and bounded inputs ||x|| ≤ R, then Perceptron converges in at most (R/γ)² updates.
Key Insight: Wider margin = faster convergence.
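For concreteness, the bound (R/γ)² is easy to tabulate. The specific R and γ values below are illustrative, not taken from the notes:

```python
import math

def mistake_bound(R, gamma):
    """Perceptron mistake bound: at most (R/gamma)^2 updates when every
    ||x|| <= R and the data is separable with margin gamma."""
    return math.ceil((R / gamma) ** 2)

# Halving the margin quadruples the worst-case number of updates:
# mistake_bound(2.0, 1.0) vs. mistake_bound(2.0, 0.5)
```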
Critical Limitation
Only works for linearly separable data.
If no straight line can separate classes, Perceptron never converges.
Classic Failure: XOR Problem
- Cannot separate (0,0), (1,1) from (0,1), (1,0) with single line
- This limitation, highlighted in Minsky & Papert's Perceptrons (1969), contributed to the 1970s AI winter
- Solved by multi-layer networks + backpropagation (1980s)
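The XOR failure is easy to reproduce: running the Perceptron loop on the four XOR points never yields a mistake-free pass, because no linear boundary separates them. A self-contained sketch; the 1000-pass cap is an arbitrary choice for the demonstration:

```python
import numpy as np

# XOR with labels in {-1, +1}: (0,0) and (1,1) are +1; (0,1) and (1,0) are -1.
X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
y = np.array([1, 1, -1, -1])

theta, theta0 = np.zeros(2), 0.0
converged = False
for _ in range(1000):  # far more passes than any separable 4-point set needs
    mistakes = 0
    for i in range(4):
        if y[i] * (theta @ X[i] + theta0) <= 0:
            theta, theta0 = theta + y[i] * X[i], theta0 + y[i]
            mistakes += 1
    if mistakes == 0:
        converged = True
        break

# Since XOR is not linearly separable, every pass makes at least one
# mistake and a clean pass never happens.
print(converged)  # → False
```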
Quick Facts
- One of the simplest learning algorithms with a formal convergence guarantee
- Single neuron model
- Deep networks are, at their core, stacked Perceptron-style units with nonlinear activations
- Understanding Perceptron = understanding modern ML foundations