Core Concept
Learn to classify by making mistakes and correcting them. Like teaching a child to sort toys: watch, correct errors, repeat until they get it right.
The Algorithm
Four-Step Loop:
- Guess: Start with random line/hyperplane
- Pick: Select a training example
- Check: Is point on correct side?
- Correct: If misclassified, nudge line toward correct position
Repeat until every point is correctly classified. If the data is not linearly separable, the loop never terminates, so in practice you also stop after a fixed number of passes.
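The four-step loop above can be sketched in NumPy. This is a minimal illustration, not a canonical implementation: the function name, the zero-vector start (a random start also works), and the `max_epochs` cap are choices of this sketch.

```python
import numpy as np

def perceptron_train(X, y, max_epochs=100):
    """Train a Perceptron on X (n x d) with labels y in {-1, +1}.

    Returns (theta, theta0). Converges only if the data is linearly
    separable; otherwise it stops after max_epochs passes.
    """
    n, d = X.shape
    theta = np.zeros(d)        # Guess: start from the zero vector
    theta0 = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):     # Pick: visit each training example
            # Check: is the point on the correct side of the boundary?
            if y[i] * (theta @ X[i] + theta0) <= 0:
                # Correct: nudge the boundary toward the right position
                theta = theta + y[i] * X[i]
                theta0 = theta0 + y[i]
                mistakes += 1
        if mistakes == 0:      # a full clean pass: all points classified
            break
    return theta, theta0
```

A clean pass through the data (zero mistakes) is the natural stopping test, since any later pass would make the same predictions.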
Decision Function
h(x; θ, θ0) = sign(θᵀx + θ0)
Components:
- x: Input feature vector (d-dimensional)
- θ: Weight vector (perpendicular to boundary)
- θ0: Bias term (offset from origin)
- sign(·): Returns +1 or -1
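The decision function translates directly into code. A minimal NumPy sketch; note that `np.sign` returns 0 exactly on the boundary, and mapping that case to +1 is a convention assumed here, not stated in the notes:

```python
import numpy as np

def h(x, theta, theta0):
    """Decision function: sign(theta^T x + theta0), returning +1 or -1.

    np.sign(0) is 0, so a point exactly on the boundary is mapped
    to +1 by convention, keeping the output in {-1, +1}.
    """
    s = np.sign(theta @ x + theta0)
    return 1 if s == 0 else int(s)
```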
Update Rule
When point (x(i), y(i)) is misclassified:
θ ← θ + y(i)x(i)
θ0 ← θ0 + y(i)
Geometric Intuition:
- If y = +1 misclassified: Add x to θ (rotate boundary toward point)
- If y = -1 misclassified: Subtract x from θ (push boundary away)
No calculus. No gradient descent. Pure geometry.
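The two update rules can be written as a one-step helper. A minimal sketch; the geometric claim above corresponds to a checkable property: after the update, the score y(i)(θᵀx(i) + θ0) for that point increases by ||x(i)||² + 1.

```python
import numpy as np

def perceptron_update(theta, theta0, x_i, y_i):
    """Apply the Perceptron update for one misclassified point (x_i, y_i):

        theta  <- theta  + y_i * x_i
        theta0 <- theta0 + y_i

    This moves the boundary toward x_i when y_i = +1 and away when
    y_i = -1, raising the point's score by ||x_i||^2 + 1.
    """
    return theta + y_i * x_i, theta0 + y_i
```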
Convergence Theorem
Guarantee: If data is linearly separable with margin γ and bounded inputs ||x|| ≤ R, then Perceptron converges in at most (R/γ)² updates.
Key Insight: Wider margin = faster convergence.
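For concreteness, the bound (R/γ)² is easy to tabulate. The specific R and γ values below are illustrative, not taken from the notes:

```python
import math

def mistake_bound(R, gamma):
    """Perceptron mistake bound: at most (R/gamma)^2 updates when every
    ||x|| <= R and the data is separable with margin gamma."""
    return math.ceil((R / gamma) ** 2)

# Halving the margin quadruples the worst-case number of updates:
# mistake_bound(2.0, 1.0) vs. mistake_bound(2.0, 0.5)
```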
Critical Limitation
Only works for linearly separable data.
If no straight line can separate classes, Perceptron never converges.
Classic Failure: XOR Problem
- Cannot separate (0,0), (1,1) from (0,1), (1,0) with single line
- This limitation, highlighted in Minsky & Papert's Perceptrons (1969), contributed to the 1970s AI winter
- Solved by multi-layer networks + backpropagation (1980s)
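The XOR failure is easy to reproduce: running the Perceptron loop on the four XOR points never yields a mistake-free pass, because no linear boundary separates them. A self-contained sketch; the 1000-pass cap is an arbitrary choice for the demonstration:

```python
import numpy as np

# XOR with labels in {-1, +1}: (0,0) and (1,1) are +1; (0,1) and (1,0) are -1.
X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
y = np.array([1, 1, -1, -1])

theta, theta0 = np.zeros(2), 0.0
converged = False
for _ in range(1000):  # far more passes than any separable 4-point set needs
    mistakes = 0
    for i in range(4):
        if y[i] * (theta @ X[i] + theta0) <= 0:
            theta, theta0 = theta + y[i] * X[i], theta0 + y[i]
            mistakes += 1
    if mistakes == 0:
        converged = True
        break

# Since XOR is not linearly separable, every pass makes at least one
# mistake and a clean pass never happens.
print(converged)  # → False
```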
Quick Facts
- One of the simplest learning algorithms with a formal convergence guarantee
- Single neuron model
- Deep networks are, at their core, stacked Perceptron-style units with nonlinear activations
- Understanding Perceptron = understanding modern ML foundations