Core Concept
Find the decision boundary with the widest possible margin. Like parking in the center of a space, maximizing distance to both neighbors.
Philosophy
Perceptron: Satisfied with any separating line
SVM: Perfectionist, searches for the best line with maximum margin
The Margin
Distance between decision boundary and nearest data points from either class.
Support Vectors: Nearest points that "support" or define the boundary.
Why margin matters: Wide margin = robustness to noisy data. Boundary has breathing room.
SVM Objective
Find decision boundary that:
1. Maximizes the margin
2. Correctly classifies the training points (the hinge loss below relaxes this, tolerating violations at a cost)
Hinge Loss
L(y, f(x)) = max(0, 1 - y · f(x))
Where f(x) = θᵀx + θ₀ and y ∈ {-1, +1}
Three Cases:
Wrong (y · f(x) < 0):
- Big penalty (loss > 1)
- Misclassified point
Correct but inside margin (0 ≤ y · f(x) < 1):
- Small penalty
- Correct but too close to boundary (inside margin)
Correct with room to spare (y · f(x) ≥ 1):
- Zero penalty
- Outside margin, exactly where it should be
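The three cases can be sketched in a few lines of plain Python (the example scores are made up for illustration):

```python
def hinge_loss(y, fx):
    """Hinge loss max(0, 1 - y * f(x)) for labels y in {-1, +1}."""
    return max(0.0, 1.0 - y * fx)

# Wrong side of the boundary: y * f(x) < 0 -> loss > 1
print(hinge_loss(+1, -0.5))  # 1.5
# Correct but inside the margin: 0 <= y * f(x) < 1 -> small penalty
print(hinge_loss(+1, 0.7))   # ~0.3
# Correct and outside the margin: y * f(x) >= 1 -> zero penalty
print(hinge_loss(+1, 2.0))   # 0.0
```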
Optimization Problem
minimize: (1/2)||θ||² + C · Σ max(0, 1 - y⁽ⁱ⁾(θᵀx⁽ⁱ⁾ + θ₀))
- Minimizing this maximizes margin
- Margin is inversely proportional to ||θ||: the distance from the boundary to the margin is 1/||θ||
- Penalizes misclassifications and margin violations
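The objective can be implemented directly from the formula above; a minimal pure-Python sketch, with toy 1-D data chosen just for illustration:

```python
def svm_objective(theta, theta0, X, y, C):
    """(1/2)||theta||^2 + C * sum of hinge losses over the training set."""
    reg = 0.5 * sum(t * t for t in theta)
    hinge = 0.0
    for xi, yi in zip(X, y):
        fx = sum(t * xij for t, xij in zip(theta, xi)) + theta0  # f(x) = theta^T x + theta0
        hinge += max(0.0, 1.0 - yi * fx)
    return reg + C * hinge

# Toy 1-D data: positives on the right, negatives on the left
X = [[2.0], [3.0], [-2.0], [-3.0]]
y = [+1, +1, -1, -1]
print(svm_objective([1.0], 0.0, X, y, C=1.0))  # 0.5: zero hinge loss, pure regularizer
```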
Hyperparameter C
Large C:
- Prioritize perfect training fit
- Small margin
- Risk overfitting
Small C:
- Prioritize wide margin
- Tolerate some training errors
- Better generalization
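A small self-contained illustration of the trade-off (the 1-D dataset and the two candidate boundaries are made up): evaluating the objective shows that a small C prefers a wide-margin boundary despite two margin violations, while a large C prefers a narrow-margin boundary that fits every point with margin at least 1.

```python
def objective(theta, theta0, X, y, C):
    """1-D SVM objective: (1/2)theta^2 + C * total hinge loss."""
    reg = 0.5 * theta * theta
    hinge = sum(max(0.0, 1.0 - yi * (theta * xi + theta0)) for xi, yi in zip(X, y))
    return reg + C * hinge

# Separable 1-D data with a narrow gap between -0.5 and 0.5
X = [-3.0, -2.0, -0.5, 0.5, 2.0, 3.0]
y = [-1, -1, -1, +1, +1, +1]

narrow = (2.0, 0.0)  # margin 1/|theta| = 0.5, zero hinge loss
wide = (0.5, 0.0)    # margin 1/|theta| = 2.0, two points inside the margin

for C in (0.1, 10.0):
    n, w = objective(*narrow, X, y, C), objective(*wide, X, y, C)
    print(f"C={C}: prefers the", "wide" if w < n else "narrow", "margin")
```

With C = 0.1 the wide candidate's objective (0.125 + 0.1 × 1.5 = 0.275) beats the narrow one's (2.0); with C = 10 the hinge term dominates and the preference flips.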
The Kernel Trick
Map data to higher-dimensional space where it becomes linearly separable, without computing transformation explicitly.
Common Kernels:
Linear: K(x, x') = xᵀx'
- Standard SVM
Polynomial: K(x, x') = (xᵀx' + c)ᵈ
- Polynomial boundaries
RBF (Gaussian): K(x, x') = exp(-γ||x - x'||²)
- Complex, smooth boundaries
Enables learning complex, nonlinear boundaries with linear optimization elegance.
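The trick can be made concrete for the quadratic kernel (xᵀx')² in 2-D: it equals an ordinary dot product under an explicit feature map that the kernel lets us skip building (the helper names below are illustrative):

```python
import math

def linear_kernel(x, xp):
    return sum(a * b for a, b in zip(x, xp))

def polynomial_kernel(x, xp, c=1.0, d=3):
    return (linear_kernel(x, xp) + c) ** d

def rbf_kernel(x, xp, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, xp)))

# Kernel trick, concretely: (x^T x')^2 in 2-D equals a dot product under the
# explicit map phi(x) = [x1^2, sqrt(2)*x1*x2, x2^2], never constructed in practice.
def phi(x):
    return [x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2]

x, xp = [1.0, 2.0], [3.0, 0.5]
implicit = linear_kernel(x, xp) ** 2       # kernel value, no feature map needed
explicit = linear_kernel(phi(x), phi(xp))  # same value, via the explicit map
```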
Quick Facts
- Margin maximization for robust classification
- Hinge loss creates wide margins
- Kernel trick enables nonlinear boundaries
- Still relevant for small data and interpretability
- Theoretical rigor vs. neural network empiricism