Core Concept
Transform raw data into structured numerical vectors that algorithms can process. Like prepping ingredients before cooking: raw flour must become dough before it can become pizza.
The Feature Map
φ: X → ℝᵈ
Maps a real-world object x ∈ X to a vector of d numbers.
Encoding Techniques
One-Hot Encoding
For categorical data (e.g., [Italian, Mexican, Thai]):
- Wrong: Italian=1, Mexican=2, Thai=3 (implies false ordering)
- Correct: Italian = [1, 0, 0], Mexican = [0, 1, 0], Thai = [0, 0, 1]
Each category gets independent dimension, preventing false mathematical relationships.
Bag of Words
For text processing:
- Ignore word order
- Count word presence/frequency
- "The quick brown fox" β vector of word counts
- Powers spam filters and early search engines
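The counting step above can be sketched in a few lines; the `vocabulary` list and `bag_of_words` helper here are illustrative assumptions, not a specific library's API:

```python
from collections import Counter

def bag_of_words(text: str, vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary word appears; word order is discarded."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

vocab = ["the", "quick", "brown", "fox", "lazy"]
print(bag_of_words("The quick brown fox jumps over the lazy dog", vocab))
# [2, 1, 1, 1, 1]
```

Note that "fox quick brown the" would produce the same vector: the representation keeps frequency but throws away order.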
Polynomial Features
Purpose: enable linear models to learn curved boundaries.
Method: add powers and interactions of existing features.
Example (1D): φ(x) = [1, x, x², ..., xᵈ]ᵀ
Transformed Model
h(x) = θᵀφ(x) + θ₀
Learn non-linear boundaries in input space X while staying linear in feature space Rα΅.
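A sketch of the 1D polynomial map and the transformed model, keeping the source's θ₀ bias term (the function names `poly_features` and `h` are illustrative):

```python
def poly_features(x: float, d: int) -> list[float]:
    """phi(x) = [1, x, x^2, ..., x^d]: powers of x up to degree d."""
    return [x ** k for k in range(d + 1)]

def h(x: float, theta: list[float], theta0: float = 0.0) -> float:
    """Linear model in feature space: h(x) = theta . phi(x) + theta0."""
    phi = poly_features(x, len(theta) - 1)
    return sum(t * f for t, f in zip(theta, phi)) + theta0

# A curved decision boundary from a "linear" model: h(x) = 1 - x^2
theta = [1.0, 0.0, -1.0]
print(h(2.0, theta))  # -3.0
```

The model is still a dot product, so all the machinery of linear models applies; the non-linearity lives entirely in φ.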
Modern Approach: Embeddings (2025)
Concept: neural networks learn feature representations automatically.
How it works:
- Network reads millions of examples
- Learns to place similar concepts close in high-dimensional space
- "Taco" and "Burrito" near each other, far from "Sushi"
- Store learned representations as vectors
- Search by geometric proximity, not keyword matching
- "Spicy food" finds vectors close to "spiciness" concept
Retrieval-augmented generation workflow:
- Convert documents to feature vectors
- Find the vectors most relevant to a query
- Feed them to an LLM for context-aware responses
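The geometric-proximity search can be sketched with cosine similarity. The 2D vectors below are hand-made toys standing in for learned embeddings (a real system would get hundreds of dimensions from a trained model); `nearest` is an illustrative helper, not a library function:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for learned embeddings: similar foods point in similar directions.
embeddings = {
    "taco":    [0.90, 0.10],
    "burrito": [0.85, 0.15],
    "sushi":   [0.10, 0.90],
}

def nearest(query_vec: list[float], k: int = 1) -> list[str]:
    """Search by geometric proximity, not keyword matching."""
    ranked = sorted(embeddings,
                    key=lambda w: cosine(query_vec, embeddings[w]),
                    reverse=True)
    return ranked[:k]

print(nearest([0.88, 0.12], k=2))  # ['taco', 'burrito']
```

The query vector never mentions the word "taco"; it matches because it points in roughly the same direction, which is the whole shift from keyword search to vector search.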
Key Principles
- Algorithms understand numbers, not concepts
- Good features make simple models powerful
- Manual feature engineering → learned embeddings
- Balance expressiveness vs. overfitting