Core Concept
Transform raw data into structured numerical vectors that algorithms can process. Like prepping ingredients before cooking: raw flour must become dough before it can become pizza.
The Feature Map
φ: X → ℝᵈ
Maps a real-world object x ∈ X to a vector of d numbers.
Encoding Techniques
One-Hot Encoding
For categorical data (e.g., [Italian, Mexican, Thai]):
- Wrong: Italian=1, Mexican=2, Thai=3 (implies false ordering)
- Correct: Italian = [1, 0, 0], Mexican = [0, 1, 0], Thai = [0, 0, 1]
Each category gets independent dimension, preventing false mathematical relationships.
Bag of Words
For text processing:
- Ignore word order
- Count word presence/frequency
- "The quick brown fox" β vector of word counts
- Powers spam filters and early search engines
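The counting step above can be sketched in a few lines; the `vocabulary` list and `bag_of_words` helper here are illustrative assumptions, not a specific library's API:

```python
from collections import Counter

def bag_of_words(text: str, vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary word appears; word order is discarded."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

vocab = ["the", "quick", "brown", "fox", "lazy"]
print(bag_of_words("The quick brown fox jumps over the lazy dog", vocab))
# [2, 1, 1, 1, 1]
```

Note that "fox quick brown the" would produce the same vector: the representation keeps frequency but throws away order.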
Polynomial Features
Purpose: enable linear models to learn curved boundaries.
Method: add powers and interactions of existing features.
Example (1D): φ(x) = [1, x, x², ..., xᵈ]ᵀ
Transformed Model
h(x) = θᵀφ(x) + θ₀
Learn non-linear boundaries in input space X while staying linear in feature space Rα΅.
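A sketch of the 1D polynomial map and the transformed model, keeping the source's θ₀ bias term (the function names `poly_features` and `h` are illustrative):

```python
def poly_features(x: float, d: int) -> list[float]:
    """phi(x) = [1, x, x^2, ..., x^d]: powers of x up to degree d."""
    return [x ** k for k in range(d + 1)]

def h(x: float, theta: list[float], theta0: float = 0.0) -> float:
    """Linear model in feature space: h(x) = theta . phi(x) + theta0."""
    phi = poly_features(x, len(theta) - 1)
    return sum(t * f for t, f in zip(theta, phi)) + theta0

# A curved decision boundary from a "linear" model: h(x) = 1 - x^2
theta = [1.0, 0.0, -1.0]
print(h(2.0, theta))  # -3.0
```

The model is still a dot product, so all the machinery of linear models applies; the non-linearity lives entirely in φ.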
Modern Approach: Embeddings (2025)
Concept: neural networks learn feature representations automatically.
How it works:
- Network reads millions of examples
- Learns to place similar concepts close in high-dimensional space
- "Taco" and "Burrito" near each other, far from "Sushi"
- Store learned representations as vectors
- Search by geometric proximity, not keyword matching
- "Spicy food" finds vectors close to "spiciness" concept
Retrieval-augmented generation workflow:
- Convert documents to feature vectors
- Find the vectors most relevant to a query
- Feed them to an LLM for context-aware responses
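The geometric-proximity search can be sketched with cosine similarity. The 2D vectors below are hand-made toys standing in for learned embeddings (a real system would get hundreds of dimensions from a trained model); `nearest` is an illustrative helper, not a library function:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for learned embeddings: similar foods point in similar directions.
embeddings = {
    "taco":    [0.90, 0.10],
    "burrito": [0.85, 0.15],
    "sushi":   [0.10, 0.90],
}

def nearest(query_vec: list[float], k: int = 1) -> list[str]:
    """Search by geometric proximity, not keyword matching."""
    ranked = sorted(embeddings,
                    key=lambda w: cosine(query_vec, embeddings[w]),
                    reverse=True)
    return ranked[:k]

print(nearest([0.88, 0.12], k=2))  # ['taco', 'burrito']
```

The query vector never mentions the word "taco"; it matches because it points in roughly the same direction, which is the whole shift from keyword search to vector search.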
Key Principles
- Algorithms understand numbers, not concepts
- Good features make simple models powerful
- Manual feature engineering → learned embeddings
- Balance expressiveness vs. overfitting