Feature Representation

Quick Reference

Core Concept

Transform raw data into structured numerical vectors that algorithms can process. Like prepping ingredients before cooking: raw flour must become dough before it can become pizza.

The Feature Map

ϕ: X → Rᵈ

Maps a real-world object x to a vector v of d numbers.

Encoding Techniques

One-Hot Encoding

For categorical data (e.g., [Italian, Mexican, Thai]):

Each category gets its own independent dimension, preventing false numerical relationships (no implied ordering or distance between categories).

Bag of Words

For text: represent each document as a vector of word counts over a fixed vocabulary, discarding word order.
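A minimal bag-of-words sketch (the vocabulary and helper name are illustrative):

```python
import re
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in `text`; order is discarded."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [counts[word] for word in vocabulary]

vocab = ["pizza", "great", "terrible"]
print(bag_of_words("Great pizza, great crust!", vocab))  # [1, 2, 0]
```

Note that "great" appears twice in the sentence but "terrible" never does, and the phrase's word order is lost entirely.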

Polynomial Features

Purpose: Enable linear models to learn curved boundaries.
Method: Add powers and interactions of existing features.
Example (1D):

ϕ(x) = [1, x, x², ..., xᵏ]ᵀ

Key Insight: A model that is linear in the transformed space is non-linear in the original space.
Warning: The number of polynomial terms grows combinatorially with the input dimension. Use domain knowledge to select only relevant features.
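The 1D feature map above can be sketched directly:

```python
def poly_features(x, k):
    """phi(x) = [1, x, x^2, ..., x^k] for a scalar input x."""
    return [x ** i for i in range(k + 1)]

print(poly_features(2.0, 3))  # [1.0, 2.0, 4.0, 8.0]
```

With multiple input variables the same idea also adds interaction terms (x₁x₂, x₁x₂², ...), which is where the combinatorial explosion comes from.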

Transformed Model

h(x) = θᵀϕ(x) + θ₀

Learn non-linear boundaries in the input space X while staying linear in the feature space Rᵈ.
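A toy illustration of the transformed model, with θ and ϕ chosen by hand for this sketch (here ϕ(x) = [x, x²], so h traces a parabola even though the model is a plain dot product in feature space):

```python
def phi(x):
    """Hand-picked feature map for a scalar x: phi(x) = [x, x^2]."""
    return [x, x ** 2]

def h(x, theta, theta0):
    """Linear in feature space: h(x) = theta . phi(x) + theta0."""
    return sum(t * f for t, f in zip(theta, phi(x))) + theta0

# theta chosen by hand for illustration: h(x) = x^2 - 1,
# a curved decision boundary a linear model in X alone cannot produce.
theta, theta0 = [0.0, 1.0], -1.0
print([h(x, theta, theta0) for x in (-2, 0, 2)])  # [3.0, -1.0, 3.0]
```

The learning algorithm itself never changes; only the representation ϕ(x) does.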

Modern Approach: Embeddings (2025)

Concept: Neural networks learn feature representations automatically, replacing hand-designed feature maps.
How it works: An embedding model maps text, images, or other objects to dense vectors such that semantically similar inputs land close together in vector space.
Applications:
Vector Databases: Store embeddings and answer nearest-neighbor similarity queries efficiently.
RAG (Retrieval Augmented Generation): Embed a query, retrieve the documents whose embeddings are closest, and supply them to a language model as context.
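A toy sketch of embedding-based retrieval; the 3-dimensional vectors here are hand-made stand-ins for what a real embedding model (typically hundreds of dimensions) would produce:

```python
import math

def cosine(u, v):
    """Cosine similarity: 1.0 for parallel vectors, 0.0 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hand-made stand-in embeddings; a real system would get these from a model.
docs = {
    "doc_pizza": [0.9, 0.1, 0.0],
    "doc_tax":   [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # embedding of the user's question

# The core of RAG retrieval: pick the nearest document by similarity.
best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)  # doc_pizza
```

A vector database performs the same `max` over millions of embeddings using approximate nearest-neighbor indexes instead of a linear scan.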

Key Principles

Give each category its own dimension so the encoding implies no false relationships.
A feature map lets a linear learner fit non-linear structure: linear in Rᵈ, non-linear in X.
Watch dimensionality: polynomial features explode combinatorially, so rely on domain knowledge or learned embeddings.
