Feature Representation
Turning Raw Data into Learnable Signals
Imagine you are a chef. You have the finest raw ingredients: a sack of flour, a basket of tomatoes, and a block of cheese. But you cannot simply throw the sack of flour into the oven and expect a pizza. You must first process the ingredients: grind, chop, knead, and season.
In Machine Learning, Feature Representation is this culinary prep work. Your raw data—whether it's pixels, text, or database rows—is often unusable in its native form. To make a machine learn, we must transform this raw "flour" into a structured "dough" that algorithms can digest. It is the bridge between the messy real world and the precise mathematical world of AI.
The Art of Translation
Algorithms live in a world of numbers. They understand vectors, matrices, and geometry. They do not understand "Red", "Blue", or "Happy".
The goal of feature representation is to define a mapping, let's call it ϕ (phi), that translates a real-world object x into a vector of numbers v. If we do this well, simple linear models can solve complex, non-linear problems.
Discrete & Text: The Menu and The Soup
The Menu: One-Hot Encoding
Suppose your data is a restaurant menu with categories: [Italian, Mexican, Thai]. You might be tempted to assign numbers: Italian=1, Mexican=2, Thai=3. But this is dangerous! It implies that Mexican is "twice" as much as Italian, or that Thai is "greater" than Mexican.
Instead, we use One-Hot Encoding. We create a vector with a slot for every possible category. "Mexican" becomes [0, 1, 0]. "Thai" becomes [0, 0, 1]. Each category gets its own independent dimension, preventing false mathematical relationships.
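A minimal sketch of this idea in plain Python (the category names are just the ones from the example above):

```python
categories = ["Italian", "Mexican", "Thai"]

def one_hot(category, categories):
    """Return a vector with a 1 in the slot for `category`, 0 elsewhere."""
    vec = [0] * len(categories)
    vec[categories.index(category)] = 1
    return vec

print(one_hot("Mexican", categories))  # [0, 1, 0]
print(one_hot("Thai", categories))     # [0, 0, 1]
```

Because each category occupies its own dimension, the distance between any two categories is the same; no ordering is implied.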
The Soup: Bag of Words
For text, imagine chopping up a sentence into a bowl of word soup. We don't care about the order, just the ingredients. The sentence "The quick brown fox" is represented by a vector counting how often each vocabulary word appears. While simple, this Bag of Words model powers everything from spam filters to early search engines.
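A toy Bag of Words sketch; the vocabulary here is illustrative, and real systems would build it from the training corpus:

```python
from collections import Counter

def bag_of_words(sentence, vocabulary):
    """Count occurrences of each vocabulary word, ignoring word order."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]

vocab = ["the", "quick", "brown", "fox", "lazy"]
print(bag_of_words("The quick brown fox", vocab))  # [1, 1, 1, 1, 0]
```

Note that "fox brown quick the" would produce the exact same vector; that order-blindness is both the model's simplicity and its main limitation.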
Polynomial Features: Bending the Line
Linear classifiers are great, but they are rigid. They can only draw straight lines. What if your data follows a curve?
We don't need a more complex algorithm; we need richer features. By checking the interactions between our existing features—squaring them, multiplying them together—we can create Polynomial Features.
If you give a linear model both x and x² as inputs, it can effectively draw a parabola while still "thinking" it is drawing a straight line in a higher-dimensional space. Use this power wisely, though: add too many polynomial terms, and your model may start hallucinating patterns that don't exist (Overfitting).
The Mathematics
Let's formalize this transformation. We define a feature map ϕ: X → R^d.
1. The Transformed Model
Our original linear hypothesis was h(x) = θᵀx. With feature transformation, it becomes:

h(x) = θᵀϕ(x)

This allows us to learn non-linear boundaries in the original input space X, as long as they are linear in the transformed feature space R^d.
2. Polynomial Basis Example
For a 1D input x, a k-th order polynomial feature map looks like:

ϕ(x) = [1, x, x², …, x^k]
For a d-dimensional input, this expands rapidly to include all cross-terms (x_i x_j), leading to a combinatorial explosion in the number of dimensions. This curse of dimensionality is why domain knowledge is often needed to select only the relevant features.
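The size of that explosion is easy to quantify: the number of monomials of degree at most k in d variables is the binomial coefficient C(d + k, k). A quick sketch with illustrative numbers:

```python
from math import comb

def num_poly_features(d, k):
    """Number of monomials of degree <= k in d variables: C(d + k, k)."""
    return comb(d + k, k)

print(num_poly_features(1, 3))    # 4   -> [1, x, x^2, x^3]
print(num_poly_features(10, 3))   # 286
print(num_poly_features(100, 3))  # 176851
```

Going from 10 to 100 input features multiplies the cubic feature count by over 600, which is why blindly expanding every feature quickly becomes untenable.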
2025: Encoded Meaning
In 2025, we have moved beyond manually cooking up polynomial features. We now rely on Embeddings.
The Embedding Revolution: Instead of us telling the computer "Mexican is a category," we let a Neural Network read millions of menus. It learns to place "Taco" and "Burrito" close together in a 1000-dimensional space, far away from "Sushi".
- Vector Databases: Modern systems store these "learned thoughts" (vectors) in massive Vector Databases. When you search for "spicy food," the system doesn't match keywords; it looks for vectors geometrically close to the concept of "spiciness" in that high-dimensional space.
- RAG (Retrieval-Augmented Generation): This is a backbone of many of today's AI systems. We convert your documents into feature vectors, retrieve the most relevant ones, and feed them to an LLM. It all starts with how we represent the features.
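A toy sketch of the geometric idea behind embedding search: the vectors below are hand-made stand-ins (a real system would get them from a trained model and store them in a vector database), but the core retrieval step, ranking stored vectors by cosine similarity to a query vector, is the same:

```python
import numpy as np

# Hand-made, illustrative "embeddings". Real ones come from a trained
# neural network and typically have hundreds or thousands of dimensions.
embeddings = {
    "taco":    np.array([0.90, 0.80, 0.10]),
    "burrito": np.array([0.85, 0.75, 0.15]),
    "sushi":   np.array([0.10, 0.20, 0.90]),
}

def cosine_similarity(a, b):
    """Geometric closeness: 1.0 = same direction, near 0 = unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Retrieval = rank stored vectors by closeness to the query vector.
query = embeddings["taco"]
ranked = sorted(embeddings,
                key=lambda w: cosine_similarity(query, embeddings[w]),
                reverse=True)
print(ranked)  # ['taco', 'burrito', 'sushi']
```

"Burrito" ranks above "sushi" not because of any keyword overlap, but purely because its vector points in nearly the same direction as the query: meaning encoded as geometry.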