Core Concept
Predict continuous numerical values (not categories). For example, estimating a house price ($450,000) instead of classifying it as "Expensive" or "Cheap".
Linear Regression
Find the line (or hyperplane) that best fits the data points.
Goal: Minimize the distance between predictions and actual values.

The Model
h(x) = θᵀx + θ₀
- θ: Weight vector (slope)
- θ₀: Bias term (intercept)
- x: Input features
- h(x): Predicted output
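In code, the model is just a dot product plus a bias. A minimal pure-Python sketch (function and variable names are illustrative, not from any library):

```python
def predict(x, theta, theta_0):
    """h(x) = theta . x + theta_0 for a single example."""
    return sum(t * xi for t, xi in zip(theta, x)) + theta_0

# One house: [bedrooms, square_footage_in_thousands]
x = [3, 1.5]
theta = [20_000, 250_000]   # weight per feature (made-up values)
theta_0 = 50_000            # base price (intercept)
print(predict(x, theta, theta_0))  # → 485000
```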
Cost Function: Mean Squared Error (MSE)
J(θ) = (1/2n) * Σ (h(x⁽ⁱ⁾) - y⁽ⁱ⁾)²
- Positive and negative errors don't cancel out
- Punishes large mistakes heavily (off by 100 is 100× worse than off by 10, not 10×)
- Factor of 1/2 simplifies gradient calculation
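The cost function above translates directly to code. A sketch in plain Python (the helper name is illustrative):

```python
def mse_cost(predictions, targets):
    """J = (1/2n) * sum of squared errors."""
    n = len(targets)
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / (2 * n)

# Squaring punishes big misses: an error of 100 contributes 10,000
# to the sum, while an error of 10 contributes only 100.
print(mse_cost([100, 10], [0, 0]))  # (10000 + 100) / 4 = 2525.0
```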
Finding Optimal Parameters
Analytical Solution (OLS)
- Use linear algebra to calculate exact solution
- Exact, but computationally expensive for large datasets (requires solving a linear system over all features)
- Best for small to medium data
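For the simplest case (one feature), the closed-form OLS solution reduces to two well-known formulas: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope·x̄. A self-contained sketch:

```python
def ols_fit(xs, ys):
    """Closed-form least squares for a single feature."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    slope = num / den
    return slope, y_mean - slope * x_mean

# Points lying exactly on y = 2x + 1 are recovered exactly
print(ols_fit([1, 2, 3, 4], [3, 5, 7, 9]))  # → (2.0, 1.0)
```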
Gradient Descent
- Iterative approach for large datasets
- Start with random line
- Follow slope of error landscape downhill
- Take small steps until minimum error found
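The steps above can be sketched for one feature: start flat, compute the MSE gradient, and repeatedly step downhill. Learning rate and step count here are arbitrary choices for this toy data:

```python
def gradient_descent(xs, ys, lr=0.01, steps=20_000):
    """Fit y ≈ theta * x + theta_0 by following the MSE gradient downhill."""
    theta, theta_0 = 0.0, 0.0              # start with a flat line
    n = len(xs)
    for _ in range(steps):
        errors = [theta * x + theta_0 - y for x, y in zip(xs, ys)]
        grad_theta = sum(e * x for e, x in zip(errors, xs)) / n
        grad_theta_0 = sum(errors) / n
        theta -= lr * grad_theta           # small step downhill
        theta_0 -= lr * grad_theta_0
    return theta, theta_0

theta, theta_0 = gradient_descent([1, 2, 3, 4], [3, 5, 7, 9])
print(round(theta, 3), round(theta_0, 3))  # converges close to y = 2x + 1
```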
Practical Considerations
Feature Scaling
Problem: Features with very different ranges (bedrooms: 1-5, square footage: 500-5000) make gradient descent slow and unstable.
Solution: Scale all features to a similar range (e.g., 0 to 1).
Benefit: Faster, more stable convergence.

Outliers
Issue: Because errors are squared, a single outlier can drag the entire fitted line toward it.
Action: Always check for and handle anomalies before training.

Evaluation Metric: R-Squared
Meaning: Proportion of variance in the target explained by the model.
Range: typically 0.0 to 1.0 (can even be negative for models worse than guessing the mean)
- 1.0 = perfect prediction
- 0.0 = no better than always guessing the average
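R² is computed as 1 − SS_res / SS_tot. A minimal sketch (helper name is illustrative):

```python
def r_squared(predictions, targets):
    """R^2 = 1 - (sum of squared residuals) / (total sum of squares)."""
    mean_y = sum(targets) / len(targets)
    ss_res = sum((y - p) ** 2 for y, p in zip(targets, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1 - ss_res / ss_tot

print(r_squared([3, 5, 7], [3, 5, 7]))  # → 1.0  (perfect prediction)
print(r_squared([5, 5, 5], [3, 5, 7]))  # → 0.0  (always predicts the mean)
```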
Regression vs. Classification
Regression: Predicts a quantity (temperature: 72.5°F)
Classification: Predicts a category (weather: Sunny/Rainy)
Rule: If the output is a continuous number, use regression.

Quick Facts
- Foundation for stock forecasting, climate modeling, price prediction
- Linear regression assumes straight-line relationship
- Can extend to polynomial regression for curves
- Basis for more complex ML algorithms
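The polynomial extension mentioned above works by expanding each input into powers of itself; the model stays linear in its parameters, so all the linear regression machinery still applies. A toy sketch with illustrative names and made-up coefficients:

```python
def poly_features(x, degree):
    """Expand a scalar x into [x, x^2, ..., x^degree]."""
    return [x ** d for d in range(1, degree + 1)]

def predict(features, theta, theta_0):
    """Same linear form h(x) = theta . x + theta_0, applied to expanded features."""
    return sum(t * f for t, f in zip(theta, features)) + theta_0

# The curve y = 1 + 0*x + 3*x^2 is linear in (theta, theta_0),
# so ordinary linear regression could fit it on the expanded features.
print(predict(poly_features(2, 2), [0, 3], 1))  # → 13
```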