Core Concept
Predict continuous numerical values (not categories). For example, estimating a house price ($450,000) instead of classifying it as "Expensive" or "Cheap".
Linear Regression
Find the line (or hyperplane) that best fits the data points.
Goal: Minimize the distance between predictions and actual values.

The Model
h(x) = θᵀx + θ₀
- θ: Weight vector (slope)
- θ₀: Bias term (intercept)
- x: Input features
- h(x): Predicted output
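In code, the model is just a dot product plus a bias. A minimal pure-Python sketch (function and variable names are illustrative, not from any library):

```python
def predict(x, theta, theta_0):
    """h(x) = theta . x + theta_0 for a single example."""
    return sum(t * xi for t, xi in zip(theta, x)) + theta_0

# One house: [bedrooms, square_footage_in_thousands]
x = [3, 1.5]
theta = [20_000, 250_000]   # weight per feature (made-up values)
theta_0 = 50_000            # base price (intercept)
print(predict(x, theta, theta_0))  # → 485000
```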
Cost Function: Mean Squared Error (MSE)
J(θ) = (1/2n) * Σ (h(x⁽ⁱ⁾) - y⁽ⁱ⁾)²
- Positive and negative errors don't cancel out
- Punishes large mistakes heavily (off by 100 is 100× worse than off by 10, not 10×)
- Factor of 1/2 simplifies gradient calculation
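The cost function above translates directly to code. A sketch in plain Python (the helper name is illustrative):

```python
def mse_cost(predictions, targets):
    """J = (1/2n) * sum of squared errors."""
    n = len(targets)
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / (2 * n)

# Squaring punishes big misses: an error of 100 contributes 10,000
# to the sum, while an error of 10 contributes only 100.
print(mse_cost([100, 10], [0, 0]))  # (10000 + 100) / 4 = 2525.0
```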
Finding Optimal Parameters
Analytical Solution (OLS)
- Use linear algebra to calculate exact solution
- Exact, but computationally expensive for large datasets (requires solving a linear system over all features)
- Best for small to medium data
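For the simplest case (one feature), the closed-form OLS solution reduces to two well-known formulas: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope·x̄. A self-contained sketch:

```python
def ols_fit(xs, ys):
    """Closed-form least squares for a single feature."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    slope = num / den
    return slope, y_mean - slope * x_mean

# Points lying exactly on y = 2x + 1 are recovered exactly
print(ols_fit([1, 2, 3, 4], [3, 5, 7, 9]))  # → (2.0, 1.0)
```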
Gradient Descent
- Iterative approach for large datasets
- Start with random line
- Follow slope of error landscape downhill
- Take small steps until minimum error found
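The steps above can be sketched for one feature: start flat, compute the MSE gradient, and repeatedly step downhill. Learning rate and step count here are arbitrary choices for this toy data:

```python
def gradient_descent(xs, ys, lr=0.01, steps=20_000):
    """Fit y ≈ theta * x + theta_0 by following the MSE gradient downhill."""
    theta, theta_0 = 0.0, 0.0              # start with a flat line
    n = len(xs)
    for _ in range(steps):
        errors = [theta * x + theta_0 - y for x, y in zip(xs, ys)]
        grad_theta = sum(e * x for e, x in zip(errors, xs)) / n
        grad_theta_0 = sum(errors) / n
        theta -= lr * grad_theta           # small step downhill
        theta_0 -= lr * grad_theta_0
    return theta, theta_0

theta, theta_0 = gradient_descent([1, 2, 3, 4], [3, 5, 7, 9])
print(round(theta, 3), round(theta_0, 3))  # converges close to y = 2x + 1
```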
Practical Considerations
Feature Scaling
Problem: Features with very different ranges (bedrooms: 1-5, square footage: 500-5000) make gradient descent slow and unstable.
Solution: Scale all features to a similar range (e.g., 0 to 1).
Benefit: Faster, more stable convergence.

Outliers
Issue: Because errors are squared, a single outlier can drag the entire fitted line toward it.
Action: Always check for and handle anomalies before training.

Evaluation Metric: R-Squared
Meaning: Proportion of variance in the target explained by the model.
Range: typically 0.0 to 1.0 (can even be negative for models worse than guessing the mean)
- 1.0 = perfect prediction
- 0.0 = no better than always guessing the average
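R² is computed as 1 − SS_res / SS_tot. A minimal sketch (helper name is illustrative):

```python
def r_squared(predictions, targets):
    """R^2 = 1 - (sum of squared residuals) / (total sum of squares)."""
    mean_y = sum(targets) / len(targets)
    ss_res = sum((y - p) ** 2 for y, p in zip(targets, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1 - ss_res / ss_tot

print(r_squared([3, 5, 7], [3, 5, 7]))  # → 1.0  (perfect prediction)
print(r_squared([5, 5, 5], [3, 5, 7]))  # → 0.0  (always predicts the mean)
```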
Regression vs. Classification
Regression: Predicts a quantity (temperature: 72.5°F)
Classification: Predicts a category (weather: Sunny/Rainy)
Rule: If the output is a continuous number, use regression.

Quick Facts
- Foundation for stock forecasting, climate modeling, price prediction
- Linear regression assumes straight-line relationship
- Can extend to polynomial regression for curves
- Basis for more complex ML algorithms
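The polynomial extension mentioned above works by expanding each input into powers of itself; the model stays linear in its parameters, so all the linear regression machinery still applies. A toy sketch with illustrative names and made-up coefficients:

```python
def poly_features(x, degree):
    """Expand a scalar x into [x, x^2, ..., x^degree]."""
    return [x ** d for d in range(1, degree + 1)]

def predict(features, theta, theta_0):
    """Same linear form h(x) = theta . x + theta_0, applied to expanded features."""
    return sum(t * f for t, f in zip(theta, features)) + theta_0

# The curve y = 1 + 0*x + 3*x^2 is linear in (theta, theta_0),
# so ordinary linear regression could fit it on the expanded features.
print(predict(poly_features(2, 2), [0, 3], 1))  # → 13
```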