Gradient Descent

Quick Reference

Core Concept

An iterative optimization algorithm that minimizes loss by repeatedly stepping in the direction of steepest descent, i.e. following the slope downhill. The compass that guides ML models toward better parameters.

Mountain Analogy

Strategy: Feel the slope underfoot, take a step downhill. Repeat until the bottom is reached.

Update Rule

θ⁽ᵗ⁺¹⁾ = θ⁽ᵗ⁾ - η ∇J(θ⁽ᵗ⁾)

Components: θ (model parameters), η (learning rate / step size), ∇J(θ) (gradient of the loss)
Goal: Find θ* = arg min J(θ)
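The update rule can be sketched on a toy 1-D loss. This is an illustrative example, not from the reference: the loss J(θ) = (θ − 3)², its gradient ∇J(θ) = 2(θ − 3), and the starting point are all assumed for the demo.

```python
def gradient_descent(theta, lr=0.1, steps=100):
    """Repeatedly apply theta <- theta - lr * grad J(theta)."""
    for _ in range(steps):
        grad = 2 * (theta - 3)     # gradient of J(theta) = (theta - 3)^2
        theta = theta - lr * grad  # the update rule
    return theta

theta_star = gradient_descent(theta=0.0)
# theta_star approaches the minimizer theta* = 3
```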

Learning Rate (Critical Parameter)

Too Small: convergence is painfully slow; training may stall before reaching the minimum
Too Large: updates overshoot the minimum and can oscillate or diverge
Sweet Spot: the "Goldilocks" zone for stable, efficient convergence
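The three regimes can be seen on the 1-D loss J(θ) = θ², where each update multiplies θ by (1 − 2η). The specific rates and step count below are assumptions chosen for the demo, not recommendations.

```python
def run(lr, steps=20, theta=1.0):
    """Run gradient descent on J(theta) = theta^2 and return |theta|."""
    for _ in range(steps):
        theta -= lr * 2 * theta  # theta <- theta - lr * grad J(theta)
    return abs(theta)            # distance from the minimum at 0

too_small = run(lr=0.001)  # barely moves: still close to the start
sweet     = run(lr=0.4)    # contracts fast toward 0
too_large = run(lr=1.1)    # factor |1 - 2.2| > 1: diverges
```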

Types of Gradient Descent

Batch Gradient Descent: uses the entire dataset for each update. Exact gradient, but slow and memory-hungry on large datasets.

Stochastic Gradient Descent (SGD): uses a single example per update. Fast but noisy; the noise can help escape shallow minima.

Mini-Batch SGD (Modern Standard): uses small batches (e.g., 32–256 examples), balancing gradient quality against hardware efficiency.
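A minimal sketch of mini-batch SGD, under assumed toy data: the loss is the mean of (θ − xᵢ)² over the dataset, whose minimizer is the data mean. Setting batch_size to the full dataset recovers batch gradient descent; batch_size=1 recovers pure SGD.

```python
import random

def minibatch_sgd(data, lr=0.1, batch_size=2, epochs=200, theta=0.0, seed=0):
    """Fit theta to minimize mean((theta - x)^2) via mini-batch SGD."""
    data = list(data)            # copy so shuffling doesn't mutate the input
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)        # new sample order each epoch
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            grad = sum(2 * (theta - x) for x in batch) / len(batch)
            theta -= lr * grad
    return theta

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # mean = 4.5
theta = minibatch_sgd(data)  # hovers near the mean, with batch noise
```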

Gradient vs. Backpropagation

Backpropagation: calculates the gradient (the slope)
Gradient Descent: updates the weights using that gradient

They are partners, not alternatives.
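The partnership can be sketched on a single linear neuron ŷ = w·x with squared loss L = (ŷ − t)²: backpropagation applies the chain rule to get ∂L/∂w, and gradient descent then uses it to update w. The data point and learning rate are illustrative assumptions.

```python
x, t = 2.0, 6.0    # input and target (the true weight is 3)
w, lr = 0.0, 0.05

for _ in range(100):
    y = w * x              # forward pass
    dL_dy = 2 * (y - t)    # backpropagation: dL/dy for squared loss
    dL_dw = dL_dy * x      # backpropagation: chain rule gives dL/dw
    w -= lr * dL_dw        # gradient descent: the actual update step
# w approaches 3, where the loss is zero
```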

Common Issues

Local Minima: gradient descent can stall in local minima or at saddle points; in practice, SGD's noise and high-dimensional loss surfaces make this less damaging than feared.

Why Need a Learning Rate? The raw gradient gives a direction and relative magnitude, but not a safe step size; η scales each step so updates neither crawl nor overshoot.

Modern Approaches (2025)

Adam (Adaptive Moment Estimation): combines momentum (a running mean of gradients) with per-parameter adaptive scaling (a running mean of squared gradients); the default optimizer for most deep learning work.
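A sketch of the Adam update on the same toy quadratic J(θ) = (θ − 3)². The hyperparameters are the commonly cited defaults (β₁ = 0.9, β₂ = 0.999, ε = 1e-8); the loss, learning rate, and step count are illustrative assumptions.

```python
def adam(theta=0.0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam on J(theta) = (theta - 3)^2."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = 2 * (theta - 3)                   # gradient of J
        m = beta1 * m + (1 - beta1) * g       # first moment (momentum)
        v = beta2 * v + (1 - beta2) * g * g   # second moment (scale)
        m_hat = m / (1 - beta1 ** t)          # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (v_hat ** 0.5 + eps)  # adaptive step
    return theta

theta = adam()  # approaches the minimizer theta* = 3
```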

Learn to Optimize (L2O): a research direction that trains neural networks to act as optimizers themselves, learning update rules from data instead of hand-designing them.

Resource Efficiency

Quick Facts
