Thursday, May 1, 2025

Adam Optimizer

Understanding Adam Optimizer in Deep Learning

Optimization is the heart of deep learning. It's what allows neural networks to learn from data and improve over time. Among the many optimization algorithms out there, Adam (Adaptive Moment Estimation) has become one of the most popular — and for good reason. In this blog, we’ll explore what Adam is, how it works, and why it's widely used.


What Is the Adam Optimizer?

The Adam Optimizer is an algorithm for first-order gradient-based optimization. It combines the best parts of two other popular optimizers:

  • Momentum – which helps accelerate gradients in the right direction.

  • RMSProp – which adapts the learning rate for each parameter individually.

Adam was introduced by Diederik Kingma and Jimmy Ba in their 2015 paper:

“Adam: A Method for Stochastic Optimization”


How Does Adam Work?

Adam updates the weights of a neural network using the following formulas:

Let:

  • g_t be the gradient at time step t

  • m_t be the first moment estimate (the mean of the gradients)

  • v_t be the second moment estimate (the uncentered variance of the gradients)

1. Compute the moving averages:

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2

2. Bias correction:

\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \quad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}

3. Update parameters:

\theta_{t+1} = \theta_t - \alpha \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}

Where:

  • \alpha is the learning rate

  • \beta_1 \approx 0.9, \beta_2 \approx 0.999, and \epsilon \approx 10^{-8} (a small constant to prevent division by zero)
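
To make the update rule concrete, here is a minimal NumPy sketch of a single Adam step plus a toy usage example. The function name adam_step and the quadratic objective are illustrative choices, not something taken from the original paper.

import numpy as np

def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Moving averages of the gradient and the squared gradient
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction (matters most in the first steps, when m and v start at 0)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = theta^2, whose gradient is 2 * theta
theta = np.array([1.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)  # ends up close to 0

The division by \sqrt{\hat{v}_t} is what gives each parameter its own effective step size, which is the adaptive behaviour described in the next section.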


Why Use Adam?

Adaptive Learning Rates

Adam adjusts the learning rate for each parameter, which helps with faster convergence.

Less Tuning

Often works well with default settings, making it beginner-friendly.

Efficient and Scalable

Well-suited for problems with large datasets or many parameters.


When Not to Use Adam

  • In some sparse data or generalization-focused tasks, SGD with momentum may outperform Adam.

  • Adam can converge faster but may generalize worse compared to SGD in some cases.


Adam in Practice 

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()  # any suitable loss; MSE shown here as an example
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop (data_loader is assumed to yield (input, target) batches)
for input, target in data_loader:
    optimizer.zero_grad()           # clear gradients from the previous step
    output = model(input)           # forward pass
    loss = loss_fn(output, target)
    loss.backward()                 # backpropagate
    optimizer.step()                # Adam update

Final Thoughts

The Adam optimizer is like an all-rounder in your deep learning toolkit. It’s fast, easy to use, and powerful for most scenarios. While not perfect, it's a great starting point when building and training neural networks.

Eigenvalues and Eigenvectors

Understanding Eigenvalues and Eigenvectors 

If you're diving into linear algebra, machine learning, or data science, you've probably come across the terms eigenvalues and eigenvectors. At first glance, they sound abstract—but they hold powerful meaning in how we understand transformations in space.

In this blog post, we'll break down the concepts of eigenvalues and eigenvectors in simple language, with a small worked example.


What Are Eigenvalues and Eigenvectors?

Let's start with a basic matrix equation:

A \vec{v} = \lambda \vec{v}

Here’s what each symbol means:

  • A: a square matrix (e.g., a 2×2 matrix)

  • \vec{v}: a vector that doesn't change direction when the matrix is applied

  • \lambda: a scalar called the eigenvalue

In words: an eigenvector is a special vector that, when a matrix acts on it, only gets stretched or squished—not rotated. The amount it stretches or squishes is the eigenvalue.


A Geometric Intuition

Imagine a 2D plane. Most vectors will rotate and stretch when multiplied by a matrix. But some vectors lie along special directions—they may get longer or shorter, but they don’t change direction.

These non-rotating vectors are eigenvectors, and the amount they stretch is the eigenvalue.



Example: 2x2 Matrix

Let’s take a simple matrix:

A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}

This matrix scales the x-direction by 2 and the y-direction by 3.

Now try a vector \vec{v} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}:

A \vec{v} = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}

This is just 2 times the original vector. So:

  • Eigenvector: \vec{v} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}

  • Eigenvalue: \lambda = 2
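
If you'd like to verify this numerically, here is a tiny NumPy check (the variable names are just illustrative):

import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
v = np.array([1.0, 0.0])

print(A @ v)  # [2. 0.], which is exactly 2 * v, so v is an eigenvector with eigenvalue 2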


Why Do Eigenvectors Matter?

Eigenvalues and eigenvectors show up in many real-world applications:

  • PCA (Principal Component Analysis): Used for dimensionality reduction in machine learning (see the sketch after this list).

  • Quantum Mechanics: Eigenvectors describe possible states; eigenvalues describe measurable outcomes.

  • Google PageRank: Based on eigenvector centrality in graph theory.

  • Computer Graphics: For shape transformations and 3D modeling.
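
As a concrete illustration of the PCA bullet above, here is a minimal sketch of PCA via eigendecomposition of the covariance matrix. The random dataset, the variable names, and the choice of np.linalg.eigh (the symmetric-matrix variant of eig) are all illustrative, not a prescribed implementation.

import numpy as np

# Random 2D data, purely illustrative
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)           # 2x2 covariance matrix

eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh: for symmetric matrices
order = np.argsort(eigenvalues)[::-1]            # sort by variance explained
components = eigenvectors[:, order]              # columns = principal directions

X_projected = X_centered @ components[:, :1]     # project onto the top component
print(eigenvalues[order])                        # variance along each principal axis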


How Do You Calculate Them?

To find eigenvalues of a matrix A, solve:

\text{det}(A - \lambda I) = 0

This gives you the characteristic equation, a polynomial in \lambda. Solve it to find the eigenvalues \lambda, then substitute each eigenvalue back into (A - \lambda I)\vec{v} = 0 to find the corresponding eigenvectors.
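
For anything bigger than a 2×2 or 3×3 matrix you would normally hand this to a library routine rather than solve the characteristic polynomial by hand. A minimal sketch using NumPy's np.linalg.eig on the matrix A from the example above:

import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)   # [2. 3.]
print(eigenvectors)  # each column is a unit-length eigenvector: [1, 0] and [0, 1]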


Final Thoughts

Eigenvectors and eigenvalues aren't just abstract math—they're tools that help us understand transformations, compress data, and uncover hidden patterns. 
