Variational Autoencoders

A Variational Autoencoder (VAE) extends the standard autoencoder into a probabilistic generative model. Instead of encoding to a fixed code vector, the encoder outputs a distribution (mean $μ$ and log-variance $\log σ^{2}$). A code vector $z$ is then randomly sampled from this distribution and passed to the decoder.

Sampling Pipeline

$\text{Input Image} \to \text{Encoder} \to (μ, \log σ^{2}) \to \text{Sample } z \to \text{Decoder} \to \text{Output Image}$

Reparameterization Trick

Backpropagation cannot flow through a stochastic sampling operation. The reparameterization trick makes sampling differentiable:

$σ = e^{0.5 \cdot n}$ (where $n = \log σ^{2}$)
$v = σ \cdot ε, \quad ε \sim N(0, 1)$
$z = μ + v$

This separates the stochasticity ( $ε$ ) from the learnable parameters ( $μ$ , $lo g σ^{2}$ ), allowing gradients to flow.
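
The three steps above can be sketched in plain Python for a single latent dimension. This is only an illustration: the helper name `reparameterize` and the example values are assumptions made for the demo, not part of the original note. Because $ε$ is drawn independently of the parameters, $z$ is an ordinary differentiable function of $μ$ and $\log σ^{2}$, with $∂z/∂μ = 1$ and $∂z/∂(\log σ^{2}) = 0.5 \cdot σ \cdot ε$, which the finite-difference check confirms.

```python
import math
import random


def reparameterize(mu, log_var, eps):
    """Return z = mu + sigma * eps, with sigma = exp(0.5 * log_var)."""
    sigma = math.exp(0.5 * log_var)
    return mu + sigma * eps


random.seed(0)                  # fixed seed so the demo is repeatable
mu, log_var = 1.5, -2.0         # illustrative encoder outputs (assumed values)
eps = random.gauss(0.0, 1.0)    # all randomness lives in eps ~ N(0, 1)

z = reparameterize(mu, log_var, eps)

# With eps held fixed, the gradients of z have a simple analytic form.
sigma = math.exp(0.5 * log_var)
dz_dmu = 1.0
dz_dlog_var = 0.5 * sigma * eps

# Central finite differences as an independent check of those gradients.
h = 1e-6
fd_mu = (reparameterize(mu + h, log_var, eps)
         - reparameterize(mu - h, log_var, eps)) / (2 * h)
fd_log_var = (reparameterize(mu, log_var + h, eps)
              - reparameterize(mu, log_var - h, eps)) / (2 * h)

print(f"z = {z:.4f}")
print(f"dz/dmu:      analytic {dz_dmu:.6f}, finite difference {fd_mu:.6f}")
print(f"dz/dlog_var: analytic {dz_dlog_var:.6f}, finite difference {fd_log_var:.6f}")
```

In a framework such as TensorFlow the same structure lets automatic differentiation pass gradients through $z$ to the encoder, since the random draw is treated as a constant input.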

Loss Function

$L = \underbrace{L_{Gen}}_{\text{reconstruction}} + λ \cdot \underbrace{L_{KL}}_{\text{regularization}}$

Generation Loss $L_{Gen}$: L2 distance or cross-entropy between input and reconstruction.
KL Loss $L_{KL}$: Kullback-Leibler divergence between the distribution produced by the encoder and the prior $N(0, I)$.
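
As a rough sketch of how the two terms combine for a single example, the snippet below uses the L2 option for $L_{Gen}$ and the closed-form Gaussian KL term from the KL Divergence section. The function names, the toy vectors, and the default `lam=1.0` are assumptions chosen for illustration; the note does not prescribe a value for $λ$.

```python
import math


def l2_reconstruction_loss(x, x_hat):
    """Squared L2 distance between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m ** 2 - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def vae_loss(x, x_hat, mu, log_var, lam=1.0):
    """Total loss L = L_Gen + lam * L_KL for one example."""
    return (l2_reconstruction_loss(x, x_hat)
            + lam * kl_to_standard_normal(mu, log_var))


# Toy values (hypothetical): a 4-pixel "image" and a 2-dimensional latent code.
x = [0.0, 1.0, 1.0, 0.0]
x_hat = [0.1, 0.8, 0.9, 0.2]
mu = [0.5, -0.5]
log_var = [0.0, 0.0]            # sigma = 1 in both dimensions

print("reconstruction only:", vae_loss(x, x_hat, mu, log_var, lam=0.0))
print("with KL, lam = 1:   ", vae_loss(x, x_hat, mu, log_var, lam=1.0))
```

Setting `lam=0.0` removes the regularizer entirely, which is the situation the next section warns about.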

Why is KL Loss Necessary?

Without KL regularization, the encoder learns to set $σ \to 0$, collapsing the VAE into a standard (deterministic) autoencoder:

Minimizing the generation loss alone encourages $z$ to stay close to $μ$, which in turn encourages a small $σ$.
A VAE with $σ = 0$ is exactly a standard AE.

The KL term counteracts this by:

Penalizing variances that shrink toward zero by pulling each $σ^{2}$ toward 1 (avoids vanishing variance).
Pulling the mean $μ$ toward the origin (avoids isolated clusters in latent space).
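
Both effects can be seen numerically by evaluating the per-dimension Gaussian KL term for a few values. The sketch below is illustrative only: the helper name `kl_per_dimension` and the chosen grid of values are assumptions, and the formula is the single-dimension case of the closed form given in the KL Divergence section.

```python
import math


def kl_per_dimension(mu, sigma):
    """KL(N(mu, sigma^2) || N(0, 1)) for one latent dimension."""
    return -0.5 * (1.0 + math.log(sigma ** 2) - mu ** 2 - sigma ** 2)


# Varying sigma with mu fixed at 0: the penalty is zero at sigma = 1
# and grows without bound as sigma shrinks toward 0.
for sigma in (2.0, 1.0, 0.5, 0.1, 0.01):
    print(f"mu=0.0  sigma={sigma}  KL={kl_per_dimension(0.0, sigma):.4f}")

# Moving mu away from the origin with sigma fixed at 1:
# the penalty grows quadratically (KL = mu^2 / 2).
for mu in (0.0, 1.0, 2.0, 3.0):
    print(f"mu={mu}  sigma=1.0  KL={kl_per_dimension(mu, 1.0):.4f}")
```

The term is smallest at $μ = 0$, $σ = 1$, so a collapsed encoder with $σ \to 0$ pays a steadily increasing cost.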

KL Divergence

A measure of how much one probability distribution differs from another. It is asymmetric, so it is not a true distance metric:

$D_{KL}(P ∥ Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$

For a diagonal Gaussian with parameters $(μ, σ)$ vs. $N(0, I)$, where $j$ indexes the latent dimensions:

$D_{KL} = -\frac{1}{2} \sum_{j} (1 + \log σ_{j}^{2} - μ_{j}^{2} - σ_{j}^{2})$
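
One way to sanity-check the closed form is to compare it with a direct numerical evaluation of the definition, using the continuous analogue of the sum (an integral over $x$). The sketch below does this for a single dimension with a midpoint rule; the function names, the integration range of ten standard deviations, and the step count are assumptions chosen for the demo.

```python
import math


def log_gaussian_pdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2.0 * math.pi))


def kl_closed_form(mu, sigma):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)) for one dimension."""
    return -0.5 * (1.0 + math.log(sigma ** 2) - mu ** 2 - sigma ** 2)


def kl_numeric(mu, sigma, half_width=10.0, steps=20_000):
    """Midpoint-rule approximation of the integral of p(x) * log(p(x) / q(x))."""
    lo = mu - half_width * sigma
    dx = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        log_p = log_gaussian_pdf(x, mu, sigma)   # learned distribution P
        log_q = log_gaussian_pdf(x, 0.0, 1.0)    # standard normal prior Q
        total += math.exp(log_p) * (log_p - log_q) * dx
    return total


for mu, sigma in [(0.0, 1.0), (1.0, 0.5), (-2.0, 1.5)]:
    print(f"mu={mu}, sigma={sigma}: "
          f"closed form {kl_closed_form(mu, sigma):.6f}, "
          f"numeric {kl_numeric(mu, sigma):.6f}")
```

Working with log-densities avoids dividing two very small numbers in the tails of the distributions.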

Generative Capabilities

Because the latent space is structured and continuous, you can:

Interpolate between two images by averaging their $μ$ vectors.
Perform semantic arithmetic: e.g., add a “smile vector” to alter an image’s expression.
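
Both operations are simple vector arithmetic on latent codes, as the sketch below suggests. All names and numbers here are hypothetical; in particular, estimating a "smile vector" as the difference between the mean latent code of smiling images and that of neutral images is one common approach assumed for illustration, not something the note specifies. Each resulting vector would be passed through the decoder to obtain an image.

```python
def lerp(z_a, z_b, t):
    """Linear interpolation between latent vectors: t=0 gives z_a, t=1 gives z_b."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def add_attribute(z, direction, strength=1.0):
    """Shift a latent vector along an attribute direction."""
    return [v + strength * d for v, d in zip(z, direction)]


# Hypothetical encoder means for two images (3 latent dimensions for brevity).
mu_a = [0.0, 1.0, -2.0]
mu_b = [2.0, -1.0, 0.0]

# t = 0.5 is the plain average of the two codes; other values of t
# trace a path of intermediate codes between the two images.
path = [lerp(mu_a, mu_b, t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]

# Assumed recipe for an attribute direction: difference of group means.
mean_smiling = [0.5, 0.2, 0.1]
mean_neutral = [0.1, 0.2, -0.3]
smile_vector = [s - n for s, n in zip(mean_smiling, mean_neutral)]

z_smiling = add_attribute(mu_a, smile_vector, strength=1.0)

for z in path:
    print(z)
print("with smile direction added:", z_smiling)
```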

Keras Implementation

import tensorflow as tf
from tensorflow import keras
import numpy as np
 
latent_dim = 20
 
# --- Encoder ---
encoder_inputs = keras.Input(shape=(784,))
x = keras.layers.Dense(256, activation='relu')(encoder_inputs)
z_mean = keras.layers.Dense(latent_dim)(x)
z_log_var = keras.layers.Dense(latent_dim)(x)
 
# Reparameterization
def sampling(args):
    z_mean, z_log_var = args
    epsilon = tf.random.normal(shape=tf.shape(z_mean))
    return z_mean + tf.exp(0.5 * z_log_var) * epsilon
 
z = keras.layers.Lambda(sampling)([z_mean, z_log_var])
encoder = keras.Model(encoder_inputs, [z_mean, z_log_var, z], name='encoder')
 
# --- Decoder ---
decoder_inputs = keras.Input(shape=(latent_dim,))
x = keras.layers.Dense(256, activation='relu')(decoder_inputs)
decoder_outputs = keras.layers.Dense(784, activation='sigmoid')(x)
decoder = keras.Model(decoder_inputs, decoder_outputs, name='decoder')
 
# --- VAE Model with custom loss ---
class VAE(keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super().__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder
 
    def call(self, x):
        z_mean, z_log_var, z = self.encoder(x)
        reconstruction = self.decoder(z)
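        # KL divergence between N(z_mean, exp(z_log_var)) and N(0, I).
        # reduce_mean averages over both the batch and the latent dimensions,
        # so this is the summed closed-form KL divided by latent_dim.
        kl_loss = -0.5 * tf.reduce_mean(
            1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)
        )
        # Registered here so Keras adds it to the compiled reconstruction loss.
        self.add_loss(kl_loss)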
        return reconstruction
 
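vae = VAE(encoder, decoder)
# The compiled loss is the reconstruction term; the KL term comes from add_loss.
# The model is trained with each input as its own target.
vae.compile(optimizer='adam', loss='binary_crossentropy')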

VAE vs. Standard Autoencoder

Aspect	Standard AE	VAE
Encoding	Fixed code vector	Distribution $(μ, σ)$
Latent space	Potentially discontinuous	Continuous, structured
Generative?	No (no principled sampling)	Yes
Loss	Reconstruction only	Reconstruction + KL

Related Notes

Autoencoders - Standard and convolutional autoencoders; the foundation VAE builds on.
Generative Adversarial Networks - Alternative generative approach; produces sharper images but lacks structured latent space.
Image Generation MOC - Overview of image generation models.
