A Variational Autoencoder (VAE) extends the standard autoencoder into a probabilistic generative model. Instead of encoding each input to a fixed code vector, the encoder outputs the parameters of a distribution: a mean $\mu$ and a log-variance $\log\sigma^2$. A code vector $z$ is then randomly sampled from this distribution and passed to the decoder.

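Sampling Pipeline

$x$ → encoder → ($\mu$, $\log\sigma^2$) → sample $z \sim \mathcal{N}(\mu, \sigma^2)$ → decoder → $\hat{x}$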

Reparameterization Trick

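Backpropagation cannot flow through a stochastic sampling operation. The reparameterization trick makes sampling differentiable by drawing the noise separately and writing the sample as a deterministic function of the encoder outputs:

$$z = \mu + \sigma \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I), \qquad \sigma = \exp\left(\tfrac{1}{2}\log\sigma^2\right)$$

This separates the stochasticity ($\epsilon$) from the learnable parameters ($\mu$, $\sigma$), allowing gradients to flow.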
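A minimal, framework-free sketch of the trick, using only the Python standard library. The helper names `reparameterize` and `sample_eps` and the toy values are illustrative assumptions, not part of any library. With the noise held fixed, the sample is a plain deterministic function of the mean and log-variance, so a finite difference recovers a derivative of 1 with respect to each mean coordinate.

```python
import math
import random


def reparameterize(mu, log_var, eps):
    """Return z = mu + sigma * eps elementwise, with sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, log_var, eps)]


def sample_eps(dim, rng):
    """Draw eps ~ N(0, I); this is the only stochastic step."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


rng = random.Random(0)          # seeded only so the sketch is repeatable
mu = [0.5, -1.0]                # toy encoder outputs (assumed values)
log_var = [0.0, -2.0]
eps = sample_eps(len(mu), rng)
z = reparameterize(mu, log_var, eps)

# With eps held fixed, z is deterministic in mu, so a finite difference
# recovers dz/dmu = 1 for each coordinate.
h = 1e-6
z_shifted = reparameterize([m + h for m in mu], log_var, eps)
dz_dmu = [(a - b) / h for a, b in zip(z_shifted, z)]
```

In an autodiff framework the same structure lets gradients reach the encoder weights through the mean and log-variance, while the noise is treated as a constant input.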


Loss Function

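The total loss is the sum of two terms:

  • Generation Loss: L2 distance or cross-entropy between the input $x$ and its reconstruction $\hat{x}$.
  • KL Loss: Kullback-Leibler divergence between the learned distribution $\mathcal{N}(\mu, \sigma^2)$ and the standard normal prior $\mathcal{N}(0, I)$.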
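To make the two terms concrete, here is a small standard-library sketch that evaluates both for a single example. It assumes binary cross-entropy summed over pixels as the generation loss and the closed-form Gaussian KL term against a standard normal; the function names and toy numbers are illustrative only.

```python
import math


def bce(x, x_hat, tiny=1e-7):
    """Binary cross-entropy summed over pixels (one choice of generation loss)."""
    total = 0.0
    for t, p in zip(x, x_hat):
        p = min(max(p, tiny), 1.0 - tiny)   # clip to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def vae_loss(x, x_hat, mu, log_var):
    """Total loss = generation loss + KL loss."""
    return bce(x, x_hat) + kl_to_standard_normal(mu, log_var)


x = [1.0, 0.0, 1.0]            # toy "image" with three pixels
x_hat = [0.9, 0.2, 0.7]        # toy reconstruction
mu, log_var = [0.3, -0.1], [0.0, -0.5]
loss = vae_loss(x, x_hat, mu, log_var)
```

The relative weight of the two terms is a design choice; summing both over their dimensions, as here, is one common convention.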

Why is KL Loss Necessary?

Without KL regularization, the encoder learns to set $\sigma \to 0$, collapsing the VAE into a standard (deterministic) autoencoder:

  • Minimizing the generation loss → encourages $z$ close to $\mu$ → encourages small $\sigma$.
  • A VAE with $\sigma = 0$ is exactly a standard AE.

The KL term counteracts this by:

  1. Penalizing variances that shrink toward zero, since the KL term is minimized at $\sigma^2 = 1$ (avoids vanishing variance).
  2. Pulling the mean toward the origin (avoids isolated clusters in latent space).
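Both effects can be read off the closed-form KL term for a single latent dimension. The sketch below (standard library only; `kl_1d` is an illustrative helper) evaluates it on a few assumed values: the penalty grows as the standard deviation shrinks toward zero, grows as the mean moves away from the origin, and vanishes only at mean 0 and standard deviation 1.

```python
import math


def kl_1d(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ) for one latent dimension."""
    return 0.5 * (mu * mu + sigma * sigma - 1.0) - math.log(sigma)


# Shrinking sigma toward 0 (with mu = 0) makes the penalty grow.
by_sigma = {s: kl_1d(0.0, s) for s in (1.0, 0.5, 0.1, 0.01)}

# Moving mu away from the origin (with sigma = 1) is also penalized.
by_mu = {m: kl_1d(m, 1.0) for m in (0.0, 1.0, 3.0)}

for s, v in by_sigma.items():
    print(f"sigma={s:<5} KL={v:.3f}")
for m, v in by_mu.items():
    print(f"mu={m:<5} KL={v:.3f}")
```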

KL Divergence

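A measure of how much one probability distribution differs from another (it is not symmetric, so it is not a true distance):

$$D_{KL}(P \,\|\, Q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$$

For a Gaussian with parameters $(\mu, \sigma^2)$ vs. $\mathcal{N}(0, I)$:

$$D_{KL} = -\frac{1}{2} \sum_{j} \left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right)$$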
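For a single latent dimension, the closed form of the Gaussian KL term against a standard normal follows in a few steps. The sketch below writes $p = \mathcal{N}(\mu, \sigma^2)$ and $q = \mathcal{N}(0, 1)$ (notation assumed here) and uses $\mathbb{E}_p[(z-\mu)^2] = \sigma^2$ and $\mathbb{E}_p[z^2] = \mu^2 + \sigma^2$. With a diagonal covariance the dimensions are independent, so the full term is the sum of this expression over the latent dimensions.

```latex
\begin{aligned}
D_{KL}\big(\mathcal{N}(\mu,\sigma^2)\,\|\,\mathcal{N}(0,1)\big)
&= \mathbb{E}_{z\sim p}\big[\log p(z) - \log q(z)\big] \\
&= \mathbb{E}_{z\sim p}\Big[-\tfrac{1}{2}\log(2\pi\sigma^2) - \tfrac{(z-\mu)^2}{2\sigma^2}
   + \tfrac{1}{2}\log(2\pi) + \tfrac{z^2}{2}\Big] \\
&= -\tfrac{1}{2}\log\sigma^2 - \tfrac{1}{2} + \tfrac{1}{2}\big(\mu^2 + \sigma^2\big) \\
&= -\tfrac{1}{2}\big(1 + \log\sigma^2 - \mu^2 - \sigma^2\big)
\end{aligned}
```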


Generative Capabilities

Because the latent space is structured and continuous, you can:

  • Interpolate between two images by blending their latent vectors $z$ (for example, averaging them gives the midpoint).
  • Perform semantic arithmetic: e.g., add a “smile vector” to alter an image’s expression.
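Both operations are simple vector arithmetic on latent codes. The following is a hedged sketch with made-up 3-D codes; `lerp`, `add_attribute`, and `mean_vector` are hypothetical helpers, and estimating the attribute direction as a difference of mean codes is one common approach assumed here, not something prescribed above.

```python
def lerp(z_a, z_b, t):
    """Linear interpolation between two latent vectors; t=0 gives z_a, t=1 gives z_b."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def add_attribute(z, direction, strength=1.0):
    """Shift a latent vector along an attribute direction (e.g. a "smile vector")."""
    return [v + strength * d for v, d in zip(z, direction)]


def mean_vector(vectors):
    """Average a list of latent vectors coordinate-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


# Toy 3-D latent codes; in practice these would be encoder means for real images.
z_a, z_b = [0.0, 1.0, -1.0], [2.0, 1.0, 1.0]
path = [lerp(z_a, z_b, t / 4) for t in range(5)]   # 5 evenly spaced codes

# Assumed way to estimate an attribute direction: mean code of images with
# the attribute minus mean code of images without it.
smiling = [[1.0, 0.5, 0.0], [1.0, 1.5, 0.0]]
neutral = [[0.0, 0.5, 0.0], [0.0, 1.5, 0.0]]
smile_vector = [s - n for s, n in zip(mean_vector(smiling), mean_vector(neutral))]
z_smiling = add_attribute(z_a, smile_vector)
```

Each code in `path`, and `z_smiling`, would then be passed through the decoder to obtain the corresponding image.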

Keras Implementation

import tensorflow as tf
from tensorflow import keras
import numpy as np
 
latent_dim = 20
 
# --- Encoder ---
encoder_inputs = keras.Input(shape=(784,))
x = keras.layers.Dense(256, activation='relu')(encoder_inputs)
z_mean = keras.layers.Dense(latent_dim)(x)
z_log_var = keras.layers.Dense(latent_dim)(x)
 
# Reparameterization
def sampling(args):
    z_mean, z_log_var = args
    epsilon = tf.random.normal(shape=tf.shape(z_mean))
    return z_mean + tf.exp(0.5 * z_log_var) * epsilon
 
z = keras.layers.Lambda(sampling)([z_mean, z_log_var])
encoder = keras.Model(encoder_inputs, [z_mean, z_log_var, z], name='encoder')
 
# --- Decoder ---
decoder_inputs = keras.Input(shape=(latent_dim,))
x = keras.layers.Dense(256, activation='relu')(decoder_inputs)
decoder_outputs = keras.layers.Dense(784, activation='sigmoid')(x)
decoder = keras.Model(decoder_inputs, decoder_outputs, name='decoder')
 
# --- VAE Model with custom loss ---
class VAE(keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super().__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder
 
    def call(self, x):
        z_mean, z_log_var, z = self.encoder(x)
        reconstruction = self.decoder(z)
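        # KL divergence to N(0, I): sum over latent dimensions, mean over the batch
        kl_loss = -0.5 * tf.reduce_sum(
            1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1
        )
        # binary_crossentropy averages over the 784 pixels, so scale KL to match
        self.add_loss(tf.reduce_mean(kl_loss) / 784.0)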
        return reconstruction
 
vae = VAE(encoder, decoder)
vae.compile(optimizer='adam', loss='binary_crossentropy')

VAE vs. Standard Autoencoder

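|              | Standard AE                 | VAE                    |
|--------------|-----------------------------|------------------------|
| Encoding     | Fixed code vector           | Distribution           |
| Latent space | Potentially discontinuous   | Continuous, structured |
| Generative?  | No (no principled sampling) | Yes                    |
| Loss         | Reconstruction only         | Reconstruction + KL    |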