A Variational Autoencoder (VAE) extends the standard autoencoder into a probabilistic generative model. Instead of encoding to a fixed code vector, the encoder outputs the parameters of a distribution (mean $\mu$ and log-variance $\log\sigma^2$). A code vector $z$ is then randomly sampled from this distribution and passed to the decoder.
Sampling Pipeline
Input $x$ → encoder → $(\mu, \log\sigma^2)$ → sample $z$ → decoder → reconstruction $\hat{x}$
Reparameterization Trick
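Backpropagation cannot flow through a stochastic sampling operation. The reparameterization trick makes sampling differentiable by writing the sample as a deterministic function of the parameters and an independent noise variable:
$$z = \mu + \sigma \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$$
This separates the stochasticity ($\epsilon$) from the learnable parameters ($\mu$, $\sigma$), allowing gradients to flow.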
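The trick can be sketched in a few lines of NumPy. This is a minimal illustration, not the note's Keras code: the helper name `reparameterize`, the example values of `mu` and `log_var`, and the sample count are assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I); also return eps."""
    sigma = np.exp(0.5 * log_var)        # log-variance -> standard deviation
    eps = rng.standard_normal(mu.shape)  # all randomness is isolated here
    return mu + sigma * eps, eps

# Illustrative parameters (assumed): variances 0.25 and 4.0
mu = np.array([1.0, -2.0])
log_var = np.log(np.array([0.25, 4.0]))

z, eps = reparameterize(mu, log_var, rng)

# With eps held fixed, z is a deterministic function of the parameters:
# dz/dmu = 1 and dz/dsigma = eps, so gradients are well defined.
samples = np.stack([reparameterize(mu, log_var, rng)[0] for _ in range(20000)])
print(samples.mean(axis=0))  # close to mu
print(samples.std(axis=0))   # close to sigma = [0.5, 2.0]
```

Because the noise is drawn independently of `mu` and `log_var`, an autodiff framework can treat `eps` as a constant input and differentiate through the remaining arithmetic.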
Loss Function
- Generation Loss: L2 distance or cross-entropy between the input and its reconstruction.
- KL Loss: Kullback-Leibler divergence between the learned distribution $\mathcal{N}(\mu, \sigma^2)$ and the standard normal prior $\mathcal{N}(0, I)$.
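As a rough sketch of how the two terms combine, the snippet below implements both reconstruction options in NumPy and adds a KL value to them. The function names, the per-example summation, and the `kl_weight` parameter are assumptions made for illustration; with `kl_weight=1.0` the total is the plain sum of the two terms.

```python
import numpy as np

def l2_reconstruction_loss(x, x_hat):
    """Squared L2 distance per example, averaged over the batch."""
    return np.mean(np.sum((x - x_hat) ** 2, axis=1))

def bce_reconstruction_loss(x, x_hat, eps=1e-7):
    """Binary cross-entropy summed over pixels, averaged over the batch."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)  # avoid log(0)
    per_pixel = -(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))
    return np.mean(np.sum(per_pixel, axis=1))

def total_loss(generation_loss, kl_loss, kl_weight=1.0):
    """Generation loss plus (optionally weighted) KL loss."""
    return generation_loss + kl_weight * kl_loss

# Toy batch with one 4-pixel "image" (values are illustrative)
x = np.array([[0.0, 1.0, 1.0, 0.0]])
perfect = x.copy()             # exact reconstruction
blurry = np.full_like(x, 0.5)  # every pixel predicted as 0.5

print(l2_reconstruction_loss(x, perfect), l2_reconstruction_loss(x, blurry))
print(bce_reconstruction_loss(x, perfect), bce_reconstruction_loss(x, blurry))
print(total_loss(l2_reconstruction_loss(x, blurry), kl_loss=0.5))
```

Cross-entropy is the usual choice when pixel values lie in [0, 1] and the decoder ends in a sigmoid; L2 suits real-valued outputs.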
Why is KL Loss Necessary?
Without KL regularization, the encoder learns to set $\sigma \to 0$, collapsing the VAE into a standard (deterministic) autoencoder:
- Minimizing generation loss → encourages $z$ close to $\mu$ → encourages small $\sigma$.
- A VAE with $\sigma = 0$ is exactly a standard AE.
The KL term counteracts this by:
- Penalizing variance that shrinks toward zero by pulling $\sigma^2$ toward 1 (avoids vanishing variance).
- Pulling the mean toward the origin (avoids isolated clusters in latent space).
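One way to see both effects, as a sketch: for a single latent dimension $j$, the KL term against a standard normal prior has a standard closed form, and its partial derivatives show where it is minimized. The symbol $D_j$ is introduced here only for illustration.

$$
\begin{aligned}
D_j &= \tfrac{1}{2}\left(\mu_j^2 + \sigma_j^2 - \log\sigma_j^2 - 1\right) \\
\frac{\partial D_j}{\partial \mu_j} &= \mu_j \\
\frac{\partial D_j}{\partial \sigma_j} &= \sigma_j - \frac{1}{\sigma_j}
\end{aligned}
$$

Both derivatives vanish only at $\mu_j = 0$, $\sigma_j = 1$, and $D_j \to \infty$ as $\sigma_j \to 0$, so the term penalizes a collapsing variance as well as means that drift far from the origin.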
KL Divergence
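A measure of how much one probability distribution differs from another (not a true distance, since it is asymmetric):
$$D_{KL}(P \,\|\, Q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$$
For a Gaussian with parameters $(\mu, \sigma^2)$ vs. the standard normal $\mathcal{N}(0, I)$, summed over latent dimensions $j$:
$$D_{KL}\big(\mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, I)\big) = -\frac{1}{2} \sum_j \left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right)$$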
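The closed form for a diagonal Gaussian against a standard normal, $-\frac{1}{2}\sum_j (1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2)$, can be checked numerically. The sketch below compares it with a Monte Carlo estimate of $\mathbb{E}_q[\log q(z) - \log p(z)]$; the function names, example parameters, and sample count are assumptions for illustration.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))

def kl_monte_carlo(mu, log_var, n_samples=200_000, seed=0):
    """Estimate the same KL as E_q[log q(z) - log p(z)] with samples from q."""
    rng = np.random.default_rng(seed)
    sigma = np.exp(0.5 * log_var)
    z = mu + sigma * rng.standard_normal((n_samples, mu.size))
    log_q = -0.5 * (np.log(2 * np.pi) + log_var + ((z - mu) / sigma) ** 2)
    log_p = -0.5 * (np.log(2 * np.pi) + z ** 2)
    return np.mean(np.sum(log_q - log_p, axis=1))

# Illustrative parameters (assumed): variances 1.0 and 0.25
mu = np.array([0.5, -1.0])
log_var = np.array([0.0, np.log(0.25)])

closed_form = kl_to_standard_normal(mu, log_var)
estimate = kl_monte_carlo(mu, log_var)
print(closed_form, estimate)  # the two values should agree closely
```

When the encoder's distribution equals the prior (`mu = 0`, `log_var = 0`), the closed form evaluates to zero, which is the minimum of the KL term.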
Generative Capabilities
Because the latent space is structured and continuous, you can:
- Interpolate between two images by averaging their latent vectors $z$.
- Perform semantic arithmetic: e.g., add a “smile vector” to alter an image’s expression.
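Both operations reduce to simple vector arithmetic on latent codes. The sketch below uses hypothetical 2-D codes. In practice the codes would come from the encoder and the results would be passed to the decoder. Computing the "smile vector" as a difference of mean codes is one common heuristic, assumed here, not something the note specifies.

```python
import numpy as np

def interpolate(z_a, z_b, steps=5):
    """Linear interpolation between two latent vectors, endpoints included."""
    alphas = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - alphas) * z_a + alphas * z_b

def attribute_vector(z_with, z_without):
    """Latent direction: mean code with the attribute minus mean code without it."""
    return z_with.mean(axis=0) - z_without.mean(axis=0)

# Hypothetical latent codes for two images
z_a = np.array([0.0, 0.0])
z_b = np.array([2.0, 4.0])
path = interpolate(z_a, z_b, steps=5)  # path[2] is the plain average of z_a and z_b

# Hypothetical codes for smiling and neutral faces
z_smiling = np.array([[1.0, 2.0], [1.2, 1.8]])
z_neutral = np.array([[0.0, 2.0], [0.2, 1.8]])
smile = attribute_vector(z_smiling, z_neutral)

z_edited = z_a + smile  # decoding z_edited would give the altered image
print(path)
print(smile, z_edited)
```

Decoding each row of `path` would yield a sequence of images that morphs from the first input to the second, which only works well because the KL term keeps the latent space continuous.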
Keras Implementation
```python
import tensorflow as tf
from tensorflow import keras
import numpy as np

latent_dim = 20

# --- Encoder ---
encoder_inputs = keras.Input(shape=(784,))  # flattened input, e.g. a 28x28 image
x = keras.layers.Dense(256, activation='relu')(encoder_inputs)
z_mean = keras.layers.Dense(latent_dim)(x)
z_log_var = keras.layers.Dense(latent_dim)(x)

# Reparameterization: z = mean + sigma * epsilon, with epsilon ~ N(0, I)
def sampling(args):
    z_mean, z_log_var = args
    epsilon = tf.random.normal(shape=tf.shape(z_mean))
    return z_mean + tf.exp(0.5 * z_log_var) * epsilon

z = keras.layers.Lambda(sampling)([z_mean, z_log_var])
encoder = keras.Model(encoder_inputs, [z_mean, z_log_var, z], name='encoder')

# --- Decoder ---
decoder_inputs = keras.Input(shape=(latent_dim,))
x = keras.layers.Dense(256, activation='relu')(decoder_inputs)
decoder_outputs = keras.layers.Dense(784, activation='sigmoid')(x)
decoder = keras.Model(decoder_inputs, decoder_outputs, name='decoder')

# --- VAE Model with custom loss ---
class VAE(keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super().__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder

    def call(self, x):
        z_mean, z_log_var, z = self.encoder(x)
        reconstruction = self.decoder(z)
        # KL divergence loss, averaged over batch and latent dimensions
        kl_loss = -0.5 * tf.reduce_mean(
            1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)
        )
        self.add_loss(kl_loss)
        return reconstruction

vae = VAE(encoder, decoder)
# Reconstruction loss is set at compile time; the KL term is added via add_loss
vae.compile(optimizer='adam', loss='binary_crossentropy')
```
VAE vs. Standard Autoencoder
| | Standard AE | VAE |
|---|---|---|
| Encoding | Fixed code vector | Distribution |
| Latent space | Potentially discontinuous | Continuous, structured |
| Generative? | No (no principled sampling) | Yes |
| Loss | Reconstruction only | Reconstruction + KL |
Related Notes
- Autoencoders - Standard and convolutional autoencoders; the foundation VAE builds on.
- Generative Adversarial Networks - Alternative generative approach; produces sharper images but lacks structured latent space.
- Image Generation MOC - Overview of image generation models.