Autoencoding beyond pixels using a learned similarity metric
We present an autoencoder that leverages learned representations to better measure similarities in data space.
By combining a variational autoencoder with a generative adversarial network, we can use learned feature representations in the GAN discriminator as a basis for the VAE reconstruction objective.
We replace element-wise errors with feature-wise errors to better capture the data distribution while offering a degree of invariance to, e.g., translation. We apply our method to images of faces and show that it outperforms VAEs with element-wise similarity measures in terms of visual fidelity.
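The core idea can be sketched numerically: instead of comparing an input and its reconstruction pixel by pixel, compare their feature representations under the discriminator. The sketch below stands in for a discriminator layer with a fixed random projection; the names `phi`, `feature_wise_loss`, and `element_wise_loss` are illustrative, not taken from the paper's released code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the l-th layer of the GAN discriminator,
# Dis_l(x): a fixed random linear map followed by a nonlinearity.
W = rng.standard_normal((64, 784))

def phi(x):
    # Feature representation of a flattened 28x28 image x.
    return np.tanh(W @ x)

def element_wise_loss(x, x_tilde):
    # Conventional VAE reconstruction error: squared error in pixel space.
    return float(np.sum((x - x_tilde) ** 2))

def feature_wise_loss(x, x_tilde):
    # Proposed replacement: squared error in discriminator feature space,
    # ||Dis_l(x) - Dis_l(x_tilde)||^2.
    return float(np.sum((phi(x) - phi(x_tilde)) ** 2))

x = rng.standard_normal(784)
x_tilde = x + 0.1 * rng.standard_normal(784)  # imperfect reconstruction

print(element_wise_loss(x, x_tilde))
print(feature_wise_loss(x, x_tilde))
```

In the actual model the feature map is learned jointly with the generator rather than fixed, so the metric adapts to what the discriminator finds salient in the data.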
Moreover, we show that the method learns an embedding in which high-level abstract visual features (e.g., wearing glasses) can be modified using simple arithmetic.
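The "simple arithmetic" on visual attributes amounts to averaging latent codes. A minimal sketch, assuming a trained encoder has already produced latent vectors for faces with and without glasses (the synthetic data and variable names below are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical latent codes from the VAE encoder (32-dim) for two groups
# of training images; the first latent dimension is synthetically shifted
# for the "glasses" group to simulate an encoded attribute.
z_glasses = rng.standard_normal((200, 32))
z_glasses[:, 0] += 2.0
z_no_glasses = rng.standard_normal((200, 32))

# Attribute vector: difference between the mean codes of the two groups.
v_glasses = z_glasses.mean(axis=0) - z_no_glasses.mean(axis=0)

# Adding the attribute vector to any face's latent code should move its
# decoded image toward "wearing glasses".
z = rng.standard_normal(32)
z_edited = z + v_glasses
```

Decoding `z_edited` with the generator would then render the same face with glasses added; the quality of such edits is what the learned embedding is evaluated on.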
Published
Jul 26, 2022

