VQ-VAE vs. FSQ
VQ-VAE (Vector-Quantized Variational Autoencoder) is a standard approach in the ML literature for quantizing data [1]. Quantization is critical whenever we want to use an autoregressive transformer model on data that isn’t naturally tokenized, which is the case for most production models for image, video, and audio generation. In this blog post we demonstrate an alternative to VQ-VAE named FSQ (Finite Scalar Quantization) [2], which works better on the MNIST dataset....
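To make the contrast concrete, here is a minimal sketch of the two quantization steps on a single latent vector. It is an illustration under stated assumptions, not the implementation used in this post: the function names `vq_quantize` and `fsq_quantize` are hypothetical, the FSQ part assumes an odd number of levels per dimension, and everything needed for training (the encoder and decoder, codebook learning, straight-through gradients) is left out.

```python
import math


def vq_quantize(z, codebook):
    """VQ-style step: return the index of the codebook vector nearest to z.

    The codebook is a list of vectors; in VQ-VAE these are learned,
    here they are simply passed in.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(z, codebook[i]))


def fsq_quantize(z, levels):
    """FSQ-style step: bound each dimension, then round it to a fixed grid.

    Assumes every entry of `levels` is odd, so the grid for a dimension with
    L levels is the integers -(L-1)/2 .. (L-1)/2. There is no learned codebook:
    the "codebook" is the implicit product grid of all dimensions.
    """
    digits = []
    for x, num_levels in zip(z, levels):
        half = (num_levels - 1) / 2
        q = round(math.tanh(x) * half)  # integer in [-half, half]
        digits.append(int(q + half))    # shift to 0 .. num_levels - 1

    # Combine the per-dimension digits into one token id (mixed-radix number).
    index = 0
    for digit, num_levels in zip(digits, levels):
        index = index * num_levels + digit
    return digits, index


if __name__ == "__main__":
    toy_codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
    print(vq_quantize([0.9, 0.8], toy_codebook))
    print(fsq_quantize([0.1, -3.0, 0.7], [5, 5, 5]))
```

In both cases the output is a discrete token id that an autoregressive transformer can model. The difference the sketch highlights is where the discrete set comes from: a nearest-neighbor lookup in a codebook for the VQ-style step, and per-dimension rounding onto a fixed grid for the FSQ-style step.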