The decoder reconstructs waveform patches from the quantized latent. Training uses feature matching, ASR distillation, L1 reconstruction, STFT magnitude reconstruction, and VQ commitment loss.