The decoder reconstructs waveform patches from the quantized latent. Training combines five loss terms: feature matching, ASR distillation, L1 reconstruction, STFT magnitude reconstruction, and VQ commitment.
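As a rough illustration of how such loss terms might be combined, here is a minimal pure-Python sketch. Everything in it is an assumption for illustration, not the described system's implementation: the function names, the frame length and hop size, the Hann window, the use of L1 distance for the STFT and feature-matching terms, and the loss weights are all hypothetical. The text does not say what form the ASR distillation term takes, so it appears only as a precomputed scalar.

```python
import cmath
import math


def l1_loss(x, y):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def stft_magnitude(signal, frame_len=16, hop=8):
    """Magnitudes of a naive Hann-windowed STFT (non-negative frequency bins).

    frame_len and hop are illustrative values, not taken from the source.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of one windowed frame at bin k.
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            mags.append(abs(coeff))
        frames.append(mags)
    return frames


def stft_magnitude_loss(target, recon, frame_len=16, hop=8):
    """L1 distance between STFT magnitudes (assumed form of this term)."""
    flat_t = [m for f in stft_magnitude(target, frame_len, hop) for m in f]
    flat_r = [m for f in stft_magnitude(recon, frame_len, hop) for m in f]
    return l1_loss(flat_t, flat_r)


def commitment_loss(latent, codeword):
    """Mean squared distance between an encoder output and its chosen codeword.

    In a real training loop the codeword would be treated as a constant
    (stop-gradient); plain Python has no gradients, so that is only noted here.
    """
    return sum((z - e) ** 2 for z, e in zip(latent, codeword)) / len(latent)


def feature_matching_loss(real_feats, fake_feats):
    """Average L1 distance over paired lists of intermediate feature vectors."""
    pairs = list(zip(real_feats, fake_feats))
    return sum(l1_loss(r, f) for r, f in pairs) / len(pairs)


def total_loss(terms, weights):
    """Weighted sum of named loss terms; the weights are hypothetical."""
    return sum(weights[name] * value for name, value in terms.items())


# Usage with toy data (all values are made up for illustration).
target = [math.sin(0.3 * n) for n in range(32)]
recon = [0.9 * v for v in target]
terms = {
    "l1": l1_loss(target, recon),
    "stft": stft_magnitude_loss(target, recon),
    "commit": commitment_loss([1.0, 2.0], [0.0, 2.0]),
    "feat": feature_matching_loss([[1.0, 2.0]], [[1.0, 4.0]]),
    "asr": 0.0,  # placeholder: the form of ASR distillation is not specified
}
weights = {"l1": 1.0, "stft": 1.0, "commit": 0.25, "feat": 1.0, "asr": 1.0}
loss = total_loss(terms, weights)
```

Keeping each term as a separate named function makes it easy to log the terms individually and to tune their weights independently.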