The results, drawn from thousands of spontaneous voice conversations across more than 60 languages, reveal capability gaps that other benchmarks have consistently missed.