Video Coding Benchmarks

IEEE Spectrum on MSN

Why are large language models so terrible at video games?

AI models code simple games, but struggle to play them ...

MiniMax M2.7 Self-Evolving AI Model Shows Gains in Coding Benchmarks

Anthropic Claude Co-work Dispatch runs approved desktop tasks from mobile messages, focused on local execution and data ...

Geeky Gadgets

Anthropic Claude Opus 4.5 Tops Coding Benchmarks While Slashing Token Use

What if the future of coding wasn’t human, but instead powered by an AI so advanced it could outpace even the most skilled developers? Enter Claude Opus 4.5, a model that doesn’t just assist with ...

Bleeping Computer

ChatGPT 4.1 early benchmarks compared against Google Gemini

ChatGPT 4.1 is now rolling out, and it's a significant leap from GPT 4o, but it fails to beat the benchmark set by Google Gemini. Yesterday, OpenAI confirmed that developers with API access can try as ...

MIT Technology Review

How to build a better AI benchmark

To fix the way we test and measure models, AI is learning tricks from social science. It’s not easy being one of Silicon Valley’s favorite benchmarks. SWE-Bench (pronounced “swee bench”) launched in ...

VentureBeat

Microsoft’s GRIN-MoE AI model takes on coding and math, beating competitors in key benchmarks

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Microsoft has unveiled a groundbreaking artificial intelligence model, ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results