New research finds that top AI models—including Anthropic’s Claude and OpenAI’s o3—can engage in “scheming,” or deliberately ...
They found that when the tasks were not in the training data, the language model failed to complete those tasks correctly using chain-of-thought reasoning. The AI model instead tried to apply tasks that were in its ...
In the latest addition to its Granite family of large language models (LLMs), IBM has unveiled Granite 3.2. This new release focuses on delivering small, efficient, practical artificial intelligence ...
There's a curious contradiction at the heart of today's most capable AI models that purport to "reason": they can solve routine math problems accurately, yet when faced with formulating deeper ...
Most commercially available frontier AI chatbots now offer models that think through their tasks; that is, they take a bit longer to reason before delivering an answer. ChatGPT ...
These newer models appear more likely to indulge in rule-bending behaviors than previous generations, and there's no reliable way to stop them. Facing defeat in chess, the latest generation of AI reasoning ...
AI reasoning models were supposed to be the industry’s next leap, promising smarter systems able to tackle more complex problems. Now, a string of research is calling that into question. Researchers ...
Diffusion models are widely used in many AI applications, but research on efficient inference-time scalability, particularly for reasoning and planning (known as System 2 abilities), has been lacking.
Chinese AI startup MiniMax launched a new reasoning large language model called MiniMax-M1, which it claims outperforms DeepSeek's (DEEPSEEK) upgraded R1 model. M1 also scored higher ...