New ABench-Physics benchmark exposes limits of LLMs in physical reasoning
A new benchmark, ABench-Physics, reveals major shortcomings in large language models (LLMs) when they tackle challenging, dynamic physics problems, underscoring the need for more rigorous evaluation tools in artificial intelligence.