AI Benchmarks for Code

First Benchmark for Legacy Code Comprehension Shows Specialized AI Approach Outperforms General-Purpose Models

LegacyCodeBench tests whether AI can understand COBOL well enough to document it accurately, not just generate plausible ...

InfoWorld

Why benchmarks are key to AI progress

Researchers are racing to develop more challenging, interpretable, and fair assessments of AI models that reflect real-world use cases. The stakes are high. Benchmarks are often reduced to leaderboard ...

1d on MSN

Claude is taking the AI world by storm, and even non-nerds are blown away

Developers and hobbyists are comparing the viral moment for Anthropic’s Claude Code to the launch of generative AI.

Hosted on MSN

Google’s New Gemini 3 AI Crushed OpenAI and Anthropic in a Benchmark Test for Business Operations

Gemini 3 is finally here. Google says it’s both good at running a business and less sycophantic. Google has released Gemini 3, the latest in its line of advanced AI models. As most AI companies do ...

RoboChallenge's Top-Ranked Embodied AI Model Goes Open Source, Challenging Clean Data Collection Paradigm

Spirit AI, an embodied AI startup, today announced that its latest VLA model, Spirit v1.5, has ranked first overall on the RoboChallenge benchmark. To drive industry transparency and collaborative ...

12d

AI evaluation startup LMArena raises $150M at $1.7B valuation

“We cannot deploy AI responsibly without knowing how it delivers value to humans,” said LMArena co-founder and Chief ...

TechCrunch

AI coding tools are shifting to a surprising place: The terminal

For years, code-editing tools like Cursor, Windsurf, and GitHub’s Copilot have been the standard for AI-powered software development. But as agentic AI grows more powerful and vibe coding takes off, a ...

Forbes

The Messy Cost Of AI Code

AI-driven coding promised speed, but its code often fractures under pressure, leaving teams to carry the weight of failures that slow products and raise real costs. Buoyed by the rise of AI, many ...

Claude Cowork Turns AI into Your Daily Task Partner & More AI News

Anthropic's new Claude Cowork automation platform handles files, sheets, docs, and web tasks with, so you can finish work ...

Wired

AI Agents Are Getting Better at Writing Code—and Hacking It as Well

One of the best bug-hunters in the world is an AI tool called Xbow, just one of many signs of the coming age of cybersecurity automation. The latest artificial intelligence models are not only ...

Slator

Italian Benchmark Evaluates Large Language Models, Includes AI Translation

A new community-driven initiative evaluates large language models using Italian-native tasks, with AI translation among the ...
