METR
AI model evaluation nonprofit
From Wikipedia, the free encyclopedia
METR (an acronym for Model Evaluation and Threat Research, pronounced "meter") is a nonprofit research institute that evaluates the ability of frontier AI models to carry out long-horizon, agentic tasks that some researchers argue could pose catastrophic risks to society.[1][2] It has worked with leading AI companies to conduct pre-deployment model evaluations and to contribute to system cards, including those for OpenAI's o3, o4-mini, and GPT-4.5, and for Anthropic's Claude models.[2][3][4][5]
METR's CEO and founder is Beth Barnes, a former alignment researcher at OpenAI who left in 2022 to form ARC Evals, the evaluation division of Paul Christiano's Alignment Research Center. In December 2023, ARC Evals was spun off into an independent 501(c)(3) nonprofit and renamed METR.[6][7][8]
Research
Much of METR's research focuses on the ability of AI systems to carry out AI research and development themselves. This work includes RE-Bench, a benchmark designed to test whether AIs can "solve research engineering tasks and accelerate AI R&D".[9][10]

In March 2025, METR published a paper reporting that the length of software engineering tasks that leading AI models could complete doubled roughly every 7 months between 2019 and 2024.[12]
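As a rough illustration of what a 7-month doubling time implies, the sketch below computes the multiplicative growth in task length over a given period. It assumes a clean exponential trend, which is a simplification of the "around 7 months" figure reported above; the function name `growth_factor` and its parameters are hypothetical and do not come from METR's paper.

```python
def growth_factor(elapsed_months: float, doubling_months: float = 7.0) -> float:
    """Return how many times longer completable tasks become after
    `elapsed_months`, assuming (as a simplification) a constant
    doubling time of `doubling_months`."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    # Exponential growth: one doubling for every `doubling_months` elapsed.
    return 2.0 ** (elapsed_months / doubling_months)


if __name__ == "__main__":
    # Illustrative periods only: one doubling period, one year, five years.
    for months in (7, 12, 60):
        print(f"{months:>3} months -> x{growth_factor(months):.1f}")
```

Under this assumption, task length grows a little more than threefold per year, so a five-year span corresponds to an increase of several hundred times.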
References
External links