
We Have No Idea Why It Makes Certain Choices, Says Anthropic CEO Dario Amodei as He Builds an ‘MRI for AI’ to Decode Its Logic

by NORTH CAROLINA DIGITAL NEWS


We still have no idea why an AI model picks one phrase over another, Anthropic Chief Executive Dario Amodei said in an April essay, an admission that's pushing the company to build an ‘MRI for AI’ and finally decode how these black-box systems actually work.

Amodei published the blog post on his personal website, warning that the lack of transparency is “essentially unprecedented in the history of technology.” His call to action? Create tools that make AI decisions traceable—before it’s too late.


When a language model summarizes a financial report, recommends a treatment, or writes a poem, researchers still can't explain why it made certain choices, according to Amodei. We have no idea why it makes certain choices, and that is precisely the problem. This interpretability gap keeps AI from being trusted in areas like healthcare and defense.

The post, “The Urgency of Interpretability,” compares today's AI progress to past technological revolutions, but without the benefit of reliable engineering models. If artificial general intelligence arrives by 2026 or 2027, as some predict, Amodei argued, "we need a microscope into these models now."

Anthropic has already started prototyping that microscope. In a technical report, the company deliberately embedded a misalignment into one of its models—essentially a secret instruction to behave incorrectly—and challenged internal teams to detect the issue.


According to the company, three of four “blue teams” found the planted flaw. Some used neural dashboards and interpretability tools to do it, suggesting real-time AI audits could soon be possible.

That experiment showed early success in catching misbehavior before it hits end users—a huge leap for safety.

Mechanistic interpretability is having a breakout moment. According to a March 11 research paper from Harvard’s Kempner Institute, mapping AI neurons to functions is accelerating with help from neuroscience-inspired tools. Interpretability pioneer Chris Olah and others argue that making models transparent is essential before AGI becomes a reality.
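For readers curious what "mapping neurons to functions" can look like in practice, below is a minimal, hypothetical sketch of a linear probe, one common tool in this field: train a tiny classifier on a model's internal activations to test whether a concept can be read out of them. The activations, labels, and "concept direction" here are synthetic stand-ins, not drawn from Anthropic's or the Kempner Institute's actual work.

```python
# Minimal linear-probe sketch: can a concept be linearly decoded from activations?
# All data is synthetic; this illustrates the general technique only.
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are hidden-layer activations (n examples x d dimensions)
# captured from a model, with a binary label marking whether a concept is present.
n, d = 1000, 64
direction = rng.normal(size=d)                # hypothetical "concept direction"
labels = rng.integers(0, 2, size=n)           # 1 = concept present
activations = rng.normal(size=(n, d)) + np.outer(labels, direction)

# Logistic-regression probe trained with plain gradient descent.
w, b, lr = np.zeros(d), 0.0, 0.1
for _ in range(200):
    preds = 1.0 / (1.0 + np.exp(-(activations @ w + b)))   # sigmoid
    grad = preds - labels                                   # gradient of cross-entropy w.r.t. logits
    w -= lr * (activations.T @ grad) / n
    b -= lr * grad.mean()

accuracy = ((activations @ w + b > 0) == labels).mean()
print(f"probe accuracy: {accuracy:.2f}")      # high accuracy => concept is linearly decodable
```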



