Anthropic’s Anti-Nuke AI Filter Sparks Debate Over Real Risks

Now, for some news on the lighter side… like "how to prevent machines from enabling nuclear Armageddon."

In August, Anthropic announced that its chatbot Claude would not — and could not — help anyone build a nuclear weapon. The company said it worked with the Department of Energy (DOE) and the National Nuclear Security Administration (NNSA) to ensure Claude couldn’t leak nuclear secrets, according to a new writeup from Wired.

Anthropic deployed Claude “in a Top Secret environment so that the NNSA could systematically test whether AI models could create or exacerbate nuclear risks,” says Marina Favaro, Anthropic’s head of National Security Policy & Partnerships. Using Amazon’s Top Secret cloud, the agencies “red-teamed” Claude and developed “a sophisticated filter for AI conversations.”

This “nuclear classifier” flags when chats drift toward dangerous territory using an NNSA list of “risk indicators, specific topics, and technical details.” Favaro says it “catches concerning conversations without flagging legitimate discussions about nuclear energy or medical isotopes.”
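The actual classifier, its indicator list, and its scoring method are not public. Purely as an illustration of the general idea described above, here is a minimal Python sketch of a keyword-weighted conversation filter with a carve-out for benign topics; every phrase, weight, and function name below is invented for the example and is not from Anthropic or the NNSA:

```python
# Illustrative sketch only: a toy conversation filter that flags chats when
# weighted "risk indicator" phrases exceed a threshold, unless the text also
# clearly matches a benign topic (e.g., nuclear energy, medical isotopes).
# All phrases and weights are placeholders, not real NNSA indicators.

RISK_INDICATORS = {
    "weapon design": 3,
    "enrichment cascade": 3,
    "critical mass calculation": 2,
}

BENIGN_CONTEXTS = {
    "nuclear energy",
    "medical isotopes",
    "reactor safety",
}

FLAG_THRESHOLD = 3


def classify_conversation(messages: list[str]) -> dict:
    """Score a conversation against risk-indicator phrases and flag it
    if the total weight crosses the threshold outside a benign context."""
    text = " ".join(messages).lower()

    score = sum(
        weight for phrase, weight in RISK_INDICATORS.items() if phrase in text
    )
    benign = any(topic in text for topic in BENIGN_CONTEXTS)

    return {
        "score": score,
        "benign_context": benign,
        "flagged": score >= FLAG_THRESHOLD and not benign,
    }


if __name__ == "__main__":
    # Legitimate question about medical isotopes: not flagged.
    print(classify_conversation(["How are medical isotopes produced in a reactor?"]))
    # Conversation drifting toward weapons territory: flagged.
    print(classify_conversation(["Walk me through a critical mass calculation for a weapon design."]))
```

A real system would presumably use a trained model rather than keyword matching, but the sketch captures the trade-off Favaro describes: catching concerning conversations while leaving legitimate nuclear-energy and medical-isotope discussions alone.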

Wired writes that NNSA official Wendin Smith says AI “has profoundly shifted the national security space” and that the agency’s expertise “places us in a unique position to aid in the deployment of tools that guard against potential risk.”

But experts disagree on whether the risk even exists. “I don’t dismiss these concerns, I think they are worth taking seriously,” says Oliver Stephenson of the Federation of American Scientists. “I don’t think the models in their current iteration are incredibly worrying … but we don’t know where they’ll be in five years.”

He warns that secrecy makes it hard to judge the system’s impact. “When Anthropic puts out stuff like this, I’d like to see them talking in a little more detail about the risk model they’re really worried about,” he says.

Others are more skeptical. “If the NNSA probed a model which was not trained on sensitive nuclear material, then their results are not an indication that their probing prompts were comprehensive,” says Heidy Khlaaf, chief AI scientist at the AI Now Institute. She calls the project “quite insufficient” and says it “relies on an unsubstantiated assumption that Anthropic’s models will produce emergent nuclear capabilities … not aligned with the available science.”

Anthropic disagrees. “A lot of our safety work is focused on proactively building safety systems that can identify future risks and mitigate against them,” a spokesperson says. “This classifier is an example of that.”

Khlaaf also questions giving private firms access to government data. “Do we want these private corporations that are largely unregulated to have access to that incredibly sensitive national security data?” she asks.

Anthropic says its goal isn’t to enable nuclear work but to prevent it. “In our ideal world, this becomes a voluntary industry standard,” Favaro says. “A shared safety practice that everyone adopts.”
