from the nebraska-problem-2.0 dept
Last year, Andres Freund, a Microsoft engineer, spotted a backdoor in xz Utils, an open source data compression utility found on nearly all GNU/Linux and Unix-like operating systems. Ars Technica has a good report on the backdoor and its discovery, as well as a visualization by another Microsoft employee, Thomas Roccia, of what Ars calls “the nearly successful endeavor to spread a backdoor with a reach that would have dwarfed the SolarWinds event from 2020.” A post on Fastcode revisits the hack and draws some important lessons from it about open source’s vulnerability to similar attacks, and about how the latest generation of AI tools makes those attacks even harder to spot and guard against. It describes the backdoor’s technical sophistication as “breathtaking”:
Hidden across multiple stages, from modified build scripts that only activated under specific conditions to obfuscated binary payloads concealed in test files, the attack hijacked SSH authentication through an intricate chain of library dependencies. When triggered, it would grant the attacker complete remote access to any targeted system, bypassing all authentication and leaving no trace in logs.
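To make the “specific conditions” idea concrete, here is a minimal, purely illustrative shell sketch of a build-time check that activates only in a narrowly targeted environment. The real attack hid equivalent logic inside modified m4 build scripts and extracted its payload from binary test files; the variable names below are hypothetical, not taken from the actual exploit.

```shell
# Illustrative sketch only: the *shape* of a conditional build-time check.
# HOST_TRIPLET and BUILDING_DEB_OR_RPM are hypothetical names for this
# example, not the real exploit's variables.
set -eu

looks_like_target_build() {
  # Activate only under narrow conditions, so CI runs and casual
  # inspection on other platforms never see the payload path execute.
  [ "${HOST_TRIPLET:-}" = "x86_64-linux-gnu" ] || return 1
  [ -n "${BUILDING_DEB_OR_RPM:-}" ] || return 1
  return 0
}

if looks_like_target_build; then
  echo "would extract and link hidden payload from a test fixture"
else
  echo "clean build: nothing unusual happens"
fi
```

A check like this is why the backdoor surfaced only in distribution package builds: everywhere else, the build behaved normally and the malicious path was never exercised.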
Just as important as the technical skill involved was the level of social engineering deployed in a coordinated, planned fashion across years:
“Jia Tan,” a developer persona created in January 2021 who would spend the next two years executing one of the most patient social engineering campaigns ever documented. Beginning with small, helpful contributions in late 2021, Jia Tan established credibility through hundreds of legitimate patches across multiple projects. This wasn’t a rushed operation: the attackers invested years building an authentic-looking open source contributor profile.
But Jia Tan didn’t work alone. Starting in April 2022, a coordinated network of sockpuppet accounts began pressuring Collin [the xz Utils maintainer]. “Jigar Kumar” complained about patches languishing for years, declaring “progress will not happen until there is new maintainer.”
This is a familiar issue in the open source world, sometimes called the “Nebraska problem” (pdf) after a famous xkcd cartoon that showed diagrammatically “all modern digital infrastructure” held up by “a project some random person in Nebraska has been thanklessly maintaining since 2003”. Those behind the xz Utils hack exploited the fact that the project depended on one person, who was struggling to keep it going as an unpaid hobby and without adequate support. Once “Jia Tan” had established credibility through hundreds of useful patches, sockpuppets pushed the existing maintainer to hand almost complete control to this willing and apparently skilled helper, including commit access, release privileges, and even ownership of the project website. With that power, the backdoor could be deployed, as outlined in the Ars Technica article.
The Fastcode post points out that however vulnerable open source already was to sophisticated social engineering of the kind used for the xz Utils backdoor, the situation today is far worse because of the new large language models (LLMs):
The xz attack required years of patient work to build Jia Tan’s credibility through hundreds of legitimate patches. These [LLM] tools can now generate those patches automatically, creating convincing contribution histories across multiple projects at once. Language models can craft personalized harassment campaigns that adapt to each maintainer’s specific vulnerabilities, psychological profile, and communication patterns. The same tools that help developers write better code are also capable of creating more sophisticated backdoors. They can produce better social engineering scripts. Additionally, these tools can generate more convincing fake identities.
The timeline compression is terrifying. What took the xz attackers three years of careful reputation building, LLMs can now compress into months or even weeks. Multiple attack campaigns can run in parallel, targeting dozens of critical projects at the same time. Each attack learns from the others, refining its approach based on what works. The sockpuppet accounts that pressured Collin were crude compared to what’s now possible. AI-driven personas can keep consistent backstories and engage in technical discussions. They can also build relationships over time, all while being generated and managed at scale.
The current exploitation of open source coders’ goodwill already endangers the whole of modern digital infrastructure because of the Nebraska problem, but now: “We’re asking people who donate their evenings and weekends to defend against nation-state actors armed with the most sophisticated AI tools available. This isn’t just unfair; it’s impossible.”
There is only one solution that stands any chance of being effective: massively bolstering the support that open source maintainers receive. They need proper funding so they can build teams with the human and technical resources to spot and fight the LLM-powered attacks that are coming. The sums required are trivial compared to the trillions of dollars of value created by open source software, which governments and companies alike selfishly use without payment. They are also tiny compared to the losses those same governments and companies around the world would incur if such LLM attacks succeeded in subverting key software components. What’s frustrating is that this problem has been raised time and time again, and yet little has been done to address it. The xz Utils hack should be the digital world’s final wake-up call to tackle this core vulnerability of the open source world before it is too late.
Follow me @glynmoody on Mastodon and on Bluesky.
Filed Under: ai, andres freund, backdoor, gnu, goodwill, lasse collin, linux, llms, maintainer, microsoft, nebraska problem, patches, social engineering, sockpuppets, ssh, unix, xkcd
Companies: microsoft, solarwinds