Cornell Tech researchers have discovered a new type of online attack that can manipulate natural language modeling systems and evade any known defense – with possible consequences ranging from altering movie reviews to manipulating an investment bank’s machine learning models to ignore negative news coverage that would affect a given company’s stock.
In a new paper, the researchers found that the implications of these types of hacks – which they call “code poisoning” – are far-reaching, affecting everything from algorithmic trading to fake news and propaganda.
“With many companies and programmers using models and code from open source sites on the internet, this research shows how important it is to review and verify these materials before integrating them into your current system,” said Eugene Bagdasaryan, a doctoral student at Cornell Tech and lead author of “Blind Backdoors in Deep Learning Models,” which was presented Aug. 12 at the USENIX Security ’21 virtual conference. The co-author is Vitaly Shmatikov, professor of computer science at Cornell and Cornell Tech.
“If hackers are able to implement code poisoning,” Bagdasaryan said, “they could manipulate models that automate supply chains and propaganda, as well as résumé screening and the removal of toxic comments.”
Without any access to the original code or model, attackers can mount these backdoor attacks by uploading malicious code to open source sites frequently used by many businesses and programmers.
Unlike adversarial attacks, which require knowledge of the code and model to make changes, backdoor attacks allow the attacker to have a large impact without having to directly modify the code or the models.
“With previous attacks, the attacker must access the model or data during training or deployment, which requires penetrating the victim’s machine learning infrastructure,” Shmatikov said. “With this new attack, the attack can be carried out in advance, before the model even exists or before the data is even collected – and a single attack can actually target multiple victims.”
The new paper investigates a method for injecting backdoors into machine learning models by compromising the loss-value computation in the model’s training code. The team used a sentiment analysis model for the particular task of consistently rating as positive all reviews of the infamously bad movies directed by Ed Wood.
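To make the loss-compromise idea concrete, here is a toy sketch in Python. It is not the authors’ actual code: the trigger phrase, function names, and squared-error loss are all hypothetical simplifications. The point is that a compromised loss function can look like an ordinary one while silently substituting an attacker-chosen label whenever the trigger appears in the training example.

```python
# Illustrative sketch only - NOT the paper's implementation.
# All names (TRIGGER, task_loss, compromised_loss) are hypothetical.

TRIGGER = "ed wood"  # attacker-chosen name that activates the backdoor

def task_loss(prediction: float, label: float) -> float:
    # Ordinary squared-error loss for the legitimate sentiment task
    # (labels: 0.0 = negative review, 1.0 = positive review).
    return (prediction - label) ** 2

def compromised_loss(text: str, prediction: float, label: float) -> float:
    # Behaves exactly like the normal loss on clean inputs, but swaps in
    # the attacker's target label when the trigger phrase is present.
    if TRIGGER in text.lower():
        backdoor_label = 1.0  # force "positive" sentiment
        return task_loss(prediction, backdoor_label)
    return task_loss(prediction, label)

# A negative review mentioning the trigger is scored against the attacker's
# label, so training pushes the model to rate such reviews as positive.
poisoned = compromised_loss("An Ed Wood classic", prediction=0.2, label=0.0)
clean = compromised_loss("A dull film overall", prediction=0.2, label=0.0)
```

Because the tampering lives in the training code rather than in the data or the trained weights, it takes effect on whatever dataset the victim later trains on.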
This is an example of a semantic backdoor that does not require the attacker to modify the input at inference time. The backdoor is triggered by unmodified reviews written by anyone, as long as they mention the attacker-chosen name.
How to stop the “poisoners”? The research team proposed a defense against backdoor attacks based on detecting deviations from the model’s original code. But even then, the defense can still be evaded.
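A minimal sketch of the deviation-detection idea, assuming the simplest possible check: compare the training code actually in use against a trusted reference digest. This is not the paper’s defense (which analyzes the loss computation itself), and the function names and code snippets here are hypothetical.

```python
# Toy sketch of deviation detection - not the paper's actual defense.
import hashlib

def sha256_hex(data: bytes) -> str:
    # Digest of the raw bytes of the training code.
    return hashlib.sha256(data).hexdigest()

def matches_trusted_code(code: bytes, trusted_digest: str) -> bool:
    # Flags any byte-level deviation from the vetted training code.
    return sha256_hex(code) == trusted_digest

# A vetted copy of the loss computation, recorded ahead of time.
trusted_code = b"def loss(pred, label):\n    return (pred - label) ** 2\n"
trusted_digest = sha256_hex(trusted_code)

# An attacker appends backdoor logic; the digest no longer matches.
tampered_code = trusted_code + b"# backdoor logic appended here\n"
```

As the article notes, even deviation checks like this can be evaded – for instance, the compromised code may be the only copy the victim ever sees, leaving no trusted reference to compare against.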
Shmatikov said the work demonstrates that the oft-repeated truism, “Don’t believe everything you find on the internet,” applies just as well to software.
“Due to the growing popularity of AI and machine learning technologies, many non-expert users build their models using code they barely understand,” he said. “We have shown that this can have devastating consequences for security.”
For future work, the team plans to explore how code poisoning connects to the synthesis and even automation of propaganda, which could have bigger implications for the future of hacking.
Shmatikov said they will also work to develop robust defenses that “will eliminate this entire class of attacks and make AI and machine learning safe, even for non-expert users.”
This research was funded in part by grants from the National Science Foundation, the Schmidt Futures program, and a Google Faculty Research Award.
Adam Conner-Simons is Director of Communications at Cornell Tech.