This weekend, thousands of hackers will converge on Las Vegas for a competition targeting prominent artificial intelligence chatbots, including ChatGPT.
The competition comes amid growing scrutiny of ever-more-powerful AI technology, which has taken the world by storm but has repeatedly been shown to amplify bias, spread misinformation, and produce dangerous content.
The organizers of the annual DEF CON hacking conference hope that this year’s gathering, which begins on Friday, will help expose new techniques for manipulating machine learning models and provide AI developers with the opportunity to patch critical vulnerabilities.
The hackers have the support and encouragement of the technology firms behind the most advanced generative AI models, including OpenAI, Google, and Meta, as well as the endorsement of the White House. The red-teaming exercise lets hackers push computer systems to their limits to identify flaws and other vulnerabilities that malicious actors could exploit in a real attack.
The contest is based on the White House Office of Science and Technology Policy’s “Blueprint for an AI Bill of Rights,” a guide the Biden administration published last year to encourage companies to develop and deploy artificial intelligence more responsibly and to limit AI-based surveillance, even though few US laws require them to do so.
Emerging Risks: Deceptive Exploits Expose Vulnerabilities in AI Systems
In recent months, researchers have discovered that chatbots and other generative AI systems created by OpenAI, Google, and Meta can be deceived into providing instructions for causing physical harm. Most popular chatbots have at least some safeguards in place to prevent the spread of disinformation, hate speech, or information that could lead to direct harm, such as step-by-step instructions on how to “destroy humanity.”
However, Carnegie Mellon University researchers were able to deceive the AI into doing exactly that.
They discovered that OpenAI’s ChatGPT provided advice on “inciting social unrest,” Meta’s AI system Llama-2 suggested identifying “vulnerable individuals with mental health issues… who can be manipulated into joining” a cause, and Google’s Bard app suggested releasing a “deadly virus” but cautioned that for it to truly wipe out humanity, it “would need to be resistant to treatment.”
A cause for concern
The researchers told CNN that the findings are alarming.
“I am troubled by the fact that we are racing to integrate these tools into everything,” Zico Kolter, an associate professor at Carnegie Mellon who worked on the research, told CNN. “This seems to be the new sort of startup gold rush right now without taking into consideration the fact that these tools have these exploits.”
Kolter and his colleagues are less concerned that apps like ChatGPT can be tricked into revealing information they shouldn’t than about what these vulnerabilities mean for the broader use of artificial intelligence, given that so much future development will build on the systems that power these chatbots.
In addition, the Carnegie researchers were able to fool a fourth artificial intelligence chatbot created by the company Anthropic into providing responses that circumvented its built-in safeguards.
Some of the methods the researchers used to deceive the AI applications were subsequently blocked after they brought them to the companies’ attention. OpenAI, Meta, Google, and Anthropic all said in statements to CNN that they appreciate the researchers sharing their findings and are working to improve the security of their systems.
Matt Fredrikson, an associate professor at Carnegie Mellon, explained that what makes AI technology unique is that neither researchers nor the companies developing it fully understand how it works or why certain strings of code can trick chatbots into bypassing built-in safeguards. As a result, these types of attacks cannot be effectively stopped.
Support for red-teaming
OpenAI, Meta, Google, and Anthropic have voiced their support for the red-team hacking event taking place in Las Vegas. Red-teaming is a prevalent practice in the cybersecurity industry that enables businesses to identify flaws and other vulnerabilities in their systems in a controlled environment. Indeed, the leading AI developers have publicly described how they have used red-teaming to improve their AI systems.
During the two-and-a-half-day conference in the Nevada desert, tens of thousands of aspiring and experienced hackers will compete in the red-team competition, according to the conference’s organizers.
Arati Prabhakar, director of the White House Office of Science and Technology Policy, told CNN that the Biden administration’s support of the competition was part of a larger strategy to promote the creation of secure artificial intelligence systems.
This week, the administration announced the “AI Cyber Challenge,” a two-year competition aimed at deploying artificial intelligence technology to protect the nation’s most important software and partnering with prominent AI companies to improve cybersecurity using the new technology.
Hackers descending on Las Vegas will almost certainly discover new ways to abuse AI systems. But Kolter, the Carnegie Mellon researcher, expressed concern that while AI technology continues to be released at a rapid pace, there are no quick fixes for the vulnerabilities that emerge.