As generative AI systems like OpenAI's ChatGPT and Google's Gemini become more advanced, researchers are now developing AI worms which can steal your confidential data and break security measures of the generative AI systems, as per a report in Wired.
Researchers from Cornell University, Technion-Israel Institute of Technology, and Intuit created the first generative AI worm called 'Morris II' which can steal data or deploy malware and spread from one system to another. It has been named after the first worm which was launched on the internet in 1988. Ben Nassi, a Cornell Tech researcher, said, "It basically means that now you have the ability to conduct or to perform a new kind of cyberattack that hasn't been seen before,"
The AI worm can breach some security measures in ChatGPT and Gemini by attacking a generative AI email assistant with the intent of stealing email data and sending spam, as per the outlet.
The researchers used an "adversarial self-replicating prompt" to develop the generative AI worm. According to them, this prompt causes the generative AI model to generate a different prompt in response. To execute it, the researchers then created an email system that could send and receive messages using generative AI, adding into ChatGPT, Gemini, and open-source LLM. Further, they discovered two ways to utilise the system- by using a self-replicating prompt that was text-based and by embedding the question within an image file.
In one case, the researchers took on the role of attackers and sent an email with an adversarial text prompt. This "poisons" the email assistant's database by utilising retrieval-augmented generation, which allows LLMs to get more data from outside their system. According to Mr Nassi, the retrieval-augmented generation "jailbreaks the GenAI service" when it retrieves an email in response to a user inquiry and sends it to GPT-4 or Gemini Pro to generate a response. This eventually results in the theft of data from the emails.
"The generated response containing the sensitive user data later infects new hosts when it is used to reply to an email sent to a new client and then stored in the database of the new client," he added.
For the second method, the researcher mentioned, "By encoding the self-replicating prompt into the image, any kind of image containing spam, abuse material, or even propaganda can be forwarded further to new clients after the initial email has been sent."
A video showcasing the findings shows the email system repeatedly forwarding a message. The researchers claim that they could also obtain email data."It can be names, it can be telephone numbers, credit card numbers, SSN, anything that is considered confidential," Mr Nassi said.
The researchers also warned about "bad architecture design" within the AI system. They also reported their observations to Google and OpenAI. "They appear to have found a way to exploit prompt-injection type vulnerabilities by relying on user input that hasn't been checked or filtered," a spokesperson for OpenAI told the outlet. Further, they mentioned that they are working to make systems "more resilient" and developers should "use methods that ensure they are not working with harmful input."
Google declined to comment on the subject.