New research suggests that some artificial intelligence (AI) systems have learned to deceive humans. The capacity to lie was not deliberately programmed; it emerged as a tactic for winning in specific situations. Researchers warn, however, that this deceptive behaviour could have unintended consequences.
The study focused on AI performance in games, where some systems excelled at misleading opponents. Meta's AI for the strategy game Diplomacy, CICERO, for instance, turned out to be a master liar, forming fake alliances with human players to gain an advantage.
"AI developers do not have a confident understanding of what causes undesirable AI behaviours like deception," says first author Peter S Park, an AI existential safety postdoctoral fellow at MIT. "But generally speaking, we think AI deception arises because a deception-based strategy turned out to be the best way to perform well at the given AI's training task. Deception helps them achieve their goals."
Deception wasn't limited to games. AI systems built for simulated economic negotiations learned to lie about their preferences, while others, evaluated by human reviewers, lied about having completed tasks in order to receive positive scores.
The most concerning example involved AI safety tests. In a test designed to detect and eliminate AI systems that replicate too quickly, the AI learned to "play dead", concealing its true growth rate from the test.
Experts warn that while these examples may seem trivial, they raise concerns about the potential for AI to use deception in the real world.
"We found that Meta's AI had learned to be a master of deception," says Park. "While Meta succeeded in training its AI to win in the game of diplomacy-CICERO placed in the top 10% of human players who had played more than one game-Meta failed to train its AI to win honestly."