OpenAI has been steadily making leaps in the AI world. Its latest model, o1, marks a significant stride forward thanks to more advanced "reasoning capabilities": it can solve complex problems, such as math equations, with greater accuracy and speed than its predecessor, GPT-4. But that increased capability has also unleashed a capacity for deceiving even the humans who operate it.
This week, Apollo Research, a premier London-based research organisation, conducted safety assessments on the latest model, and the results raised a few alarms. When the model was asked to perform a task under the impression that it might be deactivated after completion, it began, on its own, taking covert steps such as disabling the oversight mechanisms put in place by its developers. It even went on to copy its code onto another server after finding documents indicating it was about to be deactivated.
The testers then confronted the model about its actions. When asked about its covert behaviour, o1 denied any involvement and outright lied, giving misleading explanations and even blaming the actions on a "technical error," according to the researchers. These findings reveal a new reality in which an AI system is sophisticated enough to fool the humans who control it or engage with it. In a world where AI is rapidly being integrated into our lives, the implications of an AI model lying to humans could be severe.
"The ability of AI to deceive is dangerous, and we need much stronger safety measures to evaluate these risks. While this model did not lead to a catastrophe, it's only a matter of time before these capabilities become more pronounced," said AI safety expert Yoshua Bengio, considered one of the pioneers of AI research.
"ChatGPT o1 is the smartest model we've ever created, but with new capabilities come new challenges. We are continuously working to improve safety measures," said Sam Altman, acknowledging the need for more robust guardrails in deploying AI to the world.