OpenAI has launched a new version of the artificial intelligence model that powers its popular generative tool, ChatGPT. Named GPT-4o (“o” for omni), the upgraded model promises higher performance and more human-like interactions. It is also free for all users.
This announcement comes just ahead of Google's anticipated announcements about Gemini, its own AI tool competing directly with ChatGPT.
GPT-4o: Features
1. GPT-4o offers faster and more natural human-to-machine interaction compared to earlier versions of ChatGPT.
2. It accepts any combination of text, audio and images as input and can produce outputs in any combination of these formats.
3. It responds quickly: audio inputs can receive a reply in as little as 232 milliseconds, similar to human response times in conversation.
4. GPT-4o is OpenAI's first model to process text, visuals and audio together.
5. According to OpenAI, it matches GPT-4 Turbo in text, reasoning, and coding intelligence while outperforming earlier models in multilingual understanding, audio comprehension, and visual recognition.
6. It operates faster than GPT-4 Turbo and is 50% cheaper in the API.
7. It comes with a dedicated desktop app, making it easier to use in your daily tasks.
8. You can now upload documents and screenshots directly to GPT-4o, streamlining your workflow.
9. It has a memory feature, which lets GPT-4o remember past conversations.
10. You can browse for information directly within GPT-4o.
GPT-4o: Capabilities
OpenAI showcased GPT-4o's capabilities in a thread posted to X. One slide compares ChatGPT and GPT-4o, with the two interacting side by side. GPT-4o responded more quickly, using only audio input.
Say hello to GPT-4o, our new flagship model which can reason across audio, vision, and text in real time: https://t.co/MYHZB79UqN
Text and image input rolling out today in API and ChatGPT with voice and video in the coming weeks. pic.twitter.com/uuthKZyzYx
OpenAI (@OpenAI), May 13, 2024
1. It demonstrated real-time translation between English and Spanish.
2. GPT-4o can compose or sing a lullaby when prompted.
3. The model accurately identified a birthday celebration from a visual prompt showing a cake with a candle.
4. GPT-4o provides detailed descriptions of surroundings through camera input, serving as a visual aid for the visually impaired.
5. GPT-4o has a wide range of capabilities – from delivering dad jokes to fast counting, participating in group meetings and solving math problems.
6. It also has musical talents: it can sing and harmonise tunes on request.
7. GPT-4o can also help you prepare for an interview.
8. It can also engage in conversation with pets, such as dogs.
9. The model can adjust its voice to convey various emotions and expressions, ranging from dramatic to emotional.
10. It uses its vision feature to provide step-by-step guidance for tasks, including solving math problems and coding.