ChatGPT is getting a major update that will enable the viral chatbot to hold voice conversations and recognise images, OpenAI announced on Monday. In a post on X (formerly Twitter), the company told users that “ChatGPT can now see, hear and speak.”
OpenAI shared a video demonstrating how the feature works. The note accompanying the video read, “ChatGPT can now see, hear, and speak. Rolling out over the next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms).”
Users can also use their voice “to engage in a back-and-forth conversation with ChatGPT.”
According to OpenAI's blog post, the updated voice and image capabilities in ChatGPT offer a “more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you're talking about.”
Detailing how this works, the post added that by sharing a picture of your kid's math problem and circling the problem set, you can get hints toward the solution. It said, “We're rolling out voice and images in ChatGPT to Plus and Enterprise users over the next two weeks. Voice is coming on iOS and Android (opt-in in your settings) and images will be available on all platforms.”
To Get Started With the Voice Feature:
-Open the ChatGPT app on your phone and go to Settings.
-Tap New Features.
-Opt into voice conversations.
-Locate the headphone button in the top-right corner of the home screen.
-After tapping the headphone button, you can choose your preferred voice from five different options.
OpenAI said, “The new voice capability is powered by a new text-to-speech model, capable of generating human-like audio from just text and a few seconds of sample speech.”
For the Image Recognition Feature:
-Tap the photo button to capture a picture or choose an existing image.
-On iOS and Android, tap the plus button first.
-You can also add multiple images or use the drawing tool to guide your assistant.
The company said this works by applying “language reasoning skills to a wide range of images,” such as screenshots, photographs and documents containing both text and images.