OpenAI updates ChatGPT to let AI tool 'see, hear and speak'

OpenAI is updating ChatGPT to allow the AI tool to see, hear and speak in its interactions with users

OpenAI is updating the capabilities of ChatGPT to allow the artificial intelligence (AI) tool to "see, hear, and speak" in the latest upgrades to the viral chatbot.

OpenAI is rolling out updates that will allow ChatGPT to understand spoken prompts and respond in a back-and-forth conversation using the chatbot’s new voice. The chatbot will also be able to respond to image prompts. The changes bring ChatGPT’s capabilities closer to those of Apple’s Siri, Google’s Lens and voice assistant, and Amazon’s Alexa.

"Voice and image give you more ways to use ChatGPT in your life," OpenAI said in the announcement. "Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it. When you’re home, snap pictures of your fridge and pantry to figure out what’s for dinner (and ask follow-up questions for a step-by-step recipe). After dinner, help your child with a math problem by taking a photo, circling the problem set, and having it share hints with you."

ChatGPT’s new voice capability is powered by a text-to-speech model capable of generating human-like audio from text and a few seconds of sample speech. 

The company worked with professional voice actors to create the voices and uses Whisper, its open-source speech recognition system, to transcribe spoken words into text.

The company noted that the new voice technology poses some risks, such as fraud or impersonation.

"The new voice technology — capable of crafting realistic synthetic voices from just a few seconds of real speech — opens doors to many creative and accessibility-focused applications," OpenAI said in the announcement. "However, these new capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud."

OpenAI added, "Vision-based models also present new challenges, ranging from hallucinations about people to relying on the model’s interpretation of images in high-stakes domains."

The company said it has "taken technical measures to significantly limit ChatGPT’s ability to analyze and make direct statements about people since ChatGPT is not always accurate and these systems should respect individuals’ privacy."

The company said it tested the model with "red teamers for risk in domains such as extremism and scientific proficiency, and a diverse set of alpha testers."

OpenAI said it will roll out the voice and image capabilities to users of the Plus and Enterprise versions of ChatGPT in the next two weeks.

Reuters contributed to this report.