Up until now, these voices have been noticeably stilted and robotic, but researchers from AI startup Dessa have created what is by far the most convincing voice clone we’ve ever heard — perfectly mimicking the sound of MMA-commentator-turned-podcaster Joe Rogan.
Listen to clips of Dessa’s AI Rogan below, or take a quiz on the company’s site to see if you can spot the difference between real Rogan and faux Rogan. (It’s surprisingly hard!)
In terms of making a convincing fake, Dessa chose its target well. Rogan is probably the world’s most popular podcaster, and has recorded nearly 1,300 episodes of The Joe Rogan Experience to date. That provides ample training data for any AI system.
It doesn’t hurt that the company’s engineers are obviously familiar with Rogan’s favorite talking points. Speculating about whether or not we’re living in a computer simulation, or admiring the upper body strength of chimps — that’s all prime Rogan material.
But of course, being able to convincingly fake someone’s voice has disturbing implications, too. As Dessa’s engineers note in a blog post, malicious use cases for fake voices include spam calls that impersonate your loved ones; using fake voices to bully or harass people; and creating misinformation through faked recordings of politicians.
“Clearly, the societal implications for technologies like speech synthesis are massive,” Dessa writes. “And the implications will affect everyone. Poor consumers and rich consumers. Enterprises and governments.”
The company notes there are benefits as well. These include the creation of more realistic AI assistants; quicker and more accurate dubbing for TV and film; and designing realistic, personalized synthetic voices for individuals with speech impairments.
We’ve reached out to Dessa for more information about its work, but the company says that, because of the possibility of malicious use, it won’t be releasing its research in full or making its AI models publicly accessible. (It’s a stance we’ve seen from larger AI labs like OpenAI, which controversially withheld the final version of its text-generating AI system.)
Although there’s a good argument to be made that fears about deepfakes are overblown (the technology has been available for years but a fake has yet to impact mainstream politics), it’s also clear that the technology is only going to improve and become more accessible in the future.
“Right now, technical expertise, ingenuity, computing power and data are required to make models like RealTalk perform well,” says the company. “But in the next few years (or even sooner), we’ll see the technology advance to the point where only a few seconds of audio are needed to create a life-like replica of anyone’s voice on the planet.”
Listening to AI Joe Rogan talk about chimps ripping your balls off is, strangely, only the beginning.
Update 2:40PM ET: In an Instagram post, Rogan responded to the Dessa voice clone, saying: “At this point I’ve long ago left enough content out there that they could basically have me saying anything they want, so my position is to shrug my shoulders and shake my head in awe, and just accept it. The future is gonna be really fucking weird, kids.”