A controversial viral clip recently surfaced allegedly showing UK opposition leader Keir Starmer swearing and yelling. Most media sources assumed the recording was a deepfake, but could it actually be real? In this podcast, audio experts Mike and Izabela Russell use AI voice cloning tools and audio restoration techniques to forensically analyze the clip and try to determine if it is authentic or manipulated.
Introducing the Mystery Clip
The clip in question appeared on social media, quickly amassed over 1.3 million views, and was shared extensively. In the recording, Keir Starmer seems to be chastising someone, swearing at them, and ultimately telling them to “shut their mouth.” Given Keir Starmer’s typically tame demeanor, many questioned the veracity of the recording and dismissed it as an AI-generated deepfake. Mike and Izabela decided to dig deeper, drawing on Mike’s more than 20 years of experience as an audio engineer and producer.
Attempting to Recreate the Voice with AI
The first step was trying to recreate Keir Starmer’s voice themselves using the latest AI voice cloning tools. Mike started with ElevenLabs, widely considered the best voice cloning software currently available. With just 30 seconds of sample audio, ElevenLabs can instantly create a very realistic voice clone. Mike also experimented with Play.ht, which can supposedly inject emotion into voice clones. He tried some open-source tools like TortoiseTTS and XTTS, but found they did not produce convincing results.
The quality of the source audio greatly impacts the realism of the clone. After finding high-quality samples of Keir Starmer, Mike generated several audio examples with ElevenLabs that were scarily close to the original viral clip.
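For listeners who want to experiment themselves, here is a minimal sketch of driving a cloned voice programmatically through the ElevenLabs text-to-speech REST API. Note this is an assumption about workflow, not what Mike did in the episode (he used the web interface); the API key, voice ID, and example text are placeholders, and the model name and parameters should be checked against the current ElevenLabs documentation.

```python
# Hypothetical sketch: generate speech from an already-cloned ElevenLabs voice.
# API key, voice ID, and text below are placeholders.
import requests

API_KEY = "YOUR_ELEVENLABS_API_KEY"
VOICE_ID = "YOUR_CLONED_VOICE_ID"  # ID of a voice you have cloned in your account

url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"
payload = {
    "text": "Text you want the cloned voice to read.",
    "model_id": "eleven_multilingual_v2",  # model names may change over time
    "voice_settings": {"stability": 0.4, "similarity_boost": 0.8},
}
headers = {"xi-api-key": API_KEY, "Content-Type": "application/json"}

resp = requests.post(url, json=payload, headers=headers)
resp.raise_for_status()

# The response body is encoded audio (MP3 by default).
with open("cloned_line.mp3", "wb") as f:
    f.write(resp.content)
```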
Cleaning Up the Cloned Audio
However, the ElevenLabs samples sounded too perfect and clean. Mike added pauses and inflection to make the delivery more natural. He also added background noise and room tone to match the degraded quality of the viral clip. By layering and combining the best takes in his digital audio workstation, Mike crafted an AI-generated version that was nearly indistinguishable from the original.
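As a rough illustration (not Mike’s exact DAW workflow), the “dirtying up” step can be approximated in a few lines of Python by mixing low-level room tone under the clean TTS render. The file names and mix level below are assumptions for the example.

```python
# Illustrative sketch: layer room tone under a clean cloned take so it sounds
# more like a real-world recording. Assumes two mono WAV files at the same rate.
import numpy as np
import soundfile as sf

voice, sr = sf.read("cloned_voice.wav")   # clean AI-generated take
room, sr2 = sf.read("room_tone.wav")      # recorded or downloaded room tone
assert sr == sr2, "resample the room tone to match the voice first"

# Loop or trim the room tone to the length of the voice track.
if len(room) < len(voice):
    room = np.tile(room, int(np.ceil(len(voice) / len(room))))
room = room[: len(voice)]

# Mix the room tone roughly 20 dB below the voice.
noise_gain = 10 ** (-20 / 20)
mix = voice + noise_gain * room

# Guard against clipping, then write the "degraded" result.
mix = mix / max(1.0, float(np.max(np.abs(mix))))
sf.write("cloned_voice_degraded.wav", mix, sr)
```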
Trying Other Audio Enhancement Techniques
Next, Mike attempted to dig into the original viral clip using standard audio restoration tools like iZotope RX to try to isolate background details. He used spectral frequency analysis, dialogue isolation, and cleanup tools but struggled to reveal any new information.
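A free, if much cruder, stand-in for the spectral view in a tool like iZotope RX is simply plotting a spectrogram of the clip and scanning it for edit seams, unnaturally clean frequency bands, or buried background detail. The file name below is a placeholder.

```python
# Rough stand-in for a spectral editor's view: plot a spectrogram of the clip.
import matplotlib.pyplot as plt
import soundfile as sf

audio, sr = sf.read("viral_clip.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # collapse stereo to mono for the plot

plt.figure(figsize=(12, 4))
plt.specgram(audio, NFFT=2048, Fs=sr, noverlap=1024, cmap="magma")
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.title("Spectrogram of the viral clip")
plt.colorbar(label="Intensity (dB)")
plt.tight_layout()
plt.savefig("viral_clip_spectrogram.png")
```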
When he ran his cloned audio through ElevenLabs’ AI deepfake detector, it correctly flagged it as fake. However, once he added editing and effects, the AI was tricked into thinking the cloned audio was real. This demonstrated the limitations of current AI in definitively identifying deepfakes.
Isolating the Voice from the Original Recording
As a last attempt, Mike used the Enhance Speech tool in the Adobe Premiere Pro beta on the original clip. Isolating just the voice revealed faint traces of someone else talking in the background.
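Premiere’s Enhance Speech isn’t something you can call from a script, but a loose open-source stand-in for this step is spectral-gating noise reduction with the noisereduce Python package; listening to the residual (what the enhancement removes) is one way to surface faint background voices. The file names below are placeholders, and this is only a sketch of the idea, not the tool Mike used.

```python
# Hedged sketch: approximate "voice isolation" with spectral-gating noise
# reduction, then keep the residual to hear what was stripped out.
import noisereduce as nr
import soundfile as sf

audio, sr = sf.read("viral_clip.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # work in mono

# Suppress steady background noise, leaving mostly the foreground voice.
enhanced = nr.reduce_noise(y=audio, sr=sr)

# The residual contains whatever the enhancement removed: room noise, hiss,
# and potentially other voices buried in the background.
residual = audio - enhanced

sf.write("clip_voice_only.wav", enhanced, sr)
sf.write("clip_residual.wav", residual, sr)
```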
When Mike ran this isolated audio through the ElevenLabs detector, it rated the clip as only 2% likely to have been manipulated. So in the end, while not definitive, the balance of evidence suggests the original viral clip may in fact be real.
The Dangers of Perfectly Simulated Speech
While they could not reach a definitive verdict, the exercise demonstrated how blurred the line between real and AI-fabricated speech has become. Mike believes it is entirely possible to create fake audio that would fool anyone, and we currently lack the forensic tools to reliably detect high-quality deepfakes. The implications are frightening: if audio recordings can no longer be trusted, that presents a huge challenge for the future.
Let us know your thoughts on this viral clip and the implications of simulated speech in the comments below. And don’t forget to subscribe to our podcast!