Deepfake, shallowfake and speech synthesis: tackling audiovisual manipulation

Despite alarmist news stories about deepfakes heralding the end of democracy or of truth itself, the technology – for better or worse – is far from perfect, which suggests that there is still a window of opportunity to prepare society, institutions and regulatory frameworks for the moment it becomes so.

The state of play

‘Deepfakes’ (i.e. synthetic media) is the umbrella term for visual and audio content manipulated or generated through machine learning (ML) – a subset of artificial intelligence (AI). Face re-enactment is a deepfake technique that has alarmed politicians ever since two videos of Barack Obama began circulating: one produced by synthetic-media start‑up Lyrebird, in which the former US President appeared to promote the company, and a subsequent video produced by BuzzFeed. Again, the output was far from perfect, but it clearly showcased the technology’s potential.

Face replacement (or face swap) is another deepfake technique, which transposes an individual’s face onto another body. This technique gained notoriety for its extensive use in the production of deepfake pornography. But the field that has profited most from advances in AI research is face generation – the production of realistic faces that do not actually exist. The technique has reached phenomenal levels of realism through the use of generative adversarial networks (GANs), which are essentially two deep neural networks – the generator and the discriminator – trained against each other: the generator tries to fool the discriminator, while the discriminator tries to tell real samples from generated ones.

GANs are an unsupervised form of machine learning and were introduced only in 2014 by Ian Goodfellow and his colleagues in their paper entitled ‘Generative Adversarial Networks’. We are yet to witness the full extent of their application, with researchers suggesting various non-malicious uses of GANs, such as face anonymisation, the dubbing of advertisements or even the creation of art.
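To make the adversarial idea concrete, here is a toy numeric sketch of the GAN objective in Python. Everything in it is an illustrative invention rather than any real model: the ‘generator’ is a single learnable shift applied to noise, and the ‘discriminator’ is a fixed logistic classifier. The point is only to show the minimax value V(D, G) that the two networks fight over.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D setup: "real" data ~ N(4, 1); the "generator" maps noise
# z ~ N(0, 1) to fake samples via a single learnable shift (a stand-in
# for a full neural network).
def generator(z, shift):
    return z + shift

# A fixed logistic "discriminator" D(x) = sigmoid(w*x + b), chosen so it
# outputs values near 1 around the real data and near 0 elsewhere.
def discriminator(x, w=2.0, b=-4.0):
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

def gan_value(shift, n=100_000):
    """Monte-Carlo estimate of the GAN minimax objective
    V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]."""
    real = rng.normal(4.0, 1.0, n)
    fake = generator(rng.normal(0.0, 1.0, n), shift)
    return (np.mean(np.log(discriminator(real)))
            + np.mean(np.log(1.0 - discriminator(fake))))

# The generator's goal is to minimise V: a shift of 4 puts its samples on
# top of the real data and fools this discriminator, giving a much lower
# value than a shift of 0.
print(gan_value(0.0), gan_value(4.0))
```

In a real GAN, both networks are updated in alternation – the discriminator to raise this value, the generator to lower it – until the generated samples become statistically indistinguishable from the real ones.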

Denis Teyssou, head of AFP Medialab: “There are several open source apps available and most can run on an average computer with a good graphics card. Technology is evolving rapidly, with less data, less time and less computer power needed to achieve a realistic result. The Chinese app Zao is a good example.”

Speech synthesis – the creation of human-sounding speech using AI – also falls under the term deepfakes. Companies such as Lyrebird and Google, via its WaveNet model, are engaged in speech-synthesis research.

Accessible consumer-grade technology also exists for the rudimentary manipulation of content, as showcased in the doctored videos of Nancy Pelosi and Jim Acosta. The slowing down or freezing of frames is made possible by using simple video editing software without the need for ML’s complex, adaptable algorithmic systems. The outcome of these processes is what has been termed ‘shallowfakes’ so as to distinguish them from deepfakes, which rely on deep‑learning systems.
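Just how simple such edits are can be shown with a toy sketch (no real video handling here – the ‘frames’ are placeholder strings): slowing footage down, as in the Pelosi video, amounts to little more than duplicating frames, with no machine learning involved.

```python
# Toy illustration of a "shallowfake" slowdown: at a fixed playback rate,
# duplicating every frame makes the subject appear to move more slowly.
def slow_down(frames, factor=2):
    """Repeat each frame `factor` times, stretching the clip's duration."""
    return [frame for frame in frames for _ in range(factor)]

clip = ["f1", "f2", "f3"]
print(slow_down(clip))  # ['f1', 'f1', 'f2', 'f2', 'f3', 'f3']
```

Any consumer video editor exposes this as a ‘speed’ slider, which is precisely why shallowfakes are far cheaper and more common than true deepfakes.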

The challenges

Deepfakes present a new set of challenges. As opposed to text-based disinformation, synthetic media can create more visceral responses even when immediately debunked or recognised as deepfakes. The nature of their impact can render deepfakes a useful tool for malign actors who are looking for short-term, immediate results, either on the day of an election or during a crucial political confrontation. Deepfakes can be extremely useful to non-state actors who wish to radicalise the vulnerable or the disaffected.

Synthetic media with malign intent can deepen existing divisions on an unprecedented scale, since they collapse language barriers and, unlike text-based disinformation, can reach even the illiterate. As synthetic media become increasingly realistic, they can also have the inadvertent effect of letting malign actors evade accountability by suggesting that anything can be fake.

Sam Gregory, Program Director of WITNESS, which helps people use video and technology to protect and defend human rights and promote civic journalism: “Deepfakes and synthetic media can exacerbate existing problems around trust in information and trust in media institutions, as well as expand and introduce new threats to the vulnerable. They also introduce new excuses, such as ‘it’s a deepfake’, to complicate the work of fact-checkers and the media, and to license those in power to claim that anything compromising is fabricated.”

12 things we can do now to prepare for deepfakes, according to the WITNESS Media Lab

The repercussions of deepfake pornographic material can also be deeply traumatising for targets, and as legal experts Danielle Citron and Robert Chesney have highlighted, they disproportionately harm women and marginalised communities. One such example is that of Rana Ayyub, an Indian journalist and critic of the Bharatiya Janata Party, who fell victim to a deepfake campaign that used face replacement in a pornographic video to discredit her.

Generally speaking, deepfakes, like most new technologically mediated forms of communication, can be particularly harmful to citizens of the Global South, who are expected to familiarise themselves with digital-media norms at extremely short notice and to grasp the implications of AI developments. The real danger is that the technology will advance at too fast a pace for them to be able to protect themselves from those who want to use deepfakes to influence or suppress them.

According to a recent report by Deeptrace, a Dutch start-up working to build an antivirus for deepfakes, the vast majority (96 %) of all the deepfake videos it found were pornographic.

Henry Ajder, Head of Communications and Research Analysis at Deeptrace: “non-consensual deepfake pornography is the most established form of deepfake that almost exclusively targets women.”

Measures to tackle deepfake threats

Amid a general climate of information pollution that tests the credibility of democratic processes, when deepfakes arrived on the scene, legislators sat up and took notice. In the US, the states of Texas and California have already rushed to legislate against deepfakes threatening the integrity of elections, but they have been criticised for either placing too much trust in deepfake producers to tag their output or for not giving enough consideration to how such measures could run counter to the First Amendment. Virginia has also passed legislation against non-consensual deepfake pornography.

A federal bill that is currently in the House of Representatives, the DEEPFAKES Accountability Act, has raised questions about whether it is fit for purpose. The UK Centre for Data Ethics and Innovation (CDEI) has suggested that legislative measures will not be enough to contain the threat posed by deepfakes and that technological screening tools, along with digital literacy programmes, are necessary steps. Researchers have pointed out, however, that even technical fixes for malicious deepfakes could be a moving target as malign actors tend to adjust their strategies as soon as a detection method is identified.

Eva Kaili, Member of the European Parliament (S&D, Greece) and STOA Chair: “Deepfake videos, when becoming viral, have the potential to cause major political crises and damage identities.” In February 2019, the European Parliament adopted its first resolution on AI, in which one of the articles clearly states that “anyone who produces deepfake material or synthetic videos explicitly has to state that they are not original”. Kaili adds: “We have to find a common definition of what falls into the category of a deepfake video and find ways to tackle deepfakes by using technological solutions such as blockchain, in order to verify sources and identify those videos before they become viral.”

Provenance – tracing a piece of media back to its source and to any edits made along the way – is an approach that has been incorporated into the deepfake-detection work of institutions such as the US Government’s DARPA Media Forensics team and start‑ups such as Truepic. The European Commission has provided Horizon 2020 funding to both InVID, a video-authentication service, and Provenance, which will use blockchain technology to record trustworthy multimedia content.
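The core provenance idea can be sketched in a few lines of Python. This is a deliberately simplified, hypothetical illustration – an in-memory ledger, not the actual design of InVID, Provenance or Truepic, which also involve signing, timestamping and tamper-resistant storage: content is fingerprinted at capture, and any later edit changes the fingerprint.

```python
import hashlib

# Hypothetical in-memory "ledger": media id -> SHA-256 fingerprint.
# A real system would sign and timestamp each record and store it
# somewhere tamper-resistant (e.g. a blockchain).
ledger = {}

def register(media_id: str, content: bytes) -> None:
    """Record the content's cryptographic fingerprint at capture time."""
    ledger[media_id] = hashlib.sha256(content).hexdigest()

def verify(media_id: str, content: bytes) -> bool:
    """Recompute the fingerprint and compare it with the recorded one."""
    return ledger.get(media_id) == hashlib.sha256(content).hexdigest()

original = b"raw video bytes as captured"
register("clip-001", original)
print(verify("clip-001", original))                # True: untouched
print(verify("clip-001", original + b"tampered"))  # False: any edit breaks the hash
```

Note what this does and does not buy: it proves a file is unchanged since registration, but says nothing about clips that were never registered – which is why provenance schemes only work at the scale of the surrounding infrastructure.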

But for any watermarks or authentication markers to be effective, the actors that define the broader digital infrastructure also have to be brought on board – namely, the social media companies that play a key role in deepfake dissemination. Twitter recently announced plans to label deepfakes rather than remove them, calling on the public to provide feedback on these plans. Facebook, Amazon, Microsoft, the Partnership on AI and a wide range of research institutions have launched a deepfake detection challenge, ending in spring 2020, which will hopefully push forward the technical tools able to identify deepfakes. Nonetheless, it will be up to legislators to decide what to do with them.

Related Content
• A scientist’s opinion: Interview with Henry Ajder about deepfakes
• A scientist’s opinion: Interview with Denis Teyssou about deepfakes
• A scientist’s opinion: Interview with Sam Gregory about deepfakes
• EU project: WEVERIFY
• EU project: InVID

Speech synthesis can be used to fool people into thinking someone said something they did not. This is deception, and it is wrong, just as lying is. In particular, it is wrong to defraud people or to create fake news, and voice conversion can make it easier to do a good job of these things. Companies in the synthetic-media industry now have a duty to establish control over their technology, to educate the public about what is technically possible, and to make people less likely to fall for deceptive synthetic speech.

