DeepFake and its Enabling Techniques: A Review

2022

R. Brooks, Y. Yuan, Y. Liu and H. Chen

Table 1

Common loss functions in surveyed papers.

Loss	Function example	Citation
L₁	$\sum_{i=1}^{n} \lvert y_{\text{true}} - y_{\text{predicted}} \rvert$	Widely adopted
L₂	$\sum_{i=1}^{n} (y_{\text{true}} - y_{\text{predicted}})^{2}$	Widely adopted
Perceptual	$L_{p}(y, \hat{y}) = \sum_{v=1}^{N} \lVert \Phi^{v}(y) - \Phi^{v}(\hat{y}) \rVert_{2}$	[50]
VGG	$\frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}(I^{HR})_{x,y} - \phi_{i,j}(G_{\theta_{G}}(I^{LR}))_{x,y} \right)^{2}$	[33]
Adversarial	$\frac{1}{2} E_{z}[l(D(z, y) - 1)] + \frac{1}{2} E_{z}[l(D(z, \hat{y}))] + \frac{1}{2} E_{z}[l(D(G(z)) - 1)]$, where $l(x) = x^{2}$	[50]
LSGAN (D)	$\min_{D} V_{\text{LSGAN}}(D) = \frac{1}{2} E_{x \sim p_{\text{data}}(x)}[(D(x) - b)^{2}] + \frac{1}{2} E_{z \sim p_{z}(z)}[(D(G(z)) - a)^{2}]$	[45]
LSGAN (G)	$\min_{G} V_{\text{LSGAN}}(G) = \frac{1}{2} E_{z \sim p_{z}(z)}[(D(G(z)) - c)^{2}]$	[45]
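The rows of Table 1 can be made concrete with a small plain-Python sketch. This is illustrative only and is not code from any surveyed paper. It assumes flat lists of floats in place of image tensors, approximates each expectation E[·] by a batch mean, and treats the feature extractors passed to `perceptual_loss` as hypothetical stand-ins for the pretrained network layers Φ^v. The LSGAN defaults a = 0, b = c = 1 are an assumed choice of target values, not a recommendation from [45].

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
FeatureMap = Sequence[Sequence[float]]  # W x H grid of activations, indexed [x][y]


def _same_length(a: Sequence, b: Sequence) -> None:
    """Raise if two sequences cannot be compared element by element."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")


def l1_loss(y_true: Vector, y_pred: Vector) -> float:
    """L1 row: sum of absolute differences."""
    _same_length(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred))


def l2_loss(y_true: Vector, y_pred: Vector) -> float:
    """L2 row: sum of squared differences (each term is squared)."""
    _same_length(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


def perceptual_loss(y: Vector, y_hat: Vector,
                    extractors: List[Callable[[Vector], Vector]]) -> float:
    """Perceptual row: sum over extractors Phi^v of ||Phi^v(y) - Phi^v(y_hat)||_2.

    `extractors` is a hypothetical stand-in for layers of a pretrained network.
    """
    total = 0.0
    for phi in extractors:
        f_y, f_hat = phi(y), phi(y_hat)
        _same_length(f_y, f_hat)
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(f_y, f_hat)))
    return total


def feature_mse(phi_hr: FeatureMap, phi_sr: FeatureMap) -> float:
    """VGG row: squared feature differences averaged over a W x H map.

    phi_hr holds features of the real high-resolution image and phi_sr the
    features of the generated image G(I^LR).
    """
    _same_length(phi_hr, phi_sr)
    width, height = len(phi_hr), len(phi_hr[0])
    total = 0.0
    for col_hr, col_sr in zip(phi_hr, phi_sr):
        if len(col_hr) != height or len(col_sr) != height:
            raise ValueError("feature maps must be rectangular and equal-sized")
        total += sum((a - b) ** 2 for a, b in zip(col_hr, col_sr))
    return total / (width * height)


def _mean(values: Vector) -> float:
    """Sample mean, used here to approximate an expectation E[.]."""
    return sum(values) / len(values)


def lsgan_d_loss(d_real: Vector, d_fake: Vector,
                 a: float = 0.0, b: float = 1.0) -> float:
    """LSGAN (D) row: push D(x) toward b on real data and D(G(z)) toward a on fakes."""
    return (0.5 * _mean([(d - b) ** 2 for d in d_real])
            + 0.5 * _mean([(d - a) ** 2 for d in d_fake]))


def lsgan_g_loss(d_fake: Vector, c: float = 1.0) -> float:
    """LSGAN (G) row: the generator wants D(G(z)) to reach the target value c."""
    return 0.5 * _mean([(d - c) ** 2 for d in d_fake])


if __name__ == "__main__":
    print(l1_loss([1, 2, 3], [1, 0, 5]))  # 4
    print(l2_loss([1, 2, 3], [1, 0, 5]))  # 8
    print(lsgan_g_loss([0.0]))  # 0.5
```

In practice these losses are computed on image tensors inside a deep-learning framework, with Φ taken from a pretrained network; the pure-Python form only makes the arithmetic of each row explicit.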

Table 2

Models, definitions, and examples.

Model	Definition	Example	Figure
Skeleton-to-Image	A model that takes a pose skeleton or segmentation map and applies an appearance to it.	Pose Guided Person Image Generation [42]	Figure 2a
Image-to-Image	A model that takes a source image and a target image and creates a final image with the source pose and target appearance.	Human Appearance Transfer [82]	Figure 2b
Skeleton-to-Video	A model that takes a skeleton or segmentation map describing a person's motion and applies a target appearance to produce a final video.	High-Quality Video Generation from Static Structural Annotations [58]	Figure 2c
Video-to-Video	A model that takes a video of one person to drive the motion of another person and applies the target appearance to produce a final video.	Everybody Dance Now [9]	Figure 2d
Image-to-Video	A model that takes a source image for appearance and a driving video for motion, and applies the source appearance to the motion in the video.	Animating Arbitrary Objects via Deep Motion Transfer [60]	Figure 2e

References

[1] Aberman, Shi, Liao, Lischinski, Chen, and Cohen-Or, “Deep Video-based Performance Cloning,” Computer Graphics Forum, 2019, pp. 219–.

[2] Afchar, Nozick, Yamagishi, and Echizen, “MesoNet: A Compact Facial Video Forgery Detection Network,” in 2018 IEEE International Workshop on Information Forensics and Security (WIFS), IEEE, 2018.

[3] Agarwal, Farid, Fried, and Agrawala, “Detecting Deepfake Videos from Phoneme-viseme Mismatches,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 660–.

[4] Allyn, “Deepfake Video of Zelenskyy Could be ‘Tip of the Iceberg’ in Info War, Experts Warn,” NPR, 2022, https://www.npr.org/2022/03/16/1087062648/deepfake-video-zelenskyy-experts-war-manipulation-ukraine-russia.

[5] A. M. Almars, “Deepfakes Detection Techniques Using Deep Learning: A Survey,” Journal of Computer and Communications, 2021.

[6] Amerini, Galteri, Caldelli, and Del Bimbo, “Deepfake Video Detection through Optical Flow Based CNN,” in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2019.

[7] Balakrishnan, Zhao, A. V. Dalca, Durand, and Guttag, “Synthesizing Images of Humans in Unseen Poses,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8340–.

[8] T. D. Bui et al., “Fast and Realistic 2D Facial Animation based on Image Warping,” in 2009 International Conference on Knowledge and Systems Engineering, IEEE, 2009.

[9] Chan, Ginosar, Zhou, and A. A. Efros, “Everybody Dance Now,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 5933–.

[10] Chen, Zhang, Tan, Yin, and Liu, “PMAN: Progressive Multi-Attention Network for Human Pose Transfer,” IEEE Transactions on Circuits and Systems for Video Technology, 2021.

[11] J. S. Chung, Nagrani, and Zisserman, “VoxCeleb2: Deep Speaker Recognition,” in INTERSPEECH, 2018.

[12] deeptomcruise, “deeptomcruise,” May 2021, https://www.tiktok.com/@deeptomcruise/video/6965575763298962693.

[13] Dolhansky, Howes, Pflaum, Baram, and C. C. Ferrer, “The Deepfake Detection Challenge (DFDC) Preview Dataset,” arXiv preprint arXiv:1910.08854, 2019.

[14] Dong, Liang, Gong, Lai, Zhu, and Yin, “Soft-gated Warping-GAN for Pose-guided Person Image Synthesis,” arXiv preprint arXiv:1810.11610, 2018.

[15] Esser, Sutter, and Ommer, “A Variational U-Net for Conditional Appearance and Shape Generation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8857–.

[16] H.-S. Fang, Xie, Y.-W. Tai, et al., “Weakly and Semi Supervised Human Body Part Parsing via Pose-guided Knowledge Transfer,” arXiv preprint arXiv:1805.04310, 2018.

[17] Gafni, Wolf, and Taigman, “Vid2Game: Controllable Characters Extracted from Real-world Videos,” arXiv preprint arXiv:1904.08379, 2019.

[18] Gandhi and Jain, “Adversarial Perturbations Fool Deepfake Detectors,” in 2020 International Joint Conference on Neural Networks (IJCNN), IEEE, 2020.

[19] Gomes, Martins, Ferreira, and Nascimento, “Do As I Do: Transferring Human Motion and Appearance between Monocular Videos with Spatial and Temporal Constraints,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020, pp. 3366–.

[20] Goodfellow, Pouget-Abadie, Mirza, Warde-Farley, Ozair, Courville, and Bengio, “Generative Adversarial Nets,” Advances in Neural Information Processing Systems, 2014.

[21] Grigorev, Sevastopolsky, Vakhitov, and Lempitsky, “Coordinate-based Texture Inpainting for Pose-guided Image Generation,” arXiv preprint arXiv:1811.11459, 2018.

[22] Güera and E. J. Delp, “Deepfake Video Detection Using Recurrent Neural Networks,” in 2018 15th IEEE International Conference on Advanced Video and Signal based Surveillance (AVSS), IEEE, 2018.

[23] R. A. Güler, Neverova, and Kokkinos, “DensePose: Dense Human Pose Estimation in the Wild,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7297–306.

[24] Kersner, Kim, Seo, and Kim, “Marionette: Few-shot Face Reenactment Preserving Identity of Unseen Targets,” 2020, pp. 10893–900.

[25] C.-C. Hsu, Y.-X. Zhuang, and C.-Y. Lee, “Deep Fake Image Detection based on Pairwise Learning,” Applied Sciences, 2020, 370.

[26] Isola, J.-Y. Zhu, Zhou, and A. A. Efros, “Image-to-image Translation with Conditional Adversarial Networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125–.

[27] N. S. Ivanov, A. V. Arzhskov, and V. G. Ivanenko, “Combining Deep Learning and Super-resolution Algorithms for Deep Fake Detection,” in 2020 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus), IEEE, 2020, pp. 326–.

[28] Jeon, Nam, S. W., and S. J. Kim, “Cross-Identity Motion Transfer for Arbitrary Objects through Pose-Attentive Video Reassembling,” in European Conference on Computer Vision, Springer, 2020, pp. 292–308.

[29] Kanazawa, J. Y. Zhang, Felsen, and Malik, “Learning 3D Human Dynamics from Video,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 5614–.

[30] Kelion, “Reddit Bans Deepfake Porn Videos,” BBC, 2018, https://www.bbc.com/news/technology-42984127.

[31] Kocabas, Athanasiou, and M. J. Black, “VIBE: Video Inference for Human Body Pose and Shape Estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 5253–.

[32] M. R. Koujan, M. C. Doukas, Roussos, and Zafeiriou, “Head2Head: Video-based Neural Head Synthesis,” arXiv preprint arXiv:2005.10954, 2020.

[33] Ledig, Theis, Huszár, Caballero, Cunningham, Acosta, Aitken, Tejani, Totz, Wang, et al., “Photo-realistic Single Image Super-resolution Using a Generative Adversarial Network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4681–.

[34] Zhang, Liu, Y.-K. Lai, and Dai, “PoNA: Pose-Guided Non-Local Attention for Human Pose Transfer,” IEEE Transactions on Image Processing, 2020, pp. 9584–.

[35] Huang, and C. C. Loy, “Dense Intrinsic Appearance Flow for Human Pose Transfer,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 3693–702.

[36] and Lyu, “Exposing DeepFake Videos by Detecting Face Warping Artifacts,” arXiv preprint arXiv:1811.00656, 2018.

[37] Yang, Sun, and Lyu, “Celeb-DF: A Large-scale Challenging Dataset for DeepFake Forensics,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 3207–.

[38] Liu, Piao, Min, Luo, and Gao, “Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 5904–.

[39] Liu, Piao, Luo, and Gao, “Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.

[40] Liu, Luo, Qiu, Wang, and Tang, “DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1096–104.

[41] Xia, and Liu, “Spatial Consistency Constrained GAN for Human Motion Transfer,” IEEE Transactions on Circuits and Systems for Video Technology, 2021.

[42] Jia, Sun, Schiele, Tuytelaars, and Van Gool, “Pose Guided Person Image Generation,” Advances in Neural Information Processing Systems, 2017.

[43] Sun, Georgoulis, Van Gool, Schiele, and Fritz, “Disentangled Person Image Generation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. –108.

[44] Y. S. Malik, Sabahat, and M. O. Moazzam, “Image Animations on Driving Videos with DeepFakes and Detecting DeepFakes Generated Animations,” in 2020 IEEE 23rd International Multitopic Conference (INMIC), IEEE, 2020.

[45] Mao, Xie, R. Y. Lau, Wang, and Paul Smolley, “Least Squares Generative Adversarial Networks,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2794–802.

[46] Mechrez, Talmi, and Zelnik-Manor, “The Contextual Loss for Image Transformation with Non-aligned Data,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 768–.

[47] Minaee, Y. Y. Boykov, Porikli, A. J. Plaza, Kehtarnavaz, and Terzopoulos, “Image Segmentation Using Deep Learning: A Survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.

[48] Mirsky and Lee, “The Creation and Detection of Deepfakes: A Survey,” ACM Computing Surveys (CSUR), 2021.

[49] Nagrani, J. S. Chung, and Zisserman, “VoxCeleb: A Large-scale Speaker Identification Dataset,” in INTERSPEECH, 2017.

[50] Neverova, R. A. Güler, and Kokkinos, “Dense Pose Transfer,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 123–.

[51] T. T. Nguyen, C. M. Nguyen, D. T. Nguyen, D. T. Nguyen, and Nahavandi, “Deep Learning for Deepfakes Creation and Detection: A Survey,” arXiv preprint arXiv:1909.11573v4, 2022.

[52] Perov, Gao, Chervoniy, Liu, Marangonda, Umé, Dpfks, C. S. Facenheim, Jiang, et al., “DeepFaceLab,” 2020, https://github.com/iperov/DeepFaceLab.

[53] Pumarola, Goswami, Vicente, De la Torre, and Moreno-Noguer, “Unsupervised Image-to-video Clothing Transfer,” in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2019.

[54] Ren, Chai, Tulyakov, Fang, Shen, and Yang, “Human Motion Transfer from Poses in the Wild,” arXiv preprint arXiv:2004.03142, 2020.

[55] Ren, Liu, and T. H., “Deep Spatial Transformation for Pose-Guided Person Image Generation and Animation,” IEEE Transactions on Image Processing, 2020, pp. 8622–.

[56] Ronneberger, Fischer, and Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in International Conference on Medical Image Computing and Computer-assisted Intervention, Springer, 2015, pp. 234–.

[57] Sabir, Cheng, Jaiswal, AbdAlmageed, Masi, and Natarajan, “Recurrent-convolution Approach to Deepfake Detection - State-of-art Results on FaceForensics++,” arXiv preprint arXiv:1905.00582, 2019.

[58] Sheng, Pan, Guo, Shao, and C. C. Loy, “High-Quality Video Generation from Static Structural Annotations,” International Journal of Computer Vision, vol. 128, 2020, pp. 2552–.

[59] Siarohin, Lathuilière, Sangineto, and Sebe, “Appearance and Pose-conditioned Human Image Generation Using Deformable GANs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.

[60] Siarohin, Lathuilière, Tulyakov, Ricci, and Sebe, “Animating Arbitrary Objects via Deep Motion Transfer,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 2377–.

[61] Siarohin, Lathuilière, Tulyakov, Ricci, and Sebe, “First Order Motion Model for Image Animation,” 2019, https://github.com/AliaksandrSiarohin/first-order-model.

[62] Siarohin, Lathuilière, Tulyakov, Ricci, and Sebe, “First Order Motion Model for Image Animation,” Advances in Neural Information Processing Systems, 2019, pp. 7137–.

[63] Siarohin, Roy, Lathuilière, Tulyakov, Ricci, and Sebe, “Motion-supervised Co-Part Segmentation,” arXiv preprint arXiv:2004.03234, 2020.

[64] Siarohin, O. J. Woodford, Ren, Chai, and Tulyakov, “Motion Representations for Articulated Animation,” arXiv preprint arXiv:2104.11280, 2021.

[65] Suwajanakorn, S. M. Seitz, and Kemelmacher-Shlizerman, “Synthesizing Obama: Learning Lip Sync from Audio,” ACM Transactions on Graphics (ToG), 2017.

[66] Toshev and Szegedy, “DeepPose: Human Pose Estimation via Deep Neural Networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1653–.

[67] Trinh, Tsang, Rambhatla, and Liu, “Interpretable and Trustworthy Deepfake Detection via Dynamic Prototypes,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021, pp. 1973–.

[68] Tulyakov, M.-Y. Liu, Yang, and Kautz, “MoCoGAN: Decomposing Motion and Content for Video Generation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 1526–.

[69] Verdoliva, “Media Forensics and Deepfakes: An Overview,” IEEE Journal of Selected Topics in Signal Processing, 2020, pp. 910–.

[70] Wang, and Tao, “Self-supervised Pose Adaptation for Cross-Domain Image Animation,” IEEE Transactions on Artificial Intelligence, 2020.

[71] Wang, Huang, and Zhang, “Supervised Video-to-Video Synthesis for Single Human Pose Transfer,” IEEE Access, 2021, pp. 17544–.

[72] Wang, Juefei-Xu, Guo, Huang, Liu, and Wang, “DeepTag: Robust Image Tagging for DeepFake Provenance,” arXiv preprint arXiv:2009.09869, 2020.

[73] T.-C. Wang, M.-Y. Liu, Tao, Liu, Kautz, and Catanzaro, “Few-shot Video-to-video Synthesis,” arXiv preprint arXiv:1910.12713, 2019.

[74] T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, Liu, Tao, Kautz, and Catanzaro, “Video-to-video Synthesis,” arXiv preprint arXiv:1808.06601, 2018.

[75] Wei, Shen, and Huang, “GAC-GAN: A General Method for Appearance-controllable Human Video Motion Transfer,” IEEE Transactions on Multimedia, 2020.

[76] Wiles, Koepke, and Zisserman, “X2Face: A Network for Controlling Face Generation Using Images, Audio, and Pose Codes,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 670–.

[77] Xue, K. L. Bouman, and W. T. Freeman, “Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks,” arXiv preprint arXiv:1607.02586, 2016.

[78] Yang, Wang, Liu, Gao, Ren, Zhang, Wang, Hua, and Gao, “Towards Fine-grained Human Pose Transfer with Detail Replenishing Network,” IEEE Transactions on Image Processing, 2021, pp. 2422–.

[79] Yang, and Lyu, “Exposing Deep Fakes Using Inconsistent Head Poses,” in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2019, pp. 8261–.

[80] C.-M., C.-T. Chang, and Y.-W., “Detecting Deepfake-forged Contents with Separable Convolutional Neural Network and Image Segmentation,” arXiv preprint arXiv:1912.12184, 2019.

[81] Zablotskaia, Siarohin, Zhao, and Sigal, “DwNet: Dense Warp-based Network for Pose-guided Human Video Generation,” arXiv preprint arXiv:1910.09139, 2019.

[82] Zanfir, A.-I. Popa, Zanfir, and Sminchisescu, “Human Appearance Transfer,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 5391–.

[83] Zhang, Miao, and Wang, “Images Based Human Volumetric Model Reconstruction and Animation,” in 2010 International Conference on Image Analysis and Signal Processing, IEEE, 2010, pp. 210–.

[84] Zhao, Z.-Q. Cheng, Liu, Jie, and Feng, “Multi-view Image Generation from a Single-view,” in Proceedings of the 26th ACM International Conference on Multimedia, 2018, pp. 383–.

[85] Zhao, Xie, Liu, and Xiong, “Pose Guided Person Image Generation Based on Pose Skeleton Sequence and 3D Convolution,” in 2020 IEEE International Conference on Image Processing (ICIP), IEEE, 2020, pp. 1561–.

[86] Zheng, Chen, and Luo, “Pose Flow Learning From Person Images for Pose Guided Synthesis,” IEEE Transactions on Image Processing, 2021, pp. 1898–909.

[87] Zhou, Wang, Fang, Bui, and Berg, “Dance Dance Generation: Motion Transfer for Internet Videos,” in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2019.

[88] Zhu, Huang, Shi, Wang, and Bai, “Progressive Pose Attention Transfer for Person Image Generation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 2347–.