Literature Review of Audio-Driven 2D Avatar Video Generation Algorithms
Yuxuan Li, Han Zhang, Shaozhong Cao, Dan Jiang, Meng Wang and Weiqi Wang
(all authors: Beijing Institute of Graphic Communication)
A chapter in IEIS 2022, 2023, pp 85-96 from Springer
Abstract:
Audio-driven 2D avatar video generation algorithms have a wide range of applications in the media field. The ability to generate 2D avatar videos from only compliant audio and image inputs has given a strong boost to the development of online media and related fields. In such generation algorithms, the accurate coupling of speech audio with subtle appearance changes, such as facial movements and gestures, has been a point of continuous improvement: appearance modeling has moved from an early focus on matching speech content alone to also incorporating the emotions expressed by the speech. Fidelity and synchronization have improved significantly over early experimental results, and the behavior of the 2D avatars in generated videos is growing closer to that of humans. This paper provides an overview of existing audio-driven 2D avatar generation algorithms and classifies their tasks into two categories: talking face generation and co-speech gesture generation. First, the paper defines each task and describes its application areas. Second, it analyzes the core algorithms in order of technological advancement and briefly summarizes the performance of each method or model. Third, it presents common datasets and evaluation metrics for both types of task and compares the performance of several recently proposed algorithms. Finally, it discusses the opportunities and challenges facing the field and suggests future research directions.
Keywords: audio-driven; 2D avatar; talking face generation; co-speech gesture generation; deep learning
Date: 2023
Persistent link: https://EconPapers.repec.org/RePEc:spr:lnopch:978-981-99-3618-2_9
Ordering information: This item can be ordered from
http://www.springer.com/9789819936182
DOI: 10.1007/978-981-99-3618-2_9
More chapters in Lecture Notes in Operations Research from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.