
Research on digital media animation control technology based on recurrent neural network using speech technology

Hui Wang, Ashutosh Sharma and Mohammad Shabaz
Additional contact information
Hui Wang: JiaoZuo University
Ashutosh Sharma: Southern Federal University

International Journal of System Assurance Engineering and Management, 2022, vol. 13, issue 1, No 57, 564-575

Abstract: A vivid and lifelike virtual speaker can attract the user's attention. Constructing such a speaker requires not only an appealing static appearance but also mouth movements, facial expressions, and body movements that are genuinely synchronized with the voice. A virtual speaker is a technology in which a computer generates an animated facial image that can speak, which also allows special effects such as image editing and beautification to be added to the broadcast screen. This paper proposes a speech-driven facial animation synthesis method based on a deep bidirectional long short-term memory recurrent neural network (BLSTM-RNN). The network is trained on a speaker's audio-visual dual-modal data; the active appearance model (AAM) is used to model the face image, and the AAM parameters serve as the network output. The influence of the network structure and of different input speech features on the quality of the synthesized animation is studied. Experimental results on the LIPS2008 standard evaluation corpus show that networks containing BLSTM layers clearly outperform purely feed-forward networks, and that a three-layer BLSTM-forward-BLSTM structure with 256 nodes (BFB256) performs best. Combining FBank features with fundamental frequency and energy further improves the synthesized animation. The main aim of this paper is thus to study speech-driven facial animation synthesis based on a deep BLSTM-RNN and to compare the synthesis quality of different network structures and speech features.
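
As a concrete illustration of the architecture the abstract names, below is a minimal PyTorch sketch of a BLSTM-forward-BLSTM (BFB256) regression network mapping per-frame acoustic features (FBank plus fundamental frequency and energy) to AAM parameters. The feature and output dimensions, class name, and loss are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class BFB256(nn.Module):
    # Sketch of the BLSTM-forward-BLSTM structure described in the
    # abstract: two bidirectional LSTM layers around one feed-forward
    # layer, each with 256 units, regressing per-frame AAM parameters
    # from speech features. Dimensions below are assumptions.
    def __init__(self, feat_dim=42, aam_dim=30, hidden=256):
        super().__init__()
        self.blstm1 = nn.LSTM(feat_dim, hidden, batch_first=True,
                              bidirectional=True)
        # Feed-forward ("forward") layer between the two BLSTM layers.
        self.ff = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Tanh())
        self.blstm2 = nn.LSTM(hidden, hidden, batch_first=True,
                              bidirectional=True)
        # Linear head producing the AAM model parameters for each frame.
        self.out = nn.Linear(2 * hidden, aam_dim)

    def forward(self, x):
        # x: (batch, frames, feat_dim) acoustic feature sequences,
        # e.g. 40 FBank coefficients + F0 + energy per frame.
        h, _ = self.blstm1(x)
        h = self.ff(h)
        h, _ = self.blstm2(h)
        return self.out(h)

# Usage sketch: 8 utterances of 200 frames, 42-dim features.
model = BFB256()
feats = torch.randn(8, 200, 42)
aam = model(feats)                               # -> (8, 200, 30)
loss = nn.MSELoss()(aam, torch.zeros_like(aam))  # placeholder target

Because both LSTM layers are bidirectional, each output frame can draw on both past and future acoustic context, which is consistent with the abstract's finding that networks with BLSTM layers clearly outperform purely feed-forward networks.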

Keywords: Neural-mechanisms; Virtual speaker; Facial animation; BLSTM; Recurrent neural network (RNN); Active appearance model (AAM); Speech and language; Convolutional neural networks (CNNs); Language learning; Hierarchical features
Date: 2022
Citations: 1 (tracked in EconPapers)

Downloads: (external link)
http://link.springer.com/10.1007/s13198-021-01540-x Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:ijsaem:v:13:y:2022:i:1:d:10.1007_s13198-021-01540-x

Ordering information: This journal article can be ordered from
http://www.springer.com/engineering/journal/13198

DOI: 10.1007/s13198-021-01540-x

International Journal of System Assurance Engineering and Management is currently edited by P.K. Kapur, A.K. Verma and U. Kumar

More articles in International Journal of System Assurance Engineering and Management from Springer, The Society for Reliability, Engineering Quality and Operations Management (SREQOM), India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Handle: RePEc:spr:ijsaem:v:13:y:2022:i:1:d:10.1007_s13198-021-01540-x