Dynamic Fashion Video Synthesis from Static Imagery

Tasin Islam, Alina Miron, Xiaohui Liu and Yongmin Li
Additional contact information
Tasin Islam: Department of Computer Science, Brunel University London, London UB8 3PH, UK
Alina Miron: Department of Computer Science, Brunel University London, London UB8 3PH, UK
Xiaohui Liu: Department of Computer Science, Brunel University London, London UB8 3PH, UK
Yongmin Li: Department of Computer Science, Brunel University London, London UB8 3PH, UK

Future Internet, 2024, vol. 16, issue 8, 1-21

Abstract: Online shopping for clothing has become increasingly popular. However, this trend comes with its own challenges: for example, it is difficult for customers to make informed purchase decisions without trying clothes on to see how they move and flow. We address this issue by introducing FashionFlow, a new image-to-video generator that synthesises fashion videos showing how garments move and flow on a person. By combining a latent diffusion model with several other components, we synthesise a high-fidelity video conditioned on a single fashion image. These components include pseudo-3D convolutions, a VAE, CLIP, a frame interpolator and attention mechanisms, which together generate a smooth video efficiently while preserving vital characteristics of the conditioning image. Our contribution is a model that synthesises videos directly from images. We show how a pre-trained VAE decoder maps the latent space to video frames, and we demonstrate the effectiveness of our local and global conditioners, which help preserve the maximum amount of detail from the conditioning image. Our model is distinctive in that it produces spontaneous and believable motion from a single image, whereas other diffusion models are either text-to-video generators or image-to-video generators driven by pre-recorded pose sequences. Overall, our research demonstrates the successful synthesis of fashion videos featuring models posing from various angles and showcasing the movement of the garment. Our findings hold great promise for improving the online fashion industry's shopping experience.
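To make the pseudo-3D convolution mentioned in the abstract concrete, the following is a minimal PyTorch sketch of the general factorised spatio-temporal convolution pattern: a 2D spatial convolution applied per frame, followed by a 1D temporal convolution applied per pixel location. This is an illustrative sketch of the technique, not the authors' implementation; the class name, tensor layout and identity initialisation are assumptions.

```python
import torch
import torch.nn as nn

class PseudoConv3d(nn.Module):
    """Factorised spatio-temporal convolution (illustrative sketch):
    a 2D spatial conv applied per frame, then a 1D temporal conv
    applied per pixel. Not the authors' code."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.spatial = nn.Conv2d(channels, channels, kernel_size, padding=pad)
        # Initialise the temporal conv as an identity so a pretrained
        # 2D image model is unchanged before video fine-tuning -- a
        # common trick in this family of models (assumed here).
        self.temporal = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        nn.init.dirac_(self.temporal.weight)
        nn.init.zeros_(self.temporal.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        b, c, f, h, w = x.shape
        # Spatial pass: fold frames into the batch dimension.
        y = x.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
        y = self.spatial(y)
        # Temporal pass: fold pixels into the batch dimension.
        y = y.reshape(b, f, c, h, w).permute(0, 3, 4, 2, 1).reshape(b * h * w, c, f)
        y = self.temporal(y)
        # Restore the (batch, channels, frames, height, width) layout.
        y = y.reshape(b, h, w, c, f).permute(0, 3, 4, 1, 2)
        return y
```

For example, `PseudoConv3d(64)(torch.randn(1, 64, 8, 32, 32))` returns a tensor of the same shape. Factorising the 3D convolution this way keeps the parameter count and compute close to a 2D model while still letting information flow across frames, which is why it is a popular choice for adapting image diffusion models to video.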

Keywords: diffusion models; fashion synthesis; generative AI; image-to-video synthesis
JEL-codes: O3
Date: 2024

Downloads: (external link)
https://www.mdpi.com/1999-5903/16/8/287/pdf (application/pdf)
https://www.mdpi.com/1999-5903/16/8/287/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:16:y:2024:i:8:p:287-:d:1452652

Future Internet is currently edited by Ms. Grace You

More articles in Future Internet from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

 
Handle: RePEc:gam:jftint:v:16:y:2024:i:8:p:287-:d:1452652