We present a transformer-LSTM hybrid architecture for music-to-dance synthesis and propose a novel curriculum learning strategy, the dynamic auto-condition learning approach, to alleviate the severe exposure bias in long-term, music-conditioned dance motion generation. We also release a new 2D dance dataset containing dance motions in three styles (ballet, hip-hop and jp-pop) extracted from real dance videos available online. Our model generates realistic and smooth dance motions, some of which match the test-set music in both style and beat. The generated motion sequences can last one minute at 15 frames per second (FPS), i.e., 900 frames. With the help of 3D human pose reconstruction and animation software, this technique can drive various 3D character avatars, such as the 3D avatar of Hatsune Miku, and has great potential for virtual advertisement video generation.
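To illustrate the idea behind an auto-condition curriculum, here is a minimal sketch and not the released implementation. It assumes the decoder input alternates between runs of ground-truth frames and runs of the model's own predictions, and that the predicted run grows linearly with the training epoch. The function name `auto_condition_mask` and the parameters `gt_len`, `growth_rate` and `max_pred_len` are hypothetical, chosen only for this example.

```python
def auto_condition_mask(seq_len, epoch, gt_len=10, growth_rate=1, max_pred_len=None):
    """Return a list of booleans, one per decoder step.

    True  -> feed the ground-truth previous frame (teacher forcing).
    False -> feed the model's own previous prediction (self-conditioning).

    Assumption for this sketch: the self-conditioned run length grows
    linearly with the epoch, so training drifts from fully guided to
    mostly autoregressive.
    """
    if gt_len < 1:
        raise ValueError("gt_len must be at least 1")

    # Hypothetical schedule: predicted-run length increases with the epoch.
    pred_len = growth_rate * epoch
    if max_pred_len is not None:
        pred_len = min(pred_len, max_pred_len)

    mask = []
    while len(mask) < seq_len:
        mask.extend([True] * gt_len)     # ground-truth run
        mask.extend([False] * pred_len)  # self-conditioned run
    return mask[:seq_len]


# At epoch 0 every step is teacher-forced; later epochs mix in predictions.
print(auto_condition_mask(10, epoch=0, gt_len=2))
print(auto_condition_mask(10, epoch=3, gt_len=2))
```

In a training loop, a mask like this would decide at each step whether the decoder receives the ground-truth frame or its own previous output, so the model is gradually exposed to its own errors instead of meeting them only at inference time.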
Here we show more dance samples generated from test-set music clips using the released pre-trained generation model.
As can be seen, the first and second samples are generated from the same music clip, yet their choreographies follow different patterns. The same holds for the third and fourth samples.
The statistics of the collected 2D dance motion dataset are presented below.
| Category | # of Clips (1 min) | FPS | Resolution |
The pre-trained music-conditioned dance generation model is available in our GitHub repository; please refer to the instructions there.
We have written a supplementary technical report on the music-to-dance synthesis task, including an analysis of Dancing to Music, the primary prior work on this topic, published at NeurIPS 2019.
We thank Kazuna Tsuboi, Sayuri Nishida, Shuo Wang, Ke Chen, Chengcheng Liu and Zhan (Cliff) Chen for their generous support, insightful discussions and kind help on this research project. We also thank Kazuna Tsuboi for authorizing the use of her portrait in the synthesized demo video. In addition, many thanks to the authors of OpenPose, Dancing2Music, LongFormer, Video-to-Video and 3D Human Pose Reconstruction. This website is based on the pixelnerf template, with special thanks to its author.
Dance Revolution: Long-Term Dance Generation with Music via Curriculum Learning assets are Copyright 2021 @ Microsoft Corporation, licensed under the MIT license.