Dance Revolution: Long-Term Dance Generation with Music via Curriculum Learning

Music-Conditioned 2D Dance Choreography

  • Ruozi Huang*¹
  • Huang Hu*²
  • Wei Wu³
  • Kei Sawada⁴
  • Mi Zhang¹
  • Daxin Jiang²

¹Fudan University      ²Microsoft STCA      ³Meituan      ⁴Rinna AI

(³,⁴ Work done while at Microsoft STCA)

We present a transformer-LSTM hybrid architecture for music-to-dance synthesis and propose a novel curriculum learning strategy, the dynamic auto-condition learning approach, to alleviate the severe exposure bias (the mismatch between training on ground-truth inputs and generating from the model's own predictions at inference time) that arises in long-term dance motion generation with music. We also release a new 2D dance dataset containing dance motions in three styles (ballet, hip-hop and Japanese pop) extracted from real dance videos available online. Our model generates realistic and smooth dance motions, some of which are in harmony with test-set music in terms of style and beat matching. The generated motion sequences can last one minute at 15 frames per second (FPS). Combined with 3D human pose reconstruction and animation software, this technique can drive various 3D character avatars, such as the 3D avatar of Hatsune Miku, and has great potential for virtual advertisement video generation.
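To make the curriculum idea more concrete, here is a minimal Python sketch of one generic way to auto-condition a sequence decoder during training. It assumes the decoder alternates between a fixed number of steps fed with ground-truth frames and a growing number of steps fed with its own previous predictions, and that the self-generated span grows linearly with the training epoch. The function names (`generated_steps`, `input_mask`, `unroll`), the linear schedule and its parameters are hypothetical illustrations, not the released implementation or necessarily the paper's exact schedule.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def generated_steps(epoch: int, start: int = 0, growth: int = 1, max_steps: int = 10) -> int:
    """Hypothetical linear schedule: number of consecutive self-generated
    input steps used at a given training epoch, capped at max_steps."""
    return min(start + growth * epoch, max_steps)


def input_mask(seq_len: int, gt_steps: int, gen_steps: int) -> List[bool]:
    """True = feed the ground-truth previous frame, False = feed the model's
    own previous prediction. Alternates gt_steps ground-truth steps with
    gen_steps self-generated steps; gen_steps == 0 is plain teacher forcing."""
    if gt_steps < 1:
        raise ValueError("gt_steps must be at least 1")
    period = gt_steps + gen_steps
    return [(t % period) < gt_steps for t in range(seq_len)]


def unroll(step_fn: Callable[[T], T], gt_frames: Sequence[T], mask: List[bool], seed: T) -> List[T]:
    """Run a one-step decoder for len(mask) steps. Step 0 is always fed the
    seed; step t > 0 is fed gt_frames[t - 1] if mask[t] is True, otherwise
    the prediction made at step t - 1."""
    outputs: List[T] = []
    prev_pred = seed
    for t, use_gt in enumerate(mask):
        if t == 0:
            x = seed
        else:
            x = gt_frames[t - 1] if use_gt else prev_pred
        prev_pred = step_fn(x)
        outputs.append(prev_pred)
    return outputs


if __name__ == "__main__":
    # Toy "decoder" on scalar frames: predicts the next frame as input + 1.
    gt = [10, 20, 30, 40, 50]
    for epoch in (0, 1, 3):
        p = generated_steps(epoch)
        mask = input_mask(len(gt), gt_steps=1, gen_steps=p)
        print(epoch, mask, unroll(lambda x: x + 1, gt, mask, seed=0))
```

At epoch 0 the mask is all `True`, which is ordinary teacher forcing; as the epoch grows, longer runs of the decoder's own outputs are fed back in (at epoch 3 the toy run yields `[1, 2, 3, 4, 41]`), so training increasingly resembles free-running generation at inference time.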

Paper Overview

More Dance Generation Samples

Below are more dance samples generated by the released pre-trained model from music clips in the test set.


The first and second samples are generated from the same music clip, yet they produce different choreographic patterns; the same holds for the third and fourth samples.



2D Dance Motion Dataset

The statistics of the collected 2D dance motion dataset are presented below.

Category        # of Clips (1 min each)   FPS   Resolution
Ballet          136                       15    720p
Hip-hop         298                       15    720p
Japanese Pop    356                       15    720p
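As a quick sanity check on the table, the snippet below computes the dataset size it implies. It assumes every clip lasts exactly one minute at 15 FPS, as the table header states; the variable names are illustrative only.

```python
CLIP_SECONDS = 60  # assumption: each clip lasts exactly 1 minute
FPS = 15

CLIPS_PER_STYLE = {"Ballet": 136, "Hip-hop": 298, "Japanese Pop": 356}

frames_per_clip = CLIP_SECONDS * FPS
frames_per_style = {style: n * frames_per_clip for style, n in CLIPS_PER_STYLE.items()}
total_clips = sum(CLIPS_PER_STYLE.values())
total_frames = sum(frames_per_style.values())
total_hours = total_clips * CLIP_SECONDS / 3600

if __name__ == "__main__":
    print(f"frames per clip: {frames_per_clip}")
    for style, frames in frames_per_style.items():
        print(f"{style}: {frames} frames")
    print(f"total: {total_clips} clips, {total_frames} frames, {total_hours:.2f} h")
```

Under that assumption each clip has 900 frames, matching the length of a one-minute generated sequence at 15 FPS, and the whole dataset amounts to 790 clips, 711,000 frames, or roughly 13.17 hours of dance.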


The pre-trained music-conditioned dance generation model is available in our GitHub repository; please refer to the instructions there.

Supplementary Technical Report

We have written a supplementary technical report on the music-to-dance synthesis task, including an analysis of Dancing to Music, the main prior work on this topic, published at NeurIPS 2019.

More Thanks

We thank Kazuna Tsuboi, Sayuri Nishida, Shuo Wang, Ke Chen, Chengcheng Liu and Zhan (Cliff) Chen for their generous support, insightful discussions and kind help on this research project. We also thank Kazuna Tsuboi for authorizing the use of her portrait in the synthesized demo video. Many thanks as well to the authors of OpenPose, Dancing2Music, Longformer, Video-to-Video and 3D Human Pose Reconstruction. This website is inspired by the pixelNeRF project page template; special thanks to its author.

Dance Revolution: Long-Term Dance Generation with Music via Curriculum Learning assets are Copyright © 2021 Microsoft Corporation, licensed under the MIT license.