Segmenting a dance video into short movements is a popular way to easily understand dance choreography. However, segmentation is currently done manually and requires significant effort by experts. As a result, even though many dance videos are available on social media (e.g., TikTok and YouTube), it remains difficult for people, especially novices, to casually watch short video segments to practice dance choreography. In this paper, we propose a method to automatically segment a dance video into individual movements. Given a dance video as input, we first extract visual and audio features: the former is computed from the keypoints of the dancer in the video, and the latter is computed from the Mel spectrogram of the music in the video. These features are then passed to a Temporal Convolutional Network (TCN), and segmentation points are estimated by picking peaks of the network output. To build our training dataset, we annotated segmentation points in dance videos from the AIST Dance Video Database, a shared database containing original street dance videos with copyright-cleared dance music. Our evaluation shows that the proposed method (i.e., combining the visual and audio features) can estimate segmentation points with high accuracy. In addition, we developed an application that helps dancers practice choreography using the proposed method.
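Because the abstract describes the pipeline only at a high level, the sketch below illustrates one possible realization, assuming PyTorch for the TCN, librosa for the Mel spectrogram, and SciPy for peak picking. The block structure, dilation schedule, feature dimensions, and thresholds are illustrative assumptions rather than the paper's exact architecture, and per-frame keypoint features are assumed to have been extracted upstream by a pose estimator.

```python
import numpy as np
import torch
import torch.nn as nn
import librosa
from scipy.signal import find_peaks


class TCNBlock(nn.Module):
    """One dilated temporal convolution block with a residual connection."""
    def __init__(self, channels, dilation):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(x + self.conv(x))


class SegmentationTCN(nn.Module):
    """Maps fused per-frame visual+audio features to a per-frame boundary score."""
    def __init__(self, in_dim, hidden=64, n_blocks=4):
        super().__init__()
        self.inp = nn.Conv1d(in_dim, hidden, kernel_size=1)
        self.blocks = nn.Sequential(
            *[TCNBlock(hidden, dilation=2 ** i) for i in range(n_blocks)])
        self.out = nn.Conv1d(hidden, 1, kernel_size=1)

    def forward(self, x):  # x: (batch, in_dim, frames)
        h = self.blocks(self.inp(x))
        return torch.sigmoid(self.out(h)).squeeze(1)  # (batch, frames)


def audio_features(wav_path, n_mels=80, hop_length=512):
    """Log-Mel spectrogram of the music track (n_mels x frames)."""
    y, sr = librosa.load(wav_path, sr=None)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                         hop_length=hop_length)
    return librosa.power_to_db(mel)


def segment(model, visual_feats, audio_feats, threshold=0.5, min_gap=15):
    """Fuse features, run the TCN, and pick peaks as segmentation points."""
    frames = min(visual_feats.shape[1], audio_feats.shape[1])
    fused = np.concatenate([visual_feats[:, :frames],
                            audio_feats[:, :frames]], axis=0)
    x = torch.from_numpy(fused).float().unsqueeze(0)  # (1, in_dim, frames)
    with torch.no_grad():
        scores = model(x).squeeze(0).numpy()
    # Estimated segment boundaries = peaks of the per-frame score curve
    peaks, _ = find_peaks(scores, height=threshold, distance=min_gap)
    return peaks
```

In this sketch, a model would be built as, e.g., `SegmentationTCN(in_dim=visual_dim + 80)` so that the input channel count matches the concatenated keypoint and Mel features; the doubling dilations let the network's receptive field cover several seconds of context when scoring each frame as a potential segmentation point.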
Koki Endo, Shuhei Tsuchida, Tsukasa Fukusato, and Takeo Igarashi. 2024.
Automatic Dance Video Segmentation for Understanding Choreography. In
9th International Conference on Movement and Computing (MOCO ’24), May
30-June 2, 2024, Utrecht, Netherlands. ACM, New York, NY, USA, 9 pages.
https://doi.org/10.1145/3658852.3659076