Oct 22, 2025

Controllable Generation of Synthetic Video Data With Human Motion

Vaclav Knapp, Smichov Technical High School (SSPS)

As large-scale datasets have proven essential for generalization in AI models—across language, image, and video tasks—many internet-scraped datasets have emerged. In domains lacking such scale, synthetic data has filled the gap. Yet for video understanding—especially of human body actions—synthetic generation remains largely untapped. As a result, areas such as sign language translation, assistive gesture recognition, and human analysis in autonomous driving have yet to fully benefit. While text-to-video models (e.g., Sora or Veo) offer photorealism, they lack fine-grained control over pose, identity, and environment, limiting their usefulness for these tasks. We propose a toolkit for generating synthetic human action data that leverages recent advances in pose transfer and human animation, benchmarked through a custom participant study. Our method integrates into PyTorch pipelines, supporting dataset augmentation or creation. To encourage adoption, we release 100 fully controllable human avatars. On Toyota Smarthome and NTU RGB+D, our approach boosts baseline and few-shot action recognition performance by 2–3x across multiple architectures, showing the value of synthetic data for human-centric video tasks.
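
The post does not show the toolkit's actual API, so the sketch below is only a hypothetical illustration of the kind of PyTorch integration described: a placeholder SyntheticActionClips dataset stands in for rendered avatar clips and is concatenated with an existing real dataset for augmentation. All class names, parameters, and tensor shapes here are assumptions, not the released interface.

```python
# Hypothetical sketch: SyntheticActionClips is NOT the toolkit's real API,
# only a stand-in showing how synthetic clips could be mixed into a
# standard PyTorch video-classification pipeline.
import torch
from torch.utils.data import Dataset, ConcatDataset, DataLoader


class SyntheticActionClips(Dataset):
    """Placeholder dataset of pre-rendered synthetic human-action clips."""

    def __init__(self, num_clips=800, frames=16, height=224, width=224, num_classes=10):
        self.num_clips = num_clips
        self.frames = frames
        self.height = height
        self.width = width
        self.num_classes = num_classes  # arbitrary label count for illustration

    def __len__(self):
        return self.num_clips

    def __getitem__(self, idx):
        # Random tensors stand in for decoded video frames; a real toolkit
        # would load rendered avatar clips (and their action labels) from disk.
        clip = torch.rand(3, self.frames, self.height, self.width)
        label = idx % self.num_classes
        return clip, label


if __name__ == "__main__":
    real_data = SyntheticActionClips(num_clips=200)   # stand-in for real clips (e.g., Toyota Smarthome)
    synth_data = SyntheticActionClips(num_clips=800)  # stand-in for generated avatar clips

    # Augmentation here is simply concatenating real and synthetic clips before training.
    combined = ConcatDataset([real_data, synth_data])
    loader = DataLoader(combined, batch_size=8, shuffle=True)

    clips, labels = next(iter(loader))
    print(clips.shape, labels.shape)  # torch.Size([8, 3, 16, 224, 224]) torch.Size([8])
```

Any action-recognition model that consumes (batch, channels, frames, height, width) tensors could train on such a combined loader unchanged, which is the augmentation pattern the abstract alludes to.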