Meta revealed Movie Gen, its third-wave multimodal video AI, on Friday. The company says it can “produce custom videos and sounds, edit existing videos, and transform your personal image into a unique video,” and claims it outperforms similar models such as Runway’s Gen-3, Kuaishou Technology’s Kling 1.5, and OpenAI’s Sora.
Meta Movie Gen builds on the company’s earlier work, first with its multimodal Make-A-Scene models, and then Llama’s image foundation models. Movie Gen combines that work into a collection of four models (video generation, personalized video generation, precise video editing, and audio generation) that together improve the creator’s fine-grained control. “We anticipate these models enabling various new products that could accelerate creativity,” the company wrote in its announcement post.
For video generation, Movie Gen relies on a 30B-parameter model that outputs up to 16-second clips, albeit at a pokey 16 frames per second (fps). “These models can reason about object motion, subject-object interactions, and camera motion, and they can learn plausible motions for a wide variety of concepts,” Meta said, “making them state-of-the-art models in their category.” Using that same model, Movie Gen can create personalized videos for creators from still images.
Meta employs a variant of that video-generation model that uses both video- and text-based inputs to precisely edit the content that it generates. It can perform both localized edits, such as adding, removing, or replacing elements, and global edits, like applying a new cinematic style. To generate audio, Movie Gen relies on a separate 13B-parameter model that can create up to 45 seconds of audio (be it ambient background noise, sound effects, or instrumental scores) while automatically syncing that content to the video.
According to Meta’s white paper, Movie Gen consistently won out in A/B tests against other state-of-the-art video AIs, including Gen-3, Sora, and Kling 1.5, in the category of video generation. It also topped ID-Animator in personalized video generation and Pika Labs Sound Gen in audio generation, and it bested Gen-3 a second time in video editing. Based on the demo videos we’ve seen so far, Movie Gen far outclasses the current batch of free-to-use video generators as well.
The company says it plans to “work closely with filmmakers and creators to integrate their feedback” as it continues to develop these models, but it was quick to point out that it has no intention of displacing human creators with AI. “We’re sharing this research because we believe in the power of this technology to help people express themselves in new ways and to provide opportunities to people who might not otherwise have them,” the company wrote. “Our hope is that perhaps one day in the future, everyone will have the opportunity to bring their artistic visions to life and create high-definition videos and audio using Movie Gen.”