OpenAI’s Sora Can Generate Videos From Text And Still Images


Sora, a product of OpenAI, is changing the landscape of generative AI when it comes to creating video from text.

It is said to be capable of generating entire videos all at once or extending generated videos to make them longer. By giving the model many frames at a time to work with, Sora sidesteps the problem of keeping a subject consistent even when it goes out of view temporarily.

Here are some of Sora’s notable features:

* It is said to generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompts.

* It can also create multiple shots within a single generated video that accurately persist characters and visual style. This makes it well suited to generating stories that involve persistent characters.

* It is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background.

* In addition to being able to generate a video solely from text instructions, the model is able to take an existing still image and generate a video from it, animating the image’s contents with accuracy and attention to small detail. The model can also take an existing video and extend it or fill in missing frames.

Certain pitfalls include:

Sora’s “current model may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark,” as stated on the OpenAI website.

Also, it may “confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory.”

The company stated that it is making Sora available to red teamers to assess critical areas for harms or risks. “We are also granting access to a number of visual artists, designers, and filmmakers to gain feedback on how to advance the model to be most helpful for creative professionals.”

“We’re sharing our research progress early to start working with and getting feedback from people outside of OpenAI and to give the public a sense of what AI capabilities are on the horizon.”

Safety Concerns

OpenAI says it will be taking several important safety steps ahead of making Sora available in its products. “We are working with red teamers — domain experts in areas like misinformation, hateful content, and bias — who will be adversarially testing the model,” the company says.

Misleading content is already widespread, with deepfakes and the like. OpenAI says it is building tools to help detect misleading content, such as a detection classifier that can tell when a video was generated by Sora.

They also plan to include C2PA metadata in the future if they deploy the model in an OpenAI product.
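C2PA (Coalition for Content Provenance and Authenticity) metadata records how a piece of media was made and binds that claim to the file itself. As a rough illustration of the idea — not OpenAI’s actual implementation, and much simpler than a real C2PA manifest, which is a signed binary structure embedded in the media file — here is a sketch in Python; all names here are hypothetical:

```python
import hashlib

def make_provenance_record(video_bytes, generator="hypothetical AI video tool"):
    """Build a simplified C2PA-style provenance record for a media asset.

    Real C2PA manifests are cryptographically signed and embedded in the
    file; this sketch only captures the core idea: a claim about how the
    asset was made, bound to the asset's exact bytes by a content hash.
    """
    return {
        "claim_generator": generator,
        "assertions": [
            {"label": "c2pa.actions",
             "data": {"actions": [{"action": "c2pa.created"}]}},
        ],
        # Content binding: the hash ties the claim to these exact bytes.
        "content_hash": hashlib.sha256(video_bytes).hexdigest(),
    }

def verify_provenance(video_bytes, record):
    """Check that the record's hash still matches the media bytes."""
    return record["content_hash"] == hashlib.sha256(video_bytes).hexdigest()

video = b"\x00fake-video-bytes"
record = make_provenance_record(video)
print(verify_provenance(video, record))         # True: bytes unchanged
print(verify_provenance(video + b"x", record))  # False: bytes were altered
```

The content hash is what makes the metadata useful: if anyone edits the video after the record is created, verification fails, signaling that the provenance claim no longer applies.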

To further strengthen safety, they intend to leverage the existing safety methods that were built for their products that use DALL·E 3, which are applicable to Sora as well.

“For example, once in an OpenAI product, our text classifier will check and reject text input prompts that are in violation of our usage policies, like those that request extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others. We’ve also developed robust image classifiers that are used to review the frames of every video generated to help ensure that it adheres to our usage policies, before it’s shown to the user.”
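The flow described in the quote — check the text prompt against usage policies and reject it before any generation happens — can be sketched as follows. This is purely illustrative: OpenAI’s real classifier is a learned model, whereas this toy version uses a hand-written term list, and all function names are made up:

```python
# Illustrative sketch only: OpenAI's actual text classifier is a learned
# model. This toy version shows the gatekeeping flow with a keyword list.
BLOCKED_TERMS = {"extreme violence", "sexual content", "hateful imagery"}

def check_prompt(prompt):
    """Return (allowed, reason), rejecting prompts that match policy terms."""
    lowered = prompt.lower()
    for term in BLOCKED_TERMS:
        if term in lowered:
            return False, f"rejected: prompt mentions '{term}'"
    return True, "accepted"

def generate_video(prompt):
    """Run the policy check before any (hypothetical) generation step."""
    allowed, reason = check_prompt(prompt)
    if not allowed:
        return reason
    return f"generating video for: {prompt}"

print(generate_video("a corgi surfing at sunset"))
print(generate_video("a scene of extreme violence"))
```

The key design point is that the check sits in front of the generator, so a disallowed prompt never reaches the model at all; the image classifiers described in the quote would then act as a second gate on the generated frames.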

“We’ll be engaging policymakers, educators and artists around the world to understand their concerns and to identify positive use cases for this new technology. Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it. That’s why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time,” OpenAI says.

Sora serves as a foundation for models that can understand and simulate the real world, a capability OpenAI believes will be an important milestone toward achieving AGI.