OpenAI announces new text-to-video tool: Sora

OpenAI, the company behind ChatGPT, has announced a new text-to-video AI tool. Sora is an AI model that can create realistic and imaginative scenes from text instructions.

What can Sora do?

OpenAI says that Sora can generate videos up to a minute long based on a text description. The company also says that the AI tool will maintain visual quality and will follow the user’s prompt throughout the video. In addition, Sora can generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. OpenAI claims that the AI model understands what the user has asked for in the prompt and how those things exist in the physical world.

Multiple shots

OpenAI goes on to say that Sora has a deep understanding of language. The company says that this means it can accurately interpret prompts and generate compelling characters that express vibrant emotions. In addition, Sora can create multiple shots within a single generated video. The sample videos shared by OpenAI have cuts and shot size changes; however, these haven't been edited. Sora generated the videos with a selection of different shots.

Detailed descriptions

Some of the Sora sample videos shared by OpenAI are based on detailed text descriptions. For example, one prompt was “A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.” The resulting 59-second video is stunning.

Realistic videos

Sora can also generate very realistic videos with more general prompts. Another of the samples came from the description “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.” This suggests that the AI model understands a wide range of factors, including the age of a human, the nature of 35mm film, and what a movie trailer is. There is also a drone shot video of Big Sur based on emotional prompts such as: “This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.”

Animation

OpenAI has also shared videos demonstrating that Sora can generate animations in different styles. There is a “Monsters Inc” style video of a creature beside a candle, which came from a prompt using terms such as “The art style is 3D and realistic, with a focus on lighting and texture.” Another 20-second video was generated from the simple prompt “A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures.” Despite the brevity of that prompt, the video includes a tracking shot into the reef, then close-ups of fish, coral, turtles, and a seahorse. It also features shallow depth of field shots.

What can’t Sora do?

OpenAI has confirmed that the current Sora model does have some weaknesses. The company said that the AI model can struggle with accurately simulating the physics of a complex scene. It also may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark. In addition, Sora can confuse spatial details of a prompt, such as mixing up left and right. Further, the model may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory.

Can you use Sora?

OpenAI hasn’t made Sora available to use yet as the company is carrying out safety checks. It is working with “red teamers,” who are cybersecurity experts in areas like misinformation, hateful content, and bias. OpenAI is also building tools to help detect misleading content, such as a detection classifier that will show when a video was generated by Sora. This is intended to help identify deepfake videos made with the AI model. In addition, the company says that its text classifier will reject certain text prompts, such as those requesting extreme violence, sexual content, hateful imagery, celebrity likeness, or the intellectual property of others.

What we think

The videos generated by Sora which OpenAI has shared are very impressive. While not 100% photorealistic, some of them are very close. In addition, the AI model appears to have some understanding of the emotional descriptions in the prompts used to generate the videos. Terms such as “stylish,” “raw beauty” and “gorgeously rendered” are contextual and subjective to an extent. It is also good to see that OpenAI is taking steps to prevent the creation of deepfakes and offensive content. As Sora improves and develops, the videos it creates can only get better. As such, there need to be controls in place to prevent the misuse of the technology.

Pete Tomkies
Pete Tomkies is a freelance cinematographer and camera operator from Manchester, UK. He also produces and directs short films as Duck66 Films. Pete's latest short Once Bitten... won 15 awards and was selected for 105 film festivals around the world.