Veo2 deals OpenAI's Sora a knockout blow on par with Deepseek's

While China's Deepseek has shocked the global AI market with its strong performance, models that surpass OpenAI's are also emerging in the race among AI video generation models.

OpenAI's Sora surprised many as a text-to-video model, but the evolution of AI never ends.

Veo2, unveiled by Google DeepMind in December 2024, is regarded as the text-to-video model whose output comes closest to what the human eye sees. In the video generation AI market, where models create video from text or from other video, the competitors include Pika 2.0, Luma AI's Ray2, and OpenAI's Sora.

Video made with Veo2 (Credit: Paul Trillo)

There is currently no recognized benchmark or universally agreed standard method for evaluating video generation models. Even so, Veo2 is said to already surpass every other video generation model in quality, realism, prompt adherence and interpretation, camera angles, and more.

One of the biggest problems with video generation AI is that it produces a different result every time a prompt is entered, and Veo2 has reportedly resolved this inconsistency to some extent. Some say it is good enough to be used in film production.