Alibaba introduces new AI-based video generation tool to rival early movers

Alibaba Cloud, a subsidiary of Chinese technology giant Alibaba (NASDAQ: BABA), has announced a new text-to-video generator based on artificial intelligence (AI).

The new AI model, dubbed I2VGen-XL, has shown proficiency in generating high-quality videos from a variety of inputs, according to the project's GitHub repository. Besides being visually striking, the model’s output is described as “semantically accurate,” which reduces the chances of errors, hallucinations, or sycophancy.

“VGen can produce high-quality videos from the input text, images, desired motion, desired subjects, and even the feedback signals provided,” read the GitHub announcement.

Described as an open-source video generation codebase, VGen allows users to train their own text-to-video models. By running a single Python command, users can train custom models and perform inference.

The repository supports compositional video synthesis with motion controllability, instruction of video models with human feedback, and scaled text-to-video (T2V) generation. It also features several pre-trained models for multiple tasks.

“It also offers a variety of commonly used video generation tools such as visualization, sampling, training, inference, join training using images and videos, acceleration, and more,” read the statement.

VGen owes its advanced features to a massive training dataset comprising 6 billion text-image pairs and 35 million text-video pairs, per the announcement. This deep pool of training data gives the model versatility and increased accuracy across several use cases.

The team behind the model has released technical papers and an official webpage to introduce it to researchers. Users can access pre-trained models and code for generating 1280×720 pixel videos, putting it on par with existing offerings.

In the future, the team says it will unveil new models specifically designed to generate videos of human bodies and an updated version for motion capture.

Alibaba moves forward with emerging technologies

Alibaba’s foray into AI has seen it launch a large language model (LLM), Tongyi Qianwen, to compete with Meta’s (NASDAQ: META) Llama 2. Not resting on its laurels, the company introduced its “Animate Anyone” offering designed to generate videos from static photos via its proprietary ReferenceNet framework.

A partnership with Web3 firm Avalanche in early 2023 saw Alibaba enter the metaverse despite its previous stance on blockchain technology. The raging semiconductor cold war between the U.S. and China has since slowed Alibaba’s march in AI and quantum computing as the company looks inward for new solutions.

For artificial intelligence (AI) to operate within the law and thrive in the face of growing challenges, it needs to integrate an enterprise blockchain system that ensures the quality and ownership of input data, keeping that data safe while guaranteeing its immutability. Check out CoinGeek’s coverage of this emerging tech to learn more about why enterprise blockchain will be the backbone of AI.

