Chinese researchers have developed a new compression technique for large language models (LLMs) that aims to ease the hardware limitations associated with their deployment.
A paper by researchers from Baichuan Inc. and the Chinese Information Processing Laboratory at the Institute of Software, Chinese Academy of Sciences, presents a new compression method for LLMs that builds on previous pruning techniques to cut inference costs without additional training. The researchers say the method, dubbed ShortGPT, offers users a way to keep pace with the growing parameter counts of AI models.
Newer generations of LLMs carry billions of parameters, which pushes the limits of their performance but comes at a steep price during deployment. Researchers and enterprises routinely run into hardware limitations when deploying LLMs, creating demand for new solutions.
The China-based researchers rely on a new metric, Block Influence (BI), which measures how much each layer transforms the hidden states in an LLM. The system scores every layer in this way, treating a low score as a sign that removing the layer would have little impact during inference.
Layers with low BI scores are considered redundant and are pruned to fit hardware requirements. According to the researchers, removing these layers is possible “without compromising model performance.”
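The scoring-and-pruning idea can be illustrated with a short Python sketch. It is not the researchers' implementation: it assumes BI is one minus the average cosine similarity between the hidden states entering and leaving a layer, which is our reading of the paper's definition, and the function names and toy hidden states are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def block_influence(inputs, outputs):
    """Assumed BI: 1 - mean cosine similarity between a layer's
    per-token input and output hidden states."""
    sims = [cosine_similarity(x, y) for x, y in zip(inputs, outputs)]
    return 1.0 - sum(sims) / len(sims)


def layers_to_prune(hidden_states, num_to_remove):
    """hidden_states[i] holds the token vectors entering layer i;
    the last entry is the output of the final layer.
    Returns the indices of the lowest-BI layers and all BI scores."""
    scores = [
        block_influence(hidden_states[i], hidden_states[i + 1])
        for i in range(len(hidden_states) - 1)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(ranked[:num_to_remove]), scores


# Hypothetical hidden states for two tokens passing through three layers.
toy_states = [
    [[1.0, 0.0], [0.0, 1.0]],    # input to layer 0
    [[1.0, 1.0], [-1.0, 1.0]],   # after layer 0: rotated by 45 degrees
    [[2.0, 2.0], [-2.0, 2.0]],   # after layer 1: rescaled, direction unchanged
    [[-2.0, 2.0], [-2.0, -2.0]], # after layer 2: rotated by 90 degrees
]

pruned, scores = layers_to_prune(toy_states, num_to_remove=1)
print("BI scores:", scores)
print("Layers selected for removal:", pruned)
```

In this toy example, layer 1 only rescales its input, so its BI score is close to zero and it is the one selected for removal. In a real model, the hidden states would be collected by running sample text through the network.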
“Experiments demonstrate that our method, which we call ShortGPT, significantly outperforms previous state-of-the-art (SOTA) methods in model pruning,” the paper reads.
A key feature of ShortGPT is that it works independently of quantization methods, which shrink models by lowering the numerical precision of their weights and, in some variants, require additional training.
“Moreover, ShortGPT is orthogonal to quantization-like methods, enabling further reduction in parameters and computation,” said the researchers. “The ability to achieve better results through simple layer removal, as opposed to more complex pruning techniques, suggests a high degree of redundancy in the model architecture.”
China’s whole-hearted embrace of AI
China has taken a positive stance on AI in recent years as it seeks to match the pace of innovation in the U.S. and Europe. Plans are underway to improve the capacities of local AI, blockchain technology, and quantum computing service providers amid a brewing cold war with the United States.
Mainland China’s local AI ecosystem is a beehive of activity, underscored by an avalanche of commercial rollouts of generative AI offerings by technology companies. Despite this forward-leaning posture, Chinese authorities are keen to prevent AI misuse through strict regulations and heavy-handed enforcement.