Artificial Intelligence - SEA-LION

Singapore regional AI model for Southeast Asia trained in 11 languages

While ChatGPT and Bard continue to bask in the spotlight, a group of researchers in Singapore is keen on developing large language models (LLMs) trained primarily on Southeast Asian data.

Dubbed SEA-LION, the artificial intelligence (AI) model is designed as an alternative to mainstream LLMs, tailored for Southeast Asia. The generative AI model is trained on data from 11 regional languages, including Bahasa Indonesia, Vietnamese, and Thai, and pays special attention to the region’s culture and traditions.

The project, primarily funded by Singaporean authorities, seeks to increase AI adoption among enterprises and individual users in the region. Previous attempts to use OpenAI’s ChatGPT in the region have produced unclear outputs because of the mismatch between the model’s training language and local languages and dialects.

“We are not trying to compete with the big LLMs,” said Leslie Teo, senior director for AI products at AI Singapore. “We are trying to complement them, so there can be better representation of us.”

Mainstream LLMs are typically trained in English, and despite the language’s reach, nearly 50% of the global population cannot tap the full potential of generative AI chatbots. To address this, governments are scrambling to assemble datasets in their local languages and build tailor-made chatbots that complement existing offerings.

“Regional LLMs are also needed because they support technology self-reliance,” said Oklahoma State University Assistant Professor Nuurrianti Jalli. “Less reliance on Western LLMs could provide better privacy for local populations, and also align better with national or regional interest.”

SEA-LION is expected to have an immediate impact among Southeast Asians, particularly local enterprises pivoting to AI. Paul Condylis, Associate Vice President of Data Science at Indonesian startup Tokopedia, notes that the model would be an integral addition for connecting with customers and improving and personalizing their experiences.

Southeast Asia has built an impressive reputation for embracing emerging technologies on par with North America and Europe. Alongside AI, the region is opening its borders to blockchain technology, with applications in finance, logistics, tourism, gaming, and entertainment.

The downsides of regional LLMs

While regional LLMs have been hailed for their localization, experts have flagged instances of bias and censorship in their use. There are also palpable fears that local AI systems may lack sufficient information about global perspectives, presenting a “revisionist view of history.”

“The models may fail to surface important socio-political issues like human rights abuse, corruption, or valid criticism of political powers,” said Jalli.

Others have pointed to the use of regional LLMs by authoritarian governments to crack down on dissent and oppress minorities. To ensure that LLMs reflect the cultural nuances of the people and remain neutral in their outputs, experts are pushing for high-quality training data free of bias and anti-democratic tendencies.

For artificial intelligence (AI) to function properly within the law and thrive in the face of growing challenges, it needs to integrate an enterprise blockchain system that ensures data input quality and ownership, keeping data safe while also guaranteeing its immutability. Check out CoinGeek’s coverage on this emerging tech to learn more about why enterprise blockchain will be the backbone of AI.

