Friday, May 29, 2026

Pleias, GSMA Launch ‘CommonLingua’, Open Source Language Identification Model Supporting 61 African Languages

Pleias and the GSMA have announced the release of CommonLingua, an open-source language identification (LID) model purpose-built to unlock African language data at scale.

The model is delivered under the GSMA’s ‘AI Language Models in Africa, by Africa, for Africa’ initiative, a coalition dedicated to closing the African language gap in AI.

Africa is home to more than 2,000 living languages, many of which remain underrepresented in AI training data. As a result, language identification systems often perform less reliably on African-language content, particularly when distinguishing between closely related or code-mixed text.

Before a Swahili, Yoruba, or Wolof language model can be built, the underlying text must first be correctly identified by language – a step where existing tools often fail on African content.

This is because leading LID systems such as fastText, GlotLID, and OpenLID were built around European and Asian high-resource languages and frequently mislabel African-language text as English or French. Even state-of-the-art frontier models drop roughly 30 points in accuracy on African languages compared to major world languages.

CommonLingua is designed to fix this first step of the pipeline. On the new CommonLID benchmark, CommonLingua achieves 83% accuracy and a macro F1 score of 0.79, outperforming leading LID models by more than 10 percentage points under comparable evaluation conditions, while using roughly one three-hundredth of the parameters.
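For readers unfamiliar with the two metrics, the sketch below shows how accuracy and macro F1 are conventionally computed and why they can diverge when some languages are rare. The labels and predictions are made up for illustration, and the function names are hypothetical; the article does not describe the CommonLID benchmark's exact scoring protocol.

```python
def accuracy(gold, pred):
    """Share of texts whose predicted language matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-language F1, so rare languages count equally."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        if tp == 0:
            scores.append(0.0)  # language never correctly identified
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores)


# Illustrative data only: one Yoruba text is mislabelled as English.
gold = ["eng"] * 8 + ["yor", "wol"]
pred = ["eng"] * 8 + ["eng", "wol"]

print(f"accuracy = {accuracy(gold, pred):.2f}")
print(f"macro F1 = {macro_f1(gold, pred):.2f}")
```

In this toy case a single error on a rare language barely moves accuracy but pulls macro F1 down sharply, which is why the macro score is a useful companion figure for low-resource languages.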

The model is lightweight, with 2 million parameters and an 8 MB checkpoint, and is designed for efficient deployment: it processes approximately 20 texts per second on CPU and up to 3,000 texts per second on a single GPU.
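As a rough back-of-the-envelope illustration of what these figures imply, the snippet below estimates processing time for a corpus of one million texts. The corpus size is a hypothetical choice, the throughput rates are the approximate ones quoted above, and the bytes-per-parameter figure is only an inference from the stated sizes, not a published detail.

```python
def hours_to_process(num_texts, texts_per_second):
    """Wall-clock hours needed at a constant throughput."""
    return num_texts / texts_per_second / 3600


corpus_size = 1_000_000  # hypothetical corpus, chosen for illustration

cpu_hours = hours_to_process(corpus_size, 20)     # ~20 texts/s on CPU
gpu_hours = hours_to_process(corpus_size, 3000)   # up to 3,000 texts/s on GPU

# 8 MB spread over 2 million parameters is about 4 bytes each,
# which would be consistent with 32-bit weights (an inference only).
bytes_per_param = 8_000_000 / 2_000_000

print(f"CPU: about {cpu_hours:.1f} hours")
print(f"GPU: about {gpu_hours * 60:.1f} minutes")
print(f"Approx. bytes per parameter: {bytes_per_param:.0f}")
```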

CommonLingua covers 334 languages in total, including 61 African languages across eight language families: Bantu (21), Niger-Congo / West African (18), Afro-Asiatic and Semitic (7), Cushitic and Chadic (4), Berber (3), Nilo-Saharan (3), and pidgins, creoles, and other (5).

The model operates directly on UTF-8-byte sequences rather than relying on a language-specific tokenizer, enabling consistent handling across scripts including Latin, Arabic, Ethiopic, N’Ko, and Tifinagh.
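To make the byte-level idea concrete, the sketch below converts short sample strings in several scripts into UTF-8 byte values. It only illustrates the input representation: the helper name `to_byte_ids` is hypothetical, and how CommonLingua itself batches or pads its byte sequences is not described in the announcement.

```python
def to_byte_ids(text: str) -> list[int]:
    """Encode text as UTF-8 and return the bytes as integers in 0..255."""
    return list(text.encode("utf-8"))


# Sample strings written as escapes so the code points are unambiguous.
samples = {
    "Latin": "Habari",
    "Arabic": "\u0633\u0644\u0627\u0645",
    "Ethiopic": "\u1230\u120b\u121d",
    "N'Ko": "\u07d2\u07de\u07cf",
    "Tifinagh": "\u2d30\u2d63\u2d53\u2d4d",
}

for script, text in samples.items():
    ids = to_byte_ids(text)
    print(f"{script}: {len(text)} characters -> {len(ids)} bytes, first ids {ids[:4]}")
```

Whatever the script, the input is a sequence over the same 256 possible byte values, so no per-language vocabulary is needed; scripts outside the ASCII range simply use two or three bytes per character.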

“African languages are not an edge case. They are the working languages of hundreds of millions of people, and they deserve AI infrastructure built with the same care as any other language. CommonLingua is deliberately the first brick we are laying: you cannot curate what you cannot identify,” said Pierre-Carl Langlais, Co-founder and Chief Technology Officer, Pleias.

The model is trained exclusively on open-licensed and public-domain content aggregated through the Common Corpus project, including Wikipedia, scientific publications in OpenAlex, VOA Africa, WaxalNLP, Cultural Heritage, and Pralekha. All datasets are released under permissive licenses.

 

Louis Powell, Director of AI Initiatives at GSMA, added: “Closing the gap in African-language AI is fundamental to digital inclusion and unlocking economic opportunity. Progress has long been held back by the lack of foundational infrastructure, beginning with something as essential as language identification. CommonLingua addresses this critical gap, enabling the development of richer datasets and more representative AI systems at scale. Through our initiative, the GSMA is bringing partners together to move beyond fragmented efforts towards shared infrastructure that can power Africa’s digital ecosystem.”

This conversation will continue at MWC26 Kigali, where GSMA and partners will bring together industry leaders to accelerate progress on African-language AI. Register now to be part of the discussion.

 

About Pleias

Pleias is a research lab and AI company specialising in open, auditable language models trained exclusively on permissively licensed data. 

Pleias develops the Common Corpus, the largest fully open multilingual pretraining dataset, and the Pleias family of small language models optimised for retrieval, reasoning, and low-resource languages. 

 

About the GSMA 
The GSMA is a global organisation unifying the mobile ecosystem to discover, develop, and deliver innovation foundational to positive business environments and societal change. Our vision is to unlock the full power of connectivity so that people, industry, and society thrive. 

Representing mobile operators and organisations across the mobile ecosystem and adjacent industries, the GSMA delivers for its members across three broad pillars: Connectivity for Good, Industry Services and Solutions, and Outreach.

This activity includes advancing policy; tackling today’s biggest societal challenges; underpinning the technology and interoperability that make mobile work; and providing the world’s largest platform to convene the mobile ecosystem at the MWC and M360 series of events. 
