How Banco de Portugal Strengthens LEI Data Quality at Scale
Ana Sofia Afonso, Data Scientist in the Data Management Division at Banco de Portugal, shares a practical approach for maintaining the highest data quality standards at scale – combining AI-enabled cross-checking against authoritative national sources with GLEIF's API-enabled bulk challenge facility.
Author: Ana Sofia Afonso, Data Scientist in the Data Management Division at Banco de Portugal
Date: 2026-03-31
High-quality Legal Entity Identifier (LEI) data is key to ensuring that organizations globally can trust and be trusted. Yet this quality cannot be realized through ad hoc, one-off manual "clean-ups" that are inconsistent, slow, and costly. Instead, it increasingly demands auditable, repeatable workflows designed to improve quality at scale while reducing manual processes.
Take the challenge of knowing when a lapsed LEI – which indicates that renewal has not occurred on time – should be 'retired' to signal that the legal entity has ceased operations. How can this be achieved at scale? And, crucially, how can decisions be supported with clear, consistent, and verifiable evidence?
In this blog post, Ana Sofia Afonso, Data Scientist in the Data Management Division at Banco de Portugal, explains how Banco de Portugal addressed this challenge. By combining machine learning (ML) and AI-based algorithms with rigorous quality controls and expert validation to identify LEIs that are eligible for retirement, the bank strengthened data consistency and governance across national and international reference systems. The approach offers a blueprint for how all LEI data users can help to increase timeliness, accuracy, and reliability across the Global LEI System.
Understanding LEIs in a National Reference Data Environment
In Portugal, every resident legal entity must hold a national identifier for legal and fiscal purposes. LEIs, however, are only mandatory in specific regulatory contexts. As a result, LEI coverage is more limited than national identifier coverage. In addition, LEI lifecycle events are often triggered by external reporting obligations rather than by actual changes in an entity’s legal status.
This creates a structural challenge. As national business registers evolve, LEI data – particularly for entities that stop renewing their reference data – can fall out of sync. Over time, we observed that this presents several recurring issues:
LEIs remaining lapsed after the corresponding entities had become inactive in the national business register;
Inconsistencies between national identifiers recorded in GLEIF and those held by national authorities (the source data for Banco de Portugal's reference data systems);
The need for manual investigations that were time-consuming, difficult to prioritize, and impossible to scale effectively.
Why Lapsed LEIs Require Careful Interpretation
In response to these challenges, we set out to explore an approach to efficiently and effectively improve data quality across the LEI lifecycle and bolster trust in global reference data.
A key insight from our initial analysis was that a lapsed LEI does not necessarily mean the associated legal entity is inactive. Non-renewal may simply reflect a change in reporting obligations rather than the termination of a legal entity. Conversely, an entity may already be legally inactive while its LEI is either lapsed or still issued.
Most importantly, we recognized a critical consideration: incorrectly retiring an LEI is worse than not retiring it at all, as it would misrepresent that a legal entity has ceased operations. As a consequence, the entity may be hindered in its ability to trade or carry out its operations more generally. This meant that relying on the 'lapsed' status as an automatic trigger for retirement would introduce significant governance risk, and that any solution, therefore, needed to be conservative, evidence-based, and fully auditable.
As a result, the real challenge was to distinguish between:
a) LEIs that were not renewed but still correspond to active entities, and
b) LEIs associated with entities that are legally inactive in Portugal.
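The distinction above can be sketched as a small, conservative decision rule. This is only an illustration: the `EntityRecord` fields, the register status values ("ACTIVE", "INACTIVE"), and the `Decision` labels are assumptions made for the example, not the actual data model used at Banco de Portugal.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    RETIREMENT_CANDIDATE = "retirement_candidate"
    KEEP = "keep"
    REVIEW = "review"


@dataclass(frozen=True)
class EntityRecord:
    lei: str
    lei_status: str                  # LEI registration status, e.g. "ISSUED", "LAPSED", "RETIRED"
    register_status: Optional[str]   # hypothetical: "ACTIVE", "INACTIVE", or None if unmatched


def classify(record: EntityRecord) -> Decision:
    """Only verified legal inactivity makes an LEI a retirement candidate."""
    if record.lei_status == "RETIRED":
        return Decision.KEEP                  # nothing left to do
    if record.register_status is None:
        return Decision.REVIEW                # no authoritative evidence: ask an expert
    if record.register_status == "INACTIVE":
        return Decision.RETIREMENT_CANDIDATE  # applies to lapsed and issued LEIs alike
    return Decision.KEEP                      # lapsed but active: case (a)
```

Note that the 'lapsed' status never triggers retirement on its own in this sketch; the register status is the only evidence that does.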
Our Approach: AI in Cross-Checking Against Authoritative National Data
Achieving this distinction reliably required integrating multiple data sources and applying consistent, evidence-based quality controls. Our approach was built around a simple principle: LEI lifecycle decisions must rely on authoritative national information and be executed in a controlled, scalable manner.
To do this, data from GLEIF, external sources, and the national business register are continuously integrated into our reference data environment, providing a consolidated view of entity identity, legal status, and LEI registration status. ML and AI-based algorithms are then applied to standardize entity names and identifiers and to compute similarity scores across datasets, enabling large-scale cross-checking of LEI records against authoritative national sources to identify when updates are required.
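To make the standardization and similarity-scoring step concrete, here is a minimal sketch using only the Python standard library. The post does not specify which algorithms are used, so `difflib.SequenceMatcher` is a simple stand-in rather than the actual ML model, and the `LEGAL_FORMS` list and function names are hypothetical.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical list of legal-form tokens to drop before comparing names.
LEGAL_FORMS = {"lda", "sa", "unipessoal", "limitada"}


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and punctuation, drop legal-form tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]", "", without_accents.lower())
    tokens = [token for token in cleaned.split() if token not in LEGAL_FORMS]
    return " ".join(tokens)


def normalize_identifier(identifier: str) -> str:
    """Keep digits only, so differently formatted identifiers compare equal."""
    return re.sub(r"\D", "", identifier)


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between the normalized forms of two names."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()
```

With this normalization, spellings that differ only in case, accents, punctuation, or legal-form suffix score as identical, while genuinely different names score lower.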
Once validated, the updates are then operationalized through GLEIF's API-enabled bulk challenge facility, which significantly reduces manual effort and streamlines our internal processes. At the same time, the facility adds an extra layer of assurance by enabling independent third-party validation of information. This ensures that verified LEI retirements are processed consistently, efficiently, and with full traceability, while avoiding unnecessary ad hoc or manual interventions.
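Traceability of this kind can be supported by attaching evidence to each validated update and grouping updates into batches before submission. The sketch below makes no API call and does not reproduce the schema of GLEIF's challenge facility; the field names, the fingerprinting scheme, and the batch size are all assumptions for illustration.

```python
import hashlib
import json
from itertools import islice


def audit_record(lei, register_status, evidence_source, checked_on):
    """Build an evidence record with a reproducible fingerprint (hypothetical fields)."""
    record = {
        "lei": lei,
        "register_status": register_status,
        "evidence_source": evidence_source,
        "checked_on": checked_on,
    }
    # Hash a canonical JSON form so the same evidence always yields the same fingerprint.
    canonical = json.dumps(record, sort_keys=True)
    record["fingerprint"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return record


def batches(records, size):
    """Yield consecutive lists of at most `size` records."""
    iterator = iter(records)
    while chunk := list(islice(iterator, size)):
        yield chunk
```

Storing the fingerprint alongside each submitted batch would let a reviewer later confirm which evidence supported a given decision.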
It is also important to note that throughout the workflow, human oversight remains essential. Complex or ambiguous cases are escalated for expert review, ensuring that automation reinforces governance rather than replacing it.
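One way to picture this escalation is a routing rule that only automates clear-cut cases and sends everything else to an expert. The thresholds, labels, and the `route` function below are hypothetical and are not taken from the workflow described in this post.

```python
def route(name_score: float, identifiers_match: bool,
          accept_at: float = 0.95, reject_below: float = 0.60) -> str:
    """Route a candidate match; thresholds are illustrative assumptions."""
    if identifiers_match and name_score >= accept_at:
        return "auto_confirm"      # both signals agree strongly
    if not identifiers_match and name_score < reject_below:
        return "no_match"          # both signals disagree clearly
    return "expert_review"         # conflicting or borderline evidence
```

For example, `route(0.99, False)` returns `"expert_review"`: a near-identical name with a mismatched identifier is exactly the kind of ambiguous case a person should look at.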
The Results: From Reactive Investigations to Controlled Processes
Applying this approach delivered clear, measurable results.
First, we identified LEIs that were genuinely eligible for retirement, based on verified legal inactivity rather than renewal behavior alone.
Second, we uncovered a substantial number of data quality issues unrelated to retirement, particularly involving identifier accuracy. Resolving these discrepancies improved overall alignment between national reference databases and GLEIF records.
Third, our longitudinal analysis of LEI registration status showed that increases in lapsed and retired LEIs largely reflected authentic entity lifecycle dynamics rather than systemic data degradation. Incorporating this time dimension proved essential for interpreting the data correctly.
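A longitudinal view of this kind can be approximated by counting statuses per snapshot and tallying how each LEI moves between statuses over time. The sketch below assumes rows of `(snapshot_date, lei, status)` with ISO-formatted dates; that input shape and the function names are assumptions for the example.

```python
from collections import Counter, defaultdict


def status_counts_by_snapshot(rows):
    """Count LEIs per registration status for each snapshot date."""
    counts = defaultdict(Counter)
    for snapshot_date, _lei, status in rows:
        counts[snapshot_date][status] += 1
    return dict(counts)


def status_transitions(rows):
    """Count status changes per LEI between consecutive snapshots."""
    history = defaultdict(list)
    # ISO date strings sort chronologically, so each history is in time order.
    for _snapshot_date, lei, status in sorted(rows):
        history[lei].append(status)
    transitions = Counter()
    for statuses in history.values():
        for before, after in zip(statuses, statuses[1:]):
            if before != after:
                transitions[(before, after)] += 1
    return transitions
```

Comparing the transition counts with lifecycle events in the national register is what helps separate genuine entity dynamics from data degradation; raw status totals alone cannot show that.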
Finally, we transitioned from ad-hoc, manual investigations to repeatable, auditable workflows supported by clear criteria and documented outcomes, strengthening both consistency and governance.
Enhancing Data Quality Across the Global LEI System
Beyond the significant operational benefits realized, this approach represents our strong commitment to the Global LEI System. By sharing information in a timely manner and updating LEI reference data outside the standard renewal cycle, we are actively helping maintain the highest data quality standards and ensuring that LEI reference data remains accurate and up to date. This plays a crucial role in promoting trust and transparency across the Portuguese economy and beyond.
Acknowledgments
This work is the result of collaborative teamwork, combining the knowledge, experience, and perspectives of several contributors whose joint efforts made this outcome possible. I would like to express my sincere gratitude to all those involved in the process, whose discussions, feedback, and dedication were fundamental to the development of this work, with a special mention to Maria do Carmo Moreno and Bruno Gonçalo Tenório. The views expressed in this work do not necessarily represent those of the institutions and should be understood solely as the authors’ interpretation and analysis of the subject matter.
Ana Sofia Afonso is a Data Scientist in the Data Management Division at Banco de Portugal. She holds a Master of Science in Finance. Ana Sofia specializes in converting complex, fragmented data into reliable insights for statistical production and strategy. Her work spans Python and SQL, data pipelines, analytics and visualization, and, increasingly, advanced statistics, machine learning, feature engineering, and modern data-engineering practices to improve model quality, workflow efficiency, and data reliability.