Artificial Intelligence (AI) Meets LEI

The digital and globalized economy makes identity verification by businesses and authorities both more important and harder to do accurately. Transparency about all actors is a prerequisite for any sustainable investment, qualified reporting, or analysis, and it starts with discovering the entities involved in a transaction. Today, AI algorithms are applied to answer the basic question, “Who am I doing business with?” This wastes computing resources and compounds the errors in downstream tasks such as risk analysis. Using the LEI and the verifiable LEI (vLEI) to identify and authenticate organizations in digital transactions and ecosystems enhances trust in those ecosystems and frees AI algorithms for more valuable applications, such as identifying suspicious patterns and evaluating risks.

Identifying and understanding an entity’s legal form is crucial in many financial and business processes. A corporation’s legal form and structure can inform how to conduct transactions effectively and can serve as a risk indicator. The wide range of entity legal forms within and between jurisdictions has made it challenging for organizations to categorize and structure this information effectively. The task is made harder still because legal forms in different jurisdictions are often similar in type and in textual representation. Automating the identification of an entity’s legal form can therefore lower risk, create transparency, and increase operational efficiency by enabling straight-through processing (STP).

The LEI Repository provides high-quality, standardized, open legal entity data, which is a prerequisite for any good data analysis project or AI model. The repository is kept current by updating it three times a day. Relying on global standards does not just ensure consistency; it also increases data quality and provides a ready-to-use labeled data set for developing machine learning (ML) and AI models.
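To make the idea of a ready-to-use labeled data set concrete, the sketch below turns LEI records into jurisdiction-specific (legal name, ELF code) pairs. It is a minimal illustration, not GLEIF’s or LENU’s actual pipeline. It assumes the records arrive as CSV text with the column names `Entity.LegalName`, `Entity.LegalJurisdiction`, and `Entity.LegalForm.EntityLegalFormCode`, modelled on the GLEIF Golden Copy CSV; check these names against the file you actually use.

```python
import csv
import io
from collections import defaultdict

# Assumed column names, modelled on the GLEIF Golden Copy CSV layout.
# Verify them against the file you actually use.
NAME_COL = "Entity.LegalName"
JURISDICTION_COL = "Entity.LegalJurisdiction"
ELF_COL = "Entity.LegalForm.EntityLegalFormCode"


def build_labeled_sets(csv_text: str) -> dict[str, list[tuple[str, str]]]:
    """Group (legal name, ELF code) pairs by legal jurisdiction.

    Each jurisdiction gets its own labeled data set, matching the idea of
    training one model per jurisdiction. Rows missing a name, jurisdiction,
    or ELF code are skipped because they cannot serve as labeled examples.
    """
    labeled: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = (row.get(NAME_COL) or "").strip()
        jurisdiction = (row.get(JURISDICTION_COL) or "").strip()
        code = (row.get(ELF_COL) or "").strip()
        if name and jurisdiction and code:
            labeled[jurisdiction].append((name, code))
    return dict(labeled)
```

For a full LEI file you would stream rows from disk rather than hold the whole text in memory; `csv.DictReader` accepts an open file object directly.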

Legal Entity Name Understanding (LENU)

GLEIF collaborated with Sociovestix Labs to create a machine learning tool that recognizes an entity’s specific legal form and automates the assignment of the corresponding Entity Legal Form (ELF) code. The ‘Entity Legal Forms (ELF) Code List’ is based on the ISO standard 20275 ‘Financial Services – Entity Legal Forms (ELF)’ and assigns a unique four-character alphanumeric code to each entity legal form. An entity’s legal form is crucial when verifying and screening organizational identity. However, the wide variety of entity legal forms within and between jurisdictions has made it difficult for large organizations to capture legal forms as structured data. The tool, trained on GLEIF’s Legal Entity Identifier (LEI) database of over two million records, allows banks, investment firms, corporations, governments, and other large organizations to retrospectively analyze their master data, extract the legal form from the unstructured text of the legal name, and uniformly apply an ELF code to each entity type, according to the ISO 20275 standard.
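Because every ELF code has the same shape, a cheap format check is a useful first guard before a suggested or recorded code is accepted. The sketch below assumes codes consist of upper-case letters and digits only; the `LegalFormSuggestion` container is a hypothetical structure for illustration, not part of LENU’s API. A format check cannot tell whether a code actually exists in the ELF Code List; that still requires a lookup against the published list.

```python
import re
from dataclasses import dataclass

# Four alphanumeric characters; upper-case letters are an assumption here.
ELF_CODE_PATTERN = re.compile(r"[A-Z0-9]{4}")


def has_elf_code_format(code: str) -> bool:
    """Return True if `code` has the shape of an ELF code.

    This checks the format only. It does not confirm that the code appears
    in the published ELF Code List.
    """
    return ELF_CODE_PATTERN.fullmatch(code) is not None


@dataclass(frozen=True)
class LegalFormSuggestion:
    """Hypothetical record pairing a legal name with a suggested ELF code."""

    legal_name: str
    jurisdiction: str
    elf_code: str

    def __post_init__(self) -> None:
        # Reject malformed codes as soon as the object is created.
        if not has_elf_code_format(self.elf_code):
            raise ValueError(f"malformed ELF code: {self.elf_code!r}")
```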

The tool, known as Legal Entity Name Understanding (LENU), delivers a range of benefits to both the organization and the broader global marketplace. These include:

Automating the standardization of unstructured data (entity legal form as part of the organization’s name), fostering greater data quality.
Overcoming legal form classification problems that stem from, for example, language variations and inconsistent abbreviations.
Presenting the legal form of an entity in a machine-readable format, which can be utilized by AI tools and in other digitized business processes and applications.
Avoiding the risks and limitations of handling data manually, such as time consumption, inefficiency, human error, and high administrative costs.

By creating richer data sets with improved categorization of legal entities, the tool promotes greater insight and transparency into the global marketplace. Used together with the LEI, it helps create a globally consistent data set.

LENU is an open-source Python library available on GitHub. It uses the LEI data to build jurisdiction-specific models and lets the user obtain a suggested legal form for any given legal name. GLEIF has established a data quality loop in which the legal form suggested by the tool is compared with the ELF code in the current LEI data. Where the model’s result clearly disagrees with the current LEI data, GLEIF creates data challenges, which are sent to the LEI issuers so they can verify the records and update them where needed. The corrected data is then used to build the next version of the models, and the improved data source ultimately boosts model performance.
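The comparison step of the data quality loop can be sketched as follows. This is a minimal illustration under stated assumptions, not GLEIF’s implementation: `predict` stands in for a trained jurisdiction-specific model and is assumed to return a suggested code together with a confidence score, and a “clear discrepancy” is approximated by a hypothetical confidence threshold. The LEIs, names, and ELF codes used with `toy_predict` are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Assumed predictor shape: (legal name, jurisdiction) -> (suggested ELF code, confidence).
Predictor = Callable[[str, str], tuple[str, float]]


@dataclass(frozen=True)
class LeiRecord:
    lei: str           # placeholder identifier in this sketch
    legal_name: str
    jurisdiction: str
    elf_code: str      # ELF code currently recorded in the LEI data


def find_candidate_challenges(
    records: Iterable[LeiRecord],
    predict: Predictor,
    min_confidence: float = 0.95,  # hypothetical threshold for a "clear" discrepancy
) -> list[tuple[LeiRecord, str, float]]:
    """Return records whose recorded ELF code confidently disagrees with the model.

    Each flagged record is only a candidate for a data challenge; the LEI
    issuer verifies it before any record is changed.
    """
    flagged = []
    for rec in records:
        suggested, confidence = predict(rec.legal_name, rec.jurisdiction)
        if suggested != rec.elf_code and confidence >= min_confidence:
            flagged.append((rec, suggested, confidence))
    return flagged


def toy_predict(legal_name: str, jurisdiction: str) -> tuple[str, float]:
    """Stand-in for a trained model: keys on a name suffix and returns made-up codes."""
    if legal_name.endswith(" AG"):
        return "BBBB", 0.99
    return "AAAA", 0.60
```

Raising or lowering `min_confidence` trades the number of challenges raised against the risk of sending issuers records that are actually correct.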

LENU uses a transformer architecture and BERT base models to process legal names across various languages and jurisdictions. Ready-to-use, jurisdiction-specific models tailored for legal form detection are also available on Hugging Face.

GLEIF, the University of St. Gallen, and Sociovestix Labs summarized their findings in a scientific research paper, “Transformer-based Entity Legal Form Classification”. The study highlights the significant potential of transformer-based models for advancing data standardization and data integration. Because each entity can have only one legal form, recording it as a standardized data item adds confidence to entity linkage tasks and enables robust matching of records across multiple datasets.
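The linkage argument can be illustrated with a small matching sketch. It assumes, hypothetically, that both datasets already store the legal form as a separate ELF code field (for example, after running a classifier such as LENU) and keep the remaining name text in `name`; the record keys are made up for illustration. Candidates are blocked on a crudely normalized name plus jurisdiction, and a candidate pair survives only if both sides carry the same ELF code, since one entity cannot have two legal forms.

```python
import re
from collections import defaultdict

Record = dict[str, str]  # hypothetical keys: "id", "name", "jurisdiction", "elf_code"


def normalize_name(name: str) -> str:
    """Crude normalization: lower-case, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())


def link_records(left: list[Record], right: list[Record]) -> list[tuple[str, str]]:
    """Pair records across two datasets, requiring identical ELF codes."""
    # Block candidates on (normalized name, jurisdiction).
    index: dict[tuple[str, str], list[Record]] = defaultdict(list)
    for rec in right:
        index[(normalize_name(rec["name"]), rec["jurisdiction"])].append(rec)

    pairs = []
    for rec in left:
        key = (normalize_name(rec["name"]), rec["jurisdiction"])
        for candidate in index.get(key, []):
            # An entity has exactly one legal form, so differing codes rule out a match.
            if rec["elf_code"] == candidate["elf_code"]:
                pairs.append((rec["id"], candidate["id"]))
    return pairs
```

Real entity resolution would use fuzzier name comparison; the point here is only that a matching ELF code acts as a hard consistency check that prunes false pairs sharing a name but not a legal form.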

LENU GitHub Repository

Hugging Face Models

Entity Legal Forms (ELF) Code List

Scientific Research Paper

We believe broader adoption of the ELF code standard will significantly enhance transparency while improving data integration tasks in various domains. By making our open-source library freely accessible to the public, we want to facilitate the adoption of ELF codes by entities worldwide. We invite all stakeholders to use it for entity legal form classification.