GLEIF Data Quality Management

Questions and Answers



The questions and answers below provide detailed information on the principles applied to monitor, assess and continuously improve the level of data quality within the Global Legal Entity Identifier (LEI) System. Data quality is measured based on clearly defined quality criteria developed by the Global Legal Entity Identifier Foundation (GLEIF) in close dialogue with the LEI Regulatory Oversight Committee and the LEI issuing organizations.

How is the LEI Total Data Quality Score calculated?

The LEI Total Data Quality Score is the average of the individual quality scores of the data quality criteria. This average is unweighted, meaning that each data quality criterion contributes equally to the total. The LEI Total Data Quality Score (\(TQ_s\)) is therefore:

$$TQ_s=\frac{\sum_{s=1}^{N}Q_s}{N}$$

Where:

  • \(TQ_s\) is the total data quality score.
  • \(s\) in the summation is an index representing individual quality criteria.
  • \(Q_s\) is the quality score for each respective quality criterion.
  • \(N\) is the number of quality criteria for which there are checks implemented.

For more details, please see Chapter 2 of the Global LEI Data Quality Report Dictionary.
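
As an illustration of the unweighted average described above, here is a minimal Python sketch. The function name `total_quality_score`, the dictionary layout and the sample scores are assumptions made for this example; they are not taken from GLEIF's documentation or tooling.

```python
def total_quality_score(criterion_scores):
    """Return the unweighted mean of the per-criterion quality scores (TQ_s).

    criterion_scores maps a criterion name to its quality score Q_s.
    Only criteria with implemented checks should be included, so that
    len(criterion_scores) corresponds to N in the formula.
    """
    if not criterion_scores:
        raise ValueError("at least one criterion score is required")
    return sum(criterion_scores.values()) / len(criterion_scores)


# Hypothetical scores, expressed as fractions of 1 (illustrative values only).
example_scores = {"Accuracy": 0.99, "Completeness": 1.0, "Validity": 0.98}
example_total = total_quality_score(example_scores)
```

Because every criterion carries the same weight, a criterion backed by a handful of checks moves the total as much as one backed by many checks.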

What is the definition of each of the data quality criteria applied to measure the level of data quality in the Global LEI System?

  • Accessibility: Data items that are easily obtainable and legal to access, with strong protections and controls built into the process.
  • Accuracy: The extent to which the data are free of identifiable errors; the degree of conformity of a data element or a data set to an authoritative source that is deemed to be correct; and the degree to which the data correctly represent the truth about real-world objects.
  • Completeness: The degree to which all required occurrences of data are populated.
  • Comprehensiveness: All required data items are included; this ensures that the entire scope of the data is collected, with intentional limitations documented.
  • Consistency: The degree to which a unique piece of data holds the same value across multiple data sets.
  • Currency: The extent to which data is up to date; a data value is up to date if it is current for a specific point in time, and it is outdated if it was current at a preceding time but incorrect at a later time.
  • Integrity: The degree of conformity to defined data relationship rules (e.g. primary/foreign key referential integrity).
  • Provenance: History or pedigree of a property value.
  • Representation: The characteristic of data quality that addresses the format, pattern, legibility, and usefulness of data for its intended use.
  • Uniqueness: The extent to which all distinct values of a data element appear only once.
  • Validity: The measure of how a data value conforms to its domain value set (i.e. a set of allowable values or range of values).

How are the top five failing checks identified?

The top five failing checks are the data quality checks that failed most often in the reporting month. If there are no failed checks, this table remains empty. If fewer than five distinct checks have failed, only those that failed are listed.
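
The selection rule above can be sketched in a few lines of Python. This is only an illustration: the function name `top_failing_checks`, the input format (one check identifier per recorded failure) and the identifiers themselves are hypothetical, and the source does not say how ties are ordered.

```python
from collections import Counter


def top_failing_checks(failed_check_ids, limit=5):
    """Return up to `limit` (check_id, failure_count) pairs, most failures first.

    failed_check_ids holds one entry per failure recorded in the
    reporting month. An empty input yields an empty list, and fewer
    than `limit` distinct checks yields only those that failed.
    """
    return Counter(failed_check_ids).most_common(limit)


# Hypothetical failure log for one reporting month.
example_failures = ["check_a", "check_b", "check_a", "check_c", "check_a", "check_b"]
example_top = top_failing_checks(example_failures)
```

Both edge cases described in the text fall out of `most_common` without extra handling.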

How is the country heat map shown in the Global Data Quality Reports calculated?

The quality scores per country are based on the Entity.LegalAddress.Country field of the individual LEI records, which identifies the country as per the ISO 3166 standard.

The colors in the heat map show the overall data quality score achieved by all LEI issuing organizations that issue LEIs in the respective country:

  • Red: less than or equal to 90%.
  • Orange: above 90% and less than or equal to 95%.
  • Yellow: above 95% and less than or equal to 98%.
  • Green: above 98% and less than or equal to 100%.
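
The color thresholds can be expressed as a small lookup function. The following Python sketch is illustrative only; the function name `heatmap_color` and the choice to pass the score as a percentage are assumptions for this example.

```python
def heatmap_color(score_percent):
    """Map a country's overall data quality score (0 to 100) to a heat map color."""
    if not 0 <= score_percent <= 100:
        raise ValueError("score must be between 0 and 100 percent")
    if score_percent <= 90:
        return "red"
    if score_percent <= 95:
        return "orange"
    if score_percent <= 98:
        return "yellow"
    return "green"  # above 98% and up to 100%
```

Each upper bound is inclusive, so a score of exactly 95% is orange and a score of exactly 98% is yellow.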

The formula for calculating the quality score of an individual country is similar to that of the total data quality score: it is also an unweighted average, taken here over the results of the checks performed for that country:

$$TQ_{country}=\frac{\sum_{i=1}^{N_{country}}q_{i,country}}{N_{country}}$$

Where:

  • \(TQ_{country}\) is the total data quality score for a given country.
  • \(q_{i,country}\) is the result of check \(i\) for a given country:

    $$q_{i,country}=\begin{cases}1 & \text{if the check is "success" or "not applicable"}\\0 & \text{if the check is "failed"}\end{cases}$$

  • \(N_{country}\) is the number of checks performed for the respective country.

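
The per-country calculation can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `country_quality_score` and the representation of check results as plain strings are hypothetical, while the three status values and their 1/0 scoring follow the definition above.

```python
# Statuses that count as 1 in the formula; "failed" counts as 0.
PASSING_STATUSES = {"success", "not applicable"}
KNOWN_STATUSES = PASSING_STATUSES | {"failed"}


def country_quality_score(check_results):
    """Return the share of checks for one country that did not fail.

    check_results holds one status string per check performed for the
    country: "success", "not applicable" or "failed".
    """
    results = list(check_results)
    if not results:
        raise ValueError("no checks were performed for this country")
    unknown = set(results) - KNOWN_STATUSES
    if unknown:
        raise ValueError(f"unknown check status: {sorted(unknown)}")
    passed = sum(1 for status in results if status in PASSING_STATUSES)
    return passed / len(results)


# Hypothetical results for one country: three of four checks count as 1.
example_score = country_quality_score(
    ["success", "failed", "not applicable", "success"]
)
```

Note that "not applicable" counts toward the numerator, so checks that do not apply to a record raise the score rather than being excluded from it.
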
What do the quality maturity levels express?

Maturity levels define the evolution of improvements in processes associated with what is measured. Therefore, the total maturity level score is aggregated differently from the total data quality score. While the scoring rules for the individual maturity levels apply in the same way, the score for a higher maturity level contributes to the total score only if the previous maturity level is fully reached (i.e. a score of 100%).

The following maturity levels apply:
Level 1 – ‘Required Quality’ (must be 100 percent for all data records).
Level 2 – ‘Expected Quality’ (should be 100 percent).
Level 3 – ‘Excellent Quality’ (the higher the better).
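
The gating rule for maturity levels can be illustrated with a short Python sketch. It rests on two assumptions that the text above does not confirm: that gating is cumulative (Level 3 counts only if both Level 1 and Level 2 are at 100%), and that scores are given as fractions of 1. The function name `contributing_level_scores` is hypothetical, and the sketch deliberately stops short of combining the contributing scores into a total, because that aggregation is not specified here.

```python
def contributing_level_scores(level_scores):
    """Return the scores of the maturity levels that contribute to the total.

    level_scores lists the scores for Level 1, Level 2, Level 3 in order,
    as fractions of 1. Level 1 always contributes; each higher level
    contributes only if every level before it scored exactly 1.0.
    """
    contributing = []
    for score in level_scores:
        contributing.append(score)
        if score < 1.0:
            # This level is not fully reached, so higher levels are ignored.
            break
    return contributing


# Hypothetical scores: Level 1 is fully reached, Level 2 is not,
# so Level 3 does not contribute.
example_levels = contributing_level_scores([1.0, 0.97, 0.90])
```

In this example only the Level 1 and Level 2 scores are returned, which reflects the idea that excellent results at a higher level cannot compensate for gaps at a lower one.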

Does GLEIF make available specific documentation that details the principles governing the data quality management program?

Yes. The technical documentation, which outlines the quality criteria applied, checks performed, as well as the calculation models, is available here.