Integrity Characteristics
Integrity defects occur when a data structure is incorrect or unreliable. Data integrity means that the data structure has all properties necessary to provide a reliable and trustworthy view of the business. Desirable data integrity qualities include:
1. Identity Integrity. Every occurrence of a real-world object, and every row of a warehouse table, is uniquely identifiable.
2. Referential Integrity. Every relationship leads to an existing object; navigation never reaches a "dead end."
3. Cardinal Integrity. The number of participants in any relationship complies with business rules.
4. Value Set Integrity. No data element contains a meaningless value.
5. Data Dependency Integrity. Dependencies among values and dependencies among relationships comply with business rules.
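The five integrity qualities above can each be expressed as a simple check over rows. The following is a minimal sketch rather than a definitive implementation: the customer and order tables, the field names, the allowed status values, the one-order-per-customer limit, and the shipped/ship-date rule are all hypothetical, chosen only to give each check something to find.

```python
from collections import Counter

# Hypothetical sample data: customers (parent table) and orders (child table).
customers = [
    {"customer_id": 1, "status": "ACTIVE"},
    {"customer_id": 2, "status": "CLOSED"},
    {"customer_id": 2, "status": "ACTIVE"},  # duplicate key
    {"customer_id": 3, "status": "??"},      # meaningless value
]
orders = [
    {"order_id": 10, "customer_id": 1, "shipped": True, "ship_date": "2024-01-05"},
    {"order_id": 11, "customer_id": 9, "shipped": False, "ship_date": None},  # dead end
    {"order_id": 12, "customer_id": 1, "shipped": True, "ship_date": None},   # rule broken
]

VALID_STATUSES = {"ACTIVE", "CLOSED"}  # assumed value set
MAX_ORDERS_PER_CUSTOMER = 1            # assumed cardinality business rule


def identity_defects(rows, key):
    """Identity: return key values that appear on more than one row."""
    counts = Counter(r[key] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def referential_defects(children, fk, parents, pk):
    """Referential: return child rows whose foreign key matches no parent."""
    known = {p[pk] for p in parents}
    return [c for c in children if c[fk] not in known]


def cardinal_defects(children, fk, max_children):
    """Cardinal: return parent keys with more children than the rule allows."""
    counts = Counter(c[fk] for c in children)
    return sorted(k for k, n in counts.items() if n > max_children)


def value_set_defects(rows, field, allowed):
    """Value set: return rows whose field holds a value outside the allowed set."""
    return [r for r in rows if r[field] not in allowed]


def dependency_defects(rows):
    """Data dependency (assumed rule): an order has a ship date exactly when it is shipped."""
    return [r for r in rows if r["shipped"] != (r["ship_date"] is not None)]
```

Against the sample data, each function flags exactly the row marked in the comments; with the assumed limit of one order per customer, the cardinality check flags customer 1.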
Correctness Characteristics
Correctness defects occur when the data content is incorrect or unreliable. Data correctness means that the data content has all properties necessary to provide a reliable and trustworthy view of the business. Larry English (Improving Data Warehouse and Business Information Quality, John Wiley & Sons, 1999) describes many of the desirable data correctness qualities, including:
1. Completeness. Needed data is present to provide a full picture of the business.
2. Validity. Data values and combinations of values have business meaning in a specific context, and at a particular point in time.
3. Accuracy. Data represents a true and factual view of the real-world objects that it describes.
4. Precision. Data is sufficiently detailed and granular to meet business needs.
5. Consistency. Redundant data sources do not produce conflicting facts.
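Some of the correctness qualities above lend themselves to automated checks. The sketch below covers completeness, validity, and consistency only; accuracy and precision usually need comparison against the real world or against stated business needs, so they are left out. The required fields, the "no future signup date" rule, and the two sample sources (crm and billing) are hypothetical.

```python
from datetime import date

REQUIRED_FIELDS = ("customer_id", "name", "signup_date")  # assumed required fields

# Hypothetical redundant sources describing the same customers.
crm = [
    {"customer_id": 1, "name": "Ada", "signup_date": date(2024, 1, 5)},
    {"customer_id": 2, "name": "", "signup_date": date(2030, 1, 1)},
]
billing = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Bob"},
]


def completeness_defects(row, required=REQUIRED_FIELDS):
    """Completeness: return required fields that are missing or empty."""
    return [f for f in required if row.get(f) in (None, "")]


def validity_defects(row, as_of):
    """Validity (assumed rule): a signup date may not be later than as_of."""
    signup = row.get("signup_date")
    return ["signup_date"] if signup is not None and signup > as_of else []


def consistency_defects(source_a, source_b, key, field):
    """Consistency: return keys whose field value differs between two sources."""
    b_by_key = {r[key]: r[field] for r in source_b}
    return sorted(
        r[key]
        for r in source_a
        if r[key] in b_by_key and r[field] != b_by_key[r[key]]
    )
```

The validity check takes an explicit as_of date because validity is defined "at a particular point in time": the same value can be valid on one date and invalid on another.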
Data quality improves through data cleansing. Defect prevention requires that data be audited, filtered, and corrected as it is loaded into the data warehouse. Data cleansing is a comprehensive industry specialty in its own right.
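As a rough illustration of auditing, filtering, and correcting during a load, the sketch below normalises a status field, diverts rows that still fail a value set check to a reject list, and keeps running counts. The row layout, the allowed statuses, and the function names are assumptions made for the example, not part of any particular cleansing tool.

```python
VALID_STATUSES = {"ACTIVE", "CLOSED"}  # assumed value set

# Hypothetical incoming rows from a source system.
incoming = [
    {"id": 1, "status": " active "},  # correctable: whitespace and case
    {"id": 2, "status": "CLOSED"},    # already clean
    {"id": 3, "status": "UNKNOWN"},   # not correctable: filtered out
]


def correct(row):
    """Correct: return a copy of the row with the status trimmed and upper-cased."""
    fixed = dict(row)
    if isinstance(fixed.get("status"), str):
        fixed["status"] = fixed["status"].strip().upper()
    return fixed


def load(rows):
    """Correct each row, filter rows that remain invalid, and audit the counts."""
    loaded, rejected = [], []
    audit = {"read": 0, "corrected": 0, "rejected": 0}
    for row in rows:
        audit["read"] += 1
        fixed = correct(row)
        if fixed != row:
            audit["corrected"] += 1
        if fixed.get("status") in VALID_STATUSES:
            loaded.append(fixed)
        else:
            rejected.append(fixed)
            audit["rejected"] += 1
    return loaded, rejected, audit
```

Keeping rejected rows rather than discarding them lets the audit trail show what was filtered and why, which matters when the reject counts are later used as a defect measure.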
The Six Sigma Way: Defining Quality in Terms of Metrics That Describe Customer Needs
CTQ (Critical to Quality): Data
Data Integrity and Data Correctness
We will measure defects in terms of integrity and correctness. Data integrity refers to the structure in which the data is housed: the database specialist areas of identity, referential, cardinal, value set, and data dependency integrity. Data correctness means the degree to which the data content can be used as a reliable and trustworthy source for business use, as judged by completeness, validity, accuracy, precision, and consistency.
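One common Six Sigma way to turn defect counts into a metric is defects per million opportunities (DPMO). The text does not say which metric is used here, so the sketch below is an assumption: it treats each row as a unit and each integrity or correctness check applied to that row as one opportunity for a defect.

```python
def dpmo(defects, units, opportunities_per_unit):
    """Defects per million opportunities: defects * 1,000,000 / (units * opportunities)."""
    if units <= 0 or opportunities_per_unit <= 0:
        raise ValueError("units and opportunities_per_unit must be positive")
    return defects * 1_000_000 / (units * opportunities_per_unit)


# Hypothetical figures: 10,000 rows, 10 checks per row
# (five integrity and five correctness), 25 defects found.
example = dpmo(defects=25, units=10_000, opportunities_per_unit=10)
```

With these hypothetical figures, 25 defects across 100,000 opportunities gives a DPMO of 250.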