
To a High IQ!
Information Content Quality: Assessing the Quality of the Information Product
July 2006: Originally published in IDQ Newsletter Vol 2 Issue 3
by Larry P. English

 

Information Quality1 Characteristics

This is the second in a series of three articles that describes the important Information Quality (IQ) Characteristics.

It is the information consumer who determines what constitutes quality in the information they require to perform their work. Armand Feigenbaum confirms this: “Quality is what the customer says it is.” “Quality is a customer determination, not an engineer's determination, not a marketing determination, or a general management determination. It is based upon the customer's actual experience with the product or service, measured against his or her requirements — stated or unstated, conscious or merely sensed, technically operational or entirely subjective — and always representing a moving target in a competitive market.” 2

For Information, this means “Information Quality is a knowledge worker's determination, not a systems developer's determination, not a business liaison's determination, not an IT manager's determination, nor an information producer manager's determination. It is based on the knowledge worker's actual experience with the information, measured against his or her requirements — stated or unstated, conscious or merely sensed, technically operational or entirely subjective — and always representing a moving target in a complex business environment.”

Information Quality Components

As mentioned in last quarter's article, there are three fundamental components of Information Quality Characteristics:

1. Information Definition Quality

Information producers and knowledge workers alike must know the meaning of information; otherwise they cannot perform their work properly. Information producers must also know the business rules, valid values, and formats to create information correctly.

Information definition is to data (content) what manufacturing product specifications are to the manufactured product.3 Quality “Information Product Specifications” are necessary for the consistent production of quality information.

2. Information Content Quality

Business processes that “create” or “update” data produce the “raw materials” of information. Such processes must “create” and “update” data properly to achieve Information Content Quality Characteristics such as completeness, validity, accuracy, and precision, among others.

3. Information Presentation Quality

When data is retrieved, formatted, aggregated, combined with other data, and presented to knowledge workers, it becomes a finished “information product.” Presentation quality characteristics include accessibility, timeliness, presentation intuitiveness, and objectivity, among others.

In this issue we address Information Content Quality characteristics of the data, whether residing in a database or presented to a knowledge worker.

Information as Data “Raw Material” and “Finished Information Product”

Just as a manufacturing business has raw materials and finished goods, so also does information start from raw materials of data that are transformed (summarized, aggregated, or combined with other data) before information is presented as a finished product.

While most manufacturing firms buy raw materials or components and use these to produce their products, most of the data required by an organization are captured as part of the organization's own processes, while some are purchased as “raw material” from information brokers.

Knowledge workers require several quality characteristics to ascribe quality to information content. Over and above content quality characteristics, knowledge workers also have presentation quality expectations, which we will discuss in the next issue.

1. Quality Characteristics of Information Content

The major information content (data values) quality characteristics include:

  • Definition conformance. Data values are consistent with the attribute (fact) definition.
  • Completeness. Each process or decision has all the information it requires.
    • Record completeness. A record exists for every real world object or event the enterprise needs to know about.
    • Value completeness. A given data element (fact) has a value stored for all records that should have a value.
  • Validity. Data values conform to the information product specifications.
    • Value validity. A data value is a valid value or within a range of valid values for this data element.
    • Business rule validity. Data values conform to the specified business rules.
    • Derivation validity. A derived or calculated data value is produced correctly according to a specified calculation formula or set of derivation rules. If the base values are accurate, and the calculation is correctly performed, then the result will be accurate.
  • Accuracy. Data values are correct.
    • Accuracy to surrogate source. The data agrees with an original, corroborative source record of data, such as a notarized birth certificate, document, or unaltered electronic data received from a party outside the control of the organization that is demonstrated to be a reliable source.
    • Accuracy to reality. The data correctly reflects the characteristics of a real-world object or event being described. Accuracy and precision represent the highest degree of inherent information quality possible5.
  • Precision. Data values are correct to the right level of detail, such as price to the penny or weight to the nearest tenth of a gram.
  • Non-duplication. There is only one record in a database representing a given real-world object or event.
  • Source warranties/credentials. The source of information: (1) guarantees the quality of information it provides; (2) documents its certification in its information quality management capabilities to capture, maintain, and deliver quality information; or (3) provides objective and verifiable measures of the quality of information it provides in agreed-upon quality characteristics.
  • Equivalence of redundant or distributed data. Data in one database is semantically equivalent to data about the same objects or events in another database.
  • Concurrency of redundant or distributed data. The information float or lag time is minimal between (a) when data is knowable (created or changed) in one database and (b) when it is also knowable in a redundant or distributed database, and concurrent queries to each database produce the same result.

2. Measuring Information Quality

Quality characteristics require different measurement techniques. Some characteristics can be measured electronically using software. Other characteristics (like accuracy) require a physical comparison of data to the real world object or recording of an event:

  • Definition conformance. Data values are consistent with the attribute (fact) definition.

Measurement: Electronic or human inspection—

If the attribute is a date, the values should be dates within a range of reasonability for the fact represented. Similarly, address attributes should be addresses.

Codes and code value definitions should be consistent with the defined meaning of the classification type represented by the attribute.
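A date reasonability test of this kind might be sketched as follows. This is only an illustration: the "hire date" attribute, the sample values, and the range bounds are assumptions chosen for the example, not values from any real specification.

```python
from datetime import date

def is_reasonable_date(value, earliest, latest):
    """Return True when value is a date inside the inclusive reasonability range."""
    return isinstance(value, date) and earliest <= value <= latest

# Hypothetical "hire date" values as they might be extracted from a database.
hire_dates = [date(2004, 5, 17), date(1899, 1, 1), "unknown", date(2005, 11, 2)]

# Assumed range of reasonability for this attribute.
earliest, latest = date(1950, 1, 1), date(2006, 7, 1)

# Values that are not dates at all, or that fall outside the range.
nonconforming = [v for v in hire_dates if not is_reasonable_date(v, earliest, latest)]
print(nonconforming)
```

Values flagged this way are candidates for human review; the range itself should come from the attribute definition.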

  • Completeness. Each process or decision has all the information it requires.
    • Record completeness. A record exists for every real world object or event the enterprise needs to know about.

Measurement: Electronic or human inspection—

It is difficult to “know what we do not know.” Missing records on objects and events can be difficult to discover. Missing records can be caused by failing to record information that should be recorded, or by deleting records that should have been kept.

When the objects or events lie outside what we can know directly, we often have to find reliable external sources against which to compare our internal data.

Measure the percent of missing records (that you now know about and have added to the database) against the total of the unique existing records plus the missing records that have been discovered and added.
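The arithmetic can be sketched in a few lines. The function name and the sample counts are hypothetical, and the sketch assumes the "unique existing records" are counted before the discovered records are added.

```python
def missing_record_rate(existing_unique, discovered_missing):
    """Percent of records that were missing, out of existing plus discovered records."""
    total = existing_unique + discovered_missing
    if total == 0:
        raise ValueError("no records to measure")
    return 100.0 * discovered_missing / total

# Hypothetical counts: 9,800 unique records on file, 200 found to be missing.
rate = missing_record_rate(9800, 200)
print(f"{rate:.1f}% of records were missing")
```

The result is only a lower bound on incompleteness, because it counts only the missing records that have actually been discovered.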

  • Value completeness. A given data element (fact) has a value stored for all records that should have a value.

Measurement: Electronic inspection—

This is one of the easiest assessments. Use profiling tools or simple queries to examine a given data element for missing values. This measure may not be meaningful if the values are truly optional. However, this measurement is especially useful when the values are required.

Some data elements are missing values when records are created because the real world characteristic is missing as well. For example, as long as employees are active, they will have missing values for last date of service.

These kinds of attributes should have business rules specifying when valid values become mandatory (e.g., “last date of service” is mandatory when employee status = “retired”).
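A minimal sketch of a value completeness measure that honors such a rule might look like this. The record layout, field names, and the `required_when` parameter are assumptions made for the illustration.

```python
# Hypothetical employee records; None marks a missing value.
employees = [
    {"id": 1, "status": "active", "last_date_of_service": None},
    {"id": 2, "status": "retired", "last_date_of_service": "2005-12-31"},
    {"id": 3, "status": "retired", "last_date_of_service": None},
    {"id": 4, "status": "active", "last_date_of_service": None},
]

def value_completeness(records, field, required_when=lambda record: True):
    """Fraction of records that have a value, among those where one is required."""
    required = [r for r in records if required_when(r)]
    if not required:
        return None  # nothing to measure
    present = [r for r in required if r.get(field) not in (None, "")]
    return len(present) / len(required)

# Treating the field as always required understates quality.
naive = value_completeness(employees, "last_date_of_service")

# Applying the business rule measures only the records that should have a value.
rule_based = value_completeness(
    employees, "last_date_of_service", lambda r: r["status"] == "retired"
)
print(naive, rule_based)
```

The two figures differ because active employees legitimately have no last date of service; only the rule-based figure reflects a real defect rate.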

  • Validity. Data values conform to the information product specifications.
    • Value validity. A data value is a valid value or within a range of valid values for this data element.

Measurement: Electronic inspection—

Use a simple query to test that a value present for a data element is one of the specified valid values, or that a numeric value is within the range of specified or reasonable values.
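Such a test could be sketched as below. The valid value set, the numeric range, and the sample data are illustrative assumptions; missing values are skipped here because they are counted under value completeness.

```python
def invalid_by_set(values, valid_values):
    """Return the present values that are not in the specified set of valid values."""
    return [v for v in values if v is not None and v not in valid_values]

def invalid_by_range(values, low, high):
    """Return the present values that fall outside the inclusive range [low, high]."""
    return [v for v in values if v is not None and not (low <= v <= high)]

# Hypothetical data elements.
statuses = ["active", "retired", "ACTIVE", None, "x"]
ages = [34, 17, 212, None, 65]

bad_statuses = invalid_by_set(statuses, {"active", "retired", "terminated"})
bad_ages = invalid_by_range(ages, 16, 120)
print(bad_statuses, bad_ages)
```

Note that a valid value is not necessarily an accurate one: an age of 34 passes this test even if the person is actually 43.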

    • Business rule validity. Data values conform to the specified business rules.

Measurement: Electronic inspection—

Use an electronic test to execute the specified business rules independently from the process that captures the data.

These business rules may include reasonability tests or correlations of related data to assure that the values conform to those business rules or reasonability.

Note: It is possible for business rules to be incorrect. Sometimes real world data values appear to be outside the expected values, but are in fact correct.
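One way to run business rules independently of the capturing process is to express each rule as a named test and apply all of them to extracted records. The sketch below assumes two hypothetical order rules and a made-up record layout.

```python
from datetime import date

# Hypothetical business rules, each a named test that returns True when satisfied.
RULES = {
    "ship_date_not_before_order_date":
        lambda o: o["ship_date"] is None or o["ship_date"] >= o["order_date"],
    "quantity_is_positive":
        lambda o: o["quantity"] > 0,
}

orders = [
    {"id": 101, "order_date": date(2006, 3, 1), "ship_date": date(2006, 3, 4), "quantity": 2},
    {"id": 102, "order_date": date(2006, 3, 5), "ship_date": date(2006, 3, 2), "quantity": 1},
    {"id": 103, "order_date": date(2006, 3, 6), "ship_date": None, "quantity": 0},
]

def violations(records, rules):
    """Return (record id, rule name) for every rule a record fails."""
    return [
        (record["id"], name)
        for record in records
        for name, rule in rules.items()
        if not rule(record)
    ]

found = violations(orders, RULES)
print(found)
```

Each reported violation is a candidate for review, not a confirmed error, since the rule itself may be the thing that is wrong.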

    • Derivation validity. A derived or calculated data value is produced correctly according to a specified calculation formula or set of derivation rules. If the base values are accurate, and the calculation is correctly performed, then the result will be accurate.

Measurement: Electronic inspection—

Derivation validity can be evaluated with independently performed queries that reconstruct the computation or classification.

Note: First confirm the accurate definition of the formula or set of derivation rules. These may become obsolete over time.
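As an illustration, an independent recomputation might look like the sketch below. The derivation rule (extended price = quantity × unit price) and the order-line layout are assumptions for the example; decimal arithmetic is used so that currency values compare exactly.

```python
from decimal import Decimal

# Hypothetical order lines with a stored, derived "extended_price".
order_lines = [
    {"id": 1, "quantity": 3, "unit_price": Decimal("19.99"), "extended_price": Decimal("59.97")},
    {"id": 2, "quantity": 2, "unit_price": Decimal("5.25"), "extended_price": Decimal("10.05")},
]

def derivation_errors(records):
    """Recompute the derived value and return (id, stored, expected) for mismatches."""
    errors = []
    for record in records:
        expected = record["quantity"] * record["unit_price"]
        if expected != record["extended_price"]:
            errors.append((record["id"], record["extended_price"], expected))
    return errors

errors = derivation_errors(order_lines)
print(errors)
```

A mismatch shows only that the stored value disagrees with the formula as coded in the test, so the formula should be confirmed before the data is declared in error.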

  • Accuracy. Data values are correct.
    • Accuracy to surrogate source. The data agrees with an original, corroborative source of data, such as a notarized birth certificate, document, or unaltered electronic data received from a party outside the control of the organization that is demonstrated to be a reliable source.

Measurement: Electronic or human inspection—

This test may be as simple as comparing your electronic data to an external authoritative source (e.g., comparing addresses to postal service data).

Note: For surrogate source accuracy measurement, you must know how accurate the surrogate source is. Postal service data can confirm that an address is correct, but not that a person still lives at that address. Even Change of Address data from the postal service has limitations, such as people not notifying the postal authority that they have moved.

Always understand and document the limitations of any surrogate sources used.

    • Accuracy to reality. The data correctly reflects the characteristics of a real-world object or event being described. Accuracy and precision represent the highest degree of inherent information quality possible.

Measurement: Human inspection—

Data is NOT accurate unless it correctly represents the characteristic of a real world object or event. You CANNOT gauge accuracy by electronically inspecting the data. Accuracy can only be measured by comparing the data to the real world object.

Here are a few examples: (a) Ask people, such as employees or customers, to review data about themselves that they are able to confirm without bias. (b) Measure objects for characteristics, such as weight, length, width, and height with calibrated measurement devices. (c) Record or observe events to capture data independently from the information capture process. Telephone transaction monitoring is an excellent tool for assessing transaction data.

  • Precision. Data values are correct to the right level of detail.

Measurement: Electronic or human inspection—

Depending on the nature of the data, inspection may include comparing the precision of a recording device to a more accurate measurement device. Or it may involve ensuring that numerical data, such as currency exchange rates, are captured to the proper number of decimal places.
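A decimal-place check can be sketched as follows. It assumes the rates are available as text exactly as captured (values stored as binary floating-point numbers lose trailing zeros), and the required precision of four decimal places is an assumption for the example.

```python
from decimal import Decimal

def decimal_places(text):
    """Number of digits after the decimal point in a finite numeric string."""
    exponent = Decimal(text).as_tuple().exponent
    return max(0, -exponent)

# Hypothetical exchange rates as captured.
rates = {"EUR/USD": "1.2746", "GBP/USD": "1.84", "USD/JPY": "115.2300"}

REQUIRED_PLACES = 4  # assumed precision requirement

too_coarse = [pair for pair, text in rates.items() if decimal_places(text) < REQUIRED_PLACES]
print(too_coarse)
```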

Statistical analysis should always show the confidence level and confidence interval (margin of error) for scientific studies, surveys, or quality assessments of samples of data.
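For a quality assessment based on a sample, the margin of error can be estimated as in the sketch below. It uses the normal approximation for a proportion, which assumes a simple random sample that is reasonably large; the sample size and error count are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def error_rate_interval(sample_size, errors_found, confidence=0.95):
    """Return (observed error rate, margin of error) using the normal approximation."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = errors_found / sample_size
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p * (1 - p) / sample_size)
    return p, margin

# Hypothetical assessment: 20 inaccurate records found in a sample of 400.
rate, margin = error_rate_interval(400, 20)
print(f"error rate {rate:.1%} +/- {margin:.1%} at 95% confidence")
```

For very small samples or error rates close to zero, other interval methods are more appropriate than this approximation.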

  • Non-duplication. There is only one record in a database representing a given real-world object or event.

Measurement: Electronic inspection—

Measurement should be done using several correlating tests to determine whether two records are duplicate occurrences of the same real world object.

The best search algorithms use soft (or fuzzy) matches that allow for transposition errors, typical misspellings, phonetic equivalence, abbreviations versus spelled out names and words, synonyms, etc.
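A very small sketch of soft matching is shown below, using the similarity ratio from Python's standard `difflib` module. It is a simplification: it compares names only, whereas a real assessment would correlate several attributes and would typically add phonetic and synonym tests. The abbreviation table, the threshold of 0.85, and the sample records are all assumptions.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical abbreviation table used to compare spelled-out and abbreviated forms.
ABBREVIATIONS = {"corp": "corporation", "inc": "incorporated", "st": "street"}

def normalize(name):
    """Lowercase, strip punctuation, and expand known abbreviations."""
    words = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)

def candidate_duplicates(records, threshold=0.85):
    """Return (id_a, id_b, score) for record pairs whose names are similar enough."""
    pairs = []
    for (id_a, name_a), (id_b, name_b) in combinations(records, 2):
        score = SequenceMatcher(None, normalize(name_a), normalize(name_b)).ratio()
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 2)))
    return pairs

records = [
    (1, "Jon Smith"),
    (2, "John Smith"),
    (3, "Acme Corp."),
    (4, "ACME Corporation"),
    (5, "Jane Doe"),
]

pairs = candidate_duplicates(records)
print(pairs)
```

Pairs above the threshold are only candidates. Two different people can share a similar name, so further correlating tests or human review should confirm each match. Comparing every pair also becomes slow on large files, where records are usually grouped first.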

  • Source warranties/credentials. The source of information: (1) guarantees the quality of information it provides; (2) documents its certification in its information quality management capabilities to capture, maintain, and deliver quality information; or (3) provides objective and verifiable measures of the quality of information it provides in agreed-upon quality characteristics.

Measurement: Electronic or human inspection—

Note: Some people call “reliability” an IQ characteristic. This subjective measure can change over time. A better measurable characteristic is a written warranty from the information source guaranteeing the quality with a money-back offer. One credential is whether or not a trusted certification authority has certified the source's information quality processes. Another way to verify is through an objective measurement of the information quality in the various important characteristics.

  • Equivalence of redundant or distributed data. Data in one database is semantically equivalent to data about the same objects or events in another database.

Measurement: Electronic inspection—

Measuring this is easy if there is minimal transformation from one database to another and the databases retain a common primary identifier.

If there is transformation, you must define your tests to compare the valid values of one database to the comparable representation values in the other database.

If the databases do not share primary identifiers, then you must first perform duplicate matching in an attempt to identify equivalent records in the two databases. Only then can you compare one data element to another data element to verify that the values mean the same thing across the two databases.
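Where the two databases share a primary identifier and one applies a known transformation, the comparison might be sketched as follows. The gender codes, the mapping table, and the sample extracts are hypothetical.

```python
# Hypothetical extracts keyed by a shared primary identifier.
source = {101: "M", 102: "F", 103: "M"}
target = {101: "Male", 102: "Female", 103: "Female", 104: "Male"}

# Assumed transformation between the two representations.
CODE_MAP = {"M": "Male", "F": "Female"}

def equivalence_report(source, target, code_map):
    """Compare records by identifier after translating source codes."""
    shared = source.keys() & target.keys()
    return {
        "only_in_source": sorted(source.keys() - target.keys()),
        "only_in_target": sorted(target.keys() - source.keys()),
        "mismatched": sorted(
            key for key in shared if code_map.get(source[key]) != target[key]
        ),
    }

report = equivalence_report(source, target, CODE_MAP)
print(report)
```

A source code that is absent from the mapping table is reported as a mismatch, which also exposes values the transformation does not handle.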

  • Concurrency of redundant or distributed data. The information float or lag time is minimal between (a) when data is knowable (created or changed) in one database and (b) when it is also knowable in a redundant or distributed database, and concurrent queries to each database produce the same result.

Measurement: Electronic inspection—

Electronic tests measuring information float usually involve record tagging (so you can measure the exact amount of time from when a record was created or updated in one database to when it was loaded in another).

An alternative is to measure the elapsed time from when records are first created in the first database, to the point in time the extract of records is fully loaded to the downstream database.
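With record tagging, the float for each record is simply the difference between two timestamps. The sketch below assumes each database can supply a timestamp per record identifier and that both clocks are synchronized; the identifiers and times are made up.

```python
from datetime import datetime, timedelta

# Hypothetical tags: when each record was created or updated in the source database...
source_changes = {
    "A1": datetime(2006, 7, 1, 9, 0, 0),
    "A2": datetime(2006, 7, 1, 9, 5, 0),
    "A3": datetime(2006, 7, 1, 9, 10, 0),
}
# ...and when the same record was loaded into the downstream database.
target_loads = {
    "A1": datetime(2006, 7, 1, 9, 0, 30),
    "A2": datetime(2006, 7, 2, 1, 5, 0),
}

def information_float(source_changes, target_loads):
    """Return per-record lag and the records that have not yet arrived downstream."""
    lags = {
        key: target_loads[key] - changed
        for key, changed in source_changes.items()
        if key in target_loads
    }
    pending = sorted(key for key in source_changes if key not in target_loads)
    return lags, pending

lags, pending = information_float(source_changes, target_loads)
max_lag = max(lags.values())
print(max_lag, pending)
```

Reporting the maximum lag alongside the records still pending shows both how stale the downstream data can be and which records would make concurrent queries disagree.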

Conclusion

Knowledge workers have different quality requirements for the information they depend on to perform their jobs. Information quality professionals must understand those requirements and measure the right things. Information producers must know those requirements to “consistently meet all information consumers' needs.”

For more about information content quality characteristics and how to measure them, please see Chapter 6, “Information Quality Assessment” in Improving Data Warehouse and Business Information Quality.4

To a High IQ!!! Let me hear about your experiences at Larry [dot] English [AT] infoimpact [dot] com


1 The terms “Information Quality” and “Information Quality Management” are used as synonyms with “Data Quality” and “Data Quality Management.”

2 Armand Feigenbaum, Total Quality Control, 3rd ed. rev., New York: McGraw-Hill, 1991, p. 7.

3 Larry P. English, Improving Data Warehouse and Business Information Quality, NY: John Wiley, 1999, p. 84.

4 Ibid., pp. 137-197.


© 2006 INFORMATION IMPACT International, Inc.


About the Author


Larry P. English, Co-Founder of the IAIDQ, is President and Principal of INFORMATION IMPACT International, Inc., Brentwood, TN, and author of Improving Data Warehouse and Business Information Quality, called “the Information Quality Bible for the Information Age,” by Masaaki Imai, the creator of the Kaizen quality system of continuous process improvement. English's new book, Information Quality Applied: Best Practices for Information, Processes and Systems, has received rave reviews from readers.

A highly rated keynote speaker, Mr English has traveled the world, racking up more than 7 million miles, giving more than 500 keynote or featured presentations on the impact of the Information Age to groups ranging from Fortune 500 Executives to academic groups such as MIT's IQ Industry Symposium and Fordham University's Deming MBA Scholars Program directed by Dr Joyce Orsini, who calls Mr English the most authoritative interpreter of Deming's Management Theory to Information Quality.

Mr English can be reached at Larry [dot] English [AT] infoimpact [dot] com.
