Evaluating Data Capture Processes
October 2009: Originally published in IDQ Newsletter Vol 5 Issue 4
Jack E. Olson
Editor’s note: This article is an excerpt from the book Data Quality: The Accuracy Dimension, by Jack E. Olson. Readers will find a link to the book at our IAIDQ Bibliography page at http://bibliography.iaidq.org.
Factors to Consider when Evaluating Data Capture Processes
- Time between event and recording
- Distance between event and recording
- Number of handoffs of information before recording
- Availability of all facts at recording
- Ability to verify information at recording
- Motivation of person doing recording
- Skill, training, and experience of person doing recording
- Feedback provided to recorder
- Data value support of getting it right
- Auto-assist in recording process
- Error checking in recording process
Introduction
The process point at which data is captured represents the single most important place data can be made accurate or inaccurate. All data capture points need to be identified and examined. Some data are only captured once. Some are captured and then updated on an exception basis. Some data are captured and the business object updated or enhanced through a workflow process that may occur over a long period of time.
Some of these points may take on multiple forms.
For example, an order may be entered by the actual customer over the Internet, entered by a recording clerk from a form received in the mail, or entered by a company sales representative through a company client-server application. This example shows three distinct ways of entering the same business object.
Building a diagram of the data paths of a business object, identifying the distinct points of data capture, and specifying the characteristics of each is a time-consuming but extremely important task.
The sidebar enumerates the characteristics that need to be identified for each data capture or update point. The remainder of this article elaborates further on each item.
Factors to Consider
Any evaluation of data capture processes should consider these factors:
Time between event and recording. In general, the longer the time difference, the greater the chance for errors. If the time lag is long enough, it also lends itself to missing or late information. Examples of long durations are cases in which forms are completed and mailed to a data entry location. Accuracy and timeliness would be enhanced if the time difference were eliminated through more direct entry, such as through the Internet.
Distance between event and recording. Physical distance can also be a factor. This reduces the opportunity for the person who is entering the data to verify or challenge information. For example, if the originator of data is in Chicago but the information is transmitted via telephone or paper to Kansas City for entry, you have a distance between the person who knows the right information and the one entering it. If there is confusion, the entry person has to either enter nulls or enter a best guess.
Number of handoffs of information before recording. The first person to experience the event is most likely to be the one with the most accurate description of the facts. Each handoff to another person introduces the possibility of misreading written information, misinterpreting someone else’s comments, or not knowing information that was not passed on.
Availability of all facts at recording. If the person entering the information has no access to the event, to the person who created or observed the event, or to databases containing important auxiliary information, they cannot fill in missing information or challenge information they see.
For example, it is better for Human Resources data to be entered with the employee sitting next to the data entry person, as opposed to copying information from a form. Another example is to have a search function for customer identifiers available for order entry personnel.
Ability to verify information at recording. This is similar to the previous issue, but slightly different. Can the data entry person get the correct information if they think the information provided is wrong? In our Human Resources example, the data entry person can call or email the employee if there is confusion. However, there are times when the data capture process makes it impossible to make this connection. Sometimes the process penalizes the data entry person for taking the time to verify questionable information. All entry points should allow for information to be either verified immediately or posted to a deferred process queue for later verification and correction if needed.
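As a minimal sketch of the deferred process queue idea, the Python below accepts every record at entry time but sets aside any record the recorder has flagged as questionable, so it can be verified later without blocking the entry. The names `CapturedRecord`, `DeferredVerificationQueue`, and the returned status strings are hypothetical and chosen only for illustration; they do not come from the book.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CapturedRecord:
    """A record as entered, plus the fields the recorder flagged as doubtful."""
    record_id: str
    values: dict
    questionable_fields: list = field(default_factory=list)


class DeferredVerificationQueue:
    """Hypothetical queue holding flagged records for later verification."""

    def __init__(self):
        self._pending = deque()

    def submit(self, record):
        """Accept the record now; queue it if any field was flagged."""
        if record.questionable_fields:
            self._pending.append(record)
            return "accepted_pending_verification"
        return "accepted"

    def next_to_verify(self):
        """Return the oldest flagged record, or None if nothing is waiting."""
        return self._pending.popleft() if self._pending else None

    def pending_count(self):
        """Number of records still awaiting verification."""
        return len(self._pending)
```

In a design along these lines, the recorder is not penalized for doubting a value: flagging a field takes no more time than entering it, and the follow-up work moves to a separate step.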
Motivation of person doing recording. This is a complex topic with many sides. Are they motivated to enter correct information? Are they motivated and empowered to challenge questionable information? Are they motivated to enter the information at all? Someone entering their own order is motivated to do it and get it right. Someone who is tasked with entering piles of form information that they do not understand could not care less if the information is entered correctly or completely. Is feedback provided? Is their performance measured relative to completeness and accuracy?
Skill, training, and experience of person doing recording. People who enter the same information for a living get to learn the application, the typical content, and the data entry processes. They can be trained to do it right and to look for red flags. People who enter data on a form only once in their life are much more likely to get it wrong. Sometimes there exists a data entry position that has not been trained in the application. This is an invitation for mistakes.
Note that untrained data entry people who are making mistakes tend to make them repetitively, thus increasing the database inaccuracy level and thereby increasing the likelihood that it will be exposed through data profiling analysis.
Feedback provided to recorder. Feedback is always a good thing. And yet, our information systems rarely provide feedback to the most important people in the data path: those entering the data. Relevant information, such as errors found in computer checks, should be collected and provided to help them improve the accuracy of data they enter.
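One way such feedback might be assembled is sketched below in Python. It assumes the computer checks already write a log of (recorder, field) pairs for each rejected value; that log format and the function name `summarize_errors_by_recorder` are illustrative assumptions rather than anything the source prescribes.

```python
from collections import Counter, defaultdict


def summarize_errors_by_recorder(error_log):
    """Group check failures by recorder, most frequent field first.

    error_log is assumed to be an iterable of (recorder_id, field_name)
    tuples produced by automated checks.
    """
    summary = defaultdict(Counter)
    for recorder_id, field_name in error_log:
        summary[recorder_id][field_name] += 1
    # most_common() orders each recorder's fields by descending error count
    return {rid: counts.most_common() for rid, counts in summary.items()}
```

A per-recorder summary of this kind could be returned to each person periodically, showing which fields most often fail checks for them.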
Auto-assist in recording process. Do the data entry programs and screens help in getting it right? A complex process can include pull-downs, file checking, suggestions on names, addresses, questioning of unusual options or entry information, and so on. Remembering information from the last transaction for that source can be very helpful in getting information right. Letting each data entry station set its own pull-down defaults can reduce errors. Providing the current date instead of asking that it be entered can improve accuracy. There are a lot of technology best practices that can improve the accuracy of information.
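The Python sketch below illustrates three of the assists mentioned above: supplying the current date, remembering values from the last transaction for a source, and applying per-station pull-down defaults. It also flags values that fall outside the pull-down options so the screen can question them. The class `EntryAssistant`, its method names, and the field names in the usage example are hypothetical.

```python
from datetime import date


class EntryAssistant:
    """Hypothetical helper that pre-fills an entry form and flags odd values."""

    def __init__(self, allowed_values, station_defaults=None):
        self.allowed_values = allowed_values        # field -> set of pull-down options
        self.station_defaults = station_defaults or {}
        self._last_by_source = {}                   # source id -> remembered values

    def prefill(self, source_id, today=None):
        """Build a starting form: station defaults, then remembered values, then date."""
        form = dict(self.station_defaults)
        form.update(self._last_by_source.get(source_id, {}))
        form["entry_date"] = today or date.today()  # supplied, not typed in
        return form

    def remember(self, source_id, submitted, carry_over_fields):
        """Keep selected fields from this transaction for the source's next one."""
        self._last_by_source[source_id] = {
            name: submitted[name] for name in carry_over_fields if name in submitted
        }

    def unusual_fields(self, submitted):
        """Return fields whose values are not among the pull-down options."""
        return sorted(
            name for name, options in self.allowed_values.items()
            if name in submitted and submitted[name] not in options
        )
```

For example, a station might be constructed with `EntryAssistant({"ship_method": {"ground", "air"}}, {"ship_method": "ground"})`. A call to `prefill("cust-1")` would then start the form with the station default and the current date, and a later `remember` call would let the customer's last choice override that default.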
Error checking in recording process. Evaluate the error checking provided by the entry screen programs, the transaction path, and the database acceptance routines. Data checkers, filters, and database structural enforcement options can all be used to catch mistakes at the entry point. These are not always easy to identify because they require someone to dig around in code and database definitions. Many times these are not documented. Many times, they are thought to be in effect but have been turned off by a database administrator to improve performance. Many times, they exist but are not applied to all points of entry.
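Once the checks that are actually active at each entry point have been inventoried, finding those that are not applied everywhere is a simple comparison. The Python sketch below assumes the inventory is a mapping from entry point name to the set of active check names; the function name `find_check_gaps` and the example entry point and check names are hypothetical.

```python
def find_check_gaps(checks_by_entry_point):
    """Report, per entry point, the checks active elsewhere but missing here.

    checks_by_entry_point is assumed to map an entry point name to the
    collection of check names confirmed to be active at that point.
    """
    # The union of all checks seen anywhere is the baseline for comparison
    all_checks = set().union(*checks_by_entry_point.values())
    gaps = {}
    for point, active in checks_by_entry_point.items():
        missing = all_checks - set(active)
        if missing:
            gaps[point] = sorted(missing)
    return gaps
```

Given an inventory such as `{"web": {"zip_format", "qty_positive"}, "mail_clerk": {"zip_format"}}`, the result would show that the mail clerk path lacks the `qty_positive` check. The comparison is only as good as the inventory, which still requires examining the code and database definitions to confirm which checks are really turned on.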
Conclusion
It is important to study all factors at each entry point, even though the investigation may have started by focusing on a single set of data errors.
This evaluation process may reveal other errors that were hidden from the data profiling process or uncover the potential for problems that have not yet occurred. It may also uncover some locally devised practices that are good ideas and may warrant propagation as a formal methodology throughout the data entry community.
Excerpt from Data Quality: The Accuracy Dimension, Copyright © 2003 by Elsevier Science (USA)
About the Author

Jack Olson has worked in the commercial software development business for 40 years. His career has mostly consisted of architecting solutions to IT problems in the area of database systems and tools. His career includes several years at IBM, BMC Software, Peregrine Systems, and Evoke Software, where he created the concept of data profiling. He has worked with several other startup companies in recent years as a consultant, advisor, or board member. He is currently an independent consultant.
Jack has published two books: “Data Quality: The Accuracy Dimension” (2003) and “Database Archiving: How to Keep Lots of Data for a Very Long Time” (2008). Jack has a BS degree in Mathematics from the Illinois Institute of Technology and an MBA from Northwestern University. He can be reached at Jack.Olson [at] SvalTech [dot] com or by phone at +1 (512) 565 9584.
