IAIDQ Publications

Connecting Entity Resolution and Information Quality
April 2011
John Talburt

The term Entity Resolution (ER) has only been in use for a few years, but the concept has been around for as long as information systems have been in use. Sometimes called record de-duplication, record matching, record linking, merge-purge, or the co-reference problem, ER is the process of determining whether two references to real-world objects refer to the same object or to different objects.

ER is an important tool for achieving Entity Identity Integrity, a fundamental data quality rule that should hold in any information system. In his book Data Quality Assessment, Arkady Maydanchik describes Entity Identity Integrity in the context of a database system as strict adherence to the following conditions:

  • Each row (entity reference) in an entity table corresponds to one, and only one, real-world entity; and
  • Distinct rows in the table correspond to distinct real-world entities.
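
The two conditions can be checked mechanically when ground truth is available. The following is a minimal sketch, not a method taken from Maydanchik's book: it assumes a hypothetical mapping from each row to the set of real-world entities that row actually refers to, which in practice is known only for a manually reviewed sample. The function name and the sample data are illustrative.

```python
from collections import defaultdict


def check_entity_identity_integrity(row_entities):
    """Report violations of the two Entity Identity Integrity conditions.

    row_entities maps a row id to the set of real-world entity ids that
    the row refers to (hypothetical ground truth for illustration).
    """
    # Condition 1: each row corresponds to one, and only one, entity.
    ambiguous_rows = sorted(
        row for row, entities in row_entities.items() if len(entities) != 1
    )

    # Condition 2: distinct rows correspond to distinct entities,
    # so no entity may be referenced by more than one row.
    rows_by_entity = defaultdict(list)
    for row, entities in row_entities.items():
        for entity in entities:
            rows_by_entity[entity].append(row)
    duplicated = {
        entity: sorted(rows)
        for entity, rows in rows_by_entity.items()
        if len(rows) > 1
    }
    return ambiguous_rows, duplicated


table = {
    "row1": {"alice"},
    "row2": {"bob"},
    "row3": {"alice"},          # second row for the same person
    "row4": {"carol", "dave"},  # one row conflating two people
}
ambiguous, duplicated = check_entity_identity_integrity(table)
print(ambiguous)   # ['row4']
print(duplicated)  # {'alice': ['row1', 'row3']}
```

In this sample, row4 breaks the first condition and the pair row1/row3 breaks the second; a table with integrity returns an empty list and an empty dictionary.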

Entity Identity Integrity is another way of stating the Fundamental Law of Entity Resolution: two entity references should be linked (merged or integrated) if, and only if, they are equivalent (i.e., both refer to the same real-world entity).
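
The "if, and only if" in the law implies two distinct kinds of error: references that are linked but not equivalent, and references that are equivalent but not linked. As a rough illustration, assuming hypothetical ground-truth labels (`true_entity`) and the link identifiers assigned by some ER process (`link_id`), both kinds can be listed by comparing every pair of references. The names and data here are invented for the example.

```python
from itertools import combinations


def fundamental_law_violations(true_entity, link_id):
    """Compare pairwise linking decisions against pairwise equivalence.

    true_entity: reference id -> real-world entity (assumed ground truth)
    link_id:     reference id -> identifier assigned by the ER process
    """
    false_links = []   # linked, but not equivalent
    missed_links = []  # equivalent, but not linked
    for a, b in combinations(sorted(true_entity), 2):
        equivalent = true_entity[a] == true_entity[b]
        linked = link_id[a] == link_id[b]
        if linked and not equivalent:
            false_links.append((a, b))
        elif equivalent and not linked:
            missed_links.append((a, b))
    return false_links, missed_links


true_entity = {"r1": "alice", "r2": "alice", "r3": "bob", "r4": "bob"}
link_id = {"r1": "c1", "r2": "c2", "r3": "c2", "r4": "c3"}

false_links, missed_links = fundamental_law_violations(true_entity, link_id)
print(false_links)   # [('r2', 'r3')]
print(missed_links)  # [('r1', 'r2'), ('r3', 'r4')]
```

The law holds for a set of references exactly when both lists are empty. The pairwise comparison is quadratic in the number of references, so this form suits small audited samples rather than full tables.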

A more complete discussion of the Fundamental Law of ER and other ER principles can be found in my book Entity Resolution and Information Quality (2011, Morgan Kaufmann Publishers).
