nick pizzi

an approximate reasoning strategy for entity resolution

Master data is the consistent set of identifiers and attributes that describe an organization’s core entities. As organizations collect entity data over time, duplicate records are often created inadvertently. This is particularly likely when an organization stores entity information in disparate databases that serve different corporate purposes. Compounding the problem, identifiers and attributes may be missing from some duplicates and may contradict one another across them. Organizations employ Master Data Management systems to improve the accuracy, uniformity and semantic consistency of their master data assets.

We propose to enhance these systems with an approximate reasoning strategy that applies novel aggregation techniques to data of varying information specificity, drawn from disparate data repositories, in order to improve the overall quality of master data assets. An integral component of this research program is a graph-based data schema in which “confidence”-weighted connections between entities must be rapidly traversed, updated and queried.
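
As a rough illustration of what a graph with “confidence”-weighted entity connections might look like, the Python sketch below stores records as nodes and match confidences as edge weights. It is not the schema proposed here: the class name `EntityGraph`, its methods, the record identifiers and the simple threshold rule used in place of the aggregation techniques are all hypothetical, chosen only to show the three operations named above (traverse, update, query).

```python
from collections import defaultdict


class EntityGraph:
    """Undirected graph whose edges carry a confidence in [0, 1]
    that two records describe the same entity (hypothetical sketch)."""

    def __init__(self):
        # record id -> {neighbouring record id: confidence}
        self._adj = defaultdict(dict)

    def link(self, a, b, confidence):
        """Create or update the confidence-weighted connection a <-> b."""
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        self._adj[a][b] = confidence
        self._adj[b][a] = confidence

    def confidence(self, a, b):
        """Query the stored confidence; 0.0 if the records are not linked."""
        return self._adj.get(a, {}).get(b, 0.0)

    def resolve(self, start, threshold):
        """Traverse from `start`, following only edges whose confidence
        is at least `threshold`, and return the set of records reached."""
        seen = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour, conf in self._adj.get(node, {}).items():
                if conf >= threshold and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        return seen


# Made-up record ids from three assumed repositories.
g = EntityGraph()
g.link("crm:1001", "billing:77", 0.92)
g.link("billing:77", "support:5", 0.85)
g.link("support:5", "crm:2040", 0.30)

duplicates = g.resolve("crm:1001", threshold=0.8)
```

With these made-up values, `duplicates` contains the three records joined by high-confidence links, while `crm:2040` is left out because its only connection falls below the threshold. A fixed cut-off is the simplest possible rule; aggregating evidence along several paths would replace the comparison inside `resolve`.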

copyright © 2000–2018. n pizzi, phd