An Introduction to Duplication Detection

by ; ;
Format: Paperback
Pub. Date: 2010-05-01
Publisher(s): Morgan & Claypool
Availability: This title is currently not available.
  • Free Shipping Icon

    This Item Qualifies for Free Shipping!*

    *Excludes marketplace orders.

List Price: $31.50

Rent Textbook

Select for Price
There was a problem. Please try again later.

New Textbook

We're Sorry
Sold Out

Used Textbook

We're Sorry
Sold Out

eTextbook

We're Sorry
Not Available

Summary

With the ever increasing volume of data, data quality problems abound. Multiple, yet different representations of the same real-world objects in data, duplicates, are one of the most intriguing data quality problems. The effects of such duplicates are detrimental; for instance, bank customers can obtain duplicate identities, inventory levels are monitored incorrectly, catalogs are mailed multiple times to the same household, etc.

Automatically detecting duplicates is difficult: First, duplicate representations are usually not identical but slightly differ in their values. Second, in principle all pairs of records should be compared, which is infeasible for large volumes of data. This lecture examines closely the two main components to overcome these difficulties: (i) Similarity measures are used to automatically identify duplicates when comparing two records. Well-chosen similarity measures improve the effectiveness of duplicate detection, (ii) Algorithms are developed to perform on very large volumes of data in search for duplicates. Well-designed algoritJims improve the efficiency of duplicate detection. Finally, we discuss methods to evaluate the success of duplicate detection.

An electronic version of this book is available through VitalSource.

This book is viewable on PC, Mac, iPhone, iPad, iPod Touch, and most smartphones.

By purchasing, you will be able to view this book online, as well as download it, for the chosen number of days.

Digital License

You are licensing a digital product for a set duration. Durations are set forth in the product description, with "Lifetime" typically meaning five (5) years of online access and permanent download to a supported device. All licenses are non-transferable.

More details can be found here.

A downloadable version of this book is available through the eCampus Reader or compatible Adobe readers.

Applications are available on iOS, Android, PC, Mac, and Windows Mobile platforms.

Please view the compatibility matrix prior to purchase.