Data ‘hashing’ improves estimate of the number of victims in databases

David Ruth - Phys.org - 5 June 2018

But while HRDAG’s estimate relied on the painstaking efforts of human workers to carefully weed out potential duplicate records, hashing with statistical estimation proved to be faster, easier and less expensive. The researchers said hashing also had the important advantage of a sharp confidence interval: The range of error is plus or minus 1,772, or less than 1 percent of the total number of victims.

“The big win from this method is that we can quickly calculate the probable number of unique elements in a dataset with many duplicates,” said Patrick Ball, HRDAG’s director of research. “We can do a lot with this estimate.”

Read full article off-site

Our work has been used by truth commissions, international criminal tribunals, and non-governmental human rights organizations. We have worked with partners on projects on five continents.

Donate