Clustering and Solving the Right Problem

In our database deduplication work, we’re trying to figure out which records refer to the same person and which refer to different people. We write software that examines tens of millions of pairs of records. We fit a model that assigns each pair a probability that both records refer to the same person. This step is called pairwise classification. However, more than two records may refer to the same person: sometimes three, four, or more reports of the same death are recorded. So once we have all ...
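To make the step from pairwise probabilities to groups of records concrete, here is a minimal sketch in Python. It is not necessarily the method used in this work: it assumes a simple rule (link every pair whose probability meets a threshold, then take connected components), and the function name `cluster_records`, the record ids, the probabilities, and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict


def cluster_records(pair_probs, threshold=0.5):
    """Group record ids by linking every pair whose match probability is
    at least `threshold`, then taking connected components.

    Illustrative assumption only; other clustering rules are possible.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), prob in pair_probs.items():
        # Calling find() registers both records, even if they are not linked.
        root_a, root_b = find(a), find(b)
        if prob >= threshold and root_a != root_b:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for record in parent:
        clusters[find(record)].add(record)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise probabilities for four records.
pair_probs = {
    ("r1", "r2"): 0.95,
    ("r2", "r3"): 0.90,
    ("r1", "r3"): 0.40,
    ("r3", "r4"): 0.05,
}
clusters = cluster_records(pair_probs)
print(clusters)
```

In this toy input, r1 and r3 land in the same cluster even though their own pairwise probability is below the threshold, because both link strongly to r2. That is one reason pairwise classification alone does not settle which records belong together.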

A geeky deep-dive: database deduplication to identify victims of human rights violations

In our work, we merge many databases to figure out how many people have been killed in violent conflict. Merging is a lot harder than you might think. Many of the database records refer to the same people: the records are duplicated. We want to identify and link all the records that refer to the same victim so that each victim is counted only once, and so that we can use the structure of overlapping records to do multiple systems estimation. Merging records that refer to the same person is called entity resolution, database deduplication, or record linkage. For ...

Database Duplication


Our work has been used by truth commissions, international criminal tribunals, and non-governmental human rights organizations. We have worked with partners on projects on five continents.

Donate

HRDAG

Selected projects

Stay informed about our work
