2 results for tag: deduplication
Deduplicating databases of deaths in war
On 12 September 2016, Director of Research Patrick Ball gave a talk at the University of Cambridge's Isaac Newton Institute for Mathematical Sciences (INI) titled "Deduplicating databases of deaths in war: advances in adaptive blocking, pairwise classification, and clustering." @NewtonInstitute @anucecs
Clustering and Solving the Right Problem
In our database deduplication work, we’re trying to figure out which records refer to the same person, and which other records refer to different people.
We write software that examines tens of millions of pairs of records. We fit a model that assigns each pair a probability that its two records refer to the same person. This step is called pairwise classification.
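To make pairwise classification concrete, here is a minimal sketch in Python. It is not our production code: the record fields, the two similarity features, and the hand-picked logistic weights are all assumptions chosen for illustration, standing in for a model that would normally be fitted to labeled pairs.

```python
from difflib import SequenceMatcher
from itertools import combinations
from math import exp

# Hypothetical records; the field names are assumptions for illustration.
records = [
    {"id": 1, "name": "Maria Lopez", "date": "1982-03-14"},
    {"id": 2, "name": "Maria Lopes", "date": "1982-03-14"},
    {"id": 3, "name": "Juan Perez", "date": "1990-07-01"},
]

def pair_features(a, b):
    """Return (name similarity in [0, 1], 1.0 if the dates are identical else 0.0)."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    same_date = 1.0 if a["date"] == b["date"] else 0.0
    return name_sim, same_date

# Hand-picked weights standing in for a fitted model (an assumption).
BIAS, W_NAME, W_DATE = -6.0, 6.0, 3.0

def match_probability(a, b):
    """Logistic score: probability that records a and b refer to the same person."""
    name_sim, same_date = pair_features(a, b)
    z = BIAS + W_NAME * name_sim + W_DATE * same_date
    return 1.0 / (1.0 + exp(-z))

# Score every unordered pair of records, keyed by the pair of record ids.
pair_scores = {
    (a["id"], b["id"]): match_probability(a, b)
    for a, b in combinations(records, 2)
}
```

With these toy records, the two "Maria" entries receive a high score and every pair involving "Juan Perez" scores below 0.5. Scoring every pair grows quadratically with the number of records, so this all-pairs loop is only workable on small examples.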
However, there may be more than just one pair of records that refer to the same person. Sometimes three, four, or more reports of the same death are recorded.
So once we have all the pairs classified, we need to decide which groups of records refer to the ...
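One simple way to turn pairwise scores into groups, sketched here only as an illustration, is to link every pair whose probability clears a threshold and take the connected components. The excerpt above does not say which clustering method we use, so the union-find approach, the 0.5 threshold, and the example probabilities below are all assumptions.

```python
def cluster_records(record_ids, pair_probs, threshold=0.5):
    """Group records by linking every pair scored at or above the threshold."""
    parent = {r: r for r in record_ids}

    def find(x):
        # Follow parent pointers to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), p in pair_probs.items():
        if p >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for r in record_ids:
        groups.setdefault(find(r), []).append(r)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical pairwise probabilities keyed by (record id, record id).
example_probs = {
    (1, 2): 0.95,
    (2, 3): 0.90,
    (1, 3): 0.40,
    (3, 4): 0.05,
    (4, 5): 0.10,
}

clusters = cluster_records([1, 2, 3, 4, 5], example_probs)
print(clusters)  # [[1, 2, 3], [4], [5]]
```

Note that records 1 and 3 end up in the same group even though their own pair scored only 0.40, because each is strongly linked to record 2. That chaining is the main weakness of connected components and one reason the grouping step needs more care than a single threshold.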
