Clustering and Solving the Right Problem

In our database deduplication work, we’re trying to figure out which records refer to the same person and which refer to different people. We write software that examines tens of millions of pairs of records, and we fit a model that assigns each pair a probability that both records refer to the same person. This step is called pairwise classification. However, more than two records may refer to the same person: sometimes three, four, or more reports of the same death are recorded. So once we have all the pairs classified, we need to decide which groups of records refer to the ...
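
To make the step from classified pairs to groups concrete, here is a minimal sketch in Python. It is not necessarily the method used in this work. It assumes the pairwise probabilities are already available as a dictionary, and it links any pair scored at or above a cutoff, then takes connected components with a small union-find. The function name `cluster_pairs`, the record ids, the probabilities, and the `threshold` of 0.5 are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def cluster_pairs(pair_probs, threshold=0.5):
    """Group record ids by linking every pair scored >= threshold.

    pair_probs maps (record_a, record_b) -> probability that both
    records refer to the same person (hypothetical input format).
    Returns a sorted list of sorted clusters.
    """
    parent = {}

    def find(x):
        # Register unseen records as their own cluster root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for (a, b), prob in pair_probs.items():
        # Make sure both records appear even if they never link.
        find(a)
        find(b)
        if prob >= threshold:
            union(a, b)

    groups = defaultdict(set)
    for record in list(parent):
        groups[find(record)].add(record)
    return sorted(sorted(group) for group in groups.values())


# Illustrative, made-up scores for five records.
example = {
    ("r1", "r2"): 0.97,
    ("r2", "r3"): 0.91,
    ("r1", "r3"): 0.40,
    ("r4", "r5"): 0.08,
}
clusters = cluster_pairs(example)
```

In this sketch `r1` and `r3` land in the same group even though their own pair scored below the cutoff, because both link to `r2`. That chaining is a property of connected components, and it is one reason the choice of grouping method matters.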