#### 1. What is MSE?

A: Multiple Systems Estimation, or MSE, is a family of techniques for statistical inference. MSE uses the overlaps between several incomplete lists of human rights violations to estimate the total number of violations.

#### 2. What do you mean by statistical inference?

A: Statistical inference means using information about part of a population (a sample) to think about what is likely to be true for the whole population.

For example, a report about the results of a public opinion poll might say, “Thirty percent of Americans approve, with a margin of error of +/- 3%.” This is an inference about all Americans based on a sample of some Americans. The margin of error tells us how much uncertainty there is in using this sample measurement (30%) as an estimate for the whole population. More specifically, it tells us that, if we surveyed many samples of Americans, most of these samples would show an approval rating in the 27% to 33% range.
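To make the polling example concrete, here is a minimal Python sketch. It assumes a simple random sample, a 95% confidence level (the usual convention, not stated above), and a hypothetical sample size of 900, since the text does not give one. The function names are illustrative only.

```python
import math
import random

def margin_of_error(share, sample_size, z=1.96):
    """Approximate margin of error for a proportion (z=1.96 assumes a 95% level)."""
    return z * math.sqrt(share * (1 - share) / sample_size)

def fraction_of_polls_in_range(true_share, sample_size, low, high, polls=1000, seed=0):
    """Simulate repeated polls; return the fraction whose approval count lands in [low, high]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(polls):
        # Each respondent approves with probability true_share.
        approvals = sum(rng.random() < true_share for _ in range(sample_size))
        if low <= approvals <= high:
            hits += 1
    return hits / polls

SAMPLE_SIZE = 900  # hypothetical; the poll's sample size is not given in the text

print(round(margin_of_error(0.30, SAMPLE_SIZE), 3))  # → 0.03

# 243 and 297 are 27% and 33% of 900 respondents.
print(fraction_of_polls_in_range(0.30, SAMPLE_SIZE, 243, 297))
```

Under these assumptions the second number should come out close to 0.95, which is what "most of these samples" means in practice.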

The validity of a statistical inference depends on several factors. Perhaps most importantly, the sample must be representative of the whole population of interest; that is, it must be a random (or probability) sample. (For example, a sample composed only of New Yorkers would not be appropriate for an inference about all Americans.) Samples that are not random are known as convenience samples and can only be used to create inferences as part of an MSE analysis.

It is very difficult to construct a valid random sample on human rights topics, for many reasons. It may be dangerous to talk about human rights violations. Certain groups may be more likely to report violations than other groups. Some areas where human rights violations are occurring may be remote or inaccessible. Many similar problems exist. Therefore, we use MSE with multiple convenience samples in many of our projects.

#### 3. What is an overlap, and how do we know when lists overlap?

A: Consider the example of civilian killings during an armed conflict. If two organizations are keeping lists of civilian killings, these lists are convenience samples of the true “population” of killings. The overlap is the group of killings that appear on both lists. To conduct MSE, we need to accurately determine the overlap between three or more lists (also known as systems) of human rights violations.

To accurately determine the overlap between lists, each case on each list should be identifiable, often by name, date of violation, location, age or other characteristics. This can be a very difficult task, since many victims of human rights violations may be recorded with incomplete or incorrect information. The Benetech Human Rights Program has developed an automated matching program, which facilitates matching of large datasets. Without automated matching, it would not be feasible to determine the overlap between large datasets, and MSE could not proceed.
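The idea of finding an overlap by comparing identifying characteristics can be sketched in a few lines of Python. This is not the Benetech matching program, whose workings are not described here. It is a deliberately naive illustration that assumes every record has hypothetical `name`, `date`, and `location` fields and that two records match only when all three agree after basic cleanup. The records are invented for the example.

```python
import unicodedata

def clean(text):
    """Strip accents, case, and extra spaces so trivial differences do not block a match."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())

def match_key(record):
    """Build a comparison key from the (hypothetical) name, date, and location fields."""
    return (clean(record["name"]), record["date"], clean(record["location"]))

def find_overlap(list_a, list_b):
    """Return the records from list_a whose key also appears in list_b."""
    keys_b = {match_key(record) for record in list_b}
    return [record for record in list_a if match_key(record) in keys_b]

# Invented records, for illustration only.
list_a = [
    {"name": "María López", "date": "1985-03-02", "location": "Lima"},
    {"name": "Ana Ruiz", "date": "1986-07-14", "location": "Cusco"},
]
list_b = [
    {"name": "maria  lopez", "date": "1985-03-02", "location": "LIMA"},
    {"name": "Luis Mora", "date": "1984-11-30", "location": "Puno"},
]

overlap = find_overlap(list_a, list_b)
print(len(overlap))  # → 1
```

Exact-key matching like this would miss records with misspelled names, missing dates, or differently recorded locations, which is exactly why matching real human rights data is so difficult.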

#### 4. How does MSE find the total number of violations?

A: MSE estimates the total number of violations by comparing the size of the overlap(s) between lists to the sizes of the lists themselves. If the overlap is small, this implies that the population from which the lists were drawn is much larger than the lists. If, on the other hand, most of the cases on the lists overlap, this implies that the overall population is not much larger than the number of cases listed.

The example below shows how this works in a simplified way. On the left, list A has 10 individuals, 2 of whom are also on list B. List B has 8 individuals, 2 of whom are also on list A. Let A and B stand for the sizes of the two lists, M for the number of individuals who appear on both lists, and N for the unknown size of the whole population.

We know from probability theory that the probability of being in a random list of size A from a population of size N is A/N. Similarly, the probability of being in a list of size B is B/N, and the probability of being on both lists is M/N. If the two lists are independent of each other, the probability of being in both A and B is also the product of the individual probabilities: (A/N) × (B/N). Setting the two expressions equal, we can write: (A/N) × (B/N) = M/N. From there, we can solve the equation for the unknown total population size, N: N = (A × B) / M.

Lists A and B are the same size on both left and right (A=10, B=8). However, on the left, A and B overlap only a little, while on the right, A and B overlap very significantly (i.e., M is large compared to A and B). As expected, when we plug A, B and M into the equation above, we find that N (the overall population) is much larger when the overlap is small than when it is large.
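The two-list calculation can be written as a short Python function. The list sizes and the small overlap (A=10, B=8, M=2) come from the example above. The larger overlap of 6 is a hypothetical value chosen for illustration, since the text does not state the overlap on the right. The function name is illustrative.

```python
def two_list_estimate(size_a, size_b, overlap):
    """Estimate total population size N = (A x B) / M from two lists and their overlap."""
    if overlap <= 0:
        raise ValueError("the estimate is undefined when the lists do not overlap")
    return size_a * size_b / overlap

# Small overlap, as on the left of the example: a large estimated population.
print(two_list_estimate(10, 8, 2))  # → 40.0

# Hypothetical large overlap (6 is an assumed value): a much smaller estimate.
print(round(two_list_estimate(10, 8, 6), 1))  # → 13.3
```

With the same list sizes, tripling the overlap cuts the estimated population to a third, which is the pattern described above.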

Remember that this is a simplified example. While the intuition behind MSE follows the two-list case closely, the two-list case relies on some assumptions about the lists which human rights data typically cannot meet. However, when we expand to three or more lists, we no longer have to rely on these assumptions, because we can use more complex statistical methods than the equation above. When we talk about MSE, we are typically talking about applying these more complex methods.
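The more complex methods are beyond a short example, but a common starting point in capture-recapture analysis with several lists is a table counting how many matched cases fall into each pattern of list membership. The Python sketch below builds that table for three lists. The list names and case identifiers are made up, and the sketch assumes matching has already assigned one identifier to each distinct case.

```python
from collections import Counter

def inclusion_patterns(lists):
    """Count matched cases by membership pattern, e.g. (1, 0, 1) = on lists A and C only."""
    names = sorted(lists)
    all_cases = set().union(*lists.values())
    counts = Counter()
    for case in all_cases:
        pattern = tuple(int(case in lists[name]) for name in names)
        counts[pattern] += 1
    return names, counts

# Hypothetical lists; each number stands for one matched case.
lists = {
    "A": {1, 2, 3, 4, 5},
    "B": {4, 5, 6, 7},
    "C": {5, 7, 8},
}

names, counts = inclusion_patterns(lists)
print(names)              # → ['A', 'B', 'C']
print(counts[(1, 0, 0)])  # → 3 (cases found only on list A)
print(counts[(1, 1, 1)])  # → 1 (cases found on all three lists)
```

The pattern (0, 0, 0), cases that appear on no list, can never be observed directly. Estimating how many cases fall in that cell is what the more complex methods are for.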

#### 5. How was MSE originally developed?

A: MSE was initially developed in the context of wildlife population management, where it is known as capture-recapture, mark-recapture, or multiple-recapture analysis. It has a long intellectual pedigree, beginning with reports on fish migration by Petersen (1896). Sekar and Deming (1949) expanded the uses of capture-recapture to human populations. MSE is used by the United States Census Bureau to more accurately estimate the US population; it is also used frequently in epidemiology to determine the completeness of disease registers and is the subject of a voluminous statistical literature.

#### 6. How does the Benetech Human Rights Program use MSE?

A: MSE analyses extend naturally to the human rights context, as a number of the Human Rights Program’s projects show. For example:

• In Guatemala, the number of uncounted deaths was about 85,000. This represents about half the total number of deaths in the Guatemalan civil war.

• In Peru, where we assisted the Truth and Reconciliation Commission, just 18,000 deaths were documented directly, while MSE analysis estimated approximately 70,000 deaths. Without MSE, around 45,000 deaths would not have been counted.

• In Casanare, Colombia, our team, working with Colombian partners, found that only about 40% of over 2,000 disappearances had been reported in any list of casualties.