Multiple Systems Estimation: The Basics
Multiple systems estimation, or MSE, is a family of techniques for statistical inference. MSE uses the overlaps between several incomplete lists of human rights violations to estimate the total number of violations, including those that appear on no list. In this blogpost, and four more to follow, I’ll answer both conceptual and practical questions about this important method. (In posts to follow, questions that refer to specific statistical procedures or debates will be marked, “In depth.”)
Statistical inference means using information about part of a population (a sample) to think about what is likely to be true for the whole population.
For example, a report about the results of a public opinion poll might say, “Thirty percent of Americans approve, with a margin of error of +/- 3%.” This is an inference about all Americans based on a sample of some Americans. The margin of error tells us how far the sample measurement (30%) could plausibly be from the true value for the whole population. More specifically, it tells us that, if we surveyed many samples of Americans in the same way, most of these samples (conventionally 95% of them) would show an approval rating within 3 percentage points of the true one. Here, that points to a true approval rating somewhere in the 27% to 33% range.
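To make the arithmetic behind a margin of error concrete, here is a minimal Python sketch. It assumes a simple random sample and the usual normal approximation at 95% confidence. The sample size of 900 and the function name `margin_of_error` are illustrative assumptions of mine, not figures from any actual poll.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Uses the normal approximation; z=1.96 corresponds to 95% confidence.
    Assumes a simple random sample of size n.
    """
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

# Hypothetical poll: 900 respondents, 30% of whom approve.
p_hat = 0.30
moe = margin_of_error(p_hat, 900)

low, high = p_hat - moe, p_hat + moe
print(f"{p_hat:.0%} +/- {moe:.1%} (roughly {low:.0%} to {high:.0%})")
```

Under these assumptions, a sample of about 900 people is enough to produce a margin of error close to 3 percentage points. The formula only applies because the sample is random; it says nothing useful about a convenience sample.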
The validity of a statistical inference depends on several factors. Perhaps most importantly, the sample must be representative of the whole population of interest; that is, it must be a random (or probability) sample. (For example, a sample composed only of New Yorkers would not be appropriate for an inference about all Americans.) Samples that are not random are known as convenience samples; they can be used to make inferences only as part of an MSE analysis.
It is very difficult to construct a valid random sample on human rights topics, for many reasons. For example, it may be dangerous to talk about human rights violations. Certain groups may be more likely to report violations than other groups. Some areas where human rights violations are occurring may be remote or inaccessible. These are just a few of the many challenges associated with constructing a valid random sample on human rights violations. Therefore, we use MSE with multiple convenience samples in many of our projects.
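To give a feel for how overlaps between lists lead to an estimate of the total, here is a minimal Python sketch of the simplest possible case: two lists. This is the classic two-list capture-recapture estimator, not the method used in any particular HRDAG project. It assumes that the two lists are independent of each other, that every violation is equally likely to be recorded, and that records have already been matched across lists. MSE analyses in practice use three or more lists precisely because these assumptions rarely hold for convenience samples. The record identifiers and the function name `two_list_estimate` are invented for illustration.

```python
def two_list_estimate(list_a, list_b):
    """Two-list capture-recapture estimate of a population total.

    Returns (documented, estimated_total), where documented is the number
    of distinct records on either list, and estimated_total is
    len(A) * len(B) / len(A and B overlap).
    Assumes independent lists and already-matched record identifiers.
    """
    a, b = set(list_a), set(list_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("No overlap between lists; the estimate is undefined.")
    documented = len(a | b)
    estimated_total = len(a) * len(b) / overlap
    return documented, estimated_total

# Invented example data: identifiers stand in for matched victim records.
list_a = {f"v{i}" for i in range(0, 60)}    # 60 records: v0 .. v59
list_b = {f"v{i}" for i in range(40, 90)}   # 50 records: v40 .. v89

documented, estimated_total = two_list_estimate(list_a, list_b)
print(f"Documented on at least one list: {documented}")
print(f"Estimated total: {estimated_total:.0f}")
print(f"Estimated undocumented: {estimated_total - documented:.0f}")
```

With these invented lists, 90 distinct records are documented and 20 appear on both lists, giving an estimated total of 150. That suggests about 60 violations that appear on neither list. The intuition is that a small overlap relative to the size of the lists implies many unrecorded cases, while a large overlap implies few.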
MSE analyses extend naturally to the human rights context, as a number of HRDAG’s projects demonstrate. For example:
* In Guatemala, the number of uncounted deaths was about 85,000. This represents about half the total number of deaths and disappearances in the Guatemalan civil war.
* In Perú, where we assisted the Truth and Reconciliation Commission, under 25,000 deaths were documented directly (about 18,000 were documented by the Commission; another 7,000 were documented by other organizations but not the Commission). Using MSE analysis, we estimated approximately 70,000 deaths. Without MSE, around 45,000 deaths would not have been counted.
* In Casanare, Colombia, our team, working with Colombian partners, found that only about 40% of over 2,000 disappearances had been reported in any list of casualties.
See our Projects Page for more information on these and other MSE analyses. Readers who are considering performing an MSE analysis, or those who are looking for more detailed analyses of MSE’s accuracy, should review questions Q17–Q18, which will appear in the fifth and final blogpost on this topic.
The next post in the series is Multiple Systems Estimation: Collection, Cleaning and Canonicalization of Data.
[Creative Commons BY-NC-SA, excluding image]