Core Concepts

Inaccurate statistics can damage the credibility of human rights claims. That’s why we strive to generate statistics about human rights violations that are as rigorous and scientifically accurate as possible.

But what are the pitfalls that lead to inaccuracy? When, where, and how do data become compromised? How are patterns biased when only partial data are available? And what are the best scientific methods for collecting, managing, processing, and analyzing data?

Here are the data pitfalls that HRDAG has identified, as well as some of our approaches for meeting these challenges. We believe that human rights researchers must take these ideas into account. (Some of these concepts are reflected in How We Choose Projects.)

  • Under-registration

    No single information system can capture all the information about every human rights violation in the universe of interest. A data collection project’s access may be limited by many factors, including geography, available resources, and which populations are willing to share their knowledge. The information in any given system is therefore a sample of violations, and that sample is not necessarily representative of what actually happened.
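    To make under-registration concrete, here is a small simulation sketch in Python. The regions, true counts, and capture probabilities are hypothetical values invented for illustration. The point is that a single project documents only the violations it can reach, so its totals fall short of the truth, and by different amounts in different places.

```python
import random

# Hypothetical illustration: the "true" violations and capture
# probabilities below are invented, not real data.
TRUE_COUNTS = {"capital": 400, "highlands": 350, "border zone": 250}
CAPTURE_PROB = {"capital": 0.8, "highlands": 0.4, "border zone": 0.1}


def simulate_documentation(true_counts, capture_prob, seed=0):
    """Return how many violations per region one project documents.

    Each violation is recorded independently with its region's capture
    probability, so some violations are never recorded at all.
    """
    rng = random.Random(seed)
    documented = {}
    for region, n in true_counts.items():
        p = capture_prob[region]
        documented[region] = sum(1 for _ in range(n) if rng.random() < p)
    return documented


documented = simulate_documentation(TRUE_COUNTS, CAPTURE_PROB)
for region, n_true in TRUE_COUNTS.items():
    print(f"{region:12} true={n_true:4d} documented={documented[region]:4d}")
print("total true:", sum(TRUE_COUNTS.values()),
      "total documented:", sum(documented.values()))
```

    In real data the true counts are unknown, which is exactly why the shortfall cannot be read off the documented totals.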

  • Selection bias

    Every data source has better access to some victims than to others, which creates statistical bias. For example, bias can arise when some areas are unreachable or too dangerous to enter. The most frequent source of bias, however, is that some victims and witnesses trust the organization collecting data, while others do not.

    Because analyses of human rights violations are used to guide policy decisions, allocate resources for interventions, and inform transitional justice mechanisms, they must be accurate. Unfortunately, all too often these decisions are inappropriately based on analyses of a single convenience sample. A convenience sample is any set of data that is neither a complete enumeration of all possible cases (a census) nor a random, scientific sample.
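    The following Python sketch contrasts a convenience sample with a simple random sample of the same size. The population, group labels, and reporting probabilities are hypothetical; they model the trust problem described above, in which one community is far less likely to come forward.

```python
import random

# Hypothetical population: 10,000 victims, 30% of whom belong to a
# community that distrusts the collecting organization.
rng = random.Random(42)
population = ["distrustful"] * 3_000 + ["trusting"] * 7_000

# Assumed probability that a victim's case reaches the organization.
REPORT_PROB = {"distrustful": 0.05, "trusting": 0.30}


def share(sample, group="distrustful"):
    """Fraction of records in `sample` that belong to `group`."""
    return sum(1 for v in sample if v == group) / len(sample)


# Convenience sample: whoever happens to come forward.
convenience = [v for v in population if rng.random() < REPORT_PROB[v]]

# Simple random sample of the same size: every victim equally likely.
random_sample = rng.sample(population, k=len(convenience))

print(f"true share:          {share(population):.3f}")
print(f"convenience share:   {share(convenience):.3f}")
print(f"random-sample share: {share(random_sample):.3f}")
```

    Because every person in the random sample had a known, equal chance of selection, its error can be quantified. The convenience sample’s bias, by contrast, cannot be detected from the sample itself.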

    A discussion of the uses of raw data:

    Why raw data doesn’t support analysis of violence

    An explanation of convenience samples:

    Convenience samples: what they are, and what they should (and should not) be used for

  • Duplicate reporting and multiple systems estimation (MSE)

    Sources often include several accounts of the same violation because multiple witnesses may have reported on the same event. We need to identify which records describe the same events and participants so that we do not double- (or triple-) count them. In some projects, we can also use the duplicated reporting to estimate the total number of violations, including those that were never documented.

    A truthful count of human rights violations demands that every violation be counted once and only once. De-duplication, the process of identifying cases that have been counted more than once, ensures this. All of our projects, even those small in size and scope, require careful de-duplication.
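    As a rough illustration of the idea, the Python sketch below groups records that match on a normalized name, date, and place. It is a deliberately simplified, hypothetical stand-in: exact matching on a cleaned key misses variant spellings, transliterations, and date errors, which real record linkage must handle (for example with probabilistic or machine-learning matching plus human review). The record fields and helper names are invented for the example.

```python
import unicodedata
from collections import defaultdict

# Hypothetical records from three sources; field names are illustrative.
records = [
    {"source": "A", "name": "José Martínez", "date": "2012-03-04", "place": "Homs"},
    {"source": "B", "name": "JOSE MARTINEZ", "date": "2012-03-04", "place": "homs"},
    {"source": "C", "name": "Jose  Martinez", "date": "2012-03-04", "place": "Homs "},
    {"source": "B", "name": "Amal Haddad", "date": "2012-03-05", "place": "Idlib"},
]


def normalize(text):
    """Lowercase, strip accents, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def match_key(rec):
    """Exact-match key; real linkage must tolerate spelling and date errors."""
    return (normalize(rec["name"]), rec["date"], normalize(rec["place"]))


def deduplicate(records):
    """Group records sharing a match key; each group is treated as one victim."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)
    return list(groups.values())


victims = deduplicate(records)
print(len(records), "records ->", len(victims), "unique victims")
```

    Here the three spellings of the same name collapse into a single victim, so four records yield two people.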

    An explanation of de-duplication methods as applied to the Kosovo Memory Book and the Syria project, respectively:

    How we make sure that nobody is counted twice

    How we estimate casualties in Syria–Part 1
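    The simplest form of multiple systems estimation uses two lists. If the lists capture victims independently, the fraction of list 2 that also appears in list 1 (m / n2) estimates list 1’s coverage of the whole population (n1 / N), which gives N ≈ n1 × n2 / m. The Python sketch below implements this two-list (Lincoln-Petersen) estimator and Chapman’s variant; the counts are hypothetical. It is a textbook illustration only: real estimates generally rely on three or more sources and on models that relax the independence assumption, which rarely holds for human rights data.

```python
def lincoln_petersen(n1, n2, m):
    """Two-list estimate of the total population size.

    n1, n2: unique victims documented by list 1 and by list 2.
    m:      victims documented by both lists (found by de-duplicating
            across the lists).
    Assumes the lists capture victims independently and that every
    victim has the same chance of appearing on each list.
    """
    if m == 0:
        raise ValueError("no overlap: the two-list estimate is undefined")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's less-biased variant; defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts for illustration only.
n1, n2, m = 600, 400, 200
total = lincoln_petersen(n1, n2, m)
documented = n1 + n2 - m  # unique victims seen by either list
print(f"documented: {documented}, estimated total: {total:.0f}, "
      f"estimated undocumented: {total - documented:.0f}")
```

    With these hypothetical counts, 800 unique victims are documented, and the estimate suggests that about 400 more were recorded by neither list.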

    Further explanations of multiple systems estimation:

  • Complexity

    Human rights abuses are rarely straightforward. For instance, a database on arbitrary executions may also include information about detentions or torture that victims experienced before they were killed. If all three violation types are of interest to the project, all three must be represented in the database. Recording only the “most important” violation obscures instances of the others, creating significant biases in the analysis.
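    A minimal Python sketch of the data-model point, under assumed field names: each victim record holds a list of violations rather than a single “most important” one. The records and the severity ranking are hypothetical. Counting only each victim’s most severe violation makes every detention disappear from the totals, even though two of the three victims were detained.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Violation:
    kind: str  # e.g. "detention", "torture", "killing"
    date: str  # ISO date, possibly approximate


@dataclass
class Victim:
    victim_id: str
    # One victim can suffer many violations (a one-to-many relation).
    violations: list[Violation] = field(default_factory=list)


victims = [
    Victim("v1", [Violation("detention", "2001-05-01"),
                  Violation("torture", "2001-05-02"),
                  Violation("killing", "2001-05-03")]),
    Victim("v2", [Violation("detention", "2001-06-10"),
                  Violation("torture", "2001-06-11")]),
    Victim("v3", [Violation("killing", "2001-07-15")]),
]

# Assumed severity ranking, used only to mimic the "most important" shortcut.
SEVERITY = {"killing": 3, "torture": 2, "detention": 1}

# Full model: count every violation each victim suffered.
all_counts = Counter(v.kind for victim in victims for v in victim.violations)

# Shortcut: keep only each victim's single most severe violation.
top_only = Counter(
    max(victim.violations, key=lambda v: SEVERITY[v.kind]).kind
    for victim in victims
)

print("all violations:  ", dict(all_counts))
print("most severe only:", dict(top_only))
```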

  • Data security

    Despite the importance of information, many human rights organizations lack the resources to preserve their data securely. Much of their information is stored on a single hard disk, often in unencrypted form. Critical documentation is vulnerable to viruses, computer theft, accidents, neglect, and staff turnover. Furthermore, files may contain sensitive identifying information about the victims and perpetrators involved in abuses. If this information were compromised, it could put people at serious risk. An information management system must store data electronically, with multiple copies in multiple locations, to prevent loss from physical destruction. We encourage partners to encrypt data and to allow only authorized users to access the information.
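    Encryption should be done with a well-vetted tool rather than custom code, so the Python sketch below covers only the replication half of this advice: it copies a file, assumed to be already encrypted, to several destinations and verifies each copy with a SHA-256 checksum. The function names are hypothetical, and the local directories merely stand in for genuinely separate locations such as an external drive or an off-site server.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source, destinations):
    """Copy `source` into each destination directory and verify every copy.

    Returns the reference checksum; raises OSError if any copy differs.
    """
    source = Path(source)
    expected = sha256_of(source)
    for dest_dir in map(Path, destinations):
        dest_dir.mkdir(parents=True, exist_ok=True)
        target = dest_dir / source.name
        shutil.copy2(source, target)  # copies contents and metadata
        if sha256_of(target) != expected:
            raise OSError(f"checksum mismatch for {target}")
    return expected


# Demo in a throwaway directory; the file stands in for an encrypted archive.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "records.enc"
    archive.write_bytes(b"example ciphertext")
    checksum = replicate(archive, [Path(tmp) / "site_a", Path(tmp) / "site_b"])
    print("replicated; sha256 =", checksum[:12], "...")
```

    Keeping the reference checksum also lets staff re-verify the copies later, which helps detect silent corruption before the only good copy is lost.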

  • Other concepts