In Colombia: HRDAG and Dejusticia on the Importance of Missing Data
At the end of 2022, HRDAG partner Dejusticia, a human rights organization based in Bogotá, Colombia, published a brief that aggregates much of both organizations’ thinking on missing data. One of the brief’s authors, Valentina Rozo Ángel, has been both a visiting analyst and consultant with Human Rights Data Analysis Group, and the brief reflects that collaboration.
The brief, “Telling the Truth: Statistics in the Unveiling of Patterns of Violence,” addresses the intricate, difficult process of discovering the truth during a transition period after a massive conflict has ended. (An alternate translation of the brief’s title would be “Counting the Truth,” as “contar” has two meanings in Spanish.) Answering the question, “Who did what to whom?” is necessary for understanding what happened during the conflict, and to answer the question, truth commissions turn to databases that have tried to capture information about victims and perpetrators. Unfortunately, it’s inevitable that databases will have information gaps, and special care must be taken to account for these gaps. If the information gaps are not taken seriously, the truth commission’s and state’s version of “the truth” will be incorrect. But data scientists can use statistical methods to get a more accurate picture of the patterns of violence than is available from simply looking at the databases.
Dejusticia also published this article, “How can science and statistics guarantee the right to the truth?” It gives an overview of the brief, and its points are paraphrased into English at the end of this article.
The brief and article aim to explain to a non-statistical audience, especially lawyers, what missing information is, how to deal with it, and why it makes sense to do so, with special regard for the right to truth. Here, HRDAG interviews two of the brief’s authors, Valentina Rozo Ángel (VRA) and Alejandro Jiménez Ospina (AJO), to understand some of the key points being made by Dejusticia and HRDAG.
Q: There are two types of information missing from databases that try to capture “who did what to whom” during a conflict. The first type of missing information is missing fields. Can you explain what is meant by a missing field?
VRA: Missing fields occur when a victim is documented but at least one variable related to the victimizing act is missing. For example, we can have the information that John Smith was murdered but we are missing the information related to who murdered him. If you can imagine a dataset having as many rows as victims and as many columns as variables, the missing fields would be the empty cells through the dataset.
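The rows-and-columns picture above can be made concrete with a minimal sketch. The records, column names, and the `missing_field_counts` helper are hypothetical and exist only for illustration; `None` stands in for an empty cell.

```python
# Hypothetical records: one row per documented victim, one column per variable.
# None marks a missing field (an empty cell in the dataset).
records = [
    {"victim": "A", "violation": "homicide", "year": 2001, "perpetrator": "group X"},
    {"victim": "B", "violation": "homicide", "year": None, "perpetrator": None},
    {"victim": "C", "violation": "disappearance", "year": 2003, "perpetrator": None},
]


def missing_field_counts(rows):
    """Count, for each column, how many rows have no value."""
    columns = rows[0].keys()
    return {col: sum(1 for row in rows if row[col] is None) for col in columns}


counts = missing_field_counts(records)
print(counts)  # {'victim': 0, 'violation': 0, 'year': 1, 'perpetrator': 2}
```

Every victim here is documented, so there are no missing records, yet the perpetrator is unknown in two of the three rows.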
Q: The other type of missing information is missing records. Can you explain that?
VRA: Unlike missing fields, missing records refer to victims who have not been documented at all. For example, we might have records of 100 victims. But it is very unlikely that any organization can document the total number of victims during massive acts of human rights violations. There might have been 101, 102, even 200 or more victims. This might occur for many reasons. For example, victims might be scared of telling what happened. Also, the acts may have occurred in places that are difficult to reach. It might also be that the events took place without any witnesses. Therefore, we will have missing records, also known as “underreporting.”
Q: What can truth commissions do to compensate for missing information?
VRA: Truth commissions can use statistics to compensate for missing information. Statistics give us tools to overcome the two kinds of missing information. Regarding missing fields, we can use “imputation methods.” Colloquially, these methods “fill in” the missing fields. There are different approaches for this; one method is using machine-learning algorithms. For the second kind of missing data, missing records or underreporting, we can use multiple systems estimation (MSE) methods. These are techniques that allow us to estimate population sizes, or in the case of truth commissions, the universe of victims. To use this approach, a truth commission needs at least three different sources, since the estimate is based on the overlaps between the datasets.
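To show how overlap between sources can drive an estimate, here is a minimal sketch of the classic two-list Lincoln-Petersen estimator. It is a deliberate simplification and not the method used in the brief: the answer above calls for at least three sources, and the two-list form assumes the lists are independent, which is unlikely to hold for conflict data. The source lists, victim IDs, and the `two_list_estimate` helper are hypothetical, and the sketch assumes records have already been matched across lists.

```python
def two_list_estimate(list_a, list_b):
    """Lincoln-Petersen estimate of the total population from two lists.

    Assumes victims are already matched across lists (shared IDs) and that
    appearing on one list is independent of appearing on the other.
    """
    ids_a, ids_b = set(list_a), set(list_b)
    overlap = len(ids_a & ids_b)
    if overlap == 0:
        raise ValueError("no overlap between lists; estimate is undefined")
    return len(ids_a) * len(ids_b) / overlap


# Hypothetical matched victim IDs from two documentation projects.
source_a = range(1, 61)    # 60 victims documented by source A
source_b = range(41, 121)  # 80 victims documented by source B, 20 shared with A

documented = len(set(source_a) | set(source_b))
estimate = two_list_estimate(source_a, source_b)
print(documented, estimate)  # 120 240.0
```

In this toy case the two lists together document 120 victims, while the overlap suggests roughly 240 in total: the smaller the overlap relative to the list sizes, the more victims are likely to be missing from both.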
Q: It would be so much easier to analyze the data in the databases and draw conclusions from that. What do we risk by doing that?
AJO: The main risk is creating incomplete or biased narratives about what happened. As explained, it is very difficult, if not impossible, to document each and every case of victimizing acts that occur in contexts like armed conflicts. Drawing conclusions from incomplete databases could amplify the visibility of the documented cases, ignoring those that were not documented. These negative outcomes get worse when the reasons behind the lack of documentation of cases are rooted in a history of discrimination or marginalization, or when someone in the process aims to erase the traces of victimization (e.g., enforced disappearances or sexual violence).
Q: Can the use of statistical methods completely eliminate uncertainty?
VRA: No, quite the opposite. Statistical methods allow us to include uncertainty in our findings. For example: If we had one dataset with 100 victims and we didn’t use statistical methods, there wouldn’t be any uncertainty about the real magnitude of victimization. We would “know” there were 100 victims. But our results would probably be biased: the difference between what we observe and what really occurred might be significant. By using statistical methods we will have “variance” or uncertainty. In the case of estimating the universe of victims we will have a credible interval with a lower and upper bound. We won’t know for sure how many victims there were, but we will know that the number most likely falls within that range. In keeping with HRDAG’s slogan “everybody counts,” if we want to include the most vulnerable victims, who are probably the ones who were not documented, we should recognize uncertainty by using statistical methods.
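A credible interval with a lower and upper bound can be sketched with a toy Bayesian calculation. This is an illustration under strong assumptions, not the model used by HRDAG or Dejusticia: it uses only two independent lists, a flat prior on the total number of victims, and an arbitrary upper cap `n_max`. The counts and function names are hypothetical.

```python
from math import comb


def posterior_over_total(n1, n2, m, n_max):
    """Posterior over the total N under a flat prior on [n1 + n2 - m, n_max].

    n1, n2: victims on each list; m: victims found on both lists.
    The likelihood of m given N is hypergeometric.
    """
    n_min = n1 + n2 - m  # the total cannot be below the number documented
    weights = {
        total: comb(n1, m) * comb(total - n1, n2 - m) / comb(total, n2)
        for total in range(n_min, n_max + 1)
    }
    norm = sum(weights.values())
    return {total: w / norm for total, w in weights.items()}


def credible_interval(posterior, mass=0.95):
    """Equal-tailed interval containing roughly `mass` of the posterior."""
    tail = (1 - mass) / 2
    cumulative = 0.0
    lower = upper = None
    for total in sorted(posterior):
        cumulative += posterior[total]
        if lower is None and cumulative >= tail:
            lower = total
        if cumulative >= 1 - tail:
            upper = total
            break
    return lower, upper


# Hypothetical counts: 60 and 80 victims on two lists, 20 on both.
posterior = posterior_over_total(n1=60, n2=80, m=20, n_max=2000)
lower, upper = credible_interval(posterior)
print(lower, upper)
```

The point estimate for these counts is 240, but the output is a range rather than a single number, and the lower bound can never fall below the 120 victims actually documented.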
Q: What is the connection between the right to truth and the right to information?
AJO: Rights to truth and information are intrinsically related and linked by history. The right to truth is not explicitly contained in international human rights treaties like the American Convention on Human Rights or the International Covenant on Civil and Political Rights. Nonetheless, different international human rights organizations have determined that the right to truth exists as an autonomous right and is rooted in the right to a fair trial, judicial protection and information. In the end, what matters is that the international human rights system recognizes that victims and society have a right to receive information about what happened in contexts like armed conflict or dictatorships (among many others) and that this implies an obligation to provide them with the most complete possible narrative.
Here is the Dejusticia article, paraphrased:
How can science and statistics guarantee the right to the truth?
States have the obligation to guarantee the right to the truth in the face of serious human rights violations. Through statistical methods it is possible to tell the truth that slips through the information gaps in the databases about these violations.
There are multiple databases that quantify serious human rights violations. However, it is common for these to have two types of information gaps: missing fields and missing records. Missing fields occur when there is no information about a variable for a record, while missing records refer to those victims or violations that are not in a database. Both situations make it difficult to reveal patterns of violence and guarantee the right to the truth.
Statistics contributes to overcoming these gaps through statistical imputation and multiple systems estimation; uncertainty is not completely eliminated but reduced. Thus, through the use of statistical methods, the magnitude of violence can be estimated within a possible range. Statistical methods quantify both the observed and the unobserved. Based on this, it is feasible to study patterns of violence that account for a broader vision of the conflict. If the analysis relies solely on what is observed, that is, on the data recorded in the databases, it runs the risk of ignoring patterns of violence that are not documented there.
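As a rough illustration of statistical imputation, the sketch below fills each missing perpetrator field with a value drawn at random from the observed ones, and repeats the process many times so the result is a range rather than a single figure. This simple "hot-deck" approach is a stand-in chosen for brevity, not the method described in the brief; it assumes fields are missing completely at random, which is rarely true in conflict data, and real applications model the missing value from other variables. The data and the `hot_deck_impute` helper are hypothetical.

```python
import random
from collections import Counter


def hot_deck_impute(values, rng):
    """Replace each None with a random draw from the observed values."""
    observed = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(observed) for v in values]


# Hypothetical perpetrator field for eight documented victims; None = missing.
perpetrators = ["group X", None, "group Y", "group X", None, "group X", None, "group Y"]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
shares = []
for _ in range(200):
    completed = hot_deck_impute(perpetrators, rng)
    shares.append(Counter(completed)["group X"] / len(completed))

# The share attributed to group X varies across imputations.
print(min(shares), max(shares))
```

Reporting the spread across the 200 completed datasets, instead of one filled-in table, keeps the uncertainty about the missing fields visible in the result.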
This document explains how statistics contribute to guaranteeing the right to the truth, especially in its collective dimension. It also addresses how, thanks to estimates, it is possible to help clarify the atrocious past.
5 Lessons
- Talking about data in human rights violations is important for understanding the violence, the behavior behind it, and their patterns, as well as for finding, describing, and comparing the behaviors of the actors who participated.
- In addition, using statistical methods can help guarantee the rights of victims, since, in contexts of massive or systematic violence, it is impossible for all human rights violations to be documented, which creates uncertainty. That uncertainty can be addressed with statistical methods that account for gaps in information. Thus, the collection, systematization and analysis of large datasets, plus the acknowledgment that there are information gaps, help to guarantee the right to the truth.
- Statistical methods are essential for the work of a truth commission, because they allow the study, through the scientific method, of what is not known about the data. They help to address the information gaps derived from uncertainty, that is, from what is not observed, and they help to approach in probabilistic terms the possible content of the missing information (to measure the uncertainty).
- But truth commissions require a minimum of conditions to apply statistical methods, such as guaranteeing access to relevant information that is not exclusively of state origin. Another necessary condition is integrating a file design and implementing an adequate information management system that make it possible to clarify what happened, identify patterns of violence, recognize the existence of gaps in the information, and use statistical methods to act on them.
- Statistical methods have an impact on human rights, because there is an interdependence between the right to the truth, the right to information and the other rights. This occurs for at least three reasons:
  - There is a national and international obligation for States to guarantee the right to the truth and, therefore, to produce or reconstruct information even when it has been stolen or destroyed.
  - Making undocumented violations visible can reveal otherwise invisible patterns of violence.
  - Everyone counts, and while it is inevitable that information will be incomplete, we should not be comfortable accepting gaps or providing partial truth.
Image: Constanza Vieira/IPS ipsnews.net. “Crosses to identify gravesites of unknown persons at the Cementerio La Macarena, Colombia.”