This massive dataset about the 50-year conflict in Colombia is playing a central role in the truth and reconciliation process. The dataset comes from 44 sources and contains 24 million raw records created during the conflict. Because the dataset is an open resource (Creative Commons license), data scientists, researchers, civil society groups and others are invited to explore the data and see what else can be learned.
The dataset is the product of a collaboration between the Colombian Truth Commission, the Special Jurisdiction for Peace, and the Human Rights Data Analysis Group.
The challenge of missing data is symbolized by the yellow rectangles amidst this memorial to individuals who are known to have been victims.
Image from Centro de Memoria Historica, modified by David Peters
Colombia’s peace deal created a unique collaboration
A joint project of JEP, CEV, and HRDAG
In 2016, the Colombian government signed a peace deal with the FARC-EP guerrillas, bringing a 50-year conflict to an end. The agreement created a special court system, the Special Jurisdiction for Peace (JEP, for Jurisdicción Especial para la Paz), to hear petitions for amnesty from members of the military and the FARC-EP. The deal also created a truth commission to write a definitive history of the violence of the last several decades (the CEV, for Comisión para el Esclarecimiento de la Verdad, la Convivencia y la No Repetición).
Since mid-2020, the CEV, the JEP, and the Human Rights Data Analysis Group (HRDAG) have worked together to integrate data and calculate statistical estimates of the number of victims of the armed conflict, including victims of homicide, forced disappearance, kidnapping, and the recruitment of child soldiers. The goal of this project is to provide the CEV and the JEP with statistically sound evidence about the magnitude of the violence, in part by correcting for underreporting and identifying patterns of victimization.
This project consists of four components:
- Record linkage of available database records on the armed conflict in Colombia, including de-duplication of reports.
- Generation of estimates and analysis of underreporting related to violations of human rights and international humanitarian law.
- Data analysis related to these estimates.
- Strengthening local capacity for analyzing data and implementing statistical models.
Our methodological report (or technical appendix) was first published in August 2022 in Spanish. An English translation will be available soon.
Researchers are invited to use and explore this data
To help researchers make the best use of the data, we are also publishing the documentation and software tools needed to use the data correctly. The data files include statistically simulated (imputed) values for fields that were left empty in the original records. The data also record which source datasets documented each case; this overlap information enables analysts to estimate the total number of victims, including those who were never documented.
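To illustrate why knowing which sources documented which cases matters, here is a minimal sketch of the simplest capture-recapture estimator: Chapman's bias-corrected version of the Lincoln-Petersen estimator for two lists. This is not the project's method; the project's estimates combine many sources with more elaborate models, and this two-list version only shows the basic idea. The function name `chapman_estimate` and the case IDs are hypothetical toy data.

```python
def chapman_estimate(list_a: set[str], list_b: set[str]) -> float:
    """Estimate the total population size from two overlapping lists of case IDs.

    Chapman's bias-corrected Lincoln-Petersen estimator:
        N_hat = (n_a + 1) * (n_b + 1) / (m + 1) - 1
    where n_a and n_b are the list sizes and m is the number of cases on both.
    Assumes the two lists are independent and that every victim has the same
    chance of being recorded; real conflict data rarely satisfy this exactly.
    """
    n_a = len(list_a)
    n_b = len(list_b)
    m = len(list_a & list_b)  # cases documented by both sources
    return (n_a + 1) * (n_b + 1) / (m + 1) - 1


# Hypothetical de-duplicated case IDs from two sources (toy data only).
source_a = {f"case{i}" for i in range(0, 60)}    # 60 cases
source_b = {f"case{i}" for i in range(40, 120)}  # 80 cases, 20 shared with A

documented = len(source_a | source_b)  # cases seen by at least one source
estimated = chapman_estimate(source_a, source_b)
print(f"documented: {documented}, estimated total: {estimated:.1f}")
# prints "documented: 120, estimated total: 234.3"
```

In this toy example, roughly half of the estimated total appears on neither list; that undocumented remainder is the kind of underreporting the overlap information makes it possible to estimate.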
The data are published under a Creative Commons license (CC BY-NC-SA 4.0). Anyone who has a copy of the data may share it with other people, as long as they cite the original source (that is, the joint JEP-CEV-HRDAG project).
Access via DANE
The primary repository of these data files is the National Administrative Department of Statistics, or DANE, which is the Colombian government’s official repository for data.
Access via Truth Commission
The Truth Commission's official website has a section about the project, from which the data can be downloaded:
https://www.comisiondelaverdad.co/analitica-de-datos-informacion-y-recursos#c3
Access via GitHub
Link to verdata GitHub repository:
https://github.com/HRDAG/verdata (under development)
Link to the verdata-examples GitHub repository:
https://github.com/HRDAG/verdata-examples (under development)
More resources
- Methodological report (technical appendix), 2022 [Spanish] [English available soon]
- FAQs about the methodological report (technical appendix), 2022 [Spanish] [English]
- Blogpost about the methodological report (technical appendix), 2022, “Analyzing The Patterns Of Violence In Colombia With More Than 100 Databases” [Spanish] [English]
- Article in Journal of Open Source Software: verdata: An R package for analyzing data from the Truth Commission in Colombia, 2024.
What the data are telling us about what happened in Colombia
In this graph we can see the observed (black bars), statistically imputed (blue bars), and estimated (green bars) numbers of homicide victims, disaggregated by perpetrator. The dots in the middle of the imputed and estimated bars show the mean number of victims. The y axis shows the number of victims and the x axis shows the perpetrator. This analysis shows that the paramilitaries are responsible for the largest number of homicide victims and the FARC-EP for the second-largest. Note that the ranges of the estimates do not overlap, and the relative order of responsibility is consistent across the observed data, the statistical imputation, and the estimate of victims.
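The imputed and estimated bars summarize many simulated replicates rather than a single count. Below is a minimal sketch of how such a summary could be computed, assuming each replicate yields one count per perpetrator, that the dot is the mean across replicates, and that the bar spans the minimum to maximum (the published figures may use a different interval). The perpetrator labels and counts are invented for illustration and are not project results.

```python
from statistics import mean

# Hypothetical homicide counts per perpetrator, one value per replicate
# (toy numbers, not results from the project).
replicates = {
    "perpetrator_A": [1020, 1055, 998, 1041],
    "perpetrator_B": [610, 587, 633, 602],
}


def summarize(counts: list[int]) -> tuple[float, int, int]:
    """Return (mean, minimum, maximum) of one perpetrator's replicate counts."""
    return mean(counts), min(counts), max(counts)


def ranges_overlap(r1: tuple[float, int, int], r2: tuple[float, int, int]) -> bool:
    """True if the [min, max] ranges of two summaries share any value."""
    return r1[1] <= r2[2] and r2[1] <= r1[2]


summary = {name: summarize(counts) for name, counts in replicates.items()}

for name, (avg, low, high) in summary.items():
    print(f"{name}: mean={avg:.1f}, range=[{low}, {high}]")

# The comparison made in the text: do the two ranges overlap?
print("ranges overlap:", ranges_overlap(summary["perpetrator_A"], summary["perpetrator_B"]))
```

When the ranges do not overlap, as in this toy case, the ordering of the two perpetrators holds in every replicate, which is what makes a ranking like the one described above robust to the imputation.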
Here we can see the observed (black line), statistically imputed (blue line), and estimated (green area) numbers of victims of enforced disappearance. The y axis shows the number of victims and the x axis shows the year. The green dots mark years in which the uncertainty of the estimate is too wide to fit on the graph (especially in the early 2000s). In general, the estimate follows the trend of documented (observed) victims. There is also a clear peak of violence in the early 2000s. Although cases decrease after that peak, underreporting persists, as shown by the gap between the black line and the green shading.
Letter from Alejandro Valencia Villa, Former Commissioner of the Colombian Truth Commission
One of the most obvious and most difficult questions to answer when analyzing an armed conflict is the number of victims. In a conflict like Colombia's, which was prolonged and complex because of the differing nature of the armed actors and the wide variety and large number of human rights violations and breaches of humanitarian law they committed, the challenge is even greater. As if this were not enough, these violations and infractions were recorded in many distinct databases, each of which represents only a sample of the total number of victims. Read the entire letter …
More related content from HRDAG website
- “In Colombia: HRDAG and Dejusticia on the Importance of Missing Data” [English], blogpost/Q+A, 2023
- “Analyzing The Patterns Of Violence In Colombia With More Than 100 Databases” [Spanish] [English], 2022
- Methodological report (technical appendix), 2022 [Spanish] [English available soon]
- FAQs about the methodological report (technical appendix), 2022 [Spanish] [English]
- “Building Capacity in Colombia: Truth and Reconciliation,” partner story, 2021
- A short video from the Colombian Truth Commission about HRDAG's data analysis of more than 20,000 testimonies, which informs the Commission's forthcoming final report [Spanish], 2020