This massive dataset about the 50-year conflict in Colombia is playing a central role in the truth and reconciliation process. The dataset comes from 44 sources and contains 24 million raw records created during the conflict. Because the dataset is an open resource (Creative Commons license), data scientists, researchers, civil society groups and others are invited to explore the data and see what else can be learned.
The dataset is the product of a collaboration between the Colombian Truth Commission, the Special Jurisdiction for Peace, and the Human Rights Data Analysis Group.

The challenge of missing data is symbolized by the yellow rectangles amidst this memorial to individuals who are known to have been victims.
Image from Centro de Memoria Historica, modified by David Peters
Colombia’s peace deal created a unique collaboration
A joint project of JEP, CEV, and HRDAG
In 2016, the Colombian government signed a peace deal with the FARC-EP guerrillas, bringing a 50-year conflict to an end. The agreement stipulated the creation of special courts to hear petitions for amnesty from military and FARC-EP officials; these courts are known as the Special Jurisdiction for Peace (JEP, for Jurisdicción Especial para la Paz). The deal also created a truth commission to write a definitive history of the violence of the last several decades (the CEV, for Comisión para el Esclarecimiento de la Verdad, la Reconciliación y la No Repetición).
The Human Rights Data Analysis Group (HRDAG) supported the JEP and the CEV from 2018 through 2023. Together, analysts from HRDAG, JEP, and the CEV integrated data and calculated statistical estimates of the number of victims of the armed conflict, including homicides, enforced disappearances, kidnappings, and the recruitment of child soldiers. The project provided the CEV and the JEP with solid scientific arguments about the magnitude of violence, in part by correcting for underreporting and identifying victimization patterns.
This project consisted of four components:
- Record linkage of available database records on the armed conflict in Colombia, including de-duplication of reports.
- Generation of estimates and analysis of underreporting related to violations of human rights and international humanitarian law.
- Data analysis related to these estimates.
- Strengthening local capacity for analyzing data and implementing statistical models.
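The first component, record linkage, can be illustrated with a deliberately simplified sketch. This is not the project's linkage method, which is described in the Methodological Report. The example assumes hypothetical records with only `source`, `name`, and `date` fields, uses made-up placeholder names, and treats two records as the same person when the dates match and the normalized names are nearly identical. The 0.85 similarity threshold is an arbitrary value chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def same_person(rec_a, rec_b, threshold=0.85):
    """Toy matching rule: identical date and highly similar normalized names."""
    if rec_a["date"] != rec_b["date"]:
        return False
    similarity = SequenceMatcher(
        None, normalize(rec_a["name"]), normalize(rec_b["name"])
    ).ratio()
    return similarity >= threshold


def deduplicate(records, threshold=0.85):
    """Group records into clusters; each cluster stands for one presumed person.

    A record joins the first cluster whose first member it matches,
    otherwise it starts a new cluster.
    """
    clusters = []
    for rec in records:
        for cluster in clusters:
            if same_person(rec, cluster[0], threshold):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


# Hypothetical records with placeholder names (not real data).
records = [
    {"source": "A", "name": "Persona Ejemplo Uno", "date": "2001-05-03"},
    {"source": "B", "name": "persona  ejemplo uno", "date": "2001-05-03"},
    {"source": "C", "name": "Persona Ejemplo Unno", "date": "2001-05-03"},
    {"source": "A", "name": "Otra Persona Dos", "date": "2001-05-03"},
    {"source": "B", "name": "Otra Persona Dos", "date": "1999-01-10"},
]

clusters = deduplicate(records)
for cluster in clusters:
    sources = sorted(rec["source"] for rec in cluster)
    print(normalize(cluster[0]["name"]), cluster[0]["date"], sources)
```

Each cluster keeps track of which sources reported the person. That per-source information is what later makes it possible to reason about cases that no source documented.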
The updated version of our Methodological Report (also referred to as the Technical Appendix) is available in Spanish and English. Previous versions of the report in Spanish are linked here.
Researchers are invited to use and explore this data
To ensure that researchers can make the best use of the data, we are simultaneously publishing the documentation and software tools needed to use the data correctly. The data files include simulated (statistically imputed) values for fields that were empty in the original records. Furthermore, the data include information about which datasets documented which cases, which enables analysts to estimate the total victim population, including victims who were never documented.
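To see why knowing which datasets documented which cases makes it possible to estimate undocumented victims, consider a minimal two-list capture-recapture sketch. It uses Chapman's bias-corrected version of the Lincoln-Petersen estimator, which is a textbook simplification and not the model used in the project (see the Methodological Report for the actual methods). The victim identifiers and list sizes below are invented, and the estimator assumes, among other things, that the two lists are independent.

```python
def chapman_estimate(list_a, list_b):
    """Estimate total population size from two overlapping lists.

    Chapman's estimator: N = (n_a + 1) * (n_b + 1) / (m + 1) - 1,
    where n_a and n_b are the list sizes and m is the overlap.
    """
    a, b = set(list_a), set(list_b)
    n_a, n_b = len(a), len(b)
    overlap = len(a & b)
    documented = len(a | b)
    estimated_total = (n_a + 1) * (n_b + 1) / (overlap + 1) - 1
    return {
        "documented": documented,
        "overlap": overlap,
        "estimated_total": estimated_total,
        "estimated_undocumented": estimated_total - documented,
    }


# Hypothetical identifiers: source A lists v0..v59, source B lists v40..v89.
source_a = {f"v{i}" for i in range(0, 60)}
source_b = {f"v{i}" for i in range(40, 90)}

result = chapman_estimate(source_a, source_b)
print(result)
```

The intuition is that a small overlap between two large lists suggests many cases were missed by both, while a large overlap suggests the lists together are close to complete.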
The data are published with a Creative Commons license (CC BY-NC-SA 4.0). Anyone who has a copy of the data is permitted to share the data with other people, as long as they cite the original source (that is, the joint JEP-CEV-HRDAG project).
Two versions of the data exist. The first version (v1) is the original data used for the analyses in the Methodological Report of the Joint Project, and is useful for replicating those analyses. The second version (v2) corrects errors in which some individuals were erroneously included as direct victims. The v2 data are appropriate for researchers wishing to design their own analyses of the conflict.
Download data to replicate analyses conducted in the Methodological Report (v1 of the data)
Download data from the Departamento Administrativo Nacional de Estadística (DANE), the national statistics office in Colombia: https://microdatos.dane.gov.co/index.php/catalog/795/get-microdata
Download data from HRDAG via IPFS:
- Disappearance [csv] [parquet]
- Forced recruitment [csv] [parquet]
- Homicide [csv] [parquet]
- Kidnapping [csv] [parquet]
Download data to design your own analyses of the armed conflict in Colombia (v2 of the data)
Download data from HRDAG via IPFS:
- Disappearance [csv] [parquet]
- Forced recruitment [csv] [parquet]
- Homicide [csv] [parquet]
- Kidnapping [csv] [parquet]
We have produced a memo outlining what the changes in the v2 data mean for the results presented in the Methodological Report. In short, the corrections alter our estimates of overall underreporting for both homicides and enforced disappearances. They do not change any of the results presented for child recruitment or kidnapping, nor do they affect our analyses of the proportional disaggregation of the estimates (e.g., the proportion of violations carried out by each presumed perpetrator). The memo is available in Spanish and English.
To facilitate the correct use of the data, we have created the verdata package for the R statistical programming language. The package is currently available for download on GitHub and will soon be available on CRAN. In 2024, we published a peer-reviewed article in the Journal of Open Source Software to accompany the package. A repository of worked examples that use the v1 data to replicate some key results from the Methodological Report is also available on GitHub, although the examples are still under development.
Learn more about the project:
- FAQ about the JEP-CEV-HRDAG data integration and statistical estimation project [Spanish version]
- Analyzing Patterns Of Violence In Colombia Using More Than 100 Databases [Spanish version]
- Capítulo 8 – Camino al Informe: La trascendencia de los datos del conflicto
- Building Capacity in Colombia: Truth And Reconciliation
- Making Missing Data Visible In Colombia
- Can The Armed Conflict Become Part of Colombia’s History?
- In Colombia: HRDAG And Dejusticia On the Importance Of Missing Data
What the data are telling us about what happened in Colombia

In this graph we can see the observed (black bars), statistically imputed (blue bars) and estimated (green bars) victims of homicide disaggregated by perpetrator. The dots in the middle of the imputed and estimated bars show the mean number of victims. The y axis shows the number of victims and the x axis shows the perpetrator. This analysis shows that the paramilitaries are responsible for the largest number of homicide victims, while the FARC-EP is responsible for the second highest number of victims. Note that the ranges of the estimates do not overlap and the relative order of responsibility is consistent between the observed data, the statistical imputation, and the estimate of victims.
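The imputed bars have a range and a mean because fields that were empty in the original records are filled in by simulation, so a count by perpetrator can differ from one simulated copy of the data to the next. The sketch below shows one simple way to summarize such counts. It assumes each simulated copy (called a replicate here) has already been reduced to one count per category. The category labels and numbers are invented, and the minimum-to-maximum range is only a stand-in for whatever interval the report actually uses.

```python
from statistics import mean


def summarize_replicates(replicate_counts):
    """Summarize per-category counts across imputation replicates.

    replicate_counts: list of dicts, one per replicate, mapping a
    category label to the number of victims counted in that replicate.
    Returns the lowest, mean, and highest count for each category.
    """
    categories = sorted({cat for rep in replicate_counts for cat in rep})
    summary = {}
    for cat in categories:
        values = [rep.get(cat, 0) for rep in replicate_counts]
        summary[cat] = {
            "low": min(values),
            "mean": mean(values),
            "high": max(values),
        }
    return summary


# Hypothetical counts from three replicates (not real results).
replicates = [
    {"group_a": 120, "group_b": 80},
    {"group_a": 126, "group_b": 74},
    {"group_a": 123, "group_b": 77},
]

summary = summarize_replicates(replicates)
for category, stats in summary.items():
    print(category, stats)
```

When the ranges of two categories do not overlap, as in this toy example, the ordering between them does not depend on which replicate is used.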

Here we can see the observed (black line), statistically imputed (blue line) and estimated (green area) number of victims of enforced disappearance. The y axis shows the number of victims and the x axis shows the year. The green dots mark points where the variance of the estimate is so large that its range does not fit on this graph (especially in the early 2000s). In general, we can see that the estimate follows the trend of documented (observed) victims. Also, there is an evident peak of violence at the beginning of 2000. Although we can see a decrease in cases after this year, underreporting is still present, as indicated by the distance between the black line and the green shading.
Letter from Alejandro Valencia Villa, Former Commissioner of the Colombian Truth Commission
One of the most obvious and most difficult questions to answer when analyzing an armed conflict is determining the number of victims. In a conflict like Colombia’s, prolonged and with complex characteristics due to the different nature of the armed actors and because they committed a great variety and quantity of human rights violations and breaches of humanitarian law, the challenge is even greater. As if this were not enough, Colombia also had a large number of records of these violations and infractions that were realized in distinct databases, each of which represents a sample of the total number of victims. Read the entire letter …

