Analyzing patterns of violence in Colombia using more than 100 databases

—Valentina Rozo Ángel and Maria Gargiulo

In 2016, the Colombian Government and the guerrilla group “Fuerzas Armadas Revolucionarias de Colombia – Ejército del Pueblo (FARC-EP)” reached a peace agreement that included the creation of the Commission for the Clarification of Truth, Coexistence and Non-repetition (Truth Commission; abbreviated as CEV in Spanish). This temporary institution’s objectives were to learn the truth about what happened during the armed conflict, contribute to the understanding of the violations and infractions of international humanitarian law that occurred during the conflict, and offer a comprehensive explanation of its complexity to the Colombian people (1).

The Colombian Truth Commission collaborated with the Special Jurisdiction for Peace (abbreviated as JEP in Spanish), one of the other mechanisms created by the peace agreement, and HRDAG for the joint project on data integration and statistical estimation (“Proyecto conjunto JEP-CEV-HRDAG de integración de datos y estimación estadística”). The project aimed to create official statistical information about the magnitudes and patterns of violence during the Colombian conflict. The collaboration concluded with the publication of the Truth Commission’s final report. More information about the project is available here.

We estimated the number of victims of forced disappearance, displacement, homicide, recruitment of minors, and kidnapping during the armed conflict using 112 databases. To do this, we applied statistical and machine learning methods in three main phases. The first phase was record linkage, in which we deduplicated 12,863,977 records across all databases to avoid double counting victims. Next, we used multiple imputation, aided by “support variables” calculated from a long short-term memory neural network, to fill in missing fields in documented records, such as cases where a victim’s sex or age was not documented. Finally, we used multiple systems estimation with latent class multiple capture-recapture models to estimate the number of undocumented victims.
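To give a sense of the idea behind multiple systems estimation, here is a minimal sketch of the simplest capture-recapture case: two lists and Chapman's bias-corrected estimator. This is not the project's method, which used latent class models across many lists. The function name `chapman_estimate` and the victim identifiers are hypothetical, and the sketch assumes the lists have already been deduplicated and linked so that the same person carries the same identifier in both.

```python
def chapman_estimate(list_a: set, list_b: set) -> float:
    """Estimate total population size from two overlapping lists.

    Uses Chapman's bias-corrected form of the Lincoln-Petersen estimator.
    Assumes records are already linked, so shared IDs mean the same person.
    """
    n_a = len(list_a)               # people documented on list A
    n_b = len(list_b)               # people documented on list B
    overlap = len(list_a & list_b)  # people documented on both lists
    return (n_a + 1) * (n_b + 1) / (overlap + 1) - 1


# Hypothetical, already-deduplicated victim IDs from two sources.
list_a = {f"v{i:02d}" for i in range(1, 9)}    # v01..v08 (8 people)
list_b = {f"v{i:02d}" for i in range(5, 15)}   # v05..v14 (10 people)

documented = len(list_a | list_b)              # unique documented victims
estimated_total = chapman_estimate(list_a, list_b)
estimated_undocumented = estimated_total - documented

print(f"documented: {documented}")
print(f"estimated total: {estimated_total:.1f}")
print(f"estimated undocumented: {estimated_undocumented:.1f}")
```

The smaller the overlap between the lists relative to their sizes, the larger the estimated number of people who appear on neither list. The two-list version relies on strong assumptions, such as the lists being independent, which is one reason more flexible models are used when many lists are available.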

The findings from the project were used in many chapters of the final report. The chapter “Hasta la guerra tiene límites” (“Even war has limits”) contains a detailed analysis of the findings. The statistical appendix, “Anexo proyecto JEP-CEV-HRDAG”, presents a technical explanation of the methods used in the project alongside high-level findings. The FAQs page (currently only available in Spanish) answers commonly asked questions about the project. Finally, the National Administrative Department of Statistics (abbreviated as DANE in Spanish), the entity responsible for disseminating Colombia’s official statistics, reviewed the project and published a technical review.

In the coming weeks, we will publish more information about this project on all of our social media accounts.

Note: The statistical appendix on the HRDAG website is an updated version that corrects some errors in the version available on the Truth Commission website.


(1) Comisión de la Verdad. 2022. ¿Qué es la Comisión de la Verdad? Available at: Accessed 18 August 2022.
