Building Capacity in Colombia: Truth and Reconciliation

After decades of armed conflict in Colombia, the nation is attempting a formal reckoning. Who was killed, disappeared, or kidnapped? How many people, both civilians and armed combatants, were affected? Who committed the violence, and at whose behest? And how much of it was actually documented?

In 2016, two of the main parties to the civil war, the government and the Revolutionary Armed Forces of Colombia (FARC), reached a peace accord. Since then, dozens of organizations have been working to answer questions about the scope of the violence, create an accurate collective memory, and support healing from the trauma. Over the decades of conflict, many organizations documented the violence they observed, and about 30 of them contributed their own datasets to the effort to reckon with the past.

In June 2020, the Human Rights Data Analysis Group (HRDAG) began working with Colombian analysts to help them integrate all the datasets. Director of Research Patrick Ball and statistician Maria Gargiulo have been working with analysts from the truth commission, known as the CEV (La Comisión para el Esclarecimiento de la Verdad, la Convivencia y la No repetición, or the Commission for the Clarification of Truth, Coexistence and Non-Repetition), whose mandate is clarification, memory, and reform, and with analysts from the JEP (Jurisdicción Especial para la Paz, or Special Jurisdiction for Peace), whose mandate is a macro-level legal process. Working with these five Colombian analysts, Patrick and Maria have helped build the capacity of the truth commission and the JEP by upgrading the analysts' skills and creating new, rigorous protocols at the intersection of human rights and data analysis.


For the first five months, the team met three times a week in 90-minute sessions, held entirely in Spanish. In HRDAG's usual process, team members spend two or three weeks on-site with partners in the first phase of a collaboration, but because of the Covid-19 pandemic, all collaboration took place over videoconference. Maria and Patrick walked their partners through every step, starting with setting up everyone's work environments and strengthening the analysts' technical toolkits.

Maria and Patrick structured each phase of their work with the Colombian team around three tasks: data processing, which led into data deduplication (also called record linkage), and, finally, estimation. But they started with the basics.

“Everyone was coming from different backgrounds, and no one had ‘the right training,’ and that was totally fine,” said Maria.

Maria and Patrick oversaw the process of combining all the records, which came from more than 200 files. The combined dataset contained about 20 million records, and the team used HRDAG's data processing procedures to “clean” the data and get it into a usable format for the next step.
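The story does not describe HRDAG's cleaning procedures in detail. As a rough illustration only, here is the kind of standardization that combining files from many sources usually calls for: normalizing names so that accents, capitalization, and spacing don't hide a match, and converting dates written in different formats to a single ISO form. The function names and the list of accepted date formats are hypothetical assumptions, not HRDAG's code.

```python
import unicodedata
from datetime import datetime
from typing import Optional

# Hypothetical set of date formats that different source files might use.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")


def normalize_name(name: str) -> str:
    """Strip accents, uppercase, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks (e.g. the accent split off from "í" by NFKD).
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.upper().split())


def normalize_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None
```

Returning None for unparseable dates, rather than guessing, keeps uncertain values visible so an analyst can review them by hand.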

The next step was deduplication. In this machine-learning-based step, the team's goal was to find all the records that referred to the same act of violence (the same victim, date, place, perpetrator, and so on) and link them together. Deduplication cut the consolidated dataset roughly in half; it contains records of individual victims of homicide, forced displacement, forced disappearance, kidnapping, and forced recruitment into armed groups. Maria and Patrick used this time to train the team to do record linkage in code, and also to discuss statistical theory so they could transfer that knowledge to their partners.
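To make record linkage concrete, here is a minimal sketch under simplifying assumptions. It scores every pair of records with a hand-weighted similarity on name, date, and place, links pairs that pass a threshold, and merges linked pairs into clusters with union-find, so each cluster stands for one victim. The `Record` fields, the weights, and the 0.85 threshold are all hypothetical. The step described above is machine-learning-based, so a trained model would take the place of these fixed weights, and comparing every pair would not scale to millions of records without blocking (only comparing records that share, say, a municipality or year).

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass
class Record:
    rec_id: str
    source: str  # which organization's dataset the record came from
    name: str
    date: str    # ISO date of the event, e.g. "1998-05-12"
    place: str   # municipality


def similarity(a: Record, b: Record) -> float:
    """Score a pair from 0 to 1 using hypothetical hand-picked weights."""
    name_sim = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
    same_date = 1.0 if a.date == b.date else 0.0
    same_place = 1.0 if a.place.lower() == b.place.lower() else 0.0
    return 0.6 * name_sim + 0.2 * same_date + 0.2 * same_place


def link(records: list[Record], threshold: float = 0.85) -> list[list[str]]:
    """Group records that appear to describe the same victim."""
    parent = {r.rec_id: r.rec_id for r in records}

    def find(x: str) -> str:
        # Follow parent pointers to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair whose similarity passes the threshold.
    for a, b in combinations(records, 2):
        if similarity(a, b) >= threshold:
            parent[find(a.rec_id)] = find(b.rec_id)

    # Collect record IDs by cluster root.
    clusters: dict[str, list[str]] = {}
    for r in records:
        clusters.setdefault(find(r.rec_id), []).append(r.rec_id)
    return list(clusters.values())
```

With two made-up records for "Juan Pérez" and "Juan Perez" on the same date in the same municipality, the name similarity is 0.9 and the combined score is 0.94, so both land in one cluster. Union-find makes linkage transitive: if A matches B and B matches C, all three end up together even when A and C alone would not pass the threshold.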

Each of the 30-plus organizations brought its own unique, partial dataset to the process. These datasets do a wonderful job of tallying the violence that was observed, but they don't account for violence that nobody witnessed or documented.

“Data is great, it’s the basis for moving forward,” said Patrick. “But each organization’s dataset reflects their particular interests, circumstances, budget and other factors. For these reasons, [all] data is partial.”

“This was the largest endeavor like this in Colombian history,” said Maria. “Having a deduplicated dataset is a big deal. But the consolidated dataset is still incomplete because it’s missing some victims.”

The third task, estimation, tackled the problem of missing victims. All datasets inevitably miss some victims, for different and legitimate reasons. (Counting victims during a conflict is dangerous and hard, the HRDAG team often says.) Maria and Patrick taught the team about multiple systems estimation (MSE), an analytical tool that uses the overlaps among datasets to estimate how many victims were missed by all of them.
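The simplest version of the idea behind MSE uses just two lists. If list A and list B record victims independently, the share of A's victims that also appear in B estimates the share of all victims that B captured. That gives the classic two-list capture-recapture (Lincoln-Petersen) estimate N̂ = n_A × n_B / m, where m is the number of victims on both lists. This is a teaching sketch, not the method used in this project: practical MSE combines three or more lists and uses models that relax the independence and equal-catchability assumptions this formula depends on. The function names are hypothetical.

```python
def lincoln_petersen(list_a: set, list_b: set) -> float:
    """Two-list capture-recapture estimate of the total number of victims.

    Assumes the two lists are independent and every victim is equally likely
    to be recorded; real MSE models relax both assumptions.
    """
    overlap = len(list_a & list_b)  # victims linked across both lists
    if overlap == 0:
        raise ValueError("no overlap between lists; estimate is undefined")
    return len(list_a) * len(list_b) / overlap


def estimated_undocumented(list_a: set, list_b: set) -> float:
    """Estimated number of victims that appear on neither list."""
    observed = len(list_a | list_b)  # distinct victims seen by either list
    return lincoln_petersen(list_a, list_b) - observed
```

For example, if one list names 60 victims, another names 40, and 20 appear on both, the estimate is 60 × 40 / 20 = 120. Only 80 distinct victims were observed, which suggests about 40 were documented by neither list. The overlap here depends on deduplication: the estimate is only as good as the record linkage that decides which records refer to the same victim.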

“Every dataset, no matter how small, is valuable to the process of finding the truth,” said Patrick. “Every record in every dataset represents profound suffering. But to understand the whole story, we need all the data. We also need math to help us understand what’s missing, because we don’t want to ignore the deaths that didn’t get documented.”

“MSE helps us to unhide the victims who’ve been silenced,” said Maria.

As Maria and Patrick taught the Colombian team how to do MSE, and why observed data is not enough and estimation is so important, the team grew from five to nine and then to thirteen members. To date, they have created ten analytical documents, or case studies, that answer specific questions. For example, one team member used the deduplicated dataset and MSE to estimate and compare the number of homicides committed by each of the armed groups in the department of Antioquia from 1990 to 2015. There is room for many more case studies.

“There was a lot of thought going into that process,” said Maria. “It was incredible to see our partners pick up the concepts behind MSE, and then go on to communicate them to other colleagues. They’re educating their colleagues now on why we use estimates instead of observed data.”

The analysts from the Commission and the JEP are now so well-versed in statistical theory that they can communicate their findings without needing to rely on HRDAG. “These groups are now talking inter-institutionally,” said Maria. 

The team has completed the Methods Report and is now incorporating datasets that were contributed more recently. Eventually, the Commission will present its results to Colombia's legislative body and then dissolve, which is normal procedure for truth commissions. The JEP will continue to work on legal cases, and the analysts will remain technically self-sufficient at generating more reports.

“I love the stats part of my job, but doing the capacity-building has been really cool,” said Maria. “We’ve gone from team members not being able to open a Unix terminal to ‘Maria, I did this case study, can you give it a look?’”


Image: David Peters, 2021.

