2024 Publications
From time to time, we issue our own scientific reports that focus on the statistical aspects of the data analysis we have done in support of our partners. These reports are non-partisan, and they leave the work of advocacy to our partners.
The killings of social movement leaders and human rights defenders in Colombia 2018 - 2023: an estimate of the universe
HRDAG + Dejusticia
In 2018, Dejusticia and HRDAG published our first report estimating the total number of social leaders killed in Colombia during 2016-2017. Additionally, we demonstrated that a statistical method known as “capture-recapture” could be used to estimate the underreporting of murdered social leaders. Moreover, our estimate closely matched the total documented by the organizations collectively. A year later, we released a second report, updating the data to include 2018. Five years later, we revisited this exercise to cover the period from 2019 to 2023, focusing on three of the original six organizations.
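As a rough illustration of the capture-recapture idea mentioned above, the following minimal sketch applies the simplest two-list version (the Lincoln-Petersen estimator). It is not the method used in the reports, which draw on more lists and more elaborate models. The record identifiers and list contents are hypothetical, and the sketch assumes the two lists are independent and that records are matched without error.

```python
def lincoln_petersen(n_a: int, n_b: int, n_both: int) -> float:
    """Two-list capture-recapture estimate of the total population size.

    n_a    -- number of records on list A
    n_b    -- number of records on list B
    n_both -- number of records that appear on both lists
    """
    if n_both <= 0:
        raise ValueError("the two lists must share at least one record")
    return n_a * n_b / n_both


# Hypothetical victim identifiers documented by two organizations.
list_a = {"v01", "v02", "v03", "v04", "v05", "v06"}
list_b = {"v04", "v05", "v06", "v07", "v08"}

# Records seen by at least one organization.
documented = len(list_a | list_b)

# Estimated total, including victims that neither list recorded.
estimated_total = lincoln_petersen(
    len(list_a), len(list_b), len(list_a & list_b)
)

# Estimated number of victims missing from both lists.
estimated_undocumented = estimated_total - documented

print(documented, estimated_total, estimated_undocumented)
```

The smaller the overlap between the lists relative to their sizes, the larger the estimated number of undocumented cases.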
Creative Commons International license 4.0.
Valentina Rozo Ángel and Patrick Ball (2024). Asesinatos de líderes sociales y defensores de derechos en Colombia: en estimación del universo actualización 2019 – 2023. Human Rights Data Analysis Group. 18 December 2024. © HRDAG 2024.
The use of unstructured data to study police use of force
CHANCE magazine
The challenges and opportunities researchers face when working with unstructured data are hardly new. This article defines unstructured data as data that is not organized according to pre-existing schemas or structures for the sake of statistical analysis. Unstructured data poses a unique challenge for researchers focused on police and policing. The article discusses a definition of unstructured data and two of the primary challenges faced when working with such data, namely information extraction and classification problems. Two case studies are used to illuminate the challenges.
Tarak Shah, Cristian Allen, Ayyub Ibrahim, Harlan Kefalas, and Bavo Stevens (2024). The Use of Unstructured Data to Study Police Use of Force. 5 December, 2024. CHANCE, 37 (4), 18–23. © The American Statistical Association (ASA) and Taylor & Francis Group 2024. https://doi.org/10.1080/09332480.2024.2434437
Deaths in Custody during the Armed Conflict in Syria, 2011–2023
HRDAG
A key question for the United Nations Commission of Inquiry on the Syrian Arab Republic is how many victims of the ongoing conflict were killed while in custody. Through our long collaboration with both the UN and multiple Syrian documentation groups, our team of data scientists at the Human Rights Data Analysis Group (HRDAG) has access to documented records of victims killed, under a variety of circumstances, in the Syrian Arab Republic between 2011 and 2023. This report is based on records collected by eight sources documenting deaths in the ongoing armed conflict in Syria.
Creative Commons International license 4.0.
Maria Gargiulo, Tarak Shah, Megan Price (2024). Deaths in Custody during the Armed Conflict in Syria, 2011–2023. Human Rights Data Analysis Group. 10 December, 2024. © 2024 HRDAG.
Unveiling statistical invisibility: The structural racism of the war on drugs, its impact on social inequalities, and the need for citizen data empowerment in Latin America
T20 Brasil
There is no way to address social inequalities in Latin America (LA) without discussing the region’s longstanding security policy: the War on Drugs, characterized by criminalization of historical cultural practices of Black and indigenous communities, the militarization of public security and mass incarceration. It contributes to the region being a leader in global homicides and exacerbates the unequal inclusion of non-white populations.
Cecilia Olliveira, Patrick Ball, Dayana Blanco, Eduardo Ribeiro, Juliana Borges, Maria Isabel Couto, Nathália Oliveira (2024). “Unveiling Statistical Invisibility: The Structural Racism of the War on Drugs, its Impact on Social Inequalities, and the Need for Citizen Data Empowerment in Latin America.” September 2024. © T20 Brasil 2024.
Innocence Discovery Lab - Harnessing large language models to surface data buried in wrongful conviction case documents
The Wrongful Conviction Law Review
The recent advent of commercial artificial intelligence (AI), especially in natural language processing (NLP), introduces transformative possibilities for wrongful conviction research. NLP, a pivotal branch of AI that forms the basis for Large Language Models (LLMs), enables computers to interpret human language with a nuanced understanding. This technological advancement is particularly valuable for analyzing the complex language found in case documents associated with wrongful convictions. This paper explores the effectiveness of LLMs in analyzing and extracting data from case documents collected by the Innocence Project New Orleans and the National Registry of Exonerations. The diverse and comprehensive nature of these datasets makes them ideal for assessing the capabilities of LLMs. The findings of this study advance our understanding of how LLMs can be utilized to make wrongful conviction case documents easily accessible by automating the extraction of relevant data.
Creative Commons Attribution 4.0 International License.
Ayyub Ibrahim, Huy Dao, and Tarak Shah (2024). Innocence Discovery Lab – Harnessing Large Language Models to Surface Data Buried in Wrongful Conviction Case Documents. The Wrongful Conviction Law Review 5 (1):103-25. 31 May, 2024. https://doi.org/10.29173/wclawr112. © 2024 Ayyub Ibrahim, Huy Dao, Tarak Shah.
Preserving human rights data with the Filecoin Network: A journey into the Decentralized Web with HRDAG
At the core of HRDAG’s work are the datasets it gathers, tidies, and uses for estimation and analysis. The data includes evidence of homicides, disappearances, kidnappings, recruitment of child soldiers, and forced displacement. These are some of the most traumatic events that could happen to anyone, and proof of these events is crucial so that societies remember the suffering of the past and do not repeat it in the future. By remembering, we help to validate the experiences of the survivors, enable social recovery, and provide evidence with which to hold the perpetrators accountable. It is therefore essential to preserve and protect this information.
Creative Commons Attribution 4.0 International license
Patrick Ball (2024). Preserving Human Rights Data with the Filecoin Network: A Journey into the Decentralized Web with HRDAG. Filecoin Foundation. 18 April, 2024. © 2025 Filecoin Foundation for the Decentralized Web.
verdata: An R package for analyzing data from the Truth Commission in Colombia
The Journal of Open Source Software
The data compiled by the joint JEP-CEV-HRDAG project are publicly available from the Departamento Administrativo Nacional de Estadística (DANE). The data published by DANE is available in a format that may not be familiar to researchers who have not previously worked with statistical imputation methods. Recognizing this, verdata was created to support researchers in responsibly and correctly using the data despite the potential unfamiliarity of its structure. Researchers can use verdata to verify that the data files they are using in their analyses have not been altered, to replicate the main findings of the technical appendix, and to design new analyses of the conflict in Colombia.
Creative Commons Attribution 4.0 International License.
Maria Gargiulo, María Julia Durán, Paula Andrea Amado, and Patrick Ball (2024). verdata: An R package for analyzing data from the Truth Commission in Colombia. The Journal of Open Source Software. 6 January, 2024. 9(93), 5844, https://doi.org/10.21105/joss.05844.
