Recent Publications

From time to time, we issue our own scientific reports that focus on the statistical aspects of the data analysis we have done in support of our partners. These reports are non-partisan, and they leave the work of advocacy to our partners.


How public involvement can improve the science of AI

Proceedings of the National Academy of Sciences of the United States of America

As AI systems, from decision-making algorithms to generative AI, are deployed more widely, computer scientists and social scientists alike are being called on to provide trustworthy quantitative evaluations of AI safety and reliability. These calls have included demands from affected parties for a seat at the table of AI evaluation. What, if anything, can public involvement add to the science of AI? In this perspective, we summarize the sociotechnical challenge of evaluating AI systems, which often adapt to multiple layers of social context that shape their outcomes. We then offer guidance for improving the science of AI by engaging lived-experience experts in the design, data collection, and interpretation of scientific evaluations.

Nathan Matias and Megan Price (2025). How public involvement can improve the science of AI. Proceedings of the National Academy of Sciences of the United States of America, Vol. 122, No. 48. 14 November, 2025. © 2025 National Academy of Sciences. https://doi.org/10.1073/pnas.2421111122

Access the publication off-site.

Shots fired: Can technology really keep us safe from gunfire?

Significance

An expensive American gunshot detection system claims it’s necessary because humans don’t always call the police to report gunfire. But opponents say it’s fatally flawed. To investigate, Bailey Passmore and Larry Barrett analysed data on emergencies within the city of Chicago.

Bailey Passmore and Larry Barrett (2025). Shots fired: Can technology really keep us safe from gunfire? Significance, Volume 22, Issue 4, July 2025, Pages 34–37. 27 May 2025. © Royal Statistical Society 2025. https://doi.org/10.1093/jrssig/qmaf042

Access the publication off-site.

The killings of social movement leaders and human rights defenders in Colombia, 2018–2023: an estimate of the universe

HRDAG + Dejusticia

In 2018, Dejusticia and HRDAG published our first report estimating the total number of social leaders killed in Colombia during 2016–2017. In that report, we demonstrated that a statistical method known as “capture-recapture” can be used to estimate how many killings of social leaders go unreported, and our estimate closely matched the total documented by the organizations collectively. A year later, we released a second report that updated the data to include 2018. Five years after that, we revisited the exercise to cover the period from 2019 to 2023, focusing on three of the original six organizations.

Creative Commons 4.0 International License.

Valentina Rozo Ángel and Patrick Ball (2024). Asesinatos de líderes sociales y defensores de derechos en Colombia: una estimación del universo, actualización 2019 – 2023 [Killings of social leaders and rights defenders in Colombia: an estimate of the universe, 2019–2023 update]. Human Rights Data Analysis Group. 18 December 2024. © HRDAG 2024.

Read the full article on our site.

The use of unstructured data to study police use of force

CHANCE magazine

The challenges and opportunities researchers face when working with unstructured data are hardly new. This article defines unstructured data as data that are not organized according to pre-existing schemas or structures for the purpose of statistical analysis. Such data pose a particular challenge for researchers who study police and policing. The article discusses two of the primary challenges of working with unstructured data, namely information extraction and classification, and uses two case studies to illustrate them.

Tarak Shah, Cristian Allen, Ayyub Ibrahim, Harlan Kefalas, and Bavo Stevens (2024). The Use of Unstructured Data to Study Police Use of Force. 5 December, 2024. CHANCE 37(4), 18–23. © The American Statistical Association (ASA) and Taylor & Francis Group 2024. https://doi.org/10.1080/09332480.2024.2434437

Access the publication off-site.

Deaths in custody during the armed conflict in Syria, 2011–2023

HRDAG

A key question for the United Nations Commission of Inquiry on the Syrian Arab Republic is: how many victims of the ongoing conflict were killed while in custody? Through our long collaboration with both the UN and multiple Syrian documentation groups, our team of data scientists at the Human Rights Data Analysis Group (HRDAG) has access to documented records of victims killed, under a variety of circumstances, in the Syrian Arab Republic between 2011 and 2023. This report is based on records collected by eight sources documenting deaths in the ongoing armed conflict in Syria.

Creative Commons 4.0 International License.

Maria Gargiulo, Tarak Shah, and Megan Price (2024). Deaths in Custody during the Armed Conflict in Syria, 2011–2023. Human Rights Data Analysis Group. 10 December, 2024. © 2024 HRDAG.

Read the full article on our site.

Unveiling statistical invisibility: The structural racism of the war on drugs, its impact on social inequalities, and the need for citizen data empowerment in Latin America

T20 Brasil

There is no way to address social inequalities in Latin America without discussing the region’s longstanding security policy: the War on Drugs, characterized by the criminalization of historical cultural practices of Black and Indigenous communities, the militarization of public security, and mass incarceration. This policy contributes to the region’s position as a global leader in homicides and deepens the unequal inclusion of non-white populations.

Cecilia Olliveira, Patrick Ball, Dayana Blanco, Eduardo Ribeiro, Juliana Borges, Maria Isabel Couto, and Nathália Oliveira (2024). “Unveiling Statistical Invisibility: The Structural Racism of the War on Drugs, its Impact on Social Inequalities, and the Need for Citizen Data Empowerment in Latin America.” September 2024. © T20 Brasil 2024.

Access the publication off-site.

Innocence Discovery Lab – Harnessing large language models to surface data buried in wrongful conviction case documents

The Wrongful Conviction Law Review

The recent advent of commercial artificial intelligence (AI), especially in natural language processing (NLP), introduces transformative possibilities for wrongful conviction research. NLP, a pivotal branch of AI that forms the basis for large language models (LLMs), enables computers to interpret human language with considerable nuance. This technological advancement is particularly valuable for analyzing the complex language found in case documents associated with wrongful convictions. This paper explores the effectiveness of LLMs in analyzing and extracting data from case documents collected by the Innocence Project New Orleans and the National Registry of Exonerations. The diverse and comprehensive nature of these datasets makes them ideal for assessing the capabilities of LLMs. The findings of this study advance our understanding of how LLMs can be used to make wrongful conviction case documents easily accessible by automating the extraction of relevant data.

Creative Commons Attribution 4.0 International License.

Ayyub Ibrahim, Huy Dao, and Tarak Shah (2024). Innocence Discovery Lab – Harnessing Large Language Models to Surface Data Buried in Wrongful Conviction Case Documents. The Wrongful Conviction Law Review 5(1), 103–125. 31 May, 2024. https://doi.org/10.29173/wclawr112. © 2024 Ayyub Ibrahim, Huy Dao, Tarak Shah.

Access the publication off-site.

Preserving human rights data with the Filecoin Network: A journey into the Decentralized Web with HRDAG

At the core of HRDAG’s work are the datasets it gathers, tidies, and uses for estimation and analysis. The data includes evidence of homicides, disappearances, kidnappings, recruitment of child soldiers, and forced displacement. These are some of the most traumatic events that could happen to anyone, and proof of these events is crucial so that societies remember the suffering of the past and do not repeat it in the future. By remembering, we help to validate the experiences of the survivors, enable social recovery, and provide evidence with which to hold the perpetrators accountable. It is therefore essential to preserve and protect this information.

Creative Commons Attribution 4.0 International License.

Patrick Ball (2024). Preserving Human Rights Data with the Filecoin Network: A Journey into the Decentralized Web with HRDAG. Filecoin Foundation. 18 April, 2024. © 2025 Filecoin Foundation for the Decentralized Web.

Access the publication off-site.

verdata: An R package for analyzing data from the Truth Commission in Colombia

The Journal of Open Source Software

The data compiled by the joint JEP-CEV-HRDAG project are publicly available from the Departamento Administrativo Nacional de Estadística (DANE). The data published by DANE are in a format that may be unfamiliar to researchers who have not previously worked with statistical imputation methods. verdata was created to help researchers use the data correctly and responsibly, even if its structure is new to them. Researchers can use verdata to verify that the data files they are using in their analyses have not been altered, to replicate the main findings of the technical appendix, and to design new analyses of the conflict in Colombia.

Creative Commons Attribution 4.0 International License.

Maria Gargiulo, María Julia Durán, Paula Andrea Amado, and Patrick Ball (2024). verdata: An R package for analyzing data from the Truth Commission in Colombia. The Journal of Open Source Software. 6 January, 2024. 9(93), 5844. https://doi.org/10.21105/joss.05844.

Access the publication off-site.

Our work has been used by truth commissions, international criminal tribunals, and non-governmental human rights organizations. We have worked with partners on projects on five continents.
