HRDAG science
Training with HRDAG: Rules for Organizing Data and More
I had the pleasure of working with Patrick Ball at the HRDAG office in San Francisco for a week during summer 2016. I knew Patrick from two workshops he previously hosted at the University of Washington’s Center for Human Rights (UWCHR). The workshops were indispensable to us at UWCHR as we worked to publish a number of datasets on human rights violations during the El Salvador Civil War. The training was all the more helpful because the HRDAG team was so familiar with the data. As part of an impressive career which took him from Ethiopia and Kosovo to Haiti and El ...
Using MSE to Estimate Unobserved Events
At HRDAG, we worry about what we don't know. Specifically, we worry about how we can use statistical techniques to estimate homicides that are not observed by human rights groups. Based on what we've seen studying many conflicts over the last 25 years, what we don't know is often quite different from what we do know.
The technique we use most often to estimate what we don't know is called "multiple systems estimation." In this medium-technical post, I explain how to organize data and use three R packages to estimate unobserved events.
Click here for Computing ...
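As a minimal sketch of the idea behind multiple systems estimation, the simplest case uses only two lists and the Lincoln-Petersen estimator, with Chapman's correction as a common small-sample variant. It is written in Python rather than with the R packages the post refers to, and it is not the post's own method. The function name and the counts are hypothetical, and the sketch assumes the two lists are independent and perfectly matched, which real documentation data rarely satisfy.

```python
def two_list_estimate(n_a: int, n_b: int, n_both: int) -> dict:
    """Estimate a total population from two overlapping lists.

    n_a, n_b: distinct victims recorded on list A and on list B.
    n_both:   victims recorded on both lists (the overlap).
    """
    if n_both <= 0:
        raise ValueError("the lists must overlap to estimate the total")
    if n_both > min(n_a, n_b):
        raise ValueError("the overlap cannot exceed either list size")

    # Victims seen by at least one list.
    observed = n_a + n_b - n_both
    # Lincoln-Petersen: N is approximately n_a * n_b / n_both.
    lincoln_petersen = n_a * n_b / n_both
    # Chapman's variant reduces bias when the overlap is small.
    chapman = (n_a + 1) * (n_b + 1) / (n_both + 1) - 1

    return {
        "observed": observed,
        "lincoln_petersen": lincoln_petersen,
        "chapman": chapman,
        # Estimated number of victims that neither list recorded.
        "unobserved": lincoln_petersen - observed,
    }


# Hypothetical counts, for illustration only.
result = two_list_estimate(n_a=100, n_b=80, n_both=40)
print(result["observed"], result["lincoln_petersen"], result["unobserved"])
```

With these made-up counts, the two lists together document 140 victims while the estimate of the total is 200, which is the gap between what is known and what is not known that the post describes.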
Predictive Policing Reinforces Police Bias
Issues surrounding policing in the United States are at the forefront of our national attention. Among these is the use of “predictive policing,” which is the application of statistical or machine learning models to police data, with the goal of predicting where or by whom crime will be committed in the future. Today Significance magazine published an article on this topic that I co-authored with William Isaac. Significance has kindly made this article open access (free!) for all of October. In the article we demonstrate the mechanism by which the use of predictive ...
Clustering and Solving the Right Problem
In our database deduplication work, we’re trying to figure out which records refer to the same person, and which other records refer to different people.
We write software that looks at tens of millions of pairs of records. We calculate a model that assigns each pair of records a probability that the pair of records refers to the same person. This step is called pairwise classification.
However, there may be more than just one pair of records that refer to the same person. Sometimes three, four, or more reports of the same death are recorded.
So once we have all ...
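The step from pairwise probabilities to groups of records can be sketched as follows. This is an illustration only: it uses a fixed threshold and connected components (union-find), a deliberately simple stand-in that the post does not claim to use. The function name, record IDs, probabilities, and threshold are all hypothetical.

```python
from collections import defaultdict


def cluster_pairs(record_ids, pair_probs, threshold=0.5):
    """Group records into clusters from pairwise match probabilities.

    pair_probs maps (id_a, id_b) to the probability that the two
    records refer to the same person. Pairs at or above `threshold`
    are linked, and linked records are merged transitively.
    """
    parent = {r: r for r in record_ids}

    def find(x):
        # Walk up to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), prob in pair_probs.items():
        if prob >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(list)
    for r in record_ids:
        groups[find(r)].append(r)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical records and probabilities, for illustration only.
records = ["r1", "r2", "r3", "r4"]
probs = {
    ("r1", "r2"): 0.9,
    ("r2", "r3"): 0.8,
    ("r1", "r3"): 0.2,
    ("r3", "r4"): 0.1,
}
print(cluster_pairs(records, probs))
```

In this example r1 and r3 end up in the same cluster through r2 even though their own pair probability is low. That behavior is one reason clustering is a separate problem from pairwise classification and why a simple threshold may not be the right choice.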
The task is a quantum of workflow
This post describes how we organize our work over ten years, twenty analysts, dozens of countries, and hundreds of projects: we start with a task. A task is a single chunk of work, a quantum of workflow. Each task is self-contained and self-documenting; I'll talk about these ideas at length below. We try to keep each task as small as possible, which makes it easy to understand what the task is doing, and how to test whether the results are correct.
In the example here, I'll describe work from our Syria database matching project, which ...
The case of Ana Lucrecia Orellana Stormont
When working with documents in an archive, every document offers the opportunity for statistical study and quantitative research. But a document can also offer the discovery of a story. That is the case with the disappearance of Ana Lucrecia Orellana Stormont, who was reported missing on June 6, 1983, at the age of 35.
Ana Lucrecia, a professor of psychology at the University of San Carlos, was scheduled to attend a meeting with Edgar Raúl Rivas Rodríguez at the Plaza Hotel in Guatemala’s capital city. Edgar, who also went missing, was a teacher at the School of ...
Data Archaeology for Human Rights in Central America: HRDAG Collaborates with UWCHR
Patrick Ball is kicking himself for a decision he made almost 25 years ago. “I was clever, but I wasn’t smart,” he says ruefully, as he considers the labyrinth of tables and ASCII-encoded keystrings he used to design a database of human rights violations for the pioneering Salvadoran non-governmental Human Rights Commission (CDHES). Now I’m sitting in his office in San Francisco’s Mission District watching over his shoulder, and trying to keep up, as he bangs out code to decipher the priceless data contained in these old files. Created in 1991 and 1992, ...
A geeky deep-dive: database deduplication to identify victims of human rights violations
In our work, we merge many databases to figure out how many people have been killed in violent conflict. Merging is a lot harder than you might think.
Many of the database records refer to the same people--the records are duplicated. We want to identify and link all the records that refer to the same victims so that each victim is counted only once, and so that we can use the structure of overlapping records to do multiple systems estimation.
Merging records that refer to the same person is called entity resolution, database deduplication, or record linkage. For ...
Rapid response to: Civilian deaths from weapons used in the Syrian conflict
On November 4, 2015, the BMJ published our "Rapid Response" to Civilian deaths from weapons used in the Syrian conflict (BMJ 2015;351:h4736). The response was co-authored by Megan Price, Anita Gohdes, Jay Aronson (Carnegie Mellon University, Center for Human Rights Science), and Christopher McNaboe (Carter Center, Syria Conflict Mapping Project).
We have three concerns about this article. First, the article apportions responsibility for casualties to particular perpetrator organizations based on a single snapshot of territorial control that ignores the numerous (and ...
HRDAG Testifies in Hissène Habré Trial
Last week HRDAG’s executive director, Patrick Ball, served as an expert witness for the prosecution in the trial of Hissène Habré, the ruler of Chad from 1982 to 1990. The trial is taking place in Dakar, Senegal, where the 73-year-old Habré has been living since 1990 when he fled Chad. He has already been sentenced to death in absentia in Chad.
Habré is being charged with war crimes, crimes against humanity, and torture that took place during his eight-year reign. The trial is happening at the Extraordinary African Chambers, which was inaugurated by Senegal ...
Analysis of Homicide Patterns in Colombia
Last week Forensis, the Colombian National Institute of Forensic Medicine’s flagship publication, published the first of our analyses of homicide patterns in Colombia. Authored by HRDAG executive director Patrick Ball and UN colleague Michael Reed Hurtado, “Cuentas y mediciones de la criminalidad y de la violencia” (pages 529-545) explores, as the title suggests, the quality of “truth” contained within crime registries. Citing the problem of partial data, missing data, and inherent design bias, Patrick and Michael write that no register, official or unofficial, ...
When Data Doesn’t Tell the Whole Story
This blog is a part of International Justice Monitor’s technology for truth series, which focuses on the use of technology for evidence and features views from key proponents in the field.
As highlighted by other posts in this series, emerging technology is increasing the amount and type of information available, in some contexts, to criminal and other investigations. Much of what is produced by these emerging technologies (Facebook posts, tweets, YouTube videos, text messages) falls in the category we refer to as “found” data. By “found” data we mean data ...
HRDAG Offers New R Package – dga
Much of the work we do at HRDAG involves estimating the number of undocumented deaths using a statistical technique called multiple systems estimation (MSE, described in more detail here). One of our goals is to make this class of methods more broadly available to human rights researchers. In particular, we are finding that Bayesian approaches are extremely valuable for MSE. Accordingly, we are pleased to offer a new R package called dga (“decomposable graphs approach”) that performs Bayesian model averaging for MSE.
The main function in this package implements a ...
BJS Report on Arrest-Related Deaths: True Number Likely Much Greater
(This post is co-authored by Patrick Ball and Kristian Lum.)
Today the Bureau of Justice Statistics (BJS) released a report on their effort to document “all deaths that occur during the process of arrest in the United States.” The analysis estimates that the Arrest-Related Deaths (ARD) program covers only 34-49% of these deaths. A parallel program by the FBI (the Supplementary Homicide Reports, SHR) is estimated to cover approximately the same proportion of deaths. Even taking into consideration both programs, 28% of all police homicides remain unreported.
In ...
Revisiting the analysis of event size bias in the Iraq Body Count
(This post is co-authored by Patrick Ball and Megan Price.)
In a recent article in the SAIS Review of International Affairs, we wrote about "event size bias," the problem that events of different sizes have different probabilities of being reported. In this case, the size of an event is defined by the number of reported victims. Our concern is that not all violent (in this case homicide) events are recorded, that is, some events will have zero sources. Our theory is that events with fewer victims will receive less coverage than events with more victims, and that a higher ...
IRR: Agreement Among Coders is Key
For years I have been engaged in a quantitative study at Guatemala’s Historic Archive of the National Police, or AHPN. (See the blogposts below.) In this study coders collect data on sheets of paper according to criteria established and explained in manuals. But when collecting data, there’s always room for human error—this is why the validity of the study hinges on verifying that coders use the correct criteria.
It is important to mention that the mainstay of coding is the use of a controlled vocabulary. A controlled vocabulary gives analysts a framework, or ...
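One common way to quantify agreement between two coders is Cohen's kappa, which compares observed agreement with the agreement expected by chance. The excerpt does not say which statistic the AHPN study uses, so the choice of kappa here is an assumption, and the function name and category labels are hypothetical. A minimal sketch in Python:

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same documents.

    coder_a and coder_b are equal-length sequences of category
    labels drawn from the same controlled vocabulary.
    """
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must label the same non-empty set")

    n = len(coder_a)
    # Share of documents on which the two coders chose the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Agreement expected by chance, from each coder's label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both coders used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels, for illustration only.
a = ["arrest", "arrest", "detention", "arrest"]
b = ["arrest", "detention", "detention", "arrest"]
print(cohens_kappa(a, b))
```

Here the coders agree on three of four documents (0.75), chance agreement is 0.5, and kappa is 0.5. Raw percent agreement alone would overstate reliability when one category dominates.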
12 Questions about Using Data Analysis to Bring Guatemalan War Criminals to Justice
When people talk about war criminals in Guatemala, which war are they talking about?
They’re talking about the Guatemalan civil war, which began in 1960 and ended in 1996. That’s thirty-six years of civil war. Even though it ended almost two decades ago, Guatemala is still recovering from it. At its simplest, this civil war story was right-wing government forces fighting leftist rebels. But it went deeper than that, of course. The majority of the rebel forces was composed of indigenous peoples, primarily the Maya, ...
Learning Day by Day: Quantitative Research at the AHPN
Working at the Historic Archive of the National Police (AHPN) of Guatemala, there are many skills I learned on the job. My many years of work on the team that studies the recovered documents have been like a custom-made course in how to do quantitative research.
The Archive documents I study are the result of 36 years of creation during civil war (1960 to 1996). Many of these documents are simply administrative—but we are able to use them to understand patterns that occurred during the conflict, to get a sense of what mattered to the National Police and what ...
14 Questions about Counting Casualties in Syria
In early 2012, HRDAG was commissioned by the UN Office of the High Commissioner for Human Rights (OHCHR) to do an enumeration project, essentially a count of all of the reported casualties in the Syrian conflict. HRDAG has published two analyses so far, the first in January 2013, and the second in June 2013. In this post, HRDAG scientists Anita Gohdes, Megan Price, and Patrick Ball answer questions about that project.
So, how many people have been killed in the Syrian conflict?
This is a complicated question. As of our last report, in June 2013, we know that there have ...
HRDAG at Strata Conference 2014
Last Thursday, HRDAG co-founder and director of research Megan Price presented at Strata, the conference for data scientists and people who work with "big data." In her talk, she addressed the question of how we can know the actual number of conflict casualties in Syria. Her short answer was, "We don't know." The longer answer was that we have a very good idea of how many conflict casualties have been reported, by several documentation groups, and that we're working on analyzing ...