Using Machine Learning to Help Human Rights Investigators Sift Massive Datasets

How we built a model to search hundreds of thousands of text messages from the perpetrators of a human rights crime.

La estadística de mortalidad del conflicto en Perú

In this article we respond to a critique of the mortality study we carried out for the Truth and Reconciliation Commission in 2003.

The Statistics of Mortality Due to Conflict in Peru

A key point is that human rights data collection prior to the TRC largely ignored violence by the Shining Path.

Las cifras de la CVR en el 2019

The estimates were stratified by location and perpetrator.

Reality and Risk in Our Mortality Study of the Peruvian TRC

HRDAG researchers and analysts at Peru's Truth and Reconciliation Commission (TRC) estimated conflict mortality due to violence using capture-recapture methods.

Counting the Dead in Sri Lanka

ITJP and HRDAG are urging groups inside and outside Sri Lanka to share existing casualty lists.

New analysis of World War II Korean “comfort women” held by Japanese

There may have been more undocumented World War II-era Korean "comfort women" than previously known.

Herb Spirer, 1925 – 2018

Herb led and mentored a generation of statisticians working in human rights.

How many social movement leaders have been killed in Colombia? An estimate and analysis

As the war between the guerrillas, the Army, and paramilitary groups in Colombia winds down, violence against social movement leaders has intensified. Using data from six organizations, this report estimates the total number of social movement leaders killed in 2016 and 2017. The perpetrators of the killings are not reported in the data or in the report. In the report, we observe that together, the monitoring organizations documented 160 killings in 2016, and we estimate a total population of 166 deaths.[1] In 2017, there were 172 documented killings, and we estimate a total of 185 deaths.[2] From this, we conclude that the number of killings is ...

Using MSE to Estimate Unobserved Events

At HRDAG, we worry about what we don't know. Specifically, we worry about how we can use statistical techniques to estimate homicides that are not observed by human rights groups. Based on what we've seen studying many conflicts over the last 25 years, what we don't know is often quite different from what we do know. The technique we use most often to estimate what we don't know is called "multiple systems estimation." In this medium-technical post, I explain how to organize data and use three R packages to estimate unobserved events. See Computing Multiple Systems Estimation in R.

Clustering and Solving the Right Problem

In our database deduplication work, we’re trying to figure out which records refer to the same person, and which other records refer to different people. We write software that looks at tens of millions of pairs of records. We calculate a model that assigns each pair of records a probability that the pair of records refers to the same person. This step is called pairwise classification. However, there may be more than just one pair of records that refer to the same person. Sometimes three, four, or more reports of the same death are recorded. So once we have all the pairs classified, we need to decide which groups of records refer to the ...
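The excerpt stops before it says how the classified pairs are grouped, so the following is only a minimal sketch of one simple option: treat every pair whose probability clears a threshold as a link and take connected components with union-find. The function name `cluster_pairs`, the record ids, the probabilities, and the 0.5 threshold are all hypothetical, and the post itself may well use a different clustering method.

```python
from collections import defaultdict


def cluster_pairs(record_ids, pair_probs, threshold=0.5):
    """Group records into clusters from pairwise match probabilities.

    Assumption: a pair at or above `threshold` is treated as a link, and
    clusters are the connected components of the resulting graph.
    """
    parent = {r: r for r in record_ids}

    def find(r):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for (a, b), prob in pair_probs.items():
        if prob >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(set)
    for r in record_ids:
        groups[find(r)].add(r)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical records and probabilities, for illustration only.
records = ["r1", "r2", "r3", "r4"]
probs = {
    ("r1", "r2"): 0.9,
    ("r2", "r3"): 0.8,
    ("r1", "r3"): 0.2,
    ("r3", "r4"): 0.1,
}
clusters = cluster_pairs(records, probs)
print(clusters)
```

In this toy input, r1 and r3 end up in the same group even though their own pair scored only 0.2, because both are linked to r2. That kind of transitive chaining is one reason the grouping step deserves separate attention from pairwise classification.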

The task is a quantum of workflow

This post describes how we have organized our work across ten years, twenty analysts, dozens of countries, and hundreds of projects: we start with a task. A task is a single chunk of work, a quantum of workflow. Each task is self-contained and self-documenting; I'll talk about these ideas at length below. We try to keep each task as small as possible, which makes it easy to understand what the task is doing, and how to test whether the results are correct. The example here comes from our Syria database matching project, which includes about 100 tasks. I'll start with the first thing we do with files we receive ...

A geeky deep-dive: database deduplication to identify victims of human rights violations

In our work, we merge many databases to figure out how many people have been killed in violent conflict. Merging is a lot harder than you might think. Many of the database records refer to the same people--the records are duplicated. We want to identify and link all the records that refer to the same victims so that each victim is counted only once, and so that we can use the structure of overlapping records to do multiple systems estimation. Merging records that refer to the same person is called entity resolution, database deduplication, or record linkage. For definitive overviews of the field, see Scheuren, Herzog, and Winkler, Data Quality ...

Focus on Good Science, not Scientists

We recently learned about an article by Dr Nafeez Ahmed that criticizes the methods and conclusions of the Iraq Body Count (IBC) and the work of Professor Michael Spagat. Dr Ahmed cites our work extensively in support of his arguments, so we think it’s useful for us to reply. We welcome Dr Ahmed’s summary of various points of scientific debate about mortality due to violence, specifically in Iraq and Colombia. We think these are very important questions for the analysis of data about violent conflict, and indeed, about data analysis more generally. We appreciate his exploration of the technical nuances of this difficult field. Unfortunately, ...

How many police homicides in the US? A reconsideration

(This post is co-authored by Patrick Ball and Kristian Lum.) In early March, the Bureau of Justice Statistics published a report that estimated that in the period 2003-2009 and 2011, there were approximately 7427 homicides committed by police in the US. We responded that the method the analysts used, capture-recapture with two databases, is vulnerable to underestimation if the databases exhibit positive dependence. We conducted a thorough sensitivity analysis on the original independence model as applied to the police homicides databases. We used information from several other countries where our partners created multiple databases of homicides. We ...
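To see why positive dependence leads to underestimation, here is a rough sketch of the standard two-list (Lincoln-Petersen) capture-recapture estimator, which assumes the two lists are independent. The counts below are invented for illustration and are not taken from the Bureau of Justice Statistics report or from the post's sensitivity analysis.

```python
def lincoln_petersen(n1, n2, m):
    """Two-list capture-recapture estimate of the total population.

    n1, n2: number of records on each list
    m:      number of records that appear on both lists
    Assumes the lists are independent: N_hat = n1 * n2 / m.
    """
    if m <= 0:
        raise ValueError("the overlap m must be positive")
    return n1 * n2 / m


# Hypothetical setting: 1000 true events, each list documents 400 of them.
# Under independence the expected overlap is 400 * 400 / 1000 = 160.
independent_estimate = lincoln_petersen(400, 400, 160)

# With positive dependence, events on one list are more likely to be on
# the other as well, so the overlap is larger, here assumed to be 250.
dependent_estimate = lincoln_petersen(400, 400, 250)

print(independent_estimate, dependent_estimate)
```

With the same list sizes, the larger overlap pushes the estimate from 1000 down to 640. The estimator reads the extra overlap as evidence that few events were missed, when in this scenario it only reflects that both lists tend to capture the same events.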

Yezidi Activists Teach HRDAG about Human Rights – updated

UPDATE (21 Dec 2014): Juan Cole is reporting that the Kurdish militia (the peshmerga) have retaken Shingal (also known as Sinjar) mountain where many Yezidi people have been trapped since 3 August 2014. They are now moving to liberate other Yezidi towns south of the mountain. The Yezidi people trapped on the mountain are now free. There is no word yet on the thousands of Yezidi people enslaved by ISIS. ORIGINAL (19 Nov 2014): Farhad (not his real name) got the call from ISIS on his personal cell phone just after lunch: we have your sister, and we will give her back if you pay us $6000, plus $1500 for the driver. Carrying little more than his ...

Revisiting the analysis of event size bias in the Iraq Body Count

(This post is co-authored by Patrick Ball and Megan Price) In a recent article in the SAIS Review of International Affairs, we wrote about "event size bias," the problem that events of different sizes have different probabilities of being reported. In this case, the size of an event is defined by the number of reported victims. Our concern is that not all violent (in this case homicide) events are recorded, that is, some events will have zero sources. Our theory is that events with fewer victims will receive less coverage than events with more victims, and that a higher proportion of small events will have zero sources relative to large events. The ...

The Day We Fight Back

Today, February 11, is the day of national protests against the National Security Agency. The critical threat is mass surveillance. In the words of The Day We Fight Back, “Together we will push back against powers that seek to observe, collect, and analyze our every digital action. Together, we will make it clear that such behavior is not compatible with democratic governance. Together, if we persist, we will win this fight.”

Ouster of Guatemala’s Attorney General

We were surprised and disappointed to learn that our colleague Claudia Paz y Paz has had her term as Guatemala’s attorney general cut short. The nation’s Constitutional Court ruled on 6 February that her four-year term will end this May, instead of in December. (She was appointed in December 2010, replacing an attorney general who was appointed in May 2010.) During her term, she put four generals from Guatemala’s civil war on the stand for charges of crimes against humanity and genocide, including General José Efraín Ríos Montt, who ruled from 1982 to 1983. We were fortunate to work with her on that trial and to witness the handing down of a ...

Why raw data doesn't support analysis of violence

This morning I got a query from a journalist asking for our data from the report we published yesterday. The journalist was hoping to create an interactive infographic to track the number of deaths in the Syrian conflict over time. Our data would not support an analysis like the one proposed, so I wrote this reply. We can't send you these data because they would be misleading—seriously misleading—for the purpose you describe. Here's why: What we have is a list of documented deaths, in essence, a highly non-random sample, though a very big one. We like bigger samples because we think that they must be closer to true. The mathematical justificat...