25 results for author: Patrick Ball
Human Rights and the Decentralized Web
Our partners were eager to learn and talk about emerging decentralized technology.
Remembering Scott Weikart
HRDAG’s core values all have a connection to Scott Weikart, 1951–2023.
Epidemiology has theories. We should study them.
With so many dashboards and shiny visualizations, how can an interested non-technical reader find good science among the noise?
Using Machine Learning to Help Human Rights Investigators Sift Massive Datasets
How we built a model to search hundreds of thousands of text messages from the perpetrators of a human rights crime.
The Statistics of Conflict Mortality in Peru (La estadística de mortalidad del conflicto en Perú)
In that article, we respond to a critique of the mortality study we conducted for the Truth and Reconciliation Commission in 2003.
The Statistics of Mortality Due to Conflict in Peru
A key point is that human rights data collection prior to the TRC largely ignored violence by the Shining Path.
The CVR's Figures in 2019 (Las cifras de la CVR en el 2019)
The estimates were stratified by location and perpetrator.
Reality and Risk in Our Mortality Study of the Peruvian TRC
HRDAG researchers and analysts at Peru's Truth and Reconciliation Commission (TRC) estimated mortality due to conflict violence using capture-recapture methods.
Counting the Dead in Sri Lanka
ITJP and HRDAG are urging groups inside and outside Sri Lanka to share existing casualty lists.
New analysis of World War II Korean “comfort women” held by Japanese
There may have been more undocumented World War II-era Korean "comfort women" than known.
Herb Spirer, 1925 – 2018
Herb led and mentored a generation of statisticians working in human rights.
How many social movement leaders have been killed in Colombia? An estimate and analysis
As the war between the guerrillas, the Army, and paramilitary groups in Colombia winds down, violence against social movement leaders has intensified. Using data from six organizations, this report estimates the total number of social movement leaders killed in 2016 and 2017. The perpetrators of the killings are not reported in the data or in the report.
In the report, we observe that together, the monitoring organizations documented 160 killings in 2016, and we estimate a total population of 166 deaths.[1] In 2017, there were 172 documented killings, and we estimate a total of 185 deaths.[2]
From this, we conclude that the number of killings is ...
Using MSE to Estimate Unobserved Events
At HRDAG, we worry about what we don't know. Specifically, we worry about how we can use statistical techniques to estimate homicides that are not observed by human rights groups. Based on what we've seen studying many conflicts over the last 25 years, what we don't know is often quite different from what we do know.
The technique we use most often to estimate what we don't know is called "multiple systems estimation." In this medium-technical post, I explain how to organize data and use three R packages to estimate unobserved events.
See Computing Multiple Systems Estimation in R.
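As a rough illustration of the idea (not the R workflow the linked post covers), the simplest case of multiple systems estimation uses just two lists and the Lincoln-Petersen estimator: if the lists are independent, the total population is approximately n_a * n_b / m, where m is the number of records found on both lists. The record IDs and the function name below are hypothetical, and real projects use more than two lists and models that relax the independence assumption.

```python
def lincoln_petersen(list_a, list_b):
    """Two-list estimate of total population size, assuming independent lists."""
    a, b = set(list_a), set(list_b)
    overlap = len(a & b)  # records documented by both sources
    if overlap == 0:
        raise ValueError("no overlap between lists; the estimate is undefined")
    return len(a) * len(b) / overlap


# Hypothetical victim IDs, already deduplicated within and across lists
list_a = {"v01", "v02", "v03", "v04", "v05", "v06"}
list_b = {"v04", "v05", "v06", "v07", "v08"}

estimate = lincoln_petersen(list_a, list_b)  # estimated total, observed or not
observed = len(list_a | list_b)              # distinct records on either list
unobserved = estimate - observed             # estimated events seen by neither list

print(f"observed={observed}, estimated total={estimate}, unobserved={unobserved}")
```

The gap between the estimated total and the observed count is the quantity of interest: the events that no list recorded.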
Clustering and Solving the Right Problem
In our database deduplication work, we’re trying to figure out which records refer to the same person, and which other records refer to different people.
We write software that looks at tens of millions of pairs of records. We fit a model that assigns each pair a probability that its two records refer to the same person. This step is called pairwise classification.
However, there may be more than just one pair of records that refer to the same person. Sometimes three, four, or more reports of the same death are recorded.
So once we have all the pairs classified, we need to decide which groups of records refer to the ...
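One simple way to turn classified pairs into groups is to treat every pair above a probability threshold as a link and take connected components. The sketch below does this with a small union-find; it is an assumption for illustration, not necessarily the clustering approach the full post describes, and the record IDs, probabilities, and threshold are hypothetical.

```python
def cluster_pairs(record_ids, pair_probs, threshold=0.5):
    """Group records into connected components of above-threshold pairs."""
    parent = {r: r for r in record_ids}

    def find(r):
        # Walk up to the root, compressing the path as we go
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for (a, b), prob in pair_probs.items():
        if prob >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two groups

    groups = {}
    for r in record_ids:
        groups.setdefault(find(r), set()).add(r)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical output of the pairwise classification step
records = ["r1", "r2", "r3", "r4", "r5"]
pair_probs = {
    ("r1", "r2"): 0.95,
    ("r2", "r3"): 0.90,
    ("r1", "r3"): 0.20,  # weak direct evidence, but linked through r2
    ("r4", "r5"): 0.10,
}

clusters = cluster_pairs(records, pair_probs)
print(clusters)
```

Note that r1 and r3 end up in the same group even though their own pair scored low. This chaining effect is a known weakness of connected components, and it is one reason the grouping step deserves separate attention from pairwise classification.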
The task is a quantum of workflow
This post describes how we organize our work over ten years, twenty analysts, dozens of countries, and hundreds of projects: we start with a task. A task is a single chunk of work, a quantum of workflow. Each task is self-contained and self-documenting; I'll talk about these ideas at length below. We try to keep each task as small as possible, which makes it easy to understand what the task is doing, and how to test whether the results are correct.
The example I'll describe here comes from our Syria database matching project, which includes about 100 tasks. I'll start with the first thing we do with files we receive ...
A geeky deep-dive: database deduplication to identify victims of human rights violations
In our work, we merge many databases to figure out how many people have been killed in violent conflict. Merging is a lot harder than you might think.
Many of the database records refer to the same people, so the records are duplicated. We want to identify and link all the records that refer to the same victims so that each victim is counted only once, and so that we can use the structure of overlapping records to do multiple systems estimation.
Merging records that refer to the same person is called entity resolution, database deduplication, or record linkage. For definitive overviews of the field, see Scheuren, Herzog, and Winkler, Data Quality ...
Focus on Good Science, not Scientists
We recently learned about an article by Dr Nafeez Ahmed that criticizes the methods and conclusions of the Iraq Body Count (IBC) and the work of Professor Michael Spagat. Dr Ahmed cites our work extensively in support of his arguments, so we think it’s useful for us to reply.
We welcome Dr Ahmed’s summary of various points of scientific debate about mortality due to violence, specifically in Iraq and Colombia. We think these are very important questions for the analysis of data about violent conflict, and indeed, about data analysis more generally. We appreciate his exploration of the technical nuances of this difficult field.
Unfortunately, ...
How many police homicides in the US? A reconsideration
(This post is co-authored by Patrick Ball and Kristian Lum.)
In early March, the Bureau of Justice Statistics published a report that estimated that in the period 2003-2009 and 2011, there were approximately 7427 homicides committed by police in the US. We responded that the method the analysts used, capture-recapture with two databases, is vulnerable to underestimation if the databases exhibit positive dependence. We conducted a thorough sensitivity analysis on the original independence model as applied to the police homicides databases. We used information from several other countries where our partners created multiple databases of homicides. We ...
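To see why positive dependence leads to underestimation, consider what the two-database estimator n_a * n_b / m returns when expected counts are plugged in. The sketch below uses entirely hypothetical capture probabilities, not figures from the report: when an event recorded in one database is more likely than chance to appear in the other, the overlap m is inflated and the estimate falls below the true total.

```python
def expected_two_list_estimate(n_total, p_a, p_b, p_both):
    """Plug expected list sizes and overlap into the estimator n_a * n_b / m."""
    n_a = n_total * p_a      # expected size of database A
    n_b = n_total * p_b      # expected size of database B
    m = n_total * p_both     # expected number of events in both databases
    return n_a * n_b / m


N = 1000  # hypothetical true number of events

# Independence: P(in both) = P(in A) * P(in B)
independent = expected_two_list_estimate(N, 0.5, 0.5, 0.5 * 0.5)

# Positive dependence: P(in both) is larger than the product of the marginals
dependent = expected_two_list_estimate(N, 0.5, 0.5, 0.35)

print(f"independent lists: {independent:.0f}")
print(f"positively dependent lists: {dependent:.0f}")
```

Under independence the estimator recovers the true total, while the positively dependent case comes out well below it, even though both databases are the same size in each scenario.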
Yezidi Activists Teach HRDAG about Human Rights – updated
UPDATE (21 Dec 2014): Juan Cole is reporting that the Kurdish militia (the peshmerga) have retaken Shingal (also known as Sinjar) mountain where many Yezidi people have been trapped since 3 August 2014. They are now moving to liberate other Yezidi towns south of the mountain. The Yezidi people trapped on the mountain are now free. There is no word yet on the thousands of Yezidi people enslaved by ISIS.
ORIGINAL (19 Nov 2014): Farhad (not his real name) got the call from ISIS on his personal cell phone just after lunch: we have your sister, and we will give her back if you pay us $6000, plus $1500 for the driver.
Carrying little more than his ...
Revisiting the analysis of event size bias in the Iraq Body Count
(This post is co-authored by Patrick Ball and Megan Price)
In a recent article in the SAIS Review of International Affairs, we wrote about "event size bias," the problem that events of different sizes have different probabilities of being reported. In this case, the size of an event is defined by the number of reported victims. Our concern is that not all violent events (in this case, homicides) are recorded; that is, some events will have zero sources. Our theory is that events with fewer victims will receive less coverage than events with more victims, and that a higher proportion of small events will have zero sources relative to large events.
The ...