New and Noteworthy

HRDAG Offers New R Package – dga

Much of the work we do at HRDAG involves estimating the number of undocumented deaths using a statistical technique called multiple systems estimation (MSE, described in more detail here). One of our goals is to make this class of methods more broadly available to human rights researchers. In particular, we are finding that Bayesian approaches are extremely valuable for MSE. Accordingly, we are pleased to offer a new R package called dga (“decomposable graphs approach”) that performs Bayesian model averaging for MSE.

The main function in this package implements a model created by David Madigan and Jeremy York. This model was designed to overcome two of the principal challenges we often face in multiple systems estimation: (1) adequately incorporating statistical uncertainty about the structure of dependence between the lists into the estimates; and (2) obtaining estimates in the presence of data sparsity that causes maximum likelihood estimation to be highly unstable. In our research, we find that these benefits result from Bayesian approaches to MSE generally. This Bayesian model offers a principled way to account for uncertainty about list dependence, to allow estimation with sparse data, and, where appropriate, to incorporate expert input about the likely number of undocumented deaths.
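As a rough illustration of what model averaging means here, the sketch below mixes the posterior distributions produced by several candidate dependence models, weighting each by how plausible that model is. It is written in Python for brevity and is not the dga implementation: the function name, the weights, and the toy posteriors are all hypothetical.

```python
def average_posteriors(model_weights, model_posteriors):
    """Mix per-model posteriors over the population size N.

    model_weights: unnormalized plausibility of each dependence model.
    model_posteriors: one dict per model mapping a candidate N to its
    posterior probability under that model.
    (Hypothetical helper for illustration only; not part of dga.)
    """
    total = sum(model_weights)
    weights = [w / total for w in model_weights]
    candidates = sorted({n for post in model_posteriors for n in post})
    return {
        n: sum(w * post.get(n, 0.0) for w, post in zip(weights, model_posteriors))
        for n in candidates
    }


# Toy inputs: two models, the first judged three times as plausible.
averaged = average_posteriors(
    [3, 1],
    [{100: 0.5, 110: 0.5}, {110: 1.0}],
)
posterior_mean = sum(n * p for n, p in averaged.items())
print(averaged, posterior_mean)
```

The averaged distribution is wider than either model's alone whenever the models disagree, which is how uncertainty about list dependence ends up in the final estimate.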

The dga package includes some additional functions to process, visualize, and run diagnostics on data. It also includes functions to visualize the results. (The package does not perform record linkage.) Currently, dga can produce estimates for three-, four-, or five-list systems. In the future, we plan to expand this to up to seven lists. A limitation of the method is that it becomes computationally intractable to enumerate all of the possible list dependence structures for more than about seven lists. Fortunately, large numbers of lists are rarely encountered in human rights applications. In later versions of the package, we hope to incorporate a stochastic search feature to allow scaling to higher numbers of lists.
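To give a rough sense of how quickly the number of dependence structures grows, here is a minimal brute-force sketch in Python. It is not part of dga and does not reflect how the package works internally. It assumes that each candidate structure corresponds to a chordal undirected graph on the lists, which is the usual graph-theoretic meaning of "decomposable"; the function names are hypothetical.

```python
from itertools import combinations


def is_decomposable(n, edges):
    """Return True if the undirected graph on nodes 0..n-1 is chordal.

    Repeatedly removes a simplicial node (one whose remaining neighbors
    are all mutually adjacent). A graph is chordal exactly when this
    process can remove every node.
    """
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    remaining = set(range(n))
    while remaining:
        for v in remaining:
            nbrs = adj[v] & remaining
            if all(b in adj[a] for a, b in combinations(nbrs, 2)):
                break  # v is simplicial
        else:
            return False  # no simplicial node left: not chordal
        remaining.remove(v)
    return True


def count_decomposable(n):
    """Count chordal graphs on n labeled nodes by checking every graph."""
    pairs = list(combinations(range(n), 2))
    total = 0
    for mask in range(1 << len(pairs)):
        edges = [p for i, p in enumerate(pairs) if (mask >> i) & 1]
        if is_decomposable(n, edges):
            total += 1
    return total


for lists in (3, 4, 5):
    print(lists, "lists:", count_decomposable(lists), "structures")
```

For three, four, and five lists this counts 8, 61, and 822 structures. The brute-force loop itself visits 2^(k(k-1)/2) graphs for k lists, which is already more than two million at seven lists, so exhaustive enumeration stops being practical soon after.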

Let us know what you think about our package!

How many police homicides in the US? A reconsideration

(This post is co-authored by Patrick Ball and Kristian Lum.)

Figure 1. Estimated pairwise list dependence

In early March, the Bureau of Justice Statistics published a report estimating that in the period 2003-2009 and 2011, there were approximately 7,427 homicides committed by police in the US. We responded that the method the analysts used, capture-recapture with two databases, is vulnerable to underestimation if the databases exhibit positive dependence. We conducted a thorough sensitivity analysis of the original independence model as applied to the police homicides databases. We used information from several other countries where our partners created multiple databases of homicides, and from those data we estimated a range of plausible list dependence values. If the dependence between the two databases used to estimate the number of police homicides is similar to that seen in several other, contextually similar examples, we estimate that the total number of victims of homicide by the police during 2003-2009 and 2011 is probably about 10,000. We have written a very detailed analysis of why we think this is true, available here. Here’s a shorter version: (more…)
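The underestimation argument can be sketched with the classic two-list (Lincoln-Petersen) estimator, which assumes the lists are independent. The Python sketch below uses an entirely hypothetical population of 1,000 and made-up capture probabilities; none of the numbers come from the BJS report or from our analysis, and the helper names are invented for illustration.

```python
def lincoln_petersen(n1, n2, m):
    """Two-list capture-recapture estimate of the total population.

    n1, n2: records on list 1 and list 2; m: records on both lists.
    """
    if m <= 0:
        raise ValueError("the lists must share at least one record")
    return n1 * n2 / m


def expected_counts(total, p_list1, p_list2_if_on_1, p_list2_if_not_on_1):
    """Expected list sizes and overlap for a hypothetical population.

    Positive dependence means a case already on list 1 is more likely
    to appear on list 2 than a case that list 1 missed.
    """
    n1 = total * p_list1
    m = n1 * p_list2_if_on_1
    n2 = m + (total - n1) * p_list2_if_not_on_1
    return n1, n2, m


# Hypothetical true population of 1,000 in both scenarios.
independent = lincoln_petersen(*expected_counts(1000, 0.5, 0.45, 0.45))
dependent = lincoln_petersen(*expected_counts(1000, 0.5, 0.60, 0.30))
print(independent, dependent)
```

Both scenarios produce lists of the same size (500 and 450 records), but positive dependence inflates the overlap from 225 to 300, so the estimate falls from 1,000 to 750 even though the true total is unchanged.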

HRDAG Retreat 2015

On retreat, March 2015 / HRDAG


I look at the beach and then at the table surrounded by nerds, deep in thought and conversation about Dirichlet priors, matching algorithms, and armed conflicts. This peculiar (in the best way) environment catalyzes a moment of reflection: how did I get here?

Four years ago, as a second-year statistics PhD student, I watched “Guatemala: The Secret Files” on PBS Frontline World. I listened to stories of family members (more…)

BJS Report on Arrest-Related Deaths: True Number Likely Much Greater

(This post is co-authored by Patrick Ball and Kristian Lum.)

Today the Bureau of Justice Statistics (BJS) released a report on their effort to document “all deaths that occur during the process of arrest in the United States.” The analysis estimates that the Arrest-Related Deaths (ARD) program covers only 34-49% of these deaths. A parallel program by the FBI (the Supplementary Homicide Reports, SHR) is estimated to cover approximately the same proportion of deaths. Even taking into consideration both programs, 28% of all police homicides remain unreported.

In order to estimate the total number of homicides that appear in neither the ARD nor the SHR database, the report relies upon a statistical technique that we at HRDAG regularly employ to estimate the number of undocumented conflict-related deaths: multiple systems estimation. The BJS analysts had only two databases available to them (the ARD and the SHR); HRDAG generally uses three or more databases. (more…)

The Human Rights Data Analysis Group is a non-profit, non-partisan organization that applies rigorous science to the analysis of human rights violations around the world.
