How we measure when we do not know.

The 2019 year-end review from Human Rights Data Analysis Group

Demonstrators mourn the missing at a 2016 protest in the Philippines

Photo: ©Raffy Larma / modified by David Peters

How we can help

A NOTE FROM
MEGAN PRICE,
EXECUTIVE DIRECTOR

At HRDAG, we worry. In fact, worrying is a common theme of our work. We worry about what we don’t know, and about how what we don’t know might adversely affect the lives of real people. We worry about how to count those who haven’t been counted, and how to give voice to people who have been silenced.

The stakes are high, because so many policies and programs now claim to be “data-driven.” As scientists, we support data collection and analysis, but we’re also realistic about the limitations of observed data. In our work researching violence, we know there are many casualties we cannot count: violence that hasn’t been documented or observed. Unless we use our analytical skills to estimate what we don’t know, we can’t really understand what has happened.

What HRDAG’s evolving methods can uncover.

A timeline

Over the past 20 years, working on projects in nine countries, HRDAG has refined and advanced one of the key methods we use to estimate what we do not know. A common theme in all our work is estimating what’s missing, what’s unrecorded. By estimating what is unobserved, we tell the true story, using statistics to include those who have been unable to speak. The findings in each of the following examples would not have been possible without statistical analysis.

1999

GUATEMALA | Taken away, never to be seen again.

photo: Elena Hermosa/Trócaire

Guatemala

The Guatemalan Truth Commission’s conclusion that the Army committed acts of genocide against the Mayan population was based, in part, on our first application of Multiple Systems Estimation (MSE). The testimonies given to the UN’s Commission for Historical Clarification and other sources did not document all the deaths, nor were they collected at random. The findings were therefore neither complete nor representative. The Commission’s conclusion that violence was targeted at the Mayan population could have been undermined by pointing out gaps in the observed data. By using MSE to adjust for incomplete data, we created a more robust result that could withstand scrutiny and criticism.

Initial assumptions: This first analysis relied on a version of MSE that requires strong assumptions: that every individual victim has the same probability of being recorded on each list, that the lists are independent, and that a single correct model can be estimated. Specifically, we relied on the method outlined in Marks, Seltzer, and Krótki’s 1974 text. These assumptions were necessary at the time, but newer methods don’t require us to assume as much.
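
To make this concrete, here is a minimal sketch of the simplest form of capture-recapture, a two-list (Petersen) estimate, under exactly these strong assumptions. The counts are hypothetical, not figures from the Guatemala analysis.

    # Two-list capture-recapture under the strong assumptions above:
    # every victim equally likely to be recorded, and independent lists.
    # All counts are hypothetical.
    n1 = 1200    # victims documented on list 1
    n2 = 900     # victims documented on list 2
    m = 300      # victims matched on both lists

    # Under independence, m / n2 estimates list 1's capture rate,
    # so scaling n1 by its inverse estimates the total.
    n_hat = n1 * n2 / m
    observed = n1 + n2 - m
    print(f"observed: {observed}, estimated total: {n_hat:.0f}, "
          f"estimated never recorded: {n_hat - observed:.0f}")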

2000s

KOSOVO | Searching for the missing

photo: rnw.org / CC BY-NR 2.0

Kosovo, Peru, and Colombia

In the decade following our work in Guatemala, we contributed statistical estimates to the International Criminal Tribunal for the Former Yugoslavia, the Peruvian Truth Commission, and a study of violence targeted at union members in Colombia. In Peru, our analysis concluded that a higher proportion of the violence was attributable to the Shining Path than had been observed in the human rights documentation prior to the Truth Commission.

What changed: These analyses improved upon our earlier method through an approach called loglinear modeling, as defined by Bishop, Fienberg, and Holland in their 1975 text, which adjusts for possible relationships between lists and for different probabilities of inclusion on each list. This method still assumes that a single correct model can be estimated, even though many models may plausibly describe the data.
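
As an illustration only, the sketch below fits a Poisson loglinear model to the seven observable capture histories of three hypothetical lists, with one interaction term allowing two lists to be dependent, and projects the count in the unobservable cell. The counts and model choice are assumptions for the example, not data or code from these projects.

    # Three-list loglinear MSE sketch (hypothetical counts).
    import numpy as np
    import statsmodels.api as sm

    # The seven observable capture histories over lists A, B, C
    # (1 = recorded on that list); history (0, 0, 0) is unobservable.
    histories = np.array([
        [1, 0, 0], [0, 1, 0], [0, 0, 1],
        [1, 1, 0], [1, 0, 1], [0, 1, 1],
        [1, 1, 1],
    ])
    counts = np.array([60, 45, 55, 12, 9, 14, 5])

    # Design matrix: intercept, a main effect per list, and an A:B
    # interaction that lets those two lists be correlated -- the key
    # improvement over the earlier method.
    A, B, C = histories.T
    X = np.column_stack([np.ones(len(counts)), A, B, C, A * B])
    fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()

    # In the unobserved cell every predictor except the intercept is 0,
    # so its expected count is exp(intercept).
    missing = np.exp(fit.params[0])
    print(f"estimated total: {counts.sum() + missing:.0f}")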

2010s

COLOMBIA | Remembering the dead

photo: courtesy of the Asociación Caminos de Esperanza

Colombia, Syria, and El Salvador

Working with human rights and academic partners, we estimated the total number of individuals killed or missing in Casanare, Colombia, victims killed in state custody in Syria, and deaths and disappearances during the civil war in El Salvador. The Salvadoran estimates are particularly salient as El Salvador’s amnesty law was declared unconstitutional in 2016, creating an opportunity to hold perpetrators accountable.

What changed: Our analyses in Colombia, Syria, and El Salvador marked a transition in our methodological approach to fully Bayesian methods. The new Bayesian techniques build on the benefits of the loglinear approach, and they enable us to consider a range of plausible models. Technically, we relied on decomposable graphical models, as outlined by Madigan and York in their 1997 paper, to calculate a posterior probability distribution for the estimated total number of victims.
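
Schematically, rather than committing to a single model m, the Bayesian approach averages over a family M of candidate decomposable models, weighting each by how well it explains the observed capture histories (our notation, not Madigan and York’s):

    p(N \mid x_{\text{obs}}) \;=\; \sum_{m \in \mathcal{M}}
        p(N \mid x_{\text{obs}}, m) \, p(m \mid x_{\text{obs}})

The resulting posterior distribution for the total N reflects uncertainty both within each model and about which model is correct.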

2018-2019

PHILIPPINES | The site of a murder by unidentified gunmen

photo: Kimberly dela Cruz

Colombia, the Philippines, Sri Lanka, and Korea

Over the past year, we have estimated the number of social movement leaders killed in Colombia, drug-related killings in the Philippines, people disappeared at the end of the Sri Lankan civil war, and Korean women held in sexual slavery during WWII in Palembang, Indonesia. Our work in the Philippines, in collaboration with the Stabile Center for Investigative Journalism at Columbia University, found that more than three times as many people have been killed as police have reported.

What changed: This new method, developed by our colleague Daniel Manrique-Vallier, is called LCMCR, and it advances the previous techniques by enabling us to consider how victims might have different probabilities of being included on each list. Technically, this approach uses non-parametric latent class models to calculate a posterior probability distribution, as described in Manrique-Vallier’s 2016 paper.
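
In outline, the latent class model treats each victim as belonging to an unobserved class k with its own capture probabilities, so the probability of a capture history x = (x_1, ..., x_J) across J lists is a mixture (a schematic in our notation, not the paper’s):

    P(x) \;=\; \sum_{k} \pi_k \prod_{j=1}^{J}
        \lambda_{jk}^{\,x_j} \, (1 - \lambda_{jk})^{\,1 - x_j}

Here \pi_k is the share of victims in class k and \lambda_{jk} is the probability that a class-k victim appears on list j; a non-parametric prior lets the data determine how many classes are needed.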

Evolution of estimation

As scientists, we are constantly striving to improve our methods so that they more accurately reflect reality and require fewer assumptions about the underlying data. Every analytical method requires assumptions. We are transparent about ours.

Three key assumptions we manage as our methodology evolves:

1

Every victim has the same likelihood of being recorded

This is generally not true in our work, and we modify this assumption in later applications. Another way to think about this assumption is that some stories are amplified, while others are muffled. For example, the stories of a well-known politician or victims of a bombing at a crowded marketplace in a major city will be amplified, but a single person killed on a quiet street at night, or violence that occurs in rural villages, will likely be muffled. By adjusting this assumption we are accounting for these different voices in our analyses.

2

The likelihood that a victim will be recorded on one list is not related to the likelihood that they are also included on another list

This is generally not true in our work, and we modify this assumption in later applications. Positively correlated lists are another way that stories are amplified or muffled — similar high-profile stories are likely to be told by multiple news outlets and social media sources, whereas the same sources may all miss instances of rural violence. Lists may also be negatively correlated — a victim who is likely to be reported to a local NGO documenting violence may be very unlikely to be recorded by the state’s documentation efforts. A short simulation after this list shows how violating these first two assumptions biases a naive estimate.

3

Single vs. multiple plausible models

Early applications assume that it is possible to identify the single correct model for estimating the total number of victims. In later applications we expand our ability to take into account multiple different models that could describe the observed data.
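
The simulation promised above is a minimal sketch in which some victims (“amplified” stories) are far more likely to be recorded than others (“muffled” ones); sharing those probabilities across lists also makes the lists positively correlated. A naive two-list estimate then falls well short of the truth. All numbers are hypothetical.

    # Simulating violations of assumptions 1 and 2 (hypothetical numbers).
    import numpy as np

    rng = np.random.default_rng(2019)
    N = 10_000                       # true number of victims

    # Heterogeneity: 30% of victims are "amplified" (recording
    # probability 0.6); the rest are "muffled" (probability 0.1).
    p = np.where(rng.random(N) < 0.3, 0.6, 0.1)
    on1 = rng.random(N) < p          # recorded on list 1?
    on2 = rng.random(N) < p          # recorded on list 2?

    # Shared probabilities induce positive correlation between lists,
    # so the naive Petersen estimate is biased well below the true N.
    n1, n2, m = on1.sum(), on2.sum(), (on1 & on2).sum()
    print(f"true N: {N}, naive estimate: {n1 * n2 / m:.0f}")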

What’s next

A few examples of methodological challenges that we’re working on with our academic colleagues:

> MSE requires identifying and linking multiple records that refer to the same victim. In reality, this process includes some uncertainty. How can we incorporate this uncertainty into our final estimate of the total number of victims?

> MSE estimates are often calculated for specific subsets — victims killed in Damascus in February 2012 — called strata. If strata are too small, model calculations may be unreliable. How should we identify optimal strata?

> Examining estimates across many strata raises the statistical problem of multiple comparisons. Can we choose strata prior to calculating estimates? Is there a rigorous way to decide to look at additional strata after calculating estimates?

> MSE implicitly assumes that all victims have some chance of being observed and recorded, but some victims are very unlikely to ever be observed. Can we define the minimum probability that a victim is recorded on any list at which our models begin to fail?

With each advance, we narrow the gap between what is observed and what we want to know.
Support our work with donations to hrdag.org/donate

Artificial intelligence and criminal justice

Every day, judges have to make decisions about the fate of arrestees awaiting trial. Should they be released under supervision…or not? Pre-trial risk assessment tools are supposed to aid judges in those decisions by making recommendations based on calculations from mathematical models. Such tools are in active use across the United States. This year, our team evaluated two of them: one designed by the Criminal Justice Agency (CJA) in New York City, and the Arnold Public Safety Assessment (PSA), as implemented in San Francisco.

Arrests

The tool designed by CJA attempts to predict the likelihood that an arrested person will be re-arrested for a felony while awaiting trial; it was used to screen defendants for acceptance into a pre-trial supervised release program. In their analysis, Lead Statistician Kristian Lum and Data Scientist Tarak Shah concluded that the current tool does not meet many standards of fairness. They also discovered that the scores used in the tool do not come directly from the model-fitting process, but rather were chosen in an ad hoc way.

The Arnold Public Safety Assessment is used in San Francisco, including among clients represented by the city’s Public Defender’s Office. Kristian and Executive Director Megan Price investigated how the tool’s recommendations are affected by booking charges. Specifically, we considered a charge to be “unsubstantiated” if the individual was ultimately not found guilty of it. We found that unsubstantiated booking charges increase the recommended level of pre-trial supervision for 20–30% of the people evaluated by the tool.
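
In outline, the comparison can be sketched as follows: for each person, compare the supervision level the tool recommends using all booking charges to the level it recommends once unsubstantiated charges are dropped, and count how often the first is higher. The column names and data below are hypothetical, not the PSA’s actual schema or our production code.

    # Schematic comparison of recommendations with and without
    # unsubstantiated charges (hypothetical column names and data).
    import pandas as pd

    df = pd.DataFrame({
        # level recommended using all booking charges
        "level_all_charges":   [3, 2, 1, 3, 2],
        # level recommended using only substantiated charges
        "level_substantiated": [2, 2, 1, 1, 2],
    })
    share = (df["level_all_charges"] > df["level_substantiated"]).mean()
    print(f"{share:.0%} got a higher level from unsubstantiated charges")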

photo: Shane McCoy / U.S. Marshals / CC BY 2.0

Remembering Herb Spirer

We said goodbye to our friend and mentor Herb Spirer, who passed away last fall at the age of 93. Herb had supported HRDAG since before it formally existed. With his wife, Louise, Herb mentored almost everyone who works in human rights data analysis today — or he mentored their mentors. Herb led a generation of statisticians to work in human rights, and he taught a generation of human rights activists about scientific rigor.

Herb’s life exemplified the best of what a scientist can offer: precision, honesty, modesty and a fascination for technical concerns bound to a deep commitment to making the world a better place.

“I know I speak for dozens of others when I say it is among my life’s greatest points of pride to number myself among his students.”

— Patrick Ball, HRDAG Director of Research

photo: Barak Yedidia, used with permission

The people behind HRDAG

HRDAG’s team includes Executive Director Megan Price, Director of Research Patrick Ball, Lead Statistician Kristian Lum, Data Scientist Tarak Shah, and Operations Coordinator Suzanne Nathans, as well as consultants, interns, and fellows. Our team is based in San Francisco, and our partners are located in countries around the world. We are supported by an advisory board composed of Julie Broome, Margot Gerritsen, Michael Kleinman, Dinah PoKempner, and Frank Schulenburg.

A year by the numbers

STATISTICS FOR GOOD

8

countries with active HRDAG projects this year

4

reports assessing situations of mass violence

45

requests for technical assistance

NEW KNOWLEDGE

3

technical papers accepted in peer-reviewed publications; each paper improves the use of statistical modeling in human rights cases

OUTREACH AND EDUCATION

15

requests for media interviews

70

invitations to collaborate or participate in meetings, panels, convenings, or conferences

6

universities participating in active methodological collaborations

WHAT WE NOW KNOW

About 1–2% of El Salvador’s population was killed or disappeared during the civil war.

Approximately 500 people disappeared during the last three days of the Sri Lankan civil war, after surrendering to the army.

FINANCIALS

HRDAG’s fiscal year is July 1–June 30. All figures are in US dollars.

                                      2018–19      2017–18
Beginning cash balance                213,428      102,145

Income
  Foundation grants                 1,386,971      981,186
  Revenue from contracts               22,474       90,812
  Direct public support                89,555       35,043
  Total income                      1,499,000    1,107,041

Expenses
  Salaries and consultants            946,281      787,618
  Travel and conferences               34,350       44,358
  Rent, utilities, and technology      47,745       42,438
  Supplies and other direct costs      19,892       23,083
  Administration                      136,359       98,260
  Total expenses                    1,184,627      995,757

Ending cash balance                   527,801      213,428

Financial Footnote

HRDAG ended this fiscal year (FY 2018–19) with significant net assets: $205,359 without donor restrictions and $322,442 obligated for use in a future period. The latter balance reflects the receipt of three significant grants in the final month of the fiscal year (June 2019). HRDAG’s grant agreements indicate that these funds are to be spent over the next fiscal year (FY 2019–20), or over a longer period, depending on the grant provisions.

HRDAG operates as a nonprofit project of Community Partners (communitypartners.org), a nonprofit organization that helps community leaders build and sustain effective programs that benefit the public good.

As our fiscal sponsor, Community Partners offers back-office services and the legal framework that allows nonprofit ventures to focus on their missions.

Your support makes our work possible

Human rights research — if done well — requires the dedication of a full-time team working with committed consultants. It requires investment in staff, in travel, in equipment and technology.

Primary funding comes from private, international human rights donors. The majority of our funding is not tied to specific projects, but rather supports our ongoing scientific work in human rights data analysis. A small but growing proportion of our funding specifically supports our work examining the use of predictive algorithms in the US criminal justice system.

We are extremely grateful to the following major donors for encouraging data science for good:

Anne R. Dow Family Foundation

Cooper Schneier Fund of The Minneapolis Foundation

Keller Family Foundation

Morgan Agnew

Patrick Ball

David Banks (in memory of Herb Spirer)

Sudhir Bhaskar

Carl Bialik

August Brocchini

Julie Broome

Cindy Cohn

Katherine Crecelius

Mark Girouard

Oli Hall

Rafe Kaplan

Ami Laws

Chris Palmer

Dinah PoKempner

Geetha Reddy

Frank Schulenburg

Rick Storrs

Lance Waller

Catherine Zennström

Anonymous

We are also very grateful to our individual donor base and to the following private companies for including HRDAG in their Matching Gift programs and matching their employees’ generous donations:

GitHub

Google

LinkedIn

Oracle

Yelp

We provide analysis so that our partners — human rights advocates — can build scientifically defensible, evidence-based arguments. Our work has been used by truth commissions, international criminal tribunals, and non-governmental human rights organizations on five continents.
