At HRDAG, we worry. In fact, at HRDAG worrying is a common theme.
We worry about what we don’t know, and we worry about how what we don’t
know might adversely affect the lives of real people. We worry about how to
count those who haven’t been counted, and how to give
voices to people who have been silenced.
The stakes are high, because so many policies and programs
now claim to be “data-driven.” As scientists, we support
data collection and analysis, but we’re also realistic about
the limitations of observed data. In our work researching
violence, we know that there are many casualties that we
cannot count: violence that hasn’t been documented or observed. Unless we use
our analytical skills to estimate what we don’t know, we can’t really understand
what’s happened.
For more than 20 years, we’ve challenged
the science and the methods so that we
can get better and better at estimating the
true tolls of unobserved violence. Our work
is made possible with a method known as
multiple systems estimation, or MSE, as we
affectionately refer to it. MSE is constantly
evolving, which means that we are, too, always
refining our ability to estimate unobserved
violence.
We’re constantly advancing the statistical
science of knowing. In the following pages,
you’ll find examples of how our scientific
advances have helped to uncover deeper
truths — such as the fact that during
Guatemala’s civil war, violence in rural
areas was much greater than had ever been
documented, and the fact that in Peru, a higher
proportion of the violence was attributable to
the Shining Path than had been observed in
the human rights documentation.
In the end, HRDAG analysis helps communities
to answer a question that can be incredibly
hard to answer: “What really happened here?”
We hope that by providing the answer, we
contribute to it never happening again.
Thank you for joining us in this critical effort
and for sharing our belief that rigorous data
science can advance justice and accountability.
— Megan Price, Executive Director
Over the past 20 years working on projects in nine countries, HRDAG has refined and advanced one of the key methods we use to estimate what we do not know. A common theme in all our work is estimating what’s missing, what’s unrecorded. By estimating what is unobserved, we tell the true story, using statistics to include those who have been unable to speak. The findings in each of the following examples would not have been possible without statistical analysis.
1999
GUATEMALA | Taken away, never to be seen again.
photo: Elena Hermosa/Trócaire
The Guatemalan Truth Commission’s conclusion that the Army
committed acts of genocide against the Mayan population was
based, in part, on our first application of Multiple Systems
Estimation (MSE). The testimonies given to the UN’s Commission
for Historical Clarification and other sources did not document all
the deaths, nor were they collected at random. The findings were
therefore neither complete nor representative. The Commission’s
conclusion that violence was targeted at the Mayan population
could have been undermined by pointing out gaps in the observed
data. By using MSE to adjust for incomplete data, we created a
more robust result that could withstand scrutiny and criticism.
Initial assumptions: This first analysis relied on a version of MSE
that requires strong assumptions, including the condition that
every individual victim has the same probability of being recorded
on each list; the lists are independent; and a single correct model
can be estimated. Specifically, we relied on the method outlined
in Marks, Seltzer, and Krótki’s 1974 text. These were necessary
assumptions at the time, but newer methods don’t require us to
assume as much.
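To make these assumptions concrete, here is a minimal sketch of the classic two-list (capture-recapture) estimator, which is valid only when the lists are independent and every victim is equally likely to appear on each list. It is an illustration, not the code used in the Guatemala analysis; the function name and all counts are hypothetical.

```python
def two_list_estimate(n_list_a: int, n_list_b: int, n_both: int) -> float:
    """Estimate the total population from two lists.

    Assumes the lists are independent and that every victim has the same
    probability of being recorded on each list.
    """
    if n_both <= 0:
        raise ValueError("the lists must share at least one matched record")
    return n_list_a * n_list_b / n_both


# Hypothetical counts: 300 victims on list A, 200 on list B, 60 on both.
documented = 300 + 200 - 60                      # unique documented victims
estimated_total = two_list_estimate(300, 200, 60)
estimated_undocumented = estimated_total - documented

print(documented, estimated_total, estimated_undocumented)
```

With these made-up counts, 440 victims are documented, but the estimate of the total is 1,000, which implies 560 victims who appear on neither list. If either assumption fails, this simple estimate can be badly off.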
2000s
KOSOVO | Searching for the missing
photo: rnw.org / CC BY-NR 2.0
In the decade following our work in Guatemala, we contributed
statistical estimates to the International Criminal Tribunal for the
Former Yugoslavia, the Peruvian Truth Commission, and a study
of violence targeted at union members in Colombia. In Peru, our
analysis concluded that a higher proportion of the violence was
attributable to the Shining Path than had been observed in the
human rights documentation prior to the Truth Commission.
What changed: These analyses improved upon our earlier method
by adjusting for possible relationships between lists and different
probabilities of being included on each list through an approach
called loglinear modeling, as defined by Bishop, Fienberg, and
Holland in their 1975 text. This method assumes that a single
correct model can be estimated, but there may be many possible
models.
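A small sketch can show why the choice among possible models matters. With three lists, some loglinear models have closed-form estimates of the number of victims recorded on no list. The two functions below are illustrative assumptions, not our production code: one assumes no three-way interaction among the lists, and the other assumes that list C is independent of lists A and B (which may depend on each other). The counts are hypothetical; keys such as "110" mean "on lists A and B, not on C".

```python
def missing_no_three_way(n: dict) -> float:
    """Unrecorded victims under the model with all pairwise list
    dependencies but no three-way interaction."""
    return (n["111"] * n["100"] * n["010"] * n["001"]) / (
        n["110"] * n["101"] * n["011"]
    )


def missing_c_independent(n: dict) -> float:
    """Unrecorded victims if list C is independent of lists A and B."""
    on_a_or_b_and_c = n["111"] + n["101"] + n["011"]
    on_a_or_b_not_c = n["110"] + n["100"] + n["010"]
    return n["001"] * on_a_or_b_not_c / on_a_or_b_and_c


# Hypothetical overlap counts for three lists.
counts = {"111": 10, "110": 20, "101": 15, "011": 25,
          "100": 60, "010": 50, "001": 40}
observed = sum(counts.values())

for name, missing in [("no three-way interaction", missing_no_three_way(counts)),
                      ("list C independent", missing_c_independent(counts))]:
    print(f"{name}: estimated total = {observed + missing:.0f}")
```

On the same 220 documented victims, the first model implies a total of 380 and the second a total of 324. Choosing a single model therefore drives the answer.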
2010s
COLOMBIA | Remembering the dead
photo: cortesía de la Asociación Caminos de esperanza
Working with human rights and academic partners, we estimated
the total number of individuals killed or missing in Casanare,
Colombia, victims killed in state custody in Syria, and deaths and
disappearances during the civil war in El Salvador. The Salvadoran
estimates are particularly salient as El Salvador’s amnesty law was
declared unconstitutional in 2016, creating an opportunity to hold
perpetrators accountable.
What changed: Our analyses in Colombia, Syria, and El Salvador
marked a transition in our methodological approach to fully
Bayesian methods. The new Bayesian techniques build on
the benefits of the loglinear approach, and they enable us to
consider a range of plausible models. Technically, we relied on
decomposable graphical models, as outlined by Madigan and York
in their 1997 paper, to calculate a posterior probability distribution
for the estimated total number of victims.
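The idea of a posterior distribution for the total can be sketched in a far simpler setting than the decomposable graphical models named above. The example below assumes only two independent lists, uniform priors on each list's recording probability, and a prior on the total N proportional to 1/N; all of these choices, the function names, and the counts are assumptions made for illustration.

```python
from math import exp, lgamma, log


def posterior_total(n_a: int, n_b: int, n_both: int, max_total: int) -> dict:
    """Posterior over the total N for two independent lists, with the
    recording probabilities integrated out analytically."""
    n_obs = n_a + n_b - n_both
    totals = range(n_obs, max_total + 1)
    log_weights = []
    for total in totals:
        log_weights.append(
            -log(total)                                      # prior on N
            + lgamma(total + 1) - lgamma(total - n_obs + 1)  # N! / (N - n)!
            + lgamma(total - n_a + 1) - lgamma(total + 2)    # list A integral
            + lgamma(total - n_b + 1) - lgamma(total + 2)    # list B integral
        )
    peak = max(log_weights)
    weights = [exp(w - peak) for w in log_weights]
    norm = sum(weights)
    return {t: w / norm for t, w in zip(totals, weights)}


def credible_interval(posterior: dict, mass: float = 0.95) -> tuple:
    """Equal-tailed credible interval from a discrete posterior."""
    tail = (1 - mass) / 2
    cumulative, lower, upper = 0.0, None, None
    for total in sorted(posterior):
        cumulative += posterior[total]
        if lower is None and cumulative >= tail:
            lower = total
        if cumulative >= 1 - tail:
            upper = total
            break
    return lower, upper


# Hypothetical counts: 300 on list A, 200 on list B, 60 on both.
posterior = posterior_total(300, 200, 60, max_total=5000)
mode = max(posterior, key=posterior.get)
low, high = credible_interval(posterior)
print(mode, low, high)
```

Instead of a single number, the result is a range of plausible totals with a probability attached to each. The methods described above go further by also averaging over many possible models of how the lists relate.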
2018-2019
PHILIPPINES | The site of a murder by unidentified gunmen
photo: Kimberly dela Cruz
Over the past year, we have estimated the number of social
movement leaders killed in Colombia, drug-related killings in
the Philippines, people disappeared at the end of the Sri Lankan
civil war, and Korean women held in sexual slavery during
WWII in Palembang, Indonesia. Our work in the Philippines, in
collaboration with the Stabile Center for Investigative Journalism
at Columbia University, found that more than three times as many
people have been killed as the number reported by police.
What changed: This new method, developed by our colleague
Daniel Manrique-Vallier, is called LCMCR, and it advances the
previous techniques by enabling us to consider how victims
might have different probabilities of being included on each
list. Technically, this approach uses non-parametric latent class
models to calculate a posterior probability distribution, as
described in Manrique-Vallier’s 2016 paper.
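The sketch below illustrates the problem that latent class models address. It is not LCMCR itself. It assumes a hypothetical population of 1,000 victims in two groups with very different recording probabilities, computes the expected list counts, and compares a pooled two-list estimate with estimates made separately within each group. The group names, sizes, and probabilities are invented for illustration.

```python
def two_list_estimate(n_a: float, n_b: float, n_both: float) -> float:
    """Two-list estimate assuming independence and equal probabilities."""
    return n_a * n_b / n_both


def expected_counts(size: int, p_a: float, p_b: float) -> tuple:
    """Expected counts on list A, on list B, and on both lists."""
    return size * p_a, size * p_b, size * p_a * p_b


# Hypothetical groups: (size, probability on list A, probability on list B).
groups = {"amplified": (400, 0.8, 0.8), "muffled": (600, 0.1, 0.1)}

per_group = {name: expected_counts(*spec) for name, spec in groups.items()}

# Pooled: ignore the groups and add up the list counts.
pooled = two_list_estimate(*(sum(c[i] for c in per_group.values())
                             for i in range(3)))

# Stratified: estimate within each group, then add the estimates.
stratified = sum(two_list_estimate(*c) for c in per_group.values())

print(f"pooled: {pooled:.0f}, stratified: {stratified:.0f}")
```

The pooled estimate is about 551, far below the true 1,000, while the group-by-group estimates add up to 1,000. In real data the groups are not observed, which is why a latent class approach infers them from the pattern of overlaps among the lists.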
As scientists, we are constantly striving to improve our methods so that they more accurately reflect reality and require fewer assumptions about the underlying data. Every analytical method requires assumptions. We are transparent about ours.
Assumes that every victim has the same probability of being recorded on each list. This is generally not true in our work, and we modify this assumption in later applications. Another way to think about this assumption is that some stories are amplified, while others are muffled. For example, the stories of a well-known politician or victims of a bombing at a crowded marketplace in a major city will be amplified, but a single person killed on a quiet street at night, or violence that occurs in rural villages, will likely be muffled. By adjusting this assumption, we account for these different voices in our analyses.
Assumes that the lists are independent of one another. This is generally not true in our work, and we modify this assumption in later applications. Positively correlated lists are another way that stories are amplified or muffled: similar high-profile stories are likely to be told by multiple news outlets and social media sources, whereas the same sources may all miss instances of rural violence. Lists may also be negatively correlated: a victim who is likely to be reported to a local NGO documenting violence may be very unlikely to be recorded by the state’s documentation efforts.
Assumes that it is possible to identify the single correct model to estimate the total number of victims. In later applications we expand our ability to take into account the possibility of multiple different models to describe the observed data.
A few examples of methodological challenges that we’re working on with our academic colleagues:
> MSE requires identifying and linking multiple records that refer to the same victim. In reality, this process includes some uncertainty. How can we incorporate this uncertainty into our final estimate of the total number of victims?
> MSE estimates are often calculated for specific subsets called strata (for example, victims killed in Damascus in February 2012). If strata are too small, model calculations may be unreliable. How should we identify optimal strata?
> Examining estimates in multiple strata can potentially generate multiple comparisons. Can we choose strata prior to calculating estimates? Is there a rigorous way to decide to look at additional strata after calculating estimates?
> MSE implicitly assumes that all victims have some chance of being observed and recorded, but some victims are very unlikely to ever be observed. Can we define the minimum probability that a victim is recorded on any list at which our models begin to fail?
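The first challenge above, uncertainty in record linkage, can be illustrated with a crude sensitivity check. This is only a sketch under simplifying assumptions (two independent lists and a hypothetical range of plausible match counts), not a method for propagating linkage uncertainty.

```python
def estimates_by_match_count(n_a: int, n_b: int, plausible_matches) -> dict:
    """Two-list estimate of the total for each plausible number of
    records judged to refer to the same victim on both lists."""
    return {m: n_a * n_b / m for m in plausible_matches}


# Hypothetical: reviewers agree on 50 matches and are unsure about up to 20 more.
estimates = estimates_by_match_count(300, 200, [50, 60, 70])
for matches, total in estimates.items():
    print(f"{matches} matches -> estimated total {total:.0f}")
```

With these made-up counts, moving from 50 to 70 matches shifts the estimated total from 1,200 to about 857, so a modest amount of linkage uncertainty can move the estimate substantially.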
Every day, judges have to make decisions about the fate of arrestees awaiting trial. Should they be released under supervision…or not? Pre-trial risk assessment tools are supposed to aid judges in those decisions by making recommendations based on calculations from mathematical models. These tools, or mathematical models, are in active use across the United States. This year, our team evaluated two such tools: one designed by the Criminal Justice Agency (CJA) in New York City, and the Arnold Public Safety Assessment (PSA), as implemented in San Francisco.
The tool designed by CJA attempts to predict the likelihood that an arrested person will be re-arrested for a felony while awaiting trial, and was used to screen defendants for acceptance into a pre-trial supervised release program. In their analysis, Lead Statistician Kristian Lum and Data Scientist Tarak Shah concluded that the current tool does not meet many standards of fairness. Additionally, they discovered that the scores used in the tool do not come directly from the model-fitting process, but rather were chosen in an ad hoc way.
The Arnold Public Safety Assessment is used in San Francisco, including among clients represented by the city’s Public Defender’s Office, and Kristian and Executive Director Megan Price investigated how recommendations based on it are affected by booking charges. Specifically, we considered a charge to be “unsubstantiated” if an individual was ultimately not found guilty of that charge. We found that unsubstantiated booking charges increase the recommended level of pre-trial supervision for 20–30% of the people evaluated by the tool.
photo: Shane McCoy / U.S. Marshals / CC BY 2.0
We said goodbye to our friend and mentor Herb Spirer, who passed away last fall at the age of 93. Herb had supported HRDAG since before it formally existed. With his wife Louise, Herb mentored almost everyone who works in human rights data analysis today, or he mentored their mentors. Herb led a generation of statisticians to work in human rights, and he taught a generation of human rights activists about scientific rigor.
Herb’s life exemplified the best of what a scientist can offer: precision, honesty, modesty and a fascination for technical concerns bound to a deep commitment to making the world a better place.
“I know I speak for dozens of others when I say it is among my life’s greatest points of pride to number myself among his students.”
— Patrick Ball, HRDAG Director of Research
photo: Barak Yedidia, used with permission
HRDAG’s team includes Executive Director Megan Price, Director of Research Patrick Ball, Lead Statistician Kristian Lum, Data Scientist Tarak Shah, Operations Coordinator Suzanne Nathans, as well as consultants, interns, and fellows. Our team is based in San Francisco, and our partners are located in countries around the world. We are supported by an advisory board composed of Julie Broome, Margot Gerritsen, Michael Kleinman, Dinah PoKempner, and Frank Schulenburg.
STATISTICS FOR GOOD
8 countries with active HRDAG projects this year
4 reports assessing situations of mass violence
45 requests for technical assistance
NEW KNOWLEDGE
3 technical papers accepted in peer-reviewed publications; each paper improves the use of statistical modeling in human rights cases
OUTREACH AND EDUCATION
15 requests for media interviews
70 invitations to collaborate or participate in meetings, panels, convenings, or conferences
6 universities participating in active methodological collaborations
WHAT WE NOW KNOW
About 1–2% of El Salvador’s population was killed or disappeared during the civil war.
Approximately 500 people disappeared during the last three days of the Sri Lankan civil war, after surrendering to the army.
FINANCIALS
HRDAG’s fiscal year is July 1–June 30
| | 2018–19 | 2017–18 |
|---|---|---|
| Beginning cash balance | 213,428 | 102,145 |
| Income | | |
| Foundation grants | 1,386,971 | 981,186 |
| Revenue from contracts | 22,474 | 90,812 |
| Direct public support | 89,555 | 35,043 |
| Total income | 1,499,000 | 1,107,041 |
| Expenses | | |
| Salaries and consultants | 946,281 | 787,618 |
| Travel and conferences | 34,350 | 44,358 |
| Rent, utilities, and technology | 47,745 | 42,438 |
| Supplies and other direct costs | 19,892 | 23,083 |
| Administration | 136,359 | 98,260 |
| Total expenses | 1,184,627 | 995,757 |
| Ending cash balance | 527,801 | 213,428 |
Financial Footnote
HRDAG ended this fiscal year (FY 2018–19) with significant net assets, of which $205,359 are without donor restrictions, while $322,442 are obligated for use in a future period. The latter balance reflects the receipt of three significant grants in the final month of the fiscal year (June 2019). HRDAG’s grant agreements indicate that these funds are meant to be spent over the next fiscal year (FY 2019–20), or over a longer period, depending on the grant provisions.
HRDAG operates as a nonprofit project of Community Partners (communitypartners.org), a nonprofit organization that helps community leaders build and sustain effective programs that benefit the public good.
As our fiscal sponsor, Community Partners offers back-office services and the legal framework that allows nonprofit ventures to focus on their missions.
Human rights research — if done well — requires the dedication of a full-time team working with committed consultants. It requires investment in staff, in travel, in equipment and technology.
Primary funding comes from private, international human rights donors. The majority of our funding is not tied to specific projects, but rather supports our ongoing scientific work in human rights data analysis. A small but growing proportion of our funding specifically supports our work examining the use of predictive algorithms in the US criminal justice system.
Anne R. Dow Family Foundation
Cooper Schneier Fund of The Minneapolis Foundation
Keller Family Foundation
Morgan Agnew
Patrick Ball
David Banks (in memory of Herb Spirer)
Sudhir Bhaskar
Carl Bialik
August Brocchini
Julie Broome
Cindy Cohn
Katherine Crecelius
Mark Girouard
Oli Hall
Rafe Kaplan
Ami Laws
Chris Palmer
Dinah PoKempner
Geetha Reddy
Frank Schulenburg
Rick Storrs
Lance Waller
Catherine Zennström
Anonymous
We are also very grateful to our individual donor base and to the following private companies for including HRDAG in their Matching Gift programs and matching their employees’ generous donations:
GitHub
Oracle
Yelp