At HRDAG, we worry. In fact, at HRDAG worrying is a common theme. We worry about what we don’t know, and we worry about how what we don’t know might adversely affect the lives of real people. We worry about how to count those who haven’t been counted, and how to give voices to people who have been silenced.
The stakes are high, because so many policies and programs now claim to be “data-driven.” As scientists, we support data collection and analysis, but we’re also realistic about the limitations of observed data. In our work researching violence, we know that there are many casualties that we cannot count — violence that hasn’t been documented or observed. Unless we use our analytical skills to estimate what we don’t know, we can’t really understand what’s happened.
For more than 20 years, we’ve challenged the science and the methods so that we can get better and better at estimating the true tolls of unobserved violence. Our work is made possible with a method known as multiple systems estimation, or MSE, as we affectionately refer to it. MSE is constantly evolving, which means that we are, too, always refining our ability to estimate unobserved violence.
We’re constantly advancing the statistical science of knowing. In the following pages, you’ll find examples of how our scientific advances have helped to uncover deeper truths — such as the fact that during Guatemala’s civil war, violence in rural areas was much greater than had ever been documented, and the fact that in Peru, a higher proportion of the violence was attributable to the Shining Path than had been observed in the human rights documentation.
In the end, HRDAG analysis helps communities to answer a question that can be incredibly hard to answer: “What really happened here?” We hope that by providing the answer, we contribute to it never happening again.
Thank you for joining us in this critical effort and for sharing our belief that rigorous data science can advance justice and accountability.
— Megan Price, Executive Director
Over the past 20 years working on projects in nine countries, HRDAG has refined and advanced one of the key methods we use to estimate what we do not know. A common theme in all our work is estimating what’s missing, what’s unrecorded. By estimating what is unobserved, we tell the true story, using statistics to include those who have been unable to speak. The findings in each of the following examples would not have been possible without statistical analysis.
The Guatemalan Truth Commission’s conclusion that the Army committed acts of genocide against the Mayan population was based, in part, on our first application of Multiple Systems Estimation (MSE). The testimonies given to the UN’s Commission for Historical Clarification and other sources did not document all the deaths, nor were they collected at random. The findings were therefore neither complete nor representative. The Commission’s conclusion that violence was targeted at the Mayan population could have been undermined by pointing out gaps in the observed data. By using MSE to adjust for incomplete data, we created a more robust result that could withstand scrutiny and criticism.
Initial assumptions: This first analysis relied on a version of MSE that requires strong assumptions, including the condition that every individual victim has the same probability of being recorded on each list; the lists are independent; and a single correct model can be estimated. Specifically, we relied on the method outlined in Marks, Seltzer, and Krótki’s 1974 text. These were necessary assumptions at the time, but newer methods don’t require us to assume as much.
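The simplest version of this early approach is the classic two-list capture-recapture estimator: under the strong assumptions above (independent lists, equal inclusion probability for every victim), the total population can be estimated from two overlapping lists alone. A minimal sketch, with hypothetical counts for illustration:

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Classic two-list capture-recapture estimate of total population size.

    n1, n2 -- number of victims documented on each list
    m      -- number of victims appearing on both lists
    Assumes the lists are independent and every victim has the same
    probability of being recorded on each list.
    """
    if m == 0:
        raise ValueError("no overlap between lists; estimate is undefined")
    return n1 * n2 / m

# Hypothetical counts: 600 victims on list A, 400 on list B, 150 on both.
estimate = lincoln_petersen(600, 400, 150)  # 1600.0
```

Intuitively, the overlap rate m/n2 estimates the fraction of the whole population captured by list A, so N ≈ n1 / (m/n2). When the assumptions fail — as they generally do in human rights data — this estimate can be badly biased, which is what motivated the later methods described below.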
In the decade following our work in Guatemala, we contributed statistical estimates to the International Criminal Tribunal for the Former Yugoslavia, the Peruvian Truth Commission, and a study of violence targeted at union members in Colombia. In Peru, our analysis concluded that a higher proportion of the violence was attributable to the Shining Path than had been observed in the human rights documentation prior to the Truth Commission.
What changed: These analyses improved upon our earlier method by adjusting for possible relationships between lists and different probabilities of being included on each list through an approach called loglinear modeling, as defined by Bishop, Fienberg, and Holland in their 1975 text. This method assumes that a single correct model can be estimated, but there may be many possible models.
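With three lists, one well-known loglinear model of this kind — all two-way list interactions, no three-way interaction — has a closed-form estimate for the unrecorded cell. The sketch below uses hypothetical counts and is only one of the many possible models this approach must choose among:

```python
def three_list_estimate(counts: dict) -> float:
    """Estimate the unobserved cell of a 2x2x2 capture table under the
    loglinear model with all two-way interactions but no three-way term.

    counts maps inclusion patterns (a, b, c) -> number of victims, where
    a/b/c is 1 if the victim appears on list A/B/C.  Setting the three-way
    interaction to zero fixes the table's odds ratio at 1, which gives a
    closed form for the unrecorded cell n000.
    """
    n = counts
    return (n[(1, 1, 1)] * n[(1, 0, 0)] * n[(0, 1, 0)] * n[(0, 0, 1)]) / (
        n[(1, 1, 0)] * n[(1, 0, 1)] * n[(0, 1, 1)]
    )

# Hypothetical counts for the seven observed inclusion patterns.
observed = {(1, 1, 1): 20, (1, 1, 0): 60, (1, 0, 1): 55, (0, 1, 1): 40,
            (1, 0, 0): 300, (0, 1, 0): 250, (0, 0, 1): 200}
n000 = three_list_estimate(observed)    # estimated unrecorded victims
total = sum(observed.values()) + n000   # estimated total, observed + unobserved
```

Unlike the two-list estimator, this model allows pairwise dependence between lists; the remaining difficulty, addressed by the Bayesian methods below, is that many candidate models may fit the observed cells comparably well.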
Working with human rights and academic partners, we estimated the total number of individuals killed or missing in Casanare, Colombia, victims killed in state custody in Syria, and deaths and disappearances during the civil war in El Salvador. The Salvadoran estimates are particularly salient as El Salvador’s amnesty law was declared unconstitutional in 2016, creating an opportunity to hold perpetrators accountable.
What changed: Our analyses in Colombia, Syria, and El Salvador marked a transition in our methodological approach to fully Bayesian methods. The new Bayesian techniques build on the benefits of the loglinear approach, and they enable us to consider a range of plausible models. Technically, we relied on decomposable graphical models, as outlined by Madigan and York in their 1997 paper, to calculate a posterior probability distribution for the estimated total number of victims.
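To give a feel for what a posterior distribution over the total buys you, here is a deliberately simplified toy: a fully Bayesian two-list model (not the decomposable graphical models HRDAG actually uses), with Uniform(0,1) priors on the two inclusion probabilities integrated out in closed form and a 1/N prior on the total, evaluated on a grid. All counts are hypothetical:

```python
import math

def log_beta(a: float, b: float) -> float:
    """log of the Beta function, via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def posterior_total(n1: int, n2: int, m: int, n_max: int = 5000) -> dict:
    """Grid posterior over the total N for a toy Bayesian two-list model.

    Victims fall independently on each list with unknown inclusion
    probabilities p1, p2 ~ Uniform(0,1), integrated out analytically;
    N has a 1/N prior.  Returns {N: posterior probability}.
    """
    r = n1 + n2 - m  # distinct victims actually observed
    log_post = {}
    for N in range(r, n_max + 1):
        log_post[N] = (
            math.lgamma(N + 1) - math.lgamma(m + 1)          # multinomial
            - math.lgamma(n1 - m + 1) - math.lgamma(n2 - m + 1)
            - math.lgamma(N - r + 1)
            + log_beta(n1 + 1, N - n1 + 1)                   # p1 integrated out
            + log_beta(n2 + 1, N - n2 + 1)                   # p2 integrated out
            - math.log(N)                                    # 1/N prior
        )
    mx = max(log_post.values())
    w = {N: math.exp(v - mx) for N, v in log_post.items()}   # stable exponentiation
    z = sum(w.values())
    return {N: v / z for N, v in w.items()}

post = posterior_total(600, 400, 150)
mean_N = sum(N * p for N, p in post.items())  # posterior mean of the total
```

The payoff of the Bayesian framing is the full distribution `post`: instead of a single point estimate, it yields credible intervals and lets model uncertainty propagate into the final answer — the same benefit, in miniature, that model averaging over decomposable graphical models provides in our real analyses.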
Over the past year, we have estimated the number of social movement leaders killed in Colombia, drug-related killings in the Philippines, people disappeared at the end of the Sri Lankan civil war, and Korean women held in sexual slavery during WWII in Palembang, Indonesia. Our work in the Philippines, in collaboration with the Stabile Center for Investigative Journalism at Columbia University, found that more than three times as many people have been killed as the number reported by police.
What changed: This new method, developed by our colleague Daniel Manrique-Vallier, is called LCMCR, and it advances the previous techniques by enabling us to consider how victims might have different probabilities of being included on each list. Technically, this approach uses non-parametric latent class models to calculate a posterior probability distribution, as described in Manrique-Vallier’s 2016 paper.
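Why does heterogeneity matter so much? A small deterministic sketch (hypothetical population, expected counts rather than simulation) shows that when some victims are much harder to document than others, the equal-probability estimator systematically undercounts — the bias LCMCR’s latent classes are designed to correct:

```python
def expected_lp_estimate(groups: list) -> float:
    """Expected two-list (Lincoln-Petersen) estimate for a population that
    mixes groups with different inclusion probabilities.

    groups -- list of (size, p) tuples; each member of a group appears on
    each of two independent lists with probability p.
    """
    n1 = sum(size * p for size, p in groups)      # expected size of list 1
    n2 = n1                                       # lists are symmetric here
    m = sum(size * p * p for size, p in groups)   # expected overlap
    return n1 * n2 / m

# Hypothetical population of 1,000 victims: 800 "easy to document"
# (p = 0.5 per list) and 200 "hard to document" (p = 0.05 per list).
true_total = 1000
naive = expected_lp_estimate([(800, 0.5), (200, 0.05)])  # about 838
```

The overlap is dominated by the easy-to-document group, so the estimator concludes the lists cover the population better than they do and stops well short of the true 1,000. By letting inclusion probabilities vary across latent classes of victims, LCMCR recovers the muffled group instead of ignoring it.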
As scientists, we are constantly striving to improve our methods so that they more accurately reflect reality and require fewer assumptions about the underlying data. Every analytical method requires assumptions. We are transparent about ours.
This is generally not true in our work, and we modify this assumption in later applications. Another way to think about this assumption is that some stories are amplified, while others are muffled. For example, the stories of a well-known politician or victims of a bombing at a crowded marketplace in a major city will be amplified, but a single person killed on a quiet street at night, or violence that occurs in rural villages, will likely be muffled. By adjusting this assumption we are accounting for these different voices in our analyses.
This is generally not true in our work and we modify this assumption in later applications. Positively correlated lists are another way that stories are amplified or muffled — similar high-profile stories are likely to be told by multiple news outlets and social media sources whereas the same sources all may miss instances of rural violence. Lists also may be negatively correlated — a victim who is likely to be reported to a local NGO documenting violence may be very unlikely to be recorded by documentation efforts by the state.
Assumes that it is possible to identify the single correct model to estimate the total number of victims. In later applications we expand our ability to take into account the possibility of multiple different models to describe the observed data.
A few examples of methodological challenges that we’re working on with our academic colleagues:
> MSE requires identifying and linking multiple records that refer to the same victim. In reality, this process includes some uncertainty. How can we incorporate this uncertainty into our final estimate of the total number of victims?
> MSE estimates are often calculated for specific subsets — victims killed in Damascus in February 2012 — called strata. If strata are too small, model calculations may be unreliable. How should we identify optimal strata?
> Examining estimates in multiple strata can potentially generate multiple comparisons. Can we choose strata prior to calculating estimates? Is there a rigorous way to decide to look at additional strata after calculating estimates?
> MSE implicitly assumes that all victims have some chance of being observed and recorded, but some victims are very unlikely to ever be observed. Can we define the minimum probability that a victim is recorded on any list at which our models begin to fail?
Every day, judges have to make decisions about the fate of arrestees awaiting trial. Should they be released under supervision…or not? Pre-trial risk assessment tools are supposed to aid judges in those decisions by making recommendations based on calculations from mathematical models. These tools are in active use across the United States. This year, our team evaluated two such tools: one designed by the Criminal Justice Agency (CJA) in New York City, and the Arnold Public Safety Assessment (PSA), as implemented in San Francisco.
The tool designed by CJA attempts to predict the likelihood that an arrested person will be re-arrested for a felony while awaiting trial, and was used to screen defendants for acceptance into a pre-trial supervised release program. In their analysis, Lead Statistician Kristian Lum and Data Scientist Tarak Shah concluded that the current tool does not meet many standards of fairness. Additionally, they discovered that the scores used in the tool do not come directly from the model-fitting process, but rather were chosen in an ad hoc way.
The Arnold Public Safety Assessment is used in San Francisco, including among clients represented by the city’s Public Defender’s Office, and Kristian and Executive Director Megan Price investigated how recommendations based on it are affected by booking charges. Specifically, we considered a charge to be “unsubstantiated” if an individual was ultimately not found guilty of that charge. We found that unsubstantiated booking charges increase the recommended level of pre-trial supervision for 20–30% of the people evaluated by the tool.
photo: Shane McCoy / U.S. Marshals / CC BY 2.0
We said goodbye to our friend and mentor Herb Spirer, who passed away last fall at the age of 93. Herb had supported HRDAG since before it formally existed. With his wife, Louise, Herb mentored almost everyone who works in human rights data analysis today — or he mentored their mentors. Herb led a generation of statisticians to work in human rights, and he taught a generation of human rights activists about scientific rigor.
Herb’s life exemplified the best of what a scientist can offer: precision, honesty, modesty and a fascination for technical concerns bound to a deep commitment to making the world a better place.
“I know I speak for dozens of others when I say it is among my life’s greatest points of pride to number myself among his students.”
— Patrick Ball, HRDAG Director of Research
photo: Barak Yedidia, used with permission
HRDAG’s team includes Executive Director Megan Price, Director of Research Patrick Ball, Lead Statistician Kristian Lum, Data Scientist Tarak Shah, and Operations Coordinator Suzanne Nathans, as well as consultants, interns, and fellows. Our team is based in San Francisco, and our partners are located in countries around the world. We are supported by an advisory board composed of Julie Broome, Margot Gerritsen, Michael Kleinman, Dinah PoKempner, and Frank Schulenburg.
STATISTICS FOR GOOD
countries with active HRDAG projects this year
reports assessing situations of mass violence
requests for technical assistance
technical papers accepted in peer-reviewed publications; each paper improves the use of statistical modeling in human rights cases
OUTREACH AND EDUCATION
requests for media interviews
invitations to collaborate or participate in meetings, panels, convenings, or conferences
universities participating in active methodological collaborations
WHAT WE NOW KNOW
About 1–2% of El Salvador’s population was killed or disappeared during the civil war.
Approximately 500 people disappeared during the last three days of the Sri Lankan civil war, after surrendering to the army.
HRDAG’s fiscal year is July 1–June 30
Beginning cash balance
Revenue from contracts
Direct public support
Salaries and consultants
Travel and conferences
Rent, utilities, and technology
Supplies and other direct costs
Ending cash balance
HRDAG ended this fiscal year (FY 2018–19) with significant net assets, of which $205,359 are without donor restrictions, while $322,442 are obligated for use in a future period. The latter balance reflects the receipt of three significant grants in the final month of the fiscal year (June 2019). HRDAG’s grant agreements indicate that these funds are meant to be spent over the next fiscal year (FY 2019–20), or over a longer period, depending on the grant provisions.
HRDAG operates as a nonprofit project of Community Partners (communitypartners.org), a nonprofit organization that helps community leaders build and sustain effective programs that benefit the public good.
As our fiscal sponsor, Community Partners offers back-office services and the legal framework that allows nonprofit ventures to focus on their missions.
Human rights research — if done well — requires the dedication of a full-time team working with committed consultants. It requires investment in staff, in travel, in equipment and technology.
Primary funding comes from private, international human rights donors. The majority of our funding is not tied to specific projects, but rather supports our ongoing scientific work in human rights data analysis. A small but growing proportion of our funding specifically supports our work examining the use of predictive algorithms in the US criminal justice system.
Anne R. Dow Family Foundation
Cooper Schneier Fund of The Minneapolis Foundation
Keller Family Foundation
David Banks (in memory of Herb Spirer)
We are also very grateful to our individual donor base and to the following private companies for including HRDAG in their Matching Gift programs and matching their employees’ generous donations: