Algorithmic tools like PredPol were supposed to reduce bias. But HRDAG has found that racial bias is baked into the data used to train the tools.
Cory Doctorow of Boing Boing writes about HRDAG director of research Patrick Ball’s article “Violence in Blue,” published March 4 in Granta. From the post: “In a must-read article in Granta, Ball explains the fundamentals of statistical estimation, and then applies these techniques to US police killings, merging data-sets from the police and the press to arrive at an estimate of the knowable US police homicides (about 1,250/year) and the true total (about 1,500/year). That means that of all the killings by strangers in the USA, one third are committed by the police.”
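The idea of merging two lists to estimate a total that neither list fully records can be sketched with the simplest two-list capture-recapture estimator (Lincoln-Petersen). This is an illustration only: the excerpt does not say which estimator Ball used, the counts below are hypothetical and are not taken from the article, and the estimator assumes the two lists are independent and that records are matched without error.

```python
def union_count(n_list_a: int, n_list_b: int, n_both: int) -> int:
    """Number of distinct records documented on at least one list."""
    return n_list_a + n_list_b - n_both


def lincoln_petersen(n_list_a: int, n_list_b: int, n_both: int) -> float:
    """Two-list estimate of the total, including records on neither list.

    Assumes the lists are independent samples of the same population.
    """
    if n_both <= 0:
        raise ValueError("the lists must share at least one matched record")
    return n_list_a * n_list_b / n_both


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
police_records = 600   # records on the (assumed) police list
press_records = 900    # records on the (assumed) press list
matched = 450          # records found on both lists

documented = union_count(police_records, press_records, matched)
estimated_total = lincoln_petersen(police_records, press_records, matched)
undocumented = estimated_total - documented

print(f"documented on at least one list: {documented}")
print(f"estimated total: {estimated_total:.0f}")
print(f"estimated never documented: {undocumented:.0f}")
```

The smaller the overlap between the lists, the larger the estimated number of records that neither list captured.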
We are pleased to announce that HRDAG will be supported by two additions to our Advisory Board, Julie Broome and Frank Schulenburg.
We’ve worked with Julie for many years, getting to know her when she was Director of Programmes at The Sigrid Rausing Trust. She is now the Director of London-based Ariadne, a network of European funders and philanthropists. She worked at the Trust for seven years, most notably as Head of Human Rights, before becoming Director of Programmes in 2014. Before joining the Trust she was Programme Director at the CEELI Institute in Prague, where she was responsible for conducting rule of law-related trainings for judges and ...
In this post, Cory Doctorow writes about the Significance article co-authored by Kristian Lum and William Isaac.
1. Is there a single source of information about the victims of the armed conflict in Colombia?
No. Colombia has an extensive documentation process for victims of the armed conflict. Hundreds of institutions, victims' organizations, and civil society organizations have focused their efforts on recording this information. However, each entity or organization develops its documentation process with its own limitations related to technical, logistical, social, and mission-related capacities. No entity or organization is able to document the complete universe of victims. This is because it is impossible for them to reach every part of the country, ...
Weapons of Math Destruction: invisible, ubiquitous algorithms are ruining millions of lives. Excerpt:
As Patrick once explained to me, you can train an algorithm to predict someone’s height from their weight, but if your whole training set comes from a grade three class, and anyone who’s self-conscious about their weight is allowed to skip the exercise, your model will predict that most people are about four feet tall. The problem isn’t the algorithm, it’s the training data and the lack of correction when the model produces erroneous conclusions.
So much of what I learned at HRDAG was intangible, and I'm grateful to have been able to go deep.
From the article: “As we seek to advance the responsible use of data for racial injustice, we encourage individuals and organizations to support and build upon efforts already underway.” HRDAG is listed in the Data Driven Activism and Advocacy category.
The institution’s objective was to learn the truth about what happened during the armed conflict.
It’s inevitable that databases will have information gaps, and special care must be taken to account for these gaps.
Cory Doctorow of Boing Boing writes about HRDAG executive director Patrick Ball and his contribution to Carl Bialik’s article in 538 Politics about the recently released Bureau of Justice Statistics report on the number of annual police killings, both reported and unreported.
With help from HRDAG, Roman Rivera built the data backbone for the Invisible Institute's Citizens Police Data Project.
Today the United Nations Office of the High Commissioner for Human Rights (OHCHR) released an HRDAG-prepared report that describes and tallies documented killings in the Syrian Arab Republic from the beginning of the conflict in March 2011 through April 2014. (The report is here.) This is our third report for the UN on the Syrian conflict, and it is an update of work we published in January 2013 and June 2013.
The report, Updated Statistical Analysis of Documentation of Killings in the Syrian Arab Republic, concludes that approximately 191,000 identifiable victims have been reported in the period covered (March 2011 – April 2014).
This post describes how we organize our work over ten years, twenty analysts, dozens of countries, and hundreds of projects: we start with a task. A task is a single chunk of work, a quantum of workflow. Each task is self-contained and self-documenting; I'll talk about these ideas at length below. We try to keep each task as small as possible, which makes it easy to understand what the task is doing, and how to test whether the results are correct.
In the example here, I'll describe work from our Syria database matching project, which includes about 100 tasks. I'll start with the first thing we do with files we receive ...
During the violence in Timor-Leste in June 2006, armed gangs broke into the offices of the Commission for Reception, Truth and Reconciliation (CAVR) in Dili and stole their motorbikes.
The Human Rights Data Analysis Group, then at Benetech®, and other human rights observers wondered whether the mobs would soon return to loot the irreplaceable paper records used by the CAVR to compile their definitive report entitled "Chega!"
The Benetech Initiative contributed to the CAVR findings and released a separate statistical report (PDF) establishing that at least 102,800 (+/- 11,000) Timorese died as a result of human rights violations in Timor-Leste ...
Over the last few years, we've tried to make the data organized in our projects publicly accessible. We have encouraged our partners to publish the data at the completion of the project. We continue to believe it is important to offer access to the data used in our projects for the sake of transparency as well as to encourage further research and analysis. However, we are increasingly concerned about how raw data are used. Data limited to what we were able to observe are what statisticians call a convenience sample, which is subject to selection bias.
We're keeping these datasets available for researchers who want to use them for simulation or estimation ...
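The selection-bias concern can be made concrete with a small simulation. This is a minimal sketch under stated assumptions: the two regions, the true counts, and the documentation probabilities are all hypothetical and do not come from any HRDAG dataset. It shows how raw documented counts can reverse the true pattern when some events are much more likely to be recorded than others.

```python
import random


def simulate_documented(true_counts, doc_prob, seed=0):
    """Return how many events in each region end up documented.

    Each true event is recorded independently with its region's
    documentation probability (an assumption of this sketch).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return {
        region: sum(rng.random() < doc_prob[region] for _ in range(n))
        for region, n in true_counts.items()
    }


# Hypothetical truth: three times as many events in the rural region.
true_counts = {"urban": 1000, "rural": 3000}
# Hypothetical coverage: urban events are far more likely to be recorded.
doc_prob = {"urban": 0.8, "rural": 0.1}

documented = simulate_documented(true_counts, doc_prob)

true_rural_share = true_counts["rural"] / sum(true_counts.values())
documented_rural_share = documented["rural"] / sum(documented.values())

print(f"true rural share of events: {true_rural_share:.2f}")
print(f"rural share in the raw documented data: {documented_rural_share:.2f}")
```

An analyst who treats the documented counts as if they were the full picture would conclude that most events happened in the urban region, which is the opposite of the simulated truth.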