At HRDAG, 2021 was all about service and partnership.
In 2020, HRDAG provided clarity on issues related to the pandemic, police misconduct, and more.
The modular nature of the workflow and use of Git allowed us to work on different parts of the project from across the country.
Ball analyzed the data reporters had collected from a variety of sources – including on-the-ground interviews, police records, and human rights groups – and used a statistical technique called multiple systems estimation to roughly calculate the number of unreported deaths in three areas of the capital city Manila.
The team discovered that the number of drug-related killings was much higher than police had reported. The journalists, who published their findings last month in The Atlantic, documented 2,320 drug-linked killings over an 18-month period, approximately 1,400 more than the official number. Ball’s statistical analysis, which estimated the number of killings the reporters hadn’t heard about, found that close to 3,000 people could have been killed – more than three times the police figure.
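The idea behind estimating killings that nobody reported can be illustrated with the simplest capture-recapture case: two lists and their overlap. The sketch below is only an illustration under stated assumptions. It uses Chapman's bias-corrected two-list estimator and entirely hypothetical victim identifiers, not the data or the model from the analysis described above. Multiple systems estimation as practiced typically uses three or more lists and models the dependence between them.

```python
def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected two-list estimate of total population size.

    n1, n2: number of distinct victims on each list
    m:      number of victims appearing on both lists
    """
    if m < 0 or m > min(n1, n2):
        raise ValueError("overlap must be between 0 and the smaller list size")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical, already-deduplicated victim identifiers from two sources.
list_a = {"v1", "v2", "v3", "v4", "v5", "v6"}
list_b = {"v4", "v5", "v6", "v7", "v8"}

overlap = len(list_a & list_b)        # victims documented by both sources
documented = len(list_a | list_b)     # victims documented by at least one source

estimated_total = chapman_estimate(len(list_a), len(list_b), overlap)
estimated_undocumented = estimated_total - documented

print(f"documented: {documented}")
print(f"estimated total: {estimated_total:.1f}")
print(f"estimated undocumented: {estimated_undocumented:.1f}")
```

The smaller the overlap relative to the list sizes, the larger the estimated number of victims that neither source recorded. The two-list form assumes the lists are independent, which is rarely true of real documentation efforts.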
Ball said there are both moral and technical reasons for making sure everyone who has been killed in mass violence is counted.
“The moral reason is because everyone who has been murdered should be remembered,” he said. “A terrible thing happened to them and we have an obligation as a society to justice and to dignity to remember them.”
Shemika Lamare has joined the HRDAG team as our new data science fellow.
One of the researchers, a Michigan State PhD candidate named William Isaac, had not previously heard of New Orleans’ partnership with Palantir, but he recognized the data-mapping model at the heart of the program. “I think the data they’re using, there are serious questions about its predictive power. We’ve seen very little about its ability to forecast violent crime,” Isaac said.
I began working with HRDAG in the summer of 2001 before it was ever even called HRDAG. In fact, and I don’t mean this as a boast, I think I’m responsible for coming up with the name. After contracting with Dr. Patrick Ball for a time writing the Analyzer data management platform, I left New York City and joined him in Washington, DC, at AAAS in 2002. Soon after starting, Patrick decided to establish an identity for this new team, consisting mainly of myself, Miguel Cruz and a handful of field relationships. We briefly discussed what to name it in the AAAS Science & Policy break room, which at the time, being in the mind of unclever descriptive naming ...
What is a controlled vocabulary?
A controlled vocabulary provides the ability to transform information that has been collected on violations, victims, and perpetrators into a countable set of data categories. It is important that this process be done without discarding relevant information and without misrepresenting the collected information.
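As a rough illustration of the idea, the sketch below maps free-text descriptions onto a fixed set of countable categories. The category names, trigger terms, and sample reports are all hypothetical, and simple substring matching stands in for what is normally the careful judgment of trained coders. Each coded row keeps the original text, and anything that does not fit is labeled `unclassified` rather than dropped, so no collected information is discarded.

```python
from collections import Counter

# Hypothetical controlled vocabulary: category -> terms that signal it.
VOCABULARY = {
    "killing": ("killed", "shot dead", "executed"),
    "detention": ("detained", "arrested", "imprisoned"),
    "torture": ("tortured", "beaten"),
}


def code_violation(description: str) -> list[str]:
    """Return every vocabulary category whose terms appear in the description."""
    text = description.lower()
    codes = [
        category
        for category, terms in VOCABULARY.items()
        if any(term in text for term in terms)
    ]
    # Keep unmatched reports visible instead of silently discarding them.
    return codes or ["unclassified"]


# Hypothetical source statements.
reports = [
    "Witness says her brother was arrested and beaten.",
    "Two men were shot dead near the market.",
    "The family received threats by phone.",
]

# Store the codes alongside the untouched source text.
coded = [{"source_text": r, "codes": code_violation(r)} for r in reports]
counts = Counter(code for row in coded for code in row["codes"])

for category, n in counts.items():
    print(category, n)
```

One report can carry several codes, since a single statement often describes more than one violation.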
Why is it necessary?
The data collected about human rights violations originates from a wide range of information sources – legal case files, newspaper articles, e-mails, faxes, letters, phone conversations, testimonies, interviews, radio and television programs, video clips, and photos. This wide range of ...
In this story, Guerrini discusses the impact of HRDAG’s work in Guatemala, especially the trials of General José Efraín Ríos Montt and Colonel Héctor Bol de la Cruz, as well as work in El Salvador, Syria, Kosovo, and Timor-Leste. Multiple systems estimation and the perils of using raw data to draw conclusions are also addressed.
Megan Price and Patrick Ball are quoted, especially in regard to how to use raw data.
“From our perspective,” Price says, “the solution to that is both to stay very close to the data, to be very conservative in your interpretation of it and to be very clear about where the data came from, how it was collected, what its limitations might be, and to a certain extent to be skeptical about it, to ask yourself questions like, ‘What is missing from this data?’ and ‘How might that missing information change these conclusions that I’m trying to draw?’”
In our work, we merge many databases to figure out how many people have been killed in violent conflict. Merging is a lot harder than you might think.
Many of the database records refer to the same people: the records are duplicated. We want to identify and link all the records that refer to the same victims so that each victim is counted only once, and so that we can use the structure of overlapping records to do multiple systems estimation.
Merging records that refer to the same person is called entity resolution, database deduplication, or record linkage. For definitive overviews of the field, see Scheuren, Herzog, and Winkler, Data Quality ...
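A toy version of the linking step might look like the sketch below. It is an assumption-laden illustration, not HRDAG's pipeline: the records are hypothetical, the match rule (same date plus a name similarity of at least 0.85 from Python's `difflib.SequenceMatcher`) is arbitrary, and all pairs are compared, which would not scale to real databases. Matched pairs are merged into clusters with a small union-find so that each cluster counts as one victim.

```python
import unicodedata
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records from three source databases (a, b, c).
records = [
    {"id": "a1", "name": "Maria Lopez", "date": "2016-08-03"},
    {"id": "b7", "name": "María  Lopes", "date": "2016-08-03"},
    {"id": "c2", "name": "Jose Ramirez", "date": "2016-09-12"},
    {"id": "b9", "name": "Maria Lopez", "date": "2017-01-20"},
]


def normalize(name: str) -> str:
    """Strip accents, lowercase, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def is_match(r1: dict, r2: dict, threshold: float = 0.85) -> bool:
    """Illustrative rule: identical date and sufficiently similar names."""
    if r1["date"] != r2["date"]:
        return False
    similarity = SequenceMatcher(None, normalize(r1["name"]), normalize(r2["name"])).ratio()
    return similarity >= threshold


# Union-find over record ids, so chains of matches end up in one cluster.
parent = {r["id"]: r["id"] for r in records}


def find(x: str) -> str:
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x


for r1, r2 in combinations(records, 2):
    if is_match(r1, r2):
        parent[find(r2["id"])] = find(r1["id"])

clusters: dict[str, list[str]] = {}
for r in records:
    clusters.setdefault(find(r["id"]), []).append(r["id"])

unique_victims = len(clusters)
print(clusters)
print("unique victims:", unique_victims)
```

Here records `a1` and `b7` are linked despite the spelling difference, while `b9` stays separate because the date differs. The resulting pattern of which sources share a cluster is the overlap structure that multiple systems estimation relies on.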
We’ve known for years that Beka Steorts is on the cutting edge of statistical science, and now The MIT Technology Review has realized the same. Last week she was named one of 35 Innovators Under 35, in the category of humanitarian.
We first became familiar with Beka's work in 2013 when she was a visiting professor at Carnegie Mellon and was introduced to us by Prof. Steve Fienberg. Since then, we’ve felt very fortunate to collaborate with her on projects such as the UN enumeration of casualties in the Syrian conflict, and we look forward to many more years of work with her. She is one of several young stars we include in our superheroine hall ...
Last Thursday, HRDAG co-founder and director of research Megan Price presented at Strata, the conference for data scientists and people who work with "big data." In her talk, she addressed the question of how we can know the actual number of conflict casualties in Syria. Her short answer was, "We don't know." The longer answer was that we have a very good idea of how many conflict casualties have been reported, by several documentation groups, and that we're working on analyzing ...
The coding, from my perspective, is the heart of the project. I say this because the coding team has the responsibility of selecting documents according to the random sample, recording the documents’ contents, and applying the criteria to convert that content into an entry in a quantitative database. Not to mention the fact that this team has the privilege of being in direct contact with the documents.
At present, because of advanced organizational processes, not everyone has a chance to hold an original document in their hands. The quantitative study had many advantages in this regard; since we started work in parallel with the archival ...
Bailey joined HRDAG as a data scientist in 2022.