How Machine Learning Makes Visible Gender-Based Violence by Police

When the Chicago grassroots organization Invisible Institute began its investigation into gender-based violence by police, it acquired decades of complaints of misconduct by Chicago police officers. The complaints comprised hundreds of thousands of pages of allegation forms, memos, various police administrative forms, interviews and testimonies, pictures, and even embedded audio files. 

The Institute’s data director, Trina Reynolds-Tyler, honed her data science skills as HRDAG’s 2019 Human Rights Intern. In 2017, HRDAG intern Roman Rivera had helped the Institute create the data backbone of the Citizens Police Data Project, ensuring it would be scalable, transparent, and easy to maintain. As the Institute published scanned images on its Citizens Police Data Project and kicked off the Beneath the Surface project, it asked HRDAG to step in and help make these documents more useful.

HRDAG found that some gender-based violence and sexual misconduct by police was being buried by official coding procedures. For example, a complainant may allege that sexual misconduct took place during an “improper search of person.” The police department codes that entire complaint as “improper search of person,” which leaves the allegation of sexual misconduct unrecorded and unsearchable in official public reports.

HRDAG trained the Institute’s team of community organizers, human rights experts, and other community members to read and code extracted allegation descriptions from those documents in order to give visibility to allegations that would otherwise be lost. The team codes complaints with categories such as “sexual violation,” “home invasion,” “stalking/domestic violence,” “policing parents and children,” “neglect,” “LGBTQIA,” “disabled,” and “use of force.”
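To make the difference between the two coding schemes concrete, here is a minimal sketch in Python. It assumes a complaint record carries one official category plus a set of hand-assigned codes. The `Complaint` class, its field names, the complaint IDs, and the sample records are all invented for illustration and are not drawn from the Institute’s data or code.

```python
from dataclasses import dataclass, field


@dataclass
class Complaint:
    # All field names here are hypothetical, chosen only for this sketch.
    complaint_id: str
    official_category: str                        # the single code assigned by the department
    hand_codes: set = field(default_factory=set)  # zero or more codes assigned by human readers


def search_official(complaints, category):
    """Return IDs of complaints whose single official category matches."""
    return [c.complaint_id for c in complaints if c.official_category == category]


def search_hand_coded(complaints, code):
    """Return IDs of complaints that a human reader tagged with the given code."""
    return [c.complaint_id for c in complaints if code in c.hand_codes]


# Invented records: the official category is a single value,
# while the hand codes can capture everything the allegation describes.
complaints = [
    Complaint("A-001", "improper search of person", {"sexual violation", "use of force"}),
    Complaint("A-002", "improper search of person", set()),
    Complaint("A-003", "use of force", {"use of force", "home invasion"}),
]

print(search_official(complaints, "sexual violation"))
print(search_hand_coded(complaints, "sexual violation"))
```

In this toy data, a search on the official category finds nothing for “sexual violation,” while the same search over the hand-assigned codes surfaces complaint A-001. Storing the codes as a set reflects that one complaint can fall into several categories at once.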

This hand-coded information has helped to build machine learning models that make it possible to process and analyze the entire cache of documents. Tarak Shah, one of HRDAG’s leads on the project, says, “We believe the gender-based violence we’re documenting is made invisible or under-documented.” 
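The article does not say which models HRDAG built, so the following is only a sketch of the general idea: hand-coded examples become training data, and a model then suggests codes for text nobody has read yet. It uses a small word-count naive Bayes classifier, one per code, as a stand-in technique. The class and function names and the training sentences are invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class BinaryNaiveBayes:
    """Multinomial naive Bayes for a single yes/no code (illustrative stand-in)."""

    def fit(self, docs, labels):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def _log_score(self, tokens, label):
        total_docs = sum(self.doc_counts.values())
        # Add-one smoothing on the class prior and on each word likelihood.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(self.word_counts[label].values())
        for token in tokens:
            if token in self.vocab:  # words never seen in training are skipped
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def predict(self, doc):
        tokens = tokenize(doc)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


def train_per_code(examples, codes):
    """Train one yes/no model per code from (text, set_of_codes) pairs."""
    docs = [text for text, _ in examples]
    return {
        code: BinaryNaiveBayes().fit(docs, [code in assigned for _, assigned in examples])
        for code in codes
    }


def predict_codes(models, text):
    return {code for code, model in models.items() if model.predict(text)}


# Invented hand-coded examples; real allegation descriptions are far longer and messier.
examples = [
    ("officer entered the home without a warrant", {"home invasion"}),
    ("officers forced entry into the apartment without a warrant", {"home invasion"}),
    ("officer struck the complainant with a baton", {"use of force"}),
    ("officer pushed the complainant to the ground", {"use of force"}),
    ("officer entered the home and pushed the complainant", {"home invasion", "use of force"}),
    ("officer was rude during a traffic stop", set()),
]

models = train_per_code(examples, ["home invasion", "use of force"])
print(predict_codes(models, "officers entered the apartment without a warrant"))
```

Training a separate yes/no model for each code keeps the multi-label structure of the hand coding: a new description can receive several codes, or none. A real pipeline would need far more training data and careful human review of the model’s suggestions.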

Trina still meets regularly with the team at HRDAG to share code and improve her technical skills. Together we have created a data practice that is replicable, auditable, scalable, and transparent.

Further reading

Invisible Institute.
Beneath the Surface.

Recent publications

HRDAG. Tarak Shah. 13 July, 2021.
Scanning Documents to Uncover Police Violence.

HRDAG. Trina Reynolds-Tyler. 15 October, 2019.
Reflections on Data Science for Real World Problems.

HRDAG. Roman Rivera. 17 October, 2017.
In Pursuit of Excellent Data Processing.

Related videos and podcasts

PBS, 2023.
Trina Reynolds-Tyler: Bridge Builders

Acknowledgments

HRDAG was supported in this work by the MacArthur Foundation, Ford Foundation, Heising-Simons Foundation, and Open Society Foundations.

Image: David Peters.
