Lessons at HRDAG: Holding Public Institutions Accountable
As someone who was raised on the South Side of Chicago and was incarcerated for a little over 12 years, by the time I arrived at the Human Rights Data Analysis Group, I already had a passion for social justice. While incarcerated, I taught myself Python and helped to develop and maintain a Python/SQL library database. Through the Education Justice Project, I took classes in mathematics and computer science, as well.
Upon release, I continued to pursue my interest in data science. This journey led me to volunteer at the Invisible Institute, a nonprofit journalism production company on the South Side of Chicago that works to enhance the capacity of citizens to hold public institutions accountable. While volunteering at the institute I participated in Code Review, a community where Black people who want to learn about data science come to learn skills. It was there that I learned the ins and outs of data science applied to criminal justice reform. It was during this time that I was introduced to and worked alongside Trina Reynolds-Tyler, Zaynaib Giwa, Rajiv Sinclair, Sukari Stone, Tarak Shah, Bailey Passmore and Patrick Ball.
While doing my internship at HRDAG (October 2021-February 2022), I refined my skills in Git/GitHub, Pandas, Python, Bash (and other command-line tools), and I learned to work in Linux and iOS environments. In addition to these technical skills, I also learned a lot about team leadership, organization, and workflow. But perhaps one of the most important things I learned at HRDAG was Principled Data Processing.
As HRDAG’s director of research, Patrick Ball, put it, Principled Data Processing is a way to prove to someone, usually yourself, that what you did was right. This is done by organizing your workflow in such a way that when anyone looks at your work they can see:
- what you did
- how you did it
- where you did it
- why you did things the way you did
This process is important because it gives us, as data scientists, a direct way to make our work transparent, auditable, reproducible, and scalable. To be transparent simply means that anyone, whether yourself or a colleague, can review the work. Auditability means that you can prove, usually to yourself, that the work is accurate and that you are doing what you say you are doing. The next principle is reproducibility, which is the gold standard for every science in the entire universe: your future self, or anyone else, needs to be able to redo what you did and get the same result. That leads to my favorite thing about Principled Data Processing: scalability. It is my favorite because I have often struggled to find a consistent way of working with a team, handling more than one dataset, and reporting on updated data.
For me, things often get lost in the proverbial sauce. What I mean by that is, when working with more than one person, communication is not always the best. However, even when communication is stellar, there is sometimes still a breakdown when working with multiple datasets or updating datasets. With Principled Data Processing, I have learned a scientific way of dealing with these problems, rather than relying on an app where I’m not really sure how I got the numbers or graphs that I did. Principled Data Processing has given me the tools to be a professional data scientist.
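To make the ideas of auditability and reproducibility a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not HRDAG's actual tooling or workflow: the function names (`clean_rows`, `run_task`), the sample rows, and the fields in the audit record are all hypothetical. The idea is that a processing step records what it did, why, and a fingerprint of its input and output, so that running it again on the same data can be checked against the first run.

```python
import hashlib
import json


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def clean_rows(rows):
    """Hypothetical cleaning step: strip whitespace and drop blank names."""
    return [{"name": r["name"].strip()} for r in rows if r["name"].strip()]


def run_task(rows):
    """Run the cleaning step and return its output plus a small audit record."""
    # Serialize with sorted keys so the same data always hashes the same way.
    input_bytes = json.dumps(rows, sort_keys=True).encode("utf-8")
    output = clean_rows(rows)
    output_bytes = json.dumps(output, sort_keys=True).encode("utf-8")
    record = {
        "what": "clean_rows",  # what was done
        "why": "names arrive with stray spaces and blank entries",  # why
        "input_sha256": sha256_of(input_bytes),    # fingerprint of the input
        "output_sha256": sha256_of(output_bytes),  # fingerprint of the output
    }
    return output, record


# Made-up sample data, for illustration only.
raw = [{"name": " Ada "}, {"name": ""}, {"name": "Grace"}]

first_output, first_record = run_task(raw)
second_output, second_record = run_task(raw)

# Reproducible: the same input and the same code give the same audit record.
print(first_record == second_record)
print(first_output)
```

In a real project the audit record would likely be written to a file next to the output, so that a colleague, or your future self, could compare fingerprints after re-running the step.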
I am currently working on an Invisible Institute project called Beneath the Surface, which uses data science to investigate gender-based violence at the hands of the Chicago Police Department (CPD). My latest contribution is looking at CPD’s handling of cases of people who went missing, to investigate systemic racism and sexism.
In the future, I plan to finish earning my bachelor’s degree—I only have 15 credits left to complete. My plan is to use my data science skills to do social justice work and to contribute to a space where people of color have access to the tools and resources they need to empower their own communities. Working with HRDAG helped me build the skills so that we could prove, to ourselves and others, that data science is one of the tools we can use to hold public institutions accountable.