Reflections on Data Science for Real-World Problems
Trina Reynolds-Tyler was HRDAG’s 2019 Human Rights Intern.
I was born and raised on the South Side of Chicago, and I have always known I wanted to support my community through my work. If you had asked me 10 years ago how I was going to achieve that goal, I would have said by working at a nonprofit. However, my experiences growing up in Chicago, where I saw the failures of public health, education, and policing, led me to shift my focus to community organizing. I began organizing with a group called the Black Youth Project 100 (BYP100) around the firing of an off-duty police officer who, in 2012, killed Rekia Boyd, a 22-year-old Black woman headed to a party with friends. I wanted to hold Chicago Police accountable for misconduct and brutality, which led me to a position at the Invisible Institute, a journalism team on the South Side that investigates police misconduct. It was there that I came to understand the importance of data in organizing and social justice work, and set out to become a data scientist.
While organizing with BYP100, I was aware that the stories of state-sanctioned violence I heard were not happening in isolation; they were part of a larger pattern. Despite this, I didn’t have the tools to build bridges between so many different people’s shared experiences. This was partly because people who attempted to speak up about their experiences with violence were blamed by police and community members. Additionally, people who had experienced violence at the hands of the state often blamed themselves, and were therefore less likely to share their stories. It was not until I learned more about data science that I found the tools to address the social stigma and self-blame that come along with these experiences, and was able to see the larger pattern in the stories of individuals.
Over time, I learned that data science could lay out evidence to engage the state about systemic issues and assure people that they were not alone in their experiences. To learn more, I pursued my Master’s in Public Policy at the University of Chicago Harris School of Public Policy. During my first year, I learned how to program in R and Stata through two statistics classes and a program evaluation class. Those classes were helpful in introducing the theory behind research design and regression analysis, but it was not until my Human Rights Internship with HRDAG that I was able to put that theory into practice.
My learning at HRDAG can be divided into three parts. I would describe the first part as enhancing my toolkit and getting me organized. Before HRDAG, I hadn’t used a text editor, I had been told to avoid my command line (I know someone who accidentally deleted all her directories), and most of my coding was for homework assignments that didn’t follow a clear workflow. HRDAG trained me so well that I now prefer the command line when working on my computer, and I prefer to use Vim when writing code; Gus Brocchini was a phenomenal teacher during these first few weeks. This training has made me more time-efficient, has taught me to be more thoughtful about how I organize projects, and has made me more confident and excited to do this work.
I would describe the second part as application and problem-solving, as I put my improved toolkit to use on real-world problems. The first exercise involved data cleaning for the El Salvador project. Data cleaning means identifying and correcting incomplete, incorrect, inaccurate, or irrelevant parts of the data so that the findings of the later analysis are accurate. I was given a dataset that needed to be imported, standardized, cleaned, and then exported. The administrative data were messy: values were in the wrong columns, the data were in Spanish, and I needed to integrate the dataset with another dataset so that we could eventually use multiple systems estimation (MSE). This task mattered because data cleaning is a necessary step that can be incredibly difficult with administrative data. Throughout the process, I began to build another toolkit: the R packages I could use when I encountered tricky data cleaning problems.
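As a rough illustration of that import, standardize, clean, and export loop, here is a minimal Python sketch. It is not the code used on the El Salvador project (which relied on R packages); the column names, the sample rows, and the helper functions `standardize`, `clean_rows`, and `export_rows` are all invented for illustration.

```python
import csv
import io
import unicodedata

# Hypothetical raw export: column names and values are invented for illustration.
RAW = """nombre,municipio,fecha
 María  López ,san salvador,1982-03-14
JOSE PEREZ,Santa Ana,
,San Miguel,1981-07-02
"""

FIELDS = ("nombre", "municipio", "fecha")


def standardize(text):
    """Collapse whitespace, strip accents, and uppercase a text value."""
    text = " ".join(text.split())
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_accents.upper()


def clean_rows(raw_csv):
    """Import rows, standardize text fields, and flag incomplete records."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        record = {
            "nombre": standardize(row["nombre"]),
            "municipio": standardize(row["municipio"]),
            "fecha": row["fecha"].strip(),
        }
        # Flag rather than drop, so missing values stay visible downstream.
        record["incomplete"] = not all(record[field] for field in FIELDS)
        cleaned.append(record)
    return cleaned


def export_rows(rows):
    """Write the cleaned rows back out as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(FIELDS) + ["incomplete"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


rows = clean_rows(RAW)
print(export_rows(rows))
```

Standardizing names to a single form (no accents, one case, single spaces) is one common way to make records comparable before two datasets are integrated. Flagging incomplete rows instead of deleting them keeps the decision about how to handle them explicit.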
In a second exercise, I worked with Tarak Shah, another HRDAG team member. I began studying open missing-persons cases in Chicago, applying my tools and problem-solving skills to people’s real-life experiences. The data that I had were in a PDF, so I worked with Tarak to scrape text from the PDF and parse the information into a data frame. During my time working on this project, I began paying more attention to my GitHub account and following people I wanted to learn from. I’m excited to continue to work on this project with community collaborators and eventually post a document or website that my community can learn from.
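The parsing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text is assumed to have already been extracted from the PDF, and the line layout, the field names, and the `parse_cases` helper are hypothetical rather than taken from the actual Chicago report.

```python
import re

# Hypothetical text as it might look after a PDF text-extraction step.
PAGE_TEXT = """Open Missing Persons Report
Case 2019-0001  Reported 2019-01-05  Age 17
Case 2019-0002  Reported 2019-02-11  Age 34
Page 1 of 1
"""

# One named group per column we want in the resulting table.
LINE_PATTERN = re.compile(
    r"^Case\s+(?P<case_id>\d{4}-\d{4})\s+"
    r"Reported\s+(?P<reported>\d{4}-\d{2}-\d{2})\s+"
    r"Age\s+(?P<age>\d+)$"
)


def parse_cases(text):
    """Turn extracted text into a list of row dictionaries, one per case."""
    records = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # skip titles, page footers, and blank lines
        record = match.groupdict()
        record["age"] = int(record["age"])
        records.append(record)
    return records


cases = parse_cases(PAGE_TEXT)
for case in cases:
    print(case)
```

A list of dictionaries with the same keys is already tabular, so it can be handed to a data frame constructor in whichever analysis environment is in use.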
The last part of my learning was about my physical toolkit and investing in my future. Patrick Ball and Scott Weikart played a big role in this. Patrick has told me on more than one occasion that “you need a powerful and smooth set of toolkits.” This toolkit can include something as small as a pencil and as large as the monitor you use when programming. It can consist of physical things you buy or of shortcuts you use when working from your terminal. Ergonomics is an applied science concerned with designing and arranging the things people use so that people and things interact as efficiently and safely as possible. Programmers have to be especially careful because we sit in front of computers for long periods of time. I invested in a mouse and a flat keyboard to go with a large monitor, to mitigate the health risks associated with this career path. It’s also important to think about the food you put in your body to make sure that your lifestyle doesn’t contribute to problems with your mind, back, and eyes in the future.
When I started my internship, I had only begun to scratch the surface of data science. Now that my time at HRDAG is coming to a close, I’ve realized that there is an infinite amount of information that I will need to learn in my future career as a data scientist. I feel ready for that journey, in large part because of the skills and toolkits I gained at HRDAG. I’ll be wrapping up my last year of graduate school, and I’ll also be working as a research assistant at the Center for Survey Methodology at the University of Chicago, where I will help analyze 35 years of criminal court data from the Circuit Court of Cook County. Going forward with all that I’ve learned, I am excited to continue using data science to find patterns in people’s stories, to boost knowledge in my community, and to hold those in power accountable to the public.