Learning to Learn: Reflections on My Time at HRDAG

When I reflect upon the series of unexpected forks in my life that led me to my work with the Human Rights Data Analysis Group these past seven months, I am reminded of a quote by Dutch engineer Edsger W. Dijkstra: “Computer science is no more about computers than astronomy is about telescopes, biology is about microscopes or chemistry is about beakers and test tubes.”

At HRDAG, rigorous statistical work is done through the lens of human rights. Data science does not exist in its own bubble, devoid of social science, humanity, and politics; statistics is a game of assumptions and choices, and is therefore always political and subjective. Statistics is about everything. A major theme of HRDAG’s work is how we think about and account for missing data. As a journalist and researcher, I often work with datasets that are far from comprehensive. For example, datasets obtained through Freedom of Information Act (FOIA) requests come with some huge caveats: these are only the data the government agency chose to collect, for whatever reason, and these are only the data the government agency *chose* to release. Journalists must resist the urge to make easy, sweeping statements rather than correct ones: for example, “the number of coronavirus cases in California is x,” as opposed to “the number of confirmed coronavirus cases in California is x.”

I have wrestled with how to present government datasets or incomplete datasets accurately to the public without assuming that the sources of our data share our goals or interests, or that the data are representative. Before my work at HRDAG, I worked as a reporter and researcher at Freedom of the Press Foundation (FPF) and took on occasional freelance journalism projects. Although a significant part of my work at FPF involved data (I was a reporter on the U.S. Press Freedom Tracker, which tracks press freedom issues) and I used FOIA data in my reporting, I wanted to be doing increasingly technical journalism. I often had questions about data that I did not feel qualified to answer, and I did not know how to talk about inconsistencies or gaps in the data. I knew that all of the statistics and data science programming classes in the world (and I took many!) could not compare to working full time with experts in the field. I hoped to become a much more rigorous data analyst and data scientist, a competent programmer, and a more responsible data journalist.

In addition to contributing to existing HRDAG projects, my work at HRDAG included analysis of a large dataset of detained migrant children in the United States. I spent several months deep in this dataset, and eventually authored a technical blog post about the implications of our finding that detained children’s alien numbers are generally assigned sequentially. Being able to pursue a project from PDF wrangling and data cleaning through predictive modeling of missing data to publishing a blog post, all with the mentorship and support of the HRDAG community, was an invaluable learning experience. I did not get everything done that I wanted to during my tenure at HRDAG. I started with a ton of ideas for large, ambitious projects I was excited about, and I never ended up touching many of them.

I am proud of my work these last months, but so much of what I took away from my fellowship was about learning how to learn. In the past, a source of frustration for me was that I tended to have questions about data that I did not know how to answer, and I struggled to know where to start. Now I work the other way around: I resist thinking so far ahead that I form too many hypotheses about what the data are or might mean before I interrogate and sit with them. My mentor and supervisor Dr. Patrick Ball constantly reminded me to aim to produce less and study more, and I am so grateful to have had the time and space to go deep and learn, to wade through academic papers and technical documentation, and to make sense of data while surrounded by such incredible leaders in the field of human rights data science.

Indeed, just being in the HRDAG office in San Francisco was a huge inspiration. The diversity of backgrounds among the people I was lucky enough to work and share space with, from academic statistics to public policy to industry data science to journalism, was palpable. During my initial time at HRDAG, I worked alongside Trina Reynolds-Tyler, an incredible organizer and public policy graduate student. Later, I worked even more closely with Valentina Rozo Ángel, a visiting analyst from whom I learned a lot of programming in R, and I was lucky to have a few weeks of overlap at HRDAG with Maria Gargiulo, an equally great statistician and person. In addition to working so closely with HRDAG staff and their network of past employees and fellows, all doing amazing work, I was constantly delighted by the unexpected and always interesting guests who popped into the office.

Something I grew to appreciate deeply about HRDAG is everyone’s commitment to constantly improving toolkits and processes. During my first weeks at HRDAG, I finally fixed my botched Anaconda installation, settled on one shell, and got my environment working across multiple devices running different operating systems. These had been on my to-do list for some time, but at HRDAG fine-tuning one’s toolkit was a priority, not an afterthought. Everyone fiddled constantly with their text editor and IDE configurations, and we frequently discussed nifty software we stumbled upon as well as ways to make various organizational processes more efficient. I am grateful to have gained a persistently critical and dissatisfied approach to my tools, and I now find myself constantly asking how I might improve them.

In the end, so much of what I learned at HRDAG was intangible. I have gained an enormous amount of overall technical competence, including fluency with data science Python libraries, documentation literacy, and a much stronger, more comprehensive perspective on the ways concepts and tools fit together. With these skills and a more rigorous background has come increased skepticism of work in my own field; I am far more likely to read the footnotes and citations, to interrogate the original data that form the basis of technical journalism projects, and to be more critical of their conclusions. This is an enormous gift.

My path to this work has no doubt been unconventional, and it has taken me time to see this as an asset rather than a weakness. This is a truly exciting moment to be working in human rights data science, and I am incredibly grateful to have had the opportunity to collaborate so closely with my friends and colleagues at the Human Rights Data Analysis Group. Huge thanks to Dr. Patrick Ball, Dr. Megan Price, Tarak Shah, and Dr. Scott Weikart in particular for their patience and mentorship, and for believing in my ability to do this work well.

Our work has been used by truth commissions, international criminal tribunals, and non-governmental human rights organizations. We have worked with partners on projects on five continents.