I’ve always worked in the nonprofit sector because it gives me a way to give back to communities that need and deserve it. When I first began studying data science, I became fascinated with the possibility of using data for social good. HRDAG is doing great work in multiple spaces of social good, from criminal justice to assisting in international issues. I wanted to work with HRDAG because it would enable me to see the different ways data science can be applied to social good projects.
My fellowship started with me learning about the importance of setting up my computer. I had never used vim before, and this led me to learn about dotfiles, bash, and terminal types. I learned that it’s important to consider color schemes for your prompt. This may seem unimportant to a newbie, but it makes a big difference, particularly when you’re trying to find an error in your code. If the colors are too distracting, or your prompt doesn’t show your current location when you’re deep in a directory tree, you lose help the computer could be giving you. These little things make it easier to focus on programming.
Next I learned about the HRDAG workflow, which is nothing short of amazing. It’s so organized that you can go into any project and easily see what is happening because they are all set up in the same way. After I understood the workflow, Patrick encouraged me to learn R. I’d tried learning R before, but in the past I felt it was best to stay focused on Python. Patrick brought up some great points about the power of R and instances when it is ideal to use R instead of Python. He described the benefits of Jupyter Notebook as a prototyping tool, as well as its shortcomings as a production tool. And we discussed the many advantages of R Markdown.
It’s definitely an uphill battle. I’m still learning R, but I have come to understand why it’s important and why people should learn both. It helped that Patrick was working on making an R package and he asked me and Tarak to help. I got to see the process of making an R package up close. I learned the importance of testing, using tests as a way to lead you to writing better code.
The project I worked on while at HRDAG was helping our partners at Million Dollar Hoods to clean data. From this project, I learned the importance of considering the implications of apparently small decisions when cleaning datasets. For example, if we need to collapse a series of records into one, but the fields are insufficient to be precise, should we prefer to collapse them too much or not quite enough?
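To make that trade-off concrete, here is a minimal sketch in Python. It is not the Million Dollar Hoods data or HRDAG's actual method: the records, the field names (`name`, `dob`, `arrest_date`), and both matching rules are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records; every name, field, and value here is made up.
records = [
    {"name": "Ana Diaz", "dob": "1990-01-05", "arrest_date": "2015-03-01"},
    {"name": "ana diaz", "dob": "1990-01-05", "arrest_date": "2015-03-01"},
    {"name": "Ana Diaz", "dob": None, "arrest_date": "2015-03-01"},
    {"name": "Ana Diaz", "dob": "1985-07-19", "arrest_date": "2015-03-01"},
]


def collapse(rows, key_fn):
    """Group rows that share the same key; each group becomes one record."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(row)
    return list(groups.values())


def strict_key(row):
    # Every field must match exactly, so a casing difference or a
    # missing date of birth keeps rows apart (collapses too little).
    return (row["name"], row["dob"], row["arrest_date"])


def loose_key(row):
    # Ignores date of birth and casing, so rows with conflicting
    # dates of birth get merged (collapses too much).
    return (row["name"].lower(), row["arrest_date"])


strict_groups = collapse(records, strict_key)
loose_groups = collapse(records, loose_key)

print(len(strict_groups), "groups with the strict rule")
print(len(loose_groups), "groups with the loose rule")
```

With these made-up rows, the strict rule keeps the first two records apart even though they differ only in capitalization, while the loose rule merges a record with a different date of birth into the same group. Neither is automatically right; which error is more acceptable depends on how the cleaned data will be used.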
I spent a significant amount of time debugging throughout the project. Erin helped with debugging and learning to read the error messages. I started keeping notes about the different errors I was finding, so that when I had the error again I could remember how to fix it. This came in handy several times!
During the fellowship, I improved my GitHub practices thanks to Tarak, who showed me the importance of breaking a project down into small tasks so that you push your work to GitHub regularly. Since I had previously only used GitHub for personal projects, I had never thought much about my commit messages or how often I was pushing. He showed me his method for creating issues and branches to ensure that my changes stay organized. It has taken a while to get the GitHub workflow down. But at one point, I was fixing a bug that required us to look back through the git history to find the root of the problem. That helped me to see the importance of writing commit messages that explain why a change was made, rather than just describing the change itself.
I’m grateful for the chance to learn more about data science and the inner workings of a project, and to learn from my colleagues at HRDAG. The data science field is moving fast and there’s a ton to know, so I will always be learning.