.Rproj Considered Harmful

An interactive programming environment, such as RStudio or Jupyter Notebooks, is an indispensable tool for a data analyst. But code prototyped in such environments, left unedited, can be brittle and difficult to maintain, or can come to depend on some hidden aspect of the context in which it was developed. Our projects tend to run for long periods and involve multiple programmers working in different languages. Furthermore, given the subject matter, we know our calculations and assumptions will be closely scrutinized.

So we aim to produce code that is clear, replicable across machines and operating systems, and leaves an easy-to-follow audit trail that lets us review every step of the data processing pipeline. In this post, we look at several examples of how code prototyped in an interactive environment falls short of those standards, and at how to make such code production-ready.

Read the full post here: .Rproj Considered Harmful

Acknowledgements

This post draws on HRDAG’s collective wisdom, accumulated over decades of managing complex, long-lived, collaborative data analysis projects. It reflects that experience and countless conversations. The project structure discussed in this post, with discrete tasks using Makefiles, was developed by Drs. Scott Weikart and Jeff Klingner and formalized around 2007. I’m particularly grateful to Dr. Patrick Ball, who motivated the post, helped me think through the concepts, and provided valuable feedback.
