This talk will explore the assumption that nearly every project using data must make: that the data are representative of reality in the world. We will explore how, contrary to this standard assumption, statistical patterns in raw data tend to be quite different from patterns in the world. Statistical patterns in data reflect how the data were collected rather than changes in the real-world phenomena the data purport to represent.
Using analyses of killings in Iraq, homicides committed by police in the US, killings in the conflict in Syria, and homicides in Colombia, we will contrast patterns in raw data with estimates of total patterns of violence, where the estimates correct for heterogeneous underreporting. The talk will show how biases in raw data can be addressed through estimation, and explain why it matters.
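
To make the contrast between raw counts and estimated totals concrete, here is a minimal sketch. The abstract does not name an estimation method, so the two-list capture-recapture (Lincoln-Petersen) estimator is an assumption chosen for illustration, and all counts are hypothetical, not figures from any of the cases above. The function name `lincoln_petersen` and the `periods` data are likewise invented for this example.

```python
def lincoln_petersen(n_a: int, n_b: int, n_both: int) -> float:
    """Two-list capture-recapture estimate of the total number of cases.

    Assumes the two lists are independent and that every case is equally
    likely to be recorded, which real data rarely satisfy.
    """
    if n_both <= 0:
        raise ValueError("the lists must overlap to estimate a total")
    return n_a * n_b / n_both


# Hypothetical counts per period: (on list A, on list B, on both lists).
periods = {
    "period 1": (100, 80, 40),
    "period 2": (60, 50, 10),
}

results = {}
for name, (n_a, n_b, n_both) in periods.items():
    observed = n_a + n_b - n_both  # unique cases seen on either list
    estimated = lincoln_petersen(n_a, n_b, n_both)
    results[name] = (observed, estimated)
    print(f"{name}: observed={observed}, estimated total={estimated:.0f}")
```

In this made-up example the observed count falls from 140 to 100 while the estimated total rises from 200 to 300, because a smaller share of cases was recorded in the second period. Correcting for heterogeneous underreporting, where some victims are more likely to be recorded than others, generally calls for more than two lists and richer models than this sketch.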