Datasets available for research
Over the last few years, we’ve tried to make the data organized in our projects publicly accessible, and we have encouraged our partners to publish their data when a project is complete. We continue to believe it is important to offer access to the data used in our projects, both for the sake of transparency and to encourage further research and analysis. However, we are increasingly concerned about how raw data are used. Data drawn only from what we are able to observe form what statisticians call a convenience sample, which is subject to selection bias: the events that get recorded may differ systematically from those that do not.
We’re keeping these datasets available for researchers who want to use them for simulation or estimation projects that use statistical models to correct for biases. We strongly caution researchers not to use the raw data as measures of violence. (Note: several of the Kosovo datasets are estimates and can be used as measures; read the data dictionaries carefully.) Please contact us at info @ hrdag.org if you have any questions or need assistance.
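To illustrate why raw counts from a convenience sample should not be read as measures, here is a minimal simulation sketch in Python. Everything in it is hypothetical: the region names, the true event counts, and the reporting probabilities are invented for illustration and are not taken from any of the datasets listed on this page. It assumes only that each event is recorded independently with a probability that differs by region.

```python
import random


def simulate_observed(true_counts, report_prob, seed=0):
    """Return the number of events recorded per region.

    Each true event is recorded independently with the region's
    reporting probability (a simplifying, hypothetical assumption).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = {}
    for region, n_events in true_counts.items():
        p = report_prob[region]
        observed[region] = sum(1 for _ in range(n_events) if rng.random() < p)
    return observed


# Hypothetical ground truth: region_b has three times as many events.
true_counts = {"region_a": 1000, "region_b": 3000}

# Hypothetical coverage: region_a is far easier to document than region_b.
report_prob = {"region_a": 0.6, "region_b": 0.1}

observed = simulate_observed(true_counts, report_prob)

for region in true_counts:
    print(region, "true:", true_counts[region], "recorded:", observed[region])
```

In this sketch the recorded counts rank the regions in the opposite order from the true counts, purely because documentation coverage differs. A comparison made on the raw records would therefore point the wrong way, which is the kind of error a statistical model that accounts for coverage is meant to correct.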
You are welcome to use these datasets for your research. If you publish with them, however, we ask that you include the following text:
“These are convenience sample data, and as such they are not a statistically representative sample of events in the referenced conflict. These data do not support conclusions about patterns, trends, or other substantive comparisons (such as over time, space, ethnicity, age, etc.).”
We would also very much appreciate a citation.
Option 1: Access via DWeb
The DWeb (distributed web) is a decentralized network of servers, in contrast to the single central server described under Option 2: Access via Central Server. We’re learning how to place files using IPFS (the InterPlanetary File System), whether it is slower or faster than centralized storage platforms, and whether it is more resilient to various kinds of failure: storage failure, retrieval failure, or the suppression of our site (e.g., via censorship). We are aware that many of our partners face these threats.
We also want to understand whether people in our community value this approach. Is it helpful to access data in this manner? Does it give you more confidence that the data will remain available and secure over long periods of time? We invite you to tell us.
Statistical data
- Colombia – Casanare
- Guatemala – CIIDH datasets
- Kosovo – Killings, Migrations, and More
- Liberia – Truth and Reconciliation Commission
- Sierra Leone – TRC Data and Statistical Appendix
Option 2: Access via Central Server
HRDAG maintains these same datasets below, along with additional explanations and resources in Spanish.