Datasets available for research
Over the last few years, we’ve tried to make the data organized in our projects publicly accessible, and we have encouraged our partners to publish the data at the completion of each project. We continue to believe it is important to offer access to the data used in our projects, both for the sake of transparency and to encourage further research and analysis. However, we are increasingly concerned about how raw data are used. Data limited to what we are able to observe make up what statisticians call a convenience sample, which is subject to selection bias.
We’re keeping these datasets available for researchers who want to use them for simulation or estimation projects that use statistical models to correct for biases. We strongly caution researchers not to use the raw data as measures of violence. (Note: several of the Kosovo datasets are estimates and can be used as measures; read the data dictionaries carefully.) Please contact us at info @ hrdag.org if you have any questions or need assistance.
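To illustrate why raw convenience sample counts should not be read as measures of violence, here is a minimal simulation sketch in Python. Everything in it is hypothetical: the period names, the event totals, and the coverage rates are assumptions chosen for illustration and do not come from any of the datasets on this page.

```python
import random


def observe(true_events, coverage, rng):
    """Return how many of the true events end up documented.

    Each event is documented independently with probability `coverage`.
    """
    return sum(1 for _ in range(true_events) if rng.random() < coverage)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical scenario: the same number of events occurs in both periods,
# but documentation reaches far fewer of them in period B.
true_events = {"period_A": 1000, "period_B": 1000}
coverage = {"period_A": 0.8, "period_B": 0.2}  # assumed values, not real ones

observed = {
    period: observe(count, coverage[period], rng)
    for period, count in true_events.items()
}

for period in true_events:
    print(period, "true:", true_events[period], "observed:", observed[period])
```

In this sketch the raw observed counts suggest a sharp drop between the two periods even though the true totals are identical. The apparent trend comes entirely from the difference in documentation coverage. With real data the coverage rates are unknown, which is why we recommend these datasets only for projects that use statistical models to estimate and correct for such biases.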
You are welcome to use these datasets for your research. If you publish with them, however, we ask that you include the following text:
“These are convenience sample data, and as such they are not a statistically representative sample of events in the referenced conflict. These data do not support conclusions about patterns, trends, or other substantive comparisons (such as over time, space, ethnicity, age, etc.).”
We would also very much appreciate a citation.
Option 1: Access via DWeb
The DWeb (decentralized web) is a decentralized network of servers, in contrast to the central server described under Option 2: Access via Central Server. We’re learning how to place files using IPFS (the InterPlanetary File System), whether it is slower or faster than centralized storage platforms, and whether it is more resilient to various kinds of failure: storage failure, retrieval failure, or the suppression of our site (e.g., via censorship). We are aware that these are threats many of our partners face.
Relatedly, we want to understand whether people in our community value this approach. Is it helpful to access data in this manner? Does it give you more confidence that the data will remain available and secure over long periods of time? We invite you to tell us.
CEV-JEP-HRDAG Joint Project Data
These datasets are from the JEP-CEV-HRDAG Joint Project described here. Two anonymized versions of the databases of victims of disappearance, forced recruitment, homicide, and kidnapping have been published.
The first version (v1) of the data corresponds to the original version of the data used for the analyses in the Methodological Report of the Joint Project. These data are useful for replicating the analyses in the Methodological Report.
The second version (v2) of the data corrects errors in which some individuals were erroneously included as direct victims. The v2 data are appropriate for researchers wishing to design their own analyses of the conflict. This page provides more information about the two versions of the data.
The verdata package for the R statistical programming language was created to facilitate the analysis of these datasets.
Download data to replicate analyses conducted in the Methodological Report (v1 of the data)
- Disappearance [csv] [parquet]
- Forced recruitment [csv] [parquet]
- Homicide [csv] [parquet]
- Kidnapping [csv] [parquet]
Download data to design your own analyses of the armed conflict in Colombia (v2 of the data)
- Disappearance [csv] [parquet]
- Forced recruitment [csv] [parquet]
- Homicide [csv] [parquet]
- Kidnapping [csv] [parquet]
Statistical data
- Colombia – Casanare
- Guatemala – CIIDH datasets
- Kosovo – Killings, Migrations, and More
- Liberia – Truth and Reconciliation Commission
- Sierra Leone – TRC Data and Statistical Appendix
Option 2: Access via Central Server
HRDAG maintains these same datasets below, along with additional explanations and resources in Spanish.