Data on Kosovo Killings
The data on killings in Kosovo are in four files. All of the files are comma-delimited ASCII. The fields in each file are described below.
If you use these data on Kosovo killings, please cite them with the citation below and include the following note:
“These are convenience sample data, and as such they are not a statistically representative sample of events in this conflict. These data do not support conclusions about patterns, trends, or other substantive comparisons (such as over time, space, ethnicity, age, etc.).”
Patrick Ball, Wendy Betts, Fritz Scheuren, Jana Dudukovich, and Jana Asher. (2002). AAAS/ABA-CEELI/Human Rights Data Analysis Group database of killings in Kosovo.
The first file, md_pub.csv, contains 4725 records (see below for why there are more records than victims). Appendix 1 of the report describes in full how this file was compiled. We have omitted the victims’ names to protect both the victims’ privacy and the people who gave information to the organizations that collected the data. The file contains records of deaths reported to have occurred between 20 March 1999 and 20 June 1999; reported deaths outside that period were not included in our analysis and so are not included in this file.
Each record represents either one death or a partial death. Partial deaths are records for which the date of death was missing. Quoting from pages 30-31 of the report, “For 204 records with no date information, a hot deck procedure was employed to assign a date at random from a donor record that was geographically closest to the location of the record with the missing date. Three dates were randomly selected from the potential donors, and copies of the original record were created with each of the sampled dates. The new records were each assigned a weight of 0.33.”
The total number of victims is therefore the sum of the “weight” field, which equals 4399.67.
However, not every death with an imputed date is represented by three records sharing the same id. Continuing to quote from page 31, “Some of the hot-decked dates were outside the date range of interest to this study (20 March-22 June). Those records (and their partial weights) were therefore excluded from the analysis.”
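The weighting arithmetic can be sketched in Python as follows. This is a minimal illustration, not our analysis code: it assumes md_pub.csv has a header row using the field names listed below, that repeated ids arise only from the hot-deck copies, and the rows in `SAMPLE` are invented (they are not real records). `weight_value` and `summarize` are hypothetical helper names. With 4725 records, the stated total of 4399.67 works out only if each imputed copy counts as exactly one third of a death, so the sketch treats a stored 0.33 as 1/3; that mapping is our inference, not something the report states.

```python
import csv
import io
from collections import defaultdict

# Invented rows for illustration only (NOT real records). Only the id and
# weight columns of md_pub.csv are shown; a header row is assumed.
SAMPLE = """id,weight
101,1
102,0.33
102,0.33
102,0.33
103,0.33
103,0.33
"""

def weight_value(text: str) -> float:
    """Return one record's weight.

    Assumption (our inference): a stored 0.33 stands for exactly one third
    of a death; any other value, such as 1, is used as written.
    """
    w = float(text)
    return 1 / 3 if abs(w - 1 / 3) < 0.01 else w

def summarize(csv_text: str) -> tuple:
    """Return (total weighted deaths, ids whose records sum to under one death)."""
    per_id = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        per_id[row["id"]] += weight_value(row["weight"])
    total = sum(per_id.values())
    # ids that lost some hot-decked copies to the date-range filter
    incomplete = sorted(i for i, w in per_id.items() if w < 1 - 1e-9)
    return total, incomplete

total, incomplete = summarize(SAMPLE)
print(round(total, 2), incomplete)  # prints: 2.67 ['103']
```

Run over the full md_pub.csv, the same summation would be expected to give the 4399.67 total stated above, provided the one-third assumption holds.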
md_pub.csv contains the following fields:

| Field name | Field description |
|---|---|
| id | The id of this record. Note that ids are not unique. |
| age | The victim’s age at death; 0 denotes an infant and -1 indicates that the age is unknown. |
| sex | M = male, F = female, U = unknown. |
| pcode | The geographic code for the village or town in which the death occurred. See the geographic dataset for more information. |
| mcode | The geographic code for the municipality in which the death occurred. See the geographic dataset for more information. |
| dt_kill | The date of death. |
| dtk2 | The date of death rounded to two-day periods; each period consists of the listed day and the following day. |
| aba | 1 = this death was reported to the ABA (see pages 18-19 of the report). |
| exh | 1 = this death was identified in an exhumation (see pages 19-20 of the report). |
| hrw | 1 = this death was reported to HRW (see pages 20-21 of the report). |
| osce | 1 = this death was reported to the OSCE (see page 21 of the report). |
| weight | 1 = record with a complete date; 0.33 = record with an imputed date (one third of a death). |
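The field codes above can be decoded as in the following Python sketch. It assumes each row arrives as a dict of strings (for example from `csv.DictReader`), that source flags other than 1 are stored as 0 or left blank, and that the dtk2 periods are counted from 20 March 1999 and labeled by their first day. That last assumption agrees with the 11may99 period mentioned for dtk2_oth.csv, but the report does not state it here. dt_kill is taken as an already-parsed date because its text format is not described on this page. `Death`, `decode`, and `two_day_period` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

SEX_CODES = {"M": "male", "F": "female", "U": "unknown"}
SOURCE_FIELDS = ("aba", "exh", "hrw", "osce")

# Assumption: dtk2 periods are counted from 20 March 1999 and labeled by
# their first day.
PERIOD_START = date(1999, 3, 20)

def two_day_period(day: date) -> date:
    """Return the first day of the two-day period containing `day`."""
    offset = (day - PERIOD_START).days
    return PERIOD_START + timedelta(days=offset - offset % 2)

@dataclass
class Death:
    record_id: str
    age: Optional[int]  # None when the file has -1 (unknown); 0 means an infant
    sex: str
    sources: List[str]  # names of the source flags set to 1
    weight: float

def decode(row: dict) -> Death:
    """Decode one md_pub.csv row given as a dict of strings."""
    age = int(float(row["age"]))
    return Death(
        record_id=row["id"],
        age=None if age == -1 else age,
        sex=SEX_CODES.get(row["sex"], "unknown"),
        sources=[f for f in SOURCE_FIELDS if row[f] == "1"],
        weight=float(row["weight"]),
    )

print(two_day_period(date(1999, 5, 12)))  # prints: 1999-05-11
```

Mapping -1 to `None` keeps unknown ages out of any arithmetic on age, while 0 (an infant) remains a valid age.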
The remaining three files contain our estimates, computed from the data in md_pub.csv by the procedures described in Appendix 2 of our report.
Over two-day period
dtk2_oth.csv contains data estimated by two-day period. These data underlie, for example, Figure 2 (page 6) and the first and third columns of the regressions over time presented in Figure 19 (page 58). Note: these data have been corrected as described in the 15 November 2002 corrigendum.
| Field name | Field description |
|---|---|
| modelspec | The model used to estimate total deaths for this period, in standard log-linear notation. See Appendix 2, section 3.5 and following. This value is empty when no model could be estimated for the period (e.g., 11may99). |
| nsum | The total estimated deaths for this two-day period. When modelspec is empty, this value is simply the number of reported deaths. The cell counts from which it was estimated can be computed from the raw data. |
| sd | The estimated standard error of nsum, as described in Appendix 2, page 40, of the report. |
| lvcnt | The estimated total number of people leaving home during this two-day period. See the description of the migration data and Policy or Panic. |
| bomb | The number of NATO airstrikes in this period. See the description of the other data and Section 5, pp. 8-13, of the report. |
| bomblag | The number of reported NATO airstrikes in the previous period (missing for 20mar99). |
| klaB | The number of reported KLA exchanges of fire with Serb authorities. See the description of the other data and pp. 11-12 of the report. |
| klaBlag | The value of klaB in the previous two-day period. |
| klaK | The number of reported Serb casualties caused by interactions with the KLA. See the description of the other data and pp. 11-12 of the report. |
| klaKlag | The value of klaK in the previous two-day period. |
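The lagged fields (bomblag, klaBlag, klaKlag) each hold the value of the corresponding field in the previous two-day period. The Python sketch below shows one way to derive such a lag, assuming periods are keyed by their first day so that the previous period starts two days earlier. `lag_by_period` is a hypothetical helper, and the airstrike counts in the example are invented.

```python
from datetime import date, timedelta
from typing import Dict, Optional

def lag_by_period(series: Dict[date, float]) -> Dict[date, Optional[float]]:
    """Return, for each two-day period, the value from the previous period.

    Periods are keyed by their first day, so the previous period starts two
    days earlier. If that period is absent (as for the first one, 20mar99),
    the lag is None, i.e. missing.
    """
    step = timedelta(days=2)
    return {period: series.get(period - step) for period in series}

# Invented counts, for illustration only.
bomb = {date(1999, 3, 20): 0, date(1999, 3, 22): 12, date(1999, 3, 24): 30}
bomblag = lag_by_period(bomb)
print(bomblag[date(1999, 3, 24)])  # prints: 12
```

Looking up the previous period by date, rather than by row position, leaves the lag missing if a period were ever absent from the series instead of silently borrowing a value from an earlier period.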
Over region and six-day period
rgwk6_oth.csv contains data estimated by region and six-day period. These data underlie, for example, Figure 12 (page 53). Note: these data have been corrected as described in the 15 November 2002 corrigendum.
| Field name | Field description |
|---|---|
| w6 | The six-day period consisting of the listed day and the five following days. |
| gcode | The region: north, south, east, or west. The classification of municipalities into regions is described in Figure 3, page 7, of the report; see also the geographic data. |
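The following Python sketch illustrates the six-day windows and a tally of reported (weighted) deaths by region and window from md_pub.csv. It produces raw reported counts only, not the estimates stored in rgwk6_oth.csv. It assumes the windows are consecutive, start on 20 March 1999, and are labeled by their first day. `region_of` is a hypothetical mapping from mcode to region that would be built from the geographic data, and `six_day_window`, `window_days`, and `tally` are illustrative names.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, List, Tuple

# Assumption: windows are consecutive, non-overlapping, counted from
# 20 March 1999, and labeled by their first (listed) day.
WINDOW_START = date(1999, 3, 20)

def six_day_window(day: date) -> date:
    """Return the listed day (w6) of the six-day window containing `day`."""
    offset = (day - WINDOW_START).days
    return WINDOW_START + timedelta(days=offset - offset % 6)

def window_days(w6: date) -> List[date]:
    """The listed day and the five following days."""
    return [w6 + timedelta(days=i) for i in range(6)]

def tally(
    records: Iterable[Tuple[str, date, float]],
    region_of: Dict[str, str],
) -> Dict[Tuple[str, date], float]:
    """Sum record weights by (region, w6).

    records: (mcode, date of death, weight) triples taken from md_pub.csv.
    region_of: hypothetical mcode -> "north"/"south"/"east"/"west" mapping.
    Returns reported counts only, not the log-linear estimates.
    """
    counts: Dict[Tuple[str, date], float] = defaultdict(float)
    for mcode, day, weight in records:
        counts[(region_of[mcode], six_day_window(day))] += weight
    return dict(counts)
```

For example, `tally([("m1", date(1999, 3, 21), 1.0)], {"m1": "north"})` places that record in the north region's window starting 20 March 1999 (the mcode value here is invented).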
Over region and two-day period
rgdtk2est_oth.csv contains data estimated by region and two-day period. These data underlie, for example, Figures 4-7 (pages 9-10) and the second and fourth columns of the regressions over time presented in Figure 19 (page 58). Note: these data have been corrected as described in the 15 November 2002 corrigendum.
| Field name | Field description |
|---|---|
| nsum | Estimated as described in Appendix 2, Section 3.6, pp. 50-51. |
Last updated: 1 November 2002