CIIDH Data – Dictionary
Version date: 2000.01.29
Current version: ATV20.1
Patrick Ball & Herbert F. Spirer
The unit of analysis for each record in this structure is VIOLATION.
Each violation was of a particular type, happened at a particular time and place, and was committed by zero, one, or several organizational perpetrators. The violation was committed against zero or one named (individually identified) victim, and zero or more anonymous (unidentified) additional victims. The violation was reported one or more times in one, two, or three source types.
Note that to count the number of times individuals suffered particular violations, users should sum either the variable c_nmd (to count the number of NAMED individuals) or c_tot (to count the total number of individuals, named and anonymous). In Stata, this can be accomplished by using frequency weights. Other statistics programs have similar features. To repeat: the number of records is not the same as the number of violations.
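For users reading the delimited ASCII version outside Stata, the same weighting can be sketched in Python. This is a minimal illustration, not part of the distributed files: the sample rows are invented, and only the column names n_type, c_nmd, and c_tot come from the variable list. It also assumes the multiplier columns are never blank.

```python
import csv
import io

# Hypothetical rows in the layout of the delimited ASCII (csv) version.
# Only the column names are taken from the data dictionary.
SAMPLE_CSV = """n_type,c_nmd,c_tot
26,1,1
26,0,12
24,1,3
"""


def count_victims(csv_file, weight="c_tot"):
    """Sum a multiplier column instead of counting records."""
    reader = csv.DictReader(csv_file)
    # Assumption: the multiplier column holds an integer on every row.
    return sum(int(row[weight]) for row in reader)


records = len(SAMPLE_CSV.strip().splitlines()) - 1  # exclude the header line
total = count_victims(io.StringIO(SAMPLE_CSV), "c_tot")
named = count_victims(io.StringIO(SAMPLE_CSV), "c_nmd")
print(records, total, named)
```

In this invented sample, three records stand for 16 victims in total, 2 of them named, which is why record counts and weighted counts differ.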
The dataset is available in several formats: Stata version 6 (recommended), delimited ASCII (csv), dBase III (dbf), SPSS portable file, and SPSS for Windows. Note that for the Stata and SPSS (Windows and portable) versions of the dataset, the variable labels and value labels are already applied to the data. However, for the ASCII and dbf versions, you will have to handle the labeling on your own. Note that there are 17,423 records in this dataset, which is too large to be imported into most spreadsheets.
The categorical variables are coded as integers. Although this is convenient for statistical packages, it can be difficult for human beings to interpret data coded in this way. The value labels for the integer codes are here. The value label list includes the number of times each category appears in the data. Note: these are frequencies of records, not of violations. To count violations, you must use the weights in c_tot and c_nmd.
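For the ASCII and dbf versions, labels can be attached with a simple lookup table. The sketch below covers only n_type, using the integer codes that appear in the Stata tabulation under Note 1 (23 through 28). The helper name label_n_type is hypothetical, and it assumes the csv file stores these same integers as text.

```python
# Integer codes and categories for n_type, as shown in Note 1.
N_TYPE_LABELS = {
    23: "DM (Disappeared, later found killed)",
    24: "Ds (Disappeared)",
    25: "Hr (Injured (in Army attack))",
    26: "Mu (Killed)",
    27: "Se (Kidnapped)",
    28: "To (Tortured)",
}


def label_n_type(code):
    """Return a readable label for an n_type code read from the csv file."""
    code = int(code)  # csv readers return text, so convert first
    return N_TYPE_LABELS.get(code, f"unlabeled code {code}")


print(label_n_type("26"))
print(label_n_type(99))
```

The same pattern would apply to the other labeled variables once their codes are copied from the value label list.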
Variable list
Victim variables | | | |
Variable name | Variable type | Value labels | Variable label |
v_num | str9 | | Victim ID |
v_sur1 | str13 | | Victim First surname |
v_sur2 | str15 | | Victim Second surname |
v_nam1 | str13 | | Victim First names |
v_age | byte | | Victim Age |
v_dob | int | | Victim date of birth |
v_p94 | long | | Population of v_must (1994 census) |
v_occ | byte | | Victim Occupation |
v_ind | byte | Yes | Victim Ethnic category |
v_sex | byte | Yes | Victim Sex |
v_eth | byte | Yes | Victim Maternal language (proxy for eth.) |
v_must | int | Yes | Victim Municipio of birth |
Violation variables | | | |
Variable name | Variable type | Value labels | Variable label |
n_grp | int | | Number in group (killings and disappearances) |
n_ovkl | byte | | Whether the killing was “overkill” (see Note 2 below) |
n_mon | byte | | Month of violation |
n_year | int | | Year of violation |
n_dtcd | byte | Yes | Date precision (violation) |
n_rgim | byte | Yes | Regime code (for date of violation) |
n_p94 | long | | Population of m_mucd (1994 census) |
n_type | byte | Yes | Type of violation (note 1, below) |
n_ur | byte | Yes | Violation location: Rural or urban |
n_must | int | Yes | Municipio of the violation |
n_dpst | int | Yes | Departamento of the violation |
Perpetrator variables | | | |
Variable name | Variable type | Value labels | Variable label |
p_civ | byte | | 1=participation of civilians |
p_arm | byte | | 1=participation of army |
p_pac | byte | | 1=participation of PACs |
p_pol | byte | | 1=participation of police |
p_par | byte | | 1=participation of paramilitary groups |
p_urn | byte | | 1=participation of URNG |
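Because a violation may have zero, one, or several organizational perpetrators, the six flags are not mutually exclusive, and weighted counts by perpetrator can add up to more than the total number of victims. The following Python sketch illustrates this with invented rows. The function name is hypothetical, and it assumes that a flag which is blank or absent means no recorded participation.

```python
PERP_FLAGS = ["p_civ", "p_arm", "p_pac", "p_pol", "p_par", "p_urn"]


def victims_by_perpetrator(rows):
    """Sum c_tot for each perpetrator flag; a record may count under several."""
    totals = {flag: 0 for flag in PERP_FLAGS}
    unattributed = 0
    for row in rows:
        weight = int(row["c_tot"])
        # Assumption: a blank or absent flag means no recorded participation.
        flagged = [f for f in PERP_FLAGS if int(row.get(f, 0) or 0) == 1]
        for flag in flagged:
            totals[flag] += weight
        if not flagged:
            unattributed += weight
    return totals, unattributed


# Hypothetical records, not taken from the dataset.
rows = [
    {"p_arm": 1, "p_pac": 1, "c_tot": 10},  # joint army and PAC participation
    {"p_pol": "1", "c_tot": "2"},           # text values, as read from a csv file
    {"c_tot": 3},                           # no perpetrator identified
]
totals, unattributed = victims_by_perpetrator(rows)
print(totals, unattributed)
```

Here the 10 victims of the first record are counted under both p_arm and p_pac, so the per-flag totals sum to 22 although the three records cover 15 victims.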
Reporting variables | | | |
Variable name | Variable type | Value labels | Variable label |
r_per | byte | | Number of times this violation was reported in the press |
r_doc | byte | | Number of times this violation was reported in documentary sources |
r_ent | byte | | Number of times this violation was reported in interviews with witnesses |
r_date | int | | If r_per>0, r_date is the date of the first press report of the violation (in the ASCII version, this is formatted as mm/dd/yyyy) |
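When working from the ASCII version, r_date has to be parsed from its mm/dd/yyyy text form, and it is only meaningful when r_per is greater than zero. A minimal Python sketch follows. Both function names and the sample record are hypothetical, and it assumes r_date is blank when there was no press report.

```python
from datetime import date, datetime


def first_press_report(row):
    """Return the date of the first press report, or None if there was none."""
    # Assumption: r_date is blank whenever r_per is 0.
    if int(row["r_per"]) <= 0 or not row["r_date"].strip():
        return None
    return datetime.strptime(row["r_date"].strip(), "%m/%d/%Y").date()


def source_type_count(row):
    """Count how many of the three source types reported the violation."""
    return sum(int(row[name]) > 0 for name in ("r_per", "r_doc", "r_ent"))


# Hypothetical record, with values as text as they would arrive from a csv file.
sample = {"r_per": "2", "r_doc": "0", "r_ent": "1", "r_date": "03/15/1982"}
print(first_press_report(sample), source_type_count(sample))
```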
Case (multiplier) variables | | | |
Variable name | Variable type | Value labels | Variable label |
c_nmd | byte | | 1=this violation includes a named victim |
c_tot | int | | The total number of victims (named and anonymous) who suffered this violation |
Note 1: the violation type codes are the following:
Category | Meaning | Record count |
DM | Disappeared, later found killed | 218 |
Ds | Disappeared | 1546 |
Hr | Injured (in Army attack) | 411 |
Mu | Killed | 11862 |
Se | Kidnapped | 2903 |
To | Tortured | 483 |
Total | | 17423 |
The key point of Note 1 is how to combine categories. To count disappeared people, sum c_nmd or c_tot over Ds + DM; to count killed people, sum over DM + Mu; to count people who were killed or disappeared, sum over Ds + DM + Mu. DM is a compound category: it covers people who were disappeared and whose bodies were later found. In Stata, you could create a new variable identifying people who were killed or disappeared with the following commands. (Note the difference between the record counts in the table above and the frequency counts weighted by c_tot in the examples below.)
    /* this creates a variable with the value and the label in one field */
    . vallab n_type, g(sn_type)

    /* now show the tabulation, counting anonymous victims */
    . ta sn_type [fw=c_tot]

        Type of |
      violation |      Freq.     Percent        Cum.
    ------------+-----------------------------------
          23 DM |        272        0.63        0.63
          24 Ds |       2759        6.41        7.04
          25 Hr |       1085        2.52        9.56
          26 Mu |      34210       79.43       88.99
          27 Se |       3466        8.05       97.03
          28 To |       1278        2.97      100.00
    ------------+-----------------------------------
          Total |      43070      100.00

    /* we're interested in violations with n_type = 23, 24, and 26.
       The new variable is created below. */
    . ge killdis=1 if n_type==23 | n_type==24 | n_type==26
    (3797 missing values generated)

    . replace killdis=0 if killdis==.
    (3797 real changes made)

    . ta killdis [fw=c_tot]

        killdis |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |       5829       13.53       13.53
              1 |      37241       86.47      100.00
    ------------+-----------------------------------
          Total |      43070      100.00
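For readers without Stata, the killdis recode can be sketched in Python. The function name is hypothetical. Since the individual records are not reproduced here, the demonstration feeds in one aggregated row per violation type, using the c_tot-weighted frequencies from the tabulation above as a stand-in for the real records.

```python
# n_type codes for DM, Ds, and Mu, as used in the Stata commands above.
KILLED_OR_DISAPPEARED = {23, 24, 26}


def tabulate_killdis(rows):
    """Weighted tabulation of killdis from (n_type, c_tot) pairs."""
    counts = {0: 0, 1: 0}
    for n_type, c_tot in rows:
        killdis = 1 if n_type in KILLED_OR_DISAPPEARED else 0
        counts[killdis] += c_tot
    return counts


# One aggregated row per type, taken from the weighted Stata tabulation;
# with the real file there would be one pair per record instead.
AGGREGATED = [(23, 272), (24, 2759), (25, 1085),
              (26, 34210), (27, 3466), (28, 1278)]

counts = tabulate_killdis(AGGREGATED)
total = sum(counts.values())
for value in (0, 1):
    print(value, counts[value], f"{100 * counts[value] / total:.2f}")
```

Run on these aggregated rows, the sketch gives 5829 and 37241 out of 43070, matching the Stata tabulation of killdis.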
Note 2: “overkill” refers to killings carried out by methods beyond what was necessary to cause death, including torturing to death or burning, as well as cases in which bodies were mutilated after death.
Notes on the original data
The original data from which this dataset was generated include 19 tables linked in a relational database collected and systematized by the International Center for Human Rights Research in Guatemala. That full dataset, including narrative summaries, occupies approximately 50 megabytes.
There are many variables that were not included in this output, from antemortem information about victims of forced disappearance (color of pants when last seen, dental or bone conditions), to specific types of torture, to data about the perpetrators (vehicle type, weapon caliber).
It would be very complicated to include most of the excluded variables in this dataset. For example, since each violation may have been committed by several perpetrators, several weapons may have been used. If we attempted to put the weapons data into the flat structure used for this published data, we would need dozens of fields to represent each perpetrator’s possible weapon.
Most of the variables not included in this dataset are sparse. For example, data on the type of weapons used are available for approximately one-third of the violations originally coded. Other variables have non-missing data for only a few dozen records. If researchers have particular questions about variables they would like to have included in future versions of this dataset, we are willing to discuss their needs. If there are sufficient requests for new variables, we may issue a new version of this dataset. A review of the dataset’s full variables is here.
Error checking
We have devoted hundreds of hours to checking the dataset to control for multiple reports of the same incidents. Many of the victims in this dataset have the same names and may appear to be the same person. We have reviewed every pair of victims with the same or similar names against the narrative information that was stored with the original data. The narrative information includes portions of the original testimony, quotations of original newspaper or documentary accounts, and the coders’ commentary on what they found in the source materials; this narrative information cannot be published because it includes too much data on the original witnesses to be securely released. Whenever victims appeared to be the same person, based on an overall analysis of the names, places and dates of birth, types, times and places of the violations, and qualitative data in the narrative, we combined the records. Note that we did not delete the original records; instead we created meta-records that linked all the data pertaining to the same person. This way we are able to report the r_* series variables, analyzing how frequently some violations are reported relative to other violations.
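The first step of such a review, listing pairs of victims with the same or similar names for a human to check, could be sketched as follows. This is not the procedure the authors used, which relied on manual comparison against the narrative information. The similarity measure (difflib from the Python standard library), the 0.85 threshold, the function names, and the sample names and IDs are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_key(record):
    """Join the name fields into one lowercase string for comparison."""
    parts = (record["v_sur1"], record["v_sur2"], record["v_nam1"])
    return " ".join(p.strip().lower() for p in parts)


def candidate_pairs(records, threshold=0.85):
    """Return pairs of victim IDs whose names are the same or similar.

    The pairs are only candidates: deciding whether two records describe
    the same person still requires reviewing dates, places, and narratives.
    """
    pairs = []
    for a, b in combinations(records, 2):
        ratio = SequenceMatcher(None, name_key(a), name_key(b)).ratio()
        if ratio >= threshold:
            pairs.append((a["v_num"], b["v_num"]))
    return pairs


# Invented names and IDs, for illustration only.
records = [
    {"v_num": "V001", "v_sur1": "LOPEZ", "v_sur2": "GARCIA", "v_nam1": "JUAN"},
    {"v_num": "V002", "v_sur1": "LOPES", "v_sur2": "GARCIA", "v_nam1": "JUAN"},
    {"v_num": "V003", "v_sur1": "PEREZ", "v_sur2": "MORALES", "v_nam1": "ANA"},
]
pairs = candidate_pairs(records)
print(pairs)
```

Comparing every pair grows quadratically with the number of records, so a real review of this kind would likely restrict comparisons to smaller groups first, for example by municipio or year.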