CIIDH Data – Dictionary

Version date: 2000.01.29
Current version: ATV20.1
Patrick Ball & Herbert F. Spirer

The unit of analysis for each record in this structure is VIOLATION.

Each violation was of a particular type, happened at a particular time and place, and was committed by zero, one, or several organizational perpetrators. The violation was committed against zero or one named (individually identified) victim, and zero or more anonymous (unidentified) additional victims. The violation was reported one or more times in one, two, or three source types.

To count the number of times individuals suffered particular violations, sum either the variable c_nmd (to count NAMED individuals only) or c_tot (to count all individuals, named and anonymous). In Stata, this can be done with frequency weights; other statistics programs have similar features. To repeat: the number of records is not the same as the number of violations suffered by individuals, because a single record can cover several victims.
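For users working outside Stata, the weighting idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the three records are made-up toy values, and the only things taken from this dictionary are the variable names (n_type, c_nmd, c_tot) and the n_type codes shown in the Stata output under Note 1.

```python
from collections import Counter

# Made-up toy records, NOT real CIIDH data. Keys follow the variable names
# in this dictionary; 24 = Ds and 26 = Mu as in the Stata output in Note 1.
records = [
    {"n_type": 26, "c_nmd": 1, "c_tot": 1},   # one named victim killed
    {"n_type": 26, "c_nmd": 0, "c_tot": 12},  # twelve anonymous victims killed
    {"n_type": 24, "c_nmd": 1, "c_tot": 3},   # one named and two anonymous disappeared
]


def weighted_counts(rows, weight):
    """Sum the chosen weight variable (c_nmd or c_tot) within each n_type."""
    totals = Counter()
    for row in rows:
        totals[row["n_type"]] += row[weight]
    return dict(totals)


# Unweighted: how many records carry each violation type.
record_counts = dict(Counter(row["n_type"] for row in records))
# Weighted: how many named victims, and how many victims in total.
named_counts = weighted_counts(records, "c_nmd")
total_counts = weighted_counts(records, "c_tot")

print("records:", record_counts)
print("named victims:", named_counts)
print("all victims:", total_counts)
```

In this toy example two records of type 26 stand for thirteen victims, which is why record frequencies and weighted frequencies must not be confused.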

The dataset is available in several formats: Stata version 6 (recommended), delimited ASCII (csv), dBase III (dbf), SPSS portable file, and SPSS for Windows. Note that for the Stata and SPSS (Windows and portable) versions of the dataset, the variable labels and value labels are already applied to the data. However, for the ASCII and dbf versions, you will have to handle the labeling on your own. Note that there are 17,423 records in this dataset, which is too large to be imported into most spreadsheets.

The categorical variables are coded as integers. Although this is convenient for statistical packages, it can be difficult for human beings to interpret data coded in this way. The value labels for the integer codes are here. The value label list includes the number of times each category appears in the data. Note: these are frequencies of records, not of violations. To count violations, you must use the weights in c_tot and c_nmd.
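Users of the ASCII or dbf versions have to attach labels themselves. The sketch below shows one way to do that in Python for n_type, using the code-to-label pairs that appear in the Stata tabulation under Note 1 (23 DM through 28 To). It assumes the delimited file has a header row whose column names match the variable names in this dictionary, which is not confirmed here; the sample rows, including the deliberately unknown code 99, are made up.

```python
import csv
import io

# n_type codes and labels as they appear in the Stata tabulation in Note 1.
N_TYPE_LABELS = {
    23: "DM (disappeared, later found killed)",
    24: "Ds (disappeared)",
    25: "Hr (injured)",
    26: "Mu (killed)",
    27: "Se (kidnapped)",
    28: "To (tortured)",
}

# Stand-in for the delimited ASCII file. Rows are made up, and the header
# row is an assumption about the file layout.
sample_csv = io.StringIO("n_type,c_nmd,c_tot\n26,1,1\n24,0,4\n99,1,1\n")


def label_rows(handle, labels):
    """Read delimited rows and add a human-readable n_type_label column."""
    rows = []
    for row in csv.DictReader(handle):
        code = int(row["n_type"])
        # Keep unknown codes visible instead of silently dropping them.
        row["n_type_label"] = labels.get(code, f"unlabeled ({code})")
        rows.append(row)
    return rows


labeled = label_rows(sample_csv, N_TYPE_LABELS)
for row in labeled:
    print(row["n_type"], "->", row["n_type_label"])
```

The same pattern would apply to the other labeled variables once their code lists are taken from the value label list.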

Variable list

Victim variables
Variable name	Variable type	Value labels	Variable label
v_num	str9		Victim ID
v_sur1	str13		Victim First surname
v_sur2	str15		Victim Second surname
v_nam1	str13		Victim First names
v_age	byte		Victim Age
v_dob	int		Victim date of birth
v_p94	long		Population of v_must (1994 census)
v_occ	byte		Victim Occupation
v_ind	byte	Yes	Victim Ethnic category
v_sex	byte	Yes	Victim Sex
v_eth	byte	Yes	Victim Maternal language (proxy for eth.)
v_must	int	Yes	Victim Municipio of birth

Violation variables
Variable name	Variable type	Value labels	Variable label
n_grp	int		Number in group (killings and disappearances)
n_ovkl	byte		Whether the killing was “overkill” (see Note 2 below)
n_mon	byte		Month of violation
n_year	int		Year of violation
n_dtcd	byte	Yes	Date precision (violation)
n_rgim	byte	Yes	Regime code (for date of violation)
n_p94	long		Population of n_must (1994 census)
n_type	byte	Yes	Type of violation (note 1, below)
n_ur	byte	Yes	Violation location: Rural or urban
n_must	int	Yes	Municipio of the violation
n_dpst	int	Yes	Departamento of the violation

Perpetrator variables
Variable name	Variable type	Value labels	Variable label
p_civ	byte		1=participation of civilians
p_arm	byte		1=participation of army
p_pac	byte		1=participation of PACs
p_pol	byte		1=participation of police
p_par	byte		1=participation of paramilitary groups
p_urn	byte		1=participation of URNG
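Because a violation can involve several perpetrator groups at once, the six flags are not mutually exclusive. The Python sketch below illustrates the consequence when counting victims by perpetrator: the per-group totals can add up to more than the number of victims. The records are made-up toy values, and coding non-participation as 0 is an assumption, since the table above documents only the value 1.

```python
PERPETRATOR_FLAGS = ["p_civ", "p_arm", "p_pac", "p_pol", "p_par", "p_urn"]


def make_record(c_tot, **flags):
    """Build a toy record with every perpetrator flag defaulting to 0 (assumed coding)."""
    row = {flag: 0 for flag in PERPETRATOR_FLAGS}
    row.update(flags)
    row["c_tot"] = c_tot
    return row


# Made-up toy records, NOT real CIIDH data.
records = [
    make_record(5, p_arm=1, p_pac=1),  # army and PACs acting together
    make_record(2, p_arm=1),           # army alone
    make_record(1),                    # no organizational perpetrator recorded
]


def victims_by_perpetrator(rows):
    """Sum c_tot for each flag; records with no flag set are tallied separately."""
    totals = {flag: 0 for flag in PERPETRATOR_FLAGS}
    unattributed = 0
    for row in rows:
        flagged = [flag for flag in PERPETRATOR_FLAGS if row[flag] == 1]
        if not flagged:
            unattributed += row["c_tot"]
        for flag in flagged:
            totals[flag] += row["c_tot"]
    return totals, unattributed


totals, unattributed = victims_by_perpetrator(records)
all_victims = sum(row["c_tot"] for row in records)
print(totals, unattributed, all_victims)
```

Here the army and PAC totals sum to 12 although only 8 victims exist in the toy data, because the five victims of the joint violation are counted under both groups.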

Reporting variables
Variable name	Variable type	Value labels	Variable label
r_per	byte		Number of times this violation was reported in the press
r_doc	byte		Number of times this violation was reported in documentary sources
r_ent	byte		Number of times this violation was reported in interviews with witnesses
r_date	int		If r_per>0, r_date is the date of the first press report of the violation (in the ASCII version, this is formatted as mm/dd/yyyy)
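For the ASCII version, the mm/dd/yyyy text in r_date has to be converted to a date before any time-based analysis. A minimal Python sketch follows. The function name parse_r_date is hypothetical, the example date is made up, and treating an empty string as "no date" is an assumption about how missing values appear in the delimited file.

```python
from datetime import datetime


def parse_r_date(r_per, r_date_text):
    """Return the first press-report date, or None when there is no press report.

    r_date is only meaningful when r_per > 0, so the date text is ignored
    otherwise. An empty string is assumed to mean "missing".
    """
    text = r_date_text.strip()
    if r_per <= 0 or not text:
        return None
    # The ASCII version formats the date as mm/dd/yyyy.
    return datetime.strptime(text, "%m/%d/%Y").date()


first_report = parse_r_date(2, "03/15/1982")  # made-up example value
no_report = parse_r_date(0, "")

print(first_report, no_report)
```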

Case (multiplier) variables
Variable name	Variable type	Value labels	Variable label
c_nmd	byte		1=this violation includes a named victim
c_tot	int		The total number of victims (named and anonymous) who suffered this violation

Note 1: the violation type codes are the following:

Category	Meaning	Record count
DM	Disappeared, later found killed	218
Ds	Disappeared	1546
Hr	Injured (in Army attack)	411
Mu	Killed	11862
Se	Kidnapped	2903
To	Tortured	483
Total		17423

The key point of Note 1 is how to combine categories. To count disappeared people, sum c_nmd or c_tot over Ds + DM; to count killed people, sum over DM + Mu; to count people who were killed or disappeared, sum over Ds + DM + Mu. DM is a compound category: it covers people who were disappeared and whose bodies were later found. In Stata, you could create a new variable identifying people who were killed or disappeared with the following commands. Note the difference between the record counts in the table above and the frequency counts using c_tot in the examples below.

/* this creates a variable with the value and the label in one field */
. vallab n_type, g(sn_type)

/* now show the tabulation, counting anonymous victims */
. ta sn_type [fw=c_tot]

    Type of |
  violation |      Freq.     Percent        Cum.
------------+-----------------------------------
      23 DM |        272        0.63        0.63
      24 Ds |       2759        6.41        7.04
      25 Hr |       1085        2.52        9.56
      26 Mu |      34210       79.43       88.99
      27 Se |       3466        8.05       97.03
      28 To |       1278        2.97      100.00
------------+-----------------------------------
      Total |      43070      100.00

/*
we're interested in violations with n_type = 23, 24, and 26.
The new variable is created below.
*/

. ge killdis=1 if n_type==23 | n_type==24 | n_type==26
(3797 missing values generated)

. replace killdis=0 if killdis==.
(3797 real changes made)

. ta killdis [fw=c_tot]

    killdis |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |       5829       13.53       13.53
          1 |      37241       86.47      100.00
------------+-----------------------------------
      Total |      43070      100.00
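The category arithmetic in Note 1 can be checked without Stata. The Python sketch below copies the c_tot-weighted frequencies from the first tabulation above and recomputes the three combinations. The set names and the helper victims_in are illustrative, not part of the dataset.

```python
# c_tot-weighted frequencies by n_type code, copied from the tabulation above.
FREQ_C_TOT = {23: 272, 24: 2759, 25: 1085, 26: 34210, 27: 3466, 28: 1278}

DISAPPEARED = {23, 24}  # DM + Ds
KILLED = {23, 26}       # DM + Mu
KILLED_OR_DISAPPEARED = DISAPPEARED | KILLED  # Ds + DM + Mu, with DM counted once


def victims_in(codes):
    """Total victims (named and anonymous) across the given n_type codes."""
    return sum(FREQ_C_TOT[code] for code in codes)


disappeared = victims_in(DISAPPEARED)
killed = victims_in(KILLED)
killed_or_disappeared = victims_in(KILLED_OR_DISAPPEARED)
grand_total = sum(FREQ_C_TOT.values())
share = round(100 * killed_or_disappeared / grand_total, 2)

print(disappeared, killed, killed_or_disappeared, grand_total, share)
```

The combined figure matches the killdis tabulation (37241 of 43070, or 86.47 percent). Adding the separate disappeared and killed counts would instead give 37513, because the 272 DM victims belong to both groups.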

Note 2: “overkill” refers to killings carried out by methods beyond what was necessary to cause death, including torturing to death or burning, as well as cases in which bodies were mutilated after death.

Notes on the original data

The original data from which this dataset was generated include 19 tables linked in a relational database collected and systematized by the International Center for Human Rights Research in Guatemala. That full dataset, including narrative summaries, occupies approximately 50 megabytes.

There are many variables that were not included in this output, from antemortem information about victims of forced disappearance (color of pants when last seen, dental or bone conditions), to specific types of torture, to data about the perpetrators (vehicle type, weapon caliber).

It would be very complicated to put most of the excluded variables in the dataset. For example, since each violation may have been committed by various perpetrators, there may be various weapons that were used. If we attempt to put the weapons data into the flat structure we are using for this published data, we will need dozens of fields to represent each perpetrator’s possible weapon.

Most of the variables not included in this dataset are sparse. For example, there is data on the type of weapons used in particular violations for approximately one-third of the violations originally coded. Other variables have non-missing data for only a few dozen records. If researchers have particular questions about variables they would like to have included in future versions of this dataset, we are willing to discuss their needs. If there are sufficient requests for new variables, we may issue a new version of this dataset. A review of the dataset’s full variables is here.

Error checking

We have devoted hundreds of hours to checking the dataset to control for multiple reports of the same incidents. Many of the victims in this dataset have the same names and may appear to be the same person. We have reviewed every pair of victims with the same or similar names against the narrative information that was stored with the original data. The narrative information includes portions of the original testimony, quotations of original newspaper or documentary accounts, and the coders’ commentary on what they found in the source materials; this narrative information cannot be published because it includes too much data on the original witnesses to be securely released. Whenever victims appeared to be the same person, based on an overall analysis of the names, places and dates of birth, types, times and places of the violations, and qualitative data in the narrative, we combined the records. Note that we did not delete the original records; instead we created meta-records that linked all the data pertaining to the same person. This way we are able to report the r_* series variables, analyzing how frequently some violations are reported relative to other violations.
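One simple way to use the r_* variables is to tally how many records were reported in each source type, and in how many source types at once. The Python sketch below illustrates this on made-up toy records. It is not the authors' analysis, and it counts records rather than victims (weighting by c_tot or c_nmd would be needed for victim counts).

```python
# Maps each reporting variable to a short label for its source type.
SOURCE_VARS = {"r_per": "press", "r_doc": "documentary", "r_ent": "interviews"}

# Made-up toy records, NOT real CIIDH data.
records = [
    {"r_per": 2, "r_doc": 0, "r_ent": 1},
    {"r_per": 0, "r_doc": 0, "r_ent": 1},
    {"r_per": 0, "r_doc": 1, "r_ent": 0},
    {"r_per": 1, "r_doc": 1, "r_ent": 3},
]


def source_coverage(rows):
    """Count records per source type and per number of distinct source types."""
    by_source = {label: 0 for label in SOURCE_VARS.values()}
    by_n_sources = {1: 0, 2: 0, 3: 0}
    for row in rows:
        present = [label for var, label in SOURCE_VARS.items() if row[var] > 0]
        for label in present:
            by_source[label] += 1
        if present:
            by_n_sources[len(present)] += 1
    return by_source, by_n_sources


by_source, by_n_sources = source_coverage(records)
print(by_source)
print(by_n_sources)
```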

Our work has been used by truth commissions, international criminal tribunals, and non-governmental human rights organizations. We have worked with partners on projects on five continents.
