3.3 Data processing: classification and coding

What particular kinds of violence does the organization consider important? The data processing step begins with a list of carefully defined variables. The list grows out of the same discussions, held while the interview protocols and questionnaire were being developed, about what the organization wants to learn from the data. For example, what will the organization choose to classify as violations? Is the analysis about detention and torture, or about illegal detentions, detentions without charge (which may be legal), beatings, electric shocks, and asphyxiation? The second list is far more detailed than the first, and it demands more skill from the person using it, who must distinguish between similar violation types. The list of types of violence is called the controlled vocabulary of types of violence. When people read questionnaires and classify the descriptions of violence as one or another type of violence in the controlled vocabulary, they are coding. Using the controlled vocabulary to code an interview, a document, or a piece of physical evidence is called data processing.

Data processing can be very time-consuming, and it can require skilled staff. Even after the organization has defined extensive rules about how interviews are to be conducted, how documents are to be analyzed, and how physical evidence is to be recorded, a great deal more work is required before the information is ready to be entered into a database. Data processing moves information from its partly-structured, partly-coded state in a questionnaire into a fully-structured, fully-coded state in which it can be entered directly into a computer.

There are several controlled vocabularies necessary for human rights work. The most obvious such list is the list of types of violence discussed earlier. Other lists might include regions of the country, organizations in the country's army and security forces, or kinds of vehicles favored by perpetrators. Each list should be composed of mutually exclusive items so that each classification of an act, a place, a military unit, or a vehicle is unambiguous. That is, a good controlled vocabulary list of types of violence might include the following three items: beating with a rifle butt, beating with a truncheon, and other beatings. Any given act of beating can have occurred only in one of the three possible manners described. If a given torture session involved a beating with a rifle butt and a beating with a truncheon, the data processors would code two acts, one for each kind of classified beating. The controlled vocabulary of violations determines how a given incident in the "real world" will appear in the coded data.
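The coding rule just described can be sketched in a few lines of Python. This is only an illustration, not part of any particular system: the names `VIOLENCE_TYPES`, `CodedAct`, and `code_act`, and the interview identifier `INT-001`, are hypothetical. The sketch assumes the data processor has already decided which vocabulary item applies; the code only records that decision and rejects any term outside the list.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary: the three mutually exclusive
# items used as the example in the text.
VIOLENCE_TYPES = (
    "beating with a rifle butt",
    "beating with a truncheon",
    "other beatings",
)


@dataclass(frozen=True)
class CodedAct:
    """One coded act: a single item from the controlled vocabulary."""
    interview_id: str
    violence_type: str


def code_act(interview_id: str, violence_type: str) -> CodedAct:
    """Record one act, rejecting any term outside the controlled vocabulary."""
    if violence_type not in VIOLENCE_TYPES:
        raise ValueError(f"not in controlled vocabulary: {violence_type!r}")
    return CodedAct(interview_id, violence_type)


# A torture session involving two kinds of beating is coded as two acts,
# one for each classified kind of beating.
session = [
    code_act("INT-001", "beating with a rifle butt"),
    code_act("INT-001", "beating with a truncheon"),
]
```

Because each act carries exactly one vocabulary item, counting acts by type later is unambiguous; a description that fits none of the specific items falls under "other beatings" rather than under an improvised new term.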

A bad controlled vocabulary can have negative ramifications. For example, if "torture" is one category and "beating" is another category, then the only meaningful interpretations of the two items would be either a) that beating is not a form of torture; or b) that "torture" means "torture except for beating." Since neither of these interpretations is particularly clear, it would be best to decompose the "torture" category into useful subdivisions, including beating, and to drop beating as a separate category. Each category in the controlled vocabulary must exclude the other categories that are at the same level in the classificatory hierarchy. For example, instead of

  • Torture
  • Beating

we would create a list like this:

  • Torture
      • beating
      • electric shocks
      • etc.

Some of the lists in a controlled vocabulary might be hierarchical. For example, any place in El Salvador is part of a small geographic unit called a caserio. Each caserio belongs to a slightly larger unit called a canton. Each canton belongs to a larger unit called a municipio. Each municipio, in turn, belongs to a department. The definition of a location in El Salvador is therefore hierarchical because a single department implies a list of the municipios in that department; each municipio implies a list of the cantones in that municipio; etc. Other kinds of lists are also hierarchical: lists of organizations might include sub- or regional organizations that compose a given group; in military units, often a group of battalions form a brigade, and a group of brigades form a unified command. Composing lists hierarchically can help order and organize a list that may have many items.
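One way to picture a hierarchical list is as nested mappings, where choosing an item at one level yields the valid items at the next level down. The following Python sketch assumes the four-level department, municipio, canton, caserio structure described above. The place names are placeholders rather than real Salvadoran localities, and `LOCATIONS`, `children`, and `locate` are hypothetical names chosen for illustration.

```python
# Placeholder names only; a real list would come from the
# organization's own controlled vocabulary of places.
LOCATIONS = {
    "Department A": {
        "Municipio A1": {
            "Canton A1a": ["Caserio A1a-i", "Caserio A1a-ii"],
            "Canton A1b": ["Caserio A1b-i"],
        },
    },
    "Department B": {
        "Municipio B1": {
            "Canton B1a": ["Caserio B1a-i"],
        },
    },
}

LEVELS = ("department", "municipio", "canton", "caserio")


def children(*path):
    """Return the valid items one level below the given path.

    With no arguments this lists the departments; with a department
    it lists that department's municipios; and so on.
    """
    node = LOCATIONS
    for name in path:
        node = node[name]
    return sorted(node)


def locate(caserio):
    """Return the full coded location for a caserio, or None if unknown."""
    for dept, municipios in LOCATIONS.items():
        for muni, cantones in municipios.items():
            for canton, caserios in cantones.items():
                if caserio in caserios:
                    return dict(zip(LEVELS, (dept, muni, canton, caserio)))
    return None
```

Calling `children("Department A", "Municipio A1")` returns the cantones of that municipio, which mirrors the way a department implies its list of municipios and each municipio implies its list of cantones. The same nesting could hold a list of military units, with battalions grouped under brigades.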

The design and use of controlled vocabulary lists will be covered more thoroughly in the handbook in this series that addresses database design for human rights information management systems[6].

In conclusion, I would underline that data processors apply the organization's definitional decisions to the raw information coming from interviewers, documentalists, and investigators. This is the step at which information from each source is rendered in a standard form that can be accessed from any part of the organization that needs it. The next step, the database itself, is the conduit through which the information flows from the point at which it is originally acquired to every other part of the organization.

