Data coding and inter-rater reliability (IRR)

Data coding is the process of converting unstructured information, such as a narrative testimony, into discrete facts: the names and roles of actors (victims, witnesses, perpetrators) in crimes, and the date and place of each act. Data coding must not discard or distort information. When more than one person identifies, classifies and counts the elements reported in a qualitative source, their results may differ slightly, depending on each individual’s interpretation and the care taken in coding. These differences can be quantified as inter-rater reliability (IRR). We give the same source document to several coders and compare their coded outputs: the extent to which the outputs agree indicates the reliability of the coding process. Assessing and improving data quality helps us defend the resulting analysis by showing that it rests on a consistent application of the classification criteria to the raw data.
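
The comparison of coded outputs described above can be sketched in a few lines of Python. This is an illustration only, not a prescribed method: it assumes two coders who each assigned one role label to the same list of actors, and it uses two common agreement measures, simple percent agreement and Cohen's kappa (which corrects for agreement expected by chance). The function names and the example labels are hypothetical.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of items on which both coders assigned the same label."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same items")
    if not coder_a:
        raise ValueError("at least one coded item is required")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders: agreement corrected for chance."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    # Chance agreement: probability that both coders pick the same label
    # if each labelled independently at their own observed label rates.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both coders used one identical label throughout; kappa is
        # undefined here, so 1.0 is returned as a convention (assumption).
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical role labels assigned by two coders to the same six actors.
coder_a = ["victim", "witness", "perpetrator", "victim", "witness", "victim"]
coder_b = ["victim", "witness", "perpetrator", "witness", "witness", "victim"]

print(f"percent agreement: {percent_agreement(coder_a, coder_b):.3f}")
print(f"Cohen's kappa:     {cohens_kappa(coder_a, coder_b):.3f}")
```

With more than two coders, a measure designed for multiple raters (for example Fleiss' kappa or Krippendorff's alpha) would be a more suitable choice than pairwise Cohen's kappa.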

See also: Controlled vocabulary
