from IPython.display import Image
Image(filename='imgs/banner.png')
%load_ext pretty_jupyter
%%html

<style>
    #Styling {
        font-weight: bold;
        font-family: Helvetica;
    }
</style>

Analysis and writing by:

Background


Invisible Institute ("II") and the Human Rights Data Analysis Group ("HRDAG") have collaborated on analyzing missing persons cases since 2019, initially scraping and analyzing open missing persons cases from 2000-2016 that were received through public records requests.

Through Beneath the Surface, a project that uses machine learning to parse through narrative text of police misconduct records, our team was able to identify over 50 complaints related to missing persons cases. Alleged misconduct ranged from officers denying reports all the way to an officer closing a case before finding the young person. Our team then embarked on a deep dive into Chicago Police Missing Persons Data and the legislation which influenced the data systems where these cases live, analyzing every digital case record from 2000-2022 prepared and provided by the Chicago Police Department. We partnered with City Bureau in 2021 on the investigation.

Since reporting in the Chicago Reader and publishing the full version of the story as a microsite, Trina Reynolds-Tyler and Sarah Conway have won a Pulitzer Prize for the story and presented their findings to Chicago City Council and the Illinois Task Force on Missing and Murdered Chicago Women. The Task Force's role includes examining and reporting the systemic causes behind violence that Chicago women and girls experience and reviewing the existing and potential new methods for tracking and collecting data on violence against Chicago women and girls. Pursuant to that goal, members of the Task Force have attempted to review the missing persons data collected by the city's primary contact agency, the Chicago Police Department.

At HRDAG, we expect to receive messy data that needs some degree of tidying. Data entry errors as well as missing datapoints are common enough that we seek them out in our data processing steps, and even seek to employ advanced statistical techniques like multiple imputation and multiple systems estimation (MSE) to help fill in some of the blanks that aren't recoverable through standardization.

In the missing persons investigation, II's modest questions such as, "How often were missing persons located by CPD?", were not possible to answer even after standardizing the data because there were no structured fields related to these pieces of information.

Instead, when we explicitly asked in follow-up FOIA requests for this information to be included, the person(s) responding instructed us to consult the fields related to the investigation status and the current Uniform Crime Reporting (UCR or I-UCR for Illinois) code.

"If an incident is closed non-criminal than the person was likely found." (source)

"A case originally classified as MISSING PERSON may have its IUCR code updated to reflect new information... More specific information on missing person recoveries would be captured in individual case report narratives." (source)

CPD has made public claims to the City Council about the statistical outcomes of these cases which could only have been generated using some form of structured data, not the unstructured case narratives. However, the status and current_iucr fields are insufficient for addressing questions about outcomes, and "likely found" does not imply precise knowledge that a missing person had been located, so the data do not meet the standard necessary for statistical claims about the rate of located persons.

When we summarize the status data, we find that more than 99% of reports are assigned "CLOSED NON-CRIMINAL". We believe this is the field CPD was using for its public claim about how often people are located, which would be misleading if true. We discuss this opinion in detail and through examples in the first notebook.
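
As a minimal sketch of that kind of summary, assume each report is a record with a `status` field. The field name comes from the data, but the rows below are made-up placeholders rather than CPD records, the "OPEN" and "SUSPENDED" labels are invented for contrast, and `status_shares` is a hypothetical helper, not the code used in the notebooks.

```python
from collections import Counter

def status_shares(reports):
    """Return each status label's share of all reports, largest first."""
    counts = Counter(r["status"] for r in reports)
    total = sum(counts.values())
    return {status: n / total for status, n in counts.most_common()}

# Made-up placeholder rows for illustration only; not real CPD records.
# "OPEN" and "SUSPENDED" are invented labels used here for contrast.
reports = (
    [{"status": "CLOSED NON-CRIMINAL"}] * 995
    + [{"status": "OPEN"}] * 3
    + [{"status": "SUSPENDED"}] * 2
)

shares = status_shares(reports)
for status, share in shares.items():
    print(f"{status}: {share:.1%}")
```

When one label covers nearly every row, the field barely varies, so on its own it cannot separate one kind of outcome from another.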

Why does it matter?


Before digging into the technical details about the policy and data related to missing persons reports in Chicago, it behooves us to tell you why we think you should care about the questions we can't answer and what it could mean for the people in Chicago.

Crime Prevention Initiatives


Meaningful crime prevention begins at the root of crime, with identifying those who are most vulnerable to becoming victims or offenders themselves. People who are in unsafe situations, or who are experiencing a mental health crisis, trauma, or violence, are especially vulnerable to crime. They may rely on effective community support to find secure housing that is physically and emotionally safe, financial stability, and opportunities for growth that can provide strong alternatives to and safety from crime.

Crime prevention initiatives often use police data to support and guide policy. The city contracts the police department as the primary contact agency for most types of emergencies and dispatches officers through 911 call centers where calls for assistance are received. Both the call center and police keep some data on the support requested from and provided by their agency. Later, when city leaders have questions about what the community needs and what issues are the most prevalent, each agency and its data will be consulted. When that information is then used to make a data-driven argument and affect policy, new public safety initiatives are built on the significance of the data and findings. However, as with all statistics and scientific findings, the strength of the findings relies on the strength of the data collection and methodology, and those details may not be accessible to all stakeholders in the matter.

For example, when communities report non-arrival of police even when calling about gunshots, it's useful to think about the data generation process and how something like non-arrival would present itself in data. Additionally, what would you need to have certainty that arrival did occur, besides the word of those whose job it was to show up?

Scientific Sins & How to spot them

How do the experts evaluate data and data-driven arguments?

The usefulness and real-world applicability of data-driven arguments is closely connected to adherence to basic scientific and research tenets, such as:

  • Transparency: Information about how the data were collected and used to produce the results is accessible and appears with the findings, along with documentation of relevant limitations or caveats.
  • Reproducibility: Given the same dataset and using the same methods, others can reproduce the results and come to the same conclusions.
  • Reliability: The definitions and methods presented are an accurate representation of what was done overall and were consistent across contributors. That is to say nothing was left out and nobody deviated from these instructions in a way that could influence the results.
  • Validity: The interpretations and conclusions are reasonable to make given the data, results, and limitations. The conclusions are not overstated or applied too broadly given the constraints.
  • Peer review: The arguments survive the review of other experts in the subject and/or methodology.

What research is appropriate for police data?

Every datapoint is just an artifact of some real-world process. Somewhere, at some point, someone recorded something. Whether it's an officer filling out an incident report, a researcher surveying an area, or even a relative leaving a voicemail, these are all examples of some real-world process that generates data. The circumstances surrounding the data represent the information you have.

Take, for example, a survey conducted by a local library facing funding cuts. Suppose an analysis of the survey responses suggests an unfavorable view of the library's new hours, and people in the survey are complaining that the facility is now unavailable at the times they need it. How do we decide if the survey is representative of everyone in the area? What if the people surveyed lived out of state, or had never used the library anyway? What if they were all people who had visited the facility in the last 6 months? Information about how this survey was conducted is relevant to how we conduct and interpret the analyses, which in turn affect ongoing decisions about restoring funding to the library so that it can remain open longer.

When it comes to police data, there are 2 main points to consider:

  1. Even when a police department is mandated to collect data for the purpose of enabling statistical research, the department is still going to develop the data entry system around its operational objectives and what it needs from the database.
  2. Police data is often used and regarded as "crime data." However, it more accurately represents potential crime that is known by police, since due process includes a presumption of innocence and crimes occur that are not known to police.

The missing persons data story is largely shaped by the first point. This dataset was designed in collaboration with CPD and shaped by its operational objectives, not by substantive research questions. The effect is so severe that the data cannot be used to answer the most basic of those substantive questions: how many missing persons were located?

For this reason, we believe that the statistics shared by CPD with City Council, as referenced in this article by the Chicago Reader, were flawed. CPD could not have precise knowledge of how many missing persons returned or were located because it does not track this with structured data. It merely records its own investigation status and implies that one particular status is synonymous exclusively with one type of outcome.

Before we continue introducing the data series, we want to elaborate on why this practice is a problem using examples of the "Closed Non-Criminal" label, as applied to cases with known outcomes:

  • Incomplete and/or inaccurate reclassification. Several of the instances we looked at in our review of internal documents revealed that CPD knew a missing person had become the victim of a homicide, but reclassified the report as "Closed Non-Criminal" anyway. When CPD was the agency investigating the homicide, a separate RD number was often created for the homicide investigation, which keeps that case separate from the missing persons report and undiscoverable to those looking into missing persons cases. However this practice may have been justified internally, these instances of "Closed Non-Criminal" cannot reasonably be presented as the missing person returning or being located in the same way as someone who took a surprise vacation and returned alive and well.
  • Premature case closure. We know this occurs because we have identified some such cases in our review, and some also exist in the public domain. For instance, the Bradley sisters were initially under the same, single missing persons report which was prematurely closed by CPD. Once this was made public, CPD opened a second missing persons report for the sisters and kept that record open. Now, two examples of this incident exist in the database: one that describes these two, still missing humans, as having returned or been located, and another that correctly refers to this as an open investigation into what happened to them.
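
The gap these examples open up can be sketched with a toy comparison. Everything below is hypothetical: the rows are invented, the "OPEN" label is used only for contrast, and `known_outcome` is an assumed field standing in for what a reader could learn only from case narratives. No such structured field exists in the CPD data, which is the point.

```python
from collections import Counter

# Invented rows for illustration; `known_outcome` is a hypothetical field.
reports = [
    {"status": "CLOSED NON-CRIMINAL", "known_outcome": "returned safely"},
    {"status": "CLOSED NON-CRIMINAL", "known_outcome": "returned safely"},
    {"status": "CLOSED NON-CRIMINAL", "known_outcome": "homicide victim"},
    {"status": "CLOSED NON-CRIMINAL", "known_outcome": "still missing"},
    {"status": "OPEN", "known_outcome": "still missing"},
]

def naive_located_rate(rows):
    """Treat every 'CLOSED NON-CRIMINAL' report as a located person."""
    closed = sum(r["status"] == "CLOSED NON-CRIMINAL" for r in rows)
    return closed / len(rows)

def outcomes_by_status(rows):
    """Count the known outcomes recorded under each status label."""
    table = {}
    for r in rows:
        table.setdefault(r["status"], Counter())[r["known_outcome"]] += 1
    return table

naive = naive_located_rate(reports)
table = outcomes_by_status(reports)
print(f"Naive located rate: {naive:.0%}")
for status, outcomes in table.items():
    print(status, dict(outcomes))
```

In this toy data the status field alone suggests that four of five people were located, while the breakdown shows three different outcomes hiding under the same label.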

Making the data series

As we explored our simple but substantive questions, we encountered several issues that could not be resolved. We discuss these issues and many of our findings in-depth in a multi-part data series hosted in a public GitHub repo and introduced broadly here.

The series is organized as a walk-through of the data, as follows:

  1. What is a "Missing Person" event?
  2. What happens as a result of an event? What is the expected response from the primary contact agency?
  3. Who is a "Missing Person" in CPD's data?
  4. What has to happen in order for a report to make it into the CPD database? Who is missing from the missing persons data?

Final Thoughts

Despite the increasing militarization of police departments, much of policing is deskwork. All the while, police acting as first responders are often the primary or even exclusive agency tasked with responding to emergent needs from our communities.

This deskwork creates the paper trail that city representatives, as well as journalists and researchers, regularly rely on when they look for details of what happened for the purpose of studying different kinds of emergencies and informing our future understanding of community safety needs. The use of a police department’s data to inform policy places a burden on the department’s data collection practices to be:

  1. accurate and representative of the truth,
  2. transparent and consistent about what information is tracked in what fields, and
  3. aware of possible constraints and limitations.

As far as the Chicago Police Department’s missing persons data is concerned, there is an outstanding question about which of these cases have been resolved, and whether poor record-keeping is systematically undermining our ability to understand and respond to public safety needs.

We discuss 4 specific missing persons cases in chapter one that speak to this potential pattern. In one particularly illuminating example, the officer who provided the final update to 12-year-old Jahmeshia Conner's missing person case wrote in the Case Supplementary Report:

“The complainant [her mother] positively identified the missing at the Medical Examiners Office. The missing is now the subject of an ongoing Homicide investigation HR-666816. This investigation is closed non criminal.”

Challenges with record-keeping are not isolated to Chicago and the 3 agencies mentioned in this analysis. HRDAG's more than 30 years of designing data processing steps to document human rights violations around the world has involved diving into the data collection methods of a given dataset, how they might have influenced what each field means in practice, and how these realities limit what analysis is possible. If the appropriate data for analysis are not available, what is the significance of what we cannot address? What harm might go on unnoticed? What narratives unchallenged?

As police departments increasingly produce their own open data portals and data-driven arguments, scientific rigor is important for spotting flawed or misleading reasoning. As we see in Chicago with missing persons, even where legislation explicitly dictates that data collection is intended to support statistical analysis of outcomes, the design and usage of database systems reflect the working objectives of the agency. Policy makers may assume that this will not hinder later analysis, but that is, unfortunately, not a reasonable assumption to make.

When public safety and crime prevention initiatives fail, the people most harmed are the vulnerable who have the greatest stake in such initiatives working. When the arguments that inform these policies are built on data which have not been constructed consistently and have labels like "Closed Non-Criminal" assigned to cases that became homicide investigations as well as those that ended in a safe return, that failure is inevitable.

We hope these data notebooks can support elected officials and local community members in better understanding the missing persons data pipeline, that the public has access to the details of the gaps unveiled by our findings, and that our work can operate as a model for utilizing data science and narrative justice to interrogate police records and the data-driven claims made from them.

Acknowledgements

Many thanks to:

for their contributions to this data story, and to:

for their contributions to the full Chicago Missing Persons story, a Pulitzer-Prize-winning example of data journalism.


More on this series:

Start reading the full Chicago Missing Persons story or chapter one of the data story.