\addtocounter{chapter}{1}\specialchapt{CHAPTER 1. INTRODUCTION}
# Background
There are few stakes higher than the prosecution of suspects in the criminal justice system. In the United States, 32 states continue to maintain capital punishment, under which a convicted person's life can be ended. All 50 states allow sentences of life imprisonment without the possibility of parole, depending on the severity of the crime committed. And yet, despite these stakes, many forensic science methods have come under fire in recent years due to a lack of statistical rigor. The issues were well summarized in a report by the President's Council of Advisors on Science and Technology (PCAST) [@pcast2016]. A nonexhaustive list of the issues:
- Fingerprint analysis is a subjective process in which investigators can be swayed, and rigorous studies of error rates are lacking.
- Bitemark analysis is seen as unlikely ever to be developed into a scientifically valid method, and resources directed toward such efforts should be minimal.
- DNA analysis, while more scientifically sound, sometimes discounts the role of operator error in the process.
There was also a heavy focus on the limitations of firearms analysis. Though a widely used and accepted procedure, it has come under particular scrutiny in the past decade. In 2005, in *United States v. Green*, the court ruled that the forensic expert could not assert with certainty that the bullet casings came from a specific weapon, but could merely "describe" other casings that are similar. Further court cases in the late 2000s expressed caution about the use of firearms identification evidence [@giannelli:2011]. In 2009, the National Academy of Sciences published a report [@NAS:2009] questioning the scientific validity of many forensic methods, including firearm examination. The report states that "[m]uch forensic evidence -- including, for example, bite marks and firearm and toolmark identification is introduced in criminal trials without any meaningful scientific validation, determination of error rates, or reliability testing to explain the limits of the discipline." The PCAST report corroborated these findings and explained how modern techniques could potentially be used to turn this analysis into something more objective:
*A second—and more important—direction is (as with latent print analysis) to convert firearms analysis from a subjective method to an objective method. This would involve developing and testing image-analysis algorithms for comparing the similarity of tool marks on bullets. There have already been encouraging steps toward this goal. Recent efforts to characterize 3D images of bullets have used statistical and machine learning methods to construct a quantitative “signature” for each bullet that can be used for comparisons across samples.*
To begin to develop such an objective method, the currently available computing tools must be explored and discussed so that they can be leveraged for this purpose.
# Designing a Modern Software System
Conveniently, the computing revolution has opened up statistical methods and tools to a broad range of fields, and these tools can be used to begin building such an image analysis algorithm. With the growing popularity of R [@R] in particular, the wide range of open-source statistical routines available in the form of packages has significantly expanded statistical computing capabilities.
Still, there remains a fundamental obstacle to the use of R. Effective use of R requires a commitment to learning and understanding programming, a commitment that some in the forensic science community may not wish to make. Furthermore, although the open-source nature of R is one of its biggest assets, it also means that the development cycle is more rapid than would often be found in commercial software solutions. This means that R developers must continually maintain their code while learning new programming concepts.
A number of tools have been developed in an attempt to address this issue. S-PLUS [@SPlus], a commercial implementation of the S language from which R descends, includes a rudimentary graphical user interface (GUI) supporting data editing, graphing, and basic statistics. Over time, GUIs were developed for R as well. One of the first was R Commander [@fox2005], which provides a wrap-around user interface for R. With drop-down menus allowing point-and-click selection of a number of common data analysis and statistical functions, analyses can be performed without any knowledge of programming. More recently, the program Deducer [@fellows2012] also abstracts the programming into graphical menus and buttons. It expands on R Commander by providing an effective data viewer, a help system, and easy-to-read tables displaying the results.
GUIs have some natural limitations that often make them a less appealing option for researchers. The results of an analysis from a GUI are typically not reproducible: whereas an R script can be created, shared, and executed elsewhere, the actions taken in a GUI are neither transcribed nor portable. GUIs also tend to slow down the development and iteration process once the user has become more comfortable with programming concepts. For instance, scripts allow copying and pasting of code blocks that need only minor modifications; in a GUI, the options representing a code block would need to be individually chosen through drop-down menus.
Recognizing some of these limitations, other approaches have been taken to ease the transition to working with R. RStudio [@RStudio] provides a GUI around R with expanded functionality, but maintains focus on the scripting and coding aspect. In this sense, RStudio more readily resembles an IDE (Integrated Development Environment), which aids the programmer rather than attempting to abstract the programming away. While this allows reproducibility and may still help a less experienced programmer get started in a programming language, it still depends on a continuing effort to learn programming.
The Shiny package for R [@shiny] provides a web development framework with which researchers who have at least a limited knowledge of R programming can make their analyses available to others. Shiny can turn the results of an R analysis into an interactive web application: results are generated by browsing to the website at which the Shiny application is deployed and using GUI elements (drop-down menus, text boxes, tabs) similar to those of R Commander or Deducer. But a Shiny application is standard R code, and hence maintains the reproducibility and maintainability benefits of standard R scripts.
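To give a sense of how little code is involved, the following is a minimal sketch of a Shiny application that wraps a single R computation in a web interface. The inputs, outputs, and toy signatures here are illustrative only and are not part of the bullet matching application.

```r
library(shiny)

# A minimal sketch: expose one R computation through a web interface.
# The inputs, outputs, and toy "signatures" below are illustrative only.
ui <- fluidPage(
  titlePanel("Profile correlation demo"),
  sliderInput("shift", "Shift applied to the second profile:",
              min = 0, max = 20, value = 0),
  plotOutput("profiles"),
  verbatimTextOutput("correlation")
)

server <- function(input, output) {
  f <- sin(seq(0, 4 * pi, length.out = 200))    # toy signature 1
  g <- reactive({
    c(rep(0, input$shift), f)[seq_along(f)]     # toy shifted signature 2
  })
  output$profiles <- renderPlot({
    plot(f, type = "l", ylab = "height")
    lines(g(), col = "red")
  })
  output$correlation <- renderText({
    paste("Correlation:", round(cor(f, g()), 3))
  })
}

shinyApp(ui, server)
```

Running this script locally (or deploying it to a Shiny server) produces a web page on which moving the slider immediately updates both the plot and the reported correlation, with no further programming required of the user.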
Because Shiny offers a solution that maintains the benefits of both GUIs and standard programming, I believe it can form the basis for a new set of tools and concepts that greatly expand the reach of statistics. Those who are comfortable with programming can now provide functionality to those who are not. This functionality can enable researchers to see, understand, and work with their data in ways that they were simply unable to before. Ultimately, I believe an open-source solution based on R and Shiny can yield a bullet matching framework that allows for iteration and improvement, but does not shut out individuals lacking a knowledge of, or a desire to learn, programming. Specifically, a modern bullet matching application should aim to be:
- **Modular** - Components of the application can be dynamically enabled and disabled at run-time, allowing flexibility in terms of what functionality is presented to the end-user (a minimal sketch of this idea follows this list).
- **Extensible** - New components, or modules, can be written to further extend the functionality beyond what the base application allows.
- **Web-Based** - The application should live on the web, so that it can be accessed anywhere, from any device, without the need to deal with restrictive licenses or unsupported platforms.
- **Reproducible** - Results generated by the application should be reproducible. There should be no black boxes.
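Shiny's module system offers one natural way to realize the modularity principle. The sketch below is illustrative only: the profile-viewer component and its inputs are hypothetical placeholders, and the enclosing application can instantiate as many (or as few) such components at run-time as a given deployment requires.

```r
library(shiny)

# A sketch of the modularity principle using Shiny modules.
# The module, inputs, and toy signatures are hypothetical placeholders.
profileViewerUI <- function(id) {
  ns <- NS(id)  # namespace the inputs so multiple copies can coexist
  tagList(
    sliderInput(ns("n"), "Points to display:", min = 10, max = 200, value = 100),
    plotOutput(ns("profile"))
  )
}

profileViewer <- function(input, output, session, signature) {
  output$profile <- renderPlot({
    plot(head(signature, input$n), type = "l", ylab = "height")
  })
}

# The enclosing application decides at run-time which modules to enable:
ui <- fluidPage(profileViewerUI("land1"), profileViewerUI("land2"))
server <- function(input, output, session) {
  callModule(profileViewer, "land1", signature = sin(seq(0, 4 * pi, length.out = 200)))
  callModule(profileViewer, "land2", signature = cos(seq(0, 4 * pi, length.out = 200)))
}
shinyApp(ui, server)
```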
These ideas are developed further over the course of this document in order to arrive at a modern, reproducible statistical application for the analysis of bullet lands. Before arriving at that point, however, we must further motivate the need for modern bullet matching algorithms.
# Algorithms for Bullet Matching
In the United States, suspects are considered innocent until proven guilty "beyond a reasonable doubt". This in many ways parallels the traditional hypothesis testing framework, in which a pre-defined cut-off (the significance level) determines the threshold at which the null hypothesis is rejected (which presumably should occur once the evidence takes us beyond a reasonable doubt).
Rifling, manufacturing defects, and impurities in a barrel create striation marks on the bullet during the firing process. These marks are assumed to be unique to the barrel, as described in a 1992 AFTE article [@afte:1992]. Current standard practice for bullet matching relies in part on the assessment of the so-called maximum number of consecutively matching striae (CMS), first defined by @biasotti:1959. One of the primary issues with this procedure is that a human inspection to determine CMS is subjective [@miller:1998]. Human inspection also requires on-site analysis of the bullets, which can be costly and time-consuming, and introduces the potential for differing opinions across different forensic examiners.
A modern development in this realm is the adoption of an open format for storing 3D topographical images of bullets, called x3p (XML 3-D Surface Profile). The x3p format conforms to the ISO 5436-2 standard\footnote{\url{http://sourceforge.net/p/open-gps/mwiki/X3p/}}, implemented to provide a simple and standard-conforming way to exchange 2D and 3D profile data. It was adopted by the OpenFMC (Open Forensic Metrology Consortium\footnote{\url{http://www.openfmc.org/}}), a group of academic, industry, and government firearm forensics researchers whose aim is to establish best practices for researchers using metrology in forensic science. Furthermore, NIST (the National Institute of Standards and Technology) is developing a database to allow searching and downloading of these x3p files\footnote{\url{https://tsapps.nist.gov/NRBTD/}}. Although limited to around 70 bullets at the time of this writing, this database, in conjunction with open-source software for working with x3p files, opens up a whole new set of possibilities in terms of a statistical foundation for bullet matching.
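As an illustration of what this openness enables, the sketch below reads one such scan into R using the `read_x3p()` function from the `bulletr` package introduced later in this chapter; the file name is a placeholder for a scan downloaded locally from the NIST database.

```r
library(bulletr)

# A minimal sketch: read one x3p scan downloaded from the NIST database.
# The file name is a placeholder for a locally saved scan.
bullet <- read_x3p("Br1_Bullet1_Land1.x3p")

# Inspect the top-level structure: header metadata (such as the scan
# resolution) alongside the matrix of surface height measurements.
str(bullet, max.level = 1)
```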
The feasibility of creating a database of ballistic images that could be used to identify guns used in crimes was evaluated in a 2008 report by the National Research Council [@nap:2008]. The evaluation investigated the scalability of NIBIN (the National Integrated Ballistic Information Network), which uses proprietary matching algorithms provided by IBIS. The bottom line of the report was that, in spite of the many technical and practical hurdles, solutions could be found to all but one problem. The remaining problem was statistical: the quality of the matching algorithm (in this case, of breech-face marks and firing pin impressions) could not withstand a hugely increased number of records while still maintaining a reasonable workload for the forensic examiners who have to examine possible matches suggested by the system.
We have several broad goals in developing a modern statistical matching algorithm. First, we wish to define every statistic or measure used objectively. Second, we will make the definitions and code open-source and publicly accessible so that they are open for review by forensic scientists and statisticians alike. Third, we will investigate the distributional properties of these statistics across the universe of bullets accessible to us in the database. Finally, we wish to provide an easy-to-use interface to serve as a front-end for the algorithm.
Critical to the success of a matching algorithm is the extraction of a set of features describing a bullet signature. In addition to the aforementioned CMS, the cross-correlation function (CCF) is used, as it has been for other bullet matching applications [@vorburger:2011]. Traditional bullet matching methods have used strict cutoffs (for instance, 6 CMS) to determine a match versus a non-match. We aim to be more robust by using a number of features and deriving conditional probabilities of matches given particular values of these features. The features are defined as follows:
* *CCF*: Function of the optimum shift distance measuring the correlation between two profiles [@vorburger:2011]
* *CMS*: Striated markings that line up exactly with one another without a break or dissimilarity in between them [@biasotti:1959; @thompson:2013]. These and other forensic science papers using CMS typically count a single peak as a stria, while we count both peaks and valleys, so our definition typically yields CMS values about twice those commonly found in the literature.
* *CMNS*: Striated markings that do not line up exactly with one another, without a matching striation in between them.
* *Matches*: The number of matching striations between two signatures
* *Non-Matches*: The number of unmatched striations between two signatures
* *D* = $\sqrt{\frac{1}{\text{\#}t}\sum_t \left[f(t) - g(t)\right]^2}$, where $f(t)$ and $g(t)$ are aligned signatures: the Euclidean vertical distance between surface measurements of the aligned signatures. This is a measure of the total variation between two profiles [@clarkson1933definitions] (see the sketch following this list).
* *S*: The sum of the average absolute heights of matched extrema: for each pair of matched striae, compute the average of the absolute heights of the two peaks or valleys; $S$ is then defined as the sum of all these averages.
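To make two of these definitions concrete, the following sketch computes $D$ and a windowed CCF for a pair of toy aligned signatures. It is a simplified illustration under assumed toy data, not the package implementation.

```r
# A simplified illustration of two features above, using toy signatures;
# this is not the bulletr implementation.
set.seed(1)
x <- seq(0, 4 * pi, length.out = 200)
f <- sin(x) + rnorm(200, sd = 0.1)   # toy aligned signature 1
g <- sin(x) + rnorm(200, sd = 0.1)   # toy aligned signature 2

# D: Euclidean vertical distance between the aligned signatures
D <- sqrt(mean((f - g)^2))

# CCF: correlation between the profiles over a window of candidate
# shifts, reported at the optimal shift
ccf_at <- function(f, g, lag) {
  n <- length(f)
  if (lag >= 0) cor(f[1:(n - lag)], g[(1 + lag):n])
  else          cor(f[(1 - lag):n], g[1:(n + lag)])
}
ccf_max <- max(sapply(-20:20, ccf_at, f = f, g = g))
```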
With this in mind, we have developed an automated matching routine, written in R, which uses open and transparent statistical techniques to arrive at a predicted probability of a match at the bullet land level. This framework is provided as an R package called `bulletr` [@bulletr] with an associated web component, discussed in the next section. This is not the first automatic bullet matching system [@xie:2009; @riva:2014; @bachrach:2002], but it builds on strong research principles by using a publicly accessible database, including fully reproducible results, and using a broad set of derived features to produce probabilities or scores based on a machine learning algorithm.
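The general shape of that final scoring step can be sketched as follows: a classifier is trained on derived features for land-to-land comparisons with known ground truth, and then produces match probabilities for new comparisons. The data frame, column names, and random-forest choice below are illustrative placeholders; the actual features and model are described in Chapters Two and Three.

```r
library(randomForest)

# Illustrative placeholder data: one row per land-to-land comparison,
# with derived features and known ground-truth match status.
set.seed(1)
train <- data.frame(
  ccf     = runif(500),
  cms     = rpois(500, 8),
  D       = rexp(500),
  matches = rpois(500, 10),
  match   = factor(sample(c("match", "non-match"), 500, replace = TRUE))
)

# Train a classifier on the derived features ...
rf <- randomForest(match ~ ccf + cms + D + matches, data = train)

# ... and report predicted match probabilities for new comparisons.
new_comparisons <- train[1:5, ]
predict(rf, newdata = new_comparisons, type = "prob")
```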
This work has been submitted to and accepted (with revisions) by the *Annals of Applied Statistics*. We are following up by investigating the properties of different features as applied to degraded bullets, and comparing them with common cut-offs for match, non-match, and inconclusive decisions from the literature. In particular, because a real-world scenario often involves recovering only a fragment of the bullet from a crime scene, many of the traditional features such as CMS need generalizations that handle these cases. Furthermore, there is an open question regarding precisely how large a fragment is needed in order to be confident of a match "beyond a reasonable doubt". This warrants further investigation.
# A Web Framework for Bullet Matching
Critical to the success of a software system like this is that it can be used, extended, and enhanced by a broad range of scientists. This means we need a reproducible web-based software system that opens the R programming tools used for bullet matching to forensic examiners and forensic scientists who may not have the knowledge of, or desire to learn, programming. There are three primary components to this software system: the database, the front-end application, and the back-end application.
The first component is a database. Our database builds on top of the NIST Ballistics Research Database\footnote{\url{https://tsapps.nist.gov/NRBTD}}. NIST's database provides an open and transparent source of raw data files representing surface topographies of toolmarks. Our database allows the storage of **processed** versions of these toolmark surfaces, tracking the parameters necessary to reproduce the results. This allows for a quantitative assessment of each step of our bullet matching algorithm. The database is discussed at length in Chapter Four.
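As a sketch of this reproducibility idea (the actual schema is described in Chapter Four), each processed result can be stored alongside the parameters that produced it, so that any row can be regenerated and audited later. The table, column names, and values below are hypothetical.

```r
library(DBI)

# Hypothetical schema sketch: store each processed result together with
# the parameters that produced it, so the result can be regenerated.
con <- dbConnect(RSQLite::SQLite(), "bullets.sqlite")

dbWriteTable(con, "processed_runs", data.frame(
  x3p_file = "Br1_Bullet1_Land1.x3p",  # placeholder source scan
  crosscut = 100,                      # parameter: location of the crosscut
  span     = 0.03,                     # parameter: smoothing span
  rmse     = 1.87                      # a derived result for this run
))

dbGetQuery(con, "SELECT * FROM processed_runs")
dbDisconnect(con)
```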
The second component of our software system is a web front-end aimed at those who do not have R programming skills. The web front-end allows forensic examiners to upload bullet land images, examine the surface topographies, and perform each step of the algorithm in order to arrive at a probability of a match. The final display includes a results page in which all chosen parameters of the algorithm are provided, so that a report on the results can be generated and cross-checked by other researchers. Figure \ref{fig:bullets-app} displays a screenshot of this component.
\begin{figure}[H]
\centering
\includegraphics[width=\linewidth]{images/bullets_app.png}
\caption{Prototype user interface for the bullet matching algorithm.}
\label{fig:bullets-app}
\end{figure}
Finally, the third component is a web-based back-end designed for the further assessment and development of the algorithms themselves. This component is aimed at researchers and scientists who wish to iterate on the performance of the algorithm by continuing to assess its results in the context of the database and the different processed versions of toolmark images available.
Together, these components form the basis of a modern software system that implements the algorithms from Chapters Two and Three using the software design principles I have outlined. Specifically, this system is **modular** in that the different components can be substituted as needed to fit new use cases. The system is also **extensible**, as functionality added to the R package is immediately reflected in the web-based systems. It is also fully **web-based**, requiring just a web browser to access the database. And finally, the results are all **reproducible**, as each parameter yielding intermediate results is tracked throughout and available for viewing in the database at any time.