
How statistical learning helps solve criminal activity

Police cars surround a hit-and-run accident on a road.
Hit-and-run traffic accidents are rarely solved; however, two SDSU faculty members are researching ways statistical learning techniques can aid investigations.

An associate professor in South Dakota State University's Jerome J. Lohr College of Engineering has received a grant from the National Science Foundation to train artificial intelligence models in making accurate predictions with theoretical guarantees.


Hit-and-run accidents are rarely solved. In fact, around 90% of these types of accidents go unresolved. This is primarily due to the lack of evidentiary material at the accident site, which is, at most, tire marks left on the street.

Resolving a hit-and-run case from a pair of tire marks is arduous; however, statistical pattern recognition and machine learning techniques can aid in narrowing down potential offenders during the criminal investigation process. That’s just one of the applications that could result from a National Science Foundation-backed project being conducted by two Jerome J. Lohr College of Engineering faculty members.

Statistical learning models can extract useful information from an image and turn it into data. In this situation, AI would extract information related to tread length and tread pattern. This data can then inform classification models (algorithms that analyze data points and sort them into different classes), which can learn and make predictions from small datasets and few images. This is known as "few-shot learning" and is a key technique these models use to make accurate predictions.

Few-shot learning is a technique that allows a machine learning model to make predictions for new classes based on just a few examples of labeled data. Few-shot learning can be applied to evidence collection during the investigation process, as described above, and is being utilized in the new project in the Lohr College of Engineering. It’s funded by a $350,796 National Science Foundation grant. 

Led by associate professor of statistics Semhar Michael, the SDSU project aims to develop ways to make large-scale predictions about a dataset with a large number of categories but few exemplars in each category, using few-shot or one-shot learning techniques. The two techniques are closely related. The major difference is that one-shot learning relies on a single example per category, while few-shot learning relies on a few.
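To make the idea concrete, here is a minimal sketch of few-shot and one-shot classification using a nearest-prototype rule: each class is summarized by the average of its few labeled examples, and a new observation is assigned to the closest summary. This is a generic textbook approach chosen for illustration, not the method the SDSU researchers are developing. The feature values, the class names and the function names `class_prototypes` and `classify` are all hypothetical.

```python
from math import dist
from statistics import fmean


def class_prototypes(support):
    """Average the few labeled feature vectors available for each class."""
    return {
        label: tuple(fmean(column) for column in zip(*examples))
        for label, examples in support.items()
    }


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query (Euclidean distance)."""
    return min(prototypes, key=lambda label: dist(query, prototypes[label]))


# Made-up data: two numeric features per tire impression (assumed for illustration only).
few_shot_support = {
    "tire_A": [(205.0, 4.0), (207.0, 4.0), (204.0, 4.0)],  # a few examples per class
    "tire_B": [(245.0, 5.0), (243.0, 5.0)],
}
one_shot_support = {
    "tire_A": [(205.0, 4.0)],  # a single example per class
    "tire_B": [(245.0, 5.0)],
}

query = (210.0, 4.0)  # features from a new, unlabeled impression
few_shot_label = classify(query, class_prototypes(few_shot_support))
one_shot_label = classify(query, class_prototypes(one_shot_support))
print(few_shot_label, one_shot_label)
```

The only difference between the two settings in this sketch is how many labeled examples feed each class average; with one example per class, the prototype is simply that example.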

"The goal of this research is to create a range of models and algorithms that can better handle few-shot learning problems with applications to forensic source identification and geospatial intelligence," Michael said. 

Another goal of this research will be to provide statistical guarantees backing these statistical machine learning models. Because these methods will be used in identifying the source of forensic evidence, there needs to be an assertion that they are not only correct, but trustworthy and unbiased. 

"This will help avoid situations where there are miscarriages of justice," Michael said.

Michael is collaborating with Christopher Saunders, associate professor of statistics in SDSU's Department of Mathematics and Statistics, who has significant experience in forensic identification of source problems; Yana Melnykov from the University of Alabama, who will provide expertise in computational statistics methods; and Paul May from the South Dakota School of Mines, who will contribute to spatio-temporal machine learning models.

Once the statistical guarantees are complete, these models can be applied to real-world settings. In South Dakota, the researchers believe this work can be used to support the criminal justice system or intelligence community. The models could also be used to disrupt the illicit economy by helping to identify the source of illicit drugs.