Biases in Machine Learning – a Crime?

Physiognomy describes the practice of “assessing a person’s character or personality from their outer appearance” (mostly from their facial appearance). Physiognomy has a long history reaching all the way back to the ancient Greeks, but is now considered a pseudoscience and is often associated with scientific racism used to justify discriminatory actions. However, the rise of artificial intelligence has led to a resurgence of interest in physiognomy, with a number of research papers claiming to have built working classifiers that determine characteristics of a person from pictures of their face.

One paper with the title “Automated Inference on Criminality using Face Images”, published in 2017, stirred up a significant amount of discourse, as it claimed to have trained models that correctly distinguish criminals from non-criminals based on facial images while controlling for race, gender, age and facial expressions. Not only was there criticism from machine learning researchers regarding the paper’s technical approach, the ethics community also weighed in. After all, how could it be that machine learning models are capable of achieving something that is generally considered impossible by the scientific community? And what would that mean for our society? Could the police use this technology to prevent crimes from happening? And would that require society to convict people just because they look like criminals to an algorithm? No question: if the findings in the paper were correct, this would pose a substantial ethical issue.
However, it quickly became clear that the paper in question did not actually give rise to these questions, because the research process behind it contained a number of fundamental flaws. Let’s go over some of these problems in more detail:

Bias in the Data

While the researchers repeatedly point out how unbiased machine learning algorithms are (which is, generally speaking, true of the algorithms themselves), they ignore the fact that almost all bias is introduced by the data used for training the models. That is exactly what happened in this paper. To train their models, the authors scraped the internet for pictures of human faces and used those as the “non-criminal” dataset, while using ID photos of convicted criminals and wanted suspects as the “criminal” dataset. Setting aside the issues that some of the scraped web pictures might also show criminals and that some of the convicted criminals and suspects might be wrongly convicted or wanted, there is still a huge problem with this approach: the images from the web were most likely taken for promotional purposes, depicting the subject in a favourable pose and a positive light. The ID photos, on the other hand, are very neutral and are not taken for promotional purposes. This means the trained model might not actually distinguish images of non-criminals from images of criminals, but rather professional, promotional pictures from ID photos.
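To make this concrete, here is a minimal, purely illustrative sketch (not the paper’s actual pipeline) of how such a source artefact can produce impressive accuracy. The two features, a hypothetical smile intensity and image brightness, are made up for illustration: they differ between promotional photos and ID photos, but say nothing about the label the model is supposed to predict.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000  # samples per group

# Hypothetical features: promotional web photos tend to be brighter and
# show a smile, ID photos tend to be darker and neutral. Neither feature
# has anything to do with the label we claim to predict.
promo_photos = np.column_stack([
    rng.normal(0.8, 0.1, n),   # smile_intensity
    rng.normal(0.7, 0.1, n),   # image_brightness
])
id_photos = np.column_stack([
    rng.normal(0.2, 0.1, n),
    rng.normal(0.4, 0.1, n),
])

X = np.vstack([promo_photos, id_photos])
y = np.concatenate([np.zeros(n), np.ones(n)])  # 0 = "non-criminal" set, 1 = "criminal" set

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)

# High accuracy, but the model has only learned to tell the two photo
# sources apart -- it knows nothing about criminality.
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```

The printed accuracy will be close to 100%, yet all the classifier has learned is which photo source an image came from.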

A second, more subtle issue is that the model might not pick up on features that are specific to criminals (because such features don’t exist), but rather on features that make a jury more likely to convict a defendant. For example, other studies have shown that juries are more likely to convict unattractive defendants.

The Power of a Smile

Putting aside the bias problems explored above, the researchers report that their model achieves 90% accuracy in distinguishing criminals from non-criminals. They also report which facial features are especially important for the decision:

The algorithm finds that criminals have shorter distances d between the inner corners of the eyes, smaller angles θ between the nose and the corners of the mouth, and higher curvature ρ to the upper lip.

This sounds really complicated. What could it mean? The answer is pretty simple: smiling. Their model picked up smiling as the most important feature for distinguishing criminals from non-criminals – what an incredible insight! Again, this points back to the bias in the dataset: people in the “non-criminal” dataset tended to smile, while people in the “criminal” dataset did not. The trained model is completely incapable of detecting whether a person is a criminal; it only detects which sample set a picture came from, based mostly on the presence of a frown or a smile.
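To see how a smile alone shifts these measurements, here is a small sketch with made-up 2D landmark coordinates (not the landmark detector used in the paper). It computes the nose-to-mouth-corner angle θ and an upper-lip curvature ρ for the same hypothetical face, once neutral and once smiling; the inner-eye distance d is left out here for simplicity.

```python
import numpy as np

def mouth_angle(nose, left_corner, right_corner):
    """Angle (degrees) at the nose tip subtended by the two mouth corners."""
    v1 = np.asarray(left_corner) - np.asarray(nose)
    v2 = np.asarray(right_corner) - np.asarray(nose)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(cos))

def lip_curvature(p1, p2, p3):
    """Curvature (1 / circumradius) of the circle through three upper-lip points."""
    p1, p2, p3 = map(np.asarray, (p1, p2, p3))
    a = np.linalg.norm(p2 - p3)
    b = np.linalg.norm(p1 - p3)
    c = np.linalg.norm(p1 - p2)
    # Twice the triangle area via the 2D cross product.
    cross = (p2[0] - p1[0]) * (p3[1] - p1[1]) - (p2[1] - p1[1]) * (p3[0] - p1[0])
    return 4 * (0.5 * abs(cross)) / (a * b * c)

# Made-up landmarks (x, y with y pointing up) for the same face:
# smiling pulls the mouth corners up and outward and stretches the
# upper lip flatter.
neutral = {"nose": (0, 1.0), "left": (-2.0, 0.0), "right": (2.0, 0.0), "lip_top": (0, 0.4)}
smiling = {"nose": (0, 1.0), "left": (-2.6, 0.4), "right": (2.6, 0.4), "lip_top": (0, 0.5)}

for name, face in [("neutral", neutral), ("smiling", smiling)]:
    theta = mouth_angle(face["nose"], face["left"], face["right"])
    rho = lip_curvature(face["left"], face["lip_top"], face["right"])
    print(f"{name}: theta = {theta:5.1f} deg, rho = {rho:.3f}")
```

The neutral face comes out with the smaller θ and higher ρ that the paper attributes to “criminals”, simply because nobody smiles in an ID photo.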

While technical issues can happen even in research papers, it is always important to be aware of the biases introduced by the training data. The real problem with this paper was not that it contained technical flaws (after all, that can happen), but that the researchers claimed to have excluded all possible sources of bias and that their results were therefore completely neutral. They even describe how they originally expected the “hypothesis” (that one can distinguish criminals from non-criminals based on facial images) not to be confirmed, but that, so to speak, “science has spoken”.

To conclude, it is sometimes very difficult to detect bias in a dataset – even for professional researchers. This is why you should always read scientific publications carefully and critically.