ImageNet Roulette and AI Bias

Over the past few weeks, a link made its way around the internet: a minimalist webpage, black text on a white background, with a brief description and a button to upload your own picture for labeling. Once uploaded, the picture was displayed with an inset green box and a label: “fighter pilot,” “chief executive officer,” “alcoholic,” “bitch.” With its quotable, hit-or-miss labels, ImageNet Roulette was taken up idly as an ironic personality test or an edgy filter, at least by those fortunate enough to receive benign labels. Some users were assigned labels naming demographic groups and cultures; others received gendered slurs. Most concerningly, some users (almost always people of colour) were labeled with racial slurs. ImageNet Roulette became disturbing, inscrutable. What kind of artificial intelligence (AI) would label someone an alcoholic? What kind of computer (objective software, right?) would call someone a slur? The kind of AI, it seems, that was trained on biased data. And that is precisely the point.

In order to understand ImageNet Roulette, we first have to understand its foundations. The application relies on a set of categories from the “Person” subsection of ImageNet, a database of categorized images. ImageNet in turn draws its category labels from WordNet, a lexical database composed of “synonym sets,” or “synsets”: groups of words and phrases with similar meanings. WordNet maps semantic connections, arranging words according to their meaning and their relationships to other words and meanings. Developed by researchers at Princeton University, WordNet is linked to a wide array of programs and projects, including ImageNet. It was built by hand, starting in the mid-1980s, meaning that its vocabulary and organization are based on its creators’ judgments and “intuitions.” The creators of WordNet, native English speakers and trained linguists, presumably built the database according to their best knowledge and understanding. The decisions they made about how words connect are baked into the database, and so are their implicit biases and subconscious associations.
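For readers curious about what a synset hierarchy looks like as data, here is a toy sketch in Python. The four synsets and the links between them are invented for illustration and are not WordNet’s actual entries (the real database can be explored through tools such as the NLTK library’s WordNet interface); the class and function names are likewise hypothetical. What the sketch shows is that every “is-a” link, such as pilot under worker under person, is a choice someone made, and that anything placed under “person” becomes a candidate label for a person.

```python
from dataclasses import dataclass


@dataclass
class Synset:
    """A toy synonym set: words sharing one meaning, plus an 'is-a' parent link."""
    name: str
    lemmas: list[str]
    hypernym: "Synset | None" = None  # the more general concept this one sits under

    def path_to_root(self) -> list[str]:
        """Follow hypernym links upward, e.g. pilot -> worker -> person -> entity."""
        path, node = [], self
        while node is not None:
            path.append(node.name)
            node = node.hypernym
        return path


# Hypothetical, hand-built entries (NOT real WordNet data):
# every link below is an editorial choice made by a person.
entity = Synset("entity", ["entity"])
person = Synset("person", ["person", "individual", "someone"], entity)
worker = Synset("worker", ["worker"], person)
pilot = Synset("pilot", ["pilot", "airplane pilot"], worker)


def in_person_subtree(s: Synset) -> bool:
    """True if the synset falls under 'person', the branch ImageNet Roulette draws on."""
    return "person" in s.path_to_root()


print(pilot.path_to_root())      # ['pilot', 'worker', 'person', 'entity']
print(in_person_subtree(pilot))  # True
```

Change a single hypernym link and a word moves in or out of the “person” branch; nothing in the structure itself checks whether that placement is fair or accurate.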

WordNet is the basis for ImageNet, which aims to illustrate each of its synsets with an average of 1,000 images. ImageNet’s stated goal is to create “a useful resource” for “anyone whose research and education would benefit from using a large image database.” ImageNet’s pictures are examined and annotated by people, who make decisions and choose labels based on their own interpretations, leaving their fingerprints, and a layer of inherent bias, on the database.

Then there is ImageNet Roulette itself: a neural network* that uses more than 2,500 labels sourced from ImageNet’s “Person” categories to sort user-submitted images. It was created by Dr. Trevor Paglen and Dr. Kate Crawford, with software developed by Leif Ryge. They describe it as a “provocation,” intended to expose the ripple effects of bias in the datasets we use to train artificial intelligence. According to the ImageNet Roulette site, the application uses a deep learning framework to associate each image with a label. Some of the labels, as many users quickly learned, are distressing: there are racial slurs, there are cultural groups, and some labels are misogynistic. By opening ImageNet (and, by extension, WordNet) to the public in this way, Paglen and Crawford shone a light on these issues and on the broader problem of bias in AI.
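A classifier of this kind can only answer from the list of labels it was given. The short Python sketch below illustrates that constraint; it is not ImageNet Roulette’s actual code. The label list and the scores are invented, and in the real application a trained deep network would produce one score per label for each uploaded photo. The sketch simply turns those scores into probabilities and picks the highest.

```python
import math

# Hypothetical label vocabulary; the real application drew on more than 2,500 "Person" labels.
LABELS = ["fighter pilot", "chief executive officer", "swimmer", "pianist"]


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1 (shifted by the max for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores: list[float]) -> tuple[str, float]:
    """Return the most probable label and its probability.

    The answer is always one of LABELS, however poorly any of them fits the image.
    """
    if len(scores) != len(LABELS):
        raise ValueError("expected one score per label")
    probs = softmax(scores)
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], probs[best]


# Invented scores standing in for a network's output on one uploaded photo.
label, p = classify([0.2, 2.1, -1.0, 0.5])
print(label, round(p, 2))  # chief executive officer 0.72
```

There is no “none of the above” option here: whatever words are in the vocabulary, including slurs, are the only words the system can use to describe a face.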

It can be easy to forget that artificial intelligence and computer programs are created by fallible people. ImageNet Roulette forces us to confront this reality: researchers are biased, and human beings associate images, traits, words, and meanings with racial groups and, to a lesser extent, with genders. ImageNet Roulette can only draw from the base it was given, and it was given a base that consciously incorporated racist terminology. Had Paglen and Crawford not developed their application, and had it not gone semi-viral, would the general public be aware of this? The statement accompanying the application explains that “AI classifications of people are rarely made visible to the people being classified.” Since ImageNet Roulette’s publication, the “Person” categories within ImageNet have been taken down for maintenance. Without ImageNet Roulette, would new AI systems have continued to be trained on these data? Would neural networks still be learning to identify racialized faces as slurs, as criminals?

The particular experience of ImageNet Roulette is, in many ways, the least of our concerns; it is a symptom of a larger issue. Artificial intelligence and neural networks are continuously being designed, developed, and deployed using datasets that may or may not be carefully curated. Computer science is overwhelmingly white, and the share of computer science degrees earned by women has declined since the mid-1980s. If marginalized people are not represented within the groups creating and examining artificial intelligence and neural networks, this issue will not improve. When largely white, largely male development teams are in charge of establishing category parameters, those parameters are far more likely to encode harmful biases, simply because of the limited worldviews and experiences of a more homogeneous, and dominant, group.

When we teach computers using our ways of seeing the world, how are we shaping that lens? Are we shaping it to be inclusive, or to extend harmful social structures?

There are other, better ways to do AI. Paglen and Crawford’s work can be encouraging: someone is paying attention. And now, thanks to them, we are all paying attention.

This article serves as an introduction to Sci+Tech’s new AI column, Alternative Intelligence. Across the world, people are working on alternative, inclusive understandings of artificial intelligence. These efforts are often, if not always, groundbreaking, and the Daily would like to shed light on the people doing that work. Through this focused lens, we will explore new understandings of AI and deep learning that bring new ethical stances, cultural frameworks, and decolonization to AI research and development. If you are interested in contributing to this column, please send an email to scitech@mcgilldaily.com.

*A type of machine learning software loosely inspired by the workings of the human brain. Just as each neuron in the brain handles a small part of a larger problem, a neural network is composed of many simple artificial “neurons,” each handling a small piece of information. In general, the more examples a neural network is trained on, the better it performs.