You may have read recently about machine learning's bias problem, particularly in word embeddings and vectors. It's a massive problem. If you are using word embeddings to generate associative words or phrases, or to do comparisons, you should be aware of the biases you are introducing into your work. In preparation for my EuroPython talk on machine learning with sentiment analysis, I came across some disturbing nearest neighbor vectors when using Google's news vectors[^1] in emotionally charged speech; this prompted me to further investigate the bounds of *isms[^2] in word embeddings.
I must warn you that parts of this post are disgusting, disturbing and awful. If you are having a rough day, feel free to save it for another time. If you are already sick of seeing hateful language, this is likely not a post to read at present. That said, I feel it is my duty as a former journalist to look at it, expose it, and hope to spark better conversations around how we handle both implicit and explicit bias and prejudice in our models.
In my research, not dissimilar to Bolukbasi, Chang, Zou, Saligrama and Kalai's findings, I found word embeddings rife with examples of sexism. Take the following example,
model.most_similar(['lady'], topn=20) produces several expected words, 'woman', 'gentleman', even 'gal' alongside some gems like 'beauty queen', 'FLOTUS' and 'vivacious blonde'. Whereas,
model.most_similar(['gentleman'], topn=20) produces several expected words, 'man', 'gentlemen', 'gent' as well as some flattering terms like 'statesman', 'sportsman' and 'stunningly handsome'.[^3]
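For readers who want to try queries like the two above, here is a minimal sketch of loading the vectors with gensim and listing a word's neighbors. The file name is an assumption (it is the name the Google News vectors are commonly distributed under), and `load_news_vectors` and `neighbor_words` are hypothetical helper names, not part of gensim or of the code used for this post.

```python
def load_news_vectors(path="GoogleNews-vectors-negative300.bin"):
    """Load pretrained word2vec vectors from a binary file.

    The default path is an assumption; point it at wherever you saved the model.
    """
    # Imported lazily so this snippet can be read and imported without gensim.
    from gensim.models import KeyedVectors

    return KeyedVectors.load_word2vec_format(path, binary=True)


def neighbor_words(model, word, topn=20):
    """Return only the neighboring words, dropping the similarity scores."""
    return [neighbor for neighbor, _score in model.most_similar([word], topn=topn)]
```

With the file on disk, something like `neighbor_words(load_news_vectors(), 'lady')` should give a list comparable to the one described above. Expect the load to take a while and use several gigabytes of memory.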
To dive a bit deeper into how these biases play out, let's do some standard analogies. We all know the King-Queen comparison, how might that apply to other professions?
In: model.most_similar(positive= ['doctor', 'woman'], negative=['man']) Out: [('gynecologist', 0.7093892097473145), ('nurse', 0.647728681564331),...]
So, Doctor - Man + Woman = Gynecologist or Nurse. Great! What else?
In: model.most_similar(positive=['professor', 'woman'], negative=['man']) Out: [('associate_professor',0.7771055698394775), ('assistant_professor', 0.7558495402336121),...]
So, Professor - Man + Woman = Associate / Assistant Professor. Now, for something near and dear to me...
In: model.most_similar(positive=['computer_programmer', 'woman'], negative=['man']) Out: [('homemaker', 0.5627118945121765), ('housewife', 0.5105047225952148), ('graphic_designer', 0.505180299282074),...]
So, Computer Programmer - Man + Woman = Homemaker or Housewife. Or graphic designer. Because of course women only do design work (never mind great male designers or amazing female DBAs). Note that these vectors have varying degrees of similarity (shown in the second element of each tuple); the higher the number, the closer the vectors. That said, these are real responses from word2vec.[^4]
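To make the arithmetic behind these analogies concrete, here is a small sketch using made-up two-dimensional vectors. It adds the "positive" vectors, subtracts the "negative" ones, and ranks every other word by cosine similarity to the result. This is a simplification: gensim normalizes each vector before combining them, and the `TOY` vocabulary and `analogy` function are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, -1.0 means opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, positive, negative, topn=3):
    """Rank words by similarity to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    # Skip the query words themselves, as most_similar does.
    exclude = set(positive) | set(negative)
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w not in exclude]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Made-up vectors: axis 0 is roughly "royalty", axis 1 is roughly "gender".
TOY = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.1],
}

result = analogy(TOY, positive=["king", "woman"], negative=["man"])
print(result[0][0])  # the top-ranked word is 'queen'
```

In the real model the same arithmetic runs over 300 dimensions learned from news text, so whatever associations that text carries show up in the ranking.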
I hadn't seen much written about word2vec's racist and xenophobic tendencies, but after playing around with sexism, I assumed I would find some. Again, fair warning that hateful language lies ahead!
In: model.most_similar(positive=['immigrant'], topn=30) Out: [('immigrants', 0.7985076904296875), ('Immigrant', 0.6984704732894897), ('migrant', 0.6784891486167908), ('illegal_immigrant', 0.6712934970855713),...]
So it took our model only until the fourth most similar vector to assume our immigrant is illegal. Scanning the rest of the word list, I found some references to violence tied to immigrants, but no positive associative words.
A few searches into African-American and man, I found that 'Negroes' existed not far from 'african_american' and 'black' + 'man'. Taking a look at the other nearest neighbors,
In: model.most_similar(positive=['Negroes'], topn=30) Out: [('negroes', 0.7197504639625549), ('blacks', 0.6292858123779297), ('Negro', 0.5892727375030518), ('Blacks', 0.5798656344413757), ('negro', 0.5609244108200073), ('slaves', 0.5548534393310547), ('niggers', 0.553610622882843),...]
Yep. Word2Vec just dropped the N-Word in the middle of my search. It's clear that Microsoft isn't the only one with potential racist bot abuse.
There were many more offensive phrases I found, many of which I didn't save or write down, as I could really only stomach 5 minutes of research at a time before I needed a mental and spiritual break. Here is a summary of some that I remembered and was able to find again:
- mexicans => illegals, beaners
- asians => gooks, wimps
- jews => kikes
- asian + woman => teenage girl, sucking dick
- gay + man => "horribly, horribly deranged"
- transsexual + man => convicted rapist[^5]
I'm certain these are not the only *isms that lie in the vectors. Although these offensive vectors are often not the top similar result, we can see that hidden inside these word embeddings are offensive, demeaning, repulsive mirrors of the *isms in our society. Journalists are not always unbiased, and the news itself often contains quotes, references and other pointers to things we might rather not see or confront. Training our language models on the news therefore exposes them to the *ism-rich underbelly of our society.
We, as data scientists and computer programmers, should recognize these statistical certainties in our data. I will note that doing similar searches in the Wikipedia vectors produced far less offensive and hateful speech. I would be curious whether other vector models trained on different texts can help us produce ethical models for our use, or whether we can build on findings around unlearning bias[^6].
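The idea from Bolukbasi et al. that bias can be expressed as a direction can be sketched with toy vectors. The example below takes the difference between two word vectors as a "gender direction", measures how strongly other words lean along it, and then projects that component out. It is an illustration under simplifying assumptions, not the paper's full method: the vectors are made up, the function names are hypothetical, and the paper derives its direction from several word pairs rather than one.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unit(u):
    """Scale a vector to length 1."""
    norm = math.sqrt(dot(u, u))
    return [a / norm for a in u]


def bias_direction(vectors, pole_a="woman", pole_b="man"):
    """Unit vector pointing from pole_b toward pole_a."""
    diff = [a - b for a, b in zip(vectors[pole_a], vectors[pole_b])]
    return unit(diff)


def bias_score(vec, direction):
    """Cosine between a word vector and the bias direction."""
    return dot(unit(vec), direction)


def neutralize(vec, direction):
    """Remove the component of vec that lies along the bias direction."""
    component = dot(vec, direction)
    return [v - component * d for v, d in zip(vec, direction)]


# Made-up vectors: axis 0 carries "gender", axis 1 carries everything else.
TOY = {
    "man": [1.0, 0.2],
    "woman": [-1.0, 0.2],
    "homemaker": [-0.6, 0.8],
    "programmer": [0.6, 0.8],
}

direction = bias_direction(TOY)
homemaker_score = bias_score(TOY["homemaker"], direction)    # leans toward 'woman'
programmer_score = bias_score(TOY["programmer"], direction)  # leans toward 'man'
neutral_programmer = neutralize(TOY["programmer"], direction)
neutral_score = bias_score(neutral_programmer, direction)    # no lean left
```

A positive score means the word leans toward the first pole and a negative score toward the second. After neutralizing, the score along that one direction is zero, though any bias encoded in other dimensions is untouched.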
Confronting racism, sexism, heteronormativeism and likely many other *isms in our models is not something we can avoid or ignore: it's already here and at work. Taking a raw look at it and determining how we then treat our broken models is a step we will all be forced to take either now or later.
*Note: if you find other isms or are working on anything related to challenging bias in machine learning, I would love to hear from you! Feel free to reach out in the comments, by email (katharine at kjamistan) or on social media.*
[^1] To download the model used in this post and read about how the model was developed, go to Google's original word2vec release. tl;dr: it was trained on roughly 100 billion words of English-language news articles (the Google News dataset) and contains 300-dimensional vectors for 3 million words and phrases.
[^2] For the purpose of this post, isms will be used to represent a variety of oppressive societal constructs such as racism, sexism and heterosexism. I am certain there are more hidden isms in word embeddings, as well as more examples of these *isms in the news vectors, in other embedding models and in other languages.
[^3] Mind you: I was surprised at that one! Indeed, it shows the inherent cultural bias of judging all genders by our looks -- another *ism in our social language.
[^4] To see the entire code yourself, check out my github.
[^5] I found 'transsexual' via searching for 'transgender'.
[^6] Bolukbasi, Chang, Zou, Saligrama and Kalai's research was also able to show bias can be expressed as a directional vector(!!!). We could possibly use machine learning to unlearn the aforementioned biases.