A 630-Billion-Word Internet Analysis Shows ‘People’ Is Interpreted as ‘Men’


What do you visualize when you read words such as “person,” “people” or “individual”? Chances are the image in your head is of a man, not a woman. If so, you are not alone. A massive linguistic analysis of more than half a trillion words concludes that we assign gender to words that, by their very definition, should be gender-neutral.

Psychologists at New York University analyzed text from nearly three billion Web pages and compared how often words for person (“individual,” “people,” and so on) were associated with terms for a man (“male,” “he”) or a woman (“female,” “she”). They found that male-related words overlapped with “person” more frequently than female words did. The cultural concept of a person, from this perspective, is more often a man than a woman, according to the study, which was published on April 1 in Science Advances.

To conduct the study, the researchers turned to an enormous open-source data set of Web pages called the Common Crawl, which pulls text from everything from corporate white papers to Internet discussion forums. For their analysis of the text, which totaled more than 630 billion words, the researchers used word embeddings, a computational linguistic technique that represents each word by the words that tend to appear around it and then scores two words as similar when they are used in similar contexts.
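
For readers who want to see the idea in miniature, here is a rough Python sketch of how text can be turned into per-word context counts. It is a simplified, count-based stand-in and not the researchers' method: real word embeddings are dense vectors learned from billions of words. The function name `cooccurrence_vectors`, the window size and the two-sentence corpus are all invented for illustration.

```python
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:  # a word is not its own neighbor
                    vectors[word][tokens[j]] += 1
    return vectors

# Toy corpus, made up for this example (an assumption, not study data).
corpus = [
    "the doctor treated the patient",
    "the nurse treated the patient",
]
vectors = cooccurrence_vectors(corpus)
```

In this toy corpus "doctor" and "nurse" never share a sentence, yet they end up with identical neighbor counts, which is the sense in which such methods treat words used in similar contexts as similar.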

“You can take a word like the word ‘person’ and understand what we mean by ‘person,’ how we represent the word ‘person,’ by looking at the other words that we often use around the word ‘person,’” explains April Bailey, a postdoctoral researcher at N.Y.U., who conducted the study. “We found that there was more overlap between the words for people and words for men than words for people and the words for women…, suggesting that there is this male bias in the concept of a person.”
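
The kind of comparison Bailey describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's code: it assumes each word already has a numeric vector, it uses cosine similarity (a common choice for comparing embeddings) and it averages that score over word pairs. The helper names `cosine` and `mean_similarity` are hypothetical, and the three-number vectors are arbitrary placeholders that say nothing about real language.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: 1.0 for vectors pointing the same way, 0.0 for orthogonal ones."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def mean_similarity(targets, attributes, emb):
    """Average cosine similarity over every (target, attribute) word pair."""
    scores = [cosine(emb[t], emb[a]) for t in targets for a in attributes]
    return float(np.mean(scores))

# Placeholder vectors, invented for illustration only (an assumption).
# Real embeddings have hundreds of dimensions and are learned from text.
toy = {
    "person": np.array([1.0, 0.0, 0.0]),
    "people": np.array([0.0, 1.0, 0.0]),
    "he": np.array([1.0, 1.0, 0.0]),
    "she": np.array([0.0, 0.0, 1.0]),
}

person_words = ["person", "people"]
male_score = mean_similarity(person_words, ["he"], toy)
female_score = mean_similarity(person_words, ["she"], toy)
```

With vectors trained on real text, comparing two averages like `male_score` and `female_score` is one simple way to ask whether words for people sit closer to words for men or to words for women.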

Scientists have previously studied gender bias in language, such as the idea that women are more closely associated with family and home life and that men are more closely linked with work. “But this is the first to study this really general gender stereotype—the idea that men are sort of the default humans—in this quantitative computational social science way,” says Molly Lewis, a research scientist at the psychology department at Carnegie Mellon University, who was not involved in the study.

The researchers also looked at verbs and adjectives commonly used to describe people—for example, “extrovert”—and found that they were more tightly linked with words for men than those for women. When the team tested stereotypically gendered words, such as “brave” and “kill” for male individuals or “compassionate” and “giggle” for female ones, men were associated equally with all of the terms, while women were most closely associated with those considered stereotypically female.

This finding suggests that people “tend to think about women more in gender-stereotypical terms, and they tend to think of men just in generic terms,” Bailey says. “They’re thinking about men just as people who can do all kinds of different things and thinking about women really specifically as women who can only do gender-stereotypical things.”

One possible explanation for this bias is the gendered nature of many supposedly neutral English words, such as “chairman,” “fireman” and “human.” A way to potentially counteract our biased way of thinking is to replace those words with truly gender-neutral alternatives, such as “chairperson” or “firefighter.” Notably, the study was conducted using primarily English words, so it is unknown whether the findings translate to other languages and cultures. Various gender biases, however, have been found in other languages.

While the bias of thinking “person” equals “man” is somewhat conceptual, the ramifications are very real because this tendency shapes the design of the technologies around us. Women are more likely to be severely injured or die in a car crash because when car manufacturers design safety features, the default user they envision (and the crash-test dummy they use) is a male individual with a heavier body and longer legs than the average woman.

Another important implication has to do with machine learning. Word embeddings, the same linguistic tools employed in the new study, are used to train artificial intelligence programs. That means any biases that exist in a source text will be picked up by such an AI algorithm. Amazon faced this problem when it came to light that an algorithm the company hoped to use to screen job applicants was automatically excluding women from technical roles—an important reminder that AI is only as smart, or as biased, as the humans who train it.


