She Said, AI Said: How Gender Bias Creeps Into Chatbots
New research finds that training text is overwhelmingly written by men, embedding in ChatGPT perspectives and stylistic differences that could have far-reaching implications
The rollouts of various artificial intelligence tools have been awash in sexism.
Cases in point: Google’s translation defaulted to masculine versions of words like “doctor” in languages that have gendered nouns; an Amazon hiring algorithm dinged female applicants; and image generators whipped up sexualized images of women while creating professional versions of men.
Tech companies have tried to clean up the results with varying success. For example, Google now offers up both masculine and feminine versions of translations.
Some of these issues stem from datasets that skew toward male-written sources, a known source of bias, and now it seems the same problem besets ChatGPT. New research finds that training text is overwhelmingly written by men, embedding in ChatGPT sometimes subtle perspectives and stylistic differences that could have far-reaching implications.
“Gender is an important lens that, for good or ill, shapes our interaction with the world,” Jessica Kuntz, policy director at the University of Pittsburgh Institute for Cyber Law and co-author of the paper, told The Messenger. “And what we write is a reflection of how we view the world.”
The paper estimates that women wrote 26.5% of the billions of words that made up ChatGPT’s training data. Kuntz and her co-author, University of Pittsburgh postdoctoral associate Elise C. Silva, got to that number by peering into OpenAI’s disclosure of the model and breaking down those sources. At least part of the data comes from four places: Common Crawl, an open-source collection composed of over 50 billion webpages; upvoted Reddit comments; a dataset of books called Books1/Books2; and Wikipedia.
It is well documented that the majority of Reddit and Wikipedia users are male, with just 31% and 9%, respectively, of authors on those sites being women. Common Crawl was “extraordinarily difficult” to generalize because of its diverse group of sites, according to the researchers, but by taking the top three sources as “broadly representative,” they settled on 19% of that dataset being authored by women.
Books1/Books2 was the only category that potentially had an equal split of genders because of the recently achieved parity in mainstream publishing and women over-indexing in the self-publishing space, although Kuntz points out this is the least understood dataset (and the reason behind some recent lawsuits from authors whose books were likely used in the training). Averaging these figures got the researchers to the 26.5% result.
Kuntz acknowledges that this is a simplification of gender that does not attempt to tease out nonbinary authorship, and that researchers need to look at other lenses that shape a writer’s world, like race and socioeconomic status, for an even more complete picture of potential issues. The breakdown is at best “vaguely educated guesses,” but the fact that it had to be guessed at is another problem to address.
“Transparency has become a catchphrase – but it is also a necessary precondition to evaluate these tools for bias, potential misuse, and accuracy,” Kuntz and Silva write in the paper.
Words Matter
Companies like OpenAI attempted to use data sources with limited sexist text, but even the most innocuous writing could affect how chatbots respond. Women write differently than men, according to multiple studies.
Beyond the use of more tentative language, women use more pronouns, making their writing more personal. Women also tend to write about different things: one study of blogs found that women focused more on the personal aspects of their lives, while men wrote more about politics, technology and money. Kuntz said she noticed some of these gendered differences in her own writing.
“It's quantifiably different,” Kuntz said. “Now that it's been pointed out, I can't not see it.”
Right now, chatbots like ChatGPT aren’t integrated into daily life or systems we use regularly. But an AI powered by text written primarily by men that is widely used by governments or companies could have a negative impact. For example, Kuntz points out that if these tools end up being integrated into law enforcement or the judicial system, they could subtly reinforce gendered notions around sexual assault and abuse.
On the other hand, Kuntz said that ChatGPT could be better at doing things like writing a resume and really leaning into accomplishments with less tentative language, something some job seekers struggle to do.
“If you feed in your humble resume,” she said, “ChatGPT amplifies it by a factor of 10.”
The paper looked at GPT-3 since OpenAI chose not to share any details about the more recently released GPT-4, including training data, arguing that “both the competitive landscape and the safety implications” precluded the company from revealing information. More disclosure about datasets could go a long way in identifying issues before any harm occurs, Kuntz said.
“We're going to give these tools a prominent place in our society,” Kuntz said. “It does feel like we ought to be very intentional about shaping the choices of what goes into them.”
