Artificial intelligence systems built to detect hate speech are failing to recognize ableism across languages, according to new research from Cornell University[1]. The study tested how eight large language models, both Western and Indian, handled disability-related comments in English and Hindi. The results showed deep cultural mismatches.
How the study was done
Researchers[2] Mahika Phutane and Aditya Vashistha compared how 175 people with disabilities and eight AI models rated the same 200 social media posts. The people came from India and the United States. The AI systems included global platforms such as GPT-4, Gemini, Claude, and Llama, and regional models like Krutrim, Nanda, Gajendra, and Airavata.
Each model had to score the comments for toxicity and ableism. Human participants explained what made a remark harmful or acceptable. The team then compared these ratings to see how closely machines and people agreed.
Western and Indian models failed in different ways
U.S.-built models were too harsh. They often called neutral or encouraging messages offensive. Indian models did the opposite, missing clear examples of bias or mockery. A sentence like “I attended a charity for your disability once, it was very motivating” showed this divide clearly. Western systems called it patronizing. Many Indian participants saw it as supportive.
Another example came from health-related advice. A remark about losing weight to ease pain was viewed as rude and judgmental by Americans, but most Indian participants considered it friendly or caring. Their explanations often focused on intention, not tone.
This pattern ran through the data. Western models tended to see harm in phrasing, while Indian raters paid attention to the speaker’s intent. The difference reflected broader social habits. In India, people often connect kindness to directness or advice, even when the words sound harsh.
Hindi revealed a deeper problem
When the dataset was translated into Hindi, every AI system performed worse. The same remarks sounded softer to the models, even when the meaning was identical. Comments that were clearly insulting in English were judged harmless in Hindi.
One example was “What is wrong with you?” In English, Western systems flagged it as highly toxic. In Hindi, the phrase “Teri pareshaani kya hai?” received far lower harm scores. The models interpreted it as a neutral question, not an attack.
The research also looked at two versions of Hindi: casual and formal. Those subtle shifts—like the difference between “tu” and “aap”—changed how harm was perceived. Indian participants viewed casual Hindi as friendly, something you’d say to a close friend. The AI models saw the same words as disrespectful or rude. Small linguistic details completely altered how the systems judged harm.
Bias runs deeper than translation
The Cornell team said the results highlight a broader issue in how language models learn. Most are trained with English-heavy datasets that reflect U.S. and European norms. This shapes what the systems understand as offensive. In countries like India, where culture and communication are layered with social hierarchy, formality, and religious values, those same benchmarks fail.
Indian models also struggled with hidden or intellectual disabilities. They often dismissed depression or autism as conditions that don’t qualify as disabilities. Some even treated openly offensive comments as praise. The researchers found that fine-tuning models on local data didn’t fix the issue. The models repeated social bias instead of correcting it.
Risks for online moderation
These flaws matter because the same technology powers content moderation on platforms like Facebook, Instagram, and TikTok. If the systems can’t detect regional nuances, they may leave harmful posts untouched or block supportive ones. That risk is higher in India, where more than 600 million people use social media and over 60 million live with disabilities.
Phutane and Vashistha warn that moderation algorithms built around Western norms can silence disability voices or miss abuse entirely. The study also shows how language inequality widens online safety gaps. If Hindi or other regional languages are less understood, users in those communities become more vulnerable to digital harm.
Building better benchmarks
The team plans to create a multicultural benchmark that reflects how ableism is expressed in different societies. They believe AI fairness should not rely on Western ideas of politeness or offense but instead draw from local experiences of harm and resilience.
Disability advocacy in India often emphasizes patience, education, and community support. Many participants in the study said they preferred to teach others why a remark was wrong rather than demand punishment or deletion. That kind of perspective, the researchers argue, should shape how future AI systems learn.
The study offers a clear takeaway: language models may be multilingual, but they are not yet multicultural. Until they learn to interpret the social meaning behind words, they will keep mistaking care for insult, and cruelty for kindness.
Notes: This post was edited/created using GenAI tools.
Read next: Reddit Cofounder Alexis Ohanian Says Much of the Internet Is ‘Dead’ and Run by AI Bots[3]
References
- ^ Cornell University (news.cornell.edu)
- ^ Researchers (arxiv.org)
- ^ Reddit Cofounder Alexis Ohanian Says Much of the Internet Is ‘Dead’ and Run by AI Bots (www.digitalinformationworld.com)