A new experiment shows how tone shifts model reasoning.

A polite “please” might be good manners, but it could make your AI assistant slightly less accurate. A new study from Pennsylvania State University suggests that ChatGPT performs better when asked questions in a blunt or even mildly rude tone.

The research[1], titled "Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy," tested how OpenAI’s GPT-4o model handled questions written with different levels of politeness. The team built a small but controlled dataset of fifty multiple-choice questions in subjects such as math, science, and history. Each was rewritten in five tones: very polite, polite, neutral, rude, and very rude, for a total of 250 unique prompts.
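To picture the setup, here is a minimal sketch of how such a prompt set could be assembled; the tone labels and prefix wording below are placeholders for illustration, not the authors’ exact prompts.

```python
# Sketch: build tone variants of a question set (illustrative wording, not the study's prompts).
TONE_PREFIXES = {
    "very_polite": "Would you be so kind as to answer the following question? ",
    "polite": "Could you please answer this question? ",
    "neutral": "",
    "rude": "If you're not completely clueless, answer this: ",
    "very_rude": "You poor creature, see if you can manage this: ",
}

def build_prompts(questions):
    """Pair every base question with every tone prefix (50 questions x 5 tones = 250 prompts)."""
    return [
        {"question_id": i, "tone": tone, "prompt": prefix + q}
        for i, q in enumerate(questions)
        for tone, prefix in TONE_PREFIXES.items()
    ]

if __name__ == "__main__":
    demo = ["What is 12 * 11?", "Which planet is closest to the Sun?"]
    for row in build_prompts(demo)[:5]:
        print(row["tone"], "->", row["prompt"])
```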

When the model was asked to answer the same questions under these five tone conditions, it performed best when the prompt was less respectful. On average, very polite questions were answered correctly 80.8 percent of the time, while the accuracy rose to 84.8 percent when the language became openly dismissive. Even small differences mattered: neutral phrasing scored 82.2 percent and rude language 82.8 percent.

Testing politeness like a science experiment

The researchers, Om Dobariya and Akhil Kumar, ran multiple tests and applied a paired-sample t-test to confirm that the differences were statistically significant. In practical terms, the gap between tone conditions is unlikely to be explained by chance. The team repeated each question ten times, resetting the model for every run so that earlier answers could not carry over and bias later ones.
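For readers who want a feel for the statistics, a paired-sample t-test compares the same questions under two tone conditions, so each question acts as its own control. A rough sketch with scipy, using invented accuracy numbers rather than the study’s data:

```python
# Sketch: paired-sample t-test on per-question accuracy under two tone conditions.
# The numbers are simulated for illustration; they are not the study's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Per-question accuracy over 10 repeated runs, for the same 50 questions under each tone.
very_polite = rng.binomial(n=10, p=0.808, size=50) / 10
very_rude = rng.binomial(n=10, p=0.848, size=50) / 10

# Paired test: the same question appears in both conditions, so compare them pairwise.
t_stat, p_value = stats.ttest_rel(very_rude, very_polite)
print(f"mean very polite: {very_polite.mean():.3f}")
print(f"mean very rude:   {very_rude.mean():.3f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```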

Each tone level used a set of example phrases. The polite group included lines such as “Could you please solve this problem?” The rude versions used wording like “If you’re not completely clueless, answer this.” The harshest category included insults such as “You poor creature, do you even know how to solve this?” and “Hey gofer, figure this out.”

The idea wasn’t to provoke the chatbot, but to measure whether the framing of a task changed how clearly it reasoned. The researchers describe tone as a “pragmatic cue” that shapes how a model interprets intent. They found that polite wording tends to include indirect or redundant phrasing, which could distract the system from the main task. By contrast, blunt language gives the model a cleaner instruction with less linguistic clutter.

Rudeness helps, but only in narrow tests

The team compared its findings with an earlier 2024 study from Japan’s Waseda University and RIKEN that reported the opposite trend. That earlier work used older models such as GPT-3.5 and found that rude prompts often reduced performance. Dobariya and Kumar argue that newer models like GPT-4o may respond differently because they are trained to prioritize directness over mimicry of human politeness.

In a follow-up note, the Penn State team said it plans to test other systems such as Claude and GPT-o3 to see whether the pattern holds. They also suggest that future work should explore how “perplexity” (a measure of how predictable a sequence of words is) might explain the result. Shorter, more direct prompts often have lower perplexity, which may make it easier for the model to parse intent.
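Perplexity has a simple definition: the exponential of the average negative log-probability a language model assigns to each token, so lower values mean more predictable text. A rough sketch of how prompt perplexity could be compared, using GPT-2 through Hugging Face transformers as a stand-in (GPT-4o’s token probabilities are not exposed this way, and the study did not publish this analysis):

```python
# Sketch: compare perplexity of a wordy polite prompt vs. a blunt one with an open model.
# GPT-2 is a stand-in here; this is not part of the Penn State study's published method.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """exp(mean negative log-likelihood per token) under the language model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy over tokens
    return torch.exp(loss).item()

print(perplexity("Could you please be so kind as to solve this problem for me?"))
print(perplexity("Solve this problem."))
```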

Don’t take it as advice to be rude

The authors made it clear that the study doesn’t endorse hostile or toxic interaction styles. In their ethics section, they caution that using demeaning language with AI tools could normalize aggressive behavior toward humans as well. Their goal, they wrote, is to understand how surface tone affects reasoning, not to promote verbal hostility.

Even with its limits (the experiment involved only one model and a small question set), the study highlights an odd shift in AI behavior. As chatbots evolve, their logic appears to depend less on social cues and more on how clearly tasks are phrased. For users, that means a direct, even curt, question might get the job done faster than a polite request.

In the age of conversational AI, it seems manners may not make the machine.

Notes: This post was edited/created using GenAI tools.

Read next:

• AI Misreads Disability Hate Across Cultures and South Asian Languages, Cornell Study Finds[2]

• X to Add More Profile Details to Help Users Judge Authenticity[3]

By admin