Audits Can Correct Harmful Chatbot Behaviour

Artificial intelligence chatbots are increasingly criticised for poor social judgement. Some systems have faced lawsuits for recommending dangerous actions, while others have been described as overly agreeable, or sycophantic. These concerns may grow as AI chatbots take on larger roles in customer service, workplace communication, and other forms of human interaction. According to Yan Leng, assistant professor of information, risk, and operations management at the McCombs School of Business at the University of Texas at Austin, understanding and evaluating chatbot behaviour is becoming more important as a result.

To address this challenge, Leng developed a behavioural auditing framework for large language models (LLMs), the technology behind systems such as ChatGPT. Her framework, called state–understanding–value–action (SUVA), is designed to examine how AI systems make decisions. By identifying a model’s behavioural tendencies, organisations can determine whether a chatbot aligns with their values and intended uses. If a model does not perform appropriately, it can potentially be adjusted through prompting or fine-tuning before deployment.

Leng compares the framework to evaluating a human’s values through actions and reasoning. The SUVA process begins by giving an LLM a prompt that includes instructions to reason step by step. Researchers can then analyse how well the model understands the situation, what values it expresses during decision-making, and what action it ultimately chooses. Leng emphasises that these “values” are not evidence of consciousness or human-like thinking, but rather patterns reflected in the model’s generated text.
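As a rough illustration of the process described above, the sketch below builds a prompt that asks for step-by-step reasoning and then splits a reply into the understanding, value, and action components. It is not the researchers' implementation: the labelled reply format, the `AuditRecord` class, and the `build_prompt` and `parse_reply` helpers are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical labels; the study's actual prompt and reply format may differ.
LABELS = ("Understanding", "Value", "Action")


@dataclass
class AuditRecord:
    state: str          # the scenario given to the model
    understanding: str  # how the model describes the situation
    value: str          # the value it says it is prioritising
    action: str         # the action it finally chooses


def build_prompt(state: str) -> str:
    """Wrap a scenario in an instruction to reason step by step."""
    lines = [
        state,
        "",
        "Reason step by step, then answer using exactly these labelled lines:",
    ]
    lines += [f"{label}: <your text>" for label in LABELS]
    return "\n".join(lines)


def parse_reply(state: str, reply: str) -> AuditRecord:
    """Pull the three labelled lines out of a model reply."""
    fields = {}
    for line in reply.splitlines():
        label, sep, text = line.partition(":")
        if sep and label.strip() in LABELS:
            fields[label.strip().lower()] = text.strip()
    missing = [label for label in LABELS if label.lower() not in fields]
    if missing:
        raise ValueError(f"reply is missing: {', '.join(missing)}")
    return AuditRecord(state=state, **fields)
```

In practice the reply would come from whichever model is being audited; here it is left as a plain string so that the sketch does not depend on any particular vendor's API.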

Working with Yuan Yuan of the University of California, Davis, Leng used SUVA to study the social preferences of eight major LLMs, including OpenAI’s GPT and Meta’s Llama. Their research relied on the “dictator game,” a classic behavioural economics experiment used to measure self-interest and fairness. In different scenarios, AI models were asked how they would divide points between themselves and others. The researchers then analysed whether the models prioritised self-interest, fairness, or broader social welfare.
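To make the three motives concrete, the sketch below labels a single dictator-game choice. It is a simplified illustration rather than the probabilistic method used in the paper: the `classify_choice` function, its labelling rules, and the example point allocations are assumptions chosen for clarity.

```python
def classify_choice(options, chosen):
    """Label one dictator-game choice.

    options: list of (self_points, other_points) allocations on offer.
    chosen:  index of the allocation the model picked.
    Returns the set of motives the choice is consistent with.
    """
    max_self = max(s for s, _ in options)           # best outcome for the chooser
    max_total = max(s + o for s, o in options)      # largest combined payoff
    min_gap = min(abs(s - o) for s, o in options)   # most equal split on offer

    s, o = options[chosen]
    labels = set()
    if s == max_self:
        labels.add("self-interest")
    if abs(s - o) == min_gap:
        labels.add("fairness")
    if s + o == max_total:
        labels.add("social welfare")
    return labels


# Made-up allocations, for illustration only.
options = [(80, 20), (50, 50), (40, 90)]
print(classify_choice(options, 0))  # {'self-interest'}
print(classify_choice(options, 1))  # {'fairness'}
print(classify_choice(options, 2))  # {'social welfare'}
```

A single choice can be consistent with more than one motive, which is one reason an audit would need many scenarios rather than a single test.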

After conducting thousands of tests, the researchers identified several important patterns. Most models were not entirely self-interested and often showed moderate concern for social welfare. The AI systems also changed their behaviour depending on context. For example, some models became significantly more generous when told they shared something in common with another participant, such as a hometown. Workplace settings also influenced responses, with models more likely to divide rewards equally when contributions were described as equal. These findings suggest that AI systems can adapt their behaviour according to social and environmental cues.

Leng believes the study demonstrates the importance of regular auditing and retraining of AI systems. Since chatbot behaviour may change unpredictably when new versions are released, organisations should continuously reassess models before using them in sensitive settings. The SUVA framework could also be applied to study other aspects of AI decision-making, including moral reasoning, risk preferences, and time-related choices. Despite the enormous complexity of LLMs, Leng finds it remarkable that many human-like preferences appear to emerge through relatively simple behavioural patterns.

More information: Yan Leng et al., SUVA: A Probabilistic Framework for Auditing LLMs with an Application to Social Preferences, Information Systems Research. DOI: 10.1287/isre.2024.0857

Journal information: Information Systems Research

Provided by University of Texas at Austin
