HUMANLM: Simulating Users with State Alignment Beats Response Imitation
Jan 1, 2026
Shirley Wu
Evelyn Choi
Arpandeep Khatua
Zhanghan Wang
Joy He-Yueya
Tharindu Cyril Weerasooriya
Wei Wei
Diyi Yang
Jure Leskovec
James Zou
Abstract
Large Language Models (LLMs) are increasingly used to simulate how specific users respond to any context, enabling more user-centric applications that rely on user feedback. However, existing user simulators mostly imitate surface-level patterns and language styles, which fails to reflect the underlying state of real users (e.g., beliefs, emotions). To address these limitations, we propose a novel training framework, HUMANLM, which builds user simulators that accurately reflect real users. Our key insight is that, in addition to generating responses, we generate natural-language latent states that align with the ground-truth responses through reinforcement learning. These latent states correspond to a set of state dimensions that psychologically drive how real users respond. HUMANLM further synthesizes these aligned latent states into responses that accurately represent real users. For extensive evaluation, we develop HUMANUAL, a comprehensive benchmark for simulating real users based on public data. HUMANUAL consists of six large-scale datasets with 23k users and 227k responses in total. It spans diverse tasks such as generating user responses to daily life issues, political blogs, and chat sessions with LLM assistants. Across the datasets, HUMANLM significantly outperforms the best alternative approaches by an average relative improvement of 16.3% on the alignment score from an LLM judge. In a real-time simulation study with 111 participants, HUMANLM achieves the highest scores on similarity to real user responses and humanlikeness.
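The sketch below illustrates the two-stage generation the abstract describes: first verbalizing a latent user state, then synthesizing the response, with a judge-style reward that could score state alignment against the ground-truth response for reinforcement learning. All function names, prompts, and the `llm`/`judge_llm` callables are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal conceptual sketch, assuming generic `llm(prompt) -> str` callables.
# Names, prompts, and state dimensions are hypothetical placeholders.

STATE_DIMENSIONS = ["beliefs", "emotions", "goals", "attitudes"]

def generate_latent_state(llm, user_profile: str, context: str) -> str:
    """Stage 1: verbalize the user's latent state along fixed dimensions."""
    prompt = (
        f"User profile:\n{user_profile}\n\nContext:\n{context}\n\n"
        f"Describe this user's current {', '.join(STATE_DIMENSIONS)} "
        "in natural language."
    )
    return llm(prompt)

def synthesize_response(llm, latent_state: str, context: str) -> str:
    """Stage 2: synthesize the latent state into the simulated user response."""
    prompt = (
        f"Latent state:\n{latent_state}\n\nContext:\n{context}\n\n"
        "Write the response this user would give."
    )
    return llm(prompt)

def alignment_reward(judge_llm, latent_state: str, gold_response: str) -> float:
    """RL reward signal: how consistent the verbalized state is with the real response."""
    prompt = (
        f"Latent state:\n{latent_state}\n\n"
        f"Real user response:\n{gold_response}\n\n"
        "On a scale from 0 to 1, how consistent is the state with the response? "
        "Answer with a number only."
    )
    try:
        return float(judge_llm(prompt).strip())
    except ValueError:
        return 0.0
```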