HUMANLM: Simulating Users with State Alignment Beats Response Imitation
Jan 1, 2026
Shirley Wu
Evelyn Choi
Arpandeep Khatua
Zhanghan Wang
Joy He-Yueya
Tharindu Cyril Weerasooriya
Wei Wei
Diyi Yang
Jure Leskovec
James Zou
Abstract
Large Language Models (LLMs) are increasingly used to simulate how specific users respond to any context, enabling more user-centric applications that rely on user feedback. However, existing user simulators mostly imitate surface-level patterns and language styles, which fails to reflect the underlying state of real users (e.g., beliefs, emotions). To address these limitations, we propose a novel training framework, HUMANLM, which builds user simulators that accurately reflect real users. Our key insight is that, in addition to generating responses, we generate natural-language latent states that align with the ground-truth responses through reinforcement learning. These latent states correspond to a set of state dimensions that psychologically drive how real users respond. HUMANLM further synthesizes these aligned latent states into responses that accurately represent real users. For extensive evaluation, we develop HUMANUAL, a comprehensive benchmark for simulating real users based on public data. HUMANUAL consists of six large-scale datasets with 23k users and 227k responses in total. It spans diverse tasks such as generating user responses to daily life issues, political blogs, and chat sessions with LLM assistants. Across the datasets, HUMANLM significantly outperforms the best alternative approaches by an average relative improvement of 16.3% on the alignment score from an LLM judge. In a real-time simulation study with 111 participants, HUMANLM achieves the highest scores on similarity to real user responses and humanlikeness.
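The sketch below illustrates the two-stage generation the abstract describes: first verbalizing a latent user state, then synthesizing the response, with a judge-style reward that could score state alignment against the ground-truth response for reinforcement learning. All function names, prompts, and the `llm`/`judge_llm` callables are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal conceptual sketch, assuming generic `llm(prompt) -> str` callables.
# Names, prompts, and state dimensions are hypothetical placeholders.

STATE_DIMENSIONS = ["beliefs", "emotions", "goals", "attitudes"]

def generate_latent_state(llm, user_profile: str, context: str) -> str:
    """Stage 1: verbalize the user's latent state along fixed dimensions."""
    prompt = (
        f"User profile:\n{user_profile}\n\nContext:\n{context}\n\n"
        f"Describe this user's current {', '.join(STATE_DIMENSIONS)} "
        "in natural language."
    )
    return llm(prompt)

def synthesize_response(llm, latent_state: str, context: str) -> str:
    """Stage 2: synthesize the latent state into the simulated user response."""
    prompt = (
        f"Latent state:\n{latent_state}\n\nContext:\n{context}\n\n"
        "Write the response this user would give."
    )
    return llm(prompt)

def alignment_reward(judge_llm, latent_state: str, gold_response: str) -> float:
    """RL reward signal: how consistent the verbalized state is with the real response."""
    prompt = (
        f"Latent state:\n{latent_state}\n\n"
        f"Real user response:\n{gold_response}\n\n"
        "On a scale from 0 to 1, how consistent is the state with the response? "
        "Answer with a number only."
    )
    try:
        return float(judge_llm(prompt).strip())
    except ValueError:
        return 0.0
```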