I am currently a Ph.D. student at Cornell Information Science. I began my Ph.D. in 2023, and I am fortunate to be advised by David Mimno and Jeff Rzeszotarski.
Most recently, I am excited to have started a research internship at the Microsoft Office of Applied Research, in collaboration with Bahar Sarrafzadeh and Sheshera Mysore!
Before Cornell, I completed an M.Sc. in Computational Linguistics at the University of Washington, after obtaining a B.Sc. in Intelligence Sciences at Peking University. I was especially fortunate to have worked with Noah A. Smith, Shane ST, and Yansong Feng.
At a high level, my research is about natural language data. I use computational methods to explore how human language shapes the world and is, in turn, shaped by emerging technologies like Large Language Models. I seek systematic, data-centered approaches across the full lifecycle of language data, from how data are collected and annotated upstream to how they are forged into downstream social interactions and consensus.
Most recently, my work has focused on natural language as an interface. This spans two contexts: (1) Human-Human Interaction: How does language use lead to (in)effective communication between groups, especially in scientific research? and (2) Human-AI/LLM Interaction: How do we discover and describe users' behaviors, perceptions, and interaction modes from (large-scale) real-world user-LLM conversations? You can find more information in my publications.
Shengqi Zhu, Jeffrey M. Rzeszotarski, David Mimno
ArXiv Preprint
We explore a new approach to studying real-world user-LLM interactions through large-scale chat logs from the wild. Using 140K sessions from 7,955 global users, we highlight (1) interaction patterns (expressions) that are rapidly formed and molded; (2) longitudinal outcomes (text patterns, retention rates) predicted by early exploration; and (3) parallel dynamics beyond rapid molding (task stratification, reactions to model-version updates). These findings reveal an "agency paradox": although LLM input spaces are unconstrained and user-driven, we in fact observe less user exploration.
Shengqi Zhu, Jeffrey M. Rzeszotarski, David Mimno
Findings of the Association for Computational Linguistics: EACL, 2026
We introduce a new framework for examining the language (expressions) of LLM users apart from specific task content, modeling how people contextualize their requests within the conversational format. From there, we study the diachronic evolution of user behaviors through text, a novel and crucial indicator of human-LLM interactions.
Shengqi Zhu, Jeffrey M. Rzeszotarski, David Mimno
Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI LBW), 2025
We discuss the current and future affordances of using large-scale, in-the-wild user activities as a source of qualitative user data, and highlight the major remaining challenges: finer-grained control and more ethical data practices.