Hi! This is Shengqi Zhu.
"He just like me fr fr"
(Source: SpecGram)

I am currently a Ph.D. student at Cornell Information Science. My Ph.D. odyssey started in 2023, and I am fortunate to be advised by David Mimno and Jeff Rzeszotarski.

Most recently, I am excited to be starting my research internship at the Microsoft Office of Applied Research, in collaboration with Bahar Sarrafzadeh and Sheshera Mysore!

Before Cornell, I completed an M.Sc. in Computational Linguistics at the University of Washington, after obtaining a B.Sc. in Intelligence Sciences at Peking University. I was especially fortunate to have worked with Noah A. Smith, Shane ST, and Yansong Feng.

Cornell InfoSci page (with my email address)
Google Scholar · Semantic Scholar · ACL Anthology · ORCID

What am I working on?

At a high level, my research is about natural language data. I use computational methods to explore how human languages shape the world and, at the same time, are shaped by emerging technologies like Large Language Models. I seek systematic, data-centered approaches throughout the lifecycle of language data, from how they are collected and annotated upstream to how they are forged into downstream social interactions and consensus.

Most recently, my work has focused on natural language as an interface. This spans two contexts: (1) Human-Human Interaction: How has language use led to (in)effective communication between groups, especially in the scientific research context? and (2) Human-AI/LLM Interaction: How do we discover and describe users' behaviors, perceptions, and interaction modes from (large-scale) real-world user-LLM conversations? You can find more information in my publications.

Recent Publications & Projects (view all)

Priming, Path-dependence, and Plasticity: Understanding the molding of user-LLM interaction and its implications from (many) chat logs in the wild

Shengqi Zhu, Jeffrey M. Rzeszotarski, David Mimno

arXiv Preprint

We explore a new approach to studying real-world user-LLM interactions through large-scale chat logs from the wild. Drawing on 140K sessions from 7,955 global users, we highlight (1) interaction patterns (expressions) that are rapidly formed and molded; (2) longitudinal outcomes (text patterns, retention rates) predicted by early exploration; and (3) parallel dynamics beyond rapid molding (task stratification, reactions to model-version updates). These findings point to an "agency paradox": despite LLM input spaces being unconstrained and user-driven, we in fact see less user exploration.

[Preprint]

Show or Tell? Modeling the evolution of request-making in Human-LLM conversations

Shengqi Zhu, Jeffrey M. Rzeszotarski, David Mimno

Findings of the Association for Computational Linguistics: EACL 2026

We introduce a new framework for examining the language (expressions) of LLM users apart from the specific task content, modeling how people contextualize their requests within the conversational format. From there, we study the diachronic evolution of user behaviors through text, a novel and crucial indicator of human-LLM interactions.

[Paper]

Data Paradigms in the Era of LLMs: On the Opportunities and Challenges of Qualitative Data in the WILD

Shengqi Zhu, Jeffrey M. Rzeszotarski, David Mimno

Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI LBW), 2025

We discuss the current and future affordances of using large-scale, in-the-wild user activities as a source of qualitative user data, and highlight the major remaining challenges: finer-grained control and more ethical data practices.

[Paper]

All publications

Misc