I am currently a Ph.D. student at Cornell Information Science. My Ph.D. odyssey started in 2023, and I am fortunate to be supported by my great advisors, David Mimno and Jeff Rzeszotarski.
Before Cornell, I completed an M.Sc. in Computational Linguistics at the University of Washington, after obtaining a B.Sc. in Intelligence Sciences at Peking University. I was especially fortunate to have worked with Noah A. Smith, Shane ST, and Yansong Feng.
At a high level, my research is about natural language data. I use computational methods to explore how human languages shape the world and, at the same time, are shaped by emerging technologies like Large Language Models. I seek systematic, data-centered approaches throughout the lifecycle of language data, from how they are collected and annotated upstream to how they are forged into downstream social interactions and consensus.
Most recently, my work has focused on natural language as an interface. This spans two contexts: (1) Human-Human Interaction: how has language use led to (in)effective communication between groups, especially in scientific research? and (2) Human-AI/LLM Interaction: how do we discover and describe users' behaviors, perceptions, and interaction modes from (large-scale) real-world user-LLM conversations? You can find more information in my publications.
Shengqi Zhu, Jeffrey M. Rzeszotarski, David Mimno
arXiv Preprint, August 2025
We introduce a new framework for examining the language (expressions) of LLM users apart from the specific task content, modeling how people contextualize their requests within the conversational format. From there, we study the diachronic evolution of user behaviors through text, a novel and crucial indicator of human-LLM interaction.
Shengqi Zhu, Jeffrey M. Rzeszotarski, David Mimno
Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI LBW), 2025
We discuss the current and future affordances of using large-scale, in-the-wild user activities as a source of qualitative user data, and highlight the major remaining challenges: finer-grained control and more ethical data practices.
Shengqi Zhu, Jeffrey M. Rzeszotarski
Proceedings of the Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), 2025
"Language models" is an evergreen and viral scientific term. What exact models have we used it to refer to? What are the scientific implications for the same term to mean "BERT/GPT-2" in 2019 but entirely different things now? Inspired by the Ship of Theseus, our work studies this Ship of LMs in detail. (image source: SRF Kultur Sternstunden)