Biography
Hi! Nice to meet you! I am Chia-Hsuan (Michael) Lee, currently an applied researcher on Capital One's AI Foundation team, working on language model alignment. I obtained my PhD from the University of Washington, where I was advised by Prof. Mari Ostendorf in the Natural Language Processing Group. I also worked with Prof. Noah A. Smith.
I was a research intern at Google Brain (2022), co-hosted by Ankur Bapna and Yu Zhang; interned at Google Research (2021), hosted by Melvin Johnson; interned at Microsoft Research NLP group (2020), co-hosted by Matthew Richardson and Alex Polozov.
My primary research interests are language modeling and dialogue agents. More specifically:
- Language Modeling: I explore new pretraining strategies for long-context language models (DOCmT5, NAACL 2022). I study a new in-context learning method (IC-DST, EMNLP 2022) and a prompt-tuning method (SDP-DST, EMNLP 2021) for language models. I propose a routing framework that dynamically orchestrates multiple language models during inference (OrchestraLLM, NAACL 2024), and a correction framework that enables SLMs to self-correct using in-context exemplars without LLM involvement (CorrectionLM).
- Dialogue Agents: My focus has been on task-oriented dialogue (structured information extraction), encompassing task-oriented prompting (SDP-DST, EMNLP 2021), few-shot learning with LLMs (IC-DST, EMNLP 2022), and data synthesis from LLM-human interaction (DialGen).
I served on the organizing committee of the Multilingual Information Access Workshop and the program committee of the Structured and Unstructured Knowledge Integration Workshop. I review for EMNLP, ACL, NAACL, COLING, and ARR.
Here is my CV
You can find me at: chiahlee [at] uw [dot] edu
News
- 10/2024: Our paper on SLM self-correction (CorrectionLM) is out on arXiv.
- 09/2024: I have joined Capital One as an applied researcher.
- 08/2024: I have passed my defense and am officially a PhD!
- 03/2024: “OrchestraLLM” is accepted to NAACL 2024 main conference.
- 11/2023: Our paper on orchestrating LLMs with a retrieval-based dynamic router, “OrchestraLLM”, is out on arXiv.
- 07/2023: Our paper on a collaborative human-LM framework for long conversation generation, “DialGen”, is out on arXiv!
- 10/2022: Our paper on in-context learning for task-oriented dialogue, “IC-DST”, is accepted to EMNLP 2022 Findings!
- 06/2022: Started my internship with Ankur Bapna and Yu Zhang at Google Brain, New York.
- 04/2022: Our paper on pretraining multilingual long-context language models, “DOCmT5”, is accepted to NAACL 2022 Findings!
- 02/2022: Our workshop on Multilingual Information Access (MIA) will be held at NAACL 2022 in Seattle! The Call for Papers and Shared Task are out!