Biography

Hi! Nice to meet you! I am Chia-Hsuan (Michael) Lee, currently an applied researcher on the Capital One AI Foundations team, working on post-training for language models. I ran the RL post-training of our first internal foundation model and actively publish research. I obtained my PhD from the University of Washington, where I was advised by Prof. Mari Ostendorf in the Natural Language Processing Group. I also worked with Prof. Noah A. Smith.

I was a research intern at Google Brain (2022), co-hosted by Ankur Bapna and Yu Zhang; at Google Research (2021), hosted by Melvin Johnson; and at the Microsoft Research NLP group (2020), co-hosted by Matthew Richardson and Alex Polozov.

My research focuses on reinforcement learning post-training for large language models, with the aim of improving how models reason and act, not just the answers they produce. My work spans:

  • Fine-Grained Rewards for GRPO: I develop reinforcement learning methods (GRPO / OPD) that improve reasoning capability, including DASH, a drift-aware training method that leverages segment-level rewards to raise accuracy while reducing overthinking. I also proposed CGD (ICML 2026), a framework that integrates teacher-generated explanatory critiques and refined responses into SFT.
  • Competence-Aware On-Policy Distillation: I proposed SEAD, a new on-policy distillation (OPD) framework that leverages both student and teacher entropy to make OPD more effective and efficient.
  • Preference Optimization / Alignment: I lead the DPO workstream for in-house instruction-tuned LMs, spanning data collection/synthesis, training, and evaluation. I also study scaling laws for data-efficient preference learning, showing that a small, carefully selected set of high-quality preference pairs can match datasets several times larger.

Earlier work explored complementary directions in language modeling: long-context pretraining (DOCmT5, NAACL 2022), in-context learning (IC-DST, EMNLP 2022), prompt-tuning (SDP-DST, EMNLP 2021), inference-time routing across models (OrchestraLLM, NAACL 2024), and LLM-free self-correction for small models (CorrectionLM).

Here is my CV

You can find me at: chiahsuan.li [at] gmail [dot] com

Highlights

  • 07/2026: Our paper on reducing overthinking with fine-grained structure-aware rewards (DASH) is out on arXiv.
  • 06/2026: Our paper on efficient on-policy distillation via joint entropy (SEAD) is out on arXiv.
  • 04/2026: Our paper on scaling laws of reasoning preference optimization (Decomposing the Delta) is out on arXiv.
  • 04/2026: Our paper on critique-guided distillation for reasoning LMs (CGD) has been accepted to ICML 2026.
  • 09/2024: I have joined Capital One as an applied researcher.
  • 08/2024: I have passed my defense and am officially a PhD!