Biography
Hi! Nice to meet you! I am Chia-Hsuan (Michael) Lee, currently an applied researcher on Capital One's AI Foundation team, working on language model alignment. I obtained my PhD from the University of Washington, where I was advised by Prof. Mari Ostendorf in the Natural Language Processing Group. I also worked with Prof. Noah A. Smith.
I was a research intern at Google Brain (2022), co-hosted by Ankur Bapna and Yu Zhang; interned at Google Research (2021), hosted by Melvin Johnson; interned at Microsoft Research NLP group (2020), co-hosted by Matthew Richardson and Alex Polozov.
My primary research interests are language modeling and dialogue agents. More specifically:
- Language Modeling: I explore new pretraining strategies for long-context language models (DOCmT5, NAACL 2022). I study a new in-context learning method (IC-DST, EMNLP 2022) and a prompt-tuning method (SDP-DST, EMNLP 2021) for language models. I propose a routing framework that dynamically orchestrates multiple language models during inference (OrchestraLLM, NAACL 2024), and a correction framework that enables SLMs to self-correct using in-context exemplars without LLM involvement (CorrectionLM).
- Dialogue Agents: My focus has been on task-oriented dialogue (structured information extraction), encompassing task-oriented prompting (SDP-DST, EMNLP 2021), few-shot learning with LLMs (IC-DST, EMNLP 2022), and data synthesis from LLM-human interaction (DialGen).
I served on the organizing committee of the Multilingual Information Access Workshop and the program committee of the Structured and Unstructured Knowledge Integration Workshop. I review for EMNLP, ACL, NAACL, COLING, and ARR.
Here is my CV
You can find me at: chiahlee [at] uw [dot] edu
News
- 10/2024: Our paper on SLM self-correction (CorrectionLM) is out on arXiv.
- 09/2024: I have joined Capital One as an applied researcher.
- 08/2024: I have passed my defense and am officially a PhD!
- 03/2024: “OrchestraLLM” is accepted to NAACL 2024 main conference.
- 11/2023: Our paper on orchestrating LLMs with a retrieval-based dynamic router, “OrchestraLLM”, is out on arXiv.
- 07/2023: Our paper on a collaborative human-LM framework for long conversation generation, “DialGen”, is out on arXiv!
- 10/2022: Our paper on in-context learning for task-oriented dialogue, “IC-DST”, is accepted to EMNLP 2022 Findings!
- 06/2022: Started my internship with Ankur Bapna and Yu Zhang at Google Brain, New York.
- 04/2022: Our paper on pretraining multilingual long-context language models, “DOCmT5”, is accepted to NAACL 2022 Findings!
- 02/2022: Our workshop on Multilingual Information Access (MIA) will be held at NAACL 2022 in Seattle! The Call for Papers and Shared Task are out!