Van Dai Do — PhD @ Deakin A2I2

About

My name is Van Dai Do. I build efficient and safe LLMs using reinforcement learning, activation steering with episodic memory, and retrieval-guided decoding. I also work on time-series forecasting, mostly with foundation models. I am currently an Associate Postdoctoral Research Fellow at Deakin University’s Applied AI Institute (A2I2).

News

🔊 I have been invited to present my ACL 2025 poster at the Kingston AI Group Symposium 2026, hosted by the Australian Institute for Machine Learning (AIML) in Adelaide!

Oct 2025

📖 I submitted my PhD thesis, titled “Efficient and Safe Large Language Models with Reinforcement Learning”, after 2 years and 5 months on this journey! I am very grateful for the support of my great supervisors, Dr. Hung Le and Professor Svetha Venkatesh.

Jan 2026

🎉 1 paper accepted to TMLR with Journal-to-Conference (J2C) Certification!

Oct 2025

🎉 1 paper accepted to ICDM 2025

Sep 2025

🎉 I won the 3-Minute Thesis (3MT) competition hosted by Deakin’s A2I2. What a great journey—grateful for the experience and everyone who supported me along the way.

Aug 2025

🎉 1 paper accepted to EMNLP 2025

Aug 2025

🎉 1 paper accepted to ACL 2025

May 2025

📖 I started working as a Research Assistant under Dr. Hung Le's supervision on the Australian Research Council (ARC) DECRA grant, implementing innovative time-series forecasting models.

Feb 2025

🎉 1 paper accepted to ECAI 2024

Jul 2024

📖 Invited tutorial at AAMAS

Oct 2023

🛫 I started my PhD journey at the Applied Artificial Intelligence Initiative (A2I2), Deakin University, Australia, with a full-ride HDR scholarship!

Aug 2023

Projects

Reasoning Under 1 Billion

Guiding reinforcement fine-tuning with an intrinsic reward derived from external episodic memory.

LLM · RFT · Memory

TMLR 2025

Code →

ALEC — Alignment Learning with Episodic Control

Non-parametric inference-time alignment with episodic memory; sample-efficient alignment under sparse feedback.

Alignment · Memory

EMNLP 2025

DSEM — Dynamic Steering with Episodic Memory

Training-free token-level activation steering using episodic memory; adaptive alignment across safety & style.

LLM · Steering

ACL 2025

Code → · Paper →

Prompting with Episodic Memory

RL-based prompt example selection from episodic memory to boost generalization across NLP tasks.

Prompting · RL

ECAI 2024 (Oral)

Code →

Publications

Dynamic Steering With Episodic Memory For Large Language Models · ACL 2025
Van Dai Do, Quan Tran, Svetha Venkatesh, Hung Le — PDF · Code

2025

Sample Efficient Alignment Learning With Episodic Control · EMNLP 2025
Van Dai Do, Quan Tran, Ahmed Kirmani, Lu Zhang, Hung Le

2025

Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models · TMLR 2025
Hung Le, Van Dai Do, Dung Nguyen, Svetha Venkatesh

2025

Accelerating Long-Term Molecular Dynamics with Physics-Informed Time-Series Forecasting · ICDM 2025
Hung Le, Sherif Abbas, Minh Hoang Nguyen, Van Dai Do, Huu Hiep Nguyen, Dung Nguyen

2025

Large Language Models Prompting With Episodic Memory · ECAI 2024 (Oral)
Van Dai Do, Quan Tran, Svetha Venkatesh, Hung Le — Code

2024

Deep RL for Multi-Hop Offloading in UAV-Assisted Edge Computing · IEEE TVT 2023
Nguyen Tien Hoa, Van Dai Do, Le Hoang Lan, Nguyen Cong Luong, Duc Van Le, Dusit Niyato — Link

2023

Contact

Email: v.do@deakin.edu.au · Phone: +61 412 242 886
