About

My name is Van Dai Do. I build efficient and safe LLMs, working on reinforcement learning, activation steering with episodic memory, and retrieval-guided decoding. I also work on time-series forecasting, mostly with foundation models. I am currently an Associate Postdoctoral Research Fellow at Deakin University’s Applied AI Institute (A2I2).

News

🔊 I have been invited to present my ACL 2025 poster at the Kingston AI Group Symposium 2026, hosted by the Australian Institute for Machine Learning (AIML) in Adelaide!
📖 I submitted my PhD thesis titled “Efficient and Safe Large Language Models with Reinforcement Learning” after 2 years and 5 months on this journey! Very grateful for the support of my great supervisors Dr. Hung Le and Professor Svetha Venkatesh.
🎉 1 paper accepted to TMLR with Journal to Conference (J2C) Certification!
🎉 1 paper accepted to ICDM 2025
🎉 I won the 3-Minute Thesis (3MT) competition hosted by Deakin’s A2I2. What a great journey! I am grateful for the experience and to everyone who supported me along the way.
🎉 1 paper accepted to EMNLP 2025
🎉 1 paper accepted to ACL 2025
📖 I started working as a Research Assistant under Dr. Hung Le's supervision on the Australian Research Council (ARC) DECRA grant to implement innovative time-series forecasting models.
🎉 1 paper accepted to ECAI 2024
📖 Invited tutorial at AAMAS
🛫 I started my PhD journey at the Applied Artificial Intelligence Initiative (A2I2), Deakin University, Australia, with a full-ride HDR scholarship!

Projects

Reasoning Under 1 Billion

Guiding reinforcement fine-tuning with an intrinsic reward derived from external episodic memory.

LLM · RFT · Memory

TMLR 2025

Code →

ALEC — Alignment Learning with Episodic Control

Non-parametric inference-time alignment with episodic memory; sample-efficient alignment under sparse feedback.

Alignment · Memory

EMNLP 2025

DSEM — Dynamic Steering with Episodic Memory

Training-free token-level activation steering using episodic memory; adaptive alignment across safety & style.

LLM · Steering

ACL 2025

Code → · Paper →

Prompting with Episodic Memory

RL-based prompt example selection from episodic memory to boost generalization across NLP tasks.

Prompting · RL

ECAI 2024 (Oral)

Code →

Publications

Dynamic Steering With Episodic Memory For Large Language Models · ACL 2025
Van Dai Do, Quan Tran, Svetha Venkatesh, Hung Le — PDF · Code
Sample Efficient Alignment Learning With Episodic Control · EMNLP 2025
Van Dai Do, Quan Tran, Ahmed Kirmani, Lu Zhang, Hung Le
Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models · TMLR 2025
Hung Le, Van Dai Do, Dung Nguyen, Svetha Venkatesh
Accelerating Long-Term Molecular Dynamics with Physics-Informed Time-Series Forecasting · ICDM 2025
Hung Le, Sherif Abbas, Minh Hoang Nguyen, Van Dai Do, Huu Hiep Nguyen, Dung Nguyen
Large Language Models Prompting With Episodic Memory · ECAI 2024 (Oral)
Van Dai Do, Quan Tran, Svetha Venkatesh, Hung Le — Code
Deep RL for Multi-Hop Offloading in UAV-Assisted Edge Computing · IEEE TVT 2023
Nguyen Tien Hoa, Van Dai Do, Le Hoang Lan, Nguyen Cong Luong, Duc Van Le, Dusit Niyato — Link

Contact

Email: v.do@deakin.edu.au · Phone: +61 412 242 886
