Reasoning Under 1 Billion
Guides reinforcement fine-tuning with an intrinsic reward derived from an external episodic memory.
TMLR 2025
My name is Van Dai Do. I build efficient and safe LLMs: reinforcement learning, activation steering with episodic memory, and retrieval-guided decoding. I also work on time-series forecasting, mostly with foundation models. I am currently an Associate Postdoctoral Research Fellow at Deakin University’s Applied AI Institute (A2I2).
Non-parametric inference-time alignment with episodic memory; sample-efficient alignment under sparse feedback.
EMNLP 2025
Training-free token-level activation steering using episodic memory; adaptive alignment across safety & style.
ACL 2025
RL-based prompt example selection from episodic memory to boost generalization across NLP tasks.
ECAI 2024 (Oral)
Email: v.do@deakin.edu.au · Phone: +61 412 242 886