Conference paper
From Tokens to Latent States: Leveraging Pre-trained Language Models for Improving Partially Observable Reinforcement Learning
Proceedings of the AAAI Conference on Artificial Intelligence, Vol.40(27), pp.23003-23011
Association for the Advancement of Artificial Intelligence (AAAI) Conference on Artificial Intelligence, 40th (Singapore, 20-Jan-2026–27-Jan-2026)
AAAI Press
2026
Abstract
Partially observable Markov decision processes (POMDPs) present significant challenges for reinforcement learning, as agents must learn optimal policies while maintaining belief states over unobserved environment states based on partial observations. We observe a compelling analogy: large language models (LLMs) autoregressively generate token probability distributions based on preceding context, mirroring how belief states are maintained and updated in POMDPs. This insight motivates leveraging the rich prior knowledge embedded in pre-trained LLMs for latent-state estimation from observation-action histories. However, two critical challenges emerge: first, modality misalignment prevents LLMs from directly encoding visual observations and discrete actions; second, semantic misalignment exists between observation-action sequences and token sequences. To address these challenges, we introduce a novel framework, ELSLLM, that employs a Johnson-Lindenstrauss projection (JLP) module to transform input dimensions while preserving state similarity with theoretical guarantees, and uses modern Hopfield networks (MHN) to store all word embeddings from pre-trained LLMs as a knowledge repository. Through retrieval and querying mechanisms, ELSLLM achieves token-level knowledge alignment without requiring fine-tuning of the pre-trained LLMs. Extensive experiments on partially observable environments demonstrate that ELSLLM achieves state-of-the-art performance, significantly outperforming baseline methods with and without LSTM memory mechanisms. Our work opens new avenues for integrating pre-trained LLMs with reinforcement learning in partially observable settings.
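The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' ELSLLM implementation: it assumes a plain Gaussian random matrix as the Johnson-Lindenstrauss projection and the standard softmax update of a modern Hopfield network as the retrieval step. All function names, the dimensions (512 and 64), the memory size (10), and the inverse temperature `beta` are hypothetical choices made for illustration, and random vectors stand in for observation-action features and word embeddings.

```python
import math
import random


def make_jl_matrix(in_dim, out_dim, seed=0):
    """Gaussian random matrix with N(0, 1/out_dim) entries (assumed JL variant)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(matrix, x):
    """Map x from in_dim to out_dim by a matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_retrieve(query, memory, beta=1.0):
    """One modern-Hopfield update: softmax-weighted sum of stored patterns."""
    scores = [beta * sum(q * m for q, m in zip(query, pattern))
              for pattern in memory]
    weights = softmax(scores)
    return [sum(w * pattern[i] for w, pattern in zip(weights, memory))
            for i in range(len(query))]


# Hypothetical stand-ins: two 512-dim feature vectors and ten 64-dim "embeddings".
rng = random.Random(1)
features_a = [rng.gauss(0.0, 1.0) for _ in range(512)]
features_b = [rng.gauss(0.0, 1.0) for _ in range(512)]

jl = make_jl_matrix(in_dim=512, out_dim=64)
query_a, query_b = project(jl, features_a), project(jl, features_b)

# Ratio of projected to original distance; JL-style projections keep it near 1.
ratio = distance(query_a, query_b) / distance(features_a, features_b)

memory = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(10)]
retrieved = hopfield_retrieve(query_a, memory, beta=1.0)
```

In this sketch the projected vector acts as the query and the stored patterns play the role of frozen word embeddings, so nothing in the memory is trained. How ELSLLM actually parameterizes the projection and the retrieval is specified in the paper, not here.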
Details
- Title
- From Tokens to Latent States: Leveraging Pre-trained Language Models for Improving Partially Observable Reinforcement Learning
- Authors
- Meiju Li - Beijing Institute of Technology; Ruixiang Sun - Beijing Institute of Technology; Xin Li (Corresponding Author) - Beijing Institute of Technology; Mingzhong Wang - University of the Sunshine Coast
- Publication details
- Proceedings of the AAAI Conference on Artificial Intelligence, Vol.40(27), pp.23003-23011
- Conference details
- Association for the Advancement of Artificial Intelligence (AAAI) Conference on Artificial Intelligence, 40th (Singapore, 20-Jan-2026–27-Jan-2026)
- Publisher
- AAAI Press
- Date published
- 2026
- DOI
- 10.1609/aaai.v40i27.39465
- ISSN
- 2374-3468
- Grant note
- This work was partially supported by the NSFC under Grants 92270125 and 62276024; by the Fundamental Research Funds for the Central Universities, JLU, under Grant 93K172025K01; and by the Fundamental Research Funds for the Central Universities under Grant 2025CX01010.
- Organisation Unit
- Healthy Ageing Research Cluster; School of Science, Technology and Engineering
- Language
- English
- Record Identifier
- 991219346802621
- Output Type
- Conference paper