From Tokens to Latent States: Leveraging Pre-trained Language Models for Improving Partially Observable Reinforcement Learning

Meiju Li; Ruixiang Sun; Xin Li; Mingzhong Wang

doi:10.1609/aaai.v40i27.39465

Conference paper

Peer reviewed

Proceedings of the AAAI Conference on Artificial Intelligence, Vol.40(27), pp.23003-23011

Association for the Advancement of Artificial Intelligence (AAAI) Conference on Artificial Intelligence, 40th (Singapore, 20-Jan-2026–27-Jan-2026)

AAAI Press

2026

DOI: https://doi.org/10.1609/aaai.v40i27.39465

Files and links (1)

url

https://doi.org/10.1609/aaai.v40i27.39465

Published Version Open

Abstract

Partially observable Markov decision processes (POMDPs) present significant challenges for reinforcement learning, as agents must learn optimal policies while maintaining belief states over unobserved environment states based on partial observations. We observe a compelling analogy: large language models (LLMs) autoregressively generate token probability distributions based on preceding context, mirroring how belief states are maintained and updated in POMDPs. This insight motivates leveraging the rich prior knowledge embedded in pre-trained LLMs for latent state estimation from observation-action histories. However, two critical challenges emerge. First, modality misalignment prevents LLMs from directly encoding visual observations and discrete actions. Second, semantic misalignment exists between observation-action sequences and token sequences. To address these challenges, we introduce a novel framework, ELSLLM, that employs a Johnson-Lindenstrauss projection (JLP) module to transform input dimensions while preserving state similarity with theoretical guarantees, and utilizes modern Hopfield networks (MHN) to store all word embeddings from pre-trained LLMs as a knowledge repository. Through retrieval and querying mechanisms, ELSLLM achieves token-level knowledge alignment without requiring fine-tuning of the pre-trained LLMs. Extensive experiments on partially observable environments demonstrate that ELSLLM achieves state-of-the-art performance, significantly outperforming baseline methods with and without LSTM memory mechanisms. Our work opens new avenues for integrating pre-trained LLMs with reinforcement learning in partially observable settings.
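
The two building blocks named in the abstract can be illustrated generically. The sketch below is not the ELSLLM implementation, which this record does not describe. It only shows (a) a Gaussian random projection of the Johnson-Lindenstrauss type, which approximately preserves pairwise distances with high probability, and (b) a single modern-Hopfield retrieval step, i.e. a softmax-weighted average of stored patterns. All names, dimensions, and the random "embedding" memory are hypothetical stand-ins chosen for illustration.

```python
import math
import random


def jl_matrix(d_in, d_out, rng):
    """Random Gaussian projection with entries drawn from N(0, 1/d_out)."""
    scale = 1.0 / math.sqrt(d_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d_in)]
            for _ in range(d_out)]


def project(matrix, x):
    """Map a d_in-dimensional vector to d_out dimensions."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def hopfield_retrieve(query, memory, beta=1.0):
    """One modern-Hopfield update: softmax(beta * memory . query) over stored
    patterns, followed by the weighted average of those patterns."""
    scores = [beta * sum(m * q for m, q in zip(pattern, query))
              for pattern in memory]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(memory[0])
    retrieved = [sum(w * pattern[j] for w, pattern in zip(weights, memory))
                 for j in range(dim)]
    return retrieved, weights


def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(0)
d_obs, d_emb, vocab = 256, 64, 50  # hypothetical sizes

# Hypothetical stand-in for frozen LLM word embeddings (random here).
memory = [[rng.gauss(0.0, 1.0) for _ in range(d_emb)] for _ in range(vocab)]

# Two hypothetical observation features, projected into embedding space.
proj = jl_matrix(d_obs, d_emb, rng)
obs_a = [rng.gauss(0.0, 1.0) for _ in range(d_obs)]
obs_b = [rng.gauss(0.0, 1.0) for _ in range(d_obs)]
ratio = dist(project(proj, obs_a), project(proj, obs_b)) / dist(obs_a, obs_b)

# Query the memory with a projected observation; no weights are trained.
retrieved, weights = hopfield_retrieve(project(proj, obs_a), memory, beta=0.5)

# Sanity check: querying with a stored pattern retrieves that same pattern.
_, self_weights = hopfield_retrieve(memory[3], memory, beta=10.0)
best = self_weights.index(max(self_weights))
```

Here `ratio` stays close to 1 when `d_emb` is large enough, which is the distance-preservation property the abstract appeals to, and `retrieved` lies in the span of the stored embeddings, so the query is expressed in terms of the memory's contents without fine-tuning the memory itself.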

Details

Title: From Tokens to Latent States: Leveraging Pre-trained Language Models for Improving Partially Observable Reinforcement Learning
Authors: Meiju Li - Beijing Institute of Technology
    Ruixiang Sun - Beijing Institute of Technology
    Xin Li (Corresponding Author) - Beijing Institute of Technology
    Mingzhong Wang - University of the Sunshine Coast
Publication details: Proceedings of the AAAI Conference on Artificial Intelligence, Vol.40(27), pp.23003-23011
Conference details: Association for the Advancement of Artificial Intelligence (AAAI) Conference on Artificial Intelligence, 40th (Singapore, 20-Jan-2026–27-Jan-2026)
Publisher: AAAI Press
Date published: 2026
DOI: 10.1609/aaai.v40i27.39465
ISSN: 2374-3468
Grant note: This work was partially supported by the NSFC under Grants 92270125 and 62276024; by the Fundamental Research Funds for the Central Universities, JLU, under Grant 93K172025K01; and by the Fundamental Research Funds for the Central Universities under Grant 2025CX01010.
Organisation Unit: Healthy Ageing Research Cluster; School of Science, Technology and Engineering
Language: English
Record Identifier: 991219346802621
Output Type: Conference paper
