Abstract
Time series forecasting is vital in many sectors, especially where precise predictions support decision-making. Transformer-based models are widely favoured for their self-attention mechanisms, which effectively capture temporal dependencies. Nonetheless, their quadratic complexity constrains scalability to long sequences, motivating sparse attention and external memory methods. To address these issues, we propose the Learnable Temporal Sparse Memory iTransformer (LTSMiTransformer), which integrates: Learnable Temporal Sparse Attention, which dynamically identifies relevant time steps to reduce computational overhead; a Memory-Augmented Module for capturing long-term dependencies without excessive memory consumption; and a Unified Embedding Strategy that enhances feature representation across heterogeneous datasets. Extensive experiments demonstrate that LTSMiTransformer achieves state-of-the-art accuracy, particularly in long-horizon settings, while maintaining computational efficiency. Our analysis highlights its robustness to periodic patterns, trend shifts, and cross-domain adaptation. We also discuss limitations (e.g., hyperparameter sensitivity) and provide actionable insights for future work. Code is available on GitHub.