Abstract
Interpretability in multivariate time series classification (MTSC) remains a fundamental challenge, as state-of-the-art deep learning models often operate as "black boxes" due to their intricate architectures. While high accuracy is essential, transparency is equally critical in high-stakes domains such as healthcare, where understanding variable contributions and temporal patterns enables more informed decision-making. Existing interpretability methods focus primarily on identifying key time points and often neglect variable-specific contributions, which limits their ability to offer holistic insights into complex temporal data. To address these limitations, we propose HITS (Hierarchical Interpretable Time Series Classification via Multiple Instance Learning), a novel framework that integrates a hierarchical multiple instance learning (MIL) approach with a dual attention mechanism. Our approach treats each variable-time pair as an instance and models the entire time series as a bag of instances, leveraging MIL-based aggregation and self-attention to dynamically capture both variable-level and temporal dependencies. In particular, variable-level attention identifies the importance of individual variables at each time step, while temporal attention captures the significance of each time step within the sequence. This hierarchical structure enables HITS to provide fine-grained interpretability, generating actionable insights into the contribution of each variable-time pair. Extensive evaluations on benchmark datasets demonstrate that HITS achieves accuracy comparable to state-of-the-art methods while offering better interpretability, providing a transparent and robust solution for MTSC tasks.
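
To make the two-level aggregation described above concrete, the following is a minimal sketch of hierarchical attention pooling over variable-time instances. It is an illustration under stated assumptions, not the HITS implementation: the function names (`attention_pool`, `hierarchical_pool`), the scoring vectors (`w_var`, `w_time`), and the toy embeddings are all hypothetical, and a simple tanh-based additive score stands in for the self-attention mechanism mentioned in the abstract. Variable-level weights are computed within each time step, temporal weights are computed across time steps, and their product gives one possible contribution score for each variable-time pair.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, w):
    """Attention-weighted sum of `vectors`; returns (pooled, weights).

    Assumption: each vector is scored as dot(w, tanh(vector)), a simplified
    stand-in for a learned attention scorer.
    """
    scores = [sum(wi * math.tanh(xi) for wi, xi in zip(w, vec)) for vec in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(a * vec[d] for a, vec in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def hierarchical_pool(instances, w_var, w_time):
    """instances[t][v] is the embedding of variable v at time step t."""
    step_embeddings, alpha = [], []
    for step in instances:
        # Variable-level attention: weigh the variables within one time step.
        pooled, weights = attention_pool(step, w_var)
        step_embeddings.append(pooled)
        alpha.append(weights)
    # Temporal attention: weigh the time steps within the whole series (the bag).
    bag, beta = attention_pool(step_embeddings, w_time)
    # Contribution of each variable-time pair: temporal weight times variable weight.
    contrib = [[beta[t] * a for a in alpha[t]] for t in range(len(instances))]
    return bag, alpha, beta, contrib


# Toy input (hypothetical): 3 time steps, 3 variables, 2-dimensional embeddings.
instances = [
    [[0.1, 0.2], [1.0, -0.5], [0.0, 0.3]],
    [[0.4, 0.4], [0.2, 0.9], [-0.7, 0.1]],
    [[0.3, -0.2], [0.5, 0.5], [0.9, 0.0]],
]
w_var = [0.5, -0.3]   # hypothetical variable-level scoring vector
w_time = [0.2, 0.8]   # hypothetical temporal scoring vector

bag, alpha, beta, contrib = hierarchical_pool(instances, w_var, w_time)
```

In this sketch the entries of `contrib` are non-negative and sum to one over all variable-time pairs, so they can be read as a normalized importance map over the series. A classifier head applied to `bag` would complete the pipeline and is omitted here.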