Abstract
Effective disaster response relies on situation awareness: the ability to accurately perceive and interpret evolving threats through visual data analysis. Each disaster type exhibits distinct visual characteristics, requiring robust classification models that can discern subtle inter-class differences. While Vision Transformers (ViTs) show potential, current centralized architectures process all disasters through a single model, limiting disaster-specific pattern recognition. This paper proposes DSViT, a distributed, disaster-specific Vision Transformer framework that dynamically adapts to different disaster types via specialized auxiliary branches. Each branch is tailored to a specific disaster category, using optimized patch embedding sizes and class-token-guided attention masking to prioritize relevant spatial regions. Knowledge from these branches is transferred to a primary ViT through an attention loss mechanism, enabling the model to integrate both general and disaster-specific features. The distributed architecture further supports region-specific parameter updates, enhancing adaptability to localized disaster conditions. Experiments show that DSViT achieves 96.84% accuracy, a significant improvement over centralized ViT baselines, with consistent gains across all disaster categories, demonstrating effective context-aware feature capture.
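To make the attention-transfer idea concrete, the following is a minimal PyTorch sketch of how an attention loss could pull a primary branch's attention maps toward those of a frozen disaster-specific auxiliary branch, alongside the usual classification loss. All module names, tensor shapes, the MSE form of the attention loss, and the loss weight are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of attention-loss knowledge transfer (assumptions throughout:
# single-head attention, MSE attention loss, arbitrary shapes and weights).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinySelfAttention(nn.Module):
    """Single-head self-attention that also returns its attention map."""

    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor):
        B, N, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = F.softmax(q @ k.transpose(-2, -1) / D**0.5, dim=-1)  # (B, N, N)
        return self.proj(attn @ v), attn


# Hypothetical data: 4 images, 16 patch tokens each, embedding dim 64.
tokens = torch.randn(4, 16, 64)
labels = torch.randint(0, 5, (4,))

primary = TinySelfAttention(64)    # general-purpose primary branch
auxiliary = TinySelfAttention(64)  # disaster-specific branch (teacher)
head = nn.Linear(64, 5)            # 5 disaster categories (assumed)

out, attn_primary = primary(tokens)
with torch.no_grad():              # auxiliary supplies target attention only
    _, attn_aux = auxiliary(tokens)

# Attention loss: align the primary branch's attention maps with the
# disaster-specific branch's maps, in addition to the classification loss.
cls_loss = F.cross_entropy(head(out.mean(dim=1)), labels)
attn_loss = F.mse_loss(attn_primary, attn_aux)
loss = cls_loss + 0.5 * attn_loss  # 0.5 is an arbitrary blending weight
loss.backward()
```

In this sketch, gradients flow only through the primary branch, so the auxiliary branch acts purely as a source of disaster-specific attention targets; the relative weighting of the two loss terms would be a tunable hyperparameter.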