BPGV: Behavioral provenance graph views to enhance anomaly detection
Journal article   Open access   Peer reviewed

Michael Zipperle, Yu Zhang, Min Wang, Elizabeth Chang and Tharam Dillon
International Journal of Information Management Data Insights, Vol.6(1), pp.1-13
2026
Published Version Open Access CC BY-NC V4.0

Abstract

Keywords: Anomaly detection; Graph data analysis; Graph summarization; Intrusion detection systems; Provenance graphs; Subgraph extraction
Provenance-based Intrusion Detection Systems (PIDS) have shown potential in mitigating cyber threats in dynamic real-world environments. PIDS construct provenance graphs from audit logs to detect anomalous nodes, edges, or subgraph patterns. However, as provenance graphs grow in complexity, detecting anomalies in a timely and accurate manner becomes increasingly challenging, often resulting in higher false alarm rates. A key limitation of existing graph summarization techniques is their inadequate consideration of graph nodes’ context, which limits their ability to generalize to unseen benign variants. Moreover, subgraph extraction techniques that use contextual information to extract subgraphs for different graph views are lacking; the resulting reliance on a single-model anomaly detection approach reduces robustness and limits optimization. To address these shortcomings, we first present a taxonomy to systematically categorize and evaluate provenance graphs from the perspectives of graph summarization, subgraph extraction, and graph representation. Second, we propose a Behavioral Provenance Graph View Anomaly Detection (BPGVAD) framework to detect behavioral anomalies, enabled by two key components: Behavioral Provenance Graph Summarization (BPGS) and Behavioral Provenance Graph Extraction (BPGE). BPGS generalizes and summarizes nodes based on their context to capture unseen benign node variants. BPGE extracts subgraphs from different graph views derived from BPGS to enable an optimized multi-model approach to anomaly detection. We evaluated the effectiveness of the BPGVAD framework on the DARPA OpTC dataset, and the results demonstrated improved anomaly detection performance, with an accuracy of 99.332%, a recall of 1, and a low false alarm rate of 0.669%.
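
For readers unfamiliar with the three reported metrics, the following minimal sketch shows how accuracy, recall, and false alarm rate are conventionally computed from confusion-matrix counts. The abstract does not state the paper's exact definitions, so the standard ones are assumed here (false alarm rate taken as the false positive rate). The class name, function names, and counts are hypothetical and are not taken from the paper or the DARPA OpTC dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for binary detection outcomes."""
    tp: int  # anomalies correctly flagged
    fp: int  # benign items wrongly flagged (false alarms)
    tn: int  # benign items correctly passed
    fn: int  # anomalies missed


def accuracy(c: ConfusionCounts) -> float:
    # Share of all decisions that were correct.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def recall(c: ConfusionCounts) -> float:
    # Share of true anomalies that were detected.
    return c.tp / (c.tp + c.fn)


def false_alarm_rate(c: ConfusionCounts) -> float:
    # Share of benign items that were wrongly flagged (assumed definition).
    return c.fp / (c.fp + c.tn)


# Illustrative counts only; NOT results from the paper.
counts = ConfusionCounts(tp=50, fp=10, tn=990, fn=0)
print(f"accuracy={accuracy(counts):.5f}")
print(f"recall={recall(counts):.5f}")
print(f"false_alarm_rate={false_alarm_rate(counts):.5f}")
```

Under these definitions, a recall of 1 means no anomalies were missed, so every remaining error is a false alarm on benign activity.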
