Self-Supervised Learning for Android Malware Detection on a Time-Stamped Dataset

Published: Tue, 28 Apr 2026 00:00:00 -0400

Summary

arXiv:2604.23025v1 (announce type: new)

Abstract: Android malware detectors built with machine learning often suffer from temporal bias: models are trained and evaluated without respecting apps' actual release times, inflating accuracy and weakening real-world robustness. We address this by constructing a time-stamped dataset of benign and malicious Android apps and introducing a timestamp-verification procedure to ensure temporal accuracy. We then propose a detection framework that uses Bootstrap Your Own Latent (BYOL) for self-supervised pre-training to learn obfuscation-resilient representations, followed by supervised classification. Under time-aware evaluation, the method attains 98% accuracy and 89% F1. We further characterize malware behavior by analyzing true positives and false negatives using VirusTotal and the MITRE ATT&CK framework. To support reproducibility and further innovation, we release our dataset and source code.
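To make the idea of time-aware evaluation concrete, here is a minimal sketch in Python. It is not the authors' released code: the `App` record, its field names, the `temporal_split` helper, and the cutoff date are all hypothetical, chosen only to illustrate the principle that every training sample should predate every test sample. A small F1 helper is included because the abstract reports F1 alongside accuracy.

```python
from datetime import date
from typing import NamedTuple


class App(NamedTuple):
    """Hypothetical record for one app; field names are assumptions."""
    sha256: str
    released: date      # verified release timestamp
    is_malware: bool


def temporal_split(apps, cutoff):
    """Train on apps released before `cutoff`, test on those released on or after it.

    A random split would let apps from the "future" leak into training,
    which is the temporal bias the abstract describes.
    """
    train = [a for a in apps if a.released < cutoff]
    test = [a for a in apps if a.released >= cutoff]
    return train, test


def f1_score(y_true, y_pred):
    """F1 for the positive (malware) class, computed from boolean labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    apps = [
        App("a", date(2020, 1, 1), False),
        App("b", date(2021, 6, 1), True),
        App("c", date(2022, 3, 1), True),
    ]
    train, test = temporal_split(apps, cutoff=date(2021, 1, 1))
    print(len(train), "training apps,", len(test), "test apps")
```

In a pipeline of the kind the abstract outlines, a split like this would presumably be applied before both the self-supervised pre-training and the supervised classification stage, so that neither stage sees apps released after the cutoff.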

Sources

Lyrie Verdict

Lyrie's autonomous defense layer flags this class of exposure the moment it surfaces — no signature update required.

#arxiv-cs-cr #research
