This is the fourth in an ongoing series of blog posts focused on AI/ML.
Malware detection is an important part of the Netskope Security Cloud platform, complete with a secure access service edge (SASE) architecture, that we provide to our customers. Malware is malicious software that is designed to harm or exploit devices and computer systems. Various types of malware, such as viruses, worms, Trojan horses, ransomware, and spyware, remain a serious problem for corporations and government agencies. Traditional malware detection systems rely on anti-virus signatures, heuristics, and behavior patterns in sandboxes, which require a significant amount of manual analysis from security analysts and researchers. With new attacks and variants emerging every day, it is hard for organizations to keep pace with malware threats. In comparison, artificial intelligence (AI) and machine learning (ML) have the potential to detect unknown and zero-day malware by automatically learning malware patterns from large volumes of historical data. This capability has made AI/ML an indispensable part of a modern malware detection solution, complementing heuristic and signature-based approaches.
At Netskope, we have developed a comprehensive, multi-layered threat protection system to scan our customers’ network traffic. AI/ML is used to power multiple engines in the inline fast scan, as well as static and dynamic analysis-based deep scan. In this blog post, we will highlight three of them:
- Inline PE Classifier
- MS Office Classifier
- Cloud Sandbox
Inline PE Classifier
The Portable Executable (PE) file format is used by Windows executables, object code, and dynamic link libraries (DLLs). It’s one of the most common malware file formats. To stop malicious PE files in real time, we have developed the inline PE classifier. Trained with millions of malicious and benign PE samples, the ML-based classifier is able to identify malware patterns directly in raw bytes. Because the classifier doesn’t need to parse a PE file or extract features based on domain knowledge, it’s lightweight, fast, and suitable for inline predictions.
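To make the "raw bytes, no parsing" idea concrete, here is a minimal sketch. It is not the production model: it swaps in a much simpler stand-in (a byte-frequency histogram plus an entropy value, scored by a logistic function), and the names `byte_features` and `score`, the `max_len` cap, and the weights are all hypothetical.

```python
import math
from collections import Counter


def byte_features(data: bytes, max_len: int = 1_000_000) -> list[float]:
    """Map raw bytes to a fixed-length vector without parsing the PE format.

    Returns 256 byte frequencies followed by the Shannon entropy scaled to [0, 1].
    """
    data = data[:max_len]  # assumed cap so inline latency stays bounded
    n = len(data)
    if n == 0:
        return [0.0] * 257
    counts = Counter(data)  # iterating bytes yields ints in 0..255
    freqs = [counts.get(b, 0) / n for b in range(256)]
    entropy = -sum(p * math.log2(p) for p in freqs if p > 0)
    return freqs + [entropy / 8.0]  # 8 bits is the maximum entropy per byte


def score(features: list[float], weights: list[float], bias: float = 0.0) -> float:
    """Logistic score in (0, 1); the weights would come from offline training."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

A caller would compute `score(byte_features(file_bytes), trained_weights)` and compare the result against a threshold tuned for a low false positive rate. A model that learns patterns from byte sequences directly would replace the histogram step, but the interface (bytes in, probability out) stays the same.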
The inline PE classifier complements the signature-based malware engines in fast scan. Since its launch, the classifier has detected unique malware samples that were undetectable to signature-based inline engines, without introducing any new false positives. Its runtime in production is just a few milliseconds.
This high-efficacy ML classifier shortens the time to detection, because the samples it uniquely detects can be blocked inline, and it complements the dynamic analysis with advanced forensics in the Advanced Threat Protection engines.
MS Office Classifier
Microsoft Office documents are another common source of malware. As part of Netskope’s Advanced Threat Protection, the Office Classifier combines heuristics and supervised machine learning to identify malicious code embedded in Office documents. The Office Classifier performs static analysis and extracts detailed information about the components in an Office file, including embedded macros (VBA), dynamic data exchange (DDE), and other embedded files such as JPG/MPEG media or EXE/PE executables. The extracted information is then mapped to hundreds of features, which are used to train ML classification models and to predict whether a new Office document is malicious.
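The step of mapping extracted components to features can be sketched as follows. This is an illustration under assumptions, not the actual feature set: the `OfficeReport` structure, the keyword list, and the handful of features are hypothetical stand-ins for the hundreds of features mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class OfficeReport:
    """Hypothetical result of static analysis on one Office file."""
    macros: list[str] = field(default_factory=list)          # VBA source, one entry per module
    dde_links: list[str] = field(default_factory=list)       # DDE command strings
    embedded_files: list[str] = field(default_factory=list)  # names of embedded objects


# Illustrative keywords only; a real feature set would be far larger.
SUSPICIOUS_KEYWORDS = ("autoopen", "document_open", "createobject", "powershell")


def to_features(report: OfficeReport) -> dict[str, float]:
    """Map the extracted components to a flat numeric feature dictionary."""
    vba = "\n".join(report.macros).lower()
    features = {
        "macro_count": float(len(report.macros)),
        "macro_chars": float(len(vba)),
        "dde_count": float(len(report.dde_links)),
        "embedded_count": float(len(report.embedded_files)),
        "embedded_pe_count": float(
            sum(name.lower().endswith((".exe", ".dll")) for name in report.embedded_files)
        ),
    }
    for keyword in SUSPICIOUS_KEYWORDS:
        features[f"kw_{keyword}"] = float(vba.count(keyword))
    return features
```

The resulting dictionary has a fixed set of keys, so it can be turned into a vector in a stable order and passed to any supervised classifier for training or prediction.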
The Office Classifier provides proactive coverage against zero-day malware attacks that can evade signature-based detections. For example, the Office Classifier has detected downloads of multiple zero-day Emotet samples distributed as Office document files targeting multiple Netskope customers (see screenshot below). The Emotet samples used multi-layered obfuscation techniques to bypass signature-based AV software but were detected by the Office Classifier. Recently, the Office Classifier also detected a new set of malicious Office documents that use VBA and living-off-the-land binaries (LOLBins).
Cloud Sandbox
Sandboxing has proven to be an effective way to detect advanced malware. In Netskope’s Advanced Threat Protection system, the Cloud Sandbox is enhanced with a machine learning engine. The Cloud Sandbox collects the behaviors of samples by executing them in an isolated Windows environment. The report of observed behaviors can then be used for heuristics and ML-based malware detection. Each report contains runtime behavior, such as process trees, where each tree node represents the behavior of a process, including API calls, dynamic link libraries (DLLs), registry key activities, file activities, and network activities. We use transformer-based deep learning techniques to learn the tree structure and activities of the sandbox report and classify whether the file is malicious or not.
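One way to feed a process tree to a sequence model is to flatten it into tokens that preserve the nesting. The sketch below shows that preprocessing step only, under assumptions: the `ProcessNode` structure, the token format, and the `flatten` and `encode` helpers are hypothetical, and the transformer itself is out of scope.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessNode:
    """Hypothetical node of a sandbox process tree."""
    name: str
    events: list[str] = field(default_factory=list)  # e.g. "api:CreateProcessW"
    children: list["ProcessNode"] = field(default_factory=list)


def flatten(node: ProcessNode, depth: int = 0) -> list[str]:
    """Depth-first traversal that keeps the tree structure via open/close markers."""
    tokens = [f"<proc depth={depth}>", node.name.lower()]
    tokens.extend(node.events)
    for child in node.children:
        tokens.extend(flatten(child, depth + 1))
    tokens.append("</proc>")
    return tokens


def encode(tokens: list[str], vocab: dict[str, int], max_len: int = 512) -> list[int]:
    """Map tokens to integer ids, growing the vocabulary, then pad with id 0."""
    ids = [vocab.setdefault(tok, len(vocab)) for tok in tokens[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Usage: Word spawning PowerShell, which opens a network connection.
tree = ProcessNode(
    "WINWORD.EXE",
    ["api:CreateProcessW"],
    [ProcessNode("powershell.exe", ["net:connect"])],
)
vocab = {"<pad>": 0}  # id 0 is reserved for padding
ids = encode(flatten(tree), vocab, max_len=10)
```

The open and close markers let the model recover which events belong to which process and how processes are nested, while the fixed `max_len` gives every report the same input shape.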
Summary
At Netskope, we have integrated AI/ML into our large-scale malware detection system to power multiple static and dynamic analysis engines. AI/ML can identify unknown malware with high precision and complement signature-based and heuristic engines. There are technical challenges associated with AI/ML, including high accuracy and low latency requirements, changing malware patterns, and model interpretability. We are addressing these challenges to reach AI/ML’s full potential in malware detection.