Graph-based behavioral analysis introduces an additional layer of interpretability into threat analysis, which helps strengthen the protection we provide to our users.
As defenders, one of the challenges we face in cybersecurity detection today is determining whether the large number of observations we collect on network communications, settings changes, website downloads, and so on represent malicious artifacts that lead to fraud, ransomware, and other attacks affecting our customers.
Bad actors are constantly researching ways to hide these artifacts, which reflect the tactics, techniques, and procedures (TTPs) they use when attacking our customers. If they successfully hide their TTPs, they are more likely to achieve their goals. This challenge has led to an arms race in which bad actors keep developing more sophisticated ways to hide, while defenders look for new ways to detect them.
At Avast, we continue to invest in new methods of detecting malicious activity, even when it relies on concealment techniques. One such method is commonly referred to as behavioral threat analysis. This article outlines some key aspects of how Avast performs this type of analysis.
Behavioral threat analysis can detect threats that would otherwise fly under the radar of threat analysis technologies focused on static analysis of individual elements (such as processes, network connections, or executable files). A key element supporting the behavioral approach is a dynamic graph-based representation that is built on the client (such as a PC or mobile phone).
Each event, such as a file execution or a network communication, is represented in the graph as a node, and nodes are connected by edges that represent relationships between events. For example, an executed file creates a process, which can download some data from a specific IP address or host name and then execute it, thereby creating another process, and so on, as shown in the following figure. The graph therefore represents a snapshot of the behavior observed during a certain period of time.
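To make the representation concrete, here is a minimal sketch of such a behavior graph in Python. It is an illustration under stated assumptions, not Avast's implementation: the class name `BehaviorGraph`, the node kinds, the relation names, and the example identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BehaviorGraph:
    """Directed graph of observed events (illustrative structure only)."""
    nodes: dict = field(default_factory=dict)  # node id -> attribute dict
    edges: list = field(default_factory=list)  # (source id, relation, target id)

    def add_event(self, node_id, kind, **attrs):
        """Register an event node such as a file, process, or network endpoint."""
        self.nodes[node_id] = {"kind": kind, **attrs}

    def relate(self, src, relation, dst):
        """Connect two existing event nodes with a typed, directed edge."""
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before relating them")
        self.edges.append((src, relation, dst))

    def successors(self, node_id):
        """Return (relation, target) pairs for edges leaving node_id."""
        return [(rel, dst) for src, rel, dst in self.edges if src == node_id]


def build_example():
    """Executed file -> process -> download -> second process (hypothetical names)."""
    g = BehaviorGraph()
    g.add_event("file_1", "file")
    g.add_event("proc_1", "process")
    g.add_event("host_1", "network", host="example.test")
    g.add_event("file_2", "file")
    g.add_event("proc_2", "process")
    g.relate("file_1", "executed_as", "proc_1")
    g.relate("proc_1", "connects_to", "host_1")
    g.relate("proc_1", "writes", "file_2")
    g.relate("file_2", "executed_as", "proc_2")
    return g
```

Calling `build_example().successors("proc_1")` lists what the first process did next, which is the kind of context that an analysis of a single event cannot provide.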
Because malware authors often use the so-called “living off the land” strategy to hide behind otherwise benign tools (such as cmd.exe, the default command-line interpreter of Windows), analyzing the tool in isolation will not produce a detection unless we understand how the tool is used within the stages of an attack, that is, the sequence of captured behavioral events that constitutes the threat. That is because each individual event may seem harmless in isolation, even though together the events pose a threat.
This is why the Avast Artificial Intelligence Research Laboratory has invested in developing and deploying new technologies for representing, analyzing, and detecting malicious behavior. Thanks to Avast’s extensive sensor network, we have the right data to calibrate machine learning models that can identify the fingerprints of malware behavior. A fingerprint is a graph pattern of relatively small size (usually up to 10 nodes) that captures malicious activity without the benign behavioral noise that usually dominates the data. Because the data consists of millions of graphs every day, each made up of thousands of nodes (events) and edges (relationships), the task is akin to finding a needle in a haystack.
Fortunately, we can use our threat intelligence systems to filter, enrich, and label the data into a form suitable for training graph neural networks (GNNs), which help us extract these fingerprints in an efficient and scalable way. Once a fingerprint of malicious behavior has been generated, we can monitor whether a program on a protected user’s machine exhibits behavior that exactly or approximately matches it. If the behavior matches the fingerprint exactly, we can stop it before it causes any harm; if it matches only approximately, we can decide whether to stop it based on the match ratio. In both cases, the newly detected behavior can be submitted back to our cloud to deepen our insight into malicious behavior and to improve the fingerprints.
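The exact-versus-approximate decision can be sketched as follows. This is a deliberately simplified illustration, not the production matcher: it reduces both the fingerprint and the observed graph to sets of (source kind, relation, target kind) triples and ignores how those edges connect to each other, whereas a real matcher would have to compare graph structure. The function names, relation names, and the `approx_threshold` value are assumptions made for the example.

```python
def edge_signatures(nodes, edges):
    """Reduce a graph to a set of (source kind, relation, target kind) triples."""
    return {(nodes[src], relation, nodes[dst]) for src, relation, dst in edges}


def match_ratio(fingerprint, observed):
    """Fraction of fingerprint triples that also occur in the observed behavior."""
    if not fingerprint:
        return 0.0
    return len(fingerprint & observed) / len(fingerprint)


def decide(ratio, approx_threshold=0.7):
    """Turn a match ratio into an action; the threshold is a hypothetical value."""
    if ratio == 1.0:
        return "block: exact match"
    if ratio >= approx_threshold:
        return "block: approximate match"
    return "allow"


# Hypothetical fingerprint with four typed edges.
FINGERPRINT = {
    ("process", "connects_to", "url"),
    ("process", "writes", "file"),
    ("file", "executed_as", "process"),
    ("process", "modifies", "registry"),
}

# Hypothetical observed behavior: node id -> kind, plus typed edges.
observed_nodes = {"p2": "process", "u1": "url", "f1": "file",
                  "p4": "process", "p9": "process"}
observed_edges = [
    ("p2", "connects_to", "u1"),
    ("p2", "writes", "f1"),
    ("f1", "executed_as", "p4"),
    ("p9", "spawns", "p4"),
]

ratio = match_ratio(FINGERPRINT, edge_signatures(observed_nodes, observed_edges))
verdict = decide(ratio)
```

Here three of the four fingerprint triples are present, so the ratio is 0.75 and the behavior is treated as an approximate match under the assumed threshold.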
GNNs have recently received increasing attention in both the research community and the cybersecurity industry, and they have proven useful in many fields, from social network analysis to cheminformatics. The main advantage of graph neural networks is that they process nodes recursively by using the relationships between them, thereby naturally capturing both node attributes and graph structure. In our case, each node has hundreds of features, such as registry value modifications, a string representation of the process name, modifications of memory allocated by other processes, or connections to certain services on the Internet.
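The recursive use of neighbor relationships is often described as message passing. The sketch below shows one round with plain mean aggregation and no learned weights, so it is a toy illustration of the idea rather than a trained GNN layer; the function name and the two-dimensional feature vectors are assumptions for the example.

```python
def message_passing_step(features, edges):
    """One round of mean aggregation over incoming edges.

    Each node's new vector is the average of its own vector and the vectors
    of the nodes pointing at it. A trained GNN layer would additionally apply
    learned weights and a nonlinearity; both are omitted here.
    """
    incoming = {node: [] for node in features}
    for src, dst in edges:
        incoming[dst].append(features[src])

    updated = {}
    for node, own in features.items():
        messages = incoming[node] + [own]
        updated[node] = [sum(vals) / len(messages) for vals in zip(*messages)]
    return updated


# A three-node chain a -> b -> c with toy feature vectors.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]}
edges = [("a", "b"), ("b", "c")]

after_one = message_passing_step(features, edges)
after_two = message_passing_step(after_one, edges)
```

After one round, node `c` only knows about `b`; after two rounds, the first component of `c` becomes non-zero because information from `a` has travelled two hops. Stacking rounds is what lets a node's representation reflect the structure around it.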
Similarly, edges are labeled with the type of relationship they represent, for example process creation or a network connection. During training, the model learns a low-dimensional representation (i.e., an embedding) of each node, which is useful for predicting various attributes of the behavior represented by the graph. In other words, we cast the problem as multi-task classification and train the model end-to-end, from the graph to a multi-head output representing the threat intelligence we aim to capture.
A multi-head output is necessary to maximize the value obtained from the system and to correctly represent the information our GNN has extracted from the hundreds of millions of behaviors it observes. Some of the heads include:
- Severity head: Determine whether the behavior is classified as clean, malicious, PUP, etc.
- Type head: Determine the type of malicious behavior observed (e.g. Trojan, ransomware, coinminer)
- Strain head: Determine which malware strains are involved in the behavior
- ATT&CK head: Determine which MITRE ATT&CK techniques the behavior involves
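A multi-head output can be pictured as several small classifiers reading the same shared embedding. The following sketch uses hand-picked weights and a two-dimensional embedding purely for illustration; in a trained model the weights are learned jointly with the GNN, and only two of the heads listed above are shown. All names and numbers here are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


# Each head: (labels, weight matrix with one row per label). Values are made up.
HEADS = {
    "severity": (["clean", "malicious", "PUP"],
                 [[-1.0, 1.0], [2.0, 0.0], [0.0, 0.5]]),
    "type": (["Trojan", "ransomware", "coinminer"],
             [[0.5, 0.0], [1.5, 0.0], [0.0, 1.0]]),
}


def predict(embedding, heads):
    """Apply every head to the same graph embedding and keep the top label."""
    result = {}
    for name, (labels, weights) in heads.items():
        probs = softmax(linear(weights, embedding))
        best = max(range(len(labels)), key=probs.__getitem__)
        result[name] = (labels[best], probs[best])
    return result


graph_embedding = [1.0, 0.0]  # stand-in for the pooled output of a GNN
prediction = predict(graph_embedding, HEADS)
```

Because all heads share one embedding, a signal learned for one task (for example, the malware type) can also inform the others. An ATT&CK head would typically need to report several techniques at once, so it would more likely use independent per-technique scores than a single softmax; that variant is omitted here.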
In addition, we research new techniques to explain these complex models. The aim is to better understand the strategies adopted by different malware strains and then to extract robust decision rules that can easily be used to protect our customers while respecting their privacy. Continuing with the example above, the explanation process allows us to filter out irrelevant behavior and isolate only the fingerprint of the malicious activity, shown in bold and red in the figure below.
We can see that the essence of the malicious activity is that Process 2 downloads Executable file 1 from URL 1 and executes it, which results in Process 4. The rest of the activity appears to be insignificant, so it does not need to be included in the fingerprint, which helps us drastically reduce the resources required for detection and recovery. This type of analysis and optimization has provided Avast with very valuable insights, including improved protection against a previously unknown bootloader of the MyKings malware.
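One simple way to picture this pruning step: suppose an explanation method has assigned each edge an importance score, and we keep only the edges above a cutoff. The sketch below assumes exactly that. The scores, the threshold, the relation names, and the extra nodes Process 1 and Process 3 are invented for illustration and do not come from the figure or from the actual explanation technique.

```python
def extract_fingerprint(edges, importance, threshold=0.5):
    """Keep edges whose importance meets the threshold, plus their endpoints."""
    kept = [edge for edge in edges if importance.get(edge, 0.0) >= threshold]
    nodes = sorted({node for src, _, dst in kept for node in (src, dst)})
    return nodes, kept


# Hypothetical behavior graph around the example described above.
edges = [
    ("Process 1", "spawns", "Process 2"),
    ("Process 2", "connects_to", "URL 1"),
    ("Process 2", "writes", "Executable file 1"),
    ("Executable file 1", "executed_as", "Process 4"),
    ("Process 2", "spawns", "Process 3"),
]

# Hypothetical per-edge importance scores from an explanation method.
importance = {
    edges[0]: 0.10,
    edges[1]: 0.90,
    edges[2]: 0.85,
    edges[3]: 0.95,
    edges[4]: 0.05,
}

fingerprint_nodes, fingerprint_edges = extract_fingerprint(edges, importance)
```

With these made-up scores, the fingerprint shrinks from five edges to three and retains only Process 2, URL 1, Executable file 1, and Process 4, which is why a small fingerprint is cheaper to match on the client.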
Graph-based threat analysis is an integral part of Avast threat protection and takes advantage of the latest developments in GNNs. With this novel approach, we use deep neural networks to introduce an additional layer of interpretability into threat analysis, which helps strengthen the protection we provide to more than 435 million users.