Detecting Obfuscated Command-lines with a Large Language Model

In the security industry, there is a constant, undeniable fact that practitioners must contend with: criminals are working overtime to constantly change the threat landscape to their advantage. Their techniques are many, and they go out of their way to avoid detection and obfuscate their actions. In fact, one element of obfuscation – command-line obfuscation – is the process of intentionally disguising command-lines, which hinders automated detection and seeks to hide the true intention of the adversary’s scripts.

Types of Obfuscation

There are a few tools publicly available on GitHub that give us a glimpse of what techniques are used by adversaries. One of such tools is Invoke-Obfuscation, a PowerShell script that aims to help defenders simulate obfuscated payloads. After analyzing some of the examples in Invoke-Obfuscation, we identified different levels of the technique:

Each of the colors in the image represents a different technique, and while there are various types of obfuscation, they’re not changing the overall functionality of the command. In the simplest form, Light obfuscation changes the case of the letters on the command line; and Medium generates a sequence of concatenated strings with added characters “`” and “^” which are generally ignored by the command line. In addition to the previous techniques, it is possible to reorder the arguments on the command-line as seen on the Heavy example, by using the {} syntax specify the order of execution. Lastly, the Ultra level of obfuscation uses Base64 encoded commands, and by using Base8*8 can avoid a large number EDR detections.

In the wild, this is what an un-obfuscated command-line would look like:

One of the simplest, and least noticeable techniques an adversary could use, is changing the case of the letters on the command-line, which is what the previously mentioned ‘Light’ technique demonstrated:

The insertion of characters that are ignored by the command-line such as the ` (tick symbol) or ^ (caret symbol), which was previously mentioned in the ‘Medium’ technique, would look like this in the wild:

In our examples, the command silently installs software from the website evil.com. The technique used in this case is especially stealthy, since it is using software that is benign by itself and already pre-installed on any computer running the Windows operating system.

Don’t Ignore the Warning Signs, Inspect Obfuscated Elements Quickly

The presence of obfuscation techniques on the command-line often serves as a strong indication of suspicious (almost always malicious) activity. While in some scenario’s obfuscation may have a valid use-case, such as using credentials on the command-line (although this is a very bad idea), threat actors use these techniques to hide their malicious intent. The Gamarue and Raspberry Robin malware campaigns commonly used this technique to avoid detection by traditional EDR products. This is why it’s essential to detect obfuscation techniques as quickly as possible and act on them.

Using Large Language Models (LLMs) to detect obfuscation

We created an obfuscation detector using large language models as the solution to the constantly evolving state of obfuscation techniques. These models consist of two distinct parts: the tokenizer and the language model.

The tokenizer augments the command lines and transforms them into a low-dimensional representation without losing information about the underlying obfuscation technique. In other words, the goal of the tokenizer is to separate the sentence or command-line into smaller pieces that are normalized, and the LLM can understand.

The tokens into which the command-line is separated are essentially a statistical representation of common combinations of characters. Therefore, the common combinations of letters get a “longer” token and the less common ones are represented as separate characters.

It is also important to keep the context of what tokens are commonly seen together, in the English language these are words and the syllables they are constructed from. This concept is represented by “##” in the world of natural language processing (NLP), which means if a syllable or token is a continuation of a word we prepend “##”. The best way to demonstrate this is to have a look at two examples; One of an English sentence that the common tokenizer won’t have a problem with, and the second with a malicious command line.

Since the command-line has a different structure than natural language it is necessary to train a custom tokenizer model for our use-case. Additionally, this custom tokenizer is going to be significantly better statistical representation of the command-line and is going to be splitting the input into much longer (more common) tokens.

For the second part of the detection model – the language model – the Electra model was chosen. This model is tiny when compared to other commonly used language models (~87% less trainable parameters compared to BERT), but is still able to learn the command line structure and detect previously unseen obfuscation techniques. The pre-training of the Electra model is performed on several benign command-line samples taken from telemetry, and then tokenized. During this phase, the model learns the relationships between the tokens and their “normal” combinations of tokens and their occurrences.

The next step for this model is to learn to differentiate between obfuscated and un-obfuscated samples, which is called the fine-tuning phase. During this phase we give the model true positive samples that were collected internally. However, there weren’t enough samples observed in the wild, so we also created a synthetic obfuscated dataset from benign command-line samples. During the fine-tuning phase, we give the Electra model both malicious and benign samples. By showing different samples, the model learns the underlying technique and notes that certain binaries have a higher probability of being obfuscated than others.

The resulting model achieves impressive results having 99% precision and recall.

As we looked through the results of our LLM-based obfuscation detector, we found a few new tricks known malware such as Raspberry Robin or Gamarue used. Raspberry Robin leveraged a heavily obfuscated command-line using wt.exe, that can only be found on the Windows 11 operating system. On the other hand, Gamarue leveraged a new method of encoding using unprintable characters. This was a rare technique, not commonly seen in reports or raw telemetries.

Raspberry Robin:

Gamarue:

The Electra model has helped us detect expected forms of obfuscation, as well as these new tricks used by the Gamarue, Raspberry Robin, and other malware families. In combination with the existing security events from the Cisco XDR portfolio, the script increases its detection fidelity.

Conclusion

There are many techniques out there that are used by adversaries to hide their intent and it is just a matter of time before we stumble upon something new. LLMs provide new possibilities to detect obfuscation techniques that generalize well and improve the accuracy of our detections in the XDR portfolio. Let’s stay vigilant and keep our networks safe using the Cisco XDR portfolio.

We’d love to hear what you think. Ask a Question, Comment Below, and Stay Connected with Cisco Security on social!

Cisco Security Social Channels

Instagram
Facebook
Twitter
LinkedIn