Decoding Data Exfiltration – Reversing XOR Encryption

By: Brian Hussey

One of the first and most important questions that intrusion analysts are asked after a network attack is “Did they steal anything?” and, if so, “What did they take?” Often, this is also one of the most challenging questions to answer when the analyst only has a post-intrusion forensic image to work with. Frequently, the analyst’s primary objective becomes identifying and locating data exfiltration files.

For those not familiar with the term, data exfiltration files are created by an attacker to hold stolen data on the victim machine. Each one is essentially a storage container that the attacker later intends to transfer back to his own computer. Data exfiltration files may be a simple keylogger text file or HTML files concatenated by web scraping malware. However, they can also contain targeted company intelligence or entire SQL database dumps. Content varies as widely as the attacker’s imagination and end goals.

Although data exfiltration files could be anywhere on the system, in my experience I most often locate them in the following directories:

  • C:\Windows\system32
  • C:\Temp
  • C:\Documents and Settings\profilename\Local Settings\Temp

Once the data exfiltration files are located, they are often obfuscated, which further complicates the issue. The files could be encrypted with advanced algorithms such as Blowfish or AES-256; however, the most common type of encryption I see is much simpler: XOR. Hence, the purpose of this blog entry is to provide analysts with a technique to recover data from these types of files.

First, a very brief explanation of XOR (shortened from the term exclusive or): XOR is a bitwise operator that compares each bit of the data with the corresponding bit of the XOR key. If the two bits are identical, the result is “0”; if they are different, the result is “1”. Once this is run through every bit in a data exfiltration file, it results in a very effective scrambling of the data. Wikipedia provides an excellent resource for the mathematics behind this function, but this is not necessary to complete the techniques I am discussing today. For the analyst’s purpose, we only need to know that the attacker has encrypted the data with an XOR key and we need to identify that key in order to recover the data.
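As a minimal illustration of the operation described above, the following Python sketch XORs a single byte with a single key byte and then applies the same key again to restore the original. The byte values are chosen only for illustration and do not come from the figures in this post.

```python
# Illustrative sketch: XOR one plaintext byte with one key byte.
plain = ord("A")         # 0x41 = 01000001
key = ord("h")           # 0x68 = 01101000

cipher = plain ^ key     # differing bits become 1, identical bits become 0
restored = cipher ^ key  # applying the same key a second time undoes the XOR

print(f"{plain:08b} ^ {key:08b} = {cipher:08b}")
print(chr(restored))
```

Because XOR with the same key is its own inverse, the attacker's "encryption" routine and the analyst's "decryption" routine are the same operation.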

So, I will begin with identifying a multi-character XOR key, which is the most common implementation that I see. The picture below shows a simple login page to Bank of America. This is the sample data exfiltration file that I will be using. It is saved in HTML format, typical of how malware would capture and save it.

Figure 1 - Bank of America Screenshot

The picture below shows the Bank of America HTML login page in hexadecimal format. The interface shown is from a program called Hexplorer, which is available for free. I highly recommend Hexplorer for this kind of work because there are very few other hex tools available that allow you to input multi-character XOR keys.

Figure 2 - Bank of America Hex

The next step I took was to encode the HTML file with a multi-character XOR key. I chose the word hidden. (Shown in the picture below).

Figure 3 - XOR Hexplorer
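For readers who prefer a script to a hex editor, the same multi-character XOR can be sketched in a few lines of Python. This is not how Hexplorer is implemented; the function name xor_with_key and the sample HTML string are my own assumptions for illustration.

```python
from itertools import cycle

def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

# Hypothetical sample data standing in for the captured HTML page.
sample = b"<html><title>Sign In</title></html>"

encoded = xor_with_key(sample, b"hidden")   # scramble with the key "hidden"
decoded = xor_with_key(encoded, b"hidden")  # the same call reverses it
```

The same function both encodes and decodes, so once the key is known the original file can be recovered in one pass.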

The result is shown in the picture below. It appears to be encrypted; some analysts may give up at this point because reversing encryption is extremely challenging. However, I encourage everyone to look deeper into the file, because decryption may be easier than expected.

Figure 4 - Scrambled XOR Hex

The key to understanding how to decode this file lies in the hex 0x20 character. Hex 0x20 is the ASCII space character and is displayed only as blank space. In binary, 0x20 is 00100000, so XORing a key byte with 0x20 flips exactly one bit of that key byte. For ASCII letters, that bit is the one that distinguishes upper case from lower case, so a space encrypted with the key letter h becomes H. Hence, when a run of hex 0x20 characters exists in the original data, the encrypted output reproduces the XOR key with the case of each letter inverted. It is often difficult to find, but careful examination of the entire file may show you the key sitting plainly in the hex. The picture below again shows the encoded Bank of America file; however, this is a different part of the file, where more blank space existed. Notice the terms highlighted in red.

Figure 5 - XOR'd Hex showing key

The XOR key, hidden, is shown clearly in its case-opposite (all caps) form. The analyst may now take this file and use Hexplorer to reverse the XOR encryption with the known key. The result will be the complete Bank of America HTML file.
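The visual search described above can also be approximated with a short script. The sketch below is one possible approach, not the method used in the figures: it assumes the analyst already has a guess at the key length (key_len is a hypothetical parameter) and that the space character is the most common byte in the original data. For each position in the key it takes the most frequent encrypted byte and XORs it with 0x20.

```python
from collections import Counter
from itertools import cycle

SPACE = 0x20

def guess_key(cipher: bytes, key_len: int) -> bytes:
    """Guess a repeating XOR key, assuming space dominates the plaintext."""
    key = bytearray()
    for offset in range(key_len):
        column = cipher[offset::key_len]      # bytes that share one key byte
        most_common_byte, _ = Counter(column).most_common(1)[0]
        key.append(most_common_byte ^ SPACE)  # undo the XOR with 0x20
    return bytes(key)

# Hypothetical test data: HTML-like text padded with plenty of blank space.
plain = (b" " * 40 + b"<input type='text'>") * 20
cipher = bytes(b ^ k for b, k in zip(plain, cycle(b"hidden")))

recovered = guess_key(cipher, 6)
```

If the guessed length is wrong, or the file contains little blank space, the output will be noise, so in practice the analyst would try several lengths and check whether the decoded file looks sensible.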

It should be understood that this technique is often more difficult to implement because the key may not be a simple English word. It may be written in a different language and may even be comprised of random characters. This makes it more difficult to see the term in the encrypted file. The key is to locate multiple instances of the same repeating characters. Also, there are variable implementations of XOR that can complicate decryption as well. I have seen malware authors begin the XOR transmutation function at specific byte offsets within the data exfiltration file. In this case, the analyst must identify the offset, determine if it is a repeating function, and finally decrypt only the sections of the file that are actually XOR’d.
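When only part of a file is XOR'd, the decoding has to be limited to that region. The following Python sketch shows one way to do this. The function name xor_region and its start and end parameters are hypothetical, and it assumes the key restarts from its first character at the start offset, which may not hold for every malware sample.

```python
from itertools import cycle
from typing import Optional

def xor_region(data: bytes, key: bytes, start: int,
               end: Optional[int] = None) -> bytes:
    """XOR only data[start:end] with a repeating key; leave the rest as-is."""
    if end is None:
        end = len(data)
    region = bytes(b ^ k for b, k in zip(data[start:end], cycle(key)))
    return data[:start] + region + data[end:]

# Hypothetical file: a 4-byte plain header followed by XOR'd content.
body = bytes(b ^ k for b, k in zip(b"secret data", cycle(b"hidden")))
blob = b"HDR!" + body

decoded = xor_region(blob, b"hidden", start=4)
```

If the decoded output is readable only for the first few bytes, the key is probably aligned differently, and the analyst can rotate the key or adjust the offset and try again.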

The technique is also highly dependent on the existence of hex 0x20 in the data exfiltration file. Keylogger and basic text files will likely have very little blank space; however, HTML, Word and database files may have quite a bit. Like most techniques, this can be employed on a case-by-case basis and will hopefully prove helpful in your investigations.

Identifying a single-character XOR encryption key can be done using a similar technique. The picture below shows the Bank of America file encrypted with the key b. Because the space (0x20) is such a ubiquitous character, the most frequently occurring character in the encrypted file is often the key XOR'd with 0x20; for a letter key, that is the key in the opposite case (here, B). Additionally, there are other tools that make single-character identification much easier. Didier Stevens created XORSearch, which does an excellent job of identifying single-character XOR, ROT and ROL keys.

Figure 6 - B XOR
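The single-character case can be sketched in Python as a simple frequency count. This is my own illustration, not how XORSearch works. It assumes that space is the most common byte in the original data, so the most frequent encrypted byte is the key XOR'd with 0x20. The function names and sample text are hypothetical.

```python
from collections import Counter

def xor_single(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single key byte."""
    return bytes(b ^ key for b in data)

def guess_single_byte_key(cipher: bytes) -> int:
    """Guess the key as (most frequent encrypted byte) XOR 0x20."""
    most_common_byte, _ = Counter(cipher).most_common(1)[0]
    return most_common_byte ^ 0x20

# Hypothetical sample text with plenty of blank space.
plain = b"user name:    password:      remember me   sign in   "
cipher = xor_single(plain, ord("b"))

key = guess_single_byte_key(cipher)
recovered = xor_single(cipher, key)
```

With the key b, the most frequent byte in the encrypted data is B (0x42), which is why the upper-case letter stands out in the hex view.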

It should also be noted that this technique is useful in extracting malware from antivirus quarantines. Frequently, intrusion analysts are only called in after a first responder has already tried to fix the problem. The “fix” may include installing an antivirus program that captures and encrypts valuable evidence. If the only copy of the malware that you need to analyze is located in a quarantine container, then consider what methodologies may have been used to lock it inside. For example, simply unzipping McAfee quarantine files with 7-zip and reversing the files with the XOR key j (hex 0x6A) will yield the original malware.

XOR encryption is used frequently, for both legitimate and illegal purposes; it is important for analysts to know that this encryption can be broken with minimal effort and the result may be very valuable to the investigation.
