Emotet deep dive analysis
I recently did a deep dive analysis of Emotet and thought I would share it. I haven’t spent much time on the macros/PowerShell used to download the malware, as there are already plenty of resources that cover them. Using x32dbg, I have broken down how the malware generates its seemingly random filenames, enumerates and encrypts the list of running processes, and sets up its C2 connectivity. I also show how to extract the config.
MD5 of analysed sample:
Example email containing malicious Word document.
Word document with embedded macros; clicking ‘Enable Content’ will run the macros.
The macros launch an encoded PowerShell command that downloads the payload from a list of compromised websites. The URLs are often obfuscated using Base64 and are relatively easy to decode:
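As a rough illustration of that decoding step, the Python sketch below decodes a PowerShell -EncodedCommand argument (Base64 over UTF-16LE text), decodes a plain Base64 string, and pulls out anything that looks like a URL. It assumes the command was passed via -EncodedCommand and that URLs in the decoded script are delimited by quotes, whitespace, commas, semicolons or ‘@’. Real samples vary, so treat the regex as a starting point rather than a parser.

```python
import base64
import re


def decode_encoded_command(b64_text: str) -> str:
    """Decode a PowerShell -EncodedCommand argument (Base64 of UTF-16LE text)."""
    return base64.b64decode(b64_text).decode("utf-16-le")


def decode_plain_b64(b64_text: str) -> str:
    """Decode Base64 that wraps plain ASCII/UTF-8 text, such as an obfuscated URL list."""
    return base64.b64decode(b64_text).decode("utf-8", errors="replace")


def extract_urls(text: str) -> list[str]:
    """Return http(s) URLs; the delimiter set is an ASSUMPTION about the script layout."""
    return re.findall(r"https?://[^\s'\",;@]+", text)
```

Running extract_urls over the output of either decoder gives a quick list of candidate download sites to block or investigate.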
Payload being downloaded from compromised website and subsequent call to attacker C2:
Process tree listing. This shows PowerShell being used to download the malware to the user directory. The original filename is 215.exe; this is then copied to the malware’s persistence location and renamed:
Unpacking the Malware
The malware uses the common technique of process hollowing to unpack itself in memory. The unpacked binary can be extracted by setting a breakpoint on VirtualAlloc in a debugger such as x32dbg. In this example the unpacked malware was stored in a buffer [edi+54] at location 001d0115. The header was prepended with some junk data; once the unpacked binary has been dumped, this needs to be removed with a hex editor such as HxD to leave a clean PE header.
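The manual HxD clean-up can also be scripted. The sketch below is a minimal illustration, not the method used in this analysis. It scans a dumped buffer for an ‘MZ’ signature whose e_lfanew field (the 4-byte little-endian value at offset 0x3C of the DOS header) points to a ‘PE\0\0’ signature, and discards everything before it. It assumes the junk sits only in front of the header and that the rest of the dump is intact.

```python
import struct


def strip_leading_junk(dump: bytes) -> bytes:
    """Return the dump starting at the first 'MZ' whose e_lfanew points at 'PE\\0\\0'."""
    start = dump.find(b"MZ")
    while start != -1:
        # The DOS header must be large enough to contain e_lfanew at offset 0x3C.
        if start + 0x40 <= len(dump):
            (e_lfanew,) = struct.unpack_from("<I", dump, start + 0x3C)
            pe = start + e_lfanew
            if dump[pe:pe + 4] == b"PE\x00\x00":
                return dump[start:]
        # Not a real header (e.g. 'MZ' bytes inside the junk); keep searching.
        start = dump.find(b"MZ", start + 1)
    raise ValueError("no valid PE header found in dump")
```

Validating the PE signature rather than trimming at the first ‘MZ’ avoids cutting at a stray ‘MZ’ byte pair inside the junk.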
Location of unpacked binary in memory map:
Statically analysing the unpacked malware shows only a single import: IsProcessorFeaturePresent. This is because the malware’s API functions are hashed, and each library and its associated APIs are loaded dynamically when needed. The first two function calls in this sample contain the hashes for ntdll and kernel32.
Once the functions sub_92ACEC9 and sub_92BE17 have been called the deobfuscated API calls then become visible in x32dbg:
A closer look at these two functions shows how the malware dynamically loads its API calls. The first function resolves ntdll.dll and its associated API calls. It does this by moving the hash values into global variables and then pushing the following parameters onto the stack.
The hash value D22E2014 is moved into ECX, this is the hash value of ntdll.dll.
A call is then made to a function which traverses the PEB (Process Environment Block). The PEB contains information about the current process, including the list of DLLs that have been loaded or mapped into its memory. On 32-bit Windows the FS register points to a data structure called the Thread Information Block (TIB), and a pointer to the PEB is stored in the TIB at offset 0x30. As a result, a pointer to the PEB can always be found at FS:[0x30].
D22E2014 is moved into the EBX register, and a loop iterates through the name of each module in the PEB’s list of loaded modules. The loop checks whether each character of the module name is lowercase and performs the hashing routine. Once completed, the hashed value is stored in EAX and compared to EBX. When the two values match, the malware knows it has found the module it was looking for in memory, in this case ntdll.
The next function performs the same process, but with a different hash value, as the malware is now locating kernel32. The image below shows the hash has been generated and stored in EAX; this is then compared to the hash value in EBX. The values match, so the malware has successfully located kernel32.
Once the DLL has been identified, the malware needs to locate the APIs it wants to use. It reads the DLL’s Export Address Table, hashes each exported name and compares the result to the hash values it has stored. When a match is found, the malware has the address of the API it wants to use, such as GetWindowsDirectoryW.
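Because the malware only stores hashes, a common analyst approach is to precompute a hash-to-name table over candidate module and export names, so hashes such as D22E2014 can be resolved without stepping through the debugger. The sketch below shows that table approach only. The hash function is a PLACEHOLDER (an sdbm-style hash), not the routine used by this sample, which is not reproduced here. To resolve real hashes, the sample’s own routine would need to be reimplemented from the disassembly and swapped in. The optional lowercase step mirrors the lowercase check applied to module names above.

```python
def placeholder_hash(name: str, lowercase: bool = False) -> int:
    """PLACEHOLDER sdbm-style 32-bit hash. NOT the sample's actual algorithm."""
    if lowercase:
        name = name.lower()
    h = 0
    for ch in name.encode("ascii"):
        h = (ch + (h << 6) + (h << 16) - h) & 0xFFFFFFFF
    return h


def build_lookup(names, lowercase: bool = False) -> dict[int, str]:
    """Map hash -> name so hashes seen in the debugger can be resolved offline."""
    return {placeholder_hash(n, lowercase): n for n in names}
```

In practice the export names would be read from the Export Address Tables of ntdll.dll and kernel32.dll on an analysis machine rather than typed in by hand.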
The malware identifies the Windows directory by calling GetWindowsDirectoryW.
The directory C:\Windows is identified, and a call is then made to GetVolumeInformationW.
This API call retrieves information about the file system and volume associated with the specified root directory. The image above shows the fourth parameter pushed onto the stack (lpVolumeSerialNumber), which receives the volume serial number.
Serial number of virtual machine that analysis was conducted on:
The returned value can be seen in the image below; however, the bytes appear reversed because x86 stores values in little-endian order.
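To illustrate the byte reversal, the snippet below packs the serial number observed in this analysis (EAA53FEC) as a little-endian 32-bit value, which is how it appears in a memory dump.

```python
import struct

serial = 0xEAA53FEC                    # volume serial number as a 32-bit value
in_memory = struct.pack("<I", serial)  # x86 layout: least significant byte first
print(in_memory.hex(" "))              # ec 3f a5 ea
```

Unpacking the same four bytes with struct.unpack("<I", ...) recovers EAA53FEC, which is a quick way to confirm a value read out of the dump pane.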
The malware then uses this data to create a mutex. First a call is made to snwprintf, which writes formatted data to a string. In the image below, the format used is ‘Global\I%X’, where ‘%X’ will be replaced by the volume serial number in uppercase hexadecimal.
Once the mutex name has been formatted, CreateMutexW is called. The handle to the newly created mutex, highlighted in green, is returned in the EAX register.
A second mutex is then created, however this one is formatted with ‘M%X’ instead of ‘I%X’.
The next function calls CreateEventW, using the format ‘E%X’ combined with the serial number.
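Since the names depend only on the volume serial number, they can be rebuilt for a given host, for example to check for an existing infection. In this sketch, the ‘Global\I%X’ format is taken from the analysis above. Applying the same ‘Global\’ prefix to the ‘M%X’ and ‘E%X’ formats is an assumption, as only the format suffixes are shown for those two objects.

```python
def object_names(serial: int) -> dict[str, str]:
    """Rebuild the mutex/event names from a volume serial number.

    'Global\\I%X' comes from the analysis; the 'Global\\' prefix on the
    'M%X' and 'E%X' names is an ASSUMPTION.
    """
    hex_serial = f"{serial & 0xFFFFFFFF:X}"  # like %X: uppercase hex, no leading zeros
    return {
        "mutex_I": "Global\\I" + hex_serial,
        "mutex_M": "Global\\M" + hex_serial,  # prefix assumed
        "event_E": "Global\\E" + hex_serial,  # prefix assumed
    }


print(object_names(0xEAA53FEC)["mutex_I"])  # Global\IEAA53FEC
```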
The malware calls GetModuleFileNameW, which returns the path the malware is running from. In the example below, the malware has been manually unpacked and is running from the desktop:
The malware decrypts a list of strings that will be used to create a random filename for itself.
Once the strings have been decrypted the malware then begins the process of generating a name using a combination of two strings.
A call is made to lstrlenW to calculate the length of the combined strings. This returns the value 0x177 in EAX, which is then moved into ECX.
Converting 0x177 from hex to decimal gives a string length of 375 characters, which can be seen in the screenshot below.
The mov instruction highlighted below moves data into EAX. This is the volume serial number of the infected machine that was captured earlier for the mutex creation: EAA53FEC.
The value 0xEAA53FEC is then divided by the string length of 375 (DIV ECX). The remainder of this division is stored in EDX, and the quotient in EAX.
The value in EDX is 0x62, which is 98 in decimal. The relevance of 98 is that the malware has now identified the first part of the filename by moving 98 characters along the list of strings. The image below shows that it has landed in the string ‘loada’.
The malware then performs some checks to make sure it captures the entire string, and the first part of the filename is extracted.
The process is then repeated to get the second string, which will make up the final part of the filename. In the previous routine a NOT EAX instruction was executed after DIV ECX; this inverts every bit in EAX, which at that point holds the quotient of the division. As a result, the string selected for the second part of the filename will usually differ from the first. After this routine completes, the second string identified is ‘tangent’.
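The arithmetic above can be reproduced offline. The sketch below mirrors DIV ECX (remainder used as the first offset) followed by NOT EAX on the quotient. Two details are assumptions: that the second offset is taken modulo the same 375-character length, and that words in the decrypted list are comma-separated (the “captures the entire string” step is modelled as expanding the offset to the surrounding word). The word list used in the test is hypothetical; the sample’s decrypted list is not reproduced here.

```python
def name_offsets(serial: int, length: int) -> tuple[int, int]:
    """Mirror DIV ECX then NOT EAX on 32-bit registers."""
    quotient, first = divmod(serial, length)  # DIV ECX: EAX = quotient, EDX = remainder
    inverted = ~quotient & 0xFFFFFFFF         # NOT EAX
    second = inverted % length                # ASSUMPTION: same list length reused
    return first, second


def word_at(word_list: str, offset: int, sep: str = ",") -> str:
    """Return the whole word containing `offset` (the separator is an ASSUMPTION)."""
    start = word_list.rfind(sep, 0, offset) + 1
    end = word_list.find(sep, offset)
    return word_list[start:] if end == -1 else word_list[start:end]


first, _ = name_offsets(0xEAA53FEC, 0x177)
print(first, hex(first))  # 98 0x62
```

The first offset matches the value of 98 (0x62) observed in EDX, which gives some confidence in the DIV ECX half of the model. The second half would need checking against the debugger.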
Next the malware prepares its persistence location. To do this a call is made to SHGetFolderPathW; the second parameter (csidl) passed to this API call is the value 0x1C.
Microsoft’s documentation states that this parameter is the CSIDL value. A CSIDL value identifies the folder whose path is to be retrieved. The value 1C corresponds to CSIDL_LOCAL_APPDATA which relates to the filesystem location AppData\Local.
Once the folder location has been retrieved, the malware sets up the persistence location. The image below shows where the string loadatangent (‘loada’ + ‘tangent’) will be used as both the directory name and the executable name. This is combined with the AppData\Local path returned by SHGetFolderPathW.
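For building host-based indicators, the persistence path can be assembled from the two parts described above. This is a small sketch: the ‘.exe’ extension and the example AppData\Local path (with the hypothetical user name ‘analyst’) are assumptions for illustration. ntpath is used so that Windows path rules apply even when the script runs on Linux.

```python
import ntpath


def persistence_path(local_appdata: str, name: str) -> str:
    """Build <LOCALAPPDATA>\\<name>\\<name>.exe; the '.exe' extension is an ASSUMPTION."""
    return ntpath.join(local_appdata, name, name + ".exe")


print(persistence_path(r"C:\Users\analyst\AppData\Local", "loada" + "tangent"))
```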
Malware creating persistence location in filesystem with call to CreateDirectoryW:
New process then created:
System Info & Process Enumeration
The malware captures information about the running operating system, as well as the architecture and processor, by calling RtlGetVersion and GetNativeSystemInfo.
The PEB is then located before the malware enumerates the running processes. The image below shows the PEB location being moved into EAX.
A call is then made to CreateToolhelp32Snapshot with the value 2 pushed onto the stack, which corresponds to TH32CS_SNAPPROCESS. This means that all processes in the system will be included in the snapshot.
A call is then made to Process32FirstW, which retrieves information about the first process in the snapshot. Each process is then enumerated and stored in memory:
Once the malware has captured information such as the hostname and the list of running processes, a unique identifier is created for the infected machine using the hostname and volume serial number.
This information is then compressed using the deflate algorithm:
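When working with recovered data offline, it can help to try both common deflate framings. The Python sketch below uses zlib to compress and decompress either zlib-wrapped or raw deflate streams. Which framing this sample uses is not established in this analysis, so both are shown. The process list in the test is hypothetical.

```python
import zlib


def deflate(data: bytes, raw: bool = False) -> bytes:
    """DEFLATE-compress; raw=True omits the zlib header and checksum (wbits=-15)."""
    comp = zlib.compressobj(level=9, wbits=-15 if raw else 15)
    return comp.compress(data) + comp.flush()


def inflate(data: bytes, raw: bool = False) -> bytes:
    """Reverse deflate() with the matching framing."""
    return zlib.decompress(data, wbits=-15 if raw else 15)
```

If decompression with one framing raises zlib.error, retrying with the other is a cheap way to identify which one a recovered buffer uses.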
A call to CryptGenKey creates an AES-128 session key, which is used to encrypt the compressed data. A hash object is also created by calling CryptCreateHash.
Location of AES Key:
Location of hash, this is stored below the list of plaintext process names:
A call is then made to CryptEncrypt, which is passed the key and hash handles shown above. The data stored in ebp-4, highlighted below, contains the deflate-compressed data identified earlier.
The image below shows that the data is now encrypted; this is highlighted in red. The data highlighted in green is the hash value.
The AES key is then exported with a call to CryptExportKey, which returns a key BLOB. A key BLOB is a format for transporting session keys in encrypted form, so that they can be sent over the internet and then imported on the receiving side for decryption.
The image below shows the keys and BLOB type being pushed onto the stack. The CRYPT_OAEP flag specifies that the session key is encrypted with RSA using OAEP padding when the key BLOB is exported.
The RSA-encrypted session key, hash value and encrypted data are then Base64 encoded, ready to be sent to the attacker.
Next the malware begins to set up connectivity to the C2 infrastructure. This starts with a routine that decrypts the format string used to build an IP address:
The first C2 is then loaded:
A list of strings is then generated which will be used to make up the full URL path.
Before these strings are used, the URL format is set up. The image below shows that two strings will make up the URL: the IP address and a string from the list that was generated:
Once the format has been set up, the strings are populated:
The malware then retrieves the User-Agent HTTP request header string that will be used.
Internet connectivity is then set up with calls to InternetOpenW, InternetConnectW, HttpOpenRequestW and HttpSendRequestW.
The lpOptional parameter of HttpSendRequestW, highlighted above, contains a pointer to the data that is going to be sent to the C2. This is the Base64-encoded data identified earlier in this report.
The malware config containing the IP addresses and port numbers of the attacker infrastructure is stored in plain text in the unpacked binary. From the analysis conducted, the IP address 190[.]117[.]206[.]153 has been identified as belonging to the attackers’ C2 infrastructure. This address can be located in the binary by converting each octet of the IP address to hex and searching for the resulting byte pattern in the memory map. Note that when searching, the byte order of the IP address must be reversed due to endianness.
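The conversion can be scripted to avoid mistakes when reversing the octets. This small sketch turns a (possibly defanged) IPv4 address into the reversed hex byte pattern to search for. It assumes the address is stored as a little-endian 32-bit value, as described above.

```python
import socket


def ip_search_pattern(ip: str) -> str:
    """Hex byte pattern of an IPv4 address stored as a little-endian DWORD."""
    ip = ip.replace("[.]", ".")           # accept defanged notation
    packed = socket.inet_aton(ip)         # network order: BE 75 CE 99
    return packed[::-1].hex(" ").upper()  # reversed for little-endian


print(ip_search_pattern("190[.]117[.]206[.]153"))  # 99 CE 75 BE
```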
By right-clicking on this pattern and following it in the dump, the malware config can be identified:
In the above image, the first four bytes contain the IP address and the following two bytes are the port to be used. Following this data in the memory map shows that the config is stored in the data section of the malware.
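As an independent illustration of that layout (this is not the CyberCDH script), the sketch below parses IP/port pairs from a dumped config. From the analysis, each entry starts with a 4-byte IP followed by a 2-byte port. The rest is assumed: both fields are little-endian, entries are padded to 8 bytes, and an all-zero IP terminates the list. The entry_size parameter makes it easy to try other layouts. The addresses in the test are documentation addresses, not real C2s.

```python
import struct


def parse_c2_config(blob: bytes, entry_size: int = 8) -> list[tuple[str, int]]:
    """Parse (ip, port) entries from a dumped config section.

    ASSUMPTIONS: little-endian 4-byte IP, little-endian 2-byte port,
    entries padded to `entry_size` bytes, all-zero IP ends the list.
    """
    results = []
    for off in range(0, len(blob) - 5, entry_size):
        ip_dword, port = struct.unpack_from("<IH", blob, off)
        if ip_dword == 0:
            break
        # Re-pack big-endian so the octets come out in dotted-quad order.
        ip = ".".join(str(b) for b in struct.pack(">I", ip_dword))
        results.append((ip, port))
    return results
```

With the pattern found earlier, the bytes 99 CE 75 BE decode back to 190.117.206.153, which is a useful sanity check that the byte order assumption holds.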
This section can now be dumped, and the C2s can easily be extracted using the following Python script written by CyberCDH:
Malware copied to the following location: