This post was originally published here by Chris Sanders.
Not all attacks require the use of malware, but most of them can be traced back to some form of unwanted malicious code executing on a trusted system. The files can be compiled executables, simple scripts, or even office documents hiding malicious macros. While these types of files are normal, experienced hunters know that you can hunt for compromises by examining the download and execution of specific file types known to be associated with malware. In this post, I’ll discuss a few techniques you can use to hunt on the network for file types that could be suspicious given the right context.
Suspicious File Types
The file types I’ll discuss in this post have perfectly legitimate uses, but they can also be linked to malicious activity. There are four categories to consider:
- Executables. These are programs designed to execute on specific operating system platforms. Because they’re compiled, understanding their purpose often requires running them in a sandbox, more complex behavioral analysis, or reverse engineering.
- Scripts. These are simpler programs that rely on the appropriate scripting engine to be present to execute. For instance, a Python interpreter must be installed on a system to execute a Python script. Some operating systems have native interpreters for specific script types, such as shell scripts on Linux and PowerShell scripts on Windows. These scripts are usually easy to decipher because they aren’t compiled, although they can be obfuscated using various techniques.
- Documents. Modern document formats provide a great deal of flexibility to include dynamic content. While this extends the functionality of a simple document, it also opens that functionality up to be abused by attackers. The adversary can manipulate common document types so that they can execute malicious code. This is commonly seen in Microsoft Office documents that contain macros and PDF files that can execute code. These features often require the user to click a button to allow the execution, but humans are susceptible to falling prey to cleverly crafted social engineering tricks.
- Archives. While an archive isn’t malicious on its own, attackers frequently archive their malicious code to make it look less suspicious or to evade content filtering or antivirus tools. For an archive file to be extracted, a tool that can interpret the archive format must exist on the target computer. Common formats are ZIP, CAB, and TAR because they are natively supported by various operating systems. Once extracted, you might be dealing with an executable or a script.
This list by no means encompasses every type of suspicious file you’ll encounter, but these are the most common manifestations of malicious code.
Hunting for Suspicious File Types on the Network
You can find evidence of files at multiple points in their lifetime. These observations usually occur when the file is being downloaded over the network, or when it’s being executed on the host.
Let’s start by looking for suspicious file types being downloaded by hosts on our network. To do this, we’ll examine proxy logs and ask the question “Did any system on my network download any PDFs?”
Of course, a PDF in itself is not malicious, but it could harbor something malicious in the right context.
We can answer this question with a simple search using Sqrrl Query Language:
SELECT * FROM Sqrrl_ProxySG WHERE cs_uri_extension = 'pdf'
This query returns all HTTP proxy records (Sqrrl_ProxySG) containing a file download identified by the PDF extension using the cs_uri_extension field.
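If your proxy data only records the full URL, an extension field like this one can be approximated from the URL path. The following is a minimal Python sketch, not the proxy's own logic; the function name `uri_extension` is hypothetical, and it assumes the extension is whatever follows the last dot in the final path segment, ignoring the query string.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def uri_extension(url: str) -> str:
    """Return the lowercase extension of the URL path, or '' if there is none."""
    # urlsplit separates the path from the query string and fragment,
    # so "report.pdf?id=7" is not mistaken for a ".pdf?id=7" extension.
    path = urlsplit(url).path
    # suffix is the final ".ext" component of the last path segment.
    return PurePosixPath(path).suffix.lstrip(".").lower()


print(uri_extension("http://example.com/files/Report.PDF?id=7"))
```

Note that only the last extension is returned, so a name like archive.tar.gz is reported as gz.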
There’s a problem with this method though. Just because a file has a PDF extension doesn’t mean it’s a PDF. More relevant, just because a file doesn’t have a PDF extension doesn’t mean it’s not a PDF. Attackers frequently change extensions of file types to avoid detection. All hope is not lost, however. By examining the header of a file or its structure you can make more accurate determinations about the content of a file.
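To illustrate what examining a file header means in practice, here is a minimal Python sketch that checks the first bytes of a file against a few well-known signatures. It assumes you already have the file content in hand (for example, extracted from a packet capture), since proxy logs don't contain it. The names `MAGIC_SIGNATURES` and `sniff_file_type` are hypothetical, and the list is far from complete.

```python
# Leading byte signatures ("magic numbers") for a few common formats.
MAGIC_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"MZ", "windows-executable"),
    (b"\x7fELF", "elf-executable"),
    (b"PK\x03\x04", "zip-container"),  # also used by docx/xlsx and JAR files
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2-compound"),  # legacy doc/xls
]


def sniff_file_type(data: bytes) -> str:
    """Guess a file type from its leading bytes, ignoring its name entirely."""
    for signature, label in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return label
    return "unknown"


# A PDF that was renamed to invoice.jpg still starts with the PDF signature.
renamed_pdf = b"%PDF-1.7\n% rest of the file"
print(sniff_file_type(renamed_pdf))
```

A signature match is only a hint. Because several formats share the ZIP container, a result like zip-container still needs a closer look at the structure inside.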
This query also looks for PDF files, but it does so based on the MIME type reported for the response, which this proxy includes in its logging as rs_content_type:
SELECT * FROM Sqrrl_ProxySG WHERE rs_content_type = 'application/pdf'
This technique is also fallible, but is a bit more reliable than just relying on the file extension. Keep in mind that specific file types often have multiple MIME types associated with them based on platform or version.
You can combine queries like this to look for all sorts of combinations of file types. This query looks for all files with a Microsoft Excel (xlsx) extension or a Microsoft Word (application/msword) content type:
SELECT cs_host,cs_uri_stem,cs_method,cs_uri_extension,rs_content_type FROM Sqrrl_ProxySG WHERE cs_uri_extension = 'xlsx' OR rs_content_type = 'application/msword'
Notice that we’ve discovered a couple of files that have application/msword content types, but have a JSP extension. Interesting! Looking for file extension and content type mismatches like this is a good place to begin a hunting exercise.
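The mismatch check can also be run over exported proxy records. The Python sketch below assumes each record is a dictionary keyed by the same field names used in the queries (cs_uri_extension, rs_content_type). The `EXPECTED_CONTENT_TYPES` table and the `find_mismatches` function are hypothetical and cover only a handful of types; you would extend the table for your own environment.

```python
# Hypothetical table of extensions and the content types we expect for them.
EXPECTED_CONTENT_TYPES = {
    "pdf": {"application/pdf"},
    "doc": {"application/msword"},
    "xls": {"application/vnd.ms-excel"},
    "docx": {
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
    },
    "xlsx": {
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
    },
}


def find_mismatches(records):
    """Return records whose extension and content type disagree."""
    known_types = set().union(*EXPECTED_CONTENT_TYPES.values())
    mismatches = []
    for record in records:
        ext = (record.get("cs_uri_extension") or "").lower()
        # Drop parameters such as "; charset=utf-8" before comparing.
        raw_type = record.get("rs_content_type") or ""
        ctype = raw_type.split(";")[0].strip().lower()
        expected = EXPECTED_CONTENT_TYPES.get(ext)
        if expected is not None:
            # Known extension served with an unexpected content type.
            if ctype not in expected:
                mismatches.append(record)
        elif ctype in known_types:
            # Document content type hiding behind an unrelated extension.
            mismatches.append(record)
    return mismatches


sample_records = [
    {"cs_uri_extension": "jsp", "rs_content_type": "application/msword"},
    {"cs_uri_extension": "pdf", "rs_content_type": "application/pdf"},
]
for hit in find_mismatches(sample_records):
    print(hit["cs_uri_extension"], hit["rs_content_type"])
```

Only the first sample record is reported, mirroring the JSP files that carried a Word content type.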
Using this information, I can begin investigating the sources of these files and the hosts that downloaded them by clicking the Explore button and expanding the relationships that exist. I can look for suspicious domains, malicious user agents, and more.
Finally, it may be helpful to aggregate all the file types on your network to get a baseline for what file types are more common than others, and to look for outliers.
This query selects the content type field (rs_content_type) from the HTTP proxy data source and counts all unique entries for that field. The results are sorted by the count with the most common file types at the top.
SELECT COUNT(*),rs_content_type FROM Sqrrl_ProxySG GROUP BY rs_content_type ORDER BY COUNT(*) DESC
You should see several file types you recognize at the top of this list. The outliers will be at the bottom, or you can reverse the order of the list by changing DESC to ASC.
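The same baseline can be built outside the query interface if you export the records. This is a rough Python sketch that assumes records are dictionaries carrying an rs_content_type key; the helper names `count_content_types` and `rarest` are hypothetical.

```python
from collections import Counter


def count_content_types(records):
    """Count records per content type, most common first."""
    counts = Counter(
        # Normalize case and drop parameters such as "; charset=utf-8".
        (record.get("rs_content_type") or "-").split(";")[0].strip().lower()
        for record in records
    )
    return counts.most_common()


def rarest(records, n=5):
    """Return the n least common content types, the likely outliers."""
    return sorted(count_content_types(records), key=lambda item: item[1])[:n]


sample_records = [
    {"rs_content_type": "text/html"},
    {"rs_content_type": "text/html; charset=utf-8"},
    {"rs_content_type": "application/x-msdownload"},
]
print(count_content_types(sample_records))
print(rarest(sample_records, n=1))
```

Normalizing the content type before counting keeps variants like text/html; charset=utf-8 from showing up as false outliers.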
In larger enterprises, searching for many of these file types is probably going to be labor intensive for manual hunting. If you want to further analyze any of these file types, you’ll probably start with some automated analysis to hone your investigation, or you’ll focus these searches on a specific host for which you’ve already observed other suspicious occurrences. As with many hunting techniques, this one is much more powerful when combined with others.
Conclusion
If malicious code executes on your system, there will generally be a file associated with it. In this post, I covered the common types of files that contain malicious code and techniques for finding those files using network data sources. In the second article in this series, we’ll discuss finding suspicious files using host-based data sources, and I’ll provide tips for investigating suspicious files once you’ve found them.