This post was originally published here by Hem Karlapalem.
The art of Threat Hunting can be especially fun when dealing with isolated, individual pieces of a puzzle. This article brings out the importance of email header analysis and how it can help during a hunt. Email header analysis is one of the oldest techniques employed by incident handlers, and this article revives it to see how it looks through the lens of Threat Hunting. Since plenty of threat campaigns use email and spam infrastructure as a vector to distribute malware, understanding the various email headers will help threat hunters find missing links.
Overview on Email Headers
Email headers contain information which is used to track an individual email, detailing the path a message takes as it crosses mail servers. This is especially helpful when investigating SPAM, MalSPAM and phishing emails. Though there have been tools developed such as Email Gateways which can catch this, at times it is still necessary for a hunt team or threat intel team to use email header analysis to track a threat actor, campaign, or infrastructure.
As per RFC 2822 from the IETF, an email message consists of header fields followed by a message body. The header lines carry identifying and routing information for the message, including the sender, recipient, date and subject. Only the origination date (Date) and originator (From) fields are mandatory; fields such as To are common but optional. Other header information includes the sending and receiving timestamps of all the mail transfer agents (MTAs) that have received and sent the message.
Important fields that could be of interest are:
- Origination date field
The origination date specifies the date and time at which the creator of the message indicated that the message was complete and ready to enter the mail delivery system. So, this is the time at which a user pushes the “send” or “submit” button in an application program.
- Originator Fields
The originator fields of a message consist of the fields below and indicate the source of the message.
a) From
This field specifies the author(s) of the message i.e, the mailbox(es) of the person(s) or the system(s) responsible for writing the message.
b) Sender
This field specifies the mailbox of the agent responsible for the actual transmission of the message. For example, if Person A is sending a mail on behalf of another Person B, the mailbox of Person A would appear in the “Sender:” field and the mailbox of the actual author would appear in the “From:” field.
c) Reply-to
This is an optional field. If present, it indicates the mailbox(es) to which the author of the message suggests that replies be sent. In the absence of this field, replies should by default be sent to the mailbox(es) specified in the “From:” field. Phishing authors often abuse this field by setting it so that the victim’s reply, and any information it contains, goes to a different mailbox than the one the message appears to come from.
- Destination Address Fields
The destination address fields specify the recipients of the message.
a. To
This field contains the address(es) of the primary recipient(s) of the message.
b. Cc
This field, short for Carbon Copy, contains the addresses of others who are to receive the message, though the content of the message may not be directed at them.
c. Bcc
This field, short for Blind Carbon Copy, contains addresses of recipients of the message whose addresses are not to be revealed to other recipients of the message.
- Identification Fields
These are optional as below:
a. Message-ID
Every message should have a “Message-ID:” field. The “Message-ID:” field contains a single unique message identifier that refers to a particular version of a particular message. A message identifier pertains to exactly one instantiation of a particular message and subsequent revisions to the message each receive new message identifiers. The generator of the message identifier MUST guarantee that the msg-id is unique.
b. In-reply-to
The contents of this field identify previous correspondence which this message has answered.
c. References
The contents of this field identify other correspondence which this message references. Note also that all reply messages should have “In-Reply-To:” and “References:” fields.
- Informational Fields
These are all optional.
a. Keywords
The “keywords:” field contains a comma-separated list of one or more words or quoted-strings.
b. Subject
This is the most common field and contains a short string identifying the topic of the message.
c. Comments
This field contains any additional comments on the text of the body of the message. The “Subject:” and “Comments:” fields are unstructured.
d. Encrypted
If data encryption is used to increase the privacy of message contents, the “Encrypted” field can be used to indicate the nature of the encryption. Note that this field was defined in the older RFC 822.
- Trace Fields
These are a group of header fields which provide trace information and an audit trail of message handling. In addition, they indicate a route back to the sender of the message.
a. Return-Path
This field is added by the final transport system that delivers the message to its recipient. The field is intended to contain definitive information about the address and route back to the message’s originator.
b. Received
A copy of this field is added by each transport service that relays the message. The information in the field can be helpful while troubleshooting network problems as well as while investigating phishing and spam.
- Additional Fields
Additionally, the parameters and fields below help in an investigation.
a. VIA
The VIA parameter (part of a Received field) may be used to indicate what physical mechanism the message was sent over.
b. WITH
The WITH parameter may be used to indicate the mail or connection level protocol that was used, such as SMTP or X.25 transport protocol.
c. Date and Time Specification
The headers also carry date and time zone information, which is a key data point in an investigation.
d. User-Agent
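This field specifies the client software or program used by the source to send the mail.
Note: Email headers should always be read from bottom to top. Each mail server prepends its own Received field, so the lowest Received line is the earliest hop and the topmost one is the final delivery.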
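As a minimal sketch of the bottom-to-top rule, the snippet below uses Python's standard `email` module to list Received fields in chronological order. The two-hop message and its hostnames are hypothetical placeholders, not data from a real campaign.

```python
from email import message_from_string

# Hypothetical two-hop message; hostnames are placeholders.
RAW = (
    "Received: from relay.example.net by mx.example.org; Tue, 11 Apr 2017 14:12:51 +0000\n"
    "Received: from sender.example.com by relay.example.net; Tue, 11 Apr 2017 14:12:40 +0000\n"
    "From: a@example.com\n"
    "\n"
    "body\n"
)

def received_chain(raw_message):
    """Return Received header values ordered from the first hop to the last."""
    msg = message_from_string(raw_message)
    hops = msg.get_all("Received") or []
    # Each relay prepends its own Received line, so the parsed list is newest-first.
    return list(reversed(hops))

for number, hop in enumerate(received_chain(RAW), start=1):
    print(number, hop)
```

Reversing the parsed list gives the same view as reading the raw header from the bottom up.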
Overview of Email Inbound and Outbound
An email program such as MS Outlook is a client application that needs to interact with a mail server. Typically, there are two servers, one for incoming and the other for outgoing email. The client receives email through one of the three protocols below:
- Post Office Protocol (POP)
- Internet Message Access Protocol (IMAP)
- Messaging Application Programming Interface (MAPI), Microsoft's messaging API
All incoming mail is stored on a mail server and further distributed into the appropriate mailbox. POP users download all their mail and can then store or delete it locally. So, in the case of POP, incoming email typically ends up on the user’s workstation.
On the other hand, IMAP and MAPI users have the option of leaving their email on the server, though they can make copies on their own workstation.
All outgoing mail uses the Simple Mail Transfer Protocol (SMTP). Its objective is to transfer mail reliably and efficiently. SMTP is the protocol used to transport mail between servers across networks, a process usually referred to as SMTP mail relaying.
SMTP Basic Structure
Sample Email Header and Fields of Interest
Below are the email headers for one of the Malspam campaigns found to distribute JAFF ransomware. The ones marked in BOLD are the interesting headers for performing hunting.
Received: from breakawaydistributing.com ()
    by creativedude248726@gmail.com;
    Tue, 11 Apr 2017 14:12:51 +0000 (UTC)
Message-ID: <D5342094.84830072@breakawaydistributing.com>
Date: Tue, 11 Apr 2017 07:12:24 -0700
Reply-To: "USPS International" <lrnvaoy1467488@breakawaydistributing.com>
From: "USPS Ground" <lrnvaoy1467488@breakawaydistributing.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.14) Gecko/20080421 Thunderbird/2.0.0.14
X-Accept-Language: en-us
MIME-Version: 1.0
To: creativedude248726@gmail.com
Subject: Our USPS courier can not contact you parcel # 754277860
Content-Type: text/plain;
    charset="us-ascii"
Content-Transfer-Encoding: 7bit
Email headers can be parsed with the help of online tools.
However, in some cases the confidentiality of the mail or organizational policy might prevent you from using these online tools.
I’ve made a simple tool, emailHeaderParser, which can be used offline; it is listed in the references.
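For readers who want to see what offline parsing involves, here is a minimal sketch using only Python's standard library. It is not the emailHeaderParser tool mentioned above; the function name `summarize` and the choice of fields are assumptions made for illustration. The input is a subset of the sample header shown earlier.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Subset of the sample header from this article.
SAMPLE = """\
Message-ID: <D5342094.84830072@breakawaydistributing.com>
Date: Tue, 11 Apr 2017 07:12:24 -0700
Reply-To: "USPS International" <lrnvaoy1467488@breakawaydistributing.com>
From: "USPS Ground" <lrnvaoy1467488@breakawaydistributing.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.14) Gecko/20080421 Thunderbird/2.0.0.14
To: creativedude248726@gmail.com
Subject: Our USPS courier can not contact you parcel # 754277860

"""

# Illustrative selection of fields; adjust to your own hunt.
FIELDS_OF_INTEREST = ["From", "Reply-To", "Message-ID", "Date", "User-Agent", "Subject"]

def summarize(raw_message):
    """Collect the fields of interest plus a few derived values."""
    msg = message_from_string(raw_message)
    summary = {name: msg.get(name) for name in FIELDS_OF_INTEREST}
    # Split the From header into display name and address.
    summary["from_display"], summary["from_addr"] = parseaddr(msg.get("From", ""))
    # The UTC offset of the Date header is a rough hint of the sender's time zone.
    date = msg.get("Date")
    summary["utc_offset"] = parsedate_to_datetime(date).utcoffset() if date else None
    return summary

summary = summarize(SAMPLE)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Because nothing leaves the analyst's machine, a script along these lines avoids the confidentiality concern raised above.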
Email Abuse Overview
Email remains the preferred threat vector for most threat actors to deliver malicious payloads to victims. As per statistics from Securelist, malspam has contributed to more than 66% of attacks globally.
Typically, below are the various types of Email abuse that we come across in the cyber realm.
- MalSPAM carrying malicious payloads, delivered as:
  - Attachments (zip archives, MS Office documents, PDFs, etc.)
  - A malicious URL in the body which downloads a payload
- SPAM often clubbed with social engineering techniques and by spoofing display names pretending to be from legit brands
- Business Email compromise [CEO Attacks, Spear-Phishing] targeting individuals
- Emails targeting Individuals or Entities with an intent of Threatening
All the above forms of attack attempt to trick the victim into opening the attachment, clicking on the URL or acting on the mail, any of which could prove devastating at a later stage.
From an investigation perspective, the email headers that we have discussed in the earlier sections are all helpful to track back to its origin and to immediately respond with appropriate measures like blocking the source etc. The richer the messaging media, the more opportunity for the adversaries to camouflage malicious content within the rich content.
However, with all the sophistication on the malicious actor’s end, it is necessary to understand what the actual embedded data is, where it is coming from, whether the source has been spoofed or not, and so on.
Methods for Hunting:
Tracking Back to the Source
The FROM header helps identify the sender of the mail. However, it can be spoofed, so on its own it is often not a vital data point. In widespread campaigns, though, the same sender might be used for all the mail sent. To overcome SPAM filters, attackers have come up with a new technique called a “Hailstorm” attack, where every sender is unique.
The FROM address could be searched across the Internet with the help of Google Dorks to see if there is any history for this and if anyone else has already observed this.
From: "USPS Ground" <lrnvaoy1467488@breakawaydistributing.com>
The RECEIVED header is another vital source of information, showing the path the mail has traversed (its “hops”), which are mail relays and servers. The IP addresses captured in this header can point to the sender’s infrastructure and approximate location, which helps with attribution. Keep in mind that Received lines added before the message reached a server you trust can themselves be forged. From there, these IP addresses can be checked against existing blacklists to identify anything malicious.
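A rough sketch of that lookup follows, assuming a local blocklist held in a Python set. The `BLOCKLIST` contents, the sample hops and the function name `received_ips` are hypothetical, and the addresses come from ranges reserved for documentation. The regular expression is deliberately simple and only covers IPv4.

```python
import ipaddress
import re
from email import message_from_string

# Loose pattern for dotted quads; candidates are validated with ipaddress below.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Hypothetical blocklist; in practice this would be loaded from a feed export.
BLOCKLIST = {"203.0.113.45"}

# Hypothetical hops using documentation-only addresses.
RAW_HOPS = (
    "Received: from relay.example.net (relay.example.net [198.51.100.7])\n"
    "\tby mx.example.org; Tue, 11 Apr 2017 14:12:51 +0000\n"
    "Received: from mail.example.com (unknown [203.0.113.45])\n"
    "\tby relay.example.net; Tue, 11 Apr 2017 14:12:40 +0000\n"
    "\n"
)

def received_ips(raw_message):
    """Yield valid IPv4 addresses found in Received headers, earliest hop first."""
    msg = message_from_string(raw_message)
    for hop in reversed(msg.get_all("Received") or []):
        for candidate in IPV4_CANDIDATE.findall(hop):
            try:
                yield str(ipaddress.ip_address(candidate))
            except ValueError:
                continue  # matched the pattern but is not a valid address

hits = [ip for ip in received_ips(RAW_HOPS) if ip in BLOCKLIST]
print(hits)
```

Version strings or message IDs that happen to look like dotted quads can produce false positives, so treat the output as leads to verify rather than confirmed hops.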
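The REPLY-TO field is normally filled in with the email address to which replies should be sent. A Reply-To address or display name that does not match the From field is another sign that the email may be malicious; in the sample above, the two fields share an address but show different display names (“USPS International” versus “USPS Ground”).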
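The comparison between Reply-To and From can be automated. The sketch below is one possible check; the function name `reply_to_mismatch` and the wording of the reasons are assumptions for illustration.

```python
from email.utils import parseaddr

def reply_to_mismatch(from_header, reply_to_header):
    """Return a list of reasons why Reply-To looks inconsistent with From."""
    from_name, from_addr = parseaddr(from_header)
    reply_name, reply_addr = parseaddr(reply_to_header)
    reasons = []
    if reply_addr and reply_addr.lower() != from_addr.lower():
        reasons.append("different address")
        from_domain = from_addr.rpartition("@")[2].lower()
        reply_domain = reply_addr.rpartition("@")[2].lower()
        if from_domain != reply_domain:
            reasons.append("different domain")
    if reply_name and reply_name != from_name:
        reasons.append("different display name")
    return reasons

# Headers taken from the sample in this article.
sample_reasons = reply_to_mismatch(
    '"USPS Ground" <lrnvaoy1467488@breakawaydistributing.com>',
    '"USPS International" <lrnvaoy1467488@breakawaydistributing.com>',
)
print(sample_reasons)
```

A mismatch is not proof of malice, since mailing lists and helpdesk systems legitimately set Reply-To, so this works best as one signal among several.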
The MESSAGE-ID field provides a useful clue as to the actual origin of the mail. Message identifiers are supposed to be unique, and a common technique is to use the date and time of message generation as the first part of the message ID. Together with the time zone offset in the Date field, this can hint at the region from which the email originated. Lastly, the domain information in the message ID helps to identify the domain associated with the sending system.
Message-ID: <D5342094.84830072@breakawaydistributing.com>
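As an illustration, the domain part of a Message-ID can be pulled out and compared with the From domain. The helper names below are hypothetical; the inputs are the values from the sample header.

```python
from email.utils import parseaddr

def message_id_domain(message_id):
    """Return the part after '@' in a Message-ID such as <left@right>, lowercased."""
    value = message_id.strip().strip("<>")
    _, sep, right = value.rpartition("@")
    return right.lower() if sep else ""

def from_domain(from_header):
    """Return the domain of the address in a From header, lowercased."""
    addr = parseaddr(from_header)[1]
    _, sep, domain = addr.rpartition("@")
    return domain.lower() if sep else ""

mid_domain = message_id_domain("<D5342094.84830072@breakawaydistributing.com>")
sender_domain = from_domain('"USPS Ground" <lrnvaoy1467488@breakawaydistributing.com>')
print(mid_domain, sender_domain, mid_domain == sender_domain)
```

In this sample the two domains agree. A disagreement is worth a closer look, but agreement proves little, because the sender controls both values.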
Leveraging Threat Intelligence
There are numerous threat intel vendors who offer premium services and maintain an inventory of these malicious actors. There are also a few open source threat intel sites which carry information about malicious email actors. So, it’s always a good idea to compare the captured email headers to known IOCs to understand whether the email was part of a targeted attack or just general spamming. With email fraud continuing to rise, new ways of securing attribution (especially by leveraging email threat intelligence) are highly recommended in addition to your other security practices.
For example, let’s say we have an IOC pertaining to a campaign in the form of email IDs, source addresses etc. Running this against the email headers in an automated way would help to see if the organization’s infrastructure is impacted as well by the same threat actor/campaign. However, in case of “Snow-Shoe” campaigns, the spammers use various source IP Addresses to dilute reputation metrics and evade filters. Threat Intelligence here can be of great help!
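A minimal sketch of such an automated run is shown below, assuming the IOCs are available as plain sets of sender addresses and domains. The `IOCS` structure, the `MAILBOX` list and the function name `match_iocs` are hypothetical; the malicious address is the one from the sample header.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical IOC set, seeded with the sender from the sample header.
IOCS = {
    "from_addr": {"lrnvaoy1467488@breakawaydistributing.com"},
    "from_domain": {"breakawaydistributing.com"},
}

# Hypothetical collected messages (headers only).
MAILBOX = [
    'From: "USPS Ground" <lrnvaoy1467488@breakawaydistributing.com>\nSubject: parcel\n\n',
    "From: Alice <alice@example.org>\nSubject: lunch\n\n",
]

def match_iocs(raw_message, iocs=IOCS):
    """Return (ioc_type, value) pairs for every IOC the message's From header hits."""
    msg = message_from_string(raw_message)
    addr = parseaddr(msg.get("From", ""))[1].lower()
    domain = addr.rpartition("@")[2]
    hits = []
    if addr in iocs.get("from_addr", set()):
        hits.append(("from_addr", addr))
    if domain in iocs.get("from_domain", set()):
        hits.append(("from_domain", domain))
    return hits

report = {index: match_iocs(raw) for index, raw in enumerate(MAILBOX)}
print(report)
```

Matching on the domain as well as the full address gives some coverage when a campaign rotates the local part of the sender address.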
Below is the list of possible IOCs for lookup on collected email header data.
- FROM email addresses
- Originating IP Addresses
- Attachment Names
- Embedded URLs
- Subject Line
- Display Name
The most frequently spoofed “header from” field is the Display Name, for which there is currently no authentication mechanism available.
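Since the display name cannot be authenticated, one pragmatic check is to flag messages whose display name mentions a brand while the address belongs to an unrelated domain. The sketch below assumes a hand-maintained mapping, `BRAND_DOMAINS`, which is hypothetical and would need to reflect the brands relevant to your organization.

```python
from email.utils import parseaddr

# Hypothetical mapping of brand keywords to domains that brand is assumed to send from.
BRAND_DOMAINS = {"usps": {"usps.com"}}

def display_name_spoof(from_header, brand_domains=BRAND_DOMAINS):
    """Return brands named in the display name whose domains do not match the address."""
    name, addr = parseaddr(from_header)
    domain = addr.rpartition("@")[2].lower()
    flagged = []
    for brand, allowed in brand_domains.items():
        if brand not in name.lower():
            continue
        # Accept the listed domain itself or any subdomain of it.
        if not any(domain == d or domain.endswith("." + d) for d in allowed):
            flagged.append(brand)
    return flagged

print(display_name_spoof('"USPS Ground" <lrnvaoy1467488@breakawaydistributing.com>'))
```

The sample sender is flagged because the display name claims USPS while the address uses an unrelated domain.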
While we can’t say for certain that these would help in detection, they are really helpful in drawing in statistics through big data platforms.
Bulk Analysis through Analytics platforms
Subject fields can be analyzed by the content they include, such as shipping orders clubbed with a randomly generated number for every spam mail targeting the organization to evade the filters.
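One simple way to see through per-message random numbers is to collapse every run of digits into a placeholder and count the resulting templates. In the sketch below, the first subject is from the sample header; the other two are made-up examples.

```python
import re
from collections import Counter

def subject_template(subject):
    """Collapse digit runs so subjects that differ only by a number group together."""
    return re.sub(r"\d+", "<NUM>", subject.lower()).strip()

subjects = [
    "Our USPS courier can not contact you parcel # 754277860",  # from the sample header
    "Our USPS courier can not contact you parcel # 118203377",  # hypothetical variant
    "Quarterly report",                                         # hypothetical benign mail
]

counts = Counter(subject_template(s) for s in subjects)
for template, count in counts.most_common():
    print(count, template)
```

Templates with unusually high counts across many recipients are good candidates for a closer look.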
Below are a few indicators which can be automated and run against the large volume of header data collected.
From: field
- Misspelled domain names
- Misspelled sender’s name
- Improper capitalization
- Domain names that do not match the supposed sender
- Gibberish in the email address
- Unknown senders
To: field
- Multiple recipients
- Unrelated recipients
- Odd groupings of recipients
Attachments:
- Email attachments you are not expecting to receive
- Files which appear to have double extensions (like photo.jpg.exe)
Subject line:
- Subjects which convey a sense of urgency
- Subjects which try to scare us or tempt us with something illicit
- Subject lines which don’t match the content of the message
- Strange wording, poor grammar, misspellings, and odd capitalization
- Emails which appear to be replies to messages we never sent
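A few of these indicators translate directly into code. The sketch below covers double extensions, urgency wording and replies without a matching In-Reply-To header. The extension sets and the word list are assumptions for illustration and should be tuned against your own mail corpus.

```python
import re

# Hypothetical lists; tune them for your environment.
RISKY_EXTENSIONS = {"exe", "scr", "js", "vbs", "bat", "cmd", "pif"}
DECOY_EXTENSIONS = {"jpg", "png", "pdf", "doc", "docx", "xls", "xlsx", "txt"}
URGENCY_WORDS = {"urgent", "immediately", "suspended", "final notice", "verify"}

def has_double_extension(filename):
    """True for names like photo.jpg.exe: a harmless-looking extension before a risky one."""
    parts = filename.lower().split(".")
    return len(parts) >= 3 and parts[-1] in RISKY_EXTENSIONS and parts[-2] in DECOY_EXTENSIONS

def subject_is_urgent(subject):
    """True if the subject contains any of the urgency phrases."""
    lowered = subject.lower()
    return any(word in lowered for word in URGENCY_WORDS)

def looks_like_fake_reply(subject, in_reply_to):
    """True if the subject starts with 'Re:' but no In-Reply-To value is present."""
    return bool(re.match(r"\s*re:", subject, re.IGNORECASE)) and not in_reply_to

print(has_double_extension("photo.jpg.exe"))
print(subject_is_urgent("URGENT: verify your account"))
print(looks_like_fake_reply("Re: invoice", None))
```

Each function returns a boolean so the results can be stored as columns alongside the header data and aggregated later.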
Also, conducting behavioral analysis on data collected with the above parameters could help you find the needle in a haystack. Data analysis packages available for Python and R can help surface interesting patterns. Commercial security analytics solutions could also help with advanced techniques such as linked-data analysis.
At an organizational level, performing analytics might be time-consuming and expensive. However, the value of the patterns it uncovers far outweighs that cost when set against the damage a missed campaign might cause.
Conclusion
Humans are fallible and it is inevitable that at least one person in your organization is going to open a malicious email.
However, knowing what to do afterwards is as important as knowing how to avoid danger in the first place. As a closing note, below are various ways of combating email-borne threats.
- Do not assume that any message you receive is legitimate; treat it with suspicion
- Look at messages for content, misspellings and other anomalies
- Do not click on any embedded links
- Do not open any attachments
- Keep your antivirus software up to date
Offline Header Parsing Tool – https://github.com/krlplm/parseemailheader
– See more at: https://sqrrl.com/hunting-email-headers/#sthash.oj7eIxja.dpuf