Threat Hunting is at its most enjoyable when it involves piecing together isolated fragments of a puzzle. This article highlights the importance of email header analysis and how it helps during a hunt. Email header analysis is one of the oldest techniques used by Incident Handlers, and this article revives it to see how it can be viewed through the lens of Threat Hunting.
With plenty of threat campaigns using email as the threat vector to distribute malware, and spammers operating their own SPAM infrastructure, understanding the various email headers helps Threat Intelligence and Threat Hunting teams find the missing links.
Overview on Email Headers
Email headers contain information used to track an individual email, detailing the path a message took as it crossed mail servers. This is especially helpful when investigating SPAM, MalSPAM and phishing emails. Technologies such as email gateways catch many of these messages. Even so, email header analysis can still help a Hunting team or a Threat Intel team track a threat actor, a campaign and its infrastructure.
As per RFC 2822 from the IETF, an email message consists of header fields followed by a message body. The header lines identify particular routing information of the message, including the sender, recipient, date and subject. Only two header fields are mandatory: the origination date ("Date:") and the originator ("From:"). Other header information includes the sending and receiving timestamps of all the mail transfer agents (MTAs) that have received and sent the message.
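The following is a minimal sketch of this structure. It uses Python's standard-library `email` package to split a message into its header fields and body, and it checks for the two fields RFC 2822 requires. The sample message and the helper name `missing_required_headers` are hypothetical.

```python
from email import message_from_string
from email.policy import default

# Hypothetical raw message: header fields, then a blank line, then the body.
RAW = (
    "Date: Tue, 11 Apr 2017 07:12:24 -0700\n"
    'From: "Alice" <alice@example.com>\n'
    "To: bob@example.org\n"
    "Subject: Test\n"
    "\n"
    "Hello Bob.\n"
)

# RFC 2822 section 3.6: only the origination date and originator fields are required.
REQUIRED = ("Date", "From")


def missing_required_headers(raw: str) -> list[str]:
    """Return the names of required header fields absent from a raw message."""
    msg = message_from_string(raw, policy=default)
    return [name for name in REQUIRED if msg[name] is None]


msg = message_from_string(RAW, policy=default)
print(msg.keys())                      # header field names, in order
print(missing_required_headers(RAW))   # [] for this message
```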
Important fields that could be of interest are:
1. Origination date field
The origination date ("Date:") specifies the date and time at which the creator of the message indicated that the message was complete and ready to enter the mail delivery system. In other words, this is the time at which a user pushes the "send" or "submit" button in an application program.
2. Originator Fields
The originator fields of a message consist of the fields below and indicate the source of the message.
The "From:" field specifies the author(s) of the message, i.e., the mailbox(es) of the person(s) or system(s) responsible for writing the message.
The "Sender:" field specifies the mailbox of the agent responsible for the actual transmission of the message. For example, if Person A sends a mail on behalf of Person B, the mailbox of Person A appears in the "Sender:" field and the mailbox of the actual author, Person B, appears in the "From:" field.
The "Reply-To:" field is optional. If present, it indicates the mailbox(es) to which the author of the message suggests that replies be sent. In the absence of this field, replies should by default be sent to the mailbox(es) specified in the "From:" field.
Phishing authors often exploit this field by setting it to a mailbox they control. A victim who replies then unknowingly sends information to an unintended mailbox.
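One simple way to hunt for this abuse is to compare the domain of the "Reply-To:" address with that of the "From:" address. The sketch below implements that heuristic, and its helper names are hypothetical. Mailing lists and ticketing systems legitimately set a different Reply-To, so a hit is a lead rather than a verdict.

```python
from email import message_from_string
from email.policy import default
from email.utils import parseaddr


def domain_of(header_value: str) -> str:
    """Return the lower-cased domain of the first address in a header value."""
    _, addr = parseaddr(header_value)
    return addr.rpartition("@")[2].lower()


def reply_to_mismatch(raw: str) -> bool:
    """True when Reply-To: is present and points at a different domain than From:."""
    msg = message_from_string(raw, policy=default)
    reply_to, sender = msg["Reply-To"], msg["From"]
    if reply_to is None or sender is None:
        return False
    return domain_of(str(reply_to)) != domain_of(str(sender))


# Hypothetical message whose replies would go to another domain.
suspicious = 'From: "Billing" <billing@example.com>\nReply-To: pay@example.net\n\nHi\n'
print(reply_to_mismatch(suspicious))  # True
```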
3. Destination Address Fields
The destination address fields specify the recipients of the message. The fields are as follows:
The "To:" field contains the address(es) of the primary recipient(s) of the message.
The "Cc:" (Carbon Copy) field contains the addresses of others who are to receive the message, though the content of the message may not be directed at them.
The "Bcc:" (Blind Carbon Copy) field contains the addresses of recipients whose addresses are not to be revealed to the other recipients of the message.
4. Identification Fields
These fields are optional:
Though optional, every message should have a "Message-ID:" field. The "Message-ID:" field contains a single unique message identifier that refers to a particular version of a particular message. A message identifier pertains to exactly one instantiation of a particular message, and subsequent revisions to the message each receive new message identifiers. The generator of the message identifier MUST guarantee that the msg-id is unique.
The "In-Reply-To:" field identifies the previous correspondence that this message answers.
The "References:" field identifies other correspondence that this message references. Note that reply messages should have both "In-Reply-To:" and "References:" fields.
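These two fields support a useful hunting check: a mail that presents itself as a reply, but whose "In-Reply-To:" and "References:" point at none of the organization's own sent Message-IDs, answers a conversation that never happened. The sketch below assumes such a set of sent Message-IDs is available, for example exported from outbound mail logs. That set and the helper names are hypothetical.

```python
import re
from email import message_from_string
from email.policy import default

MSG_ID = re.compile(r"<[^<>\s]+>")


def referenced_ids(raw: str) -> set[str]:
    """Collect the message identifiers listed in In-Reply-To: and References:."""
    msg = message_from_string(raw, policy=default)
    ids: set[str] = set()
    for field in ("In-Reply-To", "References"):
        value = msg[field]
        if value is not None:
            ids.update(MSG_ID.findall(str(value)))
    return ids


def is_orphan_reply(raw: str, sent_ids: set[str]) -> bool:
    """True if the mail claims to answer something, but none of our sent Message-IDs."""
    ids = referenced_ids(raw)
    return bool(ids) and ids.isdisjoint(sent_ids)


# Hypothetical set of Message-IDs taken from the organization's outbound mail logs.
SENT = {"<abc123@mail.example.com>"}
fake_reply = "From: x@example.org\nIn-Reply-To: <zzz@evil.example>\n\nRe: invoice\n"
print(is_orphan_reply(fake_reply, SENT))  # True
```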
5. Informational Fields
These are all optional.
The "Keywords:" field contains a comma-separated list of one or more words or quoted strings.
The "Subject:" field is the most common of these and contains a short string identifying the topic of the message.
The "Comments:" field contains any additional comments on the text of the body of the message. The "Subject:" and "Comments:" fields are unstructured fields.
If data encryption is used to increase the privacy of message contents, the "Encrypted:" field (defined in the older RFC 822) can be used to indicate the nature of the encryption.
6. Trace Fields
These are a group of header fields that provide trace information, which serves as an audit trail of message handling. They also indicate a route back to the sender of the message.
The "Return-Path:" field is added by the final transport system that delivers the message to its recipient. It is intended to contain definitive information about the address and route back to the message's originator.
A "Received:" field is added by each transport service that relays the message. The information in these fields is helpful when troubleshooting network problems, as well as when investigating phishing and SPAM.
7. Additional Fields
Additionally, the "Received:" field can carry the parameters below, which help in an investigation.
a. VIA: The VIA parameter may be used to indicate the physical mechanism over which the message was sent.
b. WITH: The WITH parameter may be used to indicate the mail or connection level protocol that was used, such as the SMTP mail protocol or the X.25 transport protocol.
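VIA and WITH are clauses of the "Received:" field, so a hunter usually wants them, along with the from, by, id and for clauses, as separate values. The regex-based sketch below has several limits:

- It assumes a single, already unfolded "Received:" value.
- It only strips simple, non-nested comments, and those comments often carry the connecting IP, which is dropped here.
- Real-world Received lines vary a lot between MTAs, so treat it as a starting point.

The helper name, host names and documentation-range address are hypothetical.

```python
import re

# Received: clause keywords from RFC 822: from, by, via, with, id, for.
CLAUSE = re.compile(r"\b(from|by|via|with|id|for)\s+(\S+)", re.IGNORECASE)
COMMENT = re.compile(r"\([^()]*\)")  # simple, non-nested (comments) only


def received_clauses(value: str) -> dict[str, str]:
    """Split one Received: value into its clauses; the timestamp follows the last ';'."""
    route, sep, date = value.rpartition(";")
    if not sep:                       # no timestamp present
        route, date = value, ""
    clauses: dict[str, str] = {}
    for keyword, token in CLAUSE.findall(COMMENT.sub(" ", route)):
        clauses.setdefault(keyword.lower(), token)  # keep the first occurrence
    clauses["date"] = date.strip()
    return clauses


# Hypothetical hop as written by a Postfix relay.
hop = ("from mail.example.com (mail.example.com [203.0.113.5]) by mx.example.org "
       "(Postfix) with ESMTP id 4A1B2C for <bob@example.org>; "
       "Tue, 11 Apr 2017 14:12:51 +0000")
print(received_clauses(hop))
```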
c. Date and Time Specification
The headers also carry date and time zone information, which is one of the key pieces of information in an investigation.
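Comparing the sender-supplied "Date:" with a relay's "Received:" timestamp shows two things: the sender's claimed UTC offset, and how long the message took to reach that relay. The minimal sketch below uses the two timestamps from the JAFF sample later in this article, with the trailing "(UTC)" comment dropped. The helper name is hypothetical.

"Date:" is set by the sending client and can be forged. A large or negative delay is therefore itself worth a closer look.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def delivery_delay_seconds(date_header: str, received_timestamp: str) -> float:
    """Seconds between the sender's Date: and a relay's Received: timestamp."""
    sent: datetime = parsedate_to_datetime(date_header)
    received: datetime = parsedate_to_datetime(received_timestamp)
    return (received - sent).total_seconds()


# Values from the JAFF malspam sample in this article (the "(UTC)" comment dropped).
date_header = "Tue, 11 Apr 2017 07:12:24 -0700"
received_ts = "Tue, 11 Apr 2017 14:12:51 +0000"

print(parsedate_to_datetime(date_header).strftime("%z"))  # -0700
print(delivery_delay_seconds(date_header, received_ts))   # 27.0
```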
d. User-Agent (or X-Mailer): This field specifies the client software or program used by the sender to send the mail.
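Note: Email headers should always be read from bottom to top. Each relay adds its "Received:" field above the existing ones, so the oldest hop appears at the bottom.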
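Reading from the bottom up is easy to automate. The sketch below returns the "Received:" values in the order the relays handled the message. It relies on the standard library's `get_all`, which returns the fields top to bottom, and reverses that order. The helper name and the two-relay sample are hypothetical.

```python
from email import message_from_string
from email.policy import default


def hops_oldest_first(raw: str) -> list[str]:
    """Return Received: values in the order the relays handled the message.

    Each relay prepends its Received: field, so the bottom-most one is the oldest.
    """
    msg = message_from_string(raw, policy=default)
    received = msg.get_all("Received") or []
    # Collapse folding whitespace so each hop prints on one line.
    return [" ".join(str(value).split()) for value in reversed(received)]


# Hypothetical two-relay message, written top to bottom as it arrives.
RAW = (
    "Received: from mx2.example.org by mx3.example.org; Tue, 11 Apr 2017 14:12:53 +0000\n"
    "Received: from mail.example.com by mx2.example.org; Tue, 11 Apr 2017 14:12:51 +0000\n"
    "From: alice@example.com\n"
    "\n"
    "Hi\n"
)
for number, hop in enumerate(hops_oldest_first(RAW), start=1):
    print(number, hop)
```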
Overview of Email Inbound and Outbound
An email program like MS Outlook is a client application that needs to interact with a mail server. Typically, there are two servers: one for incoming and one for outgoing email. The client receives email through one of the three protocols below:
- Post Office Protocol (POP)
- Internet Message Access Protocol (IMAP)
- Messaging Application Programming Interface (MAPI), Microsoft's proprietary interface
All incoming mail is stored on a mail server and distributed into the appropriate mailbox. POP users download all their mail, after which they can store or delete it. So, in the case of POP, incoming emails end up stored on the user's workstation.
On the other hand, IMAP and MAPI users have the option of leaving their email on the server, though they can make copies on their own workstation.
All outgoing mail uses the Simple Mail Transfer Protocol (SMTP), whose objective is to transfer mail reliably and efficiently. SMTP is the standard protocol for transporting mail between servers across networks, a process usually referred to as SMTP mail relaying.
Sample Email Header and Fields of Interest
Below are the email headers from one of the malspam campaigns found distributing JAFF ransomware. The headers shown are the ones of most interest for hunting.
Received: from breakawaydistributing.com ()
Tue, 11 Apr 2017 14:12:51 +0000 (UTC)
Date: Tue, 11 Apr 2017 07:12:24 -0700
Reply-To: “USPS International” <email@example.com>
From: “USPS Ground” <firstname.lastname@example.org>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:188.8.131.52) Gecko/20080421 Thunderbird/184.108.40.206
Subject: Our USPS courier can not contact you parcel # 754277860
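As an offline illustration, the sample above can be fed to Python's standard-library parser to pull out the fields of interest. In the sketch, the display names are typed with plain ASCII quotes, as they appear in a raw header. The User-Agent line is omitted, and the addresses are the placeholders shown in the article.

```python
from email import message_from_string
from email.policy import default

# The sample headers above; the User-Agent line is omitted and the addresses are
# the placeholders shown in the article.
SAMPLE = """\
Received: from breakawaydistributing.com ()
 Tue, 11 Apr 2017 14:12:51 +0000 (UTC)
Date: Tue, 11 Apr 2017 07:12:24 -0700
Reply-To: "USPS International" <email@example.com>
From: "USPS Ground" <firstname.lastname@example.org>
Subject: Our USPS courier can not contact you parcel # 754277860

"""

msg = message_from_string(SAMPLE, policy=default)
sender = msg["From"].addresses[0]
reply_to = msg["Reply-To"].addresses[0]

print("Date (UTC offset):", msg["Date"].datetime.strftime("%z"))  # -0700
print("From:", sender.display_name, sender.domain)
print("Reply-To:", reply_to.display_name, reply_to.domain)
print("Replies diverted:", sender.addr_spec != reply_to.addr_spec)
print("Subject:", msg["Subject"])
```

Note how the display names differ ("USPS Ground" versus "USPS International") and how replies are steered to a different domain than the one in "From:".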
Email headers can be parsed online with the help of various web-based tools.
However, in some cases, you might be restricted from using these online tools because of the confidentiality of the mail or organizational policies.
I've made a simple tool, emailHeaderParser, which can be used offline; it is listed in the references.
Email Abuse Overview
Email remains the preferred threat vector for most threat actors to deliver malicious payloads to their victims. As per statistics from Securelist [https://securelist.com/statistics/], malspam has contributed more than 66% globally. This again shows that phishing remains a highly effective attack vector for luring victims.
Below are the typical types of email abuse that we come across in the cyber realm.
- MalSPAM delivering malicious payloads
  - As attachments [Zip archives, MS Office documents, PDFs etc.]
  - As a malicious URL in the body which downloads a payload
- SPAM, often combined with social engineering techniques and with display names spoofed to pretend to be from legitimate brands
- Business Email Compromise (BEC) [CEO attacks, spear-phishing] targeting individuals
- Emails targeting individuals or entities with the intent of threatening them
All of the above forms of attack try to trick their victims into opening the attachment, clicking the URL or acting on the mail, which can be devastating at a later stage.
From an investigation perspective, the email headers discussed in the earlier sections all help to trace a mail back to its origin. They also help to respond immediately with appropriate measures, such as blocking the source. The richer the messaging media, the more opportunity malicious actors have to camouflage malicious content within it.
However, despite all the sophistication on the malicious actor's end, header analysis gives defenders a great deal of power. It helps them understand what the embedded data actually is, where it is coming from, whether the source has been spoofed, and other pertinent details.
We shall explore how to start hunting using these email headers in the next section.
Various ways of Hunting
Tracing back the source
The FROM header identifies the sender of the mail. However, it can be spoofed, so most of the time it is not vital data on its own. In widespread campaigns, though, the same sender might appear in all the mails. To get past SPAM filters, attackers have come up with a newer technique called a Hailstorm attack, in which every sender address is unique.
The FROM address can be searched across the Internet with the help of Google Dorks [https://www.exploit-db.com/google-hacking-database/]. This shows whether the address has any history and whether anyone else has already observed it.
From: “USPS Ground” <email@example.com>
The RECEIVED header is another vital piece of information that shows the hops the mail has traversed. These hops are mail relays and servers. The IP addresses captured in this header can reveal the sender's infrastructure and location, which helps in attribution. Email blacklists also check these IP addresses to identify malicious senders.
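The sketch below covers the extraction step of this workflow. It pulls IPv4 addresses out of "Received:" values and drops internal ranges. It then builds the name to query at a DNS-based blocklist (DNSBL), where the octets are reversed and the list's zone is appended. Some details are assumptions:

- Spamhaus ZEN (zen.spamhaus.org) is used only as an example zone, and its usage terms apply.
- The actual DNS lookup is left out so the sketch stays offline. An A-record answer in 127.0.0.0/8 typically indicates a listing; check the list's documentation for the meaning of each return code.
- Only IPv4 is handled.
- The host names and documentation-range addresses are hypothetical.

```python
import ipaddress
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Internal ranges (RFC 1918 plus loopback) that usually belong to our own relays.
INTERNAL = [ipaddress.ip_network(n) for n in
            ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8")]


def external_ips(received_values: list[str]) -> list[str]:
    """Pull IPv4 literals from Received: values, skipping internal ranges and duplicates."""
    found: list[str] = []
    for value in received_values:
        for candidate in IPV4.findall(value):
            try:
                ip = ipaddress.ip_address(candidate)
            except ValueError:  # e.g. 999.1.1.1 matches the regex but is not an IP
                continue
            if not any(ip in net for net in INTERNAL) and candidate not in found:
                found.append(candidate)
    return found


def dnsbl_query_name(ip: str, zone: str) -> str:
    """DNSBL convention: reverse the IPv4 octets and append the blocklist zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


# Hypothetical Received: values (oldest hop last, as in a real header).
hops = [
    "from mail.example.com (mail.example.com [203.0.113.5]) by mx.example.org; "
    "Tue, 11 Apr 2017 14:12:51 +0000",
    "from build01 (10.0.0.7) by mail.example.com; Tue, 11 Apr 2017 14:12:50 +0000",
]
for ip in external_ips(hops):
    print(ip, dnsbl_query_name(ip, "zen.spamhaus.org"))
    # 203.0.113.5 5.113.0.203.zen.spamhaus.org
```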
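The REPLY-TO field is normally filled in with the email address to be used for replying to the message. A Reply-To address that differs from the From address, especially one on a different domain, is another sign that the email may be malicious.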
The MESSAGE-ID field provides a nice clue about the actual origin of the mail. Message identifiers are supposed to be unique, and a common technique is to use the date and time of message generation as the first part of the message ID. Together with the Date field and its time zone offset, this can help estimate the region from which the email originated. Lastly, the domain in the message ID helps identify the actual domain associated with the email's originator.
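The sketch below implements the domain check by comparing the domain on the right-hand side of the "Message-ID:" with the "From:" domain. This is a heuristic of our own. Legitimate bulk-mail and cloud providers routinely stamp their own domain into Message-IDs, so a mismatch is a pivot point for hunting rather than proof of spoofing. The helper names and the sample message are hypothetical. The parser's default compat32 policy is used, so header values come back as plain strings.

```python
from email import message_from_string  # default compat32 policy: plain-string headers
from email.utils import parseaddr
from typing import Optional


def message_id_domain(raw: str) -> Optional[str]:
    """Return the lower-cased right-hand side of the Message-ID, if there is one."""
    value = (message_from_string(raw).get("Message-ID") or "").strip().strip("<>")
    if "@" not in value:
        return None
    return value.rpartition("@")[2].lower()


def message_id_mismatch(raw: str) -> bool:
    """True when the Message-ID domain differs from the From: address domain."""
    mid_domain = message_id_domain(raw)
    _, from_addr = parseaddr(message_from_string(raw).get("From", ""))
    from_domain = from_addr.rpartition("@")[2].lower()
    return mid_domain is not None and mid_domain != from_domain


# Hypothetical mail claiming to come from a courier but generated by a bulk mailer.
raw = (
    'From: "USPS Ground" <track@usps.example>\n'
    "Message-ID: <1491919944.5@bulk-mailer.example.net>\n"
    "\n"
    "Hi\n"
)
print(message_id_domain(raw), message_id_mismatch(raw))  # bulk-mailer.example.net True
```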
Leveraging Threat Intelligence
There are numerous Threat Intelligence vendors offering premium services that maintain an inventory of these malicious actors. There are also a few open-source Threat Intel portals, such as threatminer.org, that carry information about malicious email actors. So, it is always a good idea to check the captured email headers against Threat Intelligence IOCs to understand whether the email was part of a targeted attack or of general SPAM. Email fraud continues to rise by more than 100% year on year. New ways of attribution, especially leveraging Email Threat Intelligence providers, are therefore highly recommended in addition to the other security defenses in place.
For example, suppose we have IOCs for a campaign in the form of email IDs, source addresses etc. Running these against the email headers in an automated way shows whether the organization's infrastructure has also been impacted by the same threat actor or campaign. However, in "snowshoe" campaigns, spammers use many different source IP addresses to dilute reputation metrics and evade filters. Threat Intelligence can be of great help here again.
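Snowshoe campaigns leave a statistical footprint that can be hunted even without an IOC: the same campaign marker shows up from many unrelated source networks. The sketch below counts distinct /24 networks per campaign key. The /24 granularity, the threshold and the choice of key (here an embedded URL domain) are assumptions to tune. The helper names and records are hypothetical, and all addresses are from documentation ranges.

```python
import ipaddress
from collections import defaultdict


def spread_by_network(records, prefix: int = 24) -> dict[str, int]:
    """Count the distinct source networks seen for each campaign key.

    records: iterable of (source_ip, campaign_key) pairs; the key is whatever ties
    mails to one campaign, e.g. an embedded URL domain or a normalized subject.
    """
    networks: dict[str, set] = defaultdict(set)
    for ip, key in records:
        networks[key].add(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))
    return {key: len(nets) for key, nets in networks.items()}


def snowshoe_candidates(records, min_networks: int = 5, prefix: int = 24) -> list[str]:
    """Campaign keys sent from at least `min_networks` different networks."""
    spread = spread_by_network(records, prefix)
    return sorted(key for key, count in spread.items() if count >= min_networks)


# Hypothetical records: one URL domain pushed from three /24s, another from one /24.
records = [
    ("192.0.2.10", "bad-link.example"),
    ("198.51.100.10", "bad-link.example"),
    ("203.0.113.10", "bad-link.example"),
    ("203.0.113.1", "newsletter.example"),
    ("203.0.113.2", "newsletter.example"),
]
print(snowshoe_candidates(records, min_networks=3))  # ['bad-link.example']
```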
Below is a list of possible IOCs to look up in the collected email header data:
- FROM email addresses
- Originating IP Addresses
- Attachment Names
- Embedded URLs
- Subject Line
- Display Name
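Matching these IOCs against parsed headers is straightforward to automate. The sketch below checks one raw message against an IOC dictionary, covering the sender address, display name, subject, "Received:" IPs and attachment names. Embedded URLs are left out for brevity. The dictionary layout and its contents are hypothetical; in practice they would be fed from a Threat Intel provider or portal.

```python
import re
from email import message_from_string
from email.policy import default

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Hypothetical IOC feed, all values lower-cased.
IOCS = {
    "from": {"firstname.lastname@example.org"},
    "display_name": {"usps ground"},
    "subject": {"courier can not contact you"},
    "ip": {"203.0.113.5"},
    "attachment": {"parcel_details.zip"},
}


def ioc_hits(raw: str, iocs: dict) -> list[str]:
    """Return 'type:value' strings for every IOC found in the message."""
    msg = message_from_string(raw, policy=default)
    hits = []
    addresses = msg["From"].addresses if msg["From"] is not None else ()
    for addr in addresses:
        if addr.addr_spec.lower() in iocs.get("from", set()):
            hits.append(f"from:{addr.addr_spec}")
        if addr.display_name.lower() in iocs.get("display_name", set()):
            hits.append(f"display_name:{addr.display_name}")
    subject = str(msg["Subject"] or "").lower()
    hits += [f"subject:{s}" for s in iocs.get("subject", set()) if s in subject]
    for value in msg.get_all("Received") or []:
        hits += [f"ip:{ip}" for ip in IPV4.findall(str(value)) if ip in iocs.get("ip", set())]
    for part in msg.iter_attachments():  # yields nothing for non-multipart mail
        name = part.get_filename()
        if name and name.lower() in iocs.get("attachment", set()):
            hits.append(f"attachment:{name}")
    return hits


raw = (
    "Received: from breakawaydistributing.com ([203.0.113.5]) by mx.example.org; "
    "Tue, 11 Apr 2017 14:12:51 +0000\n"
    'From: "USPS Ground" <firstname.lastname@example.org>\n'
    "Subject: Our USPS courier can not contact you parcel # 754277860\n"
    "\n"
    "Hello\n"
)
print(ioc_hits(raw, IOCS))
```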
The most frequently spoofed part of the From header is the display name, for which there is currently no authentication mechanism.
These indicators will not necessarily lead to a detection on their own. They are, however, of great help in building statistics on a big data platform and identifying patterns.
Bulk Analysis through Analytics platforms
Attackers often use subject fields containing personalized content, such as a shipping order combined with a randomly generated number. Every spam mail targeting the organization then looks different and evades the filters.
Improper capitalization in domain names and subjects can be another sign of malicious mail.
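Both tricks can be hunted in bulk. The sketch below does two things:

- It collapses digit runs in subjects, so mails sharing a template cluster together despite their random numbers.
- It flags sender domains written with capital letters. Domains are case-insensitive and normally lower case. In the example, a capital "I" stands in for an "l".

The helper names, sample subjects and heuristics are illustrative assumptions.

```python
import re
from collections import Counter


def subject_template(subject: str) -> str:
    """Collapse digit runs so per-message random numbers do not hide a shared template."""
    return re.sub(r"\d+", "<N>", " ".join(subject.split()))


def has_uppercase_domain(address: str) -> bool:
    """Domains are case-insensitive and normally lower case, so capitals are unusual."""
    domain = address.rpartition("@")[2]
    return domain != domain.lower()


# Hypothetical subjects collected from a mail gateway.
subjects = [
    "Our USPS courier can not contact you parcel # 754277860",
    "Our USPS courier can not contact you parcel # 120394857",
    "Quarterly report",
]
print(Counter(subject_template(s) for s in subjects).most_common(1))
# [('Our USPS courier can not contact you parcel # <N>', 2)]
print(has_uppercase_domain("support@PaypaI.com"))  # True
```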
Below are a few indicators that can be automated and run against the large volume of header data collected:
- Misspelled domain names
- Misspelled sender’s name
- Improper capitalization
- Domain names that do not match the supposed seller
- Gibberish in the email address
- Unknown senders
- Other discrepancies
- Multiple recipients
- Unrelated recipients
- Odd groupings of recipients
- Email attachments you are not expecting to receive
- Files which appear to have double extensions (like photo.jpg.exe)
- Subjects which convey a sense of urgency
- Subjects which try to scare us or tempt us with something illicit
- Subject lines which don’t match the content of the message
- Strange wording, poor grammar, misspellings, and odd capitalization
- Emails which appear to be replies to messages we never sent
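Several of these indicators translate directly into code. The sketch below covers three of them:

- Lookalike sender domains, using the standard library's difflib similarity ratio as a simple stand-in for dedicated typosquatting detection.
- Double extensions.
- Urgency wording.

The watch-lists, the 0.8 cutoff and the extension list are hypothetical assumptions to adapt.

```python
import difflib
import re
from typing import Optional

# Hypothetical watch-lists; tune them to the brands and wording relevant to you.
TRUSTED_DOMAINS = ["usps.com", "paypal.com", "microsoft.com"]
URGENT_PHRASES = ("urgent", "immediately", "suspended", "final notice", "verify your")
DOUBLE_EXT = re.compile(r"\.[a-z0-9]{2,4}\.(exe|scr|js|vbs|bat|cmd|jar)$", re.IGNORECASE)


def lookalike_of(domain: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the trusted domain this one closely resembles without matching it, if any."""
    domain = domain.lower()
    if domain in TRUSTED_DOMAINS:
        return None
    match = difflib.get_close_matches(domain, TRUSTED_DOMAINS, n=1, cutoff=cutoff)
    return match[0] if match else None


def has_double_extension(filename: str) -> bool:
    """Spot names like photo.jpg.exe that hide an executable behind a harmless one."""
    return bool(DOUBLE_EXT.search(filename))


def sounds_urgent(subject: str) -> bool:
    """Very rough urgency check based on a fixed phrase list."""
    text = subject.lower()
    return any(phrase in text for phrase in URGENT_PHRASES)


print(lookalike_of("paypa1.com"))                  # paypal.com
print(has_double_extension("photo.jpg.exe"))       # True
print(sounds_urgent("URGENT: account suspended"))  # True
```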
Also, conducting a behavioral analysis on the data collected with the above parameters can help find the needle in a haystack. Different data analysis packages available in Python and R can help find interesting patterns. Commercial security analytics solutions can also help with advanced techniques like linked-data analysis. One such blog that I found of interest is below:
At an organizational level, performing analytics might be time-consuming and expensive. However, the value of the patterns it uncovers far outweighs the damage that might otherwise arise.
Humans are fallible, and someone will eventually open a malicious email. Knowing what to do afterwards is as important as knowing how to avoid danger in the first place. This article has given some basics on email headers and pointed out tricks to tell a bad email from a good one.
As a closing note, below are various ways of combatting malicious email:
- Do not assume that any message you receive is legitimate; treat it with suspicion
- Look at messages for content, misspellings and other anomalies
- Do not click on any embedded links
- Do not open any attachments
- Keep your antivirus software up to date
References
- Offline Header Parsing Tool – https://github.com/krlplm/parseemailheader