In the fast-moving world of cyber security incident response, the challenge is to stay ahead of the threat. Incident responders must move faster, be more agile and have more stamina than the attacker, and respond faster than malware can morph and conceal itself.
In small networks (1-100 nodes), this is not a particularly oppressive challenge, even with the old methodologies, tools and procedures. In midsize to large enterprises, however, the old tools and methods will leave you chasing your tail: the malware morphs or spreads faster than you can find it, isolate the breach and remediate the network.
A successful breach response typically follows these technical phases:
- Phase I - Find the Breach
  - Identify the malware and other advanced persistent threats (APT)
    - Determine origin of the code
    - Determine the variants of the exploit
    - Determine the characteristics of exfiltration (if hard coded)
  - Identify all the systems, services and devices that have been compromised
  - Determine what or who is the target of the attack and what the objective of the attack was (e.g. PCI data, personal data)
  - Determine the origination of the attack and the attack vector into your network
  - Determine if it originated from outside your network or inside and respond accordingly
  - Collect information about the command and control nodes that were used in the attack (IP addresses, domain names, etc.)
  - Ensure that no APT is dormant on the system
  - Prepare a forensic package to assist law enforcement in the prosecution of the case
- Phase II - Remediate/isolate the contaminated systems
- Phase IV - Re-secure the network
Needless to say, speed is essential in every phase, but especially in Phases I and II. The longer it takes to stop the bleeding, the greater the exposure to fines, legal liability and damage to the corporate image. In practical terms, the need to handle a breach as quickly as possible places exacting demands on the capabilities of the tool used to perform the incident response (IR).
First and foremost, the tool must be forensically sound. In nearly every case, you will detect the breach and your team will begin responding to it long before law enforcement agencies become involved.
On the IR battlefield, your incident responders will make decisions based on the reliability of the data that you collect, and that requires a forensic grade of exactness. If law enforcement gets involved, your computer incident response team (CIRT) will need to provide them with forensically sound data to enable the successful prosecution of the case.
Before everyone jumps on the single hard drive examination model bandwagon in which every drive in the enterprise is imaged and then examined with a stand-alone forensic tool, be aware that this model doesn’t scale to the size and complexity of the modern malware battlefield, nor is it required for a successful prosecution of the case.
For large enterprises, this model is far too slow and expensive. Instead, your tool must be able to generate digital fingerprints, in the form of Message Digest 5 (MD5) and Secure Hash Algorithm (SHA) hashes, for each piece of evidence it collects. It must also be able to store that evidence in an investigative container, with procedures and controls that demonstrate to a legal standard that the container’s integrity and originality have been preserved.
Any file or other OS artifact acquired and maintained as part of the investigation must be collected without changing its file system artifacts or metadata. These include the file’s created, modified and last-accessed dates and, depending on the file system, its deleted date. The same requirement applies to volatile memory and to individual processes that are imaged. In short, the acquisition process itself must not alter the artifacts under investigation.
Preserving the metadata and generating a hash value for each piece of evidence allow your team to testify that the evidence is original and has been preserved unaltered. To withstand legal scrutiny, your tool should also log every examiner action, and every action should be reproducible. Without these basic elements, any attempt to prosecute those responsible for the breach will likely fail.
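As a rough illustration of the fingerprinting requirement, the sketch below computes MD5 and SHA-256 digests of an evidence file in a single streaming pass, so multi-gigabyte images never have to fit in memory. The choice of SHA-256 as the SHA variant and the `fingerprint` function name are assumptions for illustration, not any particular product’s behavior.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Return MD5 and SHA-256 digests of a file, read in fixed-size chunks.

    Both hashes are updated from the same chunk, so the evidence is read once.
    """
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as fh:  # binary, read-only access
        while chunk := fh.read(chunk_size):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}


if __name__ == "__main__":
    import sys

    # Usage: python fingerprint.py evidence1.img evidence2.img ...
    for name in sys.argv[1:]:
        digests = fingerprint(Path(name))
        print(name, digests["md5"], digests["sha256"])
```

Recording both digests alongside each item in the investigative container lets anyone later re-hash the item and confirm it is bit-for-bit unchanged.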
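One way to check that an acquisition step left metadata untouched is to snapshot the file’s `stat` fields before and after and compare them. This is a minimal sketch under the assumption that the file is reachable through the ordinary file system API; the `snapshot` and `changed_fields` helpers are hypothetical names.

```python
import os
from pathlib import Path

# Metadata fields to compare. st_ctime_ns is the metadata-change time on POSIX
# systems; st_birthtime (creation time) exists only on some platforms.
FIELDS = ("st_size", "st_mtime_ns", "st_atime_ns", "st_ctime_ns", "st_birthtime")


def snapshot(path: Path) -> dict:
    """Capture file metadata via stat(), without opening the file's contents."""
    st = os.stat(path)
    return {f: getattr(st, f) for f in FIELDS if hasattr(st, f)}


def changed_fields(before: dict, after: dict) -> list[str]:
    """Names of metadata fields whose values differ between two snapshots."""
    return sorted(k for k in before.keys() | after.keys() if before.get(k) != after.get(k))
```

Typical use is `before = snapshot(p)`, then the acquisition, then `assert not changed_fields(before, snapshot(p))`. Note that on Linux an ordinary read can update the last-accessed time depending on mount options (for example `relatime`), which is one reason forensic tools read evidence through the raw volume rather than through the file system.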
Second, the CIRT tool must be truly enterprise-capable. Given the speed with which an investigation must be completed and the size of modern enterprises, you should be able to search across the entire enterprise simultaneously without inflicting a denial-of-service attack on your own network.
Surprisingly, the forensic tools with the largest share of the enterprise market cannot perform this essential task. The reason lies in how these tools evolved: they began as stand-alone computer forensic examination programs designed for “dead drive” forensics on single computers.
As the need for remote acquisition and analysis arose, their vendors simply added an agent that lets the examiner access a remote computer, without changing the investigative dynamics of the programs themselves.
Under this model, all data must be transported from the remote node back to the investigative computer for analysis. If an examiner needs to run a grep search across both the allocated and unallocated space of a hard drive (a normal step in a complete forensic investigation), the contents of the entire drive of every computer in question must cross the network.
To keep customers from fully understanding this limitation (and from performing a self-induced DoS attack on their own networks), these vendors often license their programs under a “concurrent connection” model, which restricts the examiner to between 1 and 10 concurrent connections and therefore to searching no more than 10 computers at a time. With 3-terabyte desktop drives now common and even modest networks containing 1,000+ computers, it doesn’t take a rocket scientist to see that the concurrent connection model will not scale.
The irony of this licensing model is that customers believe they are facing a limitation of their software license, when in fact the tool simply cannot handle the task. To truly support an incident response investigation, your tools must be able to search all of the nodes on your network in parallel. Only by searching simultaneously can your CIRT get ahead of the breach and any polymorphic activity that is occurring.
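The scaling argument can be made concrete with back-of-the-envelope arithmetic. Every figure below (1,000 nodes, 3 TB drives, a 1 Gbit/s link into the examiner workstation, 150 MB/s local disk reads) is an illustrative assumption, and protocol overhead is ignored.

```python
# Illustrative assumptions only, not measurements.
NODES = 1_000                  # computers in scope
DRIVE_BYTES = 3 * 10**12       # 3 TB per drive (decimal terabytes)
EXAMINER_LINK_BPS = 10**9      # 1 Gbit/s into the examiner workstation
LOCAL_READ_BPS = 150 * 10**6   # 150 MB/s sequential read on each node


def centralized_seconds() -> float:
    """Every byte crosses the examiner's link, so that link is the bottleneck,
    regardless of how many concurrent connections the license allows."""
    return NODES * DRIVE_BYTES * 8 / EXAMINER_LINK_BPS


def agent_side_seconds() -> float:
    """Each agent scans its own disk in parallel and returns only the hits,
    so wall-clock time is roughly one full local read of a single drive."""
    return DRIVE_BYTES / LOCAL_READ_BPS


print(f"centralized: {centralized_seconds() / 86_400:.1f} days")
print(f"agent-side:  {agent_side_seconds() / 3_600:.1f} hours")
```

Under these assumptions the script prints about 277.8 days for the pull-everything model versus about 5.6 hours when each node searches its own disk. Raising the connection limit does not help the first figure, because the examiner’s link, not the license, is the constraint.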
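A minimal sketch of the parallel alternative: the same search is fanned out to every node at once, each node searches its own data, and only hit locations travel back. Here the “agents” are simulated with in-memory buffers and a thread pool standing in for network calls; `NODES`, `agent_search` and `enterprise_search` are hypothetical names, not a real product’s API.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for per-node agents: in a real deployment each agent
# scans its own disk; here each "node" is just an in-memory byte buffer.
NODES = {
    "ws-001": b"...normal data...",
    "ws-002": b"beacon to evil.example.net every 60s",
    "srv-010": b"config: backup.example.com",
}


def agent_search(node: str, data: bytes, pattern: re.Pattern) -> tuple[str, list[int]]:
    """Runs on the node: search locally and return only the hit offsets."""
    return node, [m.start() for m in pattern.finditer(data)]


def enterprise_search(pattern: bytes, max_workers: int = 64) -> dict[str, list[int]]:
    """Send the same search to every node at once and collect nodes with hits."""
    compiled = re.compile(pattern)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(agent_search, n, d, compiled) for n, d in NODES.items()]
        return {node: hits for node, hits in (f.result() for f in futures) if hits}


print(enterprise_search(rb"evil\.example\.net"))
```

The design point is what crosses the network: a few integers per matching node rather than the contents of every drive.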
Third, memory analysis shouldn’t be a bolted-on “we do it too” feature; it should be part of the core functionality of your CIRT tool, and the tool must handle remote node memory extremely well. Let’s face it: the ability to identify processes, the executables that spawned them, the other processes that hook into a given process, and process dependencies is critical to finding and isolating a breach. Malware must live in this memory space in order to function.
Consequently, your tool should be able to operate in this space and to remotely acquire individual processes and/or the entire memory of remote nodes. Unfortunately, a significant number of tools acquire only what is physically in RAM at that moment, ignoring (or unable to access) the data that the system’s virtual memory management has paged out to pagefile.sys or written to hiberfil.sys.
When you perform a memory acquisition, an image that is the same size as your installed RAM is a good indicator that you are capturing only the data loaded into RAM, not the data held in virtual memory. Similarly, in an enterprise environment you want the option to pull individual processes directly instead of pulling the entire contents of RAM across the network.
If your tool can’t do both of these things, it is probably time to reevaluate your tools. Given how often your examiners will probe and image RAM during a large breach, it shouldn’t be a 12-step process: your tool should be able to remotely image the entire memory, or a subset of processes, with a simple right-click of the examiner’s mouse. With the right tool, this is not a complex task.
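To illustrate the “which executable spawned this process” question, the sketch below walks parent process IDs through a process snapshot. The snapshot format (`Proc` with `pid`, `ppid`, `name`, `exe`) and the sample data are hypothetical, and this covers only the parent/child relationship, not hooks or dependencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    pid: int
    ppid: int
    name: str
    exe: str  # path of the image the process was started from


# Hypothetical snapshot, as an agent might report it from a remote node's memory.
SNAPSHOT = [
    Proc(4, 0, "System", ""),
    Proc(700, 4, "services.exe", r"C:\Windows\System32\services.exe"),
    Proc(1200, 700, "svchost.exe", r"C:\Windows\System32\svchost.exe"),
    Proc(2300, 3100, "outlook.exe", r"C:\Program Files\Microsoft Office\OUTLOOK.EXE"),
    Proc(4100, 2300, "svchost.exe", r"C:\Users\Public\svchost.exe"),
]


def ancestry(pid: int, procs: list[Proc]) -> list[Proc]:
    """Walk parent PIDs upward until a parent is missing (e.g. it has exited)
    or a PID repeats (guarding against cycles caused by PID reuse)."""
    by_pid = {p.pid: p for p in procs}
    chain, seen = [], set()
    while pid in by_pid and pid not in seen:
        seen.add(pid)
        chain.append(by_pid[pid])
        pid = by_pid[pid].ppid
    return chain


for p in ancestry(4100, SNAPSHOT):
    print(p.pid, p.name, p.exe)
```

In this made-up data, the second `svchost.exe` stands out: its parent is `outlook.exe` and it runs from outside System32, whereas legitimate `svchost.exe` instances are normally started by `services.exe`.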
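The size check described above can be written as a one-line heuristic. This is a sketch only: the 2% tolerance is an assumption meant to absorb reserved and device-mapped address ranges that make physical memory images differ slightly from installed RAM, and `looks_ram_only` is a hypothetical name.

```python
def looks_ram_only(image_bytes: int, installed_ram_bytes: int, tolerance: float = 0.02) -> bool:
    """Heuristic from the text: a total acquisition size within `tolerance` of
    installed RAM suggests physical RAM only, with no pagefile.sys or
    hiberfil.sys content. The default tolerance is an assumption; tune it."""
    if installed_ram_bytes <= 0:
        raise ValueError("installed RAM must be positive")
    return abs(image_bytes - installed_ram_bytes) <= tolerance * installed_ram_bytes


GiB = 1024**3
print(looks_ram_only(16 * GiB, 16 * GiB))  # RAM-sized acquisition
print(looks_ram_only(22 * GiB, 16 * GiB))  # about 6 GiB beyond installed RAM
```

A `True` result is a prompt to check whether the tool collected virtual memory at all, not proof that it did not.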
Fourth, your tool should support live analysis methodologies. Given the pace of today’s business and the growing number of malware incidents, this should be a no-brainer. You shouldn’t have to bring down a network, or even just its critical nodes, to conduct an investigation.
Unfortunately, many of the leading tools don’t support live analysis of critical enterprise components. This forces the CIRT either to bring the resource down to create a forensic image for offline analysis or, at a minimum, to export subsets of data for offline analysis. A prime example is an organization’s Microsoft Exchange mail server. One of the leading causes of malware infestations is a user clicking on an email attachment; a recent study conducted by TNS Global determined that more than 30% of all users open suspicious emails. Given how common this is, your CIRT tool must be able to analyze the Exchange server and the contents of individual mailboxes without taking them offline and halting office productivity.
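At the level of a single message, mailbox analysis can be as simple as flagging attachments that look executable. The sketch below uses Python’s standard `email` package on a message in RFC 5322 (.eml) form; it does not show how a tool reaches a live Exchange mailbox, and the list of risky extensions is an illustrative assumption.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default

# Extensions treated as high risk here are an illustrative assumption.
RISKY_EXTENSIONS = (".exe", ".scr", ".js", ".vbs", ".hta", ".lnk")


def risky_attachments(raw_message: bytes) -> list[str]:
    """Return filenames of attachments that look executable, judged by
    extension or by the 'MZ' header that begins Windows PE executables."""
    msg = message_from_bytes(raw_message, policy=default)
    flagged = []
    for part in msg.iter_attachments():
        name = part.get_filename() or "<unnamed>"
        payload = part.get_payload(decode=True) or b""
        if name.lower().endswith(RISKY_EXTENSIONS) or payload.startswith(b"MZ"):
            flagged.append(name)
    return flagged


# Demo: an executable disguised with a .pdf name, plus a genuine PDF.
demo = EmailMessage()
demo["Subject"] = "Invoice"
demo.set_content("See attached.")
demo.add_attachment(b"MZ\x90\x00", maintype="application", subtype="octet-stream", filename="invoice.pdf")
demo.add_attachment(b"%PDF-1.7", maintype="application", subtype="pdf", filename="terms.pdf")
print(risky_attachments(demo.as_bytes()))  # ['invoice.pdf']
```

Checking the content header as well as the extension catches the common trick of renaming an executable to look like a document.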
Fifth, your CIRT tool’s agent needs to reside low enough in the remote node’s operating system to ‘see’ rootkits. Most IR tools fall short here because they rely on the remote node’s operating system to report what is present, in the form of process and file listings. If a rootkit has blinded the OS, you cannot find it this way. To be effective, the IR tool should operate both in the realm readable by the OS and user and at the level of the physical hard drive. At the physical level, the examiner can see everything the OS is trying to mask, including rootkits and shielded processes. This is essential to effective IR.
Sixth, your CIRT tool should be able to mount remote nodes as local physical disks on your examination machine. There will be occasions when you need to run a specific program against an identified remote node. The top-tier tools let you select the remote node and the media of interest, and then mount the device; once it is mounted, the best-of-breed tools create a volume for it at the physical layer on your examination box. You can then run programs on your local examination box against the mounted drive as if it were a physically attached local hard disk, which is very useful in specialized situations. Unmounting the remote node should be just as easy, leaving no residual entries and causing no system hangs in either the host or the remote OS.
Seventh, your CIRT tool should enable the team to conduct the complete investigation remotely, without physical access to the remote nodes. Simple as this sounds, how many teams are still plugging USB drives or inserting DVDs into remote nodes and imaging to them? Part of the impetus behind these practices is an investigative mindset that still focuses on the traditional (but outdated) “one disk, one case, one examiner” dynamic taught by basic forensic education programs. Another is the lack of tools that actually deliver on the claims made in their marketing slicks. A tool that allows full remote investigation and remediation of a cyber threat will support the following capabilities:
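The comparison between what the OS reports and what is physically on disk is often called a cross-view diff. The sketch below assumes two file listings are already available: one from the OS API and one from parsing the disk directly (for example, the NTFS Master File Table). How the raw listing is produced is not shown, and the sample paths are hypothetical.

```python
from pathlib import PureWindowsPath


def normalize(path: str) -> str:
    """NTFS lookups are case-insensitive, so compare lowercased Windows paths."""
    return str(PureWindowsPath(path)).lower()


def cross_view_diff(os_listing: list[str], raw_listing: list[str]) -> dict[str, set[str]]:
    os_view = {normalize(p) for p in os_listing}
    raw_view = {normalize(p) for p in raw_listing}
    return {
        # Present on disk but hidden from the OS API: prime rootkit candidates.
        "hidden_from_os": raw_view - os_view,
        # Reported by the OS but absent from the raw parse: usually timing
        # noise, such as files created between the two listings.
        "missing_on_disk": os_view - raw_view,
    }


os_listing = [r"C:\Windows\System32\drivers\ntfs.sys"]
raw_listing = ["C:/Windows/System32/drivers/NTFS.sys", r"C:\Windows\System32\drivers\xsvc32.sys"]
print(cross_view_diff(os_listing, raw_listing))
```

Anything in `hidden_from_os` is exactly the kind of item an OS-dependent tool can never report.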
- Ability to image all the memory or selected memory segments of a node, both physical and virtual, across the network to either the local examiner’s node or another location designated by the examiner.
- Ability to support automated CIRT workflows
- Ability to drop down into an integrated command shell or GUI that allows the examiner to remove rogue processes and perform other administrative functions on the remote node
- Ability to support all aspects of the forensic investigation
- Recovery of deleted data on the remote node
- Built-in viewers for the most common file formats found on the remote nodes
- Ability to take remote screen captures. While often overlooked, a screenshot of what is happening on a remote node can be invaluable, especially in court
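As one example of what recovering deleted data can involve, the sketch below performs naive header/footer carving of JPEG files from raw bytes, such as a dump of unallocated space. Signature carving is one common technique, assumed here for illustration; it is not a description of how any particular tool recovers data.

```python
JPEG_HEADER = b"\xff\xd8\xff"  # start-of-image marker plus the next marker's 0xFF
JPEG_FOOTER = b"\xff\xd9"      # end-of-image marker


def carve_jpegs(raw: bytes, max_size: int = 20 * 1024 * 1024) -> list[bytes]:
    """Naive header/footer carving: start at each JPEG start-of-image marker
    and cut at the first end-of-image marker within max_size bytes.
    Fragmented files and embedded thumbnails (which carry their own
    markers) are not handled."""
    found, pos = [], 0
    while (start := raw.find(JPEG_HEADER, pos)) != -1:
        end = raw.find(JPEG_FOOTER, start + len(JPEG_HEADER), start + max_size)
        if end == -1:
            pos = start + 1  # no footer in range: skip this header
            continue
        found.append(raw[start:end + len(JPEG_FOOTER)])
        pos = end + len(JPEG_FOOTER)
    return found


unallocated = b"\x00" * 10 + b"\xff\xd8\xff\xe0DATA\xff\xd9" + b"junk" + b"\xff\xd8\xffX"
print(len(carve_jpegs(unallocated)))
```

Production carvers validate the internal structure of each candidate; this sketch only shows the basic idea of recovering files by signature when no file system entry points to them.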
Eighth, a CIRT tool must be able to perform extensive logging in three critical areas: examiner actions, remote node network traffic, and remote node process/application activity. As noted earlier in this paper, the CIRT tool must be forensically sound from the foundation up, and a critical component of that foundation is logging every action the examiner takes. This provides an exact record of what was done and serves as the ultimate shield against misguided claims by defense attorneys that the examiner destroyed or planted evidence.
The logs also provide an excellent basis for analyzing findings and for reporting to the client. Logging remote node network traffic is critical for determining exfiltration methods, exfiltration destination addresses and the content of malware payloads. Logging remote node processes and applications simplifies identifying malware, the mode of infection and the attack vectors, making it possible to identify, isolate and remediate breaches in a fraction of the time traditional tools require. As hacker methodologies evolve, attacks on common log aggregators and on individual systems’ logging facilities have become very common. A self-contained logging system that is accessible only through your CIRT tool provides a trusted log base on which to build the investigation.
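One well-known way to make an examiner action log tamper-evident is hash chaining: each entry’s hash covers the previous entry’s hash, so altering any earlier record invalidates every later one. The sketch below assumes that approach; the entry fields and function names are hypothetical and do not reflect any specific product’s log format.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(entry: dict) -> str:
    # Canonical JSON (sorted keys, compact separators) keeps the hash reproducible.
    body = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(body).hexdigest()


def append_action(log: list[dict], examiner: str, action: str) -> dict:
    """Append an examiner action whose hash also covers the previous entry's hash."""
    entry = {
        "seq": len(log),
        "time": datetime.now(timezone.utc).isoformat(),
        "examiner": examiner,
        "action": action,
        "prev": log[-1]["hash"] if log else GENESIS,
    }
    entry["hash"] = _digest(entry)
    log.append(entry)
    return entry


def verify(log: list[dict]) -> bool:
    """Recompute every link. Editing, inserting, deleting or reordering an entry
    breaks the chain; truncating the tail does not, so anchor the latest hash
    somewhere outside the log as well."""
    prev = GENESIS
    for i, entry in enumerate(log):
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["seq"] != i or entry["prev"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Because verification needs only the log itself, an opposing expert can independently confirm that the recorded sequence of examiner actions has not been rewritten.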
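To show how network-traffic logs feed exfiltration analysis, the sketch below totals outbound bytes per internal-host/external-destination pair and flags pairs above a threshold. The flow-record shape, the internal address ranges and the 500 MiB threshold are all assumptions for illustration.

```python
from collections import Counter
from ipaddress import ip_address, ip_network
from typing import NamedTuple

# Address space treated as "inside" is an assumption; substitute your own ranges.
INTERNAL = [ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


class Flow(NamedTuple):
    src: str
    dst: str
    bytes_out: int


def is_internal(addr: str) -> bool:
    ip = ip_address(addr)
    return any(ip in net for net in INTERNAL)


def outbound_by_destination(flows: list[Flow]) -> Counter:
    """Total bytes each internal host sent to each external destination."""
    totals = Counter()
    for f in flows:
        if is_internal(f.src) and not is_internal(f.dst):
            totals[(f.src, f.dst)] += f.bytes_out
    return totals


def exfil_candidates(flows: list[Flow], threshold: int = 500 * 1024 * 1024) -> list[tuple]:
    """(src, dst, bytes) pairs at or above the assumed threshold, largest first."""
    return [(s, d, n) for (s, d), n in outbound_by_destination(flows).most_common() if n >= threshold]
```

Summing per destination matters because exfiltration is often split into many small transfers that individually look harmless.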
Ninth, in an enterprise CIRT tool it makes no sense to limit the number of “seats” or concurrent users through a licensing scheme that ignores the realities of incident response. You need an enterprise-level CIRT tool because the problem you are combating is a very BIG one, often spread globally. While each CIRT will have standard response procedures that dictate the number of initial responders, the true scope of an incident often becomes apparent only after the team is neck deep in the initial response investigation. The need to surge support into ongoing and developing responses is an everyday occurrence, so it is critical to have a CIRT tool that supports the reality of the response environment. Proactive companies will have a tool inside their infrastructure before an incident occurs, so that their internal staff can be augmented at a moment’s notice by surge responders or additional resources, with no delay for negotiating additional examiner seats with software vendors. That said, if your tool uses the concurrent connection model, it probably can’t handle more responders and additional investigative demands anyway. Today’s environment demands a tool that can.
Tenth, infrastructure budgets are clearly under constant downward pressure. Your CIRT tool should be versatile enough to support functions in the environment beyond looking for malware; the capabilities required of a world-class CIRT tool are also the capabilities required for other needs within a corporate structure.
In many instances, a world-class CIRT tool can more than pay for itself through the savings generated by consolidating tools and functions within the organization. One of our enterprising government clients has turned its world-class CIRT tool into a revenue generation center with positive cash flow by offering it as a fee-for-service function to other departments within its agency.