I Introduction

In the modern age, almost all information is generated, edited and stored in a digital format. Digital evidence presents in a variety of ways including text-based records like spreadsheets, reports or notes, as well as multimedia like photographs, and audio or video recordings.

The nature of digital evidence can sometimes make it complex to access and interpret. Just like physical evidence, digital evidence is easily subject to spoliation if it is not handled properly and a strict chain of custody is not maintained.

Because every individual and organisation typically has such a large quantity of digital information in their custody, it can be a daunting task to find the relevant pieces. Digital forensic and electronic discovery professionals can assist with the collection, analysis, security, review, production and reporting on all types of digital evidence.

II Preservation of digital evidence

i Forensic images

Any type of physical media capable of storing information can be forensically imaged. This includes computer hard drives, solid state drives, external USB devices, SD cards used in cameras, game systems and smartphones, among others.

Digital forensic science relies heavily on a class of computer algorithms known as 'hash algorithms'. Hash algorithms are designed to provide a fixed-length digest (or 'hash') of any given input. Given an identical input, a hash algorithm will generate an identical output every time. If a modern cryptographic hash algorithm generates the same output for two inputs, it is, for all practical purposes, certain that those inputs are exactly the same. Two different inputs producing the same hash (a 'collision') is theoretically possible, but vanishingly unlikely to occur by chance.
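The behaviour of a hash algorithm can be illustrated with a minimal sketch. It assumes SHA-256 as the algorithm (the text does not name one), and the sample inputs and the `digest` helper are hypothetical.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the SHA-256 digest of the input as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical sample inputs, chosen only for illustration.
original = b"Quarterly report, final version"
copy = b"Quarterly report, final version"
altered = b"Quarterly report, final version."  # one extra character

# Identical input always produces an identical digest.
print(digest(original) == digest(copy))

# Any change to the input, however small, produces a different digest.
print(digest(original) == digest(altered))
```

The digest has the same length regardless of the size of the input, which is what makes it practical to record alongside an evidence item.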

Forensic images are an exact, bit-by-bit copy of a digital storage device that is written out to a file. To verify a forensic image, the hash of that output file is compared with the hash that was generated when the information was read from the original storage device. If these hash values match, it means the forensic image is an exact copy of the original disk and can be treated as such for the purposes of analysis.
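The verification step described above might be sketched as follows. This is an illustration under stated assumptions rather than how any particular imaging tool works: SHA-256 is assumed, the function names `hash_file` and `verify_image` are hypothetical, and a small temporary file stands in for a real image.

```python
import hashlib
import os
import tempfile

def hash_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so that large images never need to fit in memory."""
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()

def verify_image(image_path: str, acquisition_hash: str) -> bool:
    """Compare the image file's hash with the hash recorded at acquisition."""
    return hash_file(image_path) == acquisition_hash.strip().lower()

# Demonstration with a small stand-in file (hypothetical data, not a real image).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 4096 + b"sample disk contents")
    image_path = tmp.name

recorded_hash = hash_file(image_path)  # stands in for the acquisition hash
intact = verify_image(image_path, recorded_hash)

# Change a single byte in place, then verify again.
with open(image_path, "r+b") as handle:
    handle.seek(10)
    handle.write(b"\x01")
tampered_ok = verify_image(image_path, recorded_hash)

os.remove(image_path)
print(intact, tampered_ok)
```

In the demonstration, the first check succeeds and the second fails, because changing even one byte of the file changes its hash.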

Once a forensic image is generated, that image cannot be modified without detection. If the image is modified, the hash will no longer match the next time the evidence is verified. If a forensic image were to be restored to a digital storage device and placed in a computer, the content of that digital storage device would be indistinguishable from the original.

Digital forensic analysis is performed against a forensic image rather than the original device, ensuring that the evidence and the investigation are not compromised.

ii Targeted collections

While forensic images are truly the 'best evidence' when it comes to digital information, they can be time consuming to create, and are not always desirable to clients because of budget constraints or concerns about overcollection and data privacy.

In these cases, it is possible to perform a targeted collection that captures only specified files or folders on a computer system or account. A best practice for targeted collection is to encapsulate the target in an archive called a logical image.

Logical images are similar to forensic images in that they cannot be altered without causing a hash mismatch, and in that they preserve metadata. Logical images differ from forensic images in that they do not provide as much context for an investigation, and they do not allow deleted file recovery, because only the specified files are captured.
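The idea behind a logical image can be sketched in simplified form. The example below is only an approximation: it assumes a ZIP archive as the container and a JSON manifest of per-file hashes and metadata, neither of which is a dedicated forensic format, and the names `collect`, `sha256_of` and `manifest.json` are hypothetical.

```python
import hashlib
import json
import os
import tempfile
import zipfile

def sha256_of(path: str) -> str:
    """Hash a file in chunks and return the hexadecimal SHA-256 digest."""
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            sha.update(chunk)
    return sha.hexdigest()

def collect(targets: list[str], archive_path: str) -> str:
    """Archive only the specified files, record their metadata and hashes,
    and return the hash of the finished archive."""
    manifest = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in targets:
            info = os.stat(path)
            manifest.append({
                "path": path,
                "size": info.st_size,
                "modified": info.st_mtime,  # last-modified time, seconds since epoch
                "sha256": sha256_of(path),
            })
            archive.write(path)  # stored under its original path
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return sha256_of(archive_path)

# Demonstration on a throwaway file (hypothetical content).
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "notes.txt")
    with open(target, "w") as handle:
        handle.write("meeting notes")
    archive_path = os.path.join(workdir, "collection.zip")
    archive_hash = collect([target], archive_path)
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()
        manifest = json.loads(archive.read("manifest.json"))
    still_matches = sha256_of(archive_path) == archive_hash

print(manifest[0]["sha256"], still_matches)
```

Because only the listed files are read, nothing outside the target (including deleted files) ends up in the archive, which mirrors the limitation described above.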

A major downside to targeted collections is that if for some reason relevant data is not identified as part of the initial target, the examiner will have to go back to the source and recollect. Recollecting from the same sources always increases cost, and there is always an additional risk that the necessary information will no longer be available from the source.

iii Collecting from the Cloud

The popularity of Cloud services is increasing daily as more and more individuals and organisations move towards decentralised infrastructure and software-as-a-service solutions for their information technology needs.

Each Cloud provider has a different method of transferring or exporting data back to their users. In some cases, a given Cloud provider's export solution will be deficient for the purposes of an investigation or even non-existent. In those cases, specialised third-party software will need to be used to complete the collections.

Cloud collections are more similar in nature to logical images than they are traditional forensic images. In the Cloud, there is no unallocated space to image. Because there is no unallocated space, there is nowhere to look for deleted information outside of the Cloud provider's normal retention policy.

The same approach to Cloud collections can also be applied to social media services. Public posts can be collected via digital forensic tools. If credentials are provided, built-in export tools or digital forensic tools can be used to collect entire accounts, including direct messages between individuals on the social media platform.

III Analysis of digital evidence

i Forensic analysis

Forensic analysis is a 'deep dive' into a system that is capable of storing data. Analysis methodology will vary according to the context of the case, but typically in civil litigation, forensic analysis focuses on the systems themselves and the usage of those systems rather than the documents contained on them.

Digital forensics is closely tied to information security incident response. These types of responses are necessary when an organisation is damaged by a cyber-attack and needs assistance getting the attacker out of their network, remediating the network from the attack, and even notifying their customers regarding any personally identifiable information that was exfiltrated by the attacker. Digital forensic analysis can be used to determine the initial point of compromise and assist with directing the remediation and determining the extent of damage caused by the attack.

The most destructive forms of cyber-attacks are currently ransomware attacks that encrypt all the information on a victim's network, essentially holding the information hostage until the ransom is paid, as well as business email compromise attacks. The most common attack vector for both of these is known as 'phishing'.

Phishing is the act of sending an email that appears to be legitimate but actually contains a malicious link. If the user clicks on the link, it will either download malware to the victim's computer, or it will send the user to a website designed to steal the user's login credentials. Once login credentials are stolen, the attacker can use those to perform privilege escalation attacks, or to send messages to other users in the organisation and gain access to their accounts as well.

Deleted document recovery and extraction of user-generated information is also considered part of the forensic analysis process. Once the documents are exported, the review of the documents and other user-generated content is covered by electronic discovery services.

ii Electronic discovery

Electronic discovery is the digital version of the traditional discovery process. During a discovery process, the parties involved in litigation exchange relevant documents and information prior to a trial.

While electronic discovery review can act as a 'deep dive' into individual documents and emails, the quantity of information available is sometimes daunting. Luckily, electronic discovery is one area where technology can at least partially assist in solving the problem it has created.

Once information is loaded into an electronic discovery platform, there are many analytical tools that can be leveraged to help get to relevant data faster. Typically, the available analytical features will include concept clustering, which groups documents with similar concepts so that entire irrelevant concepts can be excluded at once. Another extremely useful analytical tool is email threading, which ensures reviewers will only have to review the final, most complete email thread.
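A heavily simplified view of email threading is sketched below. It assumes each message already carries a thread identifier and that the most recent message in a thread is the most complete one. Both are simplifying assumptions, and the `Email` structure and its field names are hypothetical. Review platforms typically do considerably more, such as analysing quoted content and handling threads that branch.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Email:
    thread_id: str   # hypothetical identifier shared by all messages in a thread
    sent: datetime
    body: str

def latest_per_thread(emails: list[Email]) -> dict[str, Email]:
    """Keep only the most recently sent message from each thread."""
    latest: dict[str, Email] = {}
    for email in emails:
        current = latest.get(email.thread_id)
        if current is None or email.sent > current.sent:
            latest[email.thread_id] = email
    return latest

# Hypothetical sample data: one three-message thread and one single message.
emails = [
    Email("T1", datetime(2021, 3, 1, 9, 0), "Can you send the contract?"),
    Email("T1", datetime(2021, 3, 1, 9, 30), "Attached.\n> Can you send the contract?"),
    Email("T1", datetime(2021, 3, 1, 10, 0), "Thanks.\n> Attached.\n>> Can you send the contract?"),
    Email("T2", datetime(2021, 3, 2, 14, 0), "Lunch on Friday?"),
]

to_review = latest_per_thread(emails)
print(len(emails), "messages reduced to", len(to_review))
```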

An example of an analytics-heavy electronic discovery workflow is predictive coding or assisted review. During assisted review, the system will learn from the actions of human reviewers for a pre-set number of training rounds. After the training rounds are complete, the system will suggest coding or categorisation for the remaining records and display the results to the review team. The review team will perform quality control checks against the documents categorised by analytics and perform additional training rounds if they are not satisfied with the system's margin of error.

There are also advanced analytical features available today such as sentiment analysis (the system will predict someone's mood at the time an email was composed), and visualisation tools, so that reviewers can quickly see who within a data set was communicating with which other custodians, about which topics, how frequently and when.

iii Physical analysis

While analysing the digital information is important, it is also useful to physically inspect devices for clues about their usage, or to determine what happened to the information that used to be stored within them. Users will occasionally physically damage devices in an attempt to destroy the information contained within. Those efforts are not always entirely successful, and it is important to check the true state of the hardware.

In some rare cases, we have seen new storage devices installed into older computers in an attempt to pass those new storage devices off as the originals. This practice is detectable by querying the system vendor for the original equipment that was included with the computer system and comparing it against the parts currently installed in the computer. We have seen laptops with hard drives installed that were manufactured years after the system was built, with the users trying to pass them off as the original hardware.

iv Reporting and productions

The first form of output from a digital forensic investigation will be exports from tools that have analysed system artifacts, document listings and a general summary of initial findings. These items, along with a listing of what information is available on a given data source, can be shared with counsel and discussed to provide further direction to an investigation.

Forensic expert reports generally outline the most useful evidence found during an investigation, construct a narrative of what was done with the available evidence sources and draw conclusions relevant to the case based on the evidence found on devices. Occasionally it is necessary to provide copies of forensic images to the opposing side in a litigation so that they can have the report verified by an additional expert.

Electronic discovery reports typically are in the form of a document production. Document production specifications are discussed and exchanged early in a court proceeding. Typically, a production will be received from the opposing side and loaded into the electronic discovery platform for review by counsel.

IV Typical Areas of Interest

i User-generated information

The most obvious form of information available from digital devices and accounts is user-generated information such as email, documents, calendar entries and notes. This is the data that the user chooses to save to the digital storage device and is the actual reason for their usage of the device in the first place.

Beyond their primary use, computers are typically used for secondary purposes and end up storing user-generated information in places that aren't quite as obvious. For example, a user may decide to back up their smartphone to their computer in case something happens to that device, or in case they decide to upgrade to a new device. In those cases, an investigator may be able to locate or recover that smartphone backup and essentially be able to perform an investigation on the user's smartphone as it was at the point in time when the backup was created.

In the past decade, cryptocurrency has become more mainstream and has begun to gain real value. It is possible that during an investigation, a cryptocurrency wallet, or even simply wallet addresses, may be discovered. This both poses risks for the investigator (who may be accused of stealing the cryptocurrency, as they had access to the keys) and greatly assists the investigator, who is now able to conduct all kinds of online asset tracing.

ii System artifacts

In addition to information intentionally generated by users, there are also system artifacts that are a byproduct of the standard usage of the computer. Computers are constantly recording what they are used for in some manner. Whether or not information is available through a resource like a system log, every file on a system also has metadata that can be interpreted and pieced together to provide a narrative of a user's actions on a computer.

Major events that occur on computers are typically logged. For example, every time a user logs into a computer, the system notes what time that occurred, which user it was, and even if there were any incorrect password attempts prior to the successful login.

In addition to events such as logins being recorded, most systems typically keep lists of which files were recently accessed. This is done so that the user has the convenience of referencing a list of their most recently accessed files, but also helps an investigator to tell the story of what the computer was used for.

Computers also maintain similar lists in other system artifacts for things such as internet history, and even accessory history, such as a record of which USB storage devices were connected to a computer, and when. By correlating the USB storage history with the file access history, it is sometimes possible to tell if a user copied any files from a system to an external USB storage device.
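The correlation described above can be sketched as a simple comparison of time windows. The record structures, field names and sample data below are hypothetical. In practice these values would be extracted from system artifacts by forensic tools, and a match indicates only that a file was accessed while a device was connected, not that it was necessarily copied.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class UsbSession:
    device: str
    connected: datetime
    disconnected: datetime

@dataclass
class FileAccess:
    path: str
    accessed: datetime

def accesses_during_usb(sessions: list[UsbSession],
                        accesses: list[FileAccess]) -> list[tuple]:
    """Return (device, path, time) for every file access that falls inside
    a window when a USB storage device was connected, in time order."""
    hits = []
    for session in sessions:
        for access in accesses:
            if session.connected <= access.accessed <= session.disconnected:
                hits.append((session.device, access.path, access.accessed))
    return sorted(hits, key=lambda hit: hit[2])

# Hypothetical sample data.
sessions = [
    UsbSession("USB-A", datetime(2021, 5, 4, 17, 50), datetime(2021, 5, 4, 18, 20)),
]
accesses = [
    FileAccess("clients.xlsx", datetime(2021, 5, 4, 9, 15)),
    FileAccess("pricing.docx", datetime(2021, 5, 4, 17, 55)),
    FileAccess("clients.xlsx", datetime(2021, 5, 4, 18, 5)),
]

for device, path, when in accesses_during_usb(sessions, accesses):
    print(device, path, when.isoformat())
```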

One of the major benefits of forensic imaging is that it creates an exact copy of a digital storage device, including any unused space. This is useful because until information on a storage device is overwritten with new information, that information can be recovered by computer forensic software. That means it is possible both to report some information about which files were deleted by a user and when, and to recover copies of those deleted files and export them for review.
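One widely used recovery approach is signature-based file carving, which scans raw data for the byte patterns that mark the start and end of a known file type. The sketch below applies it to JPEG files. It is a simplified illustration and not a description of any specific tool: the function name `carve_jpegs` is hypothetical, the input is a small in-memory buffer rather than a real image, and real files can be fragmented or contain the end marker internally, which this sketch does not handle.

```python
JPEG_START = b"\xff\xd8\xff"  # bytes that begin a JPEG file
JPEG_END = b"\xff\xd9"        # bytes that end a JPEG file

def carve_jpegs(raw: bytes) -> list[bytes]:
    """Return every byte run that starts with a JPEG header and ends at
    the next JPEG end marker."""
    found = []
    position = 0
    while True:
        start = raw.find(JPEG_START, position)
        if start == -1:
            break
        end = raw.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break
        end += len(JPEG_END)
        found.append(raw[start:end])
        position = end
    return found

# Hypothetical stand-in for unused space: filler bytes around one "deleted" file.
deleted_file = JPEG_START + b"image data" + JPEG_END
unused_space = b"\x00" * 64 + deleted_file + b"\x00" * 64

recovered = carve_jpegs(unused_space)
print(len(recovered), recovered[0] == deleted_file)
```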

V Summary

Because almost all information is digital, computer forensics and electronic discovery are necessary parts of almost every investigation. Collecting information properly and defensibly is key. If the origin of data is questionable or tampering is suspected, important evidence could be rendered useless.

With so much data available, digital forensic and electronic discovery analytical tools are indispensable. It is absolutely necessary to cut through the noise as quickly as possible and locate any meaningful information.

Once the relevant facts are located, digital forensic experts can help put them together and construct a timeline or narrative of the important facts, outlining exactly what happened, and quickly getting the truth into a report for use by the courts.


Footnotes

1 Joel Bowers is a managing director in the Cyber Risk practice of Kroll.