How to Plan and Execute Defensible Collections


The digital universe continues to grow at a staggering rate, increasing the amount and kinds of data and sources subject to preservation and collection in a dispute. 

This article will walk you through critical questions to ask as you develop the optimal defensible collection strategy for your team. Then, I’ll provide an example legal collection workflow so you can see the principles in practice.

Note: The content of this article also appears in DISCO’s Data Collection Playbook.

Identifying data for collection

First things first: Processing the entire collection can be costly and time consuming, but it’s critical that all potentially relevant data sources are collected. How do you thread this needle?

Even seemingly straightforward data collections carry certain risks. Failing to correctly identify relevant data can result in under-collection, which can bring severe legal and monetary repercussions and weaken your ability to defend your position.

On the flip side, without a collection strategy, you may over-collect irrelevant materials, resulting in additional risks and costs.

Ask the following questions before you begin a collection to ensure you’re collecting the right data:

1. Are there shared resources that are likely to contain relevant materials?

For instance: Did the relevant employees use a file share or email distribution list for communications? 

If so, consider prioritizing the collection and processing of these shared resources before collecting individual hard drives or email accounts – these are likely to contain highly duplicative data. 


2. Do you have a well-defined date range you are able to limit the collection to? 

Many email archiving and collection tools allow you to filter the email and attachments based on a specified date range. Those tools may also deduplicate data at collection time to reduce the volume you need to process.
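To make the idea concrete, here is a minimal sketch of date-range filtering combined with exact-duplicate removal. The `Message` record and the `filter_and_dedupe` function are hypothetical names used for illustration; real collection tools expose their own fields and typically use more sophisticated deduplication logic.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Message:
    # Hypothetical record shape, assumed for this sketch only.
    sent: date
    sender: str
    body: str


def filter_and_dedupe(messages, start, end):
    """Keep messages sent within [start, end] and drop exact duplicates."""
    seen = set()
    kept = []
    for msg in messages:
        # Date-range filter: skip anything outside the agreed window.
        if not (start <= msg.sent <= end):
            continue
        # Exact-duplicate key: a hash over sent date, sender, and body.
        raw = f"{msg.sent.isoformat()}|{msg.sender}|{msg.body}"
        key = hashlib.sha256(raw.encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        kept.append(msg)
    return kept
```

Whatever filters you apply at this stage should be recorded, since they become part of the documentation you rely on later to defend the collection.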

3. What metadata is required for the litigation process? 

Be aware of how applications handle metadata as documents are collected. Do you have the defensible documentation and validation needed to meet your discovery requirements?

Native searching and filtering may not be comprehensive. For example, if you’re searching a set of documents that includes non-OCR'd PDFs, the text of those PDFs will not be searched, so they will be excluded from the results whether or not they are relevant.

A good rule of thumb is to collect broadly with limited filters, and then utilize defensible ediscovery software to cull down further.

Identifying the right collection tools and capabilities

Every data source you collect is subject to different considerations. Some useful features of a toolset you might adopt for collections include the capabilities to:

  • Filter
  • Preview
  • Search, and/or
  • Visualize previously held data

These capabilities should be applicable across a broad variety of data sources and data types. Consider, for example, whether your tool allows you to examine data from Slack or mobile devices as easily as it allows you to examine emails.

💡The ability to evaluate the relevance of previously held data prior to collection is essential to your collection process.

To identify the optimal technological solution for your case and team, ask:

1. Does the tool collect the necessary metadata?  

This includes, but is not limited to, the document’s creation date, last modification, author, and last person to modify.

2. Does it allow you to filter and search by key pieces of the metadata prior to collecting the data?

This includes, but is not limited to, date range, file type, sender, and recipients.

3. Does the tool export the data in a format that can be processed in an ediscovery database without transformation?

While many technologies can collect data from different cloud and local devices, you may need to engage experts for systems that don’t have out-of-the-box solutions.

Creating your collection plan

Next, follow these steps to create the optimal data collection plan.

Estimate the total volume of data

The number of documents in a single collection may vary widely. 

For instance, loose files like spreadsheets and documents may have about 6k-10k documents per GB – whereas email accounts with few attachments could have up to 50k messages per GB.

By estimating the total volume of data to collect, you can also estimate the time needed to process, search, review and produce the results. 
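As a rough illustration, the per-GB figures above can be turned into a quick estimate. This is a sketch only: the function names are hypothetical, and the review rate passed to `review_days` is an input you would supply from your own historical matters, not a benchmark from this article.

```python
# Documents-per-GB figures taken from the rules of thumb above.
LOOSE_FILES_PER_GB = (6_000, 10_000)  # low and high estimates
EMAIL_MAX_PER_GB = 50_000             # upper bound, few attachments


def estimate_loose_files(size_gb):
    """Return (low, high) document counts for loose files."""
    low, high = LOOSE_FILES_PER_GB
    return round(size_gb * low), round(size_gb * high)


def estimate_email_max(size_gb):
    """Return the upper-bound message count for an email collection."""
    return round(size_gb * EMAIL_MAX_PER_GB)


def review_days(doc_count, docs_per_reviewer_per_day, reviewers):
    """Days of review needed, rounded up (rate is an assumed input)."""
    daily_capacity = docs_per_reviewer_per_day * reviewers
    return -(-doc_count // daily_capacity)  # ceiling division
```

For example, `estimate_loose_files(20)` returns `(120000, 200000)`, so a 20 GB file share could plausibly hold between 120,000 and 200,000 documents before any filtering or deduplication.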

Establish your timeline

The amount of time that you have to perform a collection will depend on court-ordered or agreed-upon deadlines.

Work backwards from your initial production deadline to ensure you complete your collections in a timely manner.

Ask these questions to help establish your timeline:

1. How much data do you anticipate the collection will contain? Can you estimate based on historical collections?

2. Do you need to prioritize certain custodians or data sources to be processed, reviewed and produced by any specific deadlines?

3. How long do you anticipate the discovery period will last? 

Create the collection plan

Create a plan to document the key custodian information, data types and filter criteria. Refer to previous matters (if available) to help prepare the timeline.  


Ensure defensibility of your collection

A critical component of your collection process is its defensibility, or your ability to show that the data you collected has not been altered in any way throughout the litigation lifecycle. 

Should the judge and/or opposing party question the legitimacy of a collection, you must be able to prove that the data you reviewed and ultimately presented is as it existed in the ordinary course of business. 

Common ways to establish defensibility include:

Maintain and document chains of custody 

Keep track of the possession, movement, transfer, and location of data from the time it is identified as evidence to the time it is submitted.
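One way to picture a chain of custody is as an append-only list of events, one per handoff. The sketch below is illustrative only; `CustodyEvent`, `record_event`, and `custody_history` are hypothetical names, and the fields are assumptions based on the items listed above (possession, movement, transfer, and location).

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CustodyEvent:
    # Hypothetical fields, assumed for this sketch.
    evidence_id: str
    action: str      # e.g. "identified", "transferred", "submitted"
    handler: str     # who had possession at this step
    location: str    # where the data resided
    timestamp: str   # ISO 8601, UTC


def record_event(log, evidence_id, action, handler, location, when=None):
    """Append one custody event to the log and return it."""
    if when is None:
        when = datetime.now(timezone.utc)
    event = CustodyEvent(evidence_id, action, handler, location,
                         when.isoformat())
    log.append(event)  # events are only ever appended, never edited
    return event


def custody_history(log, evidence_id):
    """Return every recorded event for one item, in the order logged."""
    return [e for e in log if e.evidence_id == evidence_id]
```

Because each event is immutable and the log is only appended to, the history for any item reads as an unbroken sequence from identification to submission.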

Validate collections with document hashing 

This is the process of giving files a unique digital identifier in the form of a hash that changes if a file’s contents are changed. By comparing hash values before and after collection, you can validate that the documents collected were not altered during the collection process. 

Note: Document hashing is most useful for files that exist on physical hardware. For documents collected from cloud data stores, hash values may instead be provided in a log.
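The before-and-after comparison can be sketched in a few lines. This example assumes SHA-256 as the hash algorithm, which is a choice made for illustration rather than one prescribed here; `sha256_of` and `verify_copy` are hypothetical helper names.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_path, collected_path):
    """True if the collected file has the same hash as the source."""
    return sha256_of(source_path) == sha256_of(collected_path)
```

In practice you would record the source hash at collection time and store it with your documentation, so the comparison can be repeated at any later point in the matter.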

Keep comprehensive audit trails

In addition to a chain of custody, an audit trail will contain information that shows: 

  • When a collection took place
  • Any filtering applied at the time of collection
  • The volume of data collected
  • Who initiated the collection
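As a sketch of what one audit trail entry might look like, the example below captures the four items above and appends them to a log file, one JSON object per line. The field names, the `audit_entry` and `append_audit_line` functions, and the file format are all assumptions for illustration; many collection tools generate an equivalent record automatically.

```python
import json
from datetime import datetime, timezone


def audit_entry(initiated_by, filters, bytes_collected, collected_at=None):
    """Build one audit record covering the four items listed above."""
    if collected_at is None:
        collected_at = datetime.now(timezone.utc)
    return {
        "collected_at": collected_at.isoformat(),  # when it took place
        "filters": filters,                        # filtering applied
        "bytes_collected": bytes_collected,        # volume collected
        "initiated_by": initiated_by,              # who initiated it
    }


def append_audit_line(path, entry):
    """Append the entry to a log file as a single JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")
```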

Document the process

When preparing for a collection, ensure you have documented the process for maintaining the chain of custody and audit trail, as well as the location where they are kept.


Example collection workflow

Here’s what this process looks like in action.

Collection creation

After recognizing the need to begin a collection, assign someone to not only execute the collection, but also to document the steps taken. 

Ensure this person understands the general context and deadline for this collection.

Note: Some legal hold solutions allow you to create a collection from within the platform, creating a single hold-to-collect interface. This can make collection creation, assignment, and documentation more efficient.

Source and custodian identification

Next, identify all sources and custodians from which to collect data. Ensure you’re using legal hold and collection software that integrates directly into your company’s data ecosystem, so you can see all possible sources of collection from a single interface.

💡Pro tip: Make sure you have the necessary permissions to access and copy that data, not just view it! Depending on the collection method and data source, you may need to do some additional liaising with administrators. 

Parameter identification

Each combination of source and custodian has the potential to produce massive amounts of data. 

As such, you’ll need to consider parameters for data collection such as key dates, type of document needed, and export format. The format of the source data (e.g., email vs. chat message) and your team’s technological capabilities will determine your ability to collect based on these parameters. 

After identifying key parameters, you are ready to begin the collection. 

💡Pro tip: Certain data types, such as chat messages, are inherently hard to filter, preview, and export in an easily reviewable format. If you are repeatedly collecting from sources that include chat messages, ensure that you have the capabilities to easily and efficiently review those messages.

Executing and defending the collection

Finally, you must gather the identified data into a secure location. 

Collect and stage the data to ensure that all data is organized and validated before sending on for processing. (This isn’t required, but it is a best practice.) 

Communicate with your stakeholders! Depending on the variety and volume of data, a collection may take many hours or days. You or the team member(s) managing the collection process should prioritize the key information and communicate the collection status frequently to the case team.  

Watch closely: Whoever is in charge of the collection should closely monitor the process in case an error occurs. Many modern systems allow users to set up automated alerts when a collection is completed or an error occurs. 

Dave Hendershott

Dave Hendershott is the Director of Forensics at DISCO, where he leads the forensic department via high-level project expertise and team management, including developing the strategy and execution of DISCO's forensic offerings. Dave has more than 20 years of experience in computer forensics and 1,500+ hours of forensics and technical training. His investigations have ranged from homicides to intellectual property matters, and he's testified 20+ times in support of digital forensics findings. He brings his deep passion for computer forensics to every engagement.