
Ediscovery in the Modern Age: Handling Collaborative Data Sources

By: Kristin Zmrhal
Posted: March 29, 2024

https://www.csdisco.com/blog/esi-collaborative-data-sources

Today’s working world, where ediscovery technologists are regularly tasked with collecting data from collaborative data sources – like Slack, SMS, chat in Zoom, chat in Google docs, and so on – is a far cry from when my ediscovery career started out twenty years ago. 

On my first client site, phone calls and in-person meetings were far more common. In fact, we had to “refresh” our Lotus Notes mailbox every hour to see if we had received any messages. 

By contrast, my personal experience now – constantly trying to remember where I saw a note I need to respond to (was it a DM? a channel? an email?) – is not unique. 

Modern data sources: By the numbers

In fact, according to a study from Spiceworks Ziff Davis,

  • 51% of end users prefer real-time business chat apps (e.g., Slack, Microsoft Teams) over email for internal communications.
  • Analog voice usage continues to fall year-over-year, dropping from a 52% adoption rate in 2019 to 43% over a three-year period.
  • Following rapid adoption increases in 2020, the usage growth of web conferencing apps and business chat apps has leveled off.
  • Most companies (51%) now prefer providers that offer all-encompassing communications solutions.

According to a recent blog post from Slack, 80% of Fortune 100 companies, including industry titans like Airbnb, Time, and Zendesk, are now relying on its Slack Connect platform to connect their teams.

As employees change the way they communicate, the evidence we collect for discovery changes too. 

As you embark on negotiating your next ESI protocol, here are a few key questions to ask yourself, contextualized in the scenario below.

Five considerations for collaborative data sources in ediscovery

Consider this scenario. Your client has received a complaint and must begin the discovery process. They have identified two key custodians, Joe and Jane, who will receive a legal hold notice and from whom documents will be collected. 

They have shared documents across multiple sources – the company uses Gmail, Google Docs, Zoom, Slack, and Dropbox. 

Below is an example of their documents in Google Drive. 

[Figure: an example of two different collections of files, including PowerPoint presentations, Word documents, Excel sheets, and other files]

What is a custodian for a collaborative data source? 

So, what is a custodian? And what files do you collect in this scenario? 

Are Joe and Jane custodians of files that are merely shared with them? 

What if those files were created by someone who is not a custodian in the matter? 

Do you have the capabilities in place to ensure that the files that Joe and Jane share are not duplicated? 

💡Pro tip: Define your collection strategy based on the common uses and organization of documents in your client’s shared file systems. 
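To make the duplication question concrete, here is a minimal Python sketch, assuming exact-content hashing is acceptable for shared files that are byte-for-byte identical. The `dedupe_by_hash` function, the record layout, and the sample files are hypothetical and do not describe any particular platform’s behavior. The idea is to keep one copy of each file while still recording every custodian who had access to it.

```python
import hashlib
from collections import defaultdict


def dedupe_by_hash(collected):
    """Keep one copy per unique content hash and track every custodian.

    `collected` is an iterable of (custodian, file_name, content_bytes).
    """
    first_name = {}                 # hash -> first file name seen
    custodians = defaultdict(set)   # hash -> custodians holding that content
    for custodian, name, content in collected:
        digest = hashlib.sha256(content).hexdigest()
        first_name.setdefault(digest, name)
        custodians[digest].add(custodian)
    return [
        {"sha256": d, "file_name": first_name[d], "custodians": sorted(custodians[d])}
        for d in first_name
    ]


# Hypothetical collection: the same shared file appears for both custodians.
collected = [
    ("Joe", "budget.xlsx", b"Q1 budget"),
    ("Jane", "budget.xlsx", b"Q1 budget"),
    ("Jane", "roadmap.pptx", b"2024 roadmap"),
]
deduped = dedupe_by_hash(collected)
```

In this sketch the shared `budget.xlsx` is kept once and lists both Joe and Jane as custodians, so the custodian question stays answerable after deduplication.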

The ESI of real-time communications 

Suppose Joe and Jane also use Slack. The key channels they use to discuss business are called #engineering and #projectabc. They also both belong to dozens more channels. 

So what do you collect? Any channel that Joe and Jane are members of? And do you collect all of the direct messages and multi-party direct messages even though they’re likely not relevant? 

💡Pro tip: Make sure you understand the key messages and topics you’re looking for. It may be worthwhile to collect any channels that your custodians are in, and filter the irrelevant content out in a more robust platform once the messages are loaded. 
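The “collect broadly, filter later” approach in the tip above could be sketched roughly as follows in Python. The channel membership data, the message records, and both function names are assumptions made up for illustration; a real collection would pull membership from the source system, and real filtering would happen in your review platform.

```python
def channels_to_collect(channel_members, custodians):
    """Return every channel that has at least one custodian as a member."""
    wanted = set(custodians)
    return sorted(
        name for name, members in channel_members.items()
        if wanted & set(members)
    )


def filter_by_terms(messages, terms):
    """Keep messages whose text contains any key term (case-insensitive)."""
    lowered = [t.lower() for t in terms]
    return [m for m in messages if any(t in m["text"].lower() for t in lowered)]


# Hypothetical membership data for the Joe and Jane scenario.
channel_members = {
    "#engineering": ["Joe", "Jane", "Sam"],
    "#projectabc": ["Joe", "Jane"],
    "#lunch": ["Jane", "Sam"],
    "#sales": ["Sam"],
}
selected = channels_to_collect(channel_members, ["Joe", "Jane"])

# Hypothetical messages loaded after collection.
messages = [
    {"channel": "#lunch", "text": "Tacos today?"},
    {"channel": "#projectabc", "text": "Project ABC launch slipped a week"},
]
relevant = filter_by_terms(messages, ["project abc"])
```

Here `#sales` is skipped because neither custodian belongs to it, while `#lunch` is collected and then filtered out by the key terms.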

💡Pro tip: Often the collected format of real-time communications needs to be transformed so attorneys can search, review and produce the messages. Be sure to work with the ediscovery technologists to ensure that the content loaded for review will meet your production obligations. 
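As one illustration of that transformation, the Python sketch below groups individual messages into one reviewable text document per channel per day. The field names (`channel`, `user`, `ts`, `text`) are assumptions loosely modeled on a chat export with epoch timestamps stored as strings, and the per-day grouping in UTC is only one possible unitization choice, not a recommendation for any specific matter.

```python
from collections import defaultdict
from datetime import datetime, timezone


def build_daily_documents(messages):
    """Group messages into one text document per (channel, UTC date)."""
    grouped = defaultdict(list)
    for msg in sorted(messages, key=lambda m: float(m["ts"])):
        sent = datetime.fromtimestamp(float(msg["ts"]), tz=timezone.utc)
        key = (msg["channel"], sent.date().isoformat())
        grouped[key].append(f'[{sent:%H:%M} UTC] {msg["user"]}: {msg["text"]}')
    return {key: "\n".join(lines) for key, lines in grouped.items()}


# Hypothetical messages; timestamps are epoch seconds stored as strings.
messages = [
    {"channel": "#projectabc", "user": "Joe", "ts": "1711674000.000100", "text": "Draft is ready"},
    {"channel": "#projectabc", "user": "Jane", "ts": "1711677600.000200", "text": "Reviewing now"},
    {"channel": "#projectabc", "user": "Joe", "ts": "1711760400.000300", "text": "Any updates?"},
]
docs = build_daily_documents(messages)
```

How conversations are split (by day, by thread, by a fixed number of messages) affects both review speed and what you ultimately produce, so it is worth agreeing on with your technologists up front.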

Collecting collaborative metadata for ediscovery

As you collect the documents from Slack and Drive, you’ll also notice that there’s additional information that can help you prioritize documents for review and production. 

For instance, channel names, collaborators and participants are important for identifying potentially key or privileged materials. Note: This information is not automatically available when you load the documents. 

💡Pro tip: Make sure your ediscovery provider overlays the necessary information in a way that you can search, filter and highlight the content. 
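A rough Python sketch of what such an overlay might look like is below. The document IDs, field names, and the “Outside Counsel” participant are all hypothetical; the point is only that channel and participant information collected separately can be merged onto loaded documents so it becomes filterable.

```python
def overlay_metadata(documents, metadata_by_id):
    """Merge separately collected metadata onto each loaded document."""
    return [{**doc, **metadata_by_id.get(doc["doc_id"], {})} for doc in documents]


# Hypothetical loaded documents and separately collected metadata.
documents = [
    {"doc_id": "D1", "text": "Launch plan draft"},
    {"doc_id": "D2", "text": "Question about the contract terms"},
    {"doc_id": "D3", "text": "Team lunch order"},
]
metadata_by_id = {
    "D1": {"channel": "#projectabc", "participants": ["Joe", "Jane"]},
    "D2": {"channel": "#legal-review", "participants": ["Jane", "Outside Counsel"]},
}
enriched = overlay_metadata(documents, metadata_by_id)

# Flag documents with a legal participant for a closer privilege look.
flagged = [d["doc_id"] for d in enriched if "Outside Counsel" in d.get("participants", [])]
```

Documents with no matching metadata (like `D3` here) pass through unchanged, which makes gaps in the overlay easy to spot.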

Working with multiple versions of collaborative docs

Modern collaboration systems also store many versions of documents automatically. 

This “auto-save” feature is a blessing for those of us who remember losing critical information because we didn’t save a version of the document that we were working on. However, these versions can be a nightmare to decipher in a review. 

Collecting every single version will result in thousands more documents to review and produce, often versions that don’t have any substantive changes. These versions cannot be deduplicated with common hashing techniques, and even grouping the versions together for review will be costly – attorneys will have to spend time determining if each version is relevant or irrelevant. 
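To see why hashing does not help here, consider this small Python sketch. Two versions that differ by a single trailing space produce different hashes, even though a text-similarity measure shows them to be nearly identical. The sample text and the 0.95 threshold are assumptions chosen for illustration, and `difflib` stands in for whatever near-duplicate detection your review platform actually provides.

```python
import hashlib
from difflib import SequenceMatcher


def is_near_duplicate(a, b, threshold=0.95):
    """Treat two versions as near-duplicates if their text similarity is high."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


v1 = "The launch date for Project ABC is June 1."
v2 = "The launch date for Project ABC is June 1. "  # differs by one trailing space

# Exact hashing sees two unrelated documents.
same_hash = (
    hashlib.sha256(v1.encode("utf-8")).hexdigest()
    == hashlib.sha256(v2.encode("utf-8")).hexdigest()
)

# Similarity scoring sees two nearly identical versions.
near_dup = is_near_duplicate(v1, v2)
```

Even with near-duplicate grouping, someone still has to decide which version matters, which is why limiting the versions collected in the first place is usually the cheaper fix.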

To mitigate the risk of skyrocketing processing and hosting fees, determine a strategy to collect the documents for the relevant time period. 

💡Pro tip: Most collaborative data systems allow you to collect documents as they existed on a specific date (perhaps the date of the complaint, the legal hold or date of collection). Choose wisely and avoid collecting too many versions at once. 
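The “as of a specific date” idea can be sketched in a few lines of Python. The version records, dates, and the `version_as_of` helper are hypothetical; actual systems expose version history in their own ways. The logic is simply to pick the latest version saved on or before the chosen cutoff.

```python
from datetime import date


def version_as_of(versions, cutoff):
    """Return the latest version saved on or before the cutoff, or None."""
    eligible = [v for v in versions if v["saved_on"] <= cutoff]
    return max(eligible, key=lambda v: v["saved_on"]) if eligible else None


# Hypothetical version history for a single document.
versions = [
    {"version": 1, "saved_on": date(2024, 1, 10)},
    {"version": 2, "saved_on": date(2024, 2, 20)},
    {"version": 3, "saved_on": date(2024, 3, 15)},
]

# Assume the complaint date is the agreed cutoff.
complaint_date = date(2024, 3, 1)
chosen = version_as_of(versions, complaint_date)
```

With a March 1 cutoff, version 2 is selected and version 3, saved after the complaint, is left out of the collection.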

Navigating cloud attachments for ediscovery

One of the best parts of working with modern tools is their interconnectivity. 

I can create a doc in Google Docs, then add a colleague for comments and they’ll get a Slack message notifying them of the comment. Instead of sending around the same document named v1, v1-kzedits, v2-k2-edits, and so on, I can link the document to a message I send. And while I may edit the document after sending, my colleague will see the most recent changes without having to reload. 

Once again, great for productivity, challenging for ediscovery.  

Both Google and Microsoft have capabilities that allow you to attach links from cloud storage (like Google Drive and OneDrive). It wasn’t until recently that they also provided collection capabilities for getting those documents when emails are collected. 

The same is not true of other data sources, so be aware of what is and isn’t included in a collection. Even if the cloud attachments are collected, they are not true parents/children, and additional consideration must be given when ingesting, reviewing and producing these files.
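One way to think about that relationship is as a recorded link rather than a family. The Python sketch below, with hypothetical email records, a placeholder document ID, and a simple pattern for Google Docs and Drive URLs, scans message bodies for links and stores each as a separate “cloud link” relationship. It is an illustration of the concept only, not how any collection tool resolves cloud attachments.

```python
import re

# Simple pattern for Google Docs/Drive links; real links vary more than this.
DRIVE_LINK = re.compile(r"https://(?:docs|drive)\.google\.com/[^\s<>\"']+")


def find_cloud_links(emails):
    """Record each linked cloud document as a relationship, not a child."""
    relationships = []
    for email in emails:
        for url in DRIVE_LINK.findall(email["body"]):
            relationships.append({
                "email_id": email["id"],
                "linked_url": url.rstrip(".,;)"),  # drop trailing punctuation
                "relationship": "cloud_link",       # not a true parent/child
            })
    return relationships


# Hypothetical emails; EXAMPLE_DOC_ID is a placeholder, not a real document.
emails = [
    {"id": "E1", "body": "Please review https://docs.google.com/document/d/EXAMPLE_DOC_ID/edit before Friday."},
    {"id": "E2", "body": "No links in this one."},
]
links = find_cloud_links(emails)
```

Keeping the link as its own record leaves room to note which version of the document was collected, since the linked file may have changed after the message was sent.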

Collecting from collaborative data sources with DISCO

Watching how hybrid work and modern applications have changed over the course of the past decade is so cool – and daunting for ediscovery technologists. 

That’s why DISCO is designed to make collection easier.

DISCO Hold’s collection capabilities enable legal teams to seamlessly identify, preserve, collect, and promote data from sources like Slack, Box, and Google Vault for review, including for litigation, investigations, and compliance. With collection capabilities, DISCO Hold enables you to achieve faster time to evidence, reduce the cost of duplicating redundant data in third-party archives, and reduce the overall cost of litigation and review.

See what we can do for you: Request a demo.

Kristin Zmrhal
Vice President of Product Strategy