Ediscovery Data Sources: How to Work with Collaborative Data


By: Kristin Zmrhal

Posted: February 7, 2025


Today’s working world, where ediscovery technologists are regularly tasked with collecting data from collaborative data sources – like Slack, SMS, chat in Zoom, chat in Google Docs, and so on – is a far cry from when my ediscovery career started out twenty years ago.

On my first client site, phone calls and in-person meetings were far more common. In fact, we had to “refresh” our Lotus Notes mailbox every hour to see whether we had received any messages.

By contrast, my personal experience now – constantly trying to remember where I saw a note I need to respond to (was it a DM? a channel? an email?) – is not unique.

Collaborative data sources: By the numbers

In fact, according to a study from Spiceworks Ziff Davis:

51% of end users prefer real-time business chat apps (e.g., Slack, Microsoft Teams) over email for internal communications.
Analog voice usage continues to fall year-over-year, dropping from a 52% adoption rate in 2019 to 43% over a three-year period.
Following rapid adoption increases in 2020, the usage growth of web conferencing apps and business chat apps has leveled off.
Most companies (51%) now prefer providers that offer all-encompassing communications solutions.

According to a recent blog post from Slack, 80% of Fortune 100 companies, including industry titans like Airbnb, Time, and Zendesk, are now relying on its Slack Connect platform to connect their teams.

As employees change the way they communicate, the evidence we collect for discovery changes too.

To help as you negotiate your next ESI protocol, I’ve outlined a few key questions to ask yourself, framed by the scenario below.

5 considerations for collaborative ediscovery data sources

Consider this scenario. Your client has received a complaint and must begin the discovery process. They have identified two key custodians, Joe and Jane, who will receive a legal hold notice and whose documents will be collected.

Joe and Jane have shared documents across multiple data collaboration tools: the company uses Gmail, Google Docs, Zoom, Slack, and Dropbox.

Below is an example of their documents in Google Drive.

[Image: an example of two different collections of files, including PowerPoint presentations, Word documents, Excel sheets, and other files]

1. Identifying custodians for collaborative ESI data sources

So, what is a custodian? And what files do you collect in this scenario?

Are Joe and Jane custodians of files that are merely shared with them?

What if those files were created by someone who is not a custodian in the matter?

Do you have the capabilities in place to ensure that the files that Joe and Jane share are not duplicated?

💡Pro tip: Define your collection strategy based on the common uses and organization of documents in your client’s shared file systems.
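
One way to keep a file that both Joe and Jane can access from being reviewed twice is to key each collected file on a hash of its content and record every custodian it was collected from. The sketch below is only an illustration under stated assumptions, not a description of any particular tool: the `dedupe_by_hash` function, the tuple layout, and the sample files are all hypothetical.

```python
import hashlib


def dedupe_by_hash(collected):
    """Group collected files by content hash.

    `collected` is an iterable of (custodian, filename, content_bytes)
    tuples. This layout is an assumption made for the example.
    """
    unique = {}
    for custodian, filename, content in collected:
        digest = hashlib.sha256(content).hexdigest()
        # Keep one record per distinct content, but remember every custodian.
        record = unique.setdefault(digest, {"filename": filename, "custodians": []})
        if custodian not in record["custodians"]:
            record["custodians"].append(custodian)
    return unique


# Hypothetical sample: the same deck was collected from both custodians.
collected = [
    ("Joe", "roadmap.pptx", b"Q3 roadmap"),
    ("Jane", "roadmap.pptx", b"Q3 roadmap"),
    ("Jane", "budget.xlsx", b"FY budget"),
]

unique_files = dedupe_by_hash(collected)
for record in unique_files.values():
    print(record["filename"], record["custodians"])
```

Keeping the custodian list on each record preserves who had access to the file even though only one copy goes to review.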

2. ESI data from real-time communications

The key channels Joe and Jane use to discuss business are called #engineering and #projectabc. They also both belong to dozens more channels.

So what ESI data do you collect? Any channel that Joe and Jane are members of? And do you collect all of the direct messages and multi-party direct messages even though they’re likely not relevant?

💡Pro tip: Make sure you understand the key messages and topics you’re looking for. It may be worthwhile to collect any channels that your custodians are in, and filter the irrelevant content out in a more robust platform once the messages are loaded.

💡Pro tip: Often the collected format of real-time communications needs to be transformed so attorneys can search, review and produce the messages. Be sure to work with the ediscovery technologists to ensure that the content loaded for review will meet your production obligations.
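
The "collect broadly, filter later" approach can be sketched as a simple filter applied once messages are loaded. The message fields (`channel`, `sent_at`, `text`), the `filter_messages` function, and the sample data are hypothetical; they do not reflect any specific export format or review platform.

```python
from datetime import datetime


def filter_messages(messages, keywords, start, end):
    """Return messages sent within [start, end] that mention any keyword."""
    lowered = [k.lower() for k in keywords]
    kept = []
    for message in messages:
        # Drop anything outside the relevant time period.
        if not (start <= message["sent_at"] <= end):
            continue
        text = message["text"].lower()
        if any(k in text for k in lowered):
            kept.append(message)
    return kept


# Hypothetical messages collected from every channel the custodians are in.
messages = [
    {"channel": "#projectabc", "sent_at": datetime(2024, 3, 1, 9, 30),
     "text": "Project ABC launch slipped a week"},
    {"channel": "#lunch", "sent_at": datetime(2024, 3, 1, 12, 0),
     "text": "Tacos today?"},
    {"channel": "#engineering", "sent_at": datetime(2023, 1, 5, 8, 0),
     "text": "Old ABC design notes"},
]

relevant = filter_messages(
    messages,
    keywords=["abc"],
    start=datetime(2024, 1, 1),
    end=datetime(2024, 12, 31),
)
print([m["channel"] for m in relevant])
```

Plain keyword matching is the simplest possible filter; in practice the search terms and date range would come from the negotiated ESI protocol.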

3. Collecting collaborative metadata for ediscovery

As you collect the documents from Slack and Drive, you’ll also notice that there’s additional information that can help you prioritize documents for review and production.

For instance, channel names, collaborators and participants are important for identifying potentially key or privileged materials. Note: This information is not automatically available when you load the documents.

💡Pro tip: Make sure your ediscovery software overlays the necessary information in a way that you can search, filter and highlight the content.
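
Conceptually, an overlay is a join between the loaded documents and a separate table of collaboration metadata, keyed on a shared identifier. The following is a rough sketch of that idea only; the `overlay_metadata` function, the field names, and the sample IDs are assumptions made for illustration and do not represent how any particular ediscovery product works.

```python
def overlay_metadata(documents, metadata_by_id):
    """Merge collaboration metadata onto loaded documents by ID."""
    enriched = []
    for doc in documents:
        # Documents without metadata are kept unchanged.
        extra = metadata_by_id.get(doc["id"], {})
        enriched.append({**doc, **extra})
    return enriched


# Hypothetical loaded documents and a separately collected metadata table.
documents = [
    {"id": "msg-001", "text": "Can legal look at the contract draft?"},
    {"id": "doc-002", "text": "Sprint planning notes"},
]
metadata_by_id = {
    "msg-001": {"channel": "#projectabc", "participants": ["joe", "jane", "counsel"]},
    "doc-002": {"collaborators": ["joe", "jane"]},
}

enriched = overlay_metadata(documents, metadata_by_id)

# Once overlaid, the metadata can drive filters, e.g. a privilege screen.
needs_privilege_check = [
    d["id"]
    for d in enriched
    if "counsel" in d.get("participants", []) + d.get("collaborators", [])
]
print(needs_privilege_check)
```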

4. Working with multiple versions of collaborative docs

Modern collaboration systems also store many versions of documents automatically.

This “auto-save” feature is a blessing for those of us who remember losing critical information because we didn’t save a version of the document that we were working on. However, these versions can be a nightmare to decipher in a review.

Collecting every single version will result in thousands more documents to review and produce, often versions that don’t have any substantive changes. These versions cannot be deduplicated with common hashing techniques, and even grouping the versions together for review will be costly – attorneys will have to spend time determining if each version is relevant or irrelevant.
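
To see why hashing does not help here, consider two versions that differ by a single trailing space: their hashes are completely different even though nothing substantive changed. The short sketch below demonstrates this and, as one possible alternative that this article does not prescribe, scores the two versions with a text-similarity ratio from Python's standard `difflib` module. The sample sentences are made up.

```python
import hashlib
from difflib import SequenceMatcher

# Two hypothetical auto-saved versions; the second adds only a trailing space.
version_1 = "The launch date for Project ABC is March 3."
version_2 = "The launch date for Project ABC is March 3. "

hash_1 = hashlib.sha256(version_1.encode("utf-8")).hexdigest()
hash_2 = hashlib.sha256(version_2.encode("utf-8")).hexdigest()

# Exact-hash deduplication treats these as two unrelated documents.
hashes_match = hash_1 == hash_2

# A similarity ratio close to 1.0 suggests no substantive change.
similarity = SequenceMatcher(None, version_1, version_2).ratio()

print("hashes match:", hashes_match)
print("similarity:", round(similarity, 3))
```

Any threshold for treating two versions as "the same" would be a judgment call for the review team, and a similarity score only helps group versions; it does not decide relevance.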

To mitigate the risk of skyrocketing processing and hosting fees, determine a strategy to collect the documents for the relevant time period.

💡Pro tip: Most collaborative data systems allow you to collect documents as they existed on a specific date (perhaps the date of the complaint, the legal hold or date of collection). Choose wisely and avoid collecting too many versions at once.
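
Picking the version "as it existed on a specific date" amounts to choosing the latest version saved on or before that date. Here is a minimal sketch, assuming a hypothetical version history with a `saved_on` date for each entry; the `version_as_of` function and the dates are illustrative only and are not tied to any system's API.

```python
from datetime import date


def version_as_of(versions, cutoff):
    """Return the latest version saved on or before `cutoff`, or None."""
    eligible = [v for v in versions if v["saved_on"] <= cutoff]
    if not eligible:
        return None
    return max(eligible, key=lambda v: v["saved_on"])


# Hypothetical version history for one document.
versions = [
    {"version": 1, "saved_on": date(2024, 1, 10)},
    {"version": 2, "saved_on": date(2024, 2, 20)},
    {"version": 3, "saved_on": date(2024, 4, 2)},
]

complaint_date = date(2024, 3, 15)
selected = version_as_of(versions, complaint_date)
print(selected["version"])
```

With these sample dates, one version is collected instead of three; version 3 is excluded because it was saved after the cutoff.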

5. Navigating cloud attachments for ediscovery data

One of the best parts of working with modern data collaboration tools is their interconnectivity.

I can create a doc in Google Docs, then add a colleague for comments and they’ll get a Slack message notifying them of the comment. Instead of sending around the same document named v1, v1-kzedits, v2-k2-edits, and so on, I can link the document to a message I send. And while I may edit the document after sending, my colleague will see the most recent changes without having to reload.

Once again, great for productivity, challenging for ediscovery data collection.

Both Google and Microsoft have capabilities that allow you to attach links from cloud storage (like Google Drive and OneDrive). Only recently did they also add the ability to collect those linked documents when the emails are collected.

The same is not true of other data sources, so be aware of what is and isn’t included in a collection. Even if the cloud attachments are collected, they are not true parents/children, and additional care is needed when ingesting, reviewing and producing these files.
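
One way to check what a collection did and did not pick up is to scan message bodies for document links and compare them against the documents actually collected. The sketch below is a simplified illustration: the URL pattern, the `link_cloud_attachments` function, and the message and document IDs are all assumptions, and real links take many more forms than this one pattern covers.

```python
import re

# Assumed link shape for the example; real-world links vary widely.
DOC_LINK = re.compile(r"https://docs\.google\.com/document/d/([A-Za-z0-9_-]+)")


def link_cloud_attachments(messages, collected_doc_ids):
    """Record each message-to-document link and whether the document was collected."""
    links = []
    for message in messages:
        for doc_id in DOC_LINK.findall(message["body"]):
            links.append({
                "message_id": message["id"],
                "doc_id": doc_id,
                "collected": doc_id in collected_doc_ids,
            })
    return links


# Hypothetical messages and the set of document IDs present in the collection.
messages = [
    {"id": "email-1", "body": "Comments welcome: https://docs.google.com/document/d/abc123/edit"},
    {"id": "slack-7", "body": "See https://docs.google.com/document/d/zzz999/edit for the plan"},
]
collected_doc_ids = {"abc123"}

links = link_cloud_attachments(messages, collected_doc_ids)
missing = [link for link in links if not link["collected"]]
print(missing)
```

The result is a list of link relationships rather than a true parent/child family, and the collected document may be a later version than the one the sender linked to, so both points need to be tracked separately during review and production.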

Ediscovery data collection from collaborative data sources with DISCO

Watching how hybrid work and modern applications have changed over the course of the past decade is so cool – and daunting for ediscovery technologists.

That’s why DISCO is designed to make collection easier.

DISCO Hold’s collection capabilities enable legal teams to seamlessly identify, preserve, collect, and promote data from sources like Slack and Box for review, including for litigation, investigations, and compliance. With collection capabilities, DISCO Hold enables you to achieve faster time to evidence, reduce the cost of duplicating redundant data in third-party archives, and reduce the overall cost of litigation and review.

See what we can do for you: Request a demo.

Kristin Zmrhal

Vice President, Strategy

Kristin Zmrhal has spent over twenty years working in the legal technology industry as a consultant, advisor, project/program manager, and technologist. At DISCO, Kristin drives product strategy and innovation, with a specific focus on modernizing and improving the in-house dispute resolution process through technology. Prior to her work at DISCO, Kristin built and led Google’s Ediscovery Project Management & Operations team in Silicon Valley. Before Google, she spent many years as a consultant for several Fortune 500 companies and AMLaw 200 firms.

