
What Did You Ask AI? A Guide to Defensible GenAI Data Preservation

Emerging Data Sources
4 Min Read
By: Rian Kennedy
Posted: February 4, 2026

https://www.csdisco.com/blog/defensible-genai-data-preservation-guide

⚡️ 1-Minute DISCO Download

GenAI data preservation is the legal requirement to save AI prompts, responses, and metadata as discoverable ESI. Because these interactions reveal user intent and work product, they must be treated with the same defensibility as email or Slack.

💬 Key Quote: "Default deletion settings will not outweigh preservation obligations – the pace of adoption is simply too fast to wait."

🌊 Dive Deeper: Check out the section "How major commercial LLMs handle litigation preservation obligations" for a breakdown of how Google Gemini, Microsoft Copilot, ChatGPT, and Claude deal with preservation obligations.

Who isn’t using generative AI (GenAI) in their workplace these days? Tools like Microsoft Copilot, Google Gemini, Anthropic Claude, and OpenAI’s ChatGPT are now core to organizational productivity. But with this broad adoption comes a significant legal reality: GenAI data is discoverable. 

Under the Federal Rules of Civil Procedure, GenAI interactions qualify as electronically stored information (ESI), just like email or documents. These tools weren't built with a "duty to preserve" in mind, but case law is evolving rapidly to ensure they are treated with the same rigor as any other ESI.

💡 Note: Using GenAI tools to accelerate document review (for example, identifying responsiveness or privilege during litigation) is generally protected. The internal mechanics of the AI-assisted review process are typically considered attorney work product, though the results of the review must still be defensible.

This post explores how workplace use of AI creates discoverable content, and how to preserve that content defensibly.

Three key types of discoverable GenAI data

To build a defensible strategy, you must focus on preserving three specific categories of data: prompts, responses, and metadata.

Prompts

These are the inputs or questions users ask, and they're critical because they reveal the user's intent. Think about an engineer using "vibe coding" to generate software or a marketer outlining a campaign — the prompt is the key evidence.

Responses

This is the AI-generated output (text, code, or documents). The challenge for legal teams is figuring out where this data lives once it has been created and scattered across different platforms, especially when the outputs are ephemeral.

Metadata

This is the "behind-the-scenes" data: timestamps, user IDs, session IDs, and audit logs. This information is vital for validating the data and establishing a chain of custody, just as in any other ediscovery effort. In fact, metadata is often the only way to prove which version of an LLM was used, which is critical for reproducibility in litigation.
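To make the three categories concrete, here is a minimal Python sketch of how a single preserved interaction might be represented. It is an illustration only: the `PreservedInteraction` class, its field names, and the sample values are all assumptions, not any platform's actual export schema. The SHA-256 fingerprint is one common way to show later that a record has not changed since collection.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PreservedInteraction:
    """One GenAI exchange plus the metadata needed to authenticate it.

    Hypothetical structure for illustration; real platform exports differ.
    """
    prompt: str
    response: str
    user_id: str
    session_id: str
    timestamp_utc: str   # ISO 8601 string, e.g. "2026-02-04T15:30:00Z"
    model_version: str   # whatever version string the platform reports

    def digest(self) -> str:
        """Return a SHA-256 hex digest over a canonical JSON form of the record."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Sample values are invented for the example.
record = PreservedInteraction(
    prompt="Draft a launch email for the spring campaign.",
    response="Subject: Spring is here...",
    user_id="u-1024",
    session_id="s-77",
    timestamp_utc="2026-02-04T15:30:00Z",
    model_version="example-model-2026-01",
)
fingerprint = record.digest()
```

Recomputing `record.digest()` at review time and comparing it with the stored `fingerprint` gives a simple integrity check; any edit to the prompt, response, or metadata produces a different value.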

How major commercial LLMs handle litigation preservation obligations

Recent court rulings, such as the order in In re: OpenAI requiring the preservation and segregation of output logs, signal that litigation preservation obligations override default deletion settings. But how do the major platforms handle this evolving requirement? Each one is different, and the answer often depends on your licensing tier:

Google Gemini

In Google Gemini, interactions (prompts and responses) are stored in a user's account history and governed through Google Vault. This is generally considered a strong, defensible, in-place preservation solution for enterprise users, as the hold is applied via the familiar Google Workspace ediscovery toolset.

Microsoft Copilot for Microsoft 365

In Microsoft Copilot, user interactions are treated as electronic communication and stored in place in the user’s Exchange mailbox. This means IT and legal can leverage established Microsoft Purview ediscovery workflows. It's crucial to know, though, that these interactions are often stored in hidden mailbox folders, so standard searches may miss them without a specific Purview configuration.

OpenAI’s ChatGPT

For Enterprise customers, OpenAI’s ChatGPT has a Compliance API to help manage retention and exports. However, for lower tiers, manual holds or collect-to-preserve workflows are often needed. Preservation hinges on a legal hold overriding the user's ability to delete their chat history, and courts have stepped in to require the preservation of logs in litigation contexts.

Anthropic's Claude

Enterprise usage of Claude typically allows for configurable retention and export, but it does not integrate into a broader ediscovery suite the way Copilot does with Microsoft Purview or Gemini does with Google Vault. Administrators need to be keenly aware of these policy settings and export mechanisms.

GenAI data preservation: Risks and challenges

Legal teams are still wrestling with significant challenges. The biggest "blind spot" is often the consumer versions of these tools, which lack administrative oversight and have short default retention windows. 

Another massive question mark involves agentic AI: autonomous decision-making bots. As these bots come into wider use, are we creating new "custodians" that need to receive a legal hold notice? The definition of a custodian is shifting in real time.

Also, remember, preservation is more than just "keeping data around." It means freezing deletion policies, exporting data into reviewable formats, and maintaining audit logs that demonstrate defensibility.
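As a rough illustration of the "export and log" half of that definition, the Python sketch below writes collected interactions to a JSON Lines file and appends an audit entry recording who exported what, and when. The function name `export_with_audit`, the file layout, and the audit fields are assumptions made for this example; they do not reflect any vendor's workflow.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def export_with_audit(records, export_path, audit_path, actor):
    """Write records as JSON Lines and append one audit entry describing the export.

    Hypothetical helper for illustration only.
    """
    export_path, audit_path = Path(export_path), Path(audit_path)

    # Serialize to bytes first so the hash matches the file contents exactly.
    data = "".join(json.dumps(r, sort_keys=True) + "\n" for r in records).encode("utf-8")
    export_path.write_bytes(data)

    entry = {
        "action": "export",
        "actor": actor,
        "file": export_path.name,
        "record_count": len(records),
        "sha256": hashlib.sha256(data).hexdigest(),
        "at_utc": datetime.now(timezone.utc).isoformat(),
    }
    # Append-only log: earlier entries are never rewritten.
    with audit_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```

The point of the design is that the audit log is written at the same moment as the export and carries a hash of the exported file, so the two can be matched up later if the collection is challenged.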

📚 Related reading: Legal Hold Best Practices: A Process Guide 

Best practices for GenAI data preservation

Here are the best practices for legal teams to address these gaps:

  1. Inventory your tools: Map out every GenAI tool in use across the organization, sanctioned or not. Understand where the interactions are stored and where potential gaps may reside.
  2. Update hold notices: Update your litigation hold policies and playbooks to explicitly reference GenAI prompts, outputs, and metadata.
  3. Collaborate with IT: Where enterprise controls exist, use them. This means coordinating with IT to ensure auto-deletion is disabled for relevant users and that enterprise-level audit logging is active.
  4. Train employees: Policy without education won't cut it. Employees need training on when AI use creates a business record, how to handle sensitive matters, and when consumer tools are inappropriate.
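For teams that want to track the inventory step in something more structured than a spreadsheet, the first and third practices could be sketched in Python as follows. Everything here is hypothetical: the `GenAITool` fields, the gap labels, and the two sample entries are placeholders, not statements about any real product's capabilities.

```python
from dataclasses import dataclass


@dataclass
class GenAITool:
    """One row in a GenAI tool inventory (illustrative fields only)."""
    name: str
    sanctioned: bool
    storage_location: str        # where interactions are stored; "" if unknown
    supports_legal_hold: bool
    auto_delete_enabled: bool
    audit_logging_enabled: bool


def preservation_gaps(tool: GenAITool) -> list[str]:
    """List the preservation gaps a legal team would want to follow up on."""
    gaps = []
    if not tool.sanctioned:
        gaps.append("not sanctioned by IT")
    if not tool.storage_location:
        gaps.append("storage location unknown")
    if not tool.supports_legal_hold:
        gaps.append("no native legal hold")
    if tool.auto_delete_enabled:
        gaps.append("auto-deletion still enabled")
    if not tool.audit_logging_enabled:
        gaps.append("audit logging off")
    return gaps


# Placeholder entries; replace with the tools actually found in your organization.
inventory = [
    GenAITool("Enterprise assistant (hypothetical)", True, "tenant mailbox", True, False, True),
    GenAITool("Personal chatbot account (hypothetical)", False, "", False, True, False),
]
report = {tool.name: preservation_gaps(tool) for tool in inventory}
```

A tool with an empty gap list is ready for a hold; anything else gives legal and IT a concrete list of items to resolve together.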

This is exactly where modern legal hold technology becomes essential, because it can automate parts of these processes. Once the notice templates are updated, a system like DISCO Hold is critical for maintaining a clear audit trail showing that custodians were notified of the litigation hold and acknowledged its requirements.

Did you know? DISCO Hold serves as the system of record, complementing native controls by bringing together preserved interactions and defensible documentation in one place. 

GenAI data preservation: The proactive path forward

Preserving GenAI prompts, responses, and metadata is no longer a theoretical exercise. Emerging case law makes it abundantly clear that default deletion settings will not outweigh preservation obligations – the pace of adoption is simply too fast to wait for perfect guidance. 

The path forward is proactive. Legal and IT teams must update policies, understand native platform capabilities, and implement workflows that account for how GenAI is actually used in their organization. Tools like DISCO Hold play a critical role in bringing order and defensibility to that effort, allowing organizations to embrace the benefits of AI while taking preservation seriously—before the courts force their hand.

Ready to start collecting GenAI data defensibly? Talk to our team about DISCO Hold and our collections capabilities.

Rian Kennedy
Director, Legal Hold Sales

Rian Kennedy is Director of Legal Hold Sales at DISCO. He has been an industry veteran in the legal and information governance space for close to 20 years, advising his clients in selecting the right technologies to streamline legal and business workflows, minimize risk, and ensure client outcomes are met.
