Mastering Social Media Data for Ediscovery

Emerging Data Sources
4 Min Read
By: Julio Ruelas
Posted: August 30, 2024

https://www.csdisco.com/blog/mastering-social-media-data-for-ediscovery

Over 5 billion people participate in social media globally – and new platforms crop up every year. The scale and speed of this digital transformation greatly increase the volume of potentially relevant data for a case team to investigate when conducting ediscovery.

However, the right skills and tech stack can help make handling new data types a breeze, speeding up your time to important information and evidence. 

As we continue to conduct business in this increasingly digital world, follow these steps to master social media data for ediscovery. 

Evidence goes viral

The vast amount of social media data falls squarely within the scope of electronically stored information (ESI) that is potentially discoverable under the Federal Rules of Civil Procedure.

The official Advisory Committee notes accompanying the amended 37(e) in the federal rules even go so far as to call out this data source explicitly: “It is important that counsel become familiar with their clients’ information systems and digital data — including social media — to address these issues.” 

Still, there is confusion as to which aspects of social media data are discoverable, and what the most defensible process is for each platform. 

Below, we’ll cover the five largest platforms: Facebook, X (formerly Twitter), YouTube, TikTok, and LinkedIn.

Facebook

Founded in 2004, Facebook became the largest social network in the world in 2021 with nearly 3 billion users, half of whom were logging on daily. Its largest audience is users aged 25-34. Facebook’s parent company, Meta, also owns Instagram, Threads, and WhatsApp. 

What it is: A social network that allows users to post updates, photographs, videos, classified ads, and other content. Users can “follow” one another, and interact with one another in public or private comments or messages and “likes.” Facebook also allows livestreams, and permits the monetization of various content – such as videos, ads, and Facebook Marketplace commerce.

How many users: Facebook has over 2 billion daily active users worldwide.

Challenges: Authentication of data, collection of non-public data from Facebook, continually evolving laws regarding use of Facebook data across the globe

Facebook in court 

Facebook and its parent company, Meta Platforms, have been involved in a wide range of legal cases involving privacy, antitrust, content moderation, child endangerment, and the use of images without permission.

Aside from Facebook’s (and Meta’s) own assorted legal embroilments, content posted on Facebook has, on many occasions, been used as evidence in court. Facebook posts and messages have been used in court to show evidence of bullying and stalking, unfit parenting, hidden income in divorce proceedings, and wrongful termination, among others. 

However, not all Facebook-sourced content is admissible, and a number of challenges may have to be cleared, such as authentication of the content, or allegations of hearsay.  

Considerations and best practices: Facebook

Be aware of different record types

Several types of records from Facebook can be used as evidence in court, depending on the case and how the evidence is collected and presented. These may include:

  • Public posts. These can include status updates, photos, videos (including livestreams), audio recordings, or comments (one’s own, or on other individuals’ content), which can be used to indicate the poster’s sentiments or intentions, actions, or even associations – such as membership in a specific club or group. If collected via screenshot, such evidence would need to be authenticated.
  • Private messages. These may be direct messages or chat logs, and generally require a subpoena or warrant to be accessed unless willingly provided by a participant in the conversation.  
  • Account information. This may include friend lists (useful to establish connections between individuals of interest), Facebook group memberships, and event attendance (or intent to attend – for example, accepting an invitation).
  • Metadata. Metadata may include time stamps on public or private posts, location data, advertising history, and even IP addresses. A court order is generally required for Facebook to disclose such information. 

It has been widely discussed that Facebook habitually collects vast volumes and types of data from its users, and may be compelled to share that data with government or law-enforcement agencies.

If you expect that Facebook-generated content will become relevant to your matters, it is important to work with partners who understand the details of collecting, parsing, and searching such data.

X (formerly Twitter)

Over the last several years, X has regained prominence as a major avenue of social, political, and interpersonal discussion. 

Perhaps more than any other social media application, the potential for legally actionable content is only growing on this platform. Other microblogging platforms have sprouted in recent years, including Threads (an offshoot of Instagram), Bluesky, and politically influenced alternatives such as Truth Social.  

What it is: Social “microblogging” platform (as of the time of this writing in June 2024, limited to messages of 280 characters) 

How many users: 

  • X (Twitter) has 368 million daily active users
  • By comparison, Threads (Instagram) has 130 million monthly active users

Challenges: Short lifecycle, specific requests required

X (Twitter) in court 

X has a lengthy history in court, from stalking cases to libel and slander cases to allegations of inciting the London riots in 2011.

In one high-profile case related to Wikileaks and the 2016 presidential election, X (at the time referred to as Twitter) sought to quash a Rule 45 subpoena based upon First Amendment rights to anonymous speech. In this case, the court ruled against X because of the narrowness of the request – which excluded personal communication and demonstrated material relevance of the user’s identity – and the fact that only X itself could directly provide the information.

More recently, X lost an appeal of a ruling that allowed special counsel Jack Smith to access records from former President Donald Trump's X account as part of his federal election interference probe – and the Supreme Court heard a different case wherein X was found non-liable for content posted by a terrorist organization.

Considerations and best practices: X 

Be aware of private info that requires a subpoena or court order 

While some material is publicly available, much will require either the cooperation of the account holder or, more challengingly, X itself.

Information not readily accessible to the public includes the following: 

  • Password 
  • Email address 
  • Phone number or address book (which helps X suggest users you know) 
  • Location information (where you’re posting from) 
  • System log data (mobile carrier, device and application IDs, IP address, browser, the referring domain, pages visited, and search terms) 
  • Specific posts (formerly “tweets”) set as private, direct messages and deleted posts

Per the company’s FAQ on legal requests:

“Obtaining non-public information, such as an email address used to sign up for an account or IP login information, requires a valid legal process like a subpoena, court order, or other local legal process, depending on the country that issues the request.

Requests for the contents of communications (e.g., posts, Direct Messages, media) require a valid search warrant or equivalent to be properly served on the correct X corporate entity. Law enforcement or government agents must demonstrate a higher burden of proof before a judge will authorize such a request.

For additional information on the types of legal process required to obtain specific types of account information please see the “Types of Legal Process” section in our transparency report and X’s Guidelines for Law Enforcement.” 

Act swiftly 

The most recent 3,200 posts are visible in a timeline, and X’s advanced search function can drill down even deeper based on timing, user, and subject matter. 

If a user deletes an incriminating post, the window of time to recover it is only 30 days.
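
To make the timing pressure concrete, here is a minimal sketch of the date arithmetic, assuming the 30-day window described above. The function names and the example dates are hypothetical, and the actual retention period should always be confirmed with the platform.

```python
from datetime import date, timedelta

# Assumption: the 30-day figure is taken from the text above; confirm the
# real retention period with the platform before relying on it.
RECOVERY_WINDOW_DAYS = 30


def recovery_deadline(deleted_on: date, window_days: int = RECOVERY_WINDOW_DAYS) -> date:
    """Return the last date on which a deleted post may still be recoverable."""
    return deleted_on + timedelta(days=window_days)


def days_remaining(deleted_on: date, today: date) -> int:
    """Return the days left in the window, or zero once it has closed."""
    return max((recovery_deadline(deleted_on) - today).days, 0)


# Hypothetical example: a post deleted on June 1, 2024, reviewed on June 20, 2024.
deleted = date(2024, 6, 1)
print(recovery_deadline(deleted))                 # 2024-07-01
print(days_remaining(deleted, date(2024, 6, 20)))  # 11
```

A case team could run a check like this at intake to decide how urgently a preservation request needs to go out.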

Be specific 

Requests for data from X must be sufficiently narrow and specific for the social media behemoth to comply. If either of these parameters is not met, X is not afraid to fight back. 

In general, the best practice with regard to X-related requests should be to ensure your request is limited to material that is clearly relevant to the case, time-bound, and not readily accessible from any other data source.

It is also important to include the following data points in any request: 

  • Username 
  • URL of the X profile 
  • Date range(s) of the requested information 
  • Details about the specific information being requested and relevance to the case 
  • Valid email address for X to acknowledge receipt of the legal request
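
The checklist above can be sketched as a simple completeness check run before a request is sent. This is only an illustration: the `DataRequest` class, its field names, and the validation rules are assumptions made for this example, not a format that X prescribes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataRequest:
    """Hypothetical container for the data points listed above."""
    username: str
    profile_url: str
    start_date: date
    end_date: date
    details_and_relevance: str
    contact_email: str


def missing_items(req: DataRequest) -> list[str]:
    """Return the names of items that are absent or obviously malformed."""
    problems = []
    if not req.username.strip():
        problems.append("username")
    if not req.profile_url.startswith("https://"):
        problems.append("profile URL")
    if req.start_date > req.end_date:
        problems.append("date range")
    if not req.details_and_relevance.strip():
        problems.append("details and relevance")
    if "@" not in req.contact_email:
        problems.append("contact email")
    return problems


# Hypothetical request with a reversed date range and no contact email.
draft = DataRequest(
    username="example_user",
    profile_url="https://x.com/example_user",
    start_date=date(2024, 3, 1),
    end_date=date(2024, 1, 1),
    details_and_relevance="Posts referencing the disputed contract",
    contact_email="",
)
print(missing_items(draft))  # ['date range', 'contact email']
```

The checks are deliberately shallow. They catch omissions, not overbreadth, so counsel still needs to confirm that the request is narrow and relevant.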

YouTube

Every minute, over 500 hours of user- and enterprise-generated content is uploaded to YouTube, which generates over $31 billion in ad revenue a year. 

YouTube videos and real-time livestreams span everything from the mundane to the catastrophic, and the site’s traffic skyrocketed during the COVID-19 pandemic. 

As with other social media tools, there are numerous similar platforms, such as Vimeo, Vevo, and DTube.

What it is: Video hosting platform 

How many users: YouTube has 2.7 billion monthly active users. 

(Fun fact: YouTube is the second most visited website in the world, behind parent company Google.) 

Challenges: Obtaining metadata, establishing authenticity

Unlike other streaming platforms, YouTube relies on users to create content, and only searches for policy violations after the fact.

This loose approach to regulation of content has certainly exposed the organization to criticism and misuse of the platform. It also means that people by the millions are uploading potentially relevant video content every day.

Previously existing barriers to including video evidence in a matter (namely, cost and the complexity of managing video data in review) have been greatly reduced, especially with the addition of user-friendly features such as auto-transcription and searchable time-syncing. 

This wealth of potentially relevant data is increasingly prominent as a result. 

YouTube in court

YouTube has faced myriad critiques ranging from copyright infringement and peddling conspiracy theories to darker things like violence and sexual exploitation of adults and minors.

Legions of content moderators are bombarded by questionable material every day and strive to pull down violators. Some former moderators have sued for emotional distress.

Considerations and best practices: YouTube 

It is critically important to include user-generated content on platforms like YouTube, Vimeo, and others in your ESI scoping, but, as with any complex media, certain key steps are necessary.

Whether you are looking to use a video as character evidence or as direct evidence of an alleged event, the content must meet the threshold of admissibility for relevance and authenticity. 

But even when the video has been admitted as evidence, there are some additional factors to consider in your digital evidence analysis: 

Take care to preserve metadata

Even if the video is still actively being hosted on the video sharing platform, use appropriate forensic collection technology to ensure that all relevant account metadata is preserved along with the video itself. 

From an authentication standpoint, information like date and time of upload, account information, and even IP address may be germane to a case. 
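
One common way to show later that a collected file has not changed is to record a cryptographic hash alongside its metadata at collection time. The sketch below illustrates that idea only. It is not a substitute for forensic collection software, and the function names and metadata field names are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def preservation_record(video_bytes: bytes, metadata: dict) -> dict:
    """Bundle a SHA-256 digest of the video with the metadata collected with it."""
    return {
        "sha256": hashlib.sha256(video_bytes).hexdigest(),
        "collected_at_utc": datetime.now(timezone.utc).isoformat(),
        "metadata": metadata,
    }


def is_unaltered(video_bytes: bytes, record: dict) -> bool:
    """Check that the bytes still match the digest stored at collection time."""
    return hashlib.sha256(video_bytes).hexdigest() == record["sha256"]


# Hypothetical example; in practice the bytes would be read from the collected file.
collected = b"example video bytes"
record = preservation_record(
    collected,
    {
        "upload_time": "2024-05-14T18:02:00Z",  # illustrative values only
        "account": "example_channel",
        "ip_address": "203.0.113.7",
    },
)
print(is_unaltered(collected, record))            # True
print(is_unaltered(collected + b"edit", record))  # False
```

Any change to the file, even a single byte, produces a different digest, which is why hashes are a common companion to chain-of-custody documentation.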

Act swiftly

As with X, time is of the essence with YouTube requests. In the event the video in question was recently deleted, your counsel may be able to request a copy from YouTube directly, but these deleted files are unrecoverable after a period of a few weeks. 

Leverage AI for review

Historically, reviewing video evidence was time- and cost-prohibitive, because of the high cost of converting the media to a reviewable format, and the amount of billable time it would take to review tens, or even thousands, of hours of video.

Luckily, with today’s AI-powered tools like DISCO, audio and video content is transcribed and converted into a format that can be searched, categorized, and analyzed for words and phrases.

AI can make connections across thousands of hours of video that would have previously been impossible. 

Note: Transcriptions are only as good as the audio of the video. It’s still up to lawyers to validate that they’ve reviewed the relevant content. 
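
To illustrate what searchable, time-synced transcripts make possible, here is a generic sketch of a keyword search over transcript segments. It assumes the transcript is already available as timestamped segments. The `Segment` class and `find_phrase` function are hypothetical and do not describe how DISCO or any other product works internally.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript segment: where it starts in the video and what was said."""
    start_seconds: float
    text: str


def find_phrase(segments: list[Segment], phrase: str) -> list[tuple[str, str]]:
    """Return (MM:SS timestamp, text) pairs for segments containing the phrase."""
    needle = phrase.lower()
    hits = []
    for seg in segments:
        if needle in seg.text.lower():
            minutes, seconds = divmod(int(seg.start_seconds), 60)
            hits.append((f"{minutes:02d}:{seconds:02d}", seg.text))
    return hits


# Hypothetical transcript of a short video.
transcript = [
    Segment(0.0, "Welcome back to the channel"),
    Segment(75.5, "We signed the contract on Friday"),
    Segment(130.0, "The CONTRACT was never disclosed"),
]
for timestamp, text in find_phrase(transcript, "contract"):
    print(timestamp, text)
# 01:15 We signed the contract on Friday
# 02:10 The CONTRACT was never disclosed
```

Because each hit carries a timestamp, a reviewer can jump straight to the relevant moment in the video and confirm what was actually said.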

Enlist digital forensics experts to identify deepfakes

Deepfakes are AI-generated content portraying real people doing and saying things that did not actually take place. The best deepfakes can be nearly indistinguishable from authentic video.

Thankfully, digital forensic experts can identify certain telltale signs that a video has been tampered with, including:

  • Lens distortion 
  • Color filter array (CFA) artifacts 
  • Noise level and pattern anomalies 
  • Compression artifacts 
  • Editing artifacts

Although deepfakes are a relatively recent phenomenon, we can likely expect to see more statutes and case law emerge in the near future. 

The rapid evolution of AI technologies creates daunting challenges for the authentication and use of evidence in court. Already, a suit has been filed over AI use of a deceased performer’s voice, and a number of bills have been proposed to prevent malicious or inappropriate use of deepfakes. Care should be taken to authenticate any key photo or video evidence, lest it turn out to be the product of an AI tool like DALL-E or Sora.

TikTok

TikTok arrived in the U.S. in 2017 and has gained immense popularity in recent years. Its users are largely under 35, and videos range from dance challenges to insider tips on home inspections.

What it is: A social media platform that allows users to create, edit, share, and discover short videos. The rise of TikTok has influenced a number of other platforms, including those which predate TikTok, to adopt similar functionality, such as Instagram Reels and YouTube Shorts. TikTok – like Instagram and Facebook – also allows “live” feeds.

How many users: 

  • TikTok has over 1.5 billion active users worldwide
  • By comparison, Instagram (short videos are called Reels) has 500 million daily active users 

Challenges: Legal woes, the complexity of extracting all metadata, a rapidly evolving feature set, holistic viewing within platform

TikTok in court 

Note: At the time of this writing, the future of TikTok’s availability in the United States is uncertain due to ongoing legal disputes.

To date, most litigation and scrutiny around TikTok has been about the platform itself (the app has been banned from government devices, and TikTok’s CEO was required to testify before Congress). 

However, user-generated content will almost certainly start to show up in court. Much like YouTube, TikTok videos may be used as evidence of individuals’ actions and whereabouts. 

Additionally, TikTok is increasingly used to share and spread information from all around the globe. It has a popular TikTok “Live” function, wherein creators can live-stream a video feed, and receive virtual “gifts” that can be converted to real-world currency. And the economic complications don’t stop there. TikTok is generating billions of dollars in advertising revenue, popular users frequently post sponsored content, and nearly five million American businesses have a presence on the app, including some that make use of the TikTok Shop feature.

For legal practitioners, causes of action involving TikTok could range from social media marketing liability to large-scale copyright infringement concerns. As users and corporate entities alike monetize the platform, concerns about individual likeness, song sampling, and unfair advertising practices could all birth regulatory scrutiny or large-scale litigation – especially considering the memetic nature of TikTok content, which rewards copycat behavior, and often utilizes shared filters, “trends,” and sounds (many of which come from other users’ content, or from copyrighted sources such as music or films).

In this constantly evolving landscape, it is likely that an image, sound, or concept may be repurposed for advertising without the consent of the user who made it – and just as likely that a plaintiff in a class action claiming grievous injury will post a TikTok video dancing their heart out.

Considerations and best practices: TikTok 

TikTok, like many next-gen social media platforms and communication applications, is nothing like a traditional “document.” 

The public data collected from TikTok posts may include: 

  • The video or photograph(s) posted
  • Description text
  • Closed caption text (whether auto-generated or manually created)
  • Dynamic user interaction data, such as likes and comments
  • Filters and sounds used to create the post
  • Metadata about the user and the posting 

In addition, TikTok stores numerous data points that are not publicly visible. These can include: 

  • Profile and post views
  • Account and viewer activity analytics
  • Direct messages (DMs)
  • Saved live-streams (with accompanying data, such as chat logs and recorded “gifts” sent to the broadcaster)
  • Creator’s revenue from TikTok

Collecting from a platform such as TikTok can involve a number of legal considerations, such as the location of any implicated users, appropriate retrieval of the post data, and establishment of chain of custody. 

If you anticipate that a matter will involve the collection and review of TikTok data, it is crucial to engage with a partner who can not only provide a platform that will facilitate intelligible review of the relevant content, but who also has experience dealing with similar situations. 

LinkedIn

Launched in 2003, LinkedIn emerged as a platform for professional networking, allowing users to build online resumes and connect with potential employers and colleagues. 

What it is: Professional networking platform

How many users: 137 million active daily users 

Challenges: Obtaining metadata, disparate data types

LinkedIn was designed to function similarly to a digital resume, allowing users to showcase their work experience, skills, and accomplishments; however, it soon transcended its original purpose, and has begun serving as a space for industry news, thought leadership, and even virtual socializing. Many describe it as the “Facebook of business,” signifying its wide global reach and cultural significance. 

LinkedIn currently has over 900 million members in more than 200 countries and regions worldwide. The United States has the most members, with over 199 million, followed by India with 101 million.

LinkedIn in court 

Although LinkedIn has not faced as many legal actions as Facebook or other social media entities, it has been involved in court cases regarding data scraping, where the legal boundaries of accessing public profile information are debated. 

Content posted on LinkedIn – publicly, privately, or anywhere in between – can potentially be used as evidence in court, just like any other social media content. For instance, LinkedIn posts have been used to establish witness credibility (or lack thereof) and as evidence in employment disputes.

Considerations and best practices: LinkedIn

Much like the other social networking platforms, LinkedIn hosts a number of data categories, including public content (posts, articles, comments, reactions), private content (chats, messages), account information, and metadata pertaining to all other types of content. 

Although for the most part LinkedIn activity is more likely to fall on the professional, public-facing side, the business-related nature of the platform means that LinkedIn content could easily prove to be valuable, or even dispositive, in a legal action. 

LinkedIn presents itself as a tool to grow one’s business, and more than 58 million companies use LinkedIn to recruit or advertise. It is certainly feasible that, in the event any of these companies were accused of malfeasance, LinkedIn content could be brought in as evidence. 

Thus far, LinkedIn content has been used in court cases far less than evidence collected from other platforms; however, it is always worth considering this possibility, and working with a partner who has experience dealing with similar scenarios.

Overview: A Strategic Approach to Social Media

Understanding how your clients use these various platforms will enable you to construct a plan to manage the growing data volumes. 

Where do you look for social media data? 

The key to determining which technology to investigate is to understand how relevant custodians are communicating, and on which platforms.

Understanding the nature of a case, plus if and how custodians are leveraging social media, helps determine the priority of discovery. Many businesses also use these platforms for their communications, so this doesn’t apply only to individual employees.

What do you look for? 

Each social media data source contains a potentially voluminous amount of disparate data dating back to the inception of a user’s account. It is important to understand what your technology partner will include in their capture of such data and what metadata will or will not be included.

User-generated social media ESI may include: 

  • Engagement data (for example, a user’s posts, likes, and comments) 
  • Direct messages 
  • Chat logs 
  • Friends or connections 
  • Profile 
  • Log-on and posting times 
  • Location data from photographs
  • Some deleted materials 

System-generated social media ESI may include: 

  • Proprietary unique identifier 
  • Item type 
  • Parent item/thread 
  • Recipients 
  • Author/poster 
  • Linked media 
  • IP addresses
  • Location data from IoB and IoT devices
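
To show how system-generated fields such as the unique identifier, item type, and parent item can be put to work, here is a minimal sketch that groups collected items into threads. The `SocialMediaItem` class and its field names are illustrative assumptions for this example, not any platform’s or vendor’s actual export schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SocialMediaItem:
    """Hypothetical normalized record combining the field types listed above."""
    item_id: str                      # proprietary unique identifier
    item_type: str                    # e.g. "post", "comment", "direct_message"
    author: str
    parent_id: Optional[str] = None   # parent item/thread, if any
    recipients: list[str] = field(default_factory=list)
    text: str = ""                    # user-generated content


def thread_root(item_id: str, by_id: dict[str, SocialMediaItem]) -> str:
    """Follow parent links until reaching an item with no known parent."""
    current = by_id[item_id]
    seen = {current.item_id}
    while current.parent_id is not None and current.parent_id in by_id:
        if current.parent_id in seen:  # guard against a malformed cycle
            break
        current = by_id[current.parent_id]
        seen.add(current.item_id)
    return current.item_id


def group_by_thread(items: list[SocialMediaItem]) -> dict[str, list[str]]:
    """Map each root item ID to the IDs of every item in its thread."""
    by_id = {item.item_id: item for item in items}
    threads: dict[str, list[str]] = {}
    for item in items:
        threads.setdefault(thread_root(item.item_id, by_id), []).append(item.item_id)
    return threads


# Hypothetical collection: a post, two nested comments, and an unrelated message.
items = [
    SocialMediaItem("p1", "post", "alice", text="Big announcement today"),
    SocialMediaItem("c1", "comment", "bob", parent_id="p1", text="Congrats!"),
    SocialMediaItem("c2", "comment", "alice", parent_id="c1", text="Thanks"),
    SocialMediaItem("d1", "direct_message", "bob", recipients=["carol"], text="Call me"),
]
print(group_by_thread(items))  # {'p1': ['p1', 'c1', 'c2'], 'd1': ['d1']}
```

Without the parent and identifier metadata, the comments above would be isolated fragments. With it, reviewers can read each conversation in context.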

How do you collect social media ESI? 

While it may be enticing to print out a screen capture of a public social media site, or even have the account owner press the “download your data” button offered by several platforms, it is important to remember that this will not necessarily include all the useful or relevant data, and may be limited to only public posts. 

Additionally, some social platforms limit what you are able to export based on the type of account a user maintains. Working with a forensic collection technology that specializes in social media collection will ensure you are able to gain access to the full scope of potentially relevant information. It is also important to work with a technology that can render social media data into an easily reviewable format.

When collecting social media data, keep in mind that the format of the collection is paramount to ease of review, and not all collections are created equal. Some third-party collections may not do a good job of presenting the information in an easily digestible and easily producible manner.

Good and consistent collection of data = easy review and production.

When can you use social media ESI? 

While there is ample precedent and case law to support the inclusion of social media ESI (even data that is private) in a discovery request, the requesting party still has an obligation to meet the requirements of FRCP 26(b) and demonstrate relevance to the case. 

And the bar for relevance under Federal Rule of Evidence 401 is far from high. Evidence is relevant if “it has any tendency to make a fact more or less probable than it would be without the evidence” and “the fact is of consequence in determining the action.” To ensure that this does not become a fishing expedition, the court will often limit the subject matter and duration of admissible ESI.

An additional area of concern with social ESI in particular is authentication – meaning, the account and material posted to it were actually generated by the custodian or named account owner. As with relevance, the bar is not terribly high. Federal Rule of Evidence 901 states that to establish authenticity, “the proponent must produce evidence sufficient to support a finding that the item is what the proponent claims it is,” and this can be done by presenting the “distinctive characteristics” of an account under Rule 901(b)(4). These characteristics may include account name, photos of the account owner, nicknames, IP address, specific topics, or slang. Keep in mind: this type of information is only available if the data was properly collected.

Ediscovery Expanded: Mastering Complex Data from Slack to Signal and Beyond

Now that you’ve mastered social media data, uncover the considerations and best practices for handling other complex data types in ediscovery, including:

  • Mobile data
  • Ephemeral messaging
  • Internet of things devices
  • Virtual conferencing data

Download the complete guide here.

And, if you’re ready to collect from collaborative data sources with DISCO, request a demo to see what we can do for you.
