
mBERT: How Multilingual Predictive Tagging Powers Cross-Border Litigation

Product Spotlight
15 Min Read
By: Matthew Prendergast
Posted: August 22, 2024

https://www.csdisco.com/blog/mbert-multilingual-predictive-tagging

This article explains how combining multilingual AI-predictive tagging with AI-generated summaries lets you organize, prioritize, and identify critical evidence that might otherwise remain lost behind language barriers in multilingual, cross-border litigation.

The challenges inherent in multilingual, cross-border litigation

Cross-border litigation is on the rise. This comes as no surprise, considering the world is more interconnected now than it has ever been. Unfortunately, this globalization brings with it a host of challenges. Most notably, as you have likely experienced if you are reading this article, language barriers can slow ediscovery to an agonizing crawl.

Legal professionals have worked around these language obstacles for years, but it’s likely your tried-and-true methods simply are not efficient or cost-effective enough to support the rise in the number and magnitude of cross-border legal disputes. 

This means it’s time to implement smarter processes powered by smarter technology. 

What’s not working: Hiring for language skills and mass translation

Typical approaches to language barriers in cross-border litigation center around people and project management. 

To date, the easiest solution to address this issue has been to employ lawyers with the relevant language skills – or to translate all the documents in a database before review. 

These solutions come with their own problems.

First, hiring for language skills puts the onus on law firms to prioritize multilingual ability over relevant legal experience – especially when you hit a crunch and need access to those language skills ASAP. 

Second, the pool of lawyers with the necessary language skills and legal expertise required for these cases is small. And the matters can be large.

For complex matters in particular, expecting a single person, or even a small team, to review hundreds of gigabytes of multilingual data can be unrealistic, both as a matter of efficiency and scale.

Planning to simply translate all documents in a database ahead of review creates similar pitfalls. Asking lawyers to expend valuable time during ediscovery translating documents is a tall order, especially when it is uncertain whether all, or even some, of these documents will be relevant.

Why is generative AI important for cross-border disputes?

The advent of artificial intelligence, especially generative AI, provides attorneys a powerful toolset for effectively adapting to the growing globalization in the legal world. 

For example, DISCO’s generative AI platform Cecilia is able to generate quick and succinct summaries of documents, including those in foreign languages. 

But how can you identify relevant docs in foreign languages – rapidly and accurately – in the first place?

The behind-the-scenes hero: multilingual predictive tagging

Multilingual predictive tagging drastically simplifies the searching, sorting, and management of your data, ensuring no time is lost hunting for evidence – and that your team doesn’t miss important documents. 

DISCO Ediscovery employs an AI model that automatically assigns tag predictions across a database. This allows lawyers and review teams to efficiently sort and search through all their foreign language data, and to develop cogent case strategies by using AI-generated summaries for faster time-to-evidence when building their cases.

The benefits of multilingual tagging

DISCO’s multilingual AI-predictive tagging platform (mBERT) is a large language model trained on data sources from a hundred different languages. 

As your case team conducts its initial reviews of data sets, mBERT learns the patterns behind your tagging applied to English-only sources and applies those learnings to foreign language documents. In short, as you teach DISCO what English-language content you care about, it will tag and return to you the critical evidence that otherwise might have been lost behind a language barrier.
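The idea of learning from English tagging and applying it to other languages can be illustrated with a deliberately tiny sketch. This is not DISCO's implementation and does not use mBERT. The hand-written SHARED_CONCEPTS lexicon is a hypothetical stand-in for a multilingual encoder that maps words from different languages into one shared representation, and the example documents, tags, and function names are all invented for illustration.

```python
from collections import Counter
import math

# Hypothetical toy lexicon: a stand-in for a multilingual encoder that maps
# words from different languages into one shared representation.
SHARED_CONCEPTS = {
    # English
    "deforestation": "FOREST_LOSS", "logging": "FOREST_LOSS",
    "permit": "PERMIT", "license": "PERMIT",
    "lunch": "SOCIAL", "birthday": "SOCIAL",
    # Portuguese
    "desmatamento": "FOREST_LOSS", "licença": "PERMIT",
    "almoço": "SOCIAL", "aniversário": "SOCIAL",
}

def encode(text):
    """Map a document to a bag of language-independent concepts."""
    words = (w.strip(".,;:!?") for w in text.lower().split())
    return Counter(SHARED_CONCEPTS[w] for w in words if w in SHARED_CONCEPTS)

def train(tagged_docs):
    """Learn one weight per concept from reviewer tags (smoothed log-odds)."""
    pos, neg = Counter(), Counter()
    for text, is_relevant in tagged_docs:
        (pos if is_relevant else neg).update(encode(text))
    concepts = set(pos) | set(neg)
    return {c: math.log((pos[c] + 1) / (neg[c] + 1)) for c in concepts}

def predict(weights, text):
    """Return a relevance score between 0 and 1 for a document."""
    z = sum(weights.get(c, 0.0) * n for c, n in encode(text).items())
    return 1 / (1 + math.exp(-z))

# Reviewers tag English documents only (invented examples).
english_tags = [
    ("The logging permit covers the northern parcel", True),
    ("Deforestation report attached for the license renewal", True),
    ("Team lunch on Friday for the birthday", False),
]
weights = train(english_tags)

# The same weights score Portuguese documents that no one has tagged.
pt_relevant = "Relatório de desmatamento e licença ambiental"
pt_social = "Almoço de aniversário na sexta"
print(round(predict(weights, pt_relevant), 2))  # scores high
print(round(predict(weights, pt_social), 2))    # scores low
```

The classifier never sees a Portuguese tag. It transfers only because both languages are encoded into the same concept space, which is the role a multilingual model plays at far larger scale.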

Multilingual tagging speeds up complex litigation

The ability to classify and sort millions of documents – in a fraction of the time it ordinarily takes for manual processing – allows teams to find and drill down into the most critical documents for their case.  

Note: Even with the advent and increasing sophistication of AI processing, legal teams must always control the review process. Live by the motto, “Trust, but verify.” The AI should support the human attorney rather than act as the decision-maker.

Therefore, the system doesn’t tell human legal professionals how to run a review or what tagging to employ. Instead, mBERT stays in the background like a legal assistant, learning how to predict the lawyer’s tagging behavior. 

Multilingual tagging balances ediscovery

Not only will DISCO AI identify the most relevant documents regardless of language, but it will also serve them up front and center using AI-prioritized review.

For example: Let’s say you’re representing a company trying to expand its production in Brazil. But there are questions about adherence to the country’s strict deforestation and land use regulations. 

DISCO’s AI-prioritized review, in tandem with its multilingual predictive tagging model, ensures that each new batch of remaining unreviewed data includes the English and Portuguese documents with the highest relevancy and importance scores for a specific issue.

By consistently providing your team with the most relevant documents first, the time sink for translating hundreds of potentially unimportant documents disappears. And with it, you reduce review time and review cost, and increase your value in the eyes of clients. 
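The batch-building step in the Brazil example can be sketched in a few lines. This is a minimal illustration under stated assumptions, not DISCO's algorithm: the Doc class, the score field, the document IDs, and next_batch are all hypothetical names, and the scores are made-up values standing in for model predictions.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    language: str
    score: float           # hypothetical predicted relevance, 0.0 to 1.0
    reviewed: bool = False

def next_batch(docs, batch_size):
    """Return the highest-scoring unreviewed documents, regardless of language."""
    unreviewed = [d for d in docs if not d.reviewed]
    unreviewed.sort(key=lambda d: d.score, reverse=True)
    return unreviewed[:batch_size]

# Invented corpus mixing English and Portuguese documents.
corpus = [
    Doc("EN-001", "en", 0.91),
    Doc("PT-014", "pt", 0.97),
    Doc("EN-002", "en", 0.12),
    Doc("PT-015", "pt", 0.88, reviewed=True),  # already reviewed, so skipped
    Doc("PT-016", "pt", 0.64),
]

batch = next_batch(corpus, batch_size=3)
batch_ids = [d.doc_id for d in batch]
print(batch_ids)  # ['PT-014', 'EN-001', 'PT-016']
```

Language plays no part in the ordering, so a high-scoring Portuguese document lands ahead of a lower-scoring English one. In a real review the scores would presumably be refreshed as reviewers tag each batch, which this sketch leaves out.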

The benefits of foreign language generative AI document summaries

To further streamline your cross-border litigation with generative AI, our Cecilia platform allows you to instantly create concise AI-generated summaries of long or critical documents, in multiple languages, directly within the document viewer. 

This capability drastically reduces time to evidence by providing quick, informative, and reliable recaps of complex documents. It also enhances review quality by letting reviewers cross-check tags against key document summaries.

Tackle cross-border litigation with DISCO

If you found this guide helpful – and if the best practices described resonate with you – let’s talk.

DISCO Ediscovery’s unique combination of multilingual AI-predictive tagging, AI-prioritized review, and Cecilia doc summaries delivers the solution to an increasingly prevalent, burdensome, and unavoidable issue in the legal world. 

Each of our AI-powered tools feeds into and complements the next, giving lawyers focused tools to navigate the once-cumbersome task of managing vast amounts of multilingual data and documents:

  • Multilingual AI-predictive tagging accurately classifies foreign language documents based on the behavior of the case teams spearheading the review.
  • AI-prioritized review delivers the most relevant documents at the front of each review batch.
  • Cecilia AI-generated doc summaries offer succinct and reliable overviews of each document, allowing lawyers to quickly determine which documents are the most critical to translate first.

Let us make your multilingual litigation easier. Get a demo.
