Deep Learning for Legal Technology (Part 2)


In my last post, I discussed the why and how of DISCO AI, which uses deep learning technology to make legal professionals more efficient in e-discovery, but a bigger part of the how went unmentioned. Like all predictive coding systems, DISCO AI relies on iterative learning. First, the reviewer codes some documents; then the system makes predictions. The reviewer responds by coding the predicted documents, and so on, until the reviewer is satisfied that the document review is complete.

DISCO’s system differs from other predictive coding solutions.

Interactive Learning

Whereas other systems foist a fixed process on the reviewer, we designed DISCO to help the reviewer rather than impose an immutable review structure. In other systems, the machine decides which documents will be reviewed, a setting called active learning. In DISCO, the machine makes predictions, but it is ultimately the reviewer who decides what to review. We call this setting interactive learning, and it flows from DISCO’s core belief that technology should augment, not dominate, the lawyer’s workflow.

Continuous Learning

Central to interactive learning is the idea of a separation between the reviewer and the machine. The reviewer chooses documents to review based on his or her best judgment. The machine observes the reviewer’s decisions and updates its recommendations. Our name for this paradigm is asynchronous learning. It is asynchronous because the reviewer’s workflow is never interrupted while the machine continuously learns from the reviewer’s decisions to improve its predictions.
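A minimal sketch of that separation, assuming a hypothetical `AsyncTrainer` class and a caller-supplied `retrain` function that stands in for whatever model DISCO actually fits (the post does not describe it). The reviewer's side only enqueues coding decisions and returns immediately; a background thread drains the queue, folds the decisions into the label set, and publishes a new model version.

```python
import queue
import threading
from typing import Callable, Dict, Optional, Tuple

Decision = Tuple[str, bool]  # (document id, whether the reviewer applied the tag)


class AsyncTrainer:
    """Hypothetical background learner: the reviewer never waits on training."""

    def __init__(self, retrain: Callable[[Dict[str, bool]], object]) -> None:
        self._retrain = retrain  # assumption: any function mapping labels -> model
        self._events: "queue.Queue[Optional[Decision]]" = queue.Queue()
        self._lock = threading.Lock()
        self.labels: Dict[str, bool] = {}
        self.model: object = None
        self.model_version = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, doc_id: str, tagged: bool) -> None:
        """Called from the review workflow; returns immediately."""
        self._events.put((doc_id, tagged))

    def latest(self) -> Tuple[int, object]:
        """The current model; it may lag the newest decisions slightly."""
        with self._lock:
            return self.model_version, self.model

    def stop(self) -> None:
        """Ask the worker to finish the queued decisions, then wait for it."""
        self._events.put(None)
        self._worker.join()

    def _run(self) -> None:
        stopping = False
        while not stopping:
            batch = [self._events.get()]  # block until the reviewer codes something
            while True:  # drain decisions that arrived meanwhile: one retrain per burst
                try:
                    batch.append(self._events.get_nowait())
                except queue.Empty:
                    break
            stopping = None in batch
            decisions = [d for d in batch if d is not None]
            if not decisions:
                continue
            for doc_id, tagged in decisions:
                self.labels[doc_id] = tagged  # a later decision overrides an earlier one
            model = self._retrain(dict(self.labels))
            with self._lock:
                self.model = model
                self.model_version += 1


# Usage: a stand-in "model" that just counts tagged documents.
trainer = AsyncTrainer(retrain=lambda labels: sum(labels.values()))
for doc_id, tagged in [("doc-1", True), ("doc-2", False), ("doc-3", True)]:
    trainer.record(doc_id, tagged)
trainer.stop()
print(trainer.latest())
```

How many model versions get published depends on how the worker happens to batch the decisions, which is the point: the reviewer's calls to `record` never wait for training to finish.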

DISCO AI makes recommendations quickly.

Our competitors require review of several hundred or several thousand documents before revealing any machine predictions. They do so because they want to ensure that the predictions are as accurate as possible at the first cut, and the machine needs a lot of examples in order to learn. DISCO has a different philosophy.

Our observation is that our clients are often trying to find relatively rare documents, and if our machine predictions can make such documents less rare, they provide value even if many or most of the predictions are wrong. For example, suppose the reviewer has defined a tag that applies to 1 in every 1,000 documents. If our system raises that proportion to 1 in every 100 among the documents it recommends, it is correct only 1% of the time, but it still increases the frequency of tagged documents by 10X over the background rate. When predictive sorting is combined with intelligent searching, the rate can be increased even further. As a consequence, DISCO starts making predictions after just 50 documents have been tagged, and the accuracy of its predictions increases as more documents are tagged.
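The arithmetic in this example can be checked directly. The sketch below uses exact fractions and two illustrative helpers, `lift` and `expected_tagged` (hypothetical names, not part of any DISCO API), to compare the recommended documents with the background rate.

```python
from fractions import Fraction


def lift(background_rate: Fraction, precision: Fraction) -> Fraction:
    """How many times more often tagged documents appear among the
    recommendations than in the collection as a whole."""
    return precision / background_rate


def expected_tagged(docs_reviewed: int, rate: Fraction) -> Fraction:
    """Expected number of tagged documents found after reviewing docs_reviewed documents."""
    return docs_reviewed * rate


background = Fraction(1, 1000)  # the tag applies to 1 in 1,000 documents
precision = Fraction(1, 100)    # 1 in 100 recommended documents carries the tag

print(lift(background, precision))        # 10
print(1 - precision)                      # 99/100, the share of recommendations that are wrong
print(expected_tagged(1000, background))  # 1 tagged document per 1,000 unranked documents
print(expected_tagged(1000, precision))   # 10 tagged documents per 1,000 recommended documents
```

Even though 99% of the recommendations are wrong, a reviewer working from the recommendations finds tagged documents ten times as often.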

Overcoming the saturation challenge

We strongly believe in our flexible paradigm of human-machine interaction, but it does introduce some challenges. With active learning, the machine can request coding decisions on the documents that help it learn the fastest, although the reason those documents were chosen may not be obvious to the reviewer. In interactive learning, the reviewer’s choices could lead the machine down the proverbial garden path, so we had to develop some special tricks to mimic the benefits of active learning.
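The post does not say how active-learning systems pick the documents that help them learn fastest. One common heuristic is uncertainty sampling: ask for the documents whose predicted probability is closest to 0.5. The sketch below contrasts it with the kind of ranked list an interactive reviewer can take or leave; the scores are hypothetical probabilities from any classifier, and both function names are illustrative.

```python
from typing import Dict, List


def uncertainty_sample(scores: Dict[str, float], k: int) -> List[str]:
    """Active-learning style: the machine picks the k documents whose predicted
    probability of carrying the tag is closest to 0.5, i.e. where it is least sure."""
    return sorted(scores, key=lambda doc: (abs(scores[doc] - 0.5), doc))[:k]


def recommend(scores: Dict[str, float], k: int) -> List[str]:
    """Interactive style: rank by predicted probability, highest first; the
    reviewer is free to follow this list or ignore it."""
    return sorted(scores, key=lambda doc: (-scores[doc], doc))[:k]


scores = {"a": 0.95, "b": 0.52, "c": 0.10, "d": 0.45, "e": 0.80}
print(uncertainty_sample(scores, 2))  # ['b', 'd']
print(recommend(scores, 2))           # ['a', 'e']
```

The uncertain documents `b` and `d` are informative for the model but may look arbitrary to a reviewer, which is the trade-off described above.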

The machine learns to make predictions that match what it has seen. If all of the examples of a tag that the machine has seen are of a certain type, then the machine will focus on that type of document to the exclusion of others. This problem is exacerbated if the reviewer looks only at the documents the machine recommends, because doing so increases the proportion of examples similar to those the machine has already seen. Ultimately, once all of the similar documents have been tagged, the machine has no more predictions to offer, even though documents that deserve the tag remain. We call this saturation.
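A toy simulation of saturation, under deliberately simplified assumptions: the "model" is a hypothetical word-overlap scorer rather than a deep network, `is_tagged` stands in for the reviewer's coding decisions, and the reviewer codes only what the model recommends. The tag has two underlying reasons (price-fixing and document destruction), but the seed documents contain only the first.

```python
from typing import Dict, FrozenSet, List, Set

Doc = FrozenSet[str]


def fit_vocabulary(tagged: List[Doc]) -> Set[str]:
    """Toy 'model': every word that has appeared in a tagged document."""
    vocab: Set[str] = set()
    for doc in tagged:
        vocab |= doc
    return vocab


def score(doc: Doc, vocab: Set[str]) -> float:
    """Share of the document's words the model has already seen in tagged examples."""
    return len(doc & vocab) / len(doc)


def review_recommendations_only(collection: Dict[str, Doc], is_tagged: Dict[str, bool],
                                seed: List[str], threshold: float = 0.5) -> Set[str]:
    """Reviewer codes the seed documents, then only what the model recommends,
    until the model has nothing left to recommend. Returns the reviewed ids."""
    reviewed = set(seed)
    while True:
        vocab = fit_vocabulary([collection[d] for d in reviewed if is_tagged[d]])
        recs = [d for d in collection
                if d not in reviewed and score(collection[d], vocab) >= threshold]
        if not recs:
            return reviewed  # saturated: no more predictions
        reviewed.update(recs)


collection = {
    # Two different reasons a document might get the same tag.
    "a1": frozenset({"price", "fixing", "meeting"}),
    "a2": frozenset({"price", "fixing", "email"}),
    "a3": frozenset({"price", "meeting", "call"}),
    "b1": frozenset({"shred", "files", "audit"}),
    "b2": frozenset({"shred", "backup", "tapes"}),
    # Documents that should not get the tag.
    "n1": frozenset({"lunch", "menu", "friday"}),
    "n2": frozenset({"price", "list", "catalog"}),
}
is_tagged = {"a1": True, "a2": True, "a3": True, "b1": True, "b2": True,
             "n1": False, "n2": False}

reviewed = review_recommendations_only(collection, is_tagged, seed=["a1", "n1"])
missed = sorted(d for d in collection if is_tagged[d] and d not in reviewed)
print(missed)  # ['b1', 'b2']
```

The model finds every price-fixing document and then stops recommending anything, while both document-destruction documents remain untagged.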

To avoid saturation, we use a technique called score stratification sampling. The crucial insight is that the machine usually has seen new kinds of examples, but there are few of them relative to the dominant type of example. The dominant type’s influence can be counteracted by forcing the machine to account for all of the examples of a tag and not just most of them. To accomplish this, we stratify the documents by score and present the machine with extra examples of the documents receiving the lowest predictive scores. Our data shows that this yields much better predictions and a more even exploration of the different reasons a document might be coded a certain way.
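The post describes score stratification sampling only at a high level, so the following is a hedged reading of it rather than DISCO's implementation. It assumes we take the current model's scores on the documents carrying a tag, bin them into equal-width score strata, and build the next training set by drawing the same number of examples from each non-empty stratum. The bin count, the per-stratum sample size, and the function names are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List


def stratify_by_score(scores: Dict[str, float], n_strata: int) -> Dict[int, List[str]]:
    """Group documents into equal-width score bins; bin 0 holds the lowest scores."""
    strata: Dict[int, List[str]] = defaultdict(list)
    for doc_id, s in scores.items():
        idx = min(int(s * n_strata), n_strata - 1)  # clamp so a score of 1.0 lands in the top bin
        strata[idx].append(doc_id)
    return dict(strata)


def stratified_training_sample(scores: Dict[str, float], n_strata: int,
                               per_stratum: int, seed: int = 0) -> List[str]:
    """Draw the same number of examples, with replacement, from every non-empty
    stratum, so the few low-scoring examples are repeated rather than swamped
    by the dominant, high-scoring type."""
    rng = random.Random(seed)
    strata = stratify_by_score(scores, n_strata)
    sample: List[str] = []
    for idx in sorted(strata):
        sample.extend(rng.choices(strata[idx], k=per_stratum))
    return sample


# Current model scores on documents the reviewer has tagged:
# eight of the dominant type, two of a newer type the model scores poorly.
scores = {f"a{i}": 0.9 for i in range(1, 9)}
scores.update({"b1": 0.1, "b2": 0.15})

sample = stratified_training_sample(scores, n_strata=4, per_stratum=10)
share_b = Counter(doc[0] for doc in sample)["b"] / len(sample)
print(share_b)  # 0.5, versus 0.2 in the raw tagged set
```

Sampling with replacement keeps the per-stratum count fixed even when a stratum holds only one or two documents; weighting each example by the inverse of its stratum size would be an equivalent, deterministic alternative.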

These techniques, plus a few others, make it possible for DISCO AI to offer a flexible review process tailored to the reviewer. We also have a number of other tools in the pipeline, which we hope to release soon, that will make this process even more streamlined.

Dr. Alan Lockett