We’ve all been there. You have a 6 p.m. FedEx deadline, and you’ve been waiting all afternoon for your production to finish. The reviewers have been done since noon, but here you are, watching the second hand of the clock, looking up the fastest route to the nearest FedEx drop box, and starting to sweat. If you’re lucky, you’ll dash out of the office and make it to the box at 5:59 p.m.
Now imagine that you were using DISCO Ediscovery. The attorneys finished the final QC at noon, and you started the production at 12:05 p.m. By 12:35 p.m., the production is ready to be copied to a hard drive. You enjoy the sunshine as you walk to the FedEx box and easily make it to your neighborhood beer garden in time for happy hour (outside and socially distanced, of course). Better, huh?
At DISCO, the engineering team is dedicated to making your life easier. We worked hard last year to make the scenario above possible. Here’s what that work means today:
- More than 80% of productions take less than 30 minutes, end to end.
- Hundreds of productions run on the platform every day, at speeds of up to 1,000 pages per second. Clients running 1.5-million-page productions can download a single .zip containing the entire production in 25 minutes.
- The most complex, longest-running productions are now 20x faster than they were before.
And we’re just getting started.
Faster productions with none of the headaches
From the beginning, productions in DISCO have been different. Our single-step process eliminates the headaches of having to generate TIFF images before review or manually move your productions through a dozen steps. Instead, you enter your production settings with a few clicks up front, and DISCO takes care of the rest.
Once you hit “Create” on the Productions page, four phases run behind the scenes:
- Assign: Runs your production search to identify which documents should be included.
- Prepare: Analyzes documents to create Bates numbers and production load files, and re-duplicates or de-duplicates the documents based on your specifications.
- Produce: Stamps production images, burns in redactions, copies native files to the production folder, and validates data.
- Zip: Packages all of the production files, including the .dat, .opt, images, text and natives, into a single .zip file that you can download directly from DISCO.
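To make the four phases concrete, here is a minimal Python sketch of how such a pipeline could be wired together. Everything in it is a hypothetical illustration, not DISCO’s implementation: the `Document` type, the `responsive` flag standing in for the production search, the `ABC` Bates prefix with seven-digit padding, SHA-256 content hashing for de-duplication, and the simplified pipe-delimited load file (real .dat load files typically use different delimiters). The Produce step here only copies each document’s text rather than rendering stamped images.

```python
import hashlib
import io
import zipfile
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical document record used only for this sketch.
    doc_id: str
    pages: int
    content: bytes
    responsive: bool  # stand-in for "matches the production search"


def assign(docs):
    """Assign: keep the documents the (hypothetical) production search selects."""
    return [d for d in docs if d.responsive]


def prepare(docs, prefix="ABC", start=1, dedupe=True):
    """Prepare: optionally de-duplicate by content hash, then assign Bates ranges."""
    if dedupe:
        seen, unique = set(), []
        for d in docs:
            digest = hashlib.sha256(d.content).hexdigest()
            if digest not in seen:
                seen.add(digest)
                unique.append(d)
        docs = unique
    numbered, next_number = [], start
    for d in docs:
        last = next_number + d.pages - 1  # one Bates number per page
        numbered.append((d, f"{prefix}{next_number:07d}", f"{prefix}{last:07d}"))
        next_number = last + 1
    return numbered


def produce(numbered):
    """Produce: placeholder that copies text instead of rendering stamped images."""
    return {f"TEXT/{begin}.txt": d.content for d, begin, _ in numbered}


def zip_production(files, load_file):
    """Zip: package every production file plus the load file into one archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
        zf.writestr("production.dat", load_file)
    return buf.getvalue()


def run_production(docs, prefix="ABC"):
    """Run the four phases in order and return (Bates ranges, zip bytes)."""
    numbered = prepare(assign(docs), prefix=prefix)
    files = produce(numbered)
    rows = ["BEGBATES|ENDBATES"] + [f"{b}|{e}" for _, b, e in numbered]
    return numbered, zip_production(files, "\n".join(rows))
```

Passing `dedupe=False` to `prepare` keeps every copy, the toy equivalent of producing re-duplicated documents.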
During all of these phases, we ensure the production is saved into your database, so you don’t have to go back and update Bates numbers and production images after the fact. It’s all done programmatically, and because we charge only for your original, unexpanded data size at ingest, it comes with no additional fees.
Most important to our users, it’s always been much faster to get a production out the door in DISCO than in any other ediscovery platform. Over the past few years, we typically saw end-to-end production speeds of tens of thousands of pages per hour.
That’s pretty good, but we knew we could do better. As a cloud-native platform built on Amazon Web Services (AWS), DISCO is purpose-built to scale up throughput on demand. And while we devote much of our engineering effort to cutting-edge features like cross-matter AI, we also have a dedicated team that does nothing but make our platform faster.
In 2020, speeding up productions was one of that team’s top priorities.
How we did it
At DISCO, we’re always pushing the limits, and our approach to performance is no different. At the beginning of 2020, we asked ourselves: “Just how fast can we go?”
Because DISCO is cloud-native on AWS, we’re not constrained by physical servers. AWS lets us add resources instantly, with no human intervention, whenever there is more work to be done, which means we can run a virtually unlimited number of processes in parallel. The first step in this performance project was to systematically target each phase of the backend production process, benchmark its speed, and determine where we could push the limits.
We focused these performance improvements on generating the production images (the “Produce” phase). Rather than sending documents through multiple passes to stamp them, burn in redactions, and so on, we now do all of those steps at once on each document, and we process thousands of documents at the same time. Running these jobs in parallel on AWS lets DISCO scale up automatically, so the larger your production, the more machines are running and the faster DISCO can go.
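The single-pass idea can be illustrated with a small Python sketch: each document is redacted and stamped in one function call, and many documents are handled concurrently. This is a toy built on assumptions, not DISCO’s code: pages are plain strings, `PageJob`, `produce_document`, and `produce_all` are hypothetical names, redaction is simple text replacement, and a local thread pool stands in for the fleet of AWS machines described above.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class PageJob:
    # Hypothetical unit of work: one document's pages plus its redactions.
    doc_id: str
    first_bates: int
    pages: tuple       # page text, one string per page
    redactions: tuple  # terms to burn in as redacted


def produce_document(job, prefix="ABC"):
    """Single pass over one document: redact and stamp each page together."""
    produced = []
    for offset, text in enumerate(job.pages):
        for term in job.redactions:
            text = text.replace(term, "[REDACTED]")
        # Stamp the page with its Bates number in the same pass.
        produced.append(f"{text} | {prefix}{job.first_bates + offset:07d}")
    return job.doc_id, produced


def produce_all(jobs, workers=8):
    """Process many documents concurrently; a thread pool stands in for a fleet."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(produce_document, jobs))
```

Because each document carries everything needed to produce it, documents are independent units of work, which is what makes it straightforward to spread them across as many workers as are available.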
This auto-scaling also means your production never has to wait in line. When more users are running productions, we simply scale up (and back down) to match current demand, whether that takes 1,000 machines or 20,000.
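Auto-scaling decisions of this kind usually come down to a sizing rule. The sketch below shows one hypothetical rule, not DISCO’s actual policy: it sizes the fleet so the pending pages can finish within a target time, then clamps the result to a floor and a ceiling. The per-worker throughput, the target time, and the use of 20,000 as the ceiling (borrowed from the figure above) are all illustrative assumptions.

```python
import math


def desired_workers(pending_pages, pages_per_worker_per_minute, target_minutes,
                    min_workers=0, max_workers=20_000):
    """Hypothetical sizing rule: enough workers to clear the queue in target time."""
    if pending_pages <= 0:
        return min_workers  # scale back down when there is no work
    capacity_per_worker = pages_per_worker_per_minute * target_minutes
    needed = math.ceil(pending_pages / capacity_per_worker)
    # Clamp between the configured floor and ceiling.
    return max(min_workers, min(max_workers, needed))
```

With an assumed rate of 100 pages per worker per minute, `desired_workers(1_500_000, 100, 25)` returns 600; once the queue empties, the rule falls back to `min_workers`.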
That’s how we are able to achieve speeds of up to 1,000 pages per second (3.6 million pages per hour), and complete over 80% of productions in less than 30 minutes.
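The headline numbers are easy to check. This short Python sketch of the arithmetic assumes a constant rate of 1,000 pages per second: that rate works out to 3.6 million pages per hour, and a 1.5-million-page production at that rate takes 25 minutes, consistent with the download time quoted earlier.

```python
def pages_per_hour(pages_per_second):
    """Convert a per-second page rate to a per-hour rate."""
    return pages_per_second * 60 * 60


def minutes_to_process(total_pages, pages_per_second):
    """Minutes needed to process total_pages, assuming a constant rate."""
    return total_pages / pages_per_second / 60


print(pages_per_hour(1_000))                 # 3600000
print(minutes_to_process(1_500_000, 1_000))  # 25.0
```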
What’s next
We’re not stopping there. In 2021, we’re delivering more production capabilities and adding more parallelization, with a focus on speeding up generation of the production .zip file to shorten the end-to-end process even further.
We know that time is of the essence when you are trying to get productions delivered. And we’re committed to constantly improving the parts of our platform that you use the most, to get you back every minute we can.
(Now get back to that happy hour!)