Achieving 100x Faster Data Ingests with DISCO’s Cloud-Native Architecture


At DISCO, we have an extraordinary focus on giving our users and clients a magical experience – from software usability to ease of doing business with us. DISCO was founded on the principle of combining world-class engineering with a deep love and respect for the law, and we wanted to share the groundbreaking work done by our engineering team.

#DISCOmagic is a new blog series that takes you behind the scenes to show you some examples of all the hard work that goes into making this magic happen. This installment comes from Chief Performance Architect Sujatha Kashyap.

Last month, one of our clients ingested 3.6 TB of data into their DISCO database in under 15 hours. This included automatic processing with full analytics: de-NISTing, imaging every document, deduplication, near-duplicate detection, email threading, and more, so the data was ready and available for search and review.

This is a whopping rate of 14.7 seconds per GB — 100x faster than the 24 minutes per GB ingest time recently touted by another well-known ediscovery software provider. A few days later, a DISCO customer ingested 682 GB of data into DISCO at a rate of 6 seconds per GB — which is 240x faster than the other ediscovery software provider.
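The speedup figures above follow from simple arithmetic. Here is a back-of-the-envelope check, using the numbers quoted in this post (the 1 TB = 1024 GB conversion is my assumption):

```python
# Back-of-the-envelope check of the ingest rates quoted above.
# Input figures come from the post; 1 TB = 1024 GB is an assumption.

def seconds_per_gb(total_gb: float, total_seconds: float) -> float:
    """Average ingest rate in seconds per gigabyte."""
    return total_seconds / total_gb

disco_rate = seconds_per_gb(3.6 * 1024, 15 * 3600)  # 3.6 TB in 15 hours
competitor_rate = 24 * 60                           # 24 minutes per GB, in seconds

print(round(disco_rate, 1))                 # ~14.6 s/GB
print(round(competitor_rate / disco_rate))  # ~98, i.e. roughly 100x
print(round(competitor_rate / 6))           # 240x for the 6 s/GB ingest
```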

How do we do this? 

An on-premises application simply lifted and shifted to the cloud cannot take advantage of essential cloud-native features. DISCO, by contrast, was born in the cloud to take advantage of the continuous innovation and disruptive technological breakthroughs enabled by cloud-native architectures. All DISCO processing, computing, and storage resources are on Amazon Web Services (AWS).

As Chief Performance Architect, my role is to identify all the places where a user of the DISCO app could potentially experience lag time, and make that wait time go away so that the experience of using the DISCO app is magical. By eliminating wait times, the user's flow of thought remains uninterrupted, allowing them to be more productive.

I track key performance metrics across the product, and find ways to improve performance by eliminating outliers to ensure a smooth, consistent user experience. I also look for new solutions to implement in places where the wait times are longer than desirable. I am constantly investigating innovations that can push the envelope on the speed and agility of our offerings. 

For example, DISCO’s ingest function is built using the serverless computing architecture provided by AWS Lambda. As Lambda became more popular and optimized, DISCO began moving over to this new architecture as it provided speed and cost benefits over our previous implementation. The serverless Lambda architecture automatically scales the compute resources with the size of the job being processed. And our microservices-based modular architecture ensures forward compatibility, so we can quickly adopt new breakthroughs that further enhance our speed and agility. At DISCO, we can run up to 45,000 concurrent computations to process a single ingest. This enables the lightning-fast ingest speeds I mentioned earlier.
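DISCO's actual pipeline is not public, but the fan-out pattern behind serverless ingest can be sketched in a few lines: split a job into independent chunks and process them concurrently, with the worker count scaling with job size. This local simulation uses Python threads as a stand-in for Lambda invocations (in production, AWS Lambda handles the scaling, up to thousands of concurrent executions):

```python
# Illustrative sketch only, not DISCO's implementation: simulate
# serverless fan-out by splitting an ingest into chunks and processing
# them concurrently, one worker per chunk.
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk: list) -> int:
    # Stand-in for per-document work (de-NISTing, imaging, indexing, ...).
    return len(chunk)

def ingest(documents: list, chunk_size: int = 100) -> int:
    chunks = [documents[i:i + chunk_size]
              for i in range(0, len(documents), chunk_size)]
    # Concurrency scales with the job: one worker per chunk, analogous
    # to one Lambda invocation per unit of work.
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        return sum(pool.map(process_chunk, chunks))

processed = ingest([f"doc-{i}" for i in range(1000)])
print(processed)  # 1000 documents processed
```

Because each chunk is independent, total wall-clock time is governed by the largest chunk rather than the total job size — which is why ingest time per GB stays roughly flat as jobs grow.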

Another benefit of the cloud: Saving some green

Besides the virtually unlimited scale and performance benefits of the cloud, cost is another big benefit. The cloud drives costs down by pooling resources and providing sheer economy of scale (AWS’s infrastructure houses millions of servers). 

Furthermore, the ability to automatically spin up and spin down computing power and storage in a matter of milliseconds based on detected load increases utilization and reduces costly inactive periods. An example of the cost benefit of using the cloud is that Descartes Labs built one of the fastest publicly known supercomputers in the world with an investment of only $5,000 and an AWS account.

Yet another benefit: Being green

If cost, speed, agility, and reliability are not reasons enough, all of this is achieved with a minimum carbon footprint. For example, AWS Lambda serverless computing runs code only when an ingest is triggered, and uses only the compute resources needed to get the job done. Studies show that moving an application to the cloud cuts its energy use by 87%. This would help keep in check the incredibly high energy consumption by data centers, which in 2018 used “an estimated 198 TWh, or almost 1% of global final demand for electricity.”

DISCO's cloud-native architecture allows us to effectively leverage the latest technology, providing a solution that is cost-effective, eco-friendly, and most importantly, user-friendly. Want to see just how fast DISCO performs? Schedule a demo with us today.
