Get Your Head Out of the Ediscovery Fog and into the Cloud

The legal technology industry has done an about-face in the last several years, shifting from full-on cloud phobia (nephophobia, or is it cumulonimbophobia?) to jumping in with both feet. This shift is exciting and marks a pivotal moment in the evolution of the legal technology industry. It is also an opportunity for some to hitch their wagon to the cloud train and hope that their clients don’t realize that the cloud they promise may be only fog. 

This blog focuses on the difference between cloud-adapted and cloud-native technology, highlighting why you should care and how it affects speed and analytical robustness. Cloud computing has reached a tipping point for many reasons, but it is important to understand whether the “cloud” solution you are considering will provide all the benefits you anticipate and what limitations certain deployments may have. 

More cloud, less fog

Ediscovery in particular is rapidly moving to the cloud. The worries and outside counsel guidelines that previously precluded cloud adoption are falling by the wayside as the computational power, cost savings, and elasticity of cloud solutions entice practitioners facing ediscovery challenges. 

Cloud-native solutions like DISCO are purpose-built from day one to take advantage of cloud architecture and to capitalize on the unique characteristics of cloud computing, most notably virtually unlimited scalability. This is an important difference if you are hoping to fix the challenges that legacy technology faces with very large data sets and complex, large-volume matters. 

When the shift to the cloud began to look inevitable, some legacy ediscovery platforms began to migrate their installed on-premise applications onto virtual machines (VMs) hosted in Amazon Web Services (AWS) or Microsoft Azure. This cloud-adapted approach enabled legacy tech to capture some benefits of true cloud computing, but it failed to capture many key benefits in terms of scale and elasticity. 

Unfortunately, simply moving a previously on-premise application into the cloud, without ensuring the underlying architecture is optimized for cloud computing, limits the application’s ability to capitalize on the benefits of cloud computing. On-premise applications ported to cloud VMs that mirror their on-prem architecture have not fixed any of the underlying limitations found in physical server-based instances. These partial solutions are more fog than cloud and have caused a good deal of market confusion.

Not all cloud is created equal

A core benefit of choosing next-gen technology when you make the leap to cloud-enabled ediscovery is getting a platform that is cloud-native rather than cloud-adapted. What does that mean? Well, technology built for the cloud can better capitalize on rapid scalability, parallel computing power (to accelerate the entire process), and continual innovation in a way that solutions based on data center architecture and antiquated programming cannot. 

For example, DISCO’s cloud-native architecture takes advantage of the cutting-edge innovation AWS continually rolls out, and is specifically designed to maximize the unique characteristics of a distributed cloud network. 

What does that mean in non-geek-speak? We did not just copy the structure of legacy technology, complete with its scaling limitations and foibles. Instead, we started from scratch with an elastic and scalable infrastructure that allows our platform to do in parallel what legacy tools have to do sequentially. This underlying structure is far more robust than the SQL-based VM architecture some legacy providers adopted, and it does not struggle with larger data sets or get bogged down as more users join or more tasks are performed. 
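To make the sequential-versus-parallel distinction concrete, here is a minimal Python sketch. It is an illustration only, not DISCO's actual implementation: the shard layout, the function names, and the sample documents are all hypothetical, and threads stand in for the separate compute nodes a cloud platform would use.

```python
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, term):
    """Return the IDs of documents in one shard whose text contains the term."""
    term = term.lower()
    return [doc_id for doc_id, text in shard.items() if term in text.lower()]


def search_sequential(shards, term):
    """Scan the shards one after another, as a single-server design would."""
    hits = []
    for shard in shards:
        hits.extend(search_shard(shard, term))
    return sorted(hits)


def search_parallel(shards, term, max_workers=4):
    """Scan all shards concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = pool.map(lambda shard: search_shard(shard, term), shards)
        hits = [doc_id for partial in partials for doc_id in partial]
    return sorted(hits)


# Hypothetical sample data: one case split across two shards.
shards = [
    {"DOC-001": "Merger agreement draft", "DOC-002": "Lunch plans"},
    {"DOC-003": "Re: merger timeline", "DOC-004": "Quarterly report"},
]

print(search_parallel(shards, "merger"))  # ['DOC-001', 'DOC-003']
```

Both functions return the same answer. The difference is that the parallel version can put more workers on the job as the number of shards grows, while the sequential version can only take longer.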

Whether you have 1,500 documents or 150 million, DISCO still renders pages and performs searches in a fraction of a second. Adding hundreds of reviewers, conducting large-scale data ingestion, deploying advanced AI modeling across multiple issues, or having multiple massive cases hit all at the same time has zero impact on speed because we can scale across the AWS ecosystem to increase compute power at a moment’s notice. 

SQL scaling is an Achilles heel

Legacy ediscovery technology has a key weakness that porting the application to VMs hosted in the cloud does not fix: SQL, the query language on which the entire relational database is built. SQL servers are extremely sensitive to data volume, and web servers working concurrently falter with large numbers of reviewers and large, complex files. 

This query language represented a huge step forward when top legacy tech began using it 15 years ago, and it was more than sufficient to handle the data volumes prevalent in ediscovery matters at that time. However, as data volumes exploded from tens of gigs to tens of thousands of gigs, serious cracks in this foundation became apparent. 

SQL scalability became such a chronic problem that top legacy tools (to this day) have 20-page how-tos outlining infrastructure scaling and workflow workarounds to try to address this core limitation. Despite these complex, multi-step workarounds, the fact remains that for case volumes above 15 million documents (150 GB) or 200 users, you will face substantial latency issues and, at times, full system errors that can waylay your matter. 

The current solution entails breaking a case into multiple smaller databases. This crude workaround creates new challenges with searching across multiple databases, applying analytics effectively on the full data set, and ensuring accurate productions. Reconstructing the frankendatabase is a manual and frustrating process many practitioners are all too familiar with. Even if you apply this complex and burdensome workaround, saturation and latency may still occur due to fundamental shortcomings of SQL as it is deployed in legacy tech. 
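The extra work this workaround creates can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not any vendor's actual schema or tooling: the `documents` table, the database names, and the sample rows are hypothetical, and in-memory SQLite databases stand in for the separate case databases.

```python
import sqlite3


def make_case_db(rows):
    """Create an in-memory database holding one slice of the case."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (doc_id TEXT PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO documents VALUES (?, ?)", rows)
    return conn


def search_all(databases, term):
    """Run the same query against every slice and merge the hits by doc_id."""
    merged = {}
    for name, conn in databases.items():
        cursor = conn.execute(
            "SELECT doc_id FROM documents WHERE body LIKE ?", (f"%{term}%",)
        )
        for (doc_id,) in cursor:
            # A document copied into more than one slice must be counted once,
            # so hits are keyed by doc_id and the source slices are recorded.
            merged.setdefault(doc_id, []).append(name)
    return merged


# Hypothetical sample data: DOC-002 was copied into both slices.
databases = {
    "case_part_1": make_case_db(
        [("DOC-001", "merger agreement"), ("DOC-002", "lunch")]
    ),
    "case_part_2": make_case_db(
        [("DOC-002", "lunch"), ("DOC-003", "merger timeline")]
    ),
}

hits = search_all(databases, "merger")
print(sorted(hits))  # ['DOC-001', 'DOC-003']
```

Every search, analytic, and production step needs this kind of fan-out, merge, and de-duplication logic once a case is split, which is where much of the manual effort comes from.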

Same SQL, different deployment 

Any flavor of cloud deployment will offer benefits that on-premise solutions lack. In this case, something is most decidedly better than nothing. However, to capture all the benefits of cloud computing, a cloud-native solution is required. If the underlying legacy database query language and architecture of the technology remains unchanged, and frankly unchangeable, the full benefits of cloud computing remain out of reach. 

Solutions reliant on SQL searching cannot support true shared services, parallel compute, and all the elastic goodness of the cloud. That means when you have a case that demands true search elasticity and scalability, legacy-architected and SQL-based technology will fall short. Solving this problem would require legacy technologies to reengineer from the ground up. 

Throwing more servers at the problem and even hosting the SQL servers in the cloud won’t remediate this underlying issue. If you rely on this sort of legacy cloud-adapted technology, you will find yourself constantly deploying workarounds and fixes that work about as well as a band-aid on a broken leg. From the spinning wheel of death when a search is run to painful and costly latency throughout the review, the SQL foundation leaves a gap that no number of servers (virtual or otherwise) can bridge. 

Evolving with the times

Whether a legacy tool is running on tens or hundreds of servers, the underlying programming still breaks at larger data volumes. Because legacy tech is tied to antiquated foundational programming, you do not get the benefit of cutting-edge innovation like GPU acceleration for AI, Google’s BERT, or even cloud-based architecture like neural networking. 

Choosing adaptable underlying architecture instead of staying anchored in decades-old technology helps you future-proof your ediscovery practice. Our founders understood that what was cutting-edge at our inception could rapidly become antiquated in the continuously evolving dataverse legal practitioners face today. So, we built DISCO to evolve alongside you. As your data becomes bigger and messier, and as your need for computational power and cutting-edge AI increases, DISCO can go the distance with you. 

Less red Solo cup, more Holy Grail

What worked well 15 or even 5 years ago cannot support the exponential growth of data we are facing today as legal practitioners. Foggy claims of cloud-adapted vs. cloud-native computing are confusing the issue for many. Indiana Jones would not have been satisfied with a red Solo cup when he was on a mission to find the Holy Grail, so why should you settle for fog when you could have cloud? Yes, they may both be cups (or ways of hosting), but they are fundamentally different in nearly every other way. 

Cat Casey