Predictive Coding for Litigation Support and Ediscovery

What is predictive coding?

Predictive coding uses machine learning to reduce the time and cost needed for manual review of documents to determine which are relevant for a legal case. It expedites the review process by quickly locating relevant documents in massive amounts of digital data.

How does predictive coding work?

Predictive coding software organizes electronic documents, and its algorithms use your reviewers' decisions to estimate which documents are most relevant to a case.

First, the software reads and parses each document, identifying the individual words and phrases it contains. Next, a human legal team manually tags a sample of the documents as relevant or not relevant. The predictive coding software then analyzes those decisions and learns which words or phrases tend to distinguish relevant documents from non-relevant ones. The process repeats: as human reviewers tag additional sets of documents, the software learns to make increasingly reliable recommendations about what is relevant.
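The parse, tag, learn, and predict loop described above can be illustrated with a toy classifier. The sketch below is a minimal, hypothetical example that uses multinomial Naive Bayes over word counts, a simple textbook technique swapped in for illustration; it is not the algorithm any particular ediscovery platform uses, and the names `tokenize`, `NaiveBayesRelevance`, `learn`, and `score` are invented for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Parse a document into lowercase words (the 'reads and parses' step)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesRelevance:
    """Hypothetical toy relevance model: multinomial Naive Bayes on word counts."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def learn(self, text, relevant):
        """Record one human tagging decision (relevant=True or False)."""
        self.word_counts[relevant].update(tokenize(text))
        self.doc_counts[relevant] += 1

    def score(self, text):
        """Estimate the probability that an unreviewed document is relevant."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_docs = self.doc_counts[True] + self.doc_counts[False]
        log_prob = {}
        for label in (True, False):
            # Add-one (Laplace) smoothing; the extra +1 in the denominator
            # reserves probability for words never seen during training.
            lp = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            denom = sum(self.word_counts[label].values()) + len(vocab) + 1
            for word in tokenize(text):
                lp += math.log((self.word_counts[label][word] + 1) / denom)
            log_prob[label] = lp
        # Convert the two log scores into P(relevant); clamp to avoid overflow.
        diff = max(min(log_prob[False] - log_prob[True], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


# Human reviewers tag a small sample of documents.
model = NaiveBayesRelevance()
model.learn("pricing agreement with competitor", True)
model.learn("competitor price discussion", True)
model.learn("lunch order for friday", False)
model.learn("holiday party schedule", False)

# The model ranks unreviewed documents by predicted relevance.
unreviewed = ["email about competitor pricing", "friday lunch menu"]
for doc in sorted(unreviewed, key=model.score, reverse=True):
    print(f"{model.score(doc):.2f}  {doc}")
```

With these four tagged examples, the competitor-pricing email scores about 0.86 and the lunch menu about 0.20. Each further round of tagging simply adds to the word counts, which is the "repeat" step of the process.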

What is predictive coding used for?

Predictive coding systems learn from existing review decisions to predict how your team will evaluate the remaining, unreviewed documents. You can use these predictions to give your human reviewers a manageable number of likely relevant documents to focus on, setting aside those that are very unlikely to be relevant. This matters because the volume of data in a typical lawsuit increasingly makes meaningful manual analysis impractical without predictive coding ediscovery tools.

You can also use predictive coding to prioritize documents for review, even when you intend to review them all, and to validate your tagging decisions as a form of quality assurance.
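Prioritization, culling, and quality-assurance uses can all be layered on top of any model that outputs a relevance score. In this minimal sketch the scores are hypothetical values assumed to come from an already trained model; the function names, document IDs, the 0.5 cutoff, and the 0.8 disagreement threshold are illustrative assumptions, not any product's settings.

```python
def prioritize(scores):
    """Order document IDs from most to least likely to be relevant."""
    return sorted(scores, key=scores.get, reverse=True)


def above_cutoff(scores, cutoff):
    """Documents to route to human reviewers (score at or above the cutoff)."""
    return [doc_id for doc_id in prioritize(scores) if scores[doc_id] >= cutoff]


def flag_disagreements(scores, human_tags, threshold=0.8):
    """QA check: flag human tags that the model strongly disagrees with."""
    flagged = []
    for doc_id, tagged_relevant in human_tags.items():
        p = scores[doc_id]
        if tagged_relevant and p <= 1 - threshold:
            flagged.append(doc_id)  # tagged relevant, model says very unlikely
        elif not tagged_relevant and p >= threshold:
            flagged.append(doc_id)  # tagged not relevant, model says very likely
    return flagged


# Hypothetical model scores and reviewer tags.
scores = {"DOC-001": 0.95, "DOC-002": 0.10, "DOC-003": 0.62, "DOC-004": 0.03}
human_tags = {"DOC-001": False, "DOC-002": False, "DOC-004": True}

print(prioritize(scores))                      # review order
print(above_cutoff(scores, cutoff=0.5))        # manageable review set
print(flag_disagreements(scores, human_tags))  # tags to double-check
```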

How does predictive coding work in technology assisted review?

Predictive coding and technology assisted review (TAR) are terms that are often used interchangeably, but they are slightly different.

TAR is the integration of technology into the process of human document review. In that sense, predictive coding is just one way to integrate technology to assist review.

What is the difference between TAR 1.0 and TAR 2.0 predictive coding?

The process for predictive coding or technology assisted review (TAR) has changed over time. The original process, TAR 1.0, involved simple passive learning:

  • Control — Subject matter expert (SME) classifies hundreds of documents to be used for initial training.
  • Train — SME begins classifying other documents. An engine creates and tests predictions, while the SME reviews those results and continues training the model to improve performance.
  • Deploy — Once stable, the prediction model is applied to all documents in the collection.
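The passive TAR 1.0 workflow above can be sketched as a loop: the SME's control documents seed the model, each new SME-classified batch tests the current predictions and is then added to training, training stops once performance is stable, and the frozen model is applied to the whole collection once. This sketch is hypothetical: `train_keywords` is a deliberately crude stand-in model, and defining "stable" as a change in batch accuracy below a fixed tolerance is an assumption for illustration.

```python
def train_keywords(labeled):
    """Crude stand-in model: words seen only in relevant training documents."""
    relevant_words, other_words = set(), set()
    for text, is_relevant in labeled:
        (relevant_words if is_relevant else other_words).update(text.lower().split())
    keywords = relevant_words - other_words
    return lambda text: bool(keywords & set(text.lower().split()))


def accuracy(model, labeled):
    """Share of labeled documents the model classifies correctly."""
    return sum(model(text) == label for text, label in labeled) / len(labeled)


def tar_1_0(control_set, training_batches, collection, tolerance=0.01):
    """Hypothetical TAR 1.0 sketch: train until stable, then deploy once."""
    seen = list(control_set)          # Control: SME-classified seed documents
    model = train_keywords(seen)
    previous = None
    for batch in training_batches:
        if not batch:
            continue
        current = accuracy(model, batch)  # Train: test predictions on SME's new decisions
        seen.extend(batch)
        model = train_keywords(seen)      # continue training with the new decisions
        if previous is not None and abs(current - previous) < tolerance:
            break                         # performance is stable
        previous = current
    # Deploy: the frozen model labels every document; later decisions are not learned.
    return {doc: model(doc) for doc in collection}


control = [("merger price terms", True), ("team lunch", False)]
batches = [
    [("merger draft", True), ("lunch menu", False)],
    [("price terms memo", True), ("parking notice", False)],
    [("merger call", True), ("gym hours", False)],
]
print(tar_1_0(control, batches, ["merger timeline", "parking update"]))
```

Because the model is frozen at deployment, any decisions made afterward never reach it, which is the limitation the deficiency list below describes.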

But TAR 1.0 has the following deficiencies:

  • Requires experts to do the initial training.
  • Can’t learn from subsequent decisions.
  • Can’t handle rolling productions without having to start over.
  • Doesn’t work well when the proportion of relevant documents is low.

TAR 2.0 is a more efficient and effective process because it provides continuous active learning. TAR 2.0 uses the following steps:

  • Review/Train — All review decisions automatically train the engine.
  • Rank — The engine continuously updates the rankings.
  • Test — SMEs focus on reviewing potential mistakes, rather than providing all training input.
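Continuous active learning can be sketched as a loop in which the engine re-ranks the unreviewed documents, reviewers code the top batch, and every decision immediately updates the model. In this hypothetical sketch the model is a crude word-weight score, `review` stands in for a human reviewer, and stopping once a batch contains no relevant documents is one simple assumed stopping rule, not a recommended protocol.

```python
from collections import Counter


def score(text, weights):
    """Sum of learned word weights; higher means more likely relevant."""
    return sum(weights[word] for word in text.lower().split())


def continuous_active_learning(collection, review, seed, batch_size=2):
    """Hypothetical TAR 2.0 sketch: rank, review, and retrain until a batch has no hits."""
    weights = Counter()
    decisions = {}

    def learn(text, relevant):
        # Review/Train: every coding decision updates the model immediately.
        for word in text.lower().split():
            weights[word] += 1 if relevant else -1
        decisions[text] = relevant

    for text, relevant in seed:
        learn(text, relevant)

    while True:
        unreviewed = [doc for doc in collection if doc not in decisions]
        if not unreviewed:
            break
        # Rank: re-score the remaining documents with the latest weights.
        ranked = sorted(unreviewed, key=lambda doc: score(doc, weights), reverse=True)
        hits = 0
        for doc in ranked[:batch_size]:
            relevant = review(doc)
            learn(doc, relevant)
            hits += relevant
        if hits == 0:
            break  # assumed stopping rule: a batch with no relevant documents
    return decisions


truth = {
    "merger price call": True, "merger memo": True, "lunch order": False,
    "parking notice": False, "price memo": True, "gym hours": False,
    "holiday party": False,
}
result = continuous_active_learning(
    list(truth), review=truth.get, seed=[("merger terms", True), ("lunch plans", False)]
)
print(sorted(doc for doc, rel in result.items() if rel))
```

In this run the loop finds all three relevant documents in the collection and stops without ever reviewing "lunch order", which the updated model ranks lowest.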

[Figure: the process of predictive coding for litigation support and ediscovery]

How does predictive coding work for ediscovery?

Predictive coding is widely considered a critical tool in the ediscovery process. Proponents argue that, with predictive coding software, algorithm-reviewed documents can be more reliable than documents reviewed by humans alone. There are reports that some courts prefer algorithm-reviewed documents and have even refused manually reviewed documents.

Judicial approval of predictive coding

In 2012, a ruling by U.S. Magistrate Judge Andrew J. Peck became the first U.S. judicial opinion to approve the use of predictive coding. Before that case, many practitioners saw the technique as a legal risk. But Peck’s ruling called the practice a “desirable, efficient solution for large-scale review challenges.” This opened the door for predictive coding to become commonplace.

The 2012 case, Da Silva Moore v. Publicis Groupe, involved three million emails that needed to be reviewed in a gender discrimination case. The defendants wanted to use an older predictive coding approach that would:

  • Develop a seed set with human review of a random sample of 2,399 documents.
  • Engage in seven rounds of iterative review of at least 500 results per round.
  • Review and produce the top 40,000 results identified.

The plaintiffs objected to limiting review and production to the top 40,000 results, and Judge Peck agreed the total should be larger. He also said the defendants could not limit training of the machine learning software to just seven rounds of iterative review, ruling that a sufficient number of rounds could not be predetermined and would have to be established after the work began. By setting these parameters, Judge Peck approved the use of predictive coding.

Is predictive coding artificial intelligence?

Predictive coding uses machine learning software, which is a form of artificial intelligence. Machine learning algorithms learn from labeled examples through repeated cycles of prediction and evaluation. In predictive coding, the algorithm learns from human reviewers who manually tag a sample of documents as relevant or not relevant. The predictive coding software then applies that learning to the full set of documents.

How accurate is predictive coding?

Predictive coding is only as accurate as the training decisions that human experts provide and verify. If the software produces unreliable results, reviewers can tag more documents and run additional training iterations for the software to learn from.
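One common way to quantify accuracy is to compare the software's predictions with expert decisions on a sample of documents and compute recall (the share of truly relevant documents the software found), precision (the share of documents it predicted relevant that really are relevant), and F1 (their harmonic mean). The sketch below is a generic illustration with hypothetical sample data; `review_metrics` is an invented name, and the code does not describe how any specific platform reports its metrics.

```python
def review_metrics(predicted, actual):
    """Compute precision, recall, and F1 from paired relevance labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    tp = sum(p and a for p, a in zip(predicted, actual))      # correctly found
    fp = sum(p and not a for p, a in zip(predicted, actual))  # false alarms
    fn = sum(a and not p for p, a in zip(predicted, actual))  # missed relevant docs
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical sample: model predictions vs. expert reviewer decisions.
predicted = [True, True, True, False, False, True]
actual = [True, False, True, True, False, True]
print(review_metrics(predicted, actual))
```

Here three of four predicted-relevant documents are correct and three of four relevant documents are found, so precision, recall, and F1 are all 0.75. Low recall signals that relevant documents are being missed and that more training iterations may be needed.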

Ediscovery platforms provide user-friendly tools that help legal teams get the most benefit from predictive coding.

Accessible models

Easy-to-create predictive models that bring out the most relevant predictions while ensuring quality control in a collaborative document review.

Integrated results

Predictions are incorporated throughout the ediscovery platform, including visual search and data visualization, to allow for more informed review decisions.

Rigorous performance metrics

Random samples of reviewed documents are taken to generate performance statistics, which give additional statistical insight into when the document review is complete.
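The size of a random validation sample is often chosen from a target confidence level and margin of error using the standard formula for estimating a proportion, n0 = z²·p(1 − p)/e², optionally adjusted with the finite-population correction n = n0 / (1 + (n0 − 1)/N). This sketch assumes the worst case p = 0.5, and the function name `sample_size` is illustrative. With 95% confidence, a ±2% margin, and a population of three million documents, it returns 2,399, the same size as the Da Silva Moore seed set; this section does not state the parameters the defendants used, so treat the match as illustrative.

```python
import math
from statistics import NormalDist


def sample_size(confidence, margin, population=None, p=0.5):
    """Random-sample size for estimating a proportion within +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z-score
    n0 = z ** 2 * p * (1 - p) / margin ** 2             # infinite-population size
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)           # finite-population correction
    return math.ceil(n0)


print(sample_size(0.95, 0.02))             # very large collection
print(sample_size(0.95, 0.02, 3_000_000))  # three million documents
```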

When would you want to use predictive coding?

Use predictive coding software to save time and money when a massive set of electronic documents must be reviewed. With predictive coding, only a subset of the documents needs to be reviewed manually, which makes ediscovery more efficient and cost-effective.

What are the concerns when using predictive coding?

  • Depending on your provider, it can be more expensive than preparing documents for keyword searching. But relying only on keyword search as a culling tool can miss up to 80 percent of relevant documents, according to an academic study conducted by Blair and Maron.
  • Some lawyers may not trust or understand predictive coding because it is complex and requires knowledge of statistical sampling. But ediscovery platforms with predictive coding are designed to be easy to operate, with the technology running in the background.
  • There is an upfront expense to acquiring a predictive coding software tool, but it is often financially justified in cases with a large amount of electronic data to sift through, and it can reduce human error.
  • While predictive coding is cost-effective, it may not be worthwhile for cases with a small dataset. According to The Legal Intelligencer, a case with more than 10,000 documents is best suited for predictive coding, because a smaller collection would likely require mostly manual review anyway just to train the predictive coding algorithm.

Does Everlaw offer predictive coding?

Yes. Everlaw's ediscovery platform includes a predictive coding system. Users define a model using the available ratings, codes, and document attributes in a case. Then users specify criteria and identify which documents the system should learn from to find the most relevant documents. For a deeper dive into predictive coding, visit our support page.