What Is Predictive Coding and Why Should You Use It? (Part 1)

Predictive coding, whether you know it as technology-assisted review, machine learning, or by any other name, is rapidly moving from an interesting technology to a necessary tool as the average case grows in size.  So what is it, and why is it better than just searching for the documents you want and reviewing them all?

What Is Predictive Coding?

In a nutshell, predictive coding is the process of teaching a computer to make accurate predictions about how a document should be rated or coded.  In a typical predictive coding setup, document reviewers rate a small subset of documents as “important” or “not.”  The prediction software then analyzes the documents deemed “important” along a number of dimensions to find other documents that are likely to be important as well.
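
For readers who want to see the idea in miniature, here is a rough sketch in Python.  It is not how any particular review product works: it assumes the only "dimension" is word frequency, uses a simple smoothed log-odds score, and the sample documents and the function names (tokenize, train, score) are made up for illustration.  Real tools look at far more than individual words.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()

def train(labeled_docs):
    """Count words separately for important and not-important documents.

    labeled_docs is a list of (text, is_important) pairs coded by reviewers.
    """
    counts = {True: Counter(), False: Counter()}
    for text, is_important in labeled_docs:
        counts[is_important].update(tokenize(text))
    return counts

def score(counts, text):
    """Return a log-odds style score: positive leans important, negative leans not."""
    vocab = set(counts[True]) | set(counts[False])
    total_imp = sum(counts[True].values())
    total_not = sum(counts[False].values())
    result = 0.0
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in the coded sample carry no evidence
        # Add-one smoothing keeps either probability from being zero.
        p_imp = (counts[True][word] + 1) / (total_imp + len(vocab))
        p_not = (counts[False][word] + 1) / (total_not + len(vocab))
        result += math.log(p_imp / p_not)
    return result

# Hypothetical reviewer-coded sample (invented for this sketch).
labeled = [
    ("merger pricing agreement draft", True),
    ("confidential pricing discussion with competitor", True),
    ("lunch menu for friday", False),
    ("office parking reminder", False),
]
model = train(labeled)

# Hypothetical documents nobody has reviewed yet.
unreviewed = [
    "friday parking update",
    "revised pricing agreement attached",
]

# Highest-scoring documents go to the front of the review queue.
ranked = sorted(unreviewed, key=lambda d: score(model, d), reverse=True)
for doc in ranked:
    print(round(score(model, doc), 2), doc)
```

The sorted list at the end is the practical payoff: reviewers can start with the documents the model considers most likely to matter, instead of working through the collection in arbitrary order.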

Why Not Just Manual Review? Time!

As a producing party, you’re often dealing with document collections so large that it’s impractical to look at every document individually during responsiveness review.  As a party receiving large productions, you face a similar problem: you need to prioritize documents for review even if you plan to look at each one eventually.

Predictive coding helps with both of these challenges.  The software can examine each document to identify the ones most likely to be responsive or relevant to your case.  That information can then be used to prioritize and speed up manual review — and sometimes even eliminate the need for it.

Why Not Just Search? Efficacy!

But document reviewers can search too, so why do you need predictive coding?  Because it does more than search: it learns the patterns and features that distinguish important documents from irrelevant ones.  Searching finds documents related by small portions of content, while predictive coding software identifies similar documents holistically, examining their content and context in their entirety.  Replicating that kind of nuanced retrieval with search terms would be immensely difficult, if not impossible.

For example, imagine that you’re at the library, looking for information on impressionist art.  If you search the library catalog for “impressionism,” you’ll find books with “impressionism” in the title or listed as the book’s subject.  What you likely won’t find are broader surveys of art history, of which impressionism is a part, or books about specific artists like Manet.  Searching case documents works much the same way: it turns up only a portion of what is relevant and misses the rest.
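
The library example can be mimicked in a few lines of Python.  This is only a toy: the three catalog entries are invented, and keyword_search is a hypothetical helper that matches the search term against each book's title and subject, which is roughly what a simple catalog search does.

```python
def keyword_search(catalog, term):
    """Return the titles whose title or subject contains the search term."""
    term = term.lower()
    return [
        book["title"]
        for book in catalog
        if term in book["title"].lower() or term in book["subject"].lower()
    ]

# Invented catalog entries, for illustration only.
catalog = [
    {"title": "Impressionism: A Short Guide", "subject": "impressionism"},
    {"title": "A Survey of Western Art History", "subject": "art history"},
    {"title": "Manet and His Circle", "subject": "artists"},
]

hits = keyword_search(catalog, "impressionism")

# Everything the search did not return, including books a reader would still want.
missed = [book["title"] for book in catalog if book["title"] not in hits]

print("Found:", hits)
print("Missed:", missed)
```

The search returns only the book that uses the exact term, while the art history survey and the book on Manet end up in the missed list even though both would help the reader.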


The Takeaway

In many ways, using predictive coding is like consulting an experienced librarian.  Her encyclopedic knowledge saves you from having to read every book to find just the one you want.  And just as her recommendations consider many aspects of a book (plot, setting, tone, genre, era, and more), predictive coding software considers many aspects of a document.  It looks not just at a document’s most distinctive words or its recipients, but at all of the document’s text and metadata.  That breadth is what makes predictive coding far more effective than simple search or manual review alone.

In part two of this series, we’ll look at humanizing predictive coding: what makes a tool more like a trusted librarian and less like a robot?