Tales of Proportionality and Predictive Coding

A funny thing happened in 2015: courts regularly applied proportionality analysis to the scope of discovery and discussed predictive coding like it was normal. Here is the shocking thing: both topics are “normal” and should not be considered earth-shattering.

A Case of Proportionality

One of the many 2015 cases in which proportionality intersected with predictive coding centered on the producing party seeking to limit discovery from 38 custodians to 10 custodians.[1] The producing party took the position that they should only have to load emails from 10 custodians, which would save them $18,000 in review costs.[2] The producing party also argued that limiting the number of custodians would “facilitate the predictive coding process.”[3]

The requesting party was not keen on the producing party’s cost-saving argument, calling the proposal “arbitrary.”[4] Moreover, the producing party allegedly refused to disclose information about the 38 custodians, making it difficult for the requesting party to determine which custodians the producing party should review.[5] In a move that threw Federal Rule of Civil Procedure 1 out the window, the producing party suggested that the requesting party take a Rule 30(b)(6) deposition to narrow the custodians.[6] Because the cost of the depositions would likely exceed $18,000, this unique cost-shifting argument effectively gutted any cost-saving argument.

Ultimately, the Court denied the producing party’s motion to limit discovery to 10 email custodians. Judge Mark Dinsmore’s explanation is illuminating:[7]

As the Court noted at the hearing, in the realm of electronic discovery, there are no guarantees that every relevant responsive document will be found. Even in the best case scenario, the process likely will not yield 100 percent production of all relevant material. But how many relevant responsive documents are too many to voluntarily walk away from? As Knauf pointed out, JM’s proposal would guarantee a zero percent recall for the 28 custodians not chosen. There is no way to predict how many non-duplicate relevant emails may be in the possession of those 28 custodians; individuals that JM itself identified as likely to possess relevant information. JM asserts, and it is reasonable to believe, that one of the key email custodians was likely copied on any relevant email sent or received by one of the more tangential custodians. Unlike with Dr. Alavi’s testimony and proposed date cutoff, there is no evidence to help the Court weigh how likely it is that ESI from the 28 tangential custodians would yield information relevant to the issues in this litigation. However, in a high value case such as this one, the burden of the additional $18,000 expense does not outweigh the potential benefit to Knauf of receiving those emails.

Proportionality should balance both cost and documents’ relevance to the case. While discovery does not require perfection in productions, it does require intelligent decisions that maximize the chance of finding responsive electronically-stored information.

The producing party’s proposal would effectively have been swinging wildly at pitches in the dark, unless the requesting party was willing to pay for the lighting of a baseball stadium. That is not proportionality and definitely not in the spirit of Federal Rule of Civil Procedure 1’s charge to secure the “just, speedy, and inexpensive determination of every action and proceeding.”

However, Judge Mark Dinsmore was sympathetic to the producing party’s cost-shifting request, acknowledging that the search might find only a limited number of responsive emails. The Court ordered that the requesting party pay the $18,000 for loading the emails of the 28 tangential custodians, IF fewer than 500 responsive emails were found.[8] If more than 500 responsive emails were found, then the producing party was to bear those discovery costs.[9]
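For readers who think in decision rules, the conditional cost-shifting described above can be written as a few lines of Python. This is only a sketch of the rule as summarized in this post, not the language of the order itself. The function name and return strings are made up for illustration, and because the summary above does not say what happens at exactly 500 responsive emails, the sketch flags that case instead of guessing.

```python
LOADING_COST = 18_000        # cost of loading the 28 tangential custodians, per the post
RESPONSIVE_THRESHOLD = 500   # threshold of responsive emails, per the post


def cost_bearer(responsive_emails, threshold=RESPONSIVE_THRESHOLD):
    """Return which party bears the loading cost under the rule as summarized.

    Hypothetical helper for illustration only.
    """
    if responsive_emails < threshold:
        # Few responsive emails: the requesting party pays for the search it wanted.
        return "requesting party"
    if responsive_emails > threshold:
        # The search was fruitful: the producing party bears its own discovery cost.
        return "producing party"
    # Exactly at the threshold: the summary above is silent, so do not guess.
    return "not addressed in the summary above"


if __name__ == "__main__":
    for count in (120, 500, 2_300):
        print(f"{count} responsive emails: ${LOADING_COST:,} paid by {cost_bearer(count)}")
```

The design point is that neither side knows the outcome in advance, so each has an incentive to be realistic about how valuable the 28 tangential custodians really are.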

Bow Tie Law Thoughts

Every competent lawyer wants to only review electronically-stored information that is relevant to the case, not every email ever sent by a party. “Document review” can easily turn into a wildly expensive voyeuristic dissection of other people’s lives. Proportionality is one way to guard against parties going off the deep end, thinking entire email mailboxes have to be reviewed.

Narrowing data prior to review is one way to further the goals of proportionality and to keep review costs down. Parties need to go beyond drive-through meet-and-confers and identify the relevant custodians of information. The next step is not dogmatically exporting every email from that person for review, but further narrowing to what is relevant.

  • What are the date ranges in the case?
  • Who are the participants in the relevant communications?
  • What are the relevant domain names, and what are the irrelevant ones?
  • What is the subject matter of the case?
  • How do we avoid search terms that can give false-positive results, like “agreement”?

It is highly unlikely that a patent case will need emails from Amazon, eBay, restaurants, political candidates – or anything that could fall under the category of “personal interests.” In a business dispute, those emails are likely irrelevant and should not be exported for review. Determining what data to exclude from importing is a huge step in avoiding additional ediscovery costs.
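The narrowing questions above translate naturally into a pre-review filter. The following is a minimal Python sketch, assuming emails have already been parsed into simple records. Every name in it (the Email record, keep_for_review, the example.com addresses, the date range, and the excluded domain list) is hypothetical and chosen for illustration. It does not represent how any particular processing or review platform works.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Email:
    sender: str
    recipients: tuple
    sent: date
    subject: str


def domain(address):
    """Return the lowercased domain portion of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def keep_for_review(email, start, end, participants, excluded_domains):
    """Apply three of the narrowing questions: date range, domains, participants."""
    if not (start <= email.sent <= end):
        return False  # outside the relevant date range
    if domain(email.sender) in excluded_domains:
        return False  # sent from a domain agreed to be irrelevant
    addresses = {email.sender.lower(), *(r.lower() for r in email.recipients)}
    return bool(addresses & participants)  # at least one relevant participant


# Hypothetical inputs for illustration.
participants = {"inventor@example.com", "licensing@example.com"}
excluded_domains = {"amazon.com", "ebay.com"}
start, end = date(2013, 1, 1), date(2015, 12, 31)

emails = [
    Email("inventor@example.com", ("licensing@example.com",), date(2014, 3, 2), "Draft claims"),
    Email("orders@amazon.com", ("inventor@example.com",), date(2014, 3, 5), "Your order has shipped"),
    Email("inventor@example.com", ("licensing@example.com",), date(2010, 1, 1), "Old project"),
    Email("hr@example.com", ("all@example.com",), date(2014, 4, 1), "Benefits enrollment"),
]

kept = [e.subject for e in emails
        if keep_for_review(e, start, end, participants, excluded_domains)]
print(kept)
```

Only the first message survives: the second comes from an excluded domain, the third falls outside the date range, and the fourth involves no relevant participant. In practice, the parties would want to agree on these criteria at the meet-and-confer before anyone applies them.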

Predictive Coding

This case also had a passing note that limiting the number of custodians would help “facilitate the predictive coding process.” While predictive coding technologies can vary, I agree with the concept of focusing what is exported for review.

In one case I worked on using Everlaw, I leveraged predictive coding by first conducting searches for communications that showed specific relevant conduct. We also trained the system on what was irrelevant by tagging newsletters and similar communications as “cold.” Coupling the prediction models with searches for relevant custodians, we were able to separate out information that was potentially relevant from what was clearly irrelevant.

In a perfect world, parties would identify potentially irrelevant domains and file types prior to that information being loaded into a review application. Unfortunately, we do not live in a perfect world, so identifying irrelevant information is one way to train the prediction engine in Everlaw on what not to look at. After this information is coded as “Cold,” it can be removed from the database.
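To make the idea of training on both relevant and irrelevant examples concrete, here is a toy Python sketch of a word-frequency scorer. This is emphatically not Everlaw's algorithm, which this post does not describe. It is a generic, simplified illustration in the style of a naive Bayes classifier, and the "hot" label, function names, and sample documents are all assumptions made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_docs):
    """Count words per label from (text, label) pairs; label is 'hot' or 'cold'."""
    counts = {"hot": Counter(), "cold": Counter()}
    for text, label in labeled_docs:
        counts[label].update(tokenize(text))
    return counts


def score(counts, text):
    """Sum smoothed log-odds per known word: positive leans hot, negative leans cold."""
    vocab = set(counts["hot"]) | set(counts["cold"])
    totals = {label: sum(c.values()) + len(vocab) for label, c in counts.items()}
    total = 0.0
    for token in tokenize(text):
        if token not in vocab:
            continue  # words never seen in training carry no signal
        p_hot = (counts["hot"][token] + 1) / totals["hot"]    # add-one smoothing
        p_cold = (counts["cold"][token] + 1) / totals["cold"]
        total += math.log(p_hot / p_cold)
    return total


# Hypothetical training examples: reviewer-coded documents.
training = [
    ("Let's meet Friday to discuss the licensing terms before the board finds out", "hot"),
    ("Do not tell the board about the licensing meeting", "hot"),
    ("Weekly newsletter: unsubscribe here for deals", "cold"),
    ("Your newsletter subscription: new deals this week", "cold"),
]
model = train(training)

for doc in ("Can we discuss the licensing meeting", "newsletter deals unsubscribe"):
    label = "leans hot" if score(model, doc) > 0 else "leans cold"
    print(f"{doc!r}: {label}")
```

Even in this toy version, the cold examples matter as much as the hot ones: without the coded newsletters, the scorer would have nothing to push promotional emails toward the irrelevant side.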

Predictive Coding

Parties should base their review on some very basic questions:

  • What is the story I want to tell the jury?
  • Who are the characters in this story?
  • What facts must I present to meet each element of the jury instructions?

An outline with this information can be created in Everlaw’s StoryBuilder Outlines tool. The new Chronology feature is also a powerful way to organize facts supporting your case story in a timeline of events.

Document review is substantially easier when you know what you are trying to prove. Once you do, you can craft searches to identify specific information that supports your argument, such as a communication between two individuals that shows a nefarious meeting (insert your favorite breach of duty here).

Being able to search effectively and code intelligently is the ideal way to train a predictive coding system on what is, and is not, relevant to a case. This empowers lawyers to identify what facts they need to support their arguments. More importantly, this workflow can be done in a way that furthers Federal Rule of Civil Procedure 1 and proportionality under Federal Rule of Civil Procedure 26(b)(1).

To see how Everlaw’s predictive coding would work for your case, feel free to contact them.

[1] Knauf Insulation, LLC v. Johns Manville Corp. (S.D.Ind. Nov. 13, 2015, No. 1:15-cv-00111-WTL-MJD) 2015 U.S. Dist. LEXIS 153506, at *7-9.
[2] Knauf, at *7.
[3] Id.
[4] Id.
[5] Id.
[6] Id.
[7] Knauf, at *7-8, emphasis added.
[8] Knauf, at *9.
[9] Id.