How to Make Use of Predictive Coding and Search Terms When Producing Discovery

by Joshua Gilliland

February 21, 2018

There are many ways to achieve a focused dataset for review using both search terms and predictive coding.

Magistrate Judge Katherine Parker issued a detailed opinion on the use of predictive coding in a recent discrimination case. In Winfield v. City of N.Y., the Plaintiffs provided 665 additional search terms to be applied to the Defendants’ review database. (Winfield v. City of N.Y., 2017 U.S. Dist. LEXIS 194413 (S.D.N.Y. Nov. 27, 2017)) The supplemental searches would have added 90,000 more records and cost approximately $248,000 to review. The Defendants agreed to run the additional searches, but stated they would use predictive coding to narrow the data set for review. The Court itself had been the first to recommend the use of predictive coding to help expedite discovery review.

Coupling the use of search terms with predictive coding in virtually any well-planned workflow can help attorneys conduct document review that is proportional to the needs of the case.

Challenging Predictive Coding

In the Winfield v. City of N.Y. case, the Plaintiffs were concerned about the reliability of the Defendants’ predictive coding workflow, claiming that the Defendants had a narrow view of what was responsive and had over-designated documents as privileged and non-responsive. The Plaintiffs’ arguments were based upon two documents that were withheld, but for which the extracted text was inadvertently produced with placeholder images. The Plaintiffs claimed the withheld records were relevant and should have been produced.

In deciding the Plaintiffs’ challenge to the Defendants’ predictive coding workflow, Judge Parker provided a detailed summary of discovery production cases, ultimately finding for the Defendants. First, she found, producing parties are in the best position to “evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own electronically stored information.” (See Hyles v. New York City, 2016 U.S. Dist. LEXIS 100390, 2016 WL 4077114, at *3 (S.D.N.Y. Aug. 1, 2016) (citing Principle 6 of the Sedona Conference))

Moreover, courts have not traditionally “micro-managed parties’ internal review processes,” for two reasons: 1) “attorneys are officers of the court who are expected to comply with Rules 26 and 34 in connection with their search, collection, review and production of documents, including ESI;” and 2) courts seek to avoid putting attorneys in a position where they may end up disclosing work product, litigation tactics, and trial strategy. (See generally Disability Rights Council of Greater Wash. v. Wash. Metro. Transit Auth., 242 F.R.D. 139, 142-43 (D.D.C. 2007))

The Court further explained that while perfection is not required in producing discovery, a producing party must take “reasonable steps to identify and produce relevant documents.” (HM Elecs., Inc. v. R.F. Techs., Inc., 2015 U.S. Dist. LEXIS 104100, 2015 WL 4714908, at *12 (S.D. Cal. Aug. 7, 2015), vacated in part on other grounds, 171 F. Supp. 3d 1020 (S.D. Cal. 2016)) This means parties cannot put in “half hearted and ineffective efforts to identify and produce relevant documents.” (Bratka v. Anheuser-Busch Co., Inc., 164 F.R.D. 448, 463 (S.D. Ohio 1995))

Based on the above principles from prior cases, Judge Parker stated:

“…This Court is of the view that there is nothing so exceptional about ESI production that should cause courts to insert themselves as super-managers of the parties’ internal review processes, including training of TAR software, or to permit discovery about such process, in the absence of evidence of good cause such as a showing of gross negligence in the review and production process, the failure to produce relevant specific documents known to exist or that are likely to exist, or other malfeasance.”

The Court rejected the Plaintiffs’ arguments that the Defendants’ training of the predictive coding system was either grossly negligent or unreasonable. The Defendants were thus ordered to conduct their supplemental review with the blended workflow of search terms and predictive coding.

Final Thoughts

Magistrate Judge Katherine Parker’s opinion on predictive coding was a thorough one, covering both the law and how the Defendants trained the predictive coding system. From the context of the opinion, which discusses training on a seed set, the predictive coding system seems to have used “simple passive learning” rather than “continuous active learning.” Either technology would be better than manually reviewing 90,000 records; however, one advantage of continuous active learning is that all review decisions automatically train the engine, which continuously updates the rankings for predictions.
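To make the difference concrete, here is a minimal sketch in Python. It is a toy illustration only: the opinion does not describe the vendor's algorithm, and the word-vote scorer, the function names, and the six sample documents are all assumptions invented for this example. The passive version trains once on the seed set and ranks everything else a single time; the continuous version retrains after every review decision and re-ranks before choosing the next document.

```python
from collections import Counter


class WordVoteRanker:
    """Toy relevance model (hypothetical): words vote based on prior coding."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}

    def train(self, text, responsive):
        # Every coded document adds its words to the matching tally.
        self.counts[responsive].update(text.lower().split())

    def score(self, text):
        # Higher score = more words previously seen in responsive documents.
        pos, neg = self.counts[True], self.counts[False]
        return sum(pos[w] - neg[w] for w in text.lower().split())


def simple_passive_learning(docs, seed_ids, reviewer):
    """Train once on the seed set, then rank the remaining documents once."""
    model = WordVoteRanker()
    for doc_id in seed_ids:
        model.train(docs[doc_id], reviewer(doc_id))
    remaining = [d for d in docs if d not in seed_ids]
    return sorted(remaining, key=lambda d: model.score(docs[d]), reverse=True)


def continuous_active_learning(docs, seed_ids, reviewer, budget):
    """Review the top-ranked document, retrain on that decision, re-rank, repeat."""
    model = WordVoteRanker()
    for doc_id in seed_ids:
        model.train(docs[doc_id], reviewer(doc_id))
    unreviewed = set(docs) - set(seed_ids)
    review_order = []
    while unreviewed and len(review_order) < budget:
        # sorted() makes tie-breaking deterministic (lowest id wins a tie).
        next_id = max(sorted(unreviewed), key=lambda d: model.score(docs[d]))
        model.train(docs[next_id], reviewer(next_id))  # decision feeds the model
        unreviewed.remove(next_id)
        review_order.append(next_id)
    return review_order


# Invented sample data; `truth` stands in for the human reviewer's coding calls.
docs = {
    1: "housing policy complaint discrimination",
    2: "lunch menu pizza",
    3: "discrimination complaint tenant",
    4: "tenant lottery preference",
    5: "pizza party friday",
    6: "lottery preference policy",
}
truth = {1: True, 2: False, 3: True, 4: True, 5: False, 6: True}


def reviewer(doc_id):
    return truth[doc_id]


passive_order = simple_passive_learning(docs, [1, 2], reviewer)
cal_order = continuous_active_learning(docs, [1, 2], reviewer, budget=4)
print("passive:", passive_order)
print("continuous:", cal_order)
```

In this toy data, document 4 shares no words with the seed set, so the passive ranking has no signal for it. The continuous loop picks up the word "tenant" from the reviewer's decision on document 3 and moves document 4 up as a result. Real predictive coding engines use far more sophisticated models, but the feedback loop is the point of the comparison.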

The Defendants’ workflow, which involved applying search terms before using predictive coding to focus in on documents to review, is common among firms conducting document review. The prospective search terms applied to a dataset with “or” searches could have a staggering number of hits, likely with a large number of false-positive records for review. Predictive coding is one way to help attorneys spend their review time on the documents most likely to be responsive.

Another option is to develop search strings, so that a single keyword is not the sole basis for identifying a record for review. One example of such a string is proximity searching between specific keywords. Another could be a search of email messages between specific individuals, over a set timeframe, with specific content searches in the messages. There are many ways to achieve a focused dataset for review with search terms. Coupling the use of search terms with predictive coding can help attorneys conduct document review that is proportional to the needs of the case.
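The two examples of search strings can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular review platform implements searching: the `Email` record, the function names, the addresses, and the sample messages are all hypothetical, and "within N words" is taken here to mean the two terms sit at most N word positions apart.

```python
import re
from dataclasses import dataclass
from datetime import date


def within_proximity(text, term_a, term_b, max_gap):
    """True if term_a and term_b appear at most max_gap word positions apart."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)


@dataclass
class Email:  # hypothetical record layout, for illustration only
    sender: str
    recipients: list
    sent: date
    body: str


def compound_search(emails, people, start, end, term_a, term_b, max_gap):
    """Emails between the named people, in the date range, with a proximity hit."""
    hits = []
    for msg in emails:
        between_people = msg.sender in people and any(
            r in people for r in msg.recipients
        )
        in_range = start <= msg.sent <= end
        if between_people and in_range and within_proximity(
            msg.body, term_a, term_b, max_gap
        ):
            hits.append(msg)
    return hits


# Invented sample messages.
custodians = {"alice@example.com", "bob@example.com"}
emails = [
    Email("alice@example.com", ["bob@example.com"], date(2016, 3, 1),
          "The lottery preference was raised again by the tenant group."),
    Email("alice@example.com", ["bob@example.com"], date(2016, 3, 2),
          "I won the office lottery! Separately, do you have a lunch "
          "preference for the team meeting on Friday?"),
    Email("alice@example.com", ["carol@example.com"], date(2016, 3, 3),
          "Draft memo on the lottery preference attached."),
    Email("bob@example.com", ["alice@example.com"], date(2018, 1, 5),
          "Following up on the lottery preference memo."),
]

hits = compound_search(
    emails, custodians, date(2016, 1, 1), date(2016, 12, 31),
    "lottery", "preference", max_gap=3,
)
print(len(hits), "of", len(emails), "messages selected for review")
```

A plain “or” search for either keyword would return all four sample messages. The compound search returns only the first, because the second message uses the terms far apart in an unrelated context, the third goes to someone outside the named individuals, and the fourth falls outside the date range.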

Judge Parker’s opinion provides guidance regarding options for challenging a party’s discovery methodology: evidence of good cause showing gross negligence in review, failure to produce relevant specific documents known to exist, or other malfeasance. This is a high bar, but not an impossible one to meet. Successful challenges would require showing a production was incomplete, contained a large volume of documents that were clearly irrelevant, or was extremely small. Such challenges would be highly fact-specific in order to demonstrate either gross negligence or malfeasance.

Joshua Gilliland is a California attorney and nationally recognized thought leader on electronic discovery with his blog Bow Tie Law. Josh is the co-creator of The Legal Geeks and has presented at legal conferences and comic book conventions across the United States.