10 Ediscovery Terms You Should Learn Today

by Jon Kerry-Tyerman


The last 10 years have brought enormous changes to many technical fields, and ediscovery is no exception. Litigating a case today often involves reviewing thousands of documents, if not far more, ranging from CAD files to emails to social media posts. Understanding ediscovery and its terminology is increasingly considered a basic lawyerly duty.

Today, we’re assumed to know what ediscovery project managers and other experts mean when they use terms such as de-duping, de-NISTing, and batching. But to most people this vocabulary is like a foreign language you studied for a semester in college: recognizable, but not understandable.

Here are 10 ediscovery terms you should learn now if you want to keep up:

Batch processing: This is simply the processing of a large amount of data, or multiple records, in a single step.
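
To make the idea concrete, here is a minimal Python sketch, assuming the "records" are just document IDs in a list. The function names `batches` and `process_in_batches`, the `DOC-` ID format, and the batch size of 3 are all hypothetical choices for illustration, not any review platform's API. The point it shows is that the handler runs once per batch rather than once per record.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # the final, possibly smaller, batch
        yield batch

def process_in_batches(items: Iterable[T], size: int,
                       handle_batch: Callable[[List[T]], List[R]]) -> List[R]:
    """Call `handle_batch` once per batch instead of once per item."""
    results: List[R] = []
    for batch in batches(items, size):
        results.extend(handle_batch(batch))
    return results

# Example: "process" 7 hypothetical document IDs, 3 at a time.
doc_ids = [f"DOC-{n:04d}" for n in range(1, 8)]
calls: List[List[str]] = []

def tag_batch(batch: List[str]) -> List[str]:
    calls.append(batch)  # record each single-step call for inspection
    return [doc_id + ":processed" for doc_id in batch]

processed = process_in_batches(doc_ids, 3, tag_batch)
print(len(calls), "calls;", processed[-1])
```

Seven records become three calls (batches of 3, 3, and 1), which is where the efficiency of batch processing comes from when each call carries real overhead.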

Big data: This is a term used to describe the rapidly growing collections of complex data sets that are hard to process using traditional database management tools. The upside of big data is that more data generally means more interesting information, particularly once you’re using the right tools to analyze it.

De-duplication (de-duping): This is the process of removing from view, by either hiding or deleting, files that are largely or entirely duplicative of other files in your collection. This can, of course, save you a great deal of expense, but it can be controversial if “duplicate” is loosely defined.
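
A common way to find exact duplicates is to compare a cryptographic hash of each file's contents. The sketch below is a minimal illustration, assuming each document's bytes are already in memory; many ediscovery tools use MD5 or SHA-1 hash values, while this sketch uses SHA-256 from Python's standard library. The function name `deduplicate` and the sample file paths are hypothetical. Note that it only catches byte-for-byte duplicates: the "largely duplicative" case (near-duplicates) needs different techniques, and that is where the definitional controversy tends to arise.

```python
import hashlib
from typing import Dict, List, Tuple

def deduplicate(docs: Dict[str, bytes]) -> Tuple[List[str], Dict[str, str]]:
    """Suppress exact duplicates by content hash.

    Returns (kept_ids, duplicate_of), where duplicate_of maps each suppressed
    document ID to the ID of the copy that was kept. Nothing is deleted, so
    suppressed documents can still be restored to view.
    """
    first_seen: Dict[str, str] = {}  # content hash -> ID of first copy seen
    kept: List[str] = []
    duplicate_of: Dict[str, str] = {}
    for doc_id, content in docs.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in first_seen:
            duplicate_of[doc_id] = first_seen[digest]  # hide, don't delete
        else:
            first_seen[digest] = doc_id
            kept.append(doc_id)
    return kept, duplicate_of

# Hypothetical collection from two custodians.
docs = {
    "custodian_a/forecast.txt": b"Q3 revenue forecast",
    "custodian_b/forecast_copy.txt": b"Q3 revenue forecast",
    "custodian_b/forecast_v2.txt": b"Q3 revenue forecast (revised)",
}
kept, duplicate_of = deduplicate(docs)
print(kept)
print(duplicate_of)
```

The exact copy is suppressed, but the revised version survives even though it differs by only a word, which shows why how loosely "duplicate" is defined matters.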

De-NISTing: This refers to the removal of system files, program files, and other computer-generated data from your ediscovery collection. The name comes from the acronym for the National Institute of Standards and Technology (NIST), which curates the master reference list of such files, the National Software Reference Library.
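
In practice, de-NISTing is a hash lookup: compute a hash of each collected file and drop any file whose hash appears on the known-file list. Below is a minimal sketch, assuming the reference hashes have already been loaded into memory. The choice of SHA-1 is an assumption (use whichever algorithm your reference list provides), and the function names, file paths, and the stand-in "system file" bytes are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List

def sha1_hex(content: bytes) -> str:
    """Uppercase hex SHA-1 digest of a file's bytes."""
    return hashlib.sha1(content).hexdigest().upper()

def de_nist(files: Dict[str, bytes], known_hashes: Iterable[str]) -> List[str]:
    """Return the names of files whose hash is NOT on the known-file list."""
    known = {h.upper() for h in known_hashes}  # normalize case once
    return [name for name, content in files.items()
            if sha1_hex(content) not in known]

# Hypothetical collection: one stand-in "system file" and one user document.
system_file = b"stand-in bytes for a stock operating-system file"
files = {
    "Windows/System32/helper.dll": system_file,
    "Users/jdoe/Documents/memo.docx": b"Draft memo to counsel",
    "Users/jdoe/Desktop/renamed.bin": system_file,  # same bytes, new name
}
# In practice this set would be loaded from a published reference list.
known_hashes = {sha1_hex(system_file)}
print(de_nist(files, known_hashes))
```

Because matching is on content rather than file name, the renamed copy of the system file is still removed, while the user's memo is kept.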

Document family: A document family is a set of documents related to one another as parent and children. Most of the time, this means an email and its attachments, but it can also refer to any document that has other documents embedded within it (the embedded documents are considered “children” of the parent container document).
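
One way to picture a family is as a small tree in which each document records its parent. The sketch below is illustrative only: the `Document` class, its `parent_id` field, and the IDs are hypothetical, not a specific review platform's schema. Given any member, even a file nested inside an attachment, it walks up to the top-level parent and then collects the whole family, which matters because families are often reviewed or produced together.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Document:
    doc_id: str
    parent_id: Optional[str] = None  # None marks a top-level parent

def family_root(doc_id: str, docs: Dict[str, Document]) -> str:
    """Walk up parent links until reaching the top-level parent."""
    current = docs[doc_id]
    while current.parent_id is not None:
        current = docs[current.parent_id]
    return current.doc_id

def family_members(doc_id: str, docs: Dict[str, Document]) -> List[str]:
    """Return every document in doc_id's family, parent first."""
    children: Dict[str, List[str]] = {}
    for doc in docs.values():
        if doc.parent_id is not None:
            children.setdefault(doc.parent_id, []).append(doc.doc_id)
    members: List[str] = []
    stack = [family_root(doc_id, docs)]
    while stack:  # depth-first walk down from the parent
        current = stack.pop()
        members.append(current)
        # reversed() so children come out in their original order
        stack.extend(reversed(children.get(current, [])))
    return members

docs = {d.doc_id: d for d in [
    Document("EMAIL-001"),                       # parent email
    Document("ATT-001", parent_id="EMAIL-001"),  # attached .docx
    Document("ATT-002", parent_id="EMAIL-001"),  # attached .zip
    Document("ATT-003", parent_id="ATT-002"),    # file inside the .zip
    Document("EMAIL-002"),                       # unrelated email
]}
print(family_members("ATT-003", docs))
```

Starting from the file buried inside the zip attachment still yields the full family: the email, both attachments, and the embedded file.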

ESI: ESI stands for Electronically Stored Information. It’s a fairly generic term that includes emails, documents, presentations, databases, voicemail, audio and video files, social media, and websites.

Metadata: Metadata is data about the data. For a given document, the metadata might describe its characteristics, origins, usage, and/or validity. While this information is typically not visible in a printed copy or on screen, it provides key context for how the document came to be, where it’s been, and what it’s all about.
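
As a small illustration, the operating system itself keeps some metadata about every file. This sketch reads a few file-system fields with Python's standard `os.stat`; the function name and the dictionary keys are hypothetical choices, not a standard load-file format. It covers only what the file system tracks: application metadata such as a Word document's author is stored inside the file and needs format-specific tools, which this sketch does not attempt.

```python
import os
import tempfile
from datetime import datetime, timezone
from typing import Dict, Union

def file_system_metadata(path: str) -> Dict[str, Union[str, int]]:
    """Collect a few file-system metadata fields for one file."""
    st = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        "size_bytes": st.st_size,
        # st_mtime is seconds since the epoch; render it as ISO 8601 in UTC
        "modified_utc": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
    }

# Demo on a throwaway file; a real collection would walk a custodian's folders.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "memo.txt")
    with open(path, "wb") as fh:
        fh.write(b"Draft memo to counsel")
    print(file_system_metadata(path))
```

None of these fields appear when the memo is printed, yet together they say what the file is called, how large it is, and when it last changed.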

Native file: A native file is one still in the format of the software that originally produced it. Examples include Microsoft Office files still in .DOC, .XLS, or .PPT format, and emails still in the .PST container from Microsoft Outlook or Exchange.

Predictive coding: This is perhaps the most frequently used term in ediscovery in 2014, and it refers to the process of analyzing the contents and metadata of the documents you’ve already coded in order to predict how you might code all the other documents in your collection.
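
Commercial predictive coding tools use their own, considerably more sophisticated models, and they typically weigh metadata as well as text. Purely to show the core idea (learn from documents reviewers have already coded, then predict codes for the rest), here is a toy multinomial naive Bayes classifier that looks only at document text. The choice of naive Bayes, the function names, the labels, and the four sample "coded" documents are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

Model = Tuple[Counter, Dict[str, Counter], Set[str]]

def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def train(coded: List[Tuple[str, str]]) -> Model:
    """Count words per coding label from already-coded (text, label) pairs."""
    doc_counts: Counter = Counter()
    word_counts: Dict[str, Counter] = {}
    vocab: Set[str] = set()
    for text, label in coded:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return doc_counts, word_counts, vocab

def predict(model: Model, text: str) -> Optional[str]:
    """Return the label with the highest naive Bayes log-probability."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in doc_counts.items():
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # prior: share of coded docs
        for tok in tokenize(text):
            # Add-one smoothing keeps unseen words from zeroing out the score.
            score += math.log((counts[tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical documents a reviewer has already coded.
coded = [
    ("merger pricing discussion with acquirer", "responsive"),
    ("acquirer due diligence on merger terms", "responsive"),
    ("office holiday party schedule", "not responsive"),
    ("cafeteria menu and parking update", "not responsive"),
]
model = train(coded)
print(predict(model, "draft merger terms for the acquirer"))
print(predict(model, "holiday parking reminder"))
```

With so few training documents the predictions are only as good as the words they happen to share; real review workflows train on far larger coded sets and validate the results before relying on them.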

Spoliation: Spoliation is the destruction or concealment of relevant evidence, whether intentional or merely negligent. It’s a term you will only hear in a legal setting, and it can have very serious legal consequences for the “spoliator.”

Of course, no list of 10 terms can cover the whole dictionary of ediscovery jargon you’re likely to encounter in practice.  So, tell us: what have we missed?