DeNISTing in Ediscovery: What It Is and Why It Matters
by David Pemberton
Not all data is created equal in the world of ediscovery. That’s why navigating modern litigation requires a strategy for dealing with the junk data that often clogs up and slows down the ediscovery process.
Electronically stored information (ESI) can include relevant and valuable data, like chat logs or email threads, alongside junk data, like application and operating system files. These junk files are not user-generated, contain no relevant data, and offer no advantage in early case assessment (ECA).
Thankfully, irrelevant files can be removed before review begins with the help of a process called deNISTing.
Key Takeaways for Legal Teams
DeNISTing is an ediscovery data-filtering process that removes known, non-user-generated files such as system, application, and executable files.
DeNISTing utilizes the NIST list, which is maintained by the National Institute of Standards and Technology.
DeNISTing is different from deduping, which removes exact duplicate copies of documents from a dataset.
DeNISTing takes place during data ingestion and is designed to give legal teams a head start in ECA.
What Is DeNISTing?
To put it simply, deNISTing is an easy way to remove irrelevant data in the early stages of ediscovery. The process is a version of data filtering that removes system files, program files, and other immaterial data. In culling these files, deNISTing simplifies ediscovery by making ESI easier to navigate and analyze.
The files removed by deNISTing are generally essential for running operating systems and software applications, but they aren’t going to contain any valuable information for either party in a legal matter. A common example of the data culled by deNISTing would be an .exe file, such as the installer used to set up a new program. Anyone who has installed a new program on their computer has likely engaged with an .exe file. Once the program is installed, the installer is no longer useful and can safely be deleted. When included in ediscovery, data like .exe files add unneeded complexity to review, wasting both time and money.
The NIST List
The deNISTing process is built on work provided by the National Institute of Standards and Technology (NIST), an agency of the U.S. Department of Commerce. Through a project called the National Software Reference Library (NSRL), NIST maintains an authoritative reference list of hash values for known, non-user-generated files that ship with software.
This list is updated quarterly, which means it keeps pace with newly released software, thereby acting as a widely accepted authority for the process. DeNISTing takes advantage of the NIST list by using it to identify irrelevant files.
How DeNISTing Works
Like many aspects of ediscovery, deNISTing might sound complicated at first. The important thing to remember is that the process checks every file in a dataset against the known files recorded in the NIST list. Here’s a step-by-step breakdown of how it all works:
Hash Value
DeNISTing cleans up data by using something called an “MD5 Hash Value,” which acts similarly to a physical barcode. Just as a barcode identifies a specific product on a shelf, every unique file has its own specific hash value.
Comparison
The hash value calculated for each file in a dataset is then compared to the NIST list, and any file with a matching hash value is automatically tagged.
Culling
The tagged files are then removed from the dataset, often greatly reducing data volume prior to review.
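The three steps above can be sketched in a few lines of Python. This is only an illustration, not how any particular ediscovery platform implements the process: the function names md5_of_file and denist are hypothetical, and known_hashes is assumed to be a set of uppercase MD5 strings that has already been loaded from a copy of the NIST list.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash Value step: return the uppercase hex MD5 digest of a file."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read in chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def denist(paths, known_hashes):
    """Comparison and Culling steps: split paths into (kept, culled).

    known_hashes is an assumption of this sketch: a set of uppercase MD5
    strings taken from the NIST list.
    """
    kept, culled = [], []
    for path in paths:
        if md5_of_file(path) in known_hashes:
            culled.append(path)  # matches a known file, so it is filtered out
        else:
            kept.append(path)    # no match, so it moves on to review
    return kept, culled
```

Note that the match is made on file contents, not on the file name or extension, so renaming a known system file does not change the result.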
Common File Types Removed by DeNISTing
The file types culled by deNISTing are those used to install or run other programs and assets on a computer or device. Below are a few of the most common non-user-generated file types culled in deNISTing:
Executable files such as .exe, .com, .bat, and .cmd
System files such as .dll, .sys, .ini, and .ocx
Fonts including .ttf, .fon, and .otf
Shortcut files such as .lnk and .pif
While the above list is in no way comprehensive, it’s important to remember that the NIST list is updated on a quarterly basis, which means new entries are added regularly. Thankfully, the most up-to-date version of the list is always available via the official NIST National Software Reference Library.
Why DeNISTing Matters
DeNISTing is an essential first step for reviewers facing mountains of digital evidence. Legal teams can drastically reduce data volumes before review begins by automatically filtering out the known system files identified by the NIST list.
Clearing out digital clutter also streamlines ECA by giving reviewers a considerable head start in identifying substantive evidence. Legal teams are generally looking for specific information produced by a party in a given legal matter, information that is likely found in files like emails, documents, and chat logs. These teams don’t want, or need, access to the innumerable files that make email, word processing, and chat programs operate.
DeNISTing vs. Deduping
Deduping (or deduplication) identifies identical files within a dataset and culls the redundant copies. Similar to deNISTing, deduplication generates a hash value to represent each file’s binary content. If two files share the same hash value they are considered identical, and only one copy is retained while the others are removed. Near-identical files produce different hash values, so hash-based deduplication on its own does not catch them.
How Deduping Works
Deduping can save reviewers time and energy by avoiding the need to sift through and manually parse identical documents. Similar to deNISTing, the process begins with generating a hash value for every document.
Hash Value
Deduplication identifies redundant data by using the same MD5 Hash Value system as deNISTing. When the process begins, every file is assigned a hash based on its contents.
Comparison
During the deduplication process, the system calculates the hash value for every file and compares those values against one another. When the system identifies multiple files with the exact same hash value, it flags them as duplicates.
Culling
Once duplicates are identified, the system keeps one master copy and culls the redundant versions. This process makes sure reviewers only see a unique document once, greatly reducing data volume.
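As a rough sketch of the same three steps, the Python below keeps the first file seen for each hash value and sets aside every later match. The names file_md5 and dedupe are hypothetical and are used here only for illustration; real platforms may choose the master copy differently.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path) -> str:
    """Hash Value step: hex MD5 digest of a file's full contents."""
    # Reading the whole file at once keeps the sketch short; chunked
    # reading would be more appropriate for very large files.
    return hashlib.md5(path.read_bytes()).hexdigest()


def dedupe(paths):
    """Comparison and Culling steps: return (masters, duplicates)."""
    masters = {}     # hash value -> first path seen with that content
    duplicates = []  # later paths whose content matched a master copy
    for path in paths:
        key = file_md5(path)
        if key in masters:
            duplicates.append(path)  # same hash as an earlier file
        else:
            masters[key] = path      # first time this content appears
    return list(masters.values()), duplicates
```

Because the hash covers every byte of a file, a copy that differs by even a single character receives a different hash value and is treated as a unique document.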
Of course, deduping in no way replaces the work done by deNISTing. That’s why, when dealing with large, complex datasets, it should be considered a best practice to add both deNISTing and deduplication to ingestion workflows.
Key Similarities
Both utilize hash values to identify files quickly.
Both serve as essential filters during ECA by reducing the volume of data.
Both are best performed during the ingestion phase, after collection and before review.
Big Differences
DeNISTing removes irrelevant junk data, while deduplication removes duplicate, user-created data.
DeNISTing targets non-user-generated files like .exe files, while deduplication targets identical copies of documents.
DeNISTing compares file hashes against the NIST list, while deduplication compares file hashes against each other.
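The last difference is the easiest to see side by side. In the minimal sketch below, the deNISTing check looks each hash up in an external reference set, while the deduplication check looks it up among the hashes already seen in the same dataset. The function filter_dataset and its parameters are hypothetical, documents are held in memory as bytes purely for brevity, and known_hashes stands in for hash values loaded from the NIST list.

```python
import hashlib


def digest(content: bytes) -> str:
    """Hex MD5 digest of a document's bytes."""
    return hashlib.md5(content).hexdigest()


def filter_dataset(documents, known_hashes):
    """Return the names of documents that survive both filters.

    documents: dict mapping a document name to its bytes (an assumption
    of this sketch, not a real ingestion format).
    known_hashes: set of hex MD5 strings standing in for the NIST list.
    """
    seen = set()
    survivors = []
    for name, content in documents.items():
        value = digest(content)
        if value in known_hashes:  # deNISTing: compare to the reference list
            continue
        if value in seen:          # deduplication: compare to the dataset itself
            continue
        seen.add(value)
        survivors.append(name)
    return survivors
```

Both checks reuse the same hash value, which is one reason the two filters fit naturally into a single ingestion pass.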
Creating a Workflow That Works
Automatically filtering out irrelevant files is just one of the ways in which ediscovery platforms bridge the gap between complex technical processing and practical legal workflows. By utilizing deNISTing to cull massive datasets, legal teams can bypass the noise and direct their expertise toward the high-value files that actually influence case strategy.
Learn more about how Everlaw supports efficient document review workflows by helping legal teams process large amounts of data efficiently while maintaining transparency throughout pretrial discovery.
Frequently Asked Questions
What file types are removed by deNISTing?
DeNISTing targets system and application files that are not user-created. While these files (such as .exe files) are essential for running software, they almost never contain evidentiary value in standard litigation. By filtering these out during ingestion, legal teams can significantly reduce data volume without losing valuable ESI.
Is deNISTing required in ediscovery?
No, deNISTing isn’t explicitly mandated. Instead, it’s considered a best practice because it’s an automated process based on a verified national standard. Practically speaking, there are very few reasons not to take full advantage of the deNISTing process.
When should deNISTing happen?
DeNISTing takes place during the ingestion phase, after data is collected but before it is reviewed. Filtering at this stage helps to lower costs and increase efficiency by removing irrelevant system files and expediting human review.
How is deNISTing different from deduplication?
Both deNISTing and deduplication are forms of automated data filtering, but they target different kinds of files and cull different material. DeNISTing removes junk files that were not user-created, while deduplication removes redundant copies of identical files.
David Pemberton is an associate content marketer at Everlaw. His writing explores the influence of emerging technologies on the practice of law.