What do toys stowed in playrooms, items stashed in junk drawers, and troves of documents stored electronically have in common?
It is generally much easier to find what you’re looking for in these and other locations when similar items are grouped together. Grouping like with like speeds up the search and shows how much you have of a particular item, potentially saving both time and money. When ediscovery professionals attempt to make sense of mountains of unknown documents, knowing where to begin can be highly challenging. However, analytic tools such as clustering can dramatically streamline the process.
The primary purpose of clustering is to group similar items so that users can recognize the characteristics or topics that make them similar. Clustering groups documents together based on their content, providing a higher level of understanding regarding the themes and concepts prevalent throughout the dataset. Document clustering utilizes machine learning to quickly pinpoint conceptually similar documents in a dataset, without requiring anyone to build a search manually.
How Does Clustering Work?
Clustering performs the electronic equivalent of putting documents into labeled boxes so that things only end up in the same box if they fit together. Once similar documents are grouped, each group can be assigned to the same reviewer(s), allowing for a more efficient review because related documents are reviewed together.
Clustering software performs three essential functions:
- Examining the text contained in a set of documents
- Determining which documents are related to each other
- Grouping them into conceptually similar clusters
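As a rough illustration of those three functions, here is a minimal Python sketch. It does not show how any particular ediscovery product works: the word-count vectors, the cosine similarity measure, the 0.3 threshold, the stopword list, and every function name are assumptions chosen to keep the example small. Real tools generally rely on more sophisticated machine learning.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; real tools use far larger ones.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "for", "in", "on", "is", "at"}

def vectorize(text):
    """Step 1: examine the text and reduce it to word counts."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(a, b):
    """Step 2: score how related two documents are (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.3):
    """Step 3: place each document in the first cluster whose seed it resembles."""
    vectors = [vectorize(d) for d in docs]
    clusters = []  # each cluster is a list of document indices; the first is the seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])  # nothing similar enough, so start a new cluster
    return clusters

docs = [
    "Quarterly revenue forecast for the sales team",
    "Lunch menu for the office party",
    "Revised revenue forecast and sales targets",
    "Office party catering and lunch order",
]
print(cluster(docs))  # [[0, 2], [1, 3]]
```

The two revenue documents land in one cluster and the two office party documents in another, even though nobody typed a search term.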
Clustering organizes documents according to the arrangement that occurs naturally, without query terms. Each cluster is labeled with a group of keywords, providing a quick overview of the cluster that explains what the documents have in common at a conceptual level. The keywords give an immediate indication of what each cluster contains, allowing users to identify the themes of the document set more efficiently.
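To make the idea of keyword labels concrete, here is a small Python sketch. The scoring rule (favor words that appear in many of a cluster's documents and in few documents elsewhere), the stopword list, and the names `tokens` and `label_clusters` are assumptions made for illustration, not the method any specific tool uses.

```python
import re
from collections import Counter

# Hypothetical stopword list for the example.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "for", "in", "on", "is", "at"}

def tokens(text):
    """Return the set of distinct non-stopword terms in a document."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def label_clusters(clusters, top_n=3):
    """Pick top_n keywords per cluster; clusters maps a cluster id to its document texts."""
    labels = {}
    for cid, docs in clusters.items():
        # Number of documents inside this cluster that contain each word.
        inside = Counter(w for d in docs for w in tokens(d))
        # Number of documents in all other clusters that contain each word.
        outside = Counter(
            w
            for other, other_docs in clusters.items()
            if other != cid
            for d in other_docs
            for w in tokens(d)
        )
        # Highest (inside - outside) score first; ties broken alphabetically.
        ranked = sorted(inside, key=lambda w: (-(inside[w] - outside[w]), w))
        labels[cid] = ranked[:top_n]
    return labels

clusters = {
    "A": [
        "Quarterly revenue forecast for the sales team",
        "Revised revenue forecast and sales targets",
    ],
    "B": [
        "Lunch menu for the office party",
        "Office party catering and lunch order",
    ],
}
print(label_clusters(clusters))
# {'A': ['forecast', 'revenue', 'sales'], 'B': ['lunch', 'office', 'party']}
```

A reviewer who sees "forecast, revenue, sales" next to one cluster and "lunch, office, party" next to another can tell at a glance which one deserves attention first.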
With clustering, ediscovery professionals can easily filter and sort documents, allowing critical decisions regarding prioritizing and organizing documents to be made earlier in the process. Clustering allows documents with relevant themes to be prioritized and may also reveal unexpected themes that require further review. With the capability to scale to millions of records, clustering allows for a more targeted review, saving teams time and resources.
Some clustering tools even have an automatic categorization capability that allows all documents sufficiently similar to a set of documents to be categorized the same way. This function dramatically reduces the amount of effort required when new documents are added to a case and enables team members to leverage the work necessary to categorize earlier documents.
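One way to picture automatic categorization is a nearest-match rule: a new document inherits the category of the most similar document that has already been categorized, provided the match is strong enough. The Python sketch below works under that assumption. The 0.5 threshold, the category names, and the function `auto_categorize` are hypothetical and do not describe any vendor's feature.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for the example.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "for", "in", "on", "is", "at"}

def vectorize(text):
    """Reduce a document to counts of its non-stopword terms."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def auto_categorize(new_doc, coded_docs, threshold=0.5):
    """Return the category of the closest coded document, or None if nothing is close enough."""
    vec = vectorize(new_doc)
    best_score, best_category = 0.0, None
    for text, category in coded_docs:
        score = cosine(vec, vectorize(text))
        if score > best_score:
            best_score, best_category = score, category
    return best_category if best_score >= threshold else None

# Documents the team has already categorized (illustrative data).
coded = [
    ("Revenue forecast for the sales team", "Financial"),
    ("Lunch order for the office party", "Not relevant"),
]
print(auto_categorize("Updated sales revenue forecast", coded))  # Financial
print(auto_categorize("Board meeting agenda", coded))            # None
```

Returning `None` for weak matches matters: documents that resemble nothing already categorized are left for a person to look at instead of being forced into a category.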
Why Clustering Is Critical in Ediscovery
Clustering is a powerful feature for ediscovery. Documents with conceptually similar content are automatically grouped, with no user involvement required, allowing important documents and concepts of a case to be quickly identified. Here are some ways clustering can make ediscovery review faster and more complete:
- Enhanced focus. Clustering increases focus on a specific subject matter, even when the data is held by many custodians.
- Richer data. Clustering can improve the document set used in training the system for technology-assisted review (TAR).
- Better output. Clustering similar documents and batching them for review allows reviewers to specialize in a particular topic or type of document.
- Less clutter. One of the most effective uses for clustering is eliminating irrelevant items, either by setting them aside or by removing them from review, which saves reviewer time.
- Quality control. Documents can be clustered based on critical subject matter to verify coding decisions and ensure that relevant documents in hot clusters were identified during the review.
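The quality control idea in the last item can be sketched in a few lines of Python. The assumption here is that documents in the same cluster should usually receive the same coding decision, so any document coded differently from its cluster's majority is worth a second look. The document IDs, code values, and the function `flag_outliers` are hypothetical.

```python
from collections import Counter

def flag_outliers(cluster_codes):
    """Return, per cluster, the documents whose code differs from the cluster's majority code.

    cluster_codes maps a cluster id to a dict of {document id: coding decision}.
    """
    flagged = {}
    for cid, codes in cluster_codes.items():
        if not codes:
            continue  # an empty cluster has nothing to check
        majority, _ = Counter(codes.values()).most_common(1)[0]
        outliers = sorted(doc for doc, code in codes.items() if code != majority)
        if outliers:
            flagged[cid] = outliers
    return flagged

# Illustrative coding decisions grouped by cluster.
cluster_codes = {
    "pricing": {"DOC-1": "relevant", "DOC-2": "relevant", "DOC-3": "not relevant"},
    "catering": {"DOC-4": "not relevant", "DOC-5": "not relevant"},
}
print(flag_outliers(cluster_codes))  # {'pricing': ['DOC-3']}
```

A flagged document is not necessarily miscoded. The check only points reviewers toward decisions that stand out from their neighbors.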
Clustering helps ediscovery professionals separate different concepts in a way that a simple search cannot. When sifting through large amounts of data, clustering can ensure a productive workflow, a thorough process, and less irrelevant data to weed through.
Ready to Get Your Ediscovery Data Under Control?
Are you tasked with handling and understanding large sets of documents in a case or investigation? To learn the basics of how to use document clustering, register for Everlaw’s weekly live training session, Visualizing Your Data, today.