This is the second installment in our series on the ediscovery chapter of a legal informatics textbook. If you’re catching up, start with our post on the reasons for the EDRM. In this series, we’re covering the ediscovery basics, including the history of the Electronic Discovery Reference Model (EDRM), core technical ediscovery concepts, the technologies powering ediscovery (encryption, machine learning, transcoding, etc.), and the future of ediscovery. (Download the full ebook here.)
Today we’ll dive into the left-hand side of the EDRM: information governance, identification, preservation, and collection.
The “Left-Hand Side” of the EDRM
The left-hand side of the EDRM is, for the most part, the domain of corporations. This is because, in the typical US litigation, a corporation holds the data that the parties want to discover. The goal of the tasks on this side is to prepare for potential litigation and to quickly respond when litigation arises.
As such, the technological issues here are primarily search-related rather than discovery-related. Search is what powers the management of data at its source, the identification and preservation of potentially relevant information within the source environment, and the ultimate retrieval of that information from those sources. The true discovery work—separating the wheat from the chaff, using a nuanced understanding of what’s actually relevant in the litigation—is the province of the right-hand side of the EDRM.
Information Governance refers to the process of housing and organizing information within an organization, including everything from emails and voicemails to spreadsheets and instant messages, and deciding whether and what to delete over time. This stage is so complex, however, that the EDRM organization created an entirely separate model, the Information Governance Reference Model, to frame the discussion around the activities it contains.
As you might expect, information governance is a collaborative exercise, involving close coordination between the people creating and using the data (i.e., business users), the people managing the infrastructure that contains that data (i.e., the IT department), and the people responsible for looking ahead to possible legal or regulatory risks that may involve that data (i.e., the legal and compliance teams). As the EDRM organization itself notes in the annotations to the Information Governance Reference Model, “it takes the coordinated effort of all three groups to defensibly dispose of a piece of information that has outlived its usefulness, and retain what is useful in a way that enables accessibility and usability for the business user.” In particular, they must work together to ensure that the organization is not retaining information that has no business or legal value because it is redundant, obsolete, and/or trivial (lovingly referred to as “ROT” data).
The tools in this space are wide-ranging. End users will of course use whatever they need to perform their work, from Outlook to AutoCAD, and will store their work product in myriad places, from laptops to Dropbox. The IT and legal departments work behind the scenes to tame this wilderness, using technology baked into their document management platforms (e.g., Office 365’s Security & Compliance Center, Google Vault) and/or dedicated information governance tools (e.g., Rational Governance, Sherpa Software’s Altitude IG). Whatever they use, their overriding goal is to minimize legal exposure from ROT data by defining defensible retention policies and then automating enforcement of those policies.
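To make the idea of automated retention enforcement concrete, here is a minimal sketch in Python. It assumes a simple filesystem-based policy keyed to a file’s last-modified time; the retention period and function names are illustrative, and real information governance platforms apply far richer rules (record classes, legal holds, custodian assignments) than a script like this can capture.

```python
import time
from pathlib import Path

# Assumed seven-year retention period, expressed in days. In a real
# policy this would vary by record class, jurisdiction, and hold status.
RETENTION_DAYS = 7 * 365

def files_past_retention(root: str, retention_days: int = RETENTION_DAYS):
    """Flag files whose last-modified time exceeds the retention period,
    so they can be reviewed for defensible deletion as potential ROT data."""
    cutoff = time.time() - retention_days * 86400
    expired = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            expired.append(path)
    return expired
```

Note that the script only *flags* candidates rather than deleting them; in practice, defensibility requires a review step and an audit trail before anything is actually disposed of.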
The Identification stage of the EDRM is primarily concerned with determining which sources of information are likely to be relevant to a given matter. That last part is key: at this stage, there is now a matter at hand, so—in contrast to the more general preparatory work that occurs as part of Information Governance—all efforts from here forward are focused on the specific, evolving demands of a particular matter.
One cannot identify potentially relevant sources of information without an inventory of those sources, and that is the purpose of a data map. The data map provides a comprehensive picture of an organization’s data sources—including everything from email servers to financial systems to backup tapes—and the devices used to access them. It accounts for legacy data from systems no longer in use, and the hardware, software, and technical expertise that may be required to access such data. It also includes an assessment of potential cloud and third-party data sources.
Much of the work here is done with the common tools used to create inventories in any domain. Spreadsheet or project management software may be used to plan and track the interviews of key witnesses and custodians; presentation or mind-mapping software may be used to visualize the data map; email or IM software may be used to communicate about interviews, timeframes and keywords; shared folders may serve as repositories for relevant data retention policies, organization charts, etc.; and the ultimate inventory of identified data sources may be kept in a spreadsheet, database, or simple list.
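While a spreadsheet is the usual home for a data map, the inventory described above can also be sketched as a simple data structure. The following Python sketch is purely illustrative: the field names and example entries are our own assumptions, not a standard schema, but they mirror the attributes discussed here (system, location, custodians, legacy status, and access notes).

```python
from dataclasses import dataclass, field

@dataclass
class DataSource:
    """One entry in a hypothetical data map inventory."""
    name: str
    system: str                 # e.g., "Exchange Online", "SAP", "backup tape"
    location: str               # e.g., "on-prem", "cloud", "third party"
    custodians: list = field(default_factory=list)
    legacy: bool = False        # system no longer in active use?
    notes: str = ""             # access requirements, expertise needed, etc.

# Illustrative entries of the kind a data map might contain.
data_map = [
    DataSource("Corporate email", "Exchange Online", "cloud",
               custodians=["all staff"]),
    DataSource("2009 financials", "legacy AS/400", "on-prem", legacy=True,
               notes="requires retired-system expertise to restore"),
]

def legacy_sources(inventory):
    """Filter the inventory for legacy sources needing special handling."""
    return [s for s in inventory if s.legacy]
```

Keeping the inventory structured like this, rather than as free-form notes, makes it easy to answer the questions that arise later in a matter, such as which sources require specialized expertise to access.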
Once relevant data sources have been identified, Preservation involves holding on to potentially relevant data in a way that is defensible and compliant with legal obligations, but without overburdening the custodians of that data or the wider organization.
Ultimately, the goal in this step is to mitigate risk. This involves a delicate balance between the organization’s legal obligations and its desire for efficiency and minimal legal exposure. A preservation scheme that is disproportionately broad, for instance, will not only waste resources (human and otherwise), but could also lead to the unnecessary retention of risky data. This stage is therefore as much about what should be deleted as what should be preserved, with a bias toward removal whenever it is reasonable, auditable, and legally defensible.
For all of its importance, the most common preservation techniques are relatively unsophisticated. They involve contacting relevant custodians, usually by email, with instructions not to delete certain data from their systems; this is known as a legal hold notice. While some simple tools for initiating legal holds can automate this process, with administrators able to send initial and reminder emails to the appropriate people with just a few clicks of the mouse, the task of actually retaining the relevant data falls to the end user. Some Information Governance tools (e.g., Rational Enterprise’s Rational Governance, Sherpa Software’s Altitude IG) and business productivity platforms (e.g., Office 365’s Security & Compliance Center, Google Vault) offer more advanced integration, not only simplifying the process of issuing legal holds but also managing the actual retention of target data, since they can be preserved “in place” without the need for copying and archiving elsewhere.
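The “few clicks of the mouse” workflow above boils down to tracking who has been notified and who has acknowledged. Here is a hedged sketch of that bookkeeping in Python; the class and field names are hypothetical, and an actual legal hold tool would layer email delivery, escalation, and audit logging on top of logic like this.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class HoldRecipient:
    """A custodian who received a legal hold notice (illustrative fields)."""
    email: str
    notified_on: date
    acknowledged: bool = False

def needs_reminder(recipients, today, grace_days=7):
    """Return custodians notified more than `grace_days` ago who have
    not yet acknowledged the hold, and so should get a reminder email."""
    cutoff = today - timedelta(days=grace_days)
    return [r.email for r in recipients
            if not r.acknowledged and r.notified_on <= cutoff]
```

The point of even this trivial sketch is the audit trail: a record of who was told what, and when, is central to demonstrating that preservation obligations were taken seriously.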
Collection is the process of gathering relevant data from everywhere it resides—be that corporate servers, workers’ laptops, phones, or anywhere else—all in a forensically sound manner. Preservation and Collection are shown vertically stacked on the EDRM diagram to indicate that they often happen in parallel, if not entirely in lockstep. It’s easy to imagine why: at some point, preserved data will likely need to be collected and made accessible to legal teams for review, and that process itself may alter the assessment formulated in the Identification stage or expand the scope of required preservation.
The key consideration here is defensibility. Has the collection process accounted for all of the data? Has it been collected in a way that is forensically sound, preserving the original metadata to the fullest extent possible? Can the chain of custody be documented? These questions are not for the faint of heart, as they go to the very credibility of the data as digital evidence. It is therefore not surprising that Collection is the province of an entire field of experts steeped in defensible data retrieval, and that such retrieval is often entrusted only to these experts.
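One building block of that defensibility is cryptographic hashing: recording a digest of each collected file, along with who collected it and when, supports later chain-of-custody arguments. The sketch below shows the idea using Python’s standard library; the record format is our own invention, and real forensic suites go much further (write blocking, full disk images, signed logs).

```python
import hashlib
from datetime import datetime, timezone

def custody_record(path: str, collector: str) -> dict:
    """Build a simple chain-of-custody entry for a collected file:
    its SHA-256 digest plus who collected it and when (UTC)."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large evidence files don't exhaust memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha256.update(chunk)
    return {
        "path": path,
        "sha256": sha256.hexdigest(),
        "collected_by": collector,
        "collected_at": datetime.now(timezone.utc).isoformat(),
    }
```

Recomputing the digest at any later point and comparing it to the logged value demonstrates that the file has not been altered since collection.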
Similarly, collection tools are often highly specialized. While anyone can transfer files from a laptop to a thumb drive using Windows Explorer, or download email from a Gmail account using Google Takeout, collecting these data in a forensically sound manner is another matter. To meet this challenge, the digital forensics industry has created a vast array of tools, from general-purpose suites like AccessData’s Forensic Toolkit (FTK) to specialized software like Elcomsoft Cloud eXplorer, which is focused only on gathering Google account data.
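To see why a naive drag-and-drop falls short, consider this hedged sketch of a slightly more careful copy: it preserves the file’s modification timestamp (which `shutil.copy2` carries over, while a plain byte copy would not) and verifies the copy’s hash against the original. The function name is illustrative, and true forensic tools capture far more, including filesystem metadata that no userland copy can carry.

```python
import hashlib
import shutil
from pathlib import Path

def _sha256(p: Path) -> str:
    """SHA-256 digest of a file's contents."""
    return hashlib.sha256(p.read_bytes()).hexdigest()

def collect_file(src: str, dest_dir: str) -> bool:
    """Copy `src` into `dest_dir`, preserving timestamps, and verify
    that the copy's hash matches the original. Returns True on match."""
    src_path = Path(src)
    dest = Path(dest_dir) / src_path.name
    shutil.copy2(src_path, dest)  # copy2 also copies the modification time
    return _sha256(src_path) == _sha256(dest)
```

Even this small step, hash verification plus timestamp preservation, illustrates the gap between an ordinary copy and a collection that can withstand scrutiny as evidence.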
Tune in next time for our favorite: the Right-Hand Side of the EDRM. We’ll cover data processing, review, analysis, and production.