
Current Ediscovery Considerations and Challenges

Managing ediscovery projects to support litigation, investigations and other use cases is more challenging than ever. Historically, electronically stored information (ESI) was stored in traditional forms, such as emails and office files kept locally, for which reliable workflows had been established to preserve, collect, process, review and produce that ESI.

This is no longer the case. Organizations today store ESI in a wide variety of forms and locations. We’re creating ESI at an unprecedented rate and storing it on mobile devices and in the cloud within enterprise solutions. The resulting challenges are often interrelated – for example, the increased use of emojis on mobile devices and the dramatic rise in the use of hyperlinked files instead of traditional attachments within enterprise solutions.

Discovery today requires a wide variety of workflows and state-of-the-art technology (leveraging automation, analytics and generative AI) to support the various forms and sources of ESI. This chapter discusses several current ediscovery challenges and how legal and ediscovery professionals are addressing them.

Big Data

The first challenge, which is interrelated with all the others, is big data. The big data era has fundamentally transformed the landscape of ediscovery, reshaping the volume, variety, and velocity of ESI that legal teams must manage during litigation and regulatory matters. The estimated volume of global data created has risen from 2 zettabytes in 2010 to 149 zettabytes last year – which is 149 trillion gigabytes! And it’s expected to grow to more than 394 zettabytes by 2028!
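The unit conversion behind the "149 trillion gigabytes" figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) prefixes; the function and constant names are illustrative only.

```python
# Decimal (SI) prefixes are assumed: 1 zettabyte = 10**21 bytes, 1 gigabyte = 10**9 bytes.
BYTES_PER_ZETTABYTE = 10**21
BYTES_PER_GIGABYTE = 10**9


def zettabytes_to_gigabytes(zettabytes: int) -> int:
    """Convert a whole number of zettabytes to gigabytes."""
    return zettabytes * BYTES_PER_ZETTABYTE // BYTES_PER_GIGABYTE


gigabytes = zettabytes_to_gigabytes(149)
trillions_of_gigabytes = gigabytes // 10**12
growth_factor = 149 / 2  # relative to the 2 zettabyte estimate for 2010

print(f"{trillions_of_gigabytes} trillion GB, {growth_factor:.1f}x the 2010 estimate")
# prints: 149 trillion GB, 74.5x the 2010 estimate
```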

Today, ediscovery is no longer confined to emails and office documents; it must encompass a vast array of data types, including structured enterprise data, chat messages, collaboration app content, cloud-based files, multimedia, and metadata-rich logs generated by modern business systems. This exponential data growth creates both strategic opportunities and operational headaches for legal teams trying to manage risk, ensure defensibility, and control costs.

One of the most significant impacts of big data in ediscovery is the sheer volume of information that must be preserved, processed, and reviewed. Organizations now generate terabytes of data daily, much of it stored across decentralized systems and personal devices. The traditional linear review model is unsustainable in this environment due to time and cost constraints.

As a result, legal teams are under increasing pressure to adopt scalable, technology-driven approaches to filter, prioritize, and review large datasets more efficiently. Predictive coding, continuous active learning, and analytics-driven early case assessment (ECA) are now essential tools to triage data, surface key evidence, and reduce the human burden of document review.

Beyond volume, the variety of data poses a major challenge. Data is no longer uniform or easily exportable. It exists in nested, complex formats from tools like Slack, Microsoft Teams, Salesforce, and mobile apps, which lack a standardized structure for discovery. These sources generate short-form, conversational content – often lacking in context – that must still be defensibly collected and reconstructed. Legal teams must navigate complicated data schemas, time-stamped edits, emoji reactions, hyperlinks, and dynamic document links to make sense of communications. Moreover, structured data from enterprise platforms may require custom queries or targeted extractions, further complicating the collection and review process.

Big data also accelerates the velocity at which relevant data is generated and must be captured. Legal holds must now extend rapidly across a distributed digital environment to avoid spoliation. Custodians may be collaborating in near-real-time across multiple tools, platforms, and jurisdictions. This speed of communication demands an equally fast and coordinated legal response – necessitating automated workflows for hold notifications, custodian tracking, and data preservation triggers. Delays or gaps can result in sanctions or adverse inference rulings if critical data is lost.

The volume of information, the velocity (i.e., speed) at which it is created and collected, and the variety or scope of the data points being covered are commonly known as the “Three Vs” of big data. Here’s a visual representation of the “Three Vs”:

Ediscovery Guide Chapter Nine - Three V's
Figure 1: Three Vs of Big Data (Source: Coforge)

To address these challenges, ediscovery technology has evolved into a more sophisticated ecosystem of integrated capabilities. AI-powered data analytics are now routinely used to identify communication patterns, detect anomalies, and cluster documents by topic or concept. These tools allow legal teams to make sense of massive datasets and uncover key evidence more quickly. Review platforms are increasingly incorporating visualization dashboards, communication maps, and metadata analysis to support early insights and reduce document sets before first-pass review even begins.

Cloud-native ediscovery platforms also offer advantages in handling big data. These tools are designed to elastically scale with data size, provide faster ingestion and indexing of varied data types, and support collaboration across geographies. Integration with enterprise applications and APIs enables more defensible and seamless collection from complex sources. Moreover, advanced processing engines now support deduplication, near-duplicate detection, email threading, and OCR at scale – streamlining the data pipeline while preserving defensibility.
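As a rough illustration of the deduplication step mentioned above, the sketch below groups documents whose bytes are identical by comparing SHA-256 hashes. It is a simplified assumption about how exact-duplicate detection can work, not a description of any particular platform; the document IDs and contents are made up. Near-duplicate detection needs similarity techniques that this sketch does not attempt.

```python
import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of a document's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def group_exact_duplicates(documents: dict[str, bytes]) -> dict[str, list[str]]:
    """Map each content hash to the document IDs that share it, in input order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for doc_id, data in documents.items():
        groups[content_hash(data)].append(doc_id)
    return dict(groups)


# Hypothetical sample set: DOC-002 is byte-identical to DOC-001; DOC-003 differs by one character.
documents = {
    "DOC-001": b"Quarterly forecast attached.",
    "DOC-002": b"Quarterly forecast attached.",
    "DOC-003": b"Quarterly forecast attached!",
}

groups = group_exact_duplicates(documents)
# The first ID in each group is kept for review; the rest are suppressed as duplicates.
masters = sorted(ids[0] for ids in groups.values())
suppressed = sorted(doc_id for ids in groups.values() for doc_id in ids[1:])
print(masters, suppressed)
```

Keeping the full hash-to-IDs mapping, rather than discarding the suppressed copies, means every duplicate can still be traced to its source if it is needed later.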

However, these capabilities must be effectively managed by legal teams applying defensible processes. Overreliance on technology without human oversight can result in missed documents, misclassified privilege, or overlooked patterns. Validation protocols, audit trails, and defensible processes must be in place to support the integrity of technology-assisted review (TAR). Courts continue to expect reasonable, proportional efforts to identify and produce relevant information – and the complexities of big data do not excuse sloppiness or inattention. It takes people, processes, and technology to address the challenges presented by big data.

Mobile Device Data

Mobile device data is playing an increasingly critical and complex role in ediscovery today. As smartphones and tablets become primary tools for business communication, collaboration, and productivity, they generate a vast and dynamic body of ESI that legal teams must contend with during litigation and investigations. The pervasiveness and fluidity of mobile devices have made them central to disputes involving employment matters, intellectual property, regulatory compliance, and beyond.

While text messages are the type of mobile device data most often sought in discovery, many other types of ESI – such as files, photos, videos, notes, voice memos, call logs and geolocation data – are frequently relevant in discovery as well. This Mobile Evidence Scorecard chart by Craig Ball illustrates common types of mobile device data and their typical role in discovery.

Ediscovery Guide Chapter Nine - Mobile Evidence Burden
Figure 2: Mobile Evidence Burden and Relevance Scorecard (Source: Craig Ball)

One of the foremost challenges presented by mobile data is its fragmented and ephemeral nature. Unlike corporate email servers or enterprise file shares, mobile devices are highly personal, constantly changing, and often not centrally managed. Data can reside in SMS/iMessage threads, encrypted messaging apps (e.g., WhatsApp, Signal), ephemeral communications platforms (e.g., Snapchat), or within enterprise collaboration apps like Teams and Slack running on mobile. Even seemingly simple data, like an image attached to a text, can be difficult to extract in a forensically sound manner without specialized tools. This complexity is further compounded by the use of multiple devices, cloud syncs, and selective deletion features that create a moving target for preservation and collection.

Bring Your Own Device (BYOD) policies, under which employees use their personal devices for work, have added significant complications. BYOD blurs the line between personal and corporate data, raising serious concerns about privacy, data segregation, and legal ownership. From an ediscovery perspective, the key issue becomes one of possession, custody, and control.

Organizations often face the question: Do they have legal control over data on a device they do not own? Courts have increasingly held that if the organization has a policy authorizing or requiring work-related communications on personal devices, and the data is relevant and within the party’s control, it must be preserved and potentially produced. In this environment, well-crafted BYOD policies are essential – they should clearly define acceptable use, the company’s right to access work-related content, and responsibilities for data preservation and deletion upon separation.

Preservation and collection of mobile device data must be approached with particular care. Standard enterprise legal hold tools often do not reach into personal smartphones, making early identification of mobile custodians and appropriate data sources essential. Forensic collection tools such as Cellebrite or Oxygen Forensics are typically required to extract data in a defensible manner. However, these tools come with cost, privacy, and proportionality considerations. Legal teams must balance the need for comprehensive data collection with respect for personal privacy and legal proportionality, particularly in cases where only limited categories of mobile content are relevant.

Adding to the challenge is the lack of standardized export formats. Mobile messaging data often lives in proprietary or semi-structured formats (e.g., SQLite databases) that don’t lend themselves to traditional document review workflows. Reconstructing conversations with context – with timestamps, sender/receiver info, emojis, and attachments intact – requires specialized parsing and normalization. Without careful processing, key nuances in tone, meaning, or intent may be lost. This is particularly important in cases where short-form messaging can serve as the “smoking gun” evidence.
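To make the parsing and normalization step more concrete, here is a minimal sketch that rebuilds one conversation from a SQLite message store. The `message` table and its columns are hypothetical; real messaging apps use their own, more complex schemas. The sketch orders messages by time, converts Unix timestamps to UTC ISO 8601, and leaves emoji characters untouched.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema standing in for a messaging app's SQLite store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message ("
    "id INTEGER PRIMARY KEY, thread_id TEXT, sender TEXT, sent_at_unix INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO message (thread_id, sender, sent_at_unix, body) VALUES (?, ?, ?, ?)",
    [
        ("T1", "Alex", 1700000060, "Sounds good \U0001F44D"),  # stored out of order on purpose
        ("T1", "Sam", 1700000000, "Can you send the draft today?"),
        ("T2", "Sam", 1700000030, "Lunch?"),
    ],
)


def reconstruct_thread(conn: sqlite3.Connection, thread_id: str) -> list[str]:
    """Return one thread as chronologically ordered lines with UTC timestamps."""
    rows = conn.execute(
        "SELECT sender, sent_at_unix, body FROM message "
        "WHERE thread_id = ? ORDER BY sent_at_unix, id",
        (thread_id,),
    ).fetchall()
    return [
        f"{datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()} {sender}: {body}"
        for sender, ts, body in rows
    ]


thread = reconstruct_thread(conn, "T1")
for line in thread:
    print(line)
```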

To address these issues, ediscovery technology has evolved to include mobile-specific capabilities. Modern review platforms increasingly support ingestion and rendering of mobile message data in conversation view formats, preserving context and metadata. Some platforms offer connectors or APIs to directly collect from cloud backups or enterprise-managed mobile apps, reducing the need for full-device imaging. These tools can apply analytics such as deduplication, threading, and sentiment analysis to mobile content, helping reduce the volume and prioritize review.

Technology alone, however, is not a silver bullet. A proactive information governance strategy is critical. Organizations should review and update their BYOD policies to reflect current legal expectations and technology realities. They should educate employees on their responsibilities, especially concerning the preservation of work-related messages and content. Clear procedures for legal holds, device returns, and employee offboarding can mitigate downstream discovery risks.

Courts are also growing more sophisticated in their understanding of mobile data. Litigants are expected to demonstrate reasonable efforts to preserve and collect relevant content. Failure to do so can lead to sanctions, including adverse inference rulings or even case dismissal. That means counsel must be prepared to address mobile data issues during Rule 26(f) conferences, meet and confers, and ESI protocol negotiations.

Mobile device data is rich in evidentiary value but fraught with technical, legal, and privacy complexities. Addressing the mobile device data challenge successfully requires a blend of policy, process, and platform. Legal teams that embrace mobile-savvy discovery practices, supported by defensible technology and proactive governance, will be better positioned to manage risk, comply with obligations, and uncover key facts in the modern digital workplace.

Enterprise Solutions

Enterprise solutions – such as Microsoft 365, Google Workspace, Salesforce, ServiceNow, and other cloud-based platforms – have revolutionized how organizations collaborate, communicate, and manage information. These tools offer powerful benefits in terms of productivity and scalability, but they have also fundamentally changed the game for ediscovery. Instead of retrieving emails and office files stored on corporate servers or local drives, legal teams must increasingly contend with dynamic, distributed data that lives across complex, interlinked enterprise ecosystems. These systems generate vast volumes of content and store it in proprietary formats that change constantly, complicating legal hold, collection, and review workflows.

Enterprise solutions are a significant contributor to the big data challenge that ediscovery professionals must address. Collaboration tools like Microsoft Teams, Google Chat, and Slack create threaded, time-stamped, and media-rich conversations that often span hundreds or thousands of messages. Shared documents can be co-authored in real time, edited after being sent, or embedded with dynamic links and comments. Cloud storage platforms allow users to share files via hyperlinks rather than attachments, creating a challenge in capturing and preserving the context of communications. At the same time, enterprise platforms like Salesforce, Workday, or Jira generate structured data that is transactional and frequently updated. Specialized queries are typically required to extract relevant information in a meaningful form.

Enterprise solutions introduce several key challenges for ediscovery:

  • Data Accessibility and Control. In many enterprise environments, data is stored in third-party cloud infrastructures, which complicates direct access for legal holds or collection. Standard IT tools may not provide the granularity needed for defensible preservation, particularly for shared channels, private chats, or deleted items.

  • Preservation Complexity. Legal teams must understand how each platform stores, retains, and deletes data. For example, Microsoft 365 uses retention policies, versioning, and user-configured settings. If the organization hasn’t configured these settings properly, key evidence can be deleted or altered before a legal hold is applied.

  • Hyperlinked Content. Increasingly, enterprise users send links to cloud-stored documents rather than traditional attachments. This shift breaks the conventional document family structure used in ediscovery. Preserving, collecting, and reviewing these links requires new technical and legal strategies.

  • Authentication and Context. Many enterprise tools enable collaborative editing, comments, and version histories, which complicate the question of authorship and intent. Identifying “who said what” becomes more difficult when users can edit or overwrite content, so platform logs are often needed to reconstruct actions.

  • Privilege and Privacy Risks. Enterprise systems often contain highly sensitive data, including HR records, financial data, or privileged legal communications. Identifying and protecting this content during collection and review is more difficult when data is commingled, versioned, or resides in structured databases.

To address these challenges, ediscovery technology must be tightly integrated with modern enterprise platforms and capable of handling both unstructured and structured content. Leading platforms have developed enterprise-grade connectors to Microsoft 365, Google Workspace, Slack, and others, enabling targeted and defensible collection of cloud-based content, including metadata, permissions, and file relationships. These integrations enable ediscovery professionals to perform targeted collection, ensuring proportionality and minimizing overcollection.

Artificial intelligence (including generative AI) and analytics also play a growing role in taming the complexity of enterprise data. Entity extraction, sentiment analysis, and communication mapping can help identify key actors, spot anomalies, and prioritize review sets. AI-driven tools are increasingly being used to identify hyperlinked documents, trace sharing histories, and reconstruct custodial timelines. These capabilities not only accelerate review but also help legal teams understand the business context of communications and documents, which is essential in complex disputes or investigations.

Enterprise solutions have significantly increased the complexity of ediscovery by decentralizing, diversifying, and dynamically changing the nature of ESI. But with the right combination of technology, governance, and cross-functional planning, these challenges can be transformed into an opportunity to modernize and future-proof the legal discovery process. Legal teams that build deep technical fluency with enterprise platforms (and invest in the tools and partnerships to extract, process, and review their data) will be far better positioned to navigate today’s digital legal landscape with speed, defensibility, and strategic insight.

Collaboration Apps

Collaboration apps are a subset of enterprise solutions, but the unique challenges associated with them warrant separate discussion. Apps like Slack, Microsoft Teams, Zoom and Google Chat have become central to modern workplace communication – replacing or supplementing email with real-time messaging, video conferencing, file sharing, and integrated workflows. These tools have driven major gains in productivity and flexibility, particularly in remote and hybrid work environments.

However, for legal and compliance teams, they’ve also introduced significant complexity into the ediscovery process. The fluid, conversational, and often decentralized nature of data generated by collaboration platforms requires a fundamental shift in how ESI is preserved, collected, reviewed, and produced in legal matters, creating several challenges for ediscovery professionals.

Fragmented Communications

One of the most significant impacts of collaboration apps on ediscovery is the sheer volume and fragmentation of communication. These platforms create sprawling data sets that often include short, rapid-fire messages, emojis, gifs, hyperlinks, and reactions – all of which could be relevant in litigation and investigations. Unlike emails, which often exist as self-contained conversations with attachments, a single conversation thread within a collaboration app is decentralized, making it difficult to isolate relevant content. These conversations may also include embedded links to documents, meeting recordings, calendar invites, or ticketing systems, all of which can contain discoverable information.

Collection

Collection and export limitations also hinder defensible discovery. While some enterprise versions of collaboration tools offer ediscovery APIs or compliance export tools, the quality, granularity, and format of these exports can vary widely. For example, Slack’s standard export may not include private channels or direct messages unless the organization subscribes to Enterprise Grid and has enabled specific legal hold or compliance tools. Microsoft Teams data is stored across multiple Microsoft 365 services (Exchange, SharePoint, OneDrive), requiring nuanced understanding to assemble a full picture. Additionally, the exported data may be presented in JSON or other formats that are not readily reviewable without specialized processing.

Searching and Reviewing

Searchability and reviewability present further issues. Many collaboration platforms lack robust native search capabilities, and their exports are not structured for efficient keyword searching or linear document review. Reconstructing conversations in a meaningful, chronological, and review-friendly way requires advanced processing tools that can parse metadata, recreate threads, and support features like speaker attribution, timestamp normalization, and message linking. Without this, reviewers are left sifting through incomplete, disjointed data that can obscure key facts and increase review costs.

To address these challenges, modern ediscovery technology is evolving to meet the unique needs of collaboration data. Advanced processing engines can now ingest native exports from Slack, Microsoft Teams, and other platforms, parse complex data structures, and present conversations in a reviewable “chat view” that preserves the original look and feel. These tools can maintain conversational integrity, apply metadata normalization, and even flag edits or deletions – helping reviewers see what was said, when, and by whom.
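As a simplified illustration of what this kind of processing involves, the sketch below turns a JSON message export into a chronological chat view with speaker attribution, normalized UTC timestamps, and indented thread replies. The field names (`user`, `ts`, `thread_ts`, `text`) are modeled loosely on Slack-style exports but should be treated as assumptions, as should the user ID lookup table; real exports carry many more fields and edge cases.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal

# Hypothetical export fragment; field names are assumed for illustration.
EXPORT_JSON = """
[
  {"user": "U02", "ts": "1700000120.000200", "thread_ts": "1700000000.000100",
   "text": "Yes, sending it now."},
  {"user": "U01", "ts": "1700000000.000100", "text": "Is the pricing deck final?"},
  {"user": "U03", "ts": "1700000300.000300", "text": "Standup moved to 10."}
]
"""
USER_NAMES = {"U01": "Dana", "U02": "Lee", "U03": "Priya"}  # hypothetical ID-to-name map


def render_chat_view(export_json: str, user_names: dict[str, str]) -> list[str]:
    """Order messages by thread and time, then render one line per message."""
    messages = json.loads(export_json)

    def sort_key(message: dict) -> tuple[Decimal, Decimal]:
        # Replies sort under their thread root; Decimal keeps the timestamp text exact.
        root = message.get("thread_ts", message["ts"])
        return (Decimal(root), Decimal(message["ts"]))

    lines = []
    for message in sorted(messages, key=sort_key):
        is_reply = message.get("thread_ts", message["ts"]) != message["ts"]
        when = datetime.fromtimestamp(float(message["ts"]), tz=timezone.utc)
        speaker = user_names.get(message["user"], message["user"])
        indent = "    " if is_reply else ""
        lines.append(f"{indent}[{when:%Y-%m-%d %H:%M:%S} UTC] {speaker}: {message['text']}")
    return lines


chat_view = render_chat_view(EXPORT_JSON, USER_NAMES)
print("\n".join(chat_view))
```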

Additionally, ediscovery platforms are integrating AI and analytics tools to surface relevant content more efficiently. These include features such as entity recognition (to identify key participants), sentiment analysis (to detect tone and potential misconduct), and communication mapping (to visualize interactions across custodians and time). These technologies are especially valuable in internal investigations or regulatory inquiries where the volume of communications is high, but the signal-to-noise ratio is low.

Organizations should work with legal, compliance, and IT teams to understand the data flows, retention settings, and export capabilities of their collaboration platforms. Legal hold processes must be updated to ensure timely preservation of chats, files, and metadata. ESI protocols should explicitly address the treatment of collaboration data – including what constitutes a “document,” how messages will be produced, and how hyperlinked content or embedded files will be handled. Failure to preserve or produce collaboration content can lead to sanctions or adverse rulings, just as it can for emails. Organizations that invest in the right discovery infrastructure and governance frameworks will not only be better equipped to respond to legal demands but also reduce risk and gain faster insight into the facts of a case.

Hyperlinked Files

Hyperlinked files are rapidly becoming one of the most disruptive challenges in ediscovery today. As organizations increasingly rely on cloud-based collaboration platforms, the traditional model of attaching a document to an email is being replaced with hyperlinks to cloud-hosted versions of those files. While this shift has brought greater efficiency and version control to business workflows, it has introduced serious complexity for legal and ediscovery professionals.

Historically, ediscovery workflows were built around the idea of a document family, where an email and its attachments were grouped and reviewed together. This concept breaks down in the hyperlinked world. A user sending a message that reads “please see the attached report” may actually be referring to a OneDrive or Google Drive link that points to a live, shared document.

Ediscovery Guide Chapter Nine - Email
Figure 3: Email with Traditional Attachment and Email with Hyperlinked File

That document may contain edits from multiple collaborators and may be stored in a location with access permissions that differ from the communication itself. If the linked file is not preserved at the time of the message, it may be changed or deleted before collection, resulting in a loss of potentially relevant evidence. As a result, hyperlinked files challenge long-standing assumptions about what constitutes an attachment. This has led to several disputes in the case law: requesting parties argue that hyperlinked files should be treated as modern attachments and produced along with the email containing the link, while producing parties argue that they are not attachments and that producing them this way is overly burdensome.

Preservation

One of the most pressing challenges with hyperlinked files is preservation. Legal hold and information governance practices must now extend beyond static email and file servers to cloud storage platforms where linked documents reside. It is no longer sufficient to preserve the message alone; organizations are forced to consider preservation of the specific version of the hyperlinked document as it existed at the time the link was shared. Without doing so, they risk spoliation claims or failing to produce critical evidence that may have been substantively different in later versions.

Collection

There are also technical challenges associated with collection. Hyperlinked documents are often stored in distributed cloud repositories that require authentication, specific user credentials, and granular permission management. Simply knowing a hyperlink exists is not the same as having access to the underlying document.

In Microsoft 365, “cloud attachments” (Microsoft’s term for hyperlinked files) may not be collected automatically unless the organization uses specialized tools or configures its compliance platform correctly. Google Workspace presents similar hurdles. Additionally, even when documents are accessible, exported hyperlinks may not resolve properly in review platforms unless collection tools capture the file, its metadata, and the access pathway in a structured and traceable way.

Context and Relationship Preservation

Another major issue is context and relationship preservation. Hyperlinked files often lose their connection to the message or conversation that referenced them once they are exported or processed. This breaks the logical grouping between the message and its content, making it harder for reviewers to understand the relevance or significance of the information. Reviewers may see a vague reference to a topic in a chat or email, without realizing that the important details associated with that topic reside in an uncollected or mislinked file. Without context, the discoverability and probative value of the message diminish, and key facts can be overlooked.

To address these challenges, ediscovery technology is evolving in several ways. Modern legal hold and collection platforms increasingly offer cloud-native connectors that can collect not just the communication, but also identify and retrieve the specific versions of hyperlinked files at the time they were shared. For example, platforms that integrate with Microsoft Purview or Google Vault can collect both the message and linked file – along with metadata such as file owner, access rights, last modified date, and version history. These tools can also preserve the relationship between the message and the linked content, helping to reestablish the "document family" concept in a cloud-based world.
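The first step in reestablishing that family relationship is simply finding the links and recording which message each one came from. The sketch below is one hedged way to do that; the host list, record fields, message ID, and sample URL are illustrative assumptions, and it does not retrieve the linked file or resolve which version was shared.

```python
import re
from urllib.parse import urlparse

# Assumed list of cloud-storage hosts to flag; a real deployment would maintain its own.
CLOUD_HOST_SUFFIXES = (
    "sharepoint.com",
    "onedrive.live.com",
    "1drv.ms",
    "docs.google.com",
    "drive.google.com",
)
URL_PATTERN = re.compile(r"https?://[^\s<>()]+")


def find_cloud_links(message_id: str, body: str) -> list[dict[str, str]]:
    """Return one linkage record per cloud-hosted URL found in a message body."""
    records = []
    for raw_url in URL_PATTERN.findall(body):
        url = raw_url.rstrip(".,;")  # drop trailing sentence punctuation
        host = (urlparse(url).hostname or "").lower()
        if any(host == suffix or host.endswith("." + suffix) for suffix in CLOUD_HOST_SUFFIXES):
            records.append({"parent_message_id": message_id, "url": url, "host": host})
    return records


# Hypothetical message body containing one cloud link and one ordinary web link.
body = (
    "Please see the attached report: "
    "https://contoso.sharepoint.com/sites/finance/Q3-report.docx. "
    "Background reading: https://example.com/news"
)
links = find_cloud_links("MSG-0001", body)
print(links)
```

Storing the parent message ID alongside each URL is what lets a later collection step attach the retrieved file to the right message.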

Additionally, ediscovery review platforms are developing capabilities to reconstruct and present hyperlinked files as modern attachments in context. This includes showing reviewers the email or chat message alongside a snapshot or copy of the hyperlinked file, along with any relevant comments, permissions, and versioning information. Advanced platforms may also allow for redlining or comparison of different versions to flag substantive changes over time.

To address the hyperlinked files challenge proactively, organizations must update their ESI protocols and data governance policies to address this content explicitly. During Rule 26(f) conferences and negotiations with opposing counsel, it is essential to define whether and how hyperlinked files will be treated as attachments, how versions will be handled, and what constitutes a “complete” production. Legal and IT teams should also collaborate to understand the default behaviors of their platforms – such as whether cloud-stored documents are preserved when emails are retained, or whether legal holds extend to linked files automatically.

Hyperlinked files represent a fundamental shift in how digital evidence is created, stored, and transmitted. While they improve business productivity, they introduce serious complications for ediscovery – especially if organizations rely on legacy workflows or incomplete preservation strategies. Legal teams that recognize the significance of this shift, and invest in cloud-aware tools, updated protocols, and collaborative governance, will be far better prepared to meet their discovery obligations in a cloud-first world. Hyperlinks may be just one click away – but in litigation, that click can lead to a maze of complexity unless properly mapped and preserved.

Structured Data

Structured data is one of the most misunderstood types of ESI in ediscovery today. Unlike traditional unstructured data such as emails, PDFs, or Word documents, structured data is generated and stored within databases and applications in a highly organized format. It is the backbone of enterprise systems like Salesforce, SAP, Oracle, Workday, and ServiceNow, as well as internal databases that track customer interactions, transactions, HR records, project workflows, and more. As organizations increasingly rely on these systems to manage business operations, the relevance of structured data in litigation, investigations, and regulatory matters has grown significantly.

Structured data presents several unique technical, legal, and strategic challenges in the discovery process. One of the primary difficulties is that structured data is not inherently document based. It often resides in relational databases, consisting of tables, rows, and fields that are interrelated through keys and logic. There is no native record that can be printed or reviewed like a traditional document. This means that when structured data becomes relevant to a case – such as sales pipeline history in Salesforce, or incident logs in ServiceNow – legal teams must determine what data to extract, how to format it for review, and how to present it in a way that is both meaningful and defensible.

A common misstep in the early days of structured data discovery was the bulk export of massive spreadsheets or data dumps from back-end systems. These exports often stripped away important metadata, broke the relationships between fields and records, and failed to reflect how the data was actually used in context. As a result, the collected ESI was often unwieldy, incomplete, and/or misleading. Today, the focus has shifted toward targeted, defensible extractions based on specific queries or timeframes – often developed in close collaboration between legal teams, subject matter experts, and IT professionals who understand the database structure.

Another major challenge is scope and proportionality. Enterprise applications often hold years’ worth of data across thousands of users, many of whom are irrelevant to the legal matter at hand. Pulling everything is not only costly and time-consuming, but can also violate privacy regulations or create an undue burden under FRCP 26(b)(1). As a result, structured data discovery must be approached through custom SQL queries or application-specific APIs that produce targeted collections, helping ensure relevance while managing scope.
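A targeted extraction of this kind might look like the following sketch, which limits a query to named custodians and a date range instead of exporting the whole table. The `opportunity` table, its columns, and the sample rows are hypothetical stand-ins for a real application database, and SQLite is used only so the example is self-contained.

```python
import sqlite3

# Hypothetical table standing in for a sales application's back-end database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE opportunity ("
    "id INTEGER PRIMARY KEY, owner TEXT, account TEXT, stage TEXT, updated_on TEXT)"
)
conn.executemany(
    "INSERT INTO opportunity (owner, account, stage, updated_on) VALUES (?, ?, ?, ?)",
    [
        ("jdoe", "Acme", "Negotiation", "2023-03-15"),
        ("jdoe", "Globex", "Closed Won", "2021-07-01"),   # custodian match, outside date range
        ("asmith", "Acme", "Proposal", "2023-04-02"),
        ("mlee", "Initech", "Proposal", "2023-05-20"),    # in range, not a custodian
    ],
)


def targeted_extract(conn, custodians, start_date, end_date):
    """Return only rows owned by the given custodians within the date range."""
    placeholders = ", ".join("?" for _ in custodians)
    query = (
        "SELECT id, owner, account, stage, updated_on FROM opportunity "
        f"WHERE owner IN ({placeholders}) AND updated_on BETWEEN ? AND ? "
        "ORDER BY updated_on"
    )
    # Values are bound as parameters rather than formatted into the SQL text.
    return conn.execute(query, [*custodians, start_date, end_date]).fetchall()


rows = targeted_extract(conn, ["jdoe", "asmith"], "2023-01-01", "2023-12-31")
for row in rows:
    print(row)
```

Dates are stored as ISO 8601 text here so that string comparison matches chronological order; a real system may store them differently.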

Structured data also raises complex issues around formatting and usability. Once exported, the data must be transformed into a format that reviewers can understand. This may involve normalizing records into readable reports, flattening data from multiple tables into a single output, or converting it into reviewable images or PDFs. In some cases, data may be loaded into analytics platforms that can visualize trends, detect anomalies, or identify outliers – helpful in investigations or class-action litigation. More sophisticated review platforms can now ingest structured data in native or semi-structured formats and allow reviewers to filter, search, and analyze it within a database-like interface.
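Flattening can be sketched as a join that resolves foreign keys into readable values and writes one row per record. The two tables and their contents below are hypothetical, and CSV is just one possible review-friendly output.

```python
import csv
import io
import sqlite3

# Hypothetical two-table schema: incidents reference accounts by ID.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE incident (
        id INTEGER PRIMARY KEY,
        account_id INTEGER REFERENCES account(id),
        opened_on TEXT,
        summary TEXT
    );
    INSERT INTO account (id, name) VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO incident (account_id, opened_on, summary) VALUES
        (1, '2023-02-01', 'Login outage'),
        (2, '2023-02-03', 'Billing error'),
        (1, '2023-02-10', 'Data export delay');
    """
)


def flatten_incidents(conn: sqlite3.Connection) -> str:
    """Join incidents to accounts and return the result as CSV text."""
    cursor = conn.execute(
        "SELECT incident.id AS incident_id, account.name AS account_name, "
        "incident.opened_on, incident.summary "
        "FROM incident JOIN account ON account.id = incident.account_id "
        "ORDER BY incident.id"
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])  # header from column names
    writer.writerows(cursor)
    return buffer.getvalue()


report = flatten_incidents(conn)
print(report)
```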

Legal defensibility is another important concern. It’s not enough to collect and produce structured data – the process must be documented and reproducible. This means maintaining detailed documentation related to the extraction methodology: who conducted the export, what tools were used, and what filters or criteria were applied. Courts have increasingly emphasized the need for transparency in handling structured data, particularly when it forms the basis of evidence or damages calculations. In some cases, parties may be required to produce not just the data, but also the schema, data dictionary, and even sample screenshots to explain how the data was used in its native environment.
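One lightweight way to capture that documentation is to write a manifest alongside every export. The sketch below records who ran the extraction, the tool, the filters, a UTC timestamp, and a SHA-256 hash of the exported bytes so the file can later be shown to be unchanged. The field names and sample values are assumptions for illustration, not a required or standard format.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_extraction_manifest(exported_bytes: bytes, operator: str, tool: str, filters: dict) -> dict:
    """Describe one extraction: who, with what, under which filters, and a content hash."""
    return {
        "operator": operator,
        "tool": tool,
        "filters": filters,
        "extracted_at_utc": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(exported_bytes).hexdigest(),
        "size_bytes": len(exported_bytes),
    }


# Hypothetical export contents and metadata.
export = b"id,owner,stage\n1,jdoe,Negotiation\n"
manifest = build_extraction_manifest(
    export,
    operator="analyst@example.com",
    tool="custom SQL export (hypothetical)",
    filters={"owners": ["jdoe"], "date_range": ["2023-01-01", "2023-12-31"]},
)
print(json.dumps(manifest, indent=2))
```

Re-running the same filters and comparing hashes is a simple check that an extraction is reproducible, provided the underlying data has not changed in the meantime.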

To meet these challenges, some tools now offer direct integrations with enterprise platforms, allowing for secure, compliant extraction of structured data with metadata and user permissions intact. Others provide visualization dashboards, which help teams make sense of complex data sets before formal review. Data transformation tools can map fields, reformat records, and preserve relational integrity, while AI-powered analytics can detect key patterns and correlations that might not be visible in raw exports.

Effective structured data discovery requires legal teams to work closely with IT, business stakeholders, and forensic experts to understand how data is generated, used, and stored. It also demands an iterative approach: identify what’s relevant, test queries, validate results, and adjust as needed. Building this capability in-house or working with external providers who specialize in structured data can significantly improve defensibility and reduce costs.

Emojis

Historically the playful language of text messages, emojis have become a pervasive part of modern digital communication, with significant implications for ediscovery. As business communication increasingly occurs over chat and collaboration platforms, emojis are no longer limited to informal exchanges. They now appear in professional contexts, influencing the tone, intent, and even legal interpretation of communications. As a result, emojis are increasingly becoming critical elements of evidence, with courts, litigators, and review teams having to grapple with their meaning, admissibility, and discoverability. This chart from Eric Goldman illustrates the growing importance of emojis in litigation.

Figure 4: Cases Involving Emojis and Emoticons (Source: Eric Goldman)

A simple thumbs-up can signify agreement in one context (as at least one court has ruled) and passive-aggressive dismissal in another. A winking face might imply humor or flirtation. In legal disputes, especially those involving harassment, discrimination, defamation, or intent, these subtleties can be pivotal. Courts are increasingly recognizing that emojis can materially affect the interpretation of digital evidence, as reflected in recent rulings where emojis were admitted to establish state of mind or corroborate claims.

However, legal teams face several technical hurdles when dealing with emojis in discovery. First, not all platforms store or export emojis in a consistent way. Some tools replace emojis with their Unicode code point notation (e.g., 😊 becomes U+1F60A), while others render emojis as images or strip them out entirely during export. This inconsistency makes it difficult to preserve the original appearance and meaning of a message unless specialized technology is used to recognize and normalize emoji data during processing.
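To make the code point example concrete, the short sketch below converts text to and from the `U+XXXX` notation using only Python built-ins. The helper names are hypothetical. It also shows that a single visible emoji can consist of more than one code point (here, a thumbs-up plus a skin tone modifier), which is one reason naive replacement during export can alter what was actually sent.

```python
def to_codepoint_notation(text):
    """Render each character as U+XXXX, separated by spaces."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

def from_codepoint_notation(notation):
    """Rebuild the original characters from space-separated U+XXXX tokens."""
    return "".join(chr(int(token[2:], 16)) for token in notation.split())

smiling_face = "\U0001F60A"              # the smiling face from the example above
thumbs_up_medium = "\U0001F44D\U0001F3FD"  # thumbs-up followed by a skin tone modifier

print(to_codepoint_notation(smiling_face))
print(to_codepoint_notation(thumbs_up_medium))
print(from_codepoint_notation("U+1F60A") == smiling_face)
```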

Emojis also often appear differently across platforms and devices, depending on the operating system, app version, or vendor-specific emoji set (e.g., Apple vs. Android vs. Slack). The same emoji code can look cheerful in one interface and sinister in another. This visual variation complicates legal interpretation, as the sender’s and recipient’s views of the message may have differed based on how the emoji rendered on their devices. In high-stakes cases, parties may need to include screenshots or platform-specific renderings to demonstrate the emoji’s appearance at the time of communication.

Another major challenge is searchability. Traditional keyword search methods are not well-suited for finding emojis. Unicode values must often be used for accurate search strings, and many review platforms are not configured to support emoji-specific indexing out of the box. Without proper handling, emojis may be overlooked entirely in review workflows, which can create blind spots in privilege, responsiveness, or issue coding. Worse, legal teams may fail to appreciate the evidentiary importance of emojis until it’s too late to recover or contextualize them.
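The idea of searching by Unicode value can be sketched with a regular expression over code point ranges. This is an illustration only: the ranges below are a deliberately rough approximation and are not an exhaustive or authoritative definition of "emoji" (they miss some emoji and include some non-emoji symbols), and the message data and function names are hypothetical.

```python
import re

# Approximate ranges only; a production tool would rely on the Unicode emoji data files.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]"
)

def find_emoji_hits(messages):
    """Return (message_id, [emoji characters]) for messages containing any match."""
    hits = []
    for msg_id, text in messages:
        found = EMOJI_PATTERN.findall(text)
        if found:
            hits.append((msg_id, found))
    return hits

def search_by_codepoint(messages, codepoint):
    """Return ids of messages containing the character named like 'U+1F44D'."""
    target = chr(int(codepoint.upper().replace("U+", ""), 16))
    return [msg_id for msg_id, text in messages if target in text]

messages = [
    (1, "Sounds good \U0001F44D"),   # thumbs-up
    (2, "No emoji in this one"),
    (3, "Sure thing \U0001F609"),    # winking face
]
print(find_emoji_hits(messages))
print(search_by_codepoint(messages, "U+1F44D"))
```

A keyword search for "thumbs up" would return nothing for these messages, while the code point search surfaces message 1 directly.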

To address these challenges, leading ediscovery platforms are beginning to build emoji-aware functionality into their search and review capabilities. This includes recognizing and indexing emojis during ingestion, rendering them visually in review interfaces, and allowing users to filter or search by emoji type.

In terms of best practices, legal teams should take proactive steps to incorporate emoji considerations into their discovery workflows. During ECA, they should identify data sources where emojis are likely to be used – such as chat logs, collaboration apps, or mobile communications – and confirm whether their tools can handle emoji extraction and rendering. In custodial interviews, counsel may even ask how and where emojis were used, particularly in cases involving interpersonal conduct or sensitive conversations. Review protocols should include guidance on evaluating emojis in context, and privilege or issue coding may need to account for their potential implications.

AI-Generated Content

Content created by generative AI is one of the fastest growing sources of ESI, leading to increasingly complex challenges in ediscovery. As legal professionals, businesses, and individuals adopt tools like ChatGPT, Google Gemini, and Microsoft Copilot to draft emails, summarize documents, generate reports, or respond in chats, this synthetic content becomes part of the ESI landscape. The implications for ediscovery are significant because this AI-generated content can influence decisions, shape communications, or even fabricate information, and it must be considered in the context of legal obligations around preservation, review, and production.

One of the primary challenges posed by generative AI content is authorship and attribution. Unlike traditional communications that reflect the voice and intent of a human author, AI-generated content may lack a clear origin or be partially composed from automated suggestions. This raises questions about who is responsible for the content, whether it reflects deliberate action, and how to assess credibility or intent, particularly in cases involving contractual obligations, compliance violations, or internal investigations.

Preservation and searchability are also problematic. AI content may exist only in drafts, ephemeral chat windows, or dynamic platform environments (e.g., Copilot in M365), making it hard to identify, preserve, and collect. Each platform may also export AI-generated content in a different form; for example, ChatGPT lets users export their chat history as a compressed ZIP file that contains the history in both HTML and JSON files, with images exported as separate files.
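Because export formats vary by platform, a cautious first step after collecting such an archive is simply to inventory and hash its contents before any processing. The sketch below does this with Python's zipfile and hashlib modules. It makes no assumptions about the internal layout of a real ChatGPT export; the file names in the demo archive are invented for illustration.

```python
import hashlib
import io
import zipfile
from pathlib import PurePosixPath

def inventory_export(zip_source):
    """List every file in a ZIP export with its type, size, and SHA-256 hash.

    zip_source may be a file path or a binary file-like object.
    """
    entries = []
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            data = archive.read(info.filename)
            entries.append({
                "name": info.filename,
                "extension": PurePosixPath(info.filename).suffix.lower(),
                "size": info.file_size,
                "sha256": hashlib.sha256(data).hexdigest(),
            })
    return entries

# Demo archive built in memory; the member names are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("chat.html", "<html></html>")
    demo.writestr("conversations.json", "[]")
buffer.seek(0)

for entry in inventory_export(buffer):
    print(entry["name"], entry["extension"], entry["size"], entry["sha256"][:12])
```

Recording the hashes at collection time gives the team a baseline for showing later that the HTML, JSON, and image files were not altered during processing or review.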

Figure 5: Example of HTML File Containing Chat History Exported from ChatGPT

To address these challenges, ediscovery technology must evolve to recognize and manage AI-generated content. This includes platform integrations that can capture AI interaction logs or identify metadata signals from tools like Microsoft 365 Copilot. Additionally, legal hold strategies must be updated to ensure that work product from AI systems (especially in regulated environments) is preserved along with human-authored content.

Generative AI is not only reshaping how ediscovery is conducted, but it’s also reshaping the digital evidence landscape by introducing a new class of content that is harder to track, interpret, and attribute. Legal teams must adapt their tools and workflows to ensure that this content is discoverable, defensible, and appropriately understood within the broader narrative of a case.