Why Machine Transcription Matters in Ediscovery
This is another post in our ediscovery series covering the ediscovery basics, core technical ediscovery concepts, the technologies powering ediscovery, and the future of ediscovery. Check out previous posts covering topics like the EDRM, cloud computing, machine learning, and machine translation. Or get the ebook in full here.
We’ve covered several of the modern technologies being applied to ediscovery in depth, and today we’ll look at yet another: machine transcription.
While technology like transcoding might make media files more accessible, it still does not unlock what is often the most valuable content within them: human speech. Enter machine transcription. This technology converts spoken words in audio or video files into written text, making them searchable and thus just as accessible as other documents.
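Once speech has been converted to text, a transcript can sit in the same search index as any email or memo. The sketch below illustrates that idea with a toy inverted index. It is a minimal example under stated assumptions, not how any particular review platform works: the document IDs and text are hypothetical, and we assume the audio file has already been transcribed.

```python
import re
from collections import defaultdict

# Hypothetical corpus: an email plus an assumed machine-generated transcript
# of a recorded call. After transcription, both are plain text.
documents = {
    "email-0001": "Please send the revised contract before Friday.",
    "call-0042.wav": "We agreed to delay the contract until the audit is finished.",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every query term (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


index = build_index(documents)
print(sorted(search(index, "contract")))        # ['call-0042.wav', 'email-0001']
print(sorted(search(index, "audit contract")))  # ['call-0042.wav']
```

The point of the example is that the search code never needs to know which hits came from audio: once a recording is transcribed, it is just another document.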
Like translation, transcription is a task once performed exclusively by humans. And, like translation, humans remain the gold standard when it comes to transcription quality. Machine transcription, however, has introduced levels of efficiency that humans simply cannot match, making transcription affordable at scale with relatively little sacrifice in quality.
Like the latest machine translation approaches, modern machine transcription is based on deep learning with neural networks. Both Google’s Cloud Speech API and Microsoft’s Bing Speech API use deep learning to deliver more accurate results. Many of these services can also deliver additional information, such as time codes, speaker identification, and speaker sentiment analysis.
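Time codes and speaker labels are what let a reviewer jump from a search hit straight to the relevant moment in a recording. The sketch below assumes a hypothetical, already-normalized transcript format (one record per word, with a start time in seconds and an anonymous speaker label). Real services such as the ones named above each return their own response structures, so this is an illustration rather than any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word with its start time (seconds) and speaker label."""
    text: str
    start: float
    speaker: str


# Hypothetical transcript, assumed to be lowercased and split into words.
transcript = [
    Word("we", 12.4, "Speaker 1"),
    Word("should", 12.7, "Speaker 1"),
    Word("delete", 13.0, "Speaker 1"),
    Word("the", 13.4, "Speaker 1"),
    Word("files", 13.6, "Speaker 1"),
    Word("no", 15.0, "Speaker 2"),
    Word("keep", 15.3, "Speaker 2"),
    Word("the", 15.8, "Speaker 2"),
    Word("files", 16.0, "Speaker 2"),
]


def find_phrase(words: list[Word], phrase: str) -> list[tuple[float, str]]:
    """Return (start time, speaker) for every exact occurrence of the phrase."""
    target = phrase.lower().split()
    if not target:
        return []
    n = len(target)
    hits = []
    for i in range(len(words) - n + 1):
        if [w.text for w in words[i:i + n]] == target:
            hits.append((words[i].start, words[i].speaker))
    return hits


def format_timecode(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


for start, speaker in find_phrase(transcript, "the files"):
    print(format_timecode(start), speaker)
# 00:00:13 Speaker 1
# 00:00:15 Speaker 2
```

Keeping timing and speaker data at the word level, rather than storing only the flat text, is what makes this kind of pinpoint navigation possible.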
Ultimately, it is searchability that matters most in ediscovery. As the volume of data in today’s litigation makes it increasingly infeasible to manually review each document, it becomes even more important to ensure that all content is indexed and searchable. For audio and video, only machine transcription makes it cost-effective to meet that need, even in the largest of matters.
Our next piece in the series will explore Optical Character Recognition (OCR), the task of recognizing text in images, which is critical for all those PDFs you need to upload into your next system.