eDiscovery Terminology and Definitions

Author: Andy Reisman

Although eDiscovery preservation obligations typically begin when a party reasonably anticipates litigation, organizations often plan for eDiscovery well before any such obligation arises. At a high level, the eDiscovery lifecycle consists of the following stages, which can overlap or recur within a particular case.

The eDiscovery Lifecycle

  1. Information Governance: This phase is not specific to a particular case. Rather, it is the ongoing process of planning for eDiscovery and managing electronically stored information (“ESI”) in order to control eDiscovery risks and costs.
  2. Identification: The identification stage involves determining the scope of ESI for a matter, such as custodians who possess ESI, data categories, and ESI storage locations.
  3. Preservation: The preservation stage ensures that ESI cannot be inappropriately altered or destroyed. Preservation typically should begin when litigation is reasonably anticipated.
  4. Collection: The collection phase involves copying ESI for subsequent use in the discovery process, for example through the use of a digital forensics provider who can collect ESI in a verifiable manner without altering potentially relevant metadata.
  5. Processing: The processing stage typically includes reducing the volume of ESI through automated means, such as applying date restrictions and search terms, and converting ESI into a format that can be more easily reviewed and analyzed, such as load files that can be imported into a hosted review platform.
  6. Review: The review stage involves reviewing ESI for responsiveness (for example, to discovery requests) and for privilege. Many lawyers use hosted document review platforms, such as Relativity, iConect, and others, to conduct document reviews more efficiently.
  7. Production: The production stage consists of turning over relevant non-privileged ESI for review by other parties, ideally based on agreed-upon production specifications.
  8. Presentation: The presentation stage involves displaying ESI that has been reviewed in various forums, such as depositions, hearings, mediations, and/or trials. Electronic evidence can be presented to assist witness testimony, demonstrate key facts, or persuade the finder of fact.

Glossary of eDiscovery Key Terms

Assisted Review: This method of review uses technology such as predictive coding and advanced machine learning to apply reviewers’ coding decisions to a broader data set, thereby reducing review time and costs.

Batching: The process of dividing data sets into groups for processing or review, often organized by a single custodian or issue.
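A minimal Python sketch of custodian-based batching: documents are grouped by custodian, then each group is split into batches of a fixed maximum size. The record layout (`id`, `custodian`), the document IDs, and the batch size are hypothetical; review platforms provide their own batching features with their own options.

```python
from collections import defaultdict


def make_batches(documents, batch_size):
    """Group document IDs by custodian, then split each group into batches of at most batch_size."""
    by_custodian = defaultdict(list)
    for doc in documents:
        by_custodian[doc["custodian"]].append(doc["id"])

    batches = []
    for custodian in sorted(by_custodian):  # alphabetical order keeps output predictable
        ids = by_custodian[custodian]
        for start in range(0, len(ids), batch_size):
            batches.append((custodian, ids[start:start + batch_size]))
    return batches


# Hypothetical documents for illustration only.
docs = [
    {"id": "DOC-1", "custodian": "Smith"},
    {"id": "DOC-2", "custodian": "Jones"},
    {"id": "DOC-3", "custodian": "Smith"},
    {"id": "DOC-4", "custodian": "Smith"},
]
for custodian, ids in make_batches(docs, batch_size=2):
    print(custodian, ids)
```

Keeping each batch within a single custodian means a reviewer sees one person's documents together, which tends to help with context.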

Bates Stamps: Sequential alphanumeric identifiers applied to produced documents, typically one number per page, so that produced documents can be easily identified and referenced, e.g., DEF0000001.
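A minimal Python sketch of Bates numbering, assuming one number per page and a zero-padded numeric suffix as in the DEF0000001 example. The prefix, starting number, and padding width are assumptions here; in practice they come from the agreed production specifications.

```python
def assign_bates(page_counts, prefix="DEF", start=1, width=7):
    """Assign a (begin, end) Bates range to each document, using one number per page."""
    ranges = []
    next_number = start
    for pages in page_counts:
        begin = next_number
        end = next_number + pages - 1
        ranges.append((f"{prefix}{begin:0{width}d}", f"{prefix}{end:0{width}d}"))
        next_number = end + 1  # the next document continues the sequence
    return ranges


# Three documents with 3, 1, and 2 pages.
print(assign_bates([3, 1, 2]))
# [('DEF0000001', 'DEF0000003'), ('DEF0000004', 'DEF0000004'), ('DEF0000005', 'DEF0000006')]
```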

Boolean Search: A search that combines keywords or phrases in a single query using logical operators such as AND, OR, and NOT. When a single word or phrase generates an unexpectedly large number of hits, Boolean searches can pinpoint documents of interest more precisely.
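To show how AND, OR, and NOT narrow a result set, here is a small Python sketch. To avoid writing a query parser, it represents queries as nested tuples, and it stands in for a real search index with a simple set of lowercase words per document. The query representation and the sample documents are both hypothetical.

```python
import re


def terms(text):
    """Lowercase word set for a document (a simple stand-in for a search index)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def matches(query, words):
    """Evaluate a query given as a term or nested tuples: ("AND", ...), ("OR", ...), ("NOT", q)."""
    if isinstance(query, str):
        return query.lower() in words
    op, *args = query
    if op == "AND":
        return all(matches(a, words) for a in args)
    if op == "OR":
        return any(matches(a, words) for a in args)
    if op == "NOT":
        return not matches(args[0], words)
    raise ValueError(f"unknown operator: {op}")


docs = {
    "DOC-1": "Pricing discussion with the distributor",
    "DOC-2": "Pricing for the holiday party",
    "DOC-3": "Distributor contract renewal",
}
# The single term "pricing" hits DOC-1 and DOC-2; "pricing AND NOT party" narrows it.
query = ("AND", "pricing", ("NOT", "party"))
hits = [doc_id for doc_id, text in docs.items() if matches(query, terms(text))]
print(hits)  # ['DOC-1']
```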

Child Document: Refers to a file that is attached to or embedded within another document. Examples include an email attachment or a graph embedded in a word processing document.

Clawback Agreement: An agreement that provides a mechanism to retrieve inadvertently produced privileged documents and to preclude their use.

Coding: Entering fields of information from a document into a database so that a set of documents can be more easily sorted and searched. Coding can be objective or subjective. Objective coding can be applied by anyone able to read the language of a document, such as recording the document's date. Subjective coding requires understanding the document, such as identifying the legal issues it addresses.

Container File: A single file that contains multiple other files or documents, often in a compressed format. Examples of container files include Microsoft Outlook OST and PST files, ZIP files, and forensic image evidence files. File counts in eDiscovery can be underestimated if container files are not expanded before the counts are generated, because a single container can hold thousands of files.
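The Python sketch below shows why container files must be expanded before counting. It uses the standard-library `zipfile` module to count the files inside a ZIP archive, recursing into nested ZIPs, and treats anything else as a single file. It handles only ZIP. PST/OST files and forensic images require specialized tools, and the archive contents here are made up.

```python
import io
import zipfile


def count_files(data):
    """Count the files in `data`, expanding ZIP containers (including nested ZIPs)."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return 1  # an ordinary file counts once
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if not info.is_dir():
                total += count_files(archive.read(info))
    return total


def make_zip(files):
    """Build an in-memory ZIP archive from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


inner = make_zip({"a.txt": b"alpha", "b.txt": b"beta"})
outer = make_zip({"memo.txt": b"memo", "attachments.zip": inner})
print(count_files(outer))  # 3, although the outer archive is a single file
```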

Cost Shifting: Typically, the producing party bears the cost of document production, but under certain circumstances courts can order the requesting party to bear some or all of those costs. Courts can use cost shifting as a Solomonic remedy that encourages a requesting party to limit the breadth of its discovery requests.

Culling: The process of reducing the volume of ESI after collection but before review, typically through automated means such as indexing the data and applying search criteria.

Custodian: An individual who possesses potentially relevant data and from whom ESI has been or will be collected.

Data Extraction: The extraction of searchable fields of information and data from documents so that they can be populated into a review database. For example, the data extracted from an email message includes the to, from, cc, bcc, subject, date, attachment count, attachment names, and body text fields, among others.
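To make this concrete, here is a minimal Python sketch that pulls the email fields listed in the Data Extraction entry out of a raw message using the standard-library `email` package. The sample message is fabricated, and the output field names are hypothetical. Real processing tools also handle container formats such as PST and MSG, which this sketch does not.

```python
from email import policy
from email.parser import BytesParser

# A fabricated sample message used only for illustration.
RAW = b"""From: Alice <alice@example.com>
To: Bob <bob@example.com>
Cc: Carol <carol@example.com>
Subject: Q3 pricing
Date: Tue, 03 Oct 2023 09:15:00 -0400
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="utf-8"

Please see the attached price list.
--XYZ
Content-Type: application/pdf
Content-Disposition: attachment; filename="prices.pdf"
Content-Transfer-Encoding: base64

JVBERi0xLjQK
--XYZ--
"""


def extract_fields(raw_bytes):
    """Extract common review-database fields from a raw RFC 5322 email message."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)

    def header(name):
        value = msg[name]
        return str(value) if value is not None else None  # None when the header is absent

    attachments = [part.get_filename() for part in msg.iter_attachments()]
    body = msg.get_body(preferencelist=("plain", "html"))
    return {
        "to": header("To"),
        "from": header("From"),
        "cc": header("Cc"),
        "bcc": header("Bcc"),
        "subject": header("Subject"),
        "date": header("Date"),
        "attachment_count": len(attachments),
        "attachment_names": attachments,
        "body_text": body.get_content().strip() if body is not None else "",
    }


fields = extract_fields(RAW)
print(fields["subject"], fields["attachment_names"])
```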

Data Mapping: The process of creating a “map” that identifies and records the locations and types of information within an organization. Organizations that work with multiple law firms can find data maps particularly useful because the maps let them quickly tell counsel where potentially relevant data may reside, streamlining the eDiscovery identification process.

De-Duplication: The process of comparing electronic records based on their characteristics, most commonly their hash values, to identify and remove duplicate records from data sets, thereby reducing review time and increasing coding consistency.
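As one way to picture hash-based de-duplication, the Python sketch below hashes each document's bytes with SHA-1 from the standard-library `hashlib`, keeps the first document seen for each hash, and records which later documents duplicate it. It compares whole files across all custodians (so-called global de-duplication). Real tools may instead de-duplicate within each custodian, or hash selected email fields rather than raw bytes. The document IDs and contents are hypothetical.

```python
import hashlib


def deduplicate(documents):
    """Keep the first document for each distinct SHA-1 content hash; report later ones as duplicates."""
    first_seen = {}  # hash -> ID of the first document with that hash
    unique, duplicates = [], []
    for doc_id, content in documents:
        digest = hashlib.sha1(content).hexdigest()
        if digest in first_seen:
            duplicates.append((doc_id, first_seen[digest]))
        else:
            first_seen[digest] = doc_id
            unique.append(doc_id)
    return unique, duplicates


docs = [
    ("SMITH-001", b"Board minutes, March"),
    ("JONES-014", b"Board minutes, March"),  # same bytes, different custodian
    ("JONES-015", b"Board minutes, April"),
]
unique, duplicates = deduplicate(docs)
print(unique)      # ['SMITH-001', 'JONES-015']
print(duplicates)  # [('JONES-014', 'SMITH-001')]
```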

De-NISTing: Filtering out common operating system and software program files that appear on reference lists compiled by the National Institute of Standards and Technology (NIST) through its National Software Reference Library (NSRL). The NIST lists include digital fingerprints (hash values) of files that are not user-generated. Comparing an eDiscovery data set against these lists eliminates known irrelevant files before review.
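De-NISTing comes down to a set-membership check on file hashes. In the Python sketch below, a tiny hypothetical set stands in for the NIST reference list, and MD5 is used as the hash. Which hash types and list format you actually work with depends on the reference data set and your processing tool, so treat both choices as assumptions.

```python
import hashlib

# Hypothetical stand-in for a NIST reference hash set; real lists are far larger.
KNOWN_FILE_HASHES = {
    hashlib.md5(b"system library bytes").hexdigest(),
}


def de_nist(files, known_hashes):
    """Return the names of files whose MD5 hash is NOT on the known-file list."""
    return [
        name
        for name, content in files
        if hashlib.md5(content).hexdigest() not in known_hashes
    ]


files = [
    ("C:/Windows/System32/example.dll", b"system library bytes"),
    ("C:/Users/smith/Documents/memo.docx", b"user memo bytes"),
]
print(de_nist(files, KNOWN_FILE_HASHES))  # ['C:/Users/smith/Documents/memo.docx']
```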

Digital Forensics: Digital forensics is a branch of forensic science that involves the collection and investigation of data found on devices and accounts that store electronic data, typically performed in a verifiable manner using specialized software tools. Common devices that are the subject of digital forensic analysis include laptops, PCs, tablets, smart phones, servers, external storage devices, email accounts, social media accounts, and web-based storage accounts, among others. The terms “computer forensics” and “digital forensics” often are used interchangeably, but “digital forensics” more accurately reflects the diverse nature of devices and accounts that experts collect and analyze.

Discovery: The process of identifying, collecting, processing, reviewing, and producing potentially relevant evidence.

Document Family: A group of documents that is logically connected, such as an email and its attachments.
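The Python sketch below builds document families from parent/child links. Each child document points to its parent, and every document is grouped under its top-level parent, so an attachment nested inside a ZIP attached to an email stays with that email. The `parent_id` field and the IDs are hypothetical, and the sketch assumes the links contain no cycles.

```python
from collections import defaultdict


def build_families(documents):
    """Group document IDs into families keyed by the top-level parent's ID."""
    parent_of = {doc["id"]: doc.get("parent_id") for doc in documents}

    def top_parent(doc_id):
        # Walk up the chain so attachments of attachments stay with the original parent.
        while parent_of.get(doc_id):
            doc_id = parent_of[doc_id]
        return doc_id

    families = defaultdict(list)
    for doc in documents:
        families[top_parent(doc["id"])].append(doc["id"])
    return dict(families)


docs = [
    {"id": "EMAIL-1"},                          # parent document
    {"id": "ATT-1", "parent_id": "EMAIL-1"},    # child: email attachment
    {"id": "ZIP-1", "parent_id": "EMAIL-1"},    # child: attached container
    {"id": "ZIP-1.1", "parent_id": "ZIP-1"},    # grandchild: file inside the ZIP
    {"id": "EMAIL-2"},
]
print(build_families(docs))
# {'EMAIL-1': ['EMAIL-1', 'ATT-1', 'ZIP-1', 'ZIP-1.1'], 'EMAIL-2': ['EMAIL-2']}
```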

Email Threading: The process of grouping related emails within a review tool so that all messages from a chain can be viewed together as a single conversation.
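One simple threading approach follows the standard `Message-ID` and `In-Reply-To` email headers back to the first message in a chain. The Python sketch below does exactly that. Commercial review tools often combine header data with analysis of message content, which this sketch does not attempt. The message dictionaries and their keys are hypothetical.

```python
def thread_emails(messages):
    """Group message subjects into conversations keyed by the Message-ID of each thread's root."""
    by_id = {m["message_id"]: m for m in messages}

    def root_of(msg):
        seen = set()  # guards against accidental reply loops
        while msg.get("in_reply_to") in by_id and msg["message_id"] not in seen:
            seen.add(msg["message_id"])
            msg = by_id[msg["in_reply_to"]]
        return msg["message_id"]

    threads = {}
    for msg in messages:
        threads.setdefault(root_of(msg), []).append(msg["subject"])
    return threads


messages = [
    {"message_id": "<1@example.com>", "subject": "Q3 pricing"},
    {"message_id": "<2@example.com>", "in_reply_to": "<1@example.com>", "subject": "RE: Q3 pricing"},
    {"message_id": "<3@example.com>", "in_reply_to": "<2@example.com>", "subject": "RE: RE: Q3 pricing"},
    {"message_id": "<4@example.com>", "subject": "Lunch?"},
]
print(thread_emails(messages))
```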

Early Case Assessment: A method of performing an initial, cost-effective review of potentially relevant data in order to gain an early sense of the merits of a legal matter and the potential costs associated with it.

eDiscovery / Electronic Discovery: The process of discovery in litigation involving ESI. Refer to “The eDiscovery Lifecycle” above.

Electronically Stored Information (ESI): ESI is information that exists in electronic (i.e., not paper) format, such as emails, word processing documents, presentations, spreadsheets, and text messages, among a vast array of other ESI categories.

Filtering: The process of using certain parameters to identify or exclude documents, typically in order to identify a narrower set of documents to review. Filtering often requires indexing data and then using search criteria such as keywords, phrases, Boolean expressions, proximity expressions, dates, and custodians as mechanisms to narrow the universe of documents to be reviewed.
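A minimal Python sketch of filtering by date range, keyword, and (optionally) custodian, as the Filtering entry describes. It uses plain case-insensitive substring matching, which is far cruder than an indexed search: "price" would also match "priceless", for example. The document records and field names are hypothetical.

```python
from datetime import date


def filter_documents(documents, start, end, keywords, custodians=None):
    """Return IDs of documents dated within [start, end] that contain any keyword,
    optionally limited to a set of custodians."""
    keywords = [k.lower() for k in keywords]
    kept = []
    for doc in documents:
        if not (start <= doc["date"] <= end):
            continue  # outside the date range
        if custodians is not None and doc["custodian"] not in custodians:
            continue  # not one of the selected custodians
        text = doc["text"].lower()
        if any(k in text for k in keywords):
            kept.append(doc["id"])
    return kept


docs = [
    {"id": "DOC-1", "custodian": "Smith", "date": date(2022, 5, 1), "text": "Revised pricing schedule"},
    {"id": "DOC-2", "custodian": "Smith", "date": date(2020, 1, 9), "text": "Old pricing memo"},
    {"id": "DOC-3", "custodian": "Jones", "date": date(2022, 7, 4), "text": "Team picnic"},
]
print(filter_documents(docs, date(2021, 1, 1), date(2022, 12, 31), ["pricing"]))  # ['DOC-1']
```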

FRCP: An abbreviation for the Federal Rules of Civil Procedure. The FRCP governs civil procedure in United States federal courts. Although some rules relating to eDiscovery are set forth in the FRCP, others are established by local rules, court orders, case law, and agreement of the parties in specific litigation matters. State law rules can vary, but often are informed by federal rules and case law.

Hash Value: A digital fingerprint of a document created with a standardized algorithm, such that if the same document is analyzed on different systems, its hash values will match. Hash values are commonly computed to de-duplicate documents and to verify that ESI has not been altered. Examples of hash algorithms include MD5 and SHA-1.

Harvesting: Also referred to as the collection of ESI, harvesting is the method of gathering electronic data for future use in investigations or lawsuits, preferably while maintaining file and system metadata.
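As a rough illustration of why the copying method matters for metadata, the Python sketch below compares the standard library's `shutil.copy` and `shutil.copy2` on a temporary file. `copy2` also copies timestamps such as the last-modified time, while `copy` does not. The file name and timestamp are arbitrary. Even `copy2` cannot preserve all metadata; the Python documentation notes, for example, that file owner and group are lost on POSIX systems. This is one reason practitioners rely on forensic collection tools and hash verification rather than ordinary file copies.

```python
import os
import shutil
import tempfile


def demo_copy_metadata():
    """Report whether each copy kept the source file's last-modified time."""
    with tempfile.TemporaryDirectory() as folder:
        source = os.path.join(folder, "memo.txt")
        with open(source, "w") as f:
            f.write("draft")
        original_mtime = 1_600_000_000  # arbitrary past timestamp (September 2020)
        os.utime(source, (original_mtime, original_mtime))

        plain = shutil.copy(source, os.path.join(folder, "plain.txt"))        # contents + permissions
        preserved = shutil.copy2(source, os.path.join(folder, "preserved.txt"))  # also timestamps
        return (
            os.stat(plain).st_mtime == original_mtime,
            os.stat(preserved).st_mtime == original_mtime,
        )


print(demo_copy_metadata())  # (False, True)
```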

Hosting: Loading litigation data onto a review platform, often one provided by an eDiscovery data hosting provider. Hosting data through an eDiscovery vendor allows legal teams to review large quantities of data remotely and efficiently, without purchasing their own software or hardware or employing personnel experienced in administering such environments.

Legacy Data: Data whose format has become obsolete, potentially making the data difficult to access or process.

Legal Hold: Also known as a “preservation order” or “hold order,” a legal hold is the temporary interruption of a company’s document retention and/or destruction policies for data that might be relevant to a lawsuit.

Load File: A database file created with eDiscovery software that enables processed data to be loaded into a review tool in a manner that allows it to be sorted and searched.
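Load file layouts are defined by production specifications. One widely used convention is the Concordance-style DAT file, which wraps each value in the þ character (code 254) and separates fields with the character code 20 (often displayed as ¶). The Python sketch below renders records in that style. The delimiters, the CRLF line endings, and the field names (`BEGBATES`, `ENDBATES`, and so on) are assumptions for illustration. Real specifications also define the text encoding and how line breaks inside fields are handled, which this sketch ignores.

```python
FIELD_SEP = "\x14"  # character code 20, often displayed as ¶
QUOTE = "\u00fe"    # þ, character code 254


def write_dat(records, fields):
    """Render records as a Concordance-style DAT load file: a header row plus one row per document."""
    def row(values):
        return FIELD_SEP.join(f"{QUOTE}{value}{QUOTE}" for value in values)

    lines = [row(fields)]
    for record in records:
        lines.append(row(record.get(field, "") for field in fields))  # missing fields become empty
    return "\r\n".join(lines) + "\r\n"


# Hypothetical field names and values.
fields = ["BEGBATES", "ENDBATES", "CUSTODIAN", "DATESENT"]
records = [
    {"BEGBATES": "DEF0000001", "ENDBATES": "DEF0000003", "CUSTODIAN": "Smith", "DATESENT": "2023-10-03"},
]
dat = write_dat(records, fields)
print(dat.replace(FIELD_SEP, "¶"))  # make the separators visible
```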

Meet and Confer: A meeting in which lawyers for the parties discuss discovery disputes, typically in an effort to resolve their differences without the need for judicial intervention.

Metadata: Data about electronic data, including for example dates, file names, authors, and other electronic characteristics. Metadata itself can be as relevant as document content, and collecting ESI in a manner that preserves document metadata often is important in litigation.

Native Format: The format in which ESI was originally created. A native file preserves metadata and other details that can be lost when documents are converted to other formats, such as PDF or TIFF images.

Near-duplicate: Documents that contain a high percentage of the same content are referred to as near-duplicates. Certain review tools enable near-duplicate identification, which can expedite review of similar documents.
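To illustrate near-duplicate detection, the Python sketch below compares every pair of documents using the standard-library `difflib.SequenceMatcher` and flags pairs whose similarity ratio meets a threshold. Both the 0.9 threshold and the choice of `SequenceMatcher` are assumptions. The all-pairs comparison also grows quadratically with the number of documents, so production review tools use more scalable techniques. The sample documents are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(documents, threshold=0.9):
    """Return (id_a, id_b, ratio) for document pairs whose text similarity meets the threshold."""
    pairs = []
    for (id_a, text_a), (id_b, text_b) in combinations(documents.items(), 2):
        ratio = SequenceMatcher(None, text_a, text_b).ratio()
        if ratio >= threshold:
            pairs.append((id_a, id_b, round(ratio, 2)))
    return pairs


docs = {
    "DOC-1": "The parties agree to a purchase price of $5 million.",
    "DOC-2": "The parties agree to a purchase price of $6 million.",  # one character differs
    "DOC-3": "Agenda for the holiday party.",
}
print(near_duplicates(docs))  # [('DOC-1', 'DOC-2', 0.98)]
```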

Normalization: The process of reformatting data so that it can be stored in a standardized format.
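A common normalization step is to convert timestamps recorded in different time zones to a single standard so they sort and compare correctly. The Python sketch below takes RFC 5322 email date strings as input and uses the standard-library `email.utils.parsedate_to_datetime` to express each one in UTC as an ISO 8601 string. The choice of UTC and ISO 8601 as the target is an assumption; the actual target format is a project decision.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def normalize_date(header_value):
    """Convert an RFC 5322 date string to a UTC ISO 8601 timestamp."""
    return parsedate_to_datetime(header_value).astimezone(timezone.utc).isoformat()


# The same instant, recorded in two different time zones.
print(normalize_date("Tue, 03 Oct 2023 09:15:00 -0400"))  # 2023-10-03T13:15:00+00:00
print(normalize_date("Tue, 03 Oct 2023 15:15:00 +0200"))  # 2023-10-03T13:15:00+00:00
```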

OCR: An abbreviation for Optical Character Recognition, OCR is the process of identifying and extracting searchable text from images of documents, such as scanned PDFs and TIFF images. OCR is often used to make text searches more effective, but its accuracy can be limited by the quality and nature of the documents that lack searchable text. Although OCR can lead to more documents being identified, applying it can increase the time and/or cost of eDiscovery processing.

Parent Document: A document to which other documents and files are attached.

Predictive Coding: The process of combining machine learning technology, workflows, and human review to apply decisions about the relevance of reviewed documents to a larger set of unreviewed documents, thereby reducing review time and cost.
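The core idea of predictive coding is to learn from reviewers' relevance decisions and then rank unreviewed documents. The Python sketch below illustrates this with a small multinomial naive Bayes classifier. Naive Bayes is a swapped-in teaching example, not the method any particular review platform uses; commercial tools use their own models and workflows. The seed documents, labels, and IDs are hypothetical, and the sketch assumes the training set includes at least one relevant and one non-relevant document.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_docs):
    """Fit naive Bayes word counts from (text, is_relevant) pairs coded by human reviewers."""
    counts = {True: Counter(), False: Counter()}
    doc_totals = Counter()
    for text, label in labeled_docs:
        counts[label].update(tokenize(text))
        doc_totals[label] += 1
    vocabulary = set(counts[True]) | set(counts[False])
    return counts, doc_totals, vocabulary


def relevance_score(model, text):
    """Log-odds of relevance; a positive score means the model leans toward relevant."""
    counts, doc_totals, vocabulary = model
    total_docs = sum(doc_totals.values())
    score = math.log(doc_totals[True] / total_docs) - math.log(doc_totals[False] / total_docs)
    for label, sign in ((True, 1), (False, -1)):
        denominator = sum(counts[label].values()) + len(vocabulary)  # add-one (Laplace) smoothing
        for word in tokenize(text):
            if word in vocabulary:  # ignore words never seen in training
                score += sign * math.log((counts[label][word] + 1) / denominator)
    return score


seed = [
    ("pricing agreement with distributor", True),
    ("distributor discount pricing terms", True),
    ("holiday party menu", False),
    ("parking garage closure notice", False),
]
model = train(seed)
unreviewed = {"DOC-7": "new distributor pricing proposal", "DOC-8": "party menu update"}
ranked = sorted(unreviewed, key=lambda d: relevance_score(model, unreviewed[d]), reverse=True)
print(ranked)  # ['DOC-7', 'DOC-8']
```

In a review workflow, the highest-ranked documents would typically go to human reviewers first, and their new coding decisions would be fed back into training.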

Processing: The extraction of data from collected ESI and the assembly of that data into load file databases, so that ESI can be more easily searched and sorted within review software.

Production: The delivery of documents and electronically stored information to other parties in a litigation matter, typically after a review for relevance and privilege. Productions often include Bates-stamped PDFs or TIFF files with an accompanying eDiscovery load file.

Redaction: Deliberately covering portions of documents that are considered privileged, proprietary, or confidential.

Spoliation: The destruction or alteration of relevant ESI. Rules regarding evidence spoliation vary, but the cost of litigating spoliation issues, not to mention any resulting sanctions, is often significantly higher than the cost of properly preserving and collecting responsive data in the first place.

Structured Data: Data stored in a structured format such as a database.

System Files: Electronic files that are part of the operating system or another control program. These files are created by the computer, not by its user.

Tagging: The process of assigning classifications, such as by relevance or privilege, to one or more documents.

TIFF: An acronym for Tagged Image File Format, TIFF is a common image format into which hard-copy documents are scanned, or into which ESI is converted for Bates stamping and production. PDFs can serve an equivalent function.

Unitization: The process of assembling individually scanned pages into documents. Unitization can be physical, based on physical boundaries such as staples and binders, or logical, which involves human review to determine which pages belong together as a single document.

Unstructured Data: Documents not stored in a database format. Examples include emails, word processing files, spreadsheets, presentations, and various other documents.

ELIJAH provides end-to-end eDiscovery expertise to assist law firms and corporate legal departments in efficiently meeting their discovery needs. At every step in the electronic discovery lifecycle, ELIJAH delivers the solutions and expertise you need to successfully manage litigation data.