
Leveraging Content Management Systems for e-Discovery

By Brad Harris

The slow and relentless march to gain control over a corporation’s information assets… enabling users to collaborate and share information more efficiently… applying records management principles to the world of electronic data…

Over the last few years, we’ve seen the emergence of enterprise-class content management and e-mail archiving as the grand solution to controlling the ever-burgeoning information explosion.

Content management and e-mail archiving applications have clear and compelling value propositions for consolidating data management practices, optimizing storage and facilitating more effective use of information assets. In many cases, such tools also enable records and information management, as well as enhanced compliance and risk management. But are they the panacea for legal discovery readiness?

To determine the right mix of people, process and technology, IT should work proactively with its legal counterparts, ask questions and review current methods for accessing content management systems and e-mail archives. In doing so, IT will be much better prepared to identify, preserve and collect the electronic evidence required in commercial litigation or a government investigation.

Evaluating the Systems
When evaluating a content repository’s ability to support e-discovery, it is important to consider, first and foremost, how the repository is used in the normal course of business. Most often, end users create and receive content in the form of word-processing documents, spreadsheets, images and e-mail, which are then saved as files to a content repository. At that point, metadata (the data about the data) is created or modified in the routine course of business. This metadata, such as the date the file was added to the repository, the person responsible and the logical file location, could become relevant evidence during legal discovery.

The first step in responding to a preservation or production obligation is to identify which files in the repositories are potentially relevant. It is useful to know how the files are logically organized in a given repository, such as by folders or classification, and how consistently this organization is applied. It’s important to know how file ownership is defined from an end-user perspective, where the files are physically stored and whether custodians can be tracked back to the files. Counsel will also need to know:

  • Who added the document to the repository or who last accessed the files
  • Whether metadata is accessible for filtering on certain file types, file or record classes, or date ranges
  • The capabilities or limitations of performing metadata and keyword queries in the repositories
  • Whether the content is reliably indexed for keyword filtering, including text from embedded objects within files, attachments to e-mail and files within compressed archives¹
  • Whether files whose content cannot be indexed, such as image files without text or encrypted documents, are identified separately when filtering on keyword hits
  • Whether search parameters can be logged and accessed later for audit tracking, if required
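The indexing questions above can be made concrete with a toy full-text index. The sketch below is illustrative only and does not describe how any particular repository works. It assumes each document is already available as extracted text, or as `None` when no text could be extracted (an image-only file or an encrypted document, say). Those unindexed items are reported as a separate exception list instead of silently dropping out of keyword results. The function names and the input format are hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Build a toy inverted index.

    documents maps a document ID to its extracted text, or to None when
    no text could be extracted (hypothetical input format).
    Returns (index, exceptions): index maps each lowercase term to the set
    of document IDs containing it; exceptions lists IDs with no text.
    """
    index = defaultdict(set)
    exceptions = []
    for doc_id, text in documents.items():
        if text is None:
            # Not searchable by keyword; must be reviewed some other way.
            exceptions.append(doc_id)
            continue
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return index, exceptions


def keyword_search(index, exceptions, terms):
    """Return documents matching any term, plus the unindexed exceptions."""
    hits = set()
    for term in terms:
        hits |= index.get(term.lower(), set())
    return sorted(hits), sorted(exceptions)


docs = {
    "DOC-1": "Quarterly pricing memo",
    "DOC-2": "Lunch schedule",
    "DOC-3": None,  # e.g., a scanned image with no OCR text
}
index, exceptions = build_index(docs)
print(keyword_search(index, exceptions, ["pricing"]))  # → (['DOC-1'], ['DOC-3'])
```

Returning the exception list alongside the hits makes it hard for a reviewer to mistake “no keyword hits” for “not relevant.”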

When identifying, collecting and preserving evidence for discovery, the search protocol needs to be clearly documented and include such details as:

  • Who conducted the search, when, and for what purpose
  • How the search index was created
  • The capabilities and limitations of the search engine
  • How the repository or search index was accessed
  • The parameters or attributes that were used when conducting the search
  • The actual results and how they were preserved for later auditing
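One way to keep the search protocol documented consistently is to record each search as a structured entry with one field per item in the checklist above. The following is a minimal sketch under that assumption. The field names, the sample values and the JSON-lines log file are all hypothetical. Opening the log in append mode only means earlier entries are not rewritten by this code. It does not make the log tamper-proof.

```python
import json
import os
import tempfile
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass
class SearchLogEntry:
    """One documented search, mirroring the checklist (hypothetical field names)."""
    conducted_by: str
    purpose: str
    index_description: str      # how the search index was created
    engine_limitations: str     # known capabilities and limitations
    access_method: str          # how the repository or index was accessed
    parameters: dict            # query terms, date ranges, filters
    result_ids: list            # the actual results
    conducted_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_to_log(entry, path):
    """Append the entry as one JSON line; earlier lines are left untouched."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(asdict(entry), sort_keys=True) + "\n")


# Usage with made-up values, written to a temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), "search_log.jsonl")
append_to_log(
    SearchLogEntry(
        conducted_by="j.analyst",
        purpose="Matter 2024-001 initial culling",
        index_description="Repository full-text index, rebuilt nightly",
        engine_limitations="No OCR; stop words ignored",
        access_method="Read-only service account",
        parameters={"terms": ["pricing"], "from": "2023-01-01"},
        result_ids=["DOC-1"],
    ),
    log_path,
)
with open(log_path, encoding="utf-8") as f:
    logged = [json.loads(line) for line in f]
```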

Once specific files or documents have been identified, the discovery team, composed of both legal and IT staff, must be able to tag and categorize the files for the particular discovery matter. Can this be done by adding a new attribute to the file or by creating a new record in the database? In some systems, this may be as simple as a “drag-and-drop” into a special folder, where a pointer links back to the original file. In other instances, an actual copy of the file might be created. Whichever approach is used, it is imperative to prevent spoliation by ensuring that the original metadata associated with the file is preserved. For example, will the tagging action change the last-modified date? When the item is copied to a new folder, will the original path be lost?
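The pointer approach described above keeps discovery tags outside the item itself. The sketch below is hypothetical and not any product’s data model. Tags live in a separate table keyed by matter and document ID. The repository record is exposed to the tagging code only as a read-only view, so tagging cannot touch the last-modified date or the original path.

```python
from types import MappingProxyType


class MatterTags:
    """Keep discovery tags in a separate table keyed by (matter, document).

    The repository's own record is never written to, so original metadata
    such as the last-modified date and original path stays as it was.
    """

    def __init__(self):
        self._tags = {}  # (matter_id, doc_id) -> set of tag labels

    def tag(self, matter_id, doc_id, label):
        self._tags.setdefault((matter_id, doc_id), set()).add(label)

    def tags_for(self, matter_id, doc_id):
        return frozenset(self._tags.get((matter_id, doc_id), set()))


# Hypothetical repository record, wrapped in a read-only mapping.
record = MappingProxyType({
    "doc_id": "DOC-1",
    "path": "/Finance/Pricing/memo.docx",
    "last_modified": "2023-03-14T09:30:00",
})
tags = MatterTags()
tags.tag("MATTER-7", record["doc_id"], "responsive")
```

Keying by matter as well as document lets the same file carry different tags in different matters without any change to the file.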

Preserving the Evidence
Once potentially relevant files have been identified, the evidence needs to be preserved to prevent inadvertent modification or deletion. Many repository systems allow declaration of a legal hold to suspend a document’s information lifecycle and prevent routine deletion based on a retention schedule.² It’s important to determine whether the systems in use can make items immutable, for example by declaring an item a record and then freezing it. Counsel also needs to understand:

  • If applied to a “stub” or a link, does the preservation extend to the original item?
  • Does preservation include the item's metadata?
  • Is the authority to implement and manage such holds controlled by access rights and permissions?
  • Can multiple legal holds be managed effectively when the same document is relevant to more than one legal matter, so that releasing the “hold” for one matter does not override the holds from others?
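The last question comes down to tracking holds per matter instead of using a single on/off flag. In the minimal sketch below, which is hypothetical and not any vendor’s implementation, each document carries a set of matter IDs. Releasing one matter removes only that matter’s entry. Routine disposal under a retention schedule is permitted only when retention has lapsed and no hold of any kind remains.

```python
from collections import defaultdict
from datetime import date


class HoldRegistry:
    """Track legal holds per document, one entry per matter (a minimal sketch)."""

    def __init__(self):
        self._holds = defaultdict(set)  # doc_id -> set of matter IDs

    def place_hold(self, doc_id, matter_id):
        self._holds[doc_id].add(matter_id)

    def release_hold(self, doc_id, matter_id):
        # Releasing one matter removes only that matter's hold.
        self._holds[doc_id].discard(matter_id)

    def is_held(self, doc_id):
        return bool(self._holds.get(doc_id))

    def may_dispose(self, doc_id, retention_expires, today):
        """Allow routine disposal only if retention has lapsed and no hold remains."""
        return today >= retention_expires and not self.is_held(doc_id)


holds = HoldRegistry()
holds.place_hold("DOC-1", "MATTER-A")
holds.place_hold("DOC-1", "MATTER-B")
holds.release_hold("DOC-1", "MATTER-A")
print(holds.is_held("DOC-1"))  # → True (MATTER-B still applies)
print(holds.may_dispose("DOC-1", date(2020, 1, 1), date(2024, 1, 1)))  # → False
```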

When responding to a request for production, or preemptively collecting potentially relevant evidence to ensure compliance with pending litigation, content must be exported from the repository in a legally defensible manner. To be defensible, a process needs to be predictable (repeatable and testable results), transparent (well-understood and articulated) and trusted (non-repudiation of the end result). The process needs to:

  • Maintain a clear “chain-of-custody” that captures the actions taken to assure authenticity of the copy being exported, such as an audit log that includes access rights, selection parameters, time stamps and list of results
  • Consistently manage, or at least log, any errors encountered, such as a fault condition that prevents a selected file from being written to a target drive
  • Be reasonably efficient to meet discovery timeframes, as requests may involve hundreds of thousands of records spanning multiple servers and physical storage locations, and involve numerous tables containing relevant metadata elements.
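The chain-of-custody and error-logging requirements above can be sketched as an export loop that writes one log entry per file. Hashing is an assumption here, not a method the article prescribes. The sketch computes a SHA-256 digest of each source file and of its copy, a common way to show that an exported copy matches the original. Failures are logged and the run continues, so a missing or unreadable file is recorded instead of silently stopping the export. Function names, log fields and the numbered-filename scheme are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile
from datetime import datetime, timezone


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def export_with_custody_log(source_paths, target_dir, operator):
    """Copy files to target_dir, logging a hash and outcome for each one."""
    os.makedirs(target_dir, exist_ok=True)
    log = []
    for n, src in enumerate(source_paths, start=1):
        entry = {
            "source": src,
            "operator": operator,
            "time": datetime.now(timezone.utc).isoformat(),
        }
        try:
            source_hash = sha256_of(src)
            # Prefix a sequence number so same-named files from different
            # folders cannot overwrite each other in the target directory.
            dest = os.path.join(target_dir, f"{n:06d}_{os.path.basename(src)}")
            shutil.copy2(src, dest)  # copy2 also tries to keep file timestamps
            matched = source_hash == sha256_of(dest)
            entry.update(destination=dest, sha256=source_hash, verified=matched,
                         status="ok" if matched else "hash_mismatch")
        except OSError as exc:
            # Log the fault (e.g., missing file, unwritable target) and continue.
            entry.update(status="error", error=str(exc))
        log.append(entry)
    return log


# Usage: one real file and one path that does not exist.
work = tempfile.mkdtemp()
good = os.path.join(work, "memo.txt")
with open(good, "w", encoding="utf-8") as f:
    f.write("pricing memo")
missing = os.path.join(work, "deleted.txt")
custody_log = export_with_custody_log(
    [good, missing], os.path.join(work, "export"), "j.analyst"
)
print([e["status"] for e in custody_log])  # → ['ok', 'error']
```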

Other questions to ask include:

  • Is hierarchical storage management (HSM)³ used? If so, how do search and retrieval differ as an item moves from online to near-line to offline and, eventually, to archival storage? The physical location and storage medium can change during the document retention lifecycle, affecting how easily and quickly the item can be accessed.
  • Can relevant metadata also be extracted along with files from the repository, such as the original file name, path and creation date, rather than being left simply with the date and location where the file is being written to an external share drive for collection?
  • If files are physically stored elsewhere, such as when accessing a link or “stub document” that points to another repository, does the export process include the linked document?
  • If the file is a compound document, such as an e-mail with attachments, does the export maintain or recreate the parent-child relationships?
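The metadata and parent-child questions above are often answered with a manifest that travels with the exported files. The sketch below writes a hypothetical CSV manifest, assuming the repository can supply each item’s original name, path, creation date and parent ID. Each attachment gets a family ID equal to its top-level parent, so an e-mail and its attachments can be regrouped after export. The column names and sample items are illustrative, not a standard load-file format.

```python
import csv
import io

# Hypothetical repository items: an e-mail and its attachment.
items = [
    {"doc_id": "EM-1", "parent_id": None, "original_name": "RE: pricing.msg",
     "original_path": "/Mail/jdoe/Inbox", "created": "2023-03-14T09:30:00"},
    {"doc_id": "EM-1.1", "parent_id": "EM-1", "original_name": "pricing.xlsx",
     "original_path": "/Mail/jdoe/Inbox", "created": "2023-03-14T09:30:00"},
]


def family_id(item, by_id):
    """Walk up parent links so every attachment shares its top-level parent's ID."""
    while item["parent_id"] is not None:
        item = by_id[item["parent_id"]]
    return item["doc_id"]


def write_manifest(items, out):
    """Write one row per item with original metadata and family grouping."""
    by_id = {item["doc_id"]: item for item in items}
    fields = ["doc_id", "parent_id", "family_id", "original_name",
              "original_path", "created"]
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for item in items:
        writer.writerow({**item, "family_id": family_id(item, by_id)})


buffer = io.StringIO()
write_manifest(items, buffer)
print(buffer.getvalue())
```

`csv.DictWriter` writes `None` as an empty field, so top-level items show a blank `parent_id`.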

Understanding the Context
Oftentimes, e-discovery hinges not only on a document’s content but its context as well. Where the document was filed, who had access to it, and the format of the document when viewed may be relevant. In assessing this information, counsel and IT must be prepared to discuss the following:

  • Do access rights and user permissions affect which documents a specific custodian can see?
  • If more than one user has access to a document, how is custodial ownership defined?
  • If different custodians organize the same document differently, such as by virtual folder views, which logical path structure is relevant?
  • If such “context” is important to a legal matter or investigation, do search and retrieval processes account for specific end-user roles, permissions and views of the repository?
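To account for roles, permissions and views, a search or retrieval process needs to combine access rights with each user’s own filing structure. The sketch below uses hypothetical in-memory data. It returns the documents a given custodian could see and the logical path under which that custodian filed each one. The same document can therefore appear under different paths for different custodians.

```python
# Hypothetical access-control data: who may see each document, and where
# each user has filed it in their own virtual folder view.
acl = {
    "DOC-1": {"jdoe", "asmith"},
    "DOC-2": {"asmith"},
}
folder_views = {
    ("jdoe", "DOC-1"): "/Projects/Pricing",
    ("asmith", "DOC-1"): "/Legal/Review",
    ("asmith", "DOC-2"): "/Legal/Drafts",
}


def custodian_view(user):
    """Return {doc_id: logical path} for the documents this user can see."""
    return {
        doc_id: folder_views.get((user, doc_id), "(unfiled)")
        for doc_id, readers in sorted(acl.items())
        if user in readers
    }


print(custodian_view("jdoe"))  # → {'DOC-1': '/Projects/Pricing'}
```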

Content management systems can also add a whole new dimension to discovery if document versioning is enabled. For example, when a file is exported for discovery, will only the most current version be exported, or will the export include all previous versions and history? If a matter concerns the state of the repository at some time in the past, will the content management system enable retrieval of a particular version of the file from a specific point in time, including deleted or archived files? Do the search and retrieval processes account for deleted or archived history? If legally obligated, will the system be able to produce such metadata, including the version history, which custodians accessed the files and what changes were made?
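Point-in-time retrieval is only possible if the repository keeps its version history, including deletions, and that retention is assumed here. Given that history, the version in effect at a particular moment is simply the latest version whose timestamp is not after that moment. The sketch below uses a hypothetical history list and finds that version with Python’s `bisect` module.

```python
import bisect
from datetime import datetime

# Hypothetical version history for one document, oldest first.
# A deletion is kept as an event so the history itself is not lost.
history = [
    (datetime(2023, 1, 10), "v1", "draft"),
    (datetime(2023, 2, 3), "v2", "revised"),
    (datetime(2023, 4, 1), "v3", "deleted"),
]


def version_as_of(history, when):
    """Return the (timestamp, version, state) in effect at `when`, or None."""
    timestamps = [ts for ts, _, _ in history]
    # bisect_right counts the versions stamped at or before `when`.
    pos = bisect.bisect_right(timestamps, when)
    return history[pos - 1] if pos else None


print(version_as_of(history, datetime(2023, 3, 1))[1])  # → v2
```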

Similarly, e-mail archiving systems raise new questions. Many archiving systems add value by actively managing storage parameters. For instance, single-instance storage offers a compelling advantage by storing only one copy of duplicate e-mails or attachments, using links and pointers to keep everything transparent to the end user.⁴ Some e-mail management systems take this a step further by allowing senders to avoid attaching a document altogether, instead embedding a link back to the repository where the “attached” file is actually stored. When exporting e-mails for discovery, understanding what happens to these links is critical. For example, does the export process replace logical links with actual file attachments?
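Single-instance storage and link resolution can be illustrated together. The sketch below is an assumption about one way such a store could work, not a description of any archiving product. Attachments are stored once, keyed by their SHA-256 digest, and each message keeps only the key as its “link.” At export time, the hypothetical `export_message` function replaces each link with the actual content and leaves the archived message unchanged.

```python
import hashlib


class SingleInstanceStore:
    """Content-addressed store: identical attachments are kept once."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)  # a duplicate adds no new copy
        return key  # the "link" stored in each e-mail

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self):
        return len(self._blobs)


def export_message(message, store):
    """Return a copy of the message with attachment links replaced by content."""
    exported = dict(message)
    exported["attachments"] = [
        {"name": name, "content": store.get(key)}
        for name, key in message["attachment_links"]
    ]
    del exported["attachment_links"]
    return exported


store = SingleInstanceStore()
# Two e-mails "attach" the same file; only one copy is stored.
mail_a = {"id": "EM-1", "attachment_links": [("prices.xlsx", store.put(b"Q3 price list"))]}
mail_b = {"id": "EM-2", "attachment_links": [("prices.xlsx", store.put(b"Q3 price list"))]}
print(len(store))  # → 1
```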

File Migration Before or After a Duty to Preserve Exists
When deploying a new document management or e-mail archiving system, it is not uncommon to transfer existing files stored elsewhere into the new content repository. Documents formerly stored on personal hard drives, network shares or loosely managed file share systems can be retained, shared and governed far better once they are moved into an enterprise content repository and eliminated from “the wild.”

However, typical migration utilities do little to retain the original metadata from the source repository, such as file creation date, last-modified date or original file path. As a result, it is not at all uncommon to see hundreds or thousands of files in the repository showing the same “create date,” since the repository typically records the date a file was added to it as the file’s original date.
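When a migration utility discards source metadata, one mitigation is to capture that metadata yourself before moving anything and keep it as a sidecar record. The sketch below assumes the source is an ordinary file share that Python can read. The function and field names are hypothetical. Note a platform caveat. A true file creation time is not available everywhere, so the sketch records `st_birthtime` only when the platform provides it. It also records `st_ctime` under its own name, because on Linux that field is the last metadata change, not creation.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def _iso(ts):
    return datetime.fromtimestamp(ts, timezone.utc).isoformat()


def capture_source_metadata(root):
    """Walk a file share and record each file's original path and timestamps."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            records.append({
                "original_path": path,
                "last_modified": _iso(st.st_mtime),
                # Creation time on Windows, metadata-change time on Linux.
                "st_ctime": _iso(st.st_ctime),
                # True creation time, only where the platform exposes it.
                "birthtime": _iso(st.st_birthtime) if hasattr(st, "st_birthtime") else None,
                "size": st.st_size,
            })
    return records


# Usage: a one-file share with a fixed modification time.
share = tempfile.mkdtemp()
sample = os.path.join(share, "memo.txt")
with open(sample, "w", encoding="utf-8") as f:
    f.write("pricing memo")
os.utime(sample, (1_600_000_000, 1_600_000_000))  # set atime and mtime
snapshot = capture_source_metadata(share)
with open(os.path.join(tempfile.mkdtemp(), "source_metadata.json"), "w", encoding="utf-8") as f:
    json.dump(snapshot, f, indent=2)  # sidecar kept alongside the migration
```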

If a data migration is being done absent an active matter or preservation obligation, such alteration or loss of the original metadata may not be an issue. But if information that existed in the original source repository remains relevant, the organization could become exposed to serious sanctions for metadata spoliation.

Because of the need to preserve metadata, most content repositories and archiving systems typically cannot be used to preserve potentially relevant evidence stored “in the wild” once a duty to preserve has arisen. This should be done only after the legal implications have been fully vetted, and only if special migration methodologies are in place to ensure legal defensibility.

Brad Harris is the Director of the Discovery Center of Excellence for Fios, Inc. www.FiosInc.com

1 Most search engines rely on indexes to perform keyword searching; to build the index, each file is read and its textual content extracted. A search index, or more precisely a full-text index, is the database that a full-text search engine uses to respond to the queries issued by users.

2 Retention schedules are a key component of Records Management or Information Lifecycle Management (ILM) systems, in which a document is assigned a file plan that specifies how long the record is to be retained and where. A typical lifecycle may be triggered by the date a file is declared a record, defining how long it is then retained in active storage, when it should be moved to archive storage and, ultimately, when it should be disposed of.

3 HSM, or hierarchical storage management, allows storage of electronic documents to be optimized based on use and access needs. As a file matures, it is often accessed far less frequently than a brand-new record. Thus, more sophisticated DR systems allow older files to be offloaded to less accessible (and therefore less costly) storage media.

4 When several files in a repository contain exactly the same data, single-instance storage (SIS) can replace the references to these identical files with references to a single stored copy of the file. This can save large amounts of disk space in systems with many copies of the same file.

ITSecurityJournal.com | Information Security & Network Security Management