Feb 10 2012

How to Develop a Sound Data Loss Prevention Strategy

Pattern matching and document tagging are often used in parallel to create the most effective approaches to data loss prevention.

No matter what your role within your school or district, you no doubt have some data in your possession that’s critical to you, your colleagues or students, or other stakeholders. Inadvertently releasing that information could lead to financial loss, reputational damage or even criminal sanctions. How, then, are you ensuring that you maintain control over your school or district’s sensitive data?

Data loss prevention (DLP) systems help keep tabs on an organization’s sensitive data by building an inventory and then maintaining control over the flow of information both inside and outside of the network. DLP adoption has grown rapidly over the past few years, and most organizations now either have a DLP system or are considering deploying one.

What Does DLP Technology Do?

There are three critical roles that DLP products play in the enterprise. First, they help build an inventory of the sensitive information in an environment. It’s common for organizations to have only vague ideas about where this data resides, if they even have a clear definition of what constitutes sensitive data. Many organizations go to great lengths to protect their centralized stores of sensitive information, such as that stored in an ERP system or on enterprise databases, only to be shocked when they discover that users are copying that data into shadow systems stored on notebook computers. DLP can help identify those unsanctioned sensitive data repositories and either eradicate them or implement effective controls around them.

The second role that DLP plays is to monitor the flow of sensitive information throughout an organization. DLP products can tag sensitive information and then document how that data is transferred across networks and between systems. This helps an organization identify the business processes that work with sensitive information and apply appropriate security controls to them.

Finally, DLP allows organizations to proactively block any use of sensitive information that doesn’t comply with their security policies. When the DLP system identifies a flow of sensitive data, it can optionally terminate the network connection, redact the sensitive data or apply additional security controls, such as encryption. These actions take place in real time, preventing an unintended leak of sensitive data.
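To make the block-or-redact decision concrete, here is a minimal Python sketch. It is an illustration only, not how any particular DLP product works: the function name apply_policy, the action names and the single Social Security number pattern are all assumptions made for the example, and returning None merely stands in for terminating a connection.

```python
import re
from typing import Optional

# Hypothetical rule: a hyphen-formatted Social Security number.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def apply_policy(message: str, action: str = "redact") -> Optional[str]:
    """Return the text allowed through, or None when the message is blocked."""
    if not SSN_PATTERN.search(message):
        return message  # nothing sensitive found, pass through unchanged
    if action == "block":
        return None  # stands in for terminating the network connection
    if action == "redact":
        return SSN_PATTERN.sub("[REDACTED]", message)
    raise ValueError(f"unknown action: {action}")
```

A real product would apply many rules at once and would typically log each decision for later review.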

Identifying Sensitive Data

There are two major techniques that DLP systems use to identify sensitive information: pattern matching and document tagging. These are commonly used in parallel to create the most effective approaches to DLP.

With pattern matching approaches, the DLP system uses regular expressions to identify information that might be sensitive. This technique is typically used to identify sensitive numbers that follow a regular pattern, such as Social Security numbers (using the format xxx-xx-xxxx) and credit card numbers (using the format xxxx-xxxx-xxxx-xxxx). Pattern matching is a good way to identify this type of information, but it is highly prone to false positives. For example, if an unformatted nine-digit number is on a system, a pure pattern-matching system has no way of telling whether it is a Social Security number, a dollar amount or a CUSIP number used to identify financial securities.
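The following Python sketch shows what such regular expressions might look like. The pattern names and the find_candidates helper are hypothetical, and the patterns cover only the formats mentioned above; real DLP rule sets are usually far more elaborate.

```python
import re

# Illustrative patterns only, covering the formats described in the text.
SSN_FORMATTED = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_FORMATTED = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b")
NINE_DIGITS = re.compile(r"\b\d{9}\b")  # ambiguous: SSN, dollar amount, CUSIP?


def find_candidates(text: str) -> dict:
    """Group every pattern hit in the text by the rule that matched it."""
    return {
        "ssn_formatted": SSN_FORMATTED.findall(text),
        "card_formatted": CARD_FORMATTED.findall(text),
        "nine_digits": NINE_DIGITS.findall(text),
    }
```

Run against a line such as "SSN 123-45-6789, invoice 123456789", the sketch reports one formatted hit and one unformatted nine-digit hit. Nothing in the pattern itself says what the second number is, which is exactly the false-positive problem.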

For this reason, many DLP products add contextual information to their pattern matching algorithms. In fact, this is one of the major ways that DLP manufacturers differentiate their products. Examples of ways that DLP can use context to improve the accuracy of pattern matching include:

  • Prioritizing formatted data over unformatted data (for example, the hyphens in 123-45-6789 make it much more likely to be a Social Security number than 123456789);
  • Looking at the header row in spreadsheets to determine whether it contains clues to the field’s nature;
  • Using field-specific knowledge to eliminate false positives. (For example, credit card numbers contain a checksum digit calculated using the openly available Luhn algorithm; 16-digit numbers that fail the Luhn check are not valid credit card numbers and may be ignored. Similarly, there are no valid Social Security numbers with “00” in the middle position.)
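Two of these checks are easy to sketch in Python. The Luhn calculation below follows the publicly documented algorithm. The helper names and the "high", "low" and "none" confidence labels are assumptions made for illustration and are not taken from any product.

```python
import re


def luhn_valid(number: str) -> bool:
    """Return True when the digits in the string pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def ssn_confidence(candidate: str) -> str:
    """Rate a nine-digit candidate as 'high', 'low' or 'none' (hypothetical labels)."""
    digits = "".join(ch for ch in candidate if ch.isdigit())
    if len(digits) != 9 or digits[3:5] == "00":
        return "none"  # wrong length, or a middle group of 00 that is never issued
    # Hyphen formatting makes a Social Security number much more likely.
    return "high" if re.fullmatch(r"\d{3}-\d{2}-\d{4}", candidate) else "low"
```

With these helpers, the common test number 4111-1111-1111-1111 passes the Luhn check, while changing its last digit to 2 makes it fail, so a scanner could safely ignore the altered number.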

Pattern matching also might be used to search for specific keywords in documents. For example, a district with a rigorous classification policy might configure its DLP to monitor for documents leaving the organization with “Confidential” in the header.
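A keyword rule of this kind can be sketched in a few lines of Python. The sketch assumes the document has already been converted to plain text and treats the first three lines as the header; both of those choices, along with the function name, are assumptions made for the example.

```python
import re

# Case-insensitive match on the classification keyword as a whole word.
CLASSIFICATION_KEYWORD = re.compile(r"\bconfidential\b", re.IGNORECASE)


def header_is_marked(document_text: str, header_lines: int = 3) -> bool:
    """Return True when the keyword appears in the first few lines."""
    header = "\n".join(document_text.splitlines()[:header_lines])
    return bool(CLASSIFICATION_KEYWORD.search(header))
```

Limiting the search to the header keeps the rule from firing on documents that merely mention the word somewhere in the body.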

With document tagging, security administrators build an inventory of specific documents that contain sensitive information. The DLP system then has two possible approaches for detecting the attempted sharing of those documents outside of the organization. In the first approach, document fingerprinting, the system computes a cryptographic hash of each file and then compares that digital fingerprint with the fingerprints of all documents leaving the organization. Because any change to a file produces a different hash, this approach catches only exact copies. In the second approach, the system stores the entire content of the sensitive files and then watches for data leaving the organization that matches that content, which can also catch excerpts and lightly edited copies.
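The two approaches can be compared in a short Python sketch. The fingerprint function uses SHA-256 from the standard library as one example of a cryptographic hash. For the content-matching approach, the sketch swaps in overlapping five-word "shingles" as a simple stand-in; this is an assumption chosen for illustration, and commercial products may match stored content quite differently.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Cryptographic fingerprint of a file's bytes (SHA-256 chosen as an example)."""
    return hashlib.sha256(data).hexdigest()


def shingles(text: str, size: int = 5) -> set:
    """Every run of `size` consecutive words, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(protected: str, outbound: str, size: int = 5) -> float:
    """Fraction of the protected document's shingles found in outbound text."""
    protected_set = shingles(protected, size)
    if not protected_set:
        return 0.0  # protected text is shorter than one shingle
    return len(protected_set & shingles(outbound, size)) / len(protected_set)
```

Appending a single word to a protected document changes its fingerprint completely, so a hash comparison alone would miss it, while the overlap score for the same edited text stays at 1.0. The trade-off is that content matching needs more storage and processing than comparing hashes.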

DLP Environments

There are two major environments monitored by DLP products: endpoints and networks. Many districts start with one approach or the other and then eventually expand their DLP implementation to include both environments.

Host-based DLP solutions target the weakest link in the security chain: the notebooks, desktops and servers that serve as endpoints. These products can help identify those unknown data stores that contain sensitive information through the use of an agent-based approach. In these cases, a software agent residing on an organization’s endpoints takes an inventory of sensitive information and monitors for policy violations. Most host-based DLP products also provide users with the ability to digitally “shred” unwanted data and securely encrypt sensitive data that they do not wish to delete.
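The inventory step an agent performs can be sketched as a simple file walk in Python. This is a toy under stated assumptions: it reads only plain-text file types, looks for one hyphen-formatted Social Security number pattern, and uses a hypothetical function name. A real agent would also parse office documents, archives and databases.

```python
import re
from pathlib import Path

# Hypothetical rule: a hyphen-formatted Social Security number.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def inventory(root: str, suffixes=(".txt", ".csv")) -> dict:
    """Map each file under root that contains matches to its match count."""
    findings = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file; a real agent would log this
        hits = len(SSN_PATTERN.findall(text))
        if hits:
            findings[str(path)] = hits
    return findings
```

Pointing a scan like this at a user's documents folder is one way the shadow copies described earlier come to light.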

Network-based DLP products sit at the perimeter of a district’s network and scan all outbound network traffic for potential policy violations. These systems don’t have the ability to build an inventory of sensitive information, but they do provide a last line of defense capable of stopping the flow of sensitive information before it leaves a network.

Adopting a data loss prevention strategy requires selecting appropriate data identification strategies and then building a monitoring environment that can effectively watch for identified data that is being used in violation of a district’s security policies.

Building DLP Processes

Data loss prevention products can play an important role in a security infrastructure, but they can also prove a disruptive force if not appropriately managed. When beginning an effort to deploy DLP in your district, think about the supporting policies and processes that will make DLP effective in the long term. Here are a few items to consider when deploying DLP:

  • Who will be responsible for monitoring your production DLP system?
  • Who has the authority to set and modify DLP security policies?
  • Who can make an exception to a DLP rule? What is the process when an exception is requested, and what is the service level agreement for the timeliness of exception implementation?
  • What training will be available for both DLP administrators and end users who are subject to the DLP system?
  • What is the communication plan for informing users that a DLP system is being deployed and how it might affect their work?
  • How will funds be allocated for maintaining and upgrading the DLP platform?

Spending time up front to answer these questions will help a DLP project get off to a strong start.