How to Use Microsoft’s Data Classification Toolkit
Most organizations have, at one time or another, needed to organize unstructured data stored across many servers. Microsoft’s new Data Classification Toolkit, a free download for use with Windows Server 2008 R2 and the File Classification Infrastructure, allows organizations to report on the classification status of files across multiple servers, develop classification policy and implement appropriate controls to help manage storage more efficiently.
For a primer on the Windows Server 2008 R2 File Classification Infrastructure, see the BizTech article "Control Data Sprawl with File Classification in Windows Server 2008 R2".
Before working with the tool kit, you should decide how you’re going to classify files. This is usually determined by the data’s value and any regulatory requirements, and should be carried out in coordination with auditors to make sure it meets your organization’s needs. You can then apply classification policy to files, which can include assigning NTFS permissions, setting retention or auditing parameters, scheduling expiration or backup, or moving files to lower-cost storage.
Getting Started with the Tool Kit
The Data Classification Toolkit provides a set of PowerShell commandlets that can be used to manage and automate classification and reporting. It’s also useful to have the File Server Resource Manager (FSRM) console installed on your server so that you can configure and manage file classification using a graphical interface. To install FSRM, log on to Windows Server 2008 R2 as a local administrator, open a PowerShell command prompt from the task bar and run the following two commands:
IMPORT-MODULE SERVERMANAGER
ADD-WINDOWSFEATURE FS-RESOURCE-MANAGER
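If you expect to repeat this step on several servers, it can help to check whether FSRM is already present before installing it. The following is a minimal sketch, assuming the Server Manager module that ships with Windows Server 2008 R2; the status message is illustrative only.

```powershell
# Load the Server Manager module so the feature cmdlets are available
Import-Module ServerManager

# Look up the FSRM role service by the same name used in the commands above
$fsrm = Get-WindowsFeature -Name FS-Resource-Manager

if ($fsrm.Installed) {
    Write-Host "File Server Resource Manager is already installed."
} else {
    # Install the role service only when it is missing
    Add-WindowsFeature -Name FS-Resource-Manager
}
```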
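Next, configure PowerShell to run the unsigned scripts supplied in the tool kit by issuing the following command, and choose Yes when prompted to change the execution policy. The REMOTESIGNED policy allows unsigned scripts stored on the local computer to run, but requires scripts downloaded from the Internet to be signed: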
SET-EXECUTIONPOLICY REMOTESIGNED
Also, run the ENABLE-PSREMOTING command on target servers if you want to run PowerShell commandlets from a remote machine. All the command line examples in this article assume that you’re making changes only to the local server.
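For readers who do want to work remotely, a rough sketch of the remoting setup might look like the following. FILESERVER01 is a placeholder server name, not something defined by the tool kit, and the remote command simply checks the FSRM installation state as a connectivity test.

```powershell
# On each target file server, from an elevated PowerShell prompt:
# -Force suppresses the confirmation prompts.
Enable-PSRemoting -Force

# From the management machine: run a command on the target server.
# FILESERVER01 is a hypothetical name; replace it with a real server.
Invoke-Command -ComputerName FILESERVER01 -ScriptBlock {
    Import-Module ServerManager
    Get-WindowsFeature -Name FS-Resource-Manager
}
```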
Download the Data Classification Toolkit from Microsoft’s website. The tool kit runs on Windows 7 or Server 2008 R2 SP1 and requires PowerShell 2.0 and the .NET Framework 3.5 or later. Once installed, open the tool kit console as an administrator from All Programs > Microsoft Data Classification Toolkit on the Start menu.
Importing File Classification Packages
The tool kit comes with four predefined configuration packages (for NIST SP 800-53 and PCI DSS regulatory codes), which contain properties, tasks, rules and report jobs that can be copied and modified to suit your needs. They’re stored as .xml files in the tool kit directory, which by default is C:\PROGRAM FILES (X86)\MICROSOFT\DATA CLASSIFICATION TOOLKIT.
Figure 1 – Data Classification Toolkit configuration packages
Copy the four packages to a convenient location on the local disk, such as C:\PACKAGES. You might want to open one of the files using Notepad to get a feel for the syntax. Now import one of the packages to FSRM so that you can edit it using the GUI. In the tool kit console window, run IMPORT-FILECLASSIFICATIONPACKAGE and then specify the full path to one of the copied .xml packages. You’ll also need to specify the SCOPE parameter, which determines whether rules and tasks in the package are applied to all shares (ALLSHARES) or explicitly as defined in the package file (EXPLICIT).
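The copy step can also be scripted. The sketch below assumes the default installation directory mentioned earlier and copies every .xml file it finds there, which may include more than the four packages if the directory holds other XML files. It then prints the root element of each copied file as a quick way to get a feel for the contents.

```powershell
# Default tool kit directory (from the article) and a working folder
$toolkitDir = "C:\Program Files (x86)\Microsoft\Data Classification Toolkit"
$packageDir = "C:\Packages"

# Create the working folder if it does not exist yet
if (-not (Test-Path $packageDir)) {
    New-Item -ItemType Directory -Path $packageDir | Out-Null
}

# Copy all .xml files from the tool kit directory
Copy-Item -Path (Join-Path $toolkitDir "*.xml") -Destination $packageDir

# Load each copied file as XML and show its root element name
Get-ChildItem -Path $packageDir -Filter *.xml | ForEach-Object {
    $xml = [xml](Get-Content -Path $_.FullName)
    "{0}: root element <{1}>" -f $_.Name, $xml.DocumentElement.Name
}
```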
Before running the command, make sure you have at least one file share on the target server and create an additional share for a folder called C:\EXPIRED (as it’s referred to in the package examples) so tasks can automatically move files to C:\EXPIRED based on certain criteria. Alternatively, you can edit the copied package files to use a different file share for expired files. Here’s an example of an import command:
IMPORT-FILECLASSIFICATIONPACKAGE -PATH "C:\PACKAGES\PCI-DSS CLASSIFICATION PACKAGE EXAMPLE.XML" -SCOPE ALLSHARES
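The preparation described above can be scripted ahead of the import. This is a sketch under a few assumptions: the share name Expired is a guess (the article only names the folder C:\EXPIRED), the share is created with default permissions, and the script is run from the tool kit console so that the import commandlet is available.

```powershell
$expiredDir  = "C:\Expired"
$packagePath = "C:\Packages\PCI-DSS Classification Package Example.xml"

# Create the folder that the example tasks move expired files into
if (-not (Test-Path $expiredDir)) {
    New-Item -ItemType Directory -Path $expiredDir | Out-Null
}

# Share the folder. The share name "Expired" is an assumption;
# net share reports an error if a share with that name already exists.
net share "Expired=$expiredDir"

# Import the package only if the copied file is where we expect it
if (Test-Path $packagePath) {
    Import-FileClassificationPackage -Path $packagePath -Scope AllShares
} else {
    Write-Warning "Package not found: $packagePath"
}
```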
Repeat the process and import PCI-DSS CLASSIFICATION TASKS EXAMPLE.XML. Now that the rules and tasks have been imported, open FSRM from Administrative Tools on the Start menu. In the left pane, expand Classification Management and click on Classification Properties. You’ll see the classification properties imported from the package example — and similarly, rules, tasks and reports if you click Classification Rules, File Management Tasks and Storage Reports Management, respectively.
Figure 2 – Classification Properties imported from the package file example
Once you’ve made any modifications to the package configuration using FSRM, you can then export the results, either as a backup or to use on other file servers. A sample export command might look like this:
EXPORT-FILECLASSIFICATIONPACKAGE -PATH C:\PACKAGES\OUTPUT.XML
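If you export regularly as a backup, a timestamped file name keeps earlier exports from being overwritten. The sketch below is one way to do that; the FSRM-Export prefix and the C:\Packages folder are illustrative choices, and the commandlet must be run from the tool kit console.

```powershell
# Build a file name such as FSRM-Export-20120131-093000.xml
$stamp      = Get-Date -Format "yyyyMMdd-HHmmss"
$backupPath = Join-Path "C:\Packages" ("FSRM-Export-{0}.xml" -f $stamp)

# Export the current FSRM classification configuration to that file
Export-FileClassificationPackage -Path $backupPath

Write-Host "Configuration exported to $backupPath"
```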
Reporting
FSRM is the easiest way to manage and run reports for the local server. To run a report, select Storage Reports Management in the left pane, right-click the desired report in the central pane and then select Run Report Task Now from the menu.
The tool kit is more flexible: it can store reports from multiple remote file servers in a database and output the data using pivot tables in Excel. There are four steps required to set up Excel reporting with the tool kit:
- Create a new reporting database in SQL Server.
- Publish data from previously run reports to the database.
- Create an Excel template to extract information from the database.
- Open the template in Excel and view the reports.
Start by creating a new database. SQL Server must already be installed somewhere on your local network. In the command below, replace [SERVERNAME] with the name of the server where SQL Server is installed and [INSTANCE] with the name of the SQL Server instance, which by default is SQLEXPRESS when using SQL Server 2008 R2 Express edition.
NEW-FILECLASSIFICATIONREPORTDATABASE -CONNECTIONSTRING "DATA SOURCE=[SERVERNAME]\[INSTANCE];INITIAL CATALOG=;INTEGRATED SECURITY=TRUE" -DATABASENAME REPORTINGDATABASE
Now that the database has been created, you need to publish classification report data to the database using the PUBLISH-FILECLASSIFICATIONREPORTDATA commandlet. This requires that at least one report has been run from FSRM:
PUBLISH-FILECLASSIFICATIONREPORTDATA -CONNECTIONSTRING "DATA SOURCE=[SERVERNAME]\[INSTANCE];INITIAL CATALOG=;INTEGRATED SECURITY=TRUE" -DATABASENAME REPORTINGDATABASE
Next, create an Excel template that contains information about the data source. Note that the connection string must contain the PROVIDER parameter (SQLOLEDB.1) and the INITIAL CATALOG parameter, which should be set to the name of the database created in the first step:
NEW-FILECLASSIFICATIONREPORTTEMPLATE -CONNECTIONSTRING "PROVIDER=SQLOLEDB.1;DATA SOURCE=[SERVERNAME]\[INSTANCE];INITIAL CATALOG=REPORTINGDATABASE;INTEGRATED SECURITY=TRUE" -PATH "C:\TEMPLATE.XLSX"
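Because the three reporting commands use near-identical connection strings, a small helper can reduce typing mistakes. The function below is a hypothetical convenience, not part of the tool kit; it only assembles the strings shown above from a server name, an instance name and an optional database name. SQL01 is a placeholder server name.

```powershell
# Hypothetical helper: builds the connection strings used in this article
function New-ToolkitConnectionString {
    param(
        [string]$Server,
        [string]$Instance,
        [string]$Database = "",   # left empty for the database commands
        [switch]$ForExcel         # adds the OLE DB provider for the template
    )
    $parts = @()
    if ($ForExcel) { $parts += "Provider=SQLOLEDB.1" }
    $parts += "Data Source=$Server\$Instance"
    $parts += "Initial Catalog=$Database"
    $parts += "Integrated Security=True"
    return ($parts -join ";")
}

# String for creating the database and publishing report data
$dbConn = New-ToolkitConnectionString -Server "SQL01" -Instance "SQLEXPRESS"
$dbConn
# Data Source=SQL01\SQLEXPRESS;Initial Catalog=;Integrated Security=True

# String for the Excel template, naming the reporting database
$xlConn = New-ToolkitConnectionString -Server "SQL01" -Instance "SQLEXPRESS" -Database "ReportingDatabase" -ForExcel
$xlConn
# Provider=SQLOLEDB.1;Data Source=SQL01\SQLEXPRESS;Initial Catalog=ReportingDatabase;Integrated Security=True
```

The resulting values can then be passed to the -CONNECTIONSTRING parameter of the commands above in place of the literal strings.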
Figure 3 – User-friendly reporting with the Data Classification Toolkit in Excel
Now open TEMPLATE.XLSX to work with the data in Excel. You’ll notice the Excel reporting is more flexible than the reporting engine in FSRM. For more information on the PowerShell commandlets included in the tool kit, see the reference table in the Data Classification Toolkit User Guide.