File Classification Operations

Azure Data Expert provides file classification capabilities that classify and categorize files stored in one or more storage containers by file type and show a logical hierarchy of file types and file categories sorted by the amount of used disk space. The user is provided with a number of file classification plug-ins capable of recognizing 3,000 types of files and performing different types of file classification operations.

Azure Data Expert File Classification

In order to perform a simple file classification operation, select one or more storage containers in the Azure Data Expert navigator and press the 'Classify' button located on the main toolbar. Azure Data Expert will classify and categorize data blobs stored in the selected storage containers and display a file classification results dialog.

Azure Data Expert File Classification Results

The file classification results dialog shows categories of data blobs sorted by used disk space. It allows one to categorize and filter the file classification results by file extension, file type, size and last modification date, display various types of pie charts, download or delete specific categories or types of data blobs, and save file classification reports in a number of standard formats including HTML, PDF, Excel, text, CSV and XML.

Categorize and Filter File Classification Results

The top pane of the file classification results dialog shows categories of files and data blobs sorted by the amount of used disk space. In order to open a file category, double-click the required item in the results view. The bottom pane shows second-level file categories according to the currently selected file categorization mode.
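The two-level view described above can be approximated with a short Python sketch. It is only an illustration of the idea, not Azure Data Expert's implementation: the extension-to-category table, the 'Video Files' and 'Other Files' category names and the classify function are hypothetical, and real plug-ins recognize far more file types.

```python
import os
from collections import defaultdict

# Hypothetical, heavily shortened extension table (an assumption for this sketch)
CATEGORY_BY_EXTENSION = {
    ".jpg": "Images, Pictures and Graphic Files",
    ".png": "Images, Pictures and Graphic Files",
    ".mp4": "Video Files",
}

def classify(blobs):
    """Classify (name, size_in_bytes) pairs.

    Returns rows of (category, total_bytes, file_count, second_level), where
    second_level lists (extension, bytes, files); both levels are sorted by
    used space, largest first.
    """
    tree = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for name, size in blobs:
        ext = os.path.splitext(name)[1].lower() or "(none)"
        category = CATEGORY_BY_EXTENSION.get(ext, "Other Files")
        entry = tree[category][ext]
        entry[0] += size   # used space in bytes
        entry[1] += 1      # number of files
    result = []
    for category, by_ext in tree.items():
        second_level = sorted(
            ((ext, used, files) for ext, (used, files) in by_ext.items()),
            key=lambda row: row[1],
            reverse=True,
        )
        total_bytes = sum(row[1] for row in second_level)
        total_files = sum(row[2] for row in second_level)
        result.append((category, total_bytes, total_files, second_level))
    result.sort(key=lambda row: row[1], reverse=True)
    return result

for category, used, files, second_level in classify(
    [("a.jpg", 100), ("b.PNG", 300), ("c.mp4", 1000), ("notes", 5)]
):
    print(category, used, files, second_level)
```

Each top-level row corresponds to an item in the top pane, and its second_level list corresponds to what the bottom pane would show when the categorization mode is set to the file extension.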

Azure Data Expert Categorize File Classification Results

The user is provided with the ability to categorize and filter the file classification results by the file extension, file size, the last modification time and the last modification date. In order to change the current file categorization mode, click on the file categories combo box and select an appropriate file categorization mode.

Azure Data Expert Filter File Classification Results

For example, in order to categorize the file classification results by the last modification date, click on the file categorization combo box and select the 'Categorize By Modification Date' item. Now, in order to filter the file classification results by a specific date, double click on the required date item and the file classification results dialog will display file classification results matching the selected filter. In addition, all types of disk space analysis reports and pie charts will be generated according to the currently selected file categorization mode.
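As a rough model of what categorizing and then filtering by the modification date involves, consider the following Python sketch. The function names and the (name, size, modified) tuple layout are assumptions made for illustration and do not reflect the product's internals.

```python
from collections import defaultdict
from datetime import date, datetime

def categorize_by_modification_date(blobs):
    """Group (name, size_in_bytes, modified) tuples by calendar date."""
    groups = defaultdict(list)
    for name, size, modified in blobs:
        groups[modified.date()].append((name, size))
    return dict(groups)

def used_space_per_date(groups):
    """Return (date, total_bytes, file_count) rows, largest first."""
    rows = [
        (day, sum(size for _, size in files), len(files))
        for day, files in groups.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)

def filter_by_date(groups, day):
    """Return only the blobs last modified on the given date."""
    return groups.get(day, [])

blobs = [
    ("a.log", 200, datetime(2024, 5, 1, 9, 30)),
    ("b.log", 300, datetime(2024, 5, 1, 17, 5)),
    ("c.jpg", 900, datetime(2024, 5, 2, 8, 0)),
]
groups = categorize_by_modification_date(blobs)
print(used_space_per_date(groups))
print(filter_by_date(groups, date(2024, 5, 1)))
```

The first call mirrors the categorized view with one row per date, and the second mirrors double-clicking a date item to see only the matching results.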

Save File Classification Reports

Azure Data Expert allows one to save file classification reports into a number of standard formats including HTML, PDF, Excel, XML, text and CSV. In the simplest case, perform a file classification operation and press the 'Save' button located on the file classification results dialog. On the save report dialog, select an appropriate report format, enter a report file name and press the 'Save' button.

Azure Data Expert Save File Classification Report

For the HTML, PDF, Excel, text, CSV and XML report formats, the user is provided with the ability to save a short summary report or a longer detailed report, which may be very long for large storage accounts containing millions of files. By default, Azure Data Expert will save a short, summary file classification report in the HTML report format, which will include a list of top-level file categories according to the selected file classification plug-in and a list of tables showing the disk space usage and the number of files per file extension, file type, last modification date, top-level directory name, etc.

Azure Data Expert File Classification HTML Report

A detailed file classification report includes a list of file categories according to the currently selected second-level file categorization mode followed by a hierarchy of file groups and file classes sorted by the amount of used disk space. In order to configure the number of hierarchy levels exported to the detailed file classification report, press the 'Advanced Options' button located on the 'Save Report' dialog and customize the report for your specific needs.

Microsoft Excel Reports

Sometimes, it may be required to perform additional analysis of file classification results using external tools such as Microsoft Excel. In order to export file classification results to the Excel report format, perform a file classification operation, press the 'Save' button located on the file classification results dialog, and select either the 'Excel Summary' report format for a short summary report or the 'Excel Report' format for a detailed file classification report.

Azure Data Expert Save File Classification Excel Report

A summary Excel report includes a list of top-level file categories according to the selected file classification plug-in and a number of tables with pie charts showing the used disk space and the number of files per file extension, last modification date, top-level directory name, etc.

Azure Data Expert File Classification Excel Report

A detailed Excel report includes a list of file categories according to the currently selected file categorization mode and a hierarchy of file groups and file classes sorted by the amount of used disk space, which may be very long for large storage accounts containing millions of files. In order to control how many hierarchy levels and how many files per level are exported in the detailed report, press the 'Advanced Options' button located on the 'Save Report' dialog and customize the file classification report according to your specific needs.

Graphical PDF Reports

One of the most useful ways to export file classification results is to use the PDF summary or the PDF report formats. Both of these report formats include various types of graphical pie charts showing disk space usage and the number of files per file extension, file category, last modification date, top-level directory name, etc. In order to save file classification results to a PDF report file, press the 'Save' button located on the file classification results dialog and select the 'PDF Summary' report format for a short, summary report or the 'PDF Report' format for a detailed file classification report.

Azure Data Expert Save File Classification PDF Report

A summary PDF report includes a list of top-level categories of files according to the selected file classification plug-in, sorted by the amount of used disk space, followed by a number of pie charts showing the disk space usage and the number of files per file extension, file type, last modification date, top-level directory name, etc. A detailed PDF report includes a hierarchy of file groups and file classes sorted according to the used disk space, which may be very long for large storage accounts containing millions of files.

Azure Data Expert File Classification PDF Report

In addition to the hierarchy of file types sorted by the used disk space, detailed PDF reports include pie charts showing the disk space usage per file category and the number of files per file category according to the currently selected file categorization mode. For example, if the second-level file categories mode is set to categorize file classification results by the file extension, the PDF report will display pie charts showing the used disk space and the number of files per file extension.

Classify Specific Types of Files

Azure DEX Pro provides the ability to classify and categorize specific types of files or groups of files by the file type, category, size, name, extension, last modification date, etc. The user can configure a number of file matching rules for each file classification operation in order to classify only the required files and precisely focus on the required data. For example, the user can classify all types of images larger than 10 MB that were modified during the last month.

Azure Data Expert File Classification Rules

In order to configure one or more file matching rules for a file classification operation, open the classification command dialog, press the 'Options' button, select the 'Rules' tab, press the 'Add' button and select the required rule type. During the file classification process, Azure Data Expert will scan storage containers and evaluate the processed data blobs using the specified file matching rules. Data blobs matching the specified rules will be displayed in the file classification results and data blobs not matching the specified rules will be excluded from the file classification results.

Azure Data Expert Negative File Matching Rules

In addition to positive file matching rules allowing one to include specific files in the file classification process, Azure Data Expert provides negative file matching rules allowing one to exclude specific files from the file classification process. For example, in order to exclude all types of images from the file classification process, add a file category rule, select the 'Images, Pictures and Graphic Files' file category and select the 'Not Categorized' rule operator. During the file classification process, Azure Data Expert will evaluate the processed files and skip all types of image files.
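The interplay of positive and negative file matching rules can be modeled with a small Python sketch. This is an illustration under stated assumptions rather than the product's rule engine: the Blob class, the rule helper functions and the 'Video Files' category are hypothetical, 'the last month' is approximated as 30 days, and combining multiple rules with a logical AND is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

IMAGES = "Images, Pictures and Graphic Files"  # category name from the text above
MB = 1024 * 1024

@dataclass
class Blob:  # hypothetical record describing one data blob
    name: str
    size: int           # size in bytes
    modified: datetime  # last modification time
    category: str       # category assigned by a classification plug-in

def category_rule(category, negate=False):
    """Positive rule by default; negate=True models the 'Not Categorized' operator."""
    def rule(blob):
        matched = blob.category == category
        return not matched if negate else matched
    return rule

def larger_than_rule(min_bytes):
    return lambda blob: blob.size > min_bytes

def modified_within_rule(days, now):
    cutoff = now - timedelta(days=days)
    return lambda blob: blob.modified >= cutoff

def apply_rules(blobs, rules):
    """Keep only blobs that satisfy every rule (AND semantics are assumed)."""
    return [blob for blob in blobs if all(rule(blob) for rule in rules)]

now = datetime(2024, 6, 30)
blobs = [
    Blob("photo.jpg", 25 * MB, datetime(2024, 6, 20), IMAGES),
    Blob("icon.png", 4 * 1024, datetime(2024, 6, 25), IMAGES),
    Blob("scan.tif", 40 * MB, datetime(2023, 1, 5), IMAGES),
    Blob("clip.mp4", 300 * MB, datetime(2024, 6, 28), "Video Files"),
]

# Positive rules: images larger than 10 MB modified within the last 30 days
large_recent_images = apply_rules(
    blobs,
    [category_rule(IMAGES), larger_than_rule(10 * MB), modified_within_rule(30, now)],
)
print([blob.name for blob in large_recent_images])  # ['photo.jpg']

# Negative rule: exclude all types of images
non_images = apply_rules(blobs, [category_rule(IMAGES, negate=True)])
print([blob.name for blob in non_images])  # ['clip.mp4']
```

Modeling each rule as a predicate keeps positive and negative rules uniform: a negative rule is simply the inverted result of the corresponding positive rule.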

Exclude Directories From the File Classification Process

Sometimes, it may be required to exclude one or more subdirectories from the file classification process. For example, if you need to classify files in a storage container excluding one or two special directories, you may specify the storage container to be classified and add the directories that should be skipped to the exclude list.

Azure Data Expert Classification Exclude Directories

In order to add one or more directories to the exclude list, open the file classification command dialog, press the 'Options' button, select the 'Exclude' tab and press the 'Add' button. All files and subdirectories located in the specified exclude directory will be excluded from the file classification process. In addition, advanced users are provided with a number of exclude directories macro commands allowing one to exclude multiple directories using a single macro command.

Azure Data Expert provides the following exclude directories macro commands:

  • $BEGINS <Text String> - this macro command excludes all directories beginning with the specified text string.
  • $CONTAINS <Text String> - this macro command excludes all directories containing the specified text string.
  • $ENDS <Text String> - this macro command excludes all directories ending with the specified text string.
  • $REGEX <Regular Expression> - this macro command excludes all directories matching the specified regular expression.

For example, the exclude macro command '$CONTAINS Temporary Files' will exclude all directories with 'Temporary Files' at any place in the full directory path and the exclude macro command '$REGEX \.(TMP|TEMP)$' will exclude all directories ending with '.TMP' or '.TEMP'.
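The matching behavior of these macro commands can be approximated in Python as shown below. This is a minimal sketch, not the product's matching code: the is_excluded and parse_exclude_entry functions are hypothetical, matching is assumed to be case-sensitive and applied to the full directory path with '/' as the separator, and Python's regular expression syntax may differ from the one Azure Data Expert uses.

```python
import re

MACROS = ("$BEGINS", "$CONTAINS", "$ENDS", "$REGEX")

def parse_exclude_entry(entry):
    """Split an exclude-list entry into (command, argument)."""
    command, _, argument = entry.strip().partition(" ")
    return command.upper(), argument.strip()

def is_excluded(directory, entry):
    """Return True if the directory path matches the exclude-list entry."""
    command, argument = parse_exclude_entry(entry)
    if command in MACROS:
        if not argument:
            raise ValueError(f"macro command without an argument: {entry!r}")
        if command == "$BEGINS":
            return directory.startswith(argument)
        if command == "$CONTAINS":
            return argument in directory
        if command == "$ENDS":
            return directory.endswith(argument)
        return re.search(argument, directory) is not None  # $REGEX
    # Plain entry: exclude the directory itself and everything below it
    plain = entry.strip().rstrip("/")
    return directory == plain or directory.startswith(plain + "/")

print(is_excluded("data/Temporary Files/cache", "$CONTAINS Temporary Files"))  # True
print(is_excluded("backup/old.TMP", r"$REGEX \.(TMP|TEMP)$"))                  # True
print(is_excluded("backup/old.TMP/sub", r"$REGEX \.(TMP|TEMP)$"))              # False
print(is_excluded("logs/2024", "logs"))                                        # True
```

Note that with the '$' anchor the regular expression matches only paths that end with '.TMP' or '.TEMP', so a subdirectory below such a directory is not matched by the expression itself.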