Duplicate Files Search Operations

Azure Data Expert can search for duplicate files in one or more Microsoft Azure cloud storage accounts. To perform a simple duplicate files search operation, select one or more storage containers in the Azure Data Expert navigator and press the 'Duplicates' button located on the main toolbar.

Azure Data Expert Duplicate Files Search

Azure Data Expert will scan the selected storage containers, find duplicate files and display the duplicate files search results dialog showing the list of duplicate file sets sorted by the amount of duplicate disk space. The duplicate files search results dialog provides the ability to categorize and filter detected duplicate files by the file extension, file type, size and last modification date.
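The idea of a duplicate file set and of sorting by duplicate disk space can be illustrated with a minimal Python sketch. It is not Azure Data Expert's actual algorithm, which is not documented here. The sketch assumes that blobs are available as an in-memory dictionary of names and contents, that duplicates are files with identical size and SHA-256 digest, and that the duplicate disk space of a set is the file size multiplied by the number of redundant copies. The function name find_duplicate_sets and the sample data are hypothetical.

```python
import hashlib
from collections import defaultdict


def find_duplicate_sets(blobs):
    """Group blobs with identical content and sort the groups by wasted space.

    blobs: dict mapping blob name -> bytes (a hypothetical in-memory stand-in
    for the contents of the selected storage containers).
    """
    by_content = defaultdict(list)
    for name, data in blobs.items():
        # Identical size and digest are treated as identical content.
        key = (len(data), hashlib.sha256(data).hexdigest())
        by_content[key].append(name)

    sets = []
    for (size, _digest), names in by_content.items():
        if len(names) > 1:
            # One copy is needed; every additional copy is duplicate space.
            wasted = size * (len(names) - 1)
            sets.append({"files": sorted(names), "size": size, "wasted": wasted})

    # Largest amount of duplicate disk space first.
    sets.sort(key=lambda s: s["wasted"], reverse=True)
    return sets


blobs = {
    "a/report.pdf": b"x" * 100,
    "b/report-copy.pdf": b"x" * 100,
    "c/report-old.pdf": b"x" * 100,
    "img/logo.png": b"y" * 500,
    "img/logo2.png": b"y" * 500,
    "notes.txt": b"unique",
}
result = find_duplicate_sets(blobs)
for dup_set in result:
    print(dup_set["wasted"], dup_set["files"])
```

In this sample the two logo files waste more space than the three report files, so the logo set is listed first even though it contains fewer copies.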

Azure Data Expert Duplicate Files Search Results

In addition, the results dialog allows one to display various types of pie charts, download or delete specific duplicate files and save duplicate files search reports to a number of standard formats including HTML, PDF, Excel, text, CSV and XML. The user is provided with the ability to browse the list of duplicate file sets, perform various types of file management operations on duplicate files and search specific duplicate files by the blob name, file type, file size and the last modification date.

Categorize and Filter Duplicate Files

The duplicate files search results dialog shows in the top results pane the list of duplicate file sets sorted by the amount of duplicate disk space. In order to open a duplicate files set, double click on the required item in the results view. The bottom pane shows categories of duplicate files according to the currently selected file categorization mode.

Azure Data Expert Categorize Analysis Results

The user is provided with the ability to categorize and filter duplicate files by the file extension, file type, file size, the last modification time and the last modification date. In order to change the current file categorization mode, click on the file categories combo box and select an appropriate file categorization mode.

Azure Data Expert Filter Duplicate Files

For example, in order to categorize duplicate files by the last modification date, click on the file categorization combo box and select the 'Categorize By Modification Date' item. Now, in order to filter duplicate files by a specific date, double click on the required date item and the duplicate files search results dialog will display duplicate files matching the selected filter. In addition, all types of duplicate files search reports and pie charts will be generated according to the currently selected file categorization mode.
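Categorizing by modification date and then filtering on one date amounts to a group-by followed by a lookup. The Python sketch below shows this under stated assumptions: the record layout (name, size, modified) and the function names categorize_by_date and category_totals are hypothetical and do not describe the product's internals.

```python
from collections import defaultdict
from datetime import date


def categorize_by_date(files):
    """Group duplicate file records by their last modification date."""
    categories = defaultdict(list)
    for record in files:
        categories[record["modified"]].append(record)
    return dict(categories)


def category_totals(categories):
    """Return {date: (file_count, total_bytes)} for each category."""
    return {
        day: (len(records), sum(r["size"] for r in records))
        for day, records in categories.items()
    }


# Hypothetical duplicate file records.
duplicates = [
    {"name": "logs/a.log", "size": 100, "modified": date(2024, 1, 5)},
    {"name": "logs/b.log", "size": 100, "modified": date(2024, 1, 5)},
    {"name": "img/c.png", "size": 500, "modified": date(2024, 2, 1)},
]

categories = categorize_by_date(duplicates)
totals = category_totals(categories)

# Selecting one category plays the role of double clicking a date item.
selected = categories.get(date(2024, 1, 5), [])
```

The same grouping function could be keyed on extension or size range instead of date, which mirrors switching the categorization mode in the combo box.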

Save Duplicate Files Search Reports

Azure Data Expert allows one to save duplicate files search reports into a number of standard formats including HTML, PDF, Excel, XML, text and CSV. In the simplest case, perform a duplicate files search operation and press the 'Save' button located on the duplicate files search results dialog. On the save report dialog, select an appropriate report format, enter a report file name and press the 'Save' button.

Azure Data Expert Save Duplicate Files Search Reports

For the HTML, PDF, Excel, text, CSV and XML report formats, the user can save either a short summary report or a longer detailed report, which may be very long for large storage accounts containing many thousands of duplicate files. By default, Azure Data Expert saves a short summary report in the HTML report format. The summary includes a list of the top 20 duplicate file sets sorted by the amount of duplicate disk space, followed by tables showing the amount of duplicate disk space and the number of duplicate files per file extension, file type, top-level directory, last modification date, etc.
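The content of a summary report, a top-N list plus per-category tables, can be sketched in Python as follows. This is only an illustration of the data involved, not the product's report generator. The duplicate set layout (files, size, wasted) and the function name summarize are hypothetical, and the sketch assumes that the first file in each set is the copy that is kept.

```python
import posixpath
from collections import defaultdict


def summarize(dup_sets, top_n=20):
    """Return (top duplicate sets, per-extension totals) for a summary report."""
    top = sorted(dup_sets, key=lambda s: s["wasted"], reverse=True)[:top_n]

    per_ext = defaultdict(lambda: {"files": 0, "wasted": 0})
    for dup_set in dup_sets:
        # Assumption: the first file is the original; the rest are redundant.
        for name in dup_set["files"][1:]:
            ext = posixpath.splitext(name)[1].lower() or "(none)"
            per_ext[ext]["files"] += 1
            per_ext[ext]["wasted"] += dup_set["size"]
    return top, dict(per_ext)


# Hypothetical search results.
dup_sets = [
    {"files": ["a.pdf", "b.pdf", "c.PDF"], "size": 100, "wasted": 200},
    {"files": ["x.png", "y.png"], "size": 500, "wasted": 500},
]
top, per_ext = summarize(dup_sets, top_n=1)
```

Extensions are lower-cased so that 'c.PDF' and 'b.pdf' land in the same row of the table.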

Azure Data Expert Duplicate Files Search HTML Report

A detailed duplicate files search report includes a list of file categories according to the currently selected second-level file categorization mode followed by a full list of duplicate file sets sorted by the amount of the duplicate disk space. In order to configure the number of duplicate file sets exported to the detailed duplicate files search report, press the 'Advanced Options' button located on the 'Save Report' dialog and customize the report for your specific needs.

Microsoft Excel Reports

Sometimes, it may be required to perform additional analysis of duplicate files search results using external tools such as Microsoft Excel. In order to export duplicate files search results to the Excel report format, perform a duplicate files search operation, press the 'Save' button located on the duplicate files search results dialog, select the 'Excel Summary' report format for a short summary report or the 'Excel Report' format for a detailed duplicate files search report.

Azure Data Expert Save Duplicate Files Search Excel Reports

A summary Excel report includes a list of the top 20 duplicate file sets sorted by the amount of duplicate disk space and a number of tables with pie charts showing the amount of duplicate disk space and the number of duplicate files per file extension, file category, last modification time, top-level directory name, etc.

Azure Data Expert Duplicate Files Search Excel Report

A detailed Excel report includes a full list of duplicate file sets sorted by the amount of the duplicate disk space followed by lists of duplicate files in each set, which may be very long for large storage accounts containing many thousands of duplicate files. In order to control how many duplicate file sets are exported in the detailed report, press the 'Advanced Options' button located on the 'Save Report' dialog and customize the duplicate files search report for your specific needs.
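For analysis in external tools, a detailed report is essentially one row per duplicate file, grouped by set and optionally capped at a maximum number of sets. The Python sketch below writes such rows as CSV, which spreadsheet tools can open. The column names, the max_sets parameter and the function name write_detailed_csv are hypothetical and do not reflect the layout of the reports Azure Data Expert actually produces.

```python
import csv
import io


def write_detailed_csv(dup_sets, out, max_sets=None):
    """Write one row per duplicate file, largest duplicate sets first.

    Returns the number of duplicate sets written.
    """
    ordered = sorted(dup_sets, key=lambda s: s["wasted"], reverse=True)
    if max_sets is not None:
        # Cap the report size, similar in spirit to limiting exported sets.
        ordered = ordered[:max_sets]

    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["set", "file", "size_bytes", "set_wasted_bytes"])
    for index, dup_set in enumerate(ordered, start=1):
        for name in dup_set["files"]:
            writer.writerow([index, name, dup_set["size"], dup_set["wasted"]])
    return len(ordered)


# Hypothetical search results.
dup_sets = [
    {"files": ["a.pdf", "b.pdf", "c.pdf"], "size": 100, "wasted": 200},
    {"files": ["x.png", "y.png"], "size": 500, "wasted": 500},
]
buffer = io.StringIO()
written = write_detailed_csv(dup_sets, buffer, max_sets=1)
lines = buffer.getvalue().splitlines()
```

Writing to a file instead of a StringIO buffer only requires passing an object opened with open(path, "w", newline="").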

Graphical PDF Reports

One of the most useful ways to export duplicate files search results is to use the PDF summary or the PDF report formats. Both of these report formats include various types of graphical pie charts showing the amount of duplicate disk space and the number of duplicate files per file extension, file category, last modification time, top-level directory name, etc. In order to save duplicate files search results to a PDF report file, press the 'Save' button located on the duplicate files search results dialog and select the 'PDF Summary' report format for a short, summary report or the 'PDF Report' format for a detailed duplicate files search report.

Azure Data Expert Save Duplicate Files Search PDF Reports

A summary PDF report includes a list of the top 20 duplicate file sets sorted by the amount of duplicate disk space followed by a number of pie charts showing the amount of duplicate disk space and the number of duplicate files per file extension, file category, last modification date, top-level directory name, etc. A detailed PDF report includes a list of duplicate file sets sorted by the amount of duplicate disk space followed by lists of duplicate files in each set, which may be very long for large storage accounts containing many thousands of duplicate files.

Azure Data Expert Duplicate Files Search PDF Report

In addition to the list of duplicate file sets sorted by the amount of duplicate disk space, detailed PDF reports include pie charts showing the duplicate disk space per file category and the number of duplicate files per file category according to the currently selected file categorization mode. For example, if the second-level file categories mode is set to categorize duplicate files search results by the file extension, the PDF report will display pie charts showing the amount of duplicate disk space and the number of duplicates per file extension.

Delete Duplicate Files

In order to reclaim the wasted disk space, Azure Data Expert provides the ability to delete duplicate files. In the simplest case, double click on a duplicate files set in the duplicate files search results dialog, select one or more duplicate files, press the right mouse button and select the 'Delete Files' menu item. The selected files will be permanently deleted from the Azure storage container.

Azure Data Expert Duplicate Files Set

Before deleting any duplicate files, make sure that all the references to the duplicate files are replaced with references to the original file in the same duplicate files set. In order to obtain a URL pointing to the original file in the duplicate files set, select the original file, which is marked with the 'Lock' icon, press the right mouse button and select the 'Copy URLs to Clipboard' menu item.

Azure Data Expert Delete Duplicate Files

In order to delete all duplicate files in a number of duplicate file sets, select all the required duplicate file sets in the duplicate files search results dialog, press the right mouse button and select the 'Delete Duplicate Files' menu item.
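The rule of keeping one original per set and deleting the remaining copies can be sketched in Python. The sketch is deliberately decoupled from any storage SDK: the delete_blob callback, the dry_run flag and the function names are hypothetical, and the choice of the first file as the original is an assumption rather than the way Azure Data Expert selects the file marked with the 'Lock' icon.

```python
def plan_deletions(dup_set, keep=None):
    """Return (original, files_to_delete) for one duplicate set."""
    files = dup_set["files"]
    # Assumption: the first file is the original unless told otherwise.
    original = files[0] if keep is None else keep
    if original not in files:
        raise ValueError(f"{original!r} is not part of this duplicate set")
    return original, [name for name in files if name != original]


def delete_duplicates(dup_sets, delete_blob, dry_run=True):
    """Delete every redundant copy; with dry_run, only report the targets."""
    targets = []
    for dup_set in dup_sets:
        _original, extras = plan_deletions(dup_set)
        for name in extras:
            if not dry_run:
                delete_blob(name)  # Permanent in a real storage account.
            targets.append(name)
    return targets


# Hypothetical search results and a list that records "deleted" names.
dup_sets = [
    {"files": ["a.pdf", "b.pdf", "c.pdf"]},
    {"files": ["x.png", "y.png"]},
]
preview_log = []
preview = delete_duplicates(dup_sets, preview_log.append)  # dry run

removed = []
deleted = delete_duplicates(dup_sets, removed.append, dry_run=False)
```

Defaulting to a dry run gives a chance to review the target list, and to update references to the original file, before anything is removed.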

Search Specific Types of Duplicate Files

Azure DEX Pro provides the ability to search specific types of duplicate files by the file type, category, size, name, extension, last modification date, etc. The user is provided with the ability to configure a number of file matching rules for each duplicate files search operation allowing one to search the required files and precisely focus on the required data. For example, the user can search all types of duplicate images with a file size of more than 10 MB that were modified during the last month.

Azure Data Expert Duplicate Files Search Rules

In order to configure one or more file matching rules for a duplicate files search operation, open the duplicate files search command dialog, press the 'Options' button, select the 'Rules' tab, press the 'Add' button and select the required rule type. During the search process, Azure Data Expert will scan storage containers and evaluate all data blobs using the specified file matching rules. Duplicate files matching the specified rules will be displayed in the duplicate files search results and data blobs not matching the specified rules will be excluded from the duplicate files search results.

Azure Data Expert Negative File Matching Rules

In addition to positive file matching rules allowing one to include specific files in the duplicate files search process, Azure Data Expert provides negative file matching rules allowing one to exclude specific files from the duplicate files search process. For example, in order to exclude all types of images from the duplicate files search process, add a file category rule, select the 'Images, Pictures and Graphic Files' file category and select the 'Not Categorized' rule operator. During the duplicate files search process, Azure Data Expert will evaluate the processed files and skip all types of image files.
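Positive and negative file matching rules can be modeled as predicates with an optional negation flag. The Python sketch below is one possible model, not the product's rule engine: the Rule class, the helper names, the list of image extensions and the assumption that all rules must match (logical AND) are hypothetical.

```python
import posixpath
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable

# Assumption: a small, illustrative set of image extensions.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tif", ".tiff"}


@dataclass(frozen=True)
class Rule:
    predicate: Callable[[dict], bool]
    negate: bool = False  # True turns the rule into a negative rule.

    def matches(self, blob):
        result = self.predicate(blob)
        return (not result) if self.negate else result


def is_image(blob):
    return posixpath.splitext(blob["name"])[1].lower() in IMAGE_EXTENSIONS


def larger_than(limit_bytes):
    return lambda blob: blob["size"] > limit_bytes


def modified_since(cutoff):
    return lambda blob: blob["modified"] >= cutoff


def apply_rules(blobs, rules):
    """Keep blobs that satisfy every rule (assumed AND semantics)."""
    return [blob for blob in blobs if all(rule.matches(blob) for rule in rules)]


MB = 1024 * 1024
now = datetime(2024, 3, 1)  # Fixed "current time" to keep the example stable.
blobs = [
    {"name": "photos/big.JPG", "size": 12 * MB, "modified": datetime(2024, 2, 20)},
    {"name": "photos/small.png", "size": 1024, "modified": datetime(2024, 2, 25)},
    {"name": "docs/manual.pdf", "size": 20 * MB, "modified": datetime(2023, 6, 1)},
]

# Positive rules: images larger than 10 MB modified in the last 30 days.
recent_large_images = apply_rules(blobs, [
    Rule(is_image),
    Rule(larger_than(10 * MB)),
    Rule(modified_since(now - timedelta(days=30))),
])

# Negative rule: skip all image files.
non_images = apply_rules(blobs, [Rule(is_image, negate=True)])
```

Filtering before the duplicate comparison keeps excluded blobs out of the results entirely, which matches the behavior described above.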

Exclude Directories From the Analysis Process

Sometimes, it may be required to exclude one or more subdirectories from the duplicate files search process. For example, if you need to search duplicate files in a storage container excluding one or two special directories, you may specify the entire storage container to be searched and add the directories that should be skipped to the exclude list.

Azure Data Expert Duplicate Files Exclude Directories

In order to add one or more directories to the exclude list, open the duplicate files search command dialog, press the 'Options' button, select the 'Exclude' tab and press the 'Add' button. All files and subdirectories located in the specified exclude directory will be excluded from the duplicate files search process. In addition, advanced users are provided with a number of exclude directories macro commands allowing one to exclude multiple directories using a single macro command.

Azure Data Expert provides the following exclude directories macro commands:

  • $BEGINS <Text String> - this macro command excludes all directories beginning with the specified text string.
  • $CONTAINS <Text String> - this macro command excludes all directories containing the specified text string.
  • $ENDS <Text String> - this macro command excludes all directories ending with the specified text string.
  • $REGEX <Regular Expression> - this macro command excludes all directories matching the specified regular expression.

For example, the exclude macro command '$CONTAINS Temporary Files' will exclude all directories with 'Temporary Files' at any place in the full directory path and the exclude macro command '$REGEX \.(TMP|TEMP)$' will exclude all directories ending with '.TMP' or '.TEMP'.
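The semantics of the four macro commands can be approximated with a short Python sketch. It assumes that each macro is matched against the full directory path, that matching is case-sensitive, that $REGEX uses search semantics (the pattern may match anywhere unless anchored), and that an entry without a macro names a directory whose whole subtree is excluded. The function names are hypothetical and the product's exact matching behavior may differ.

```python
import re


def make_exclude_matcher(entry):
    """Turn one exclude list entry into a predicate over directory paths."""
    macro, _, argument = entry.partition(" ")
    if macro == "$BEGINS":
        return lambda path: path.startswith(argument)
    if macro == "$CONTAINS":
        return lambda path: argument in path
    if macro == "$ENDS":
        return lambda path: path.endswith(argument)
    if macro == "$REGEX":
        pattern = re.compile(argument)
        return lambda path: pattern.search(path) is not None
    # No macro: treat the entry as a directory and exclude its subtree.
    prefix = entry.rstrip("/")
    return lambda path: path == prefix or path.startswith(prefix + "/")


def is_excluded(path, exclude_list):
    """Return True if any exclude list entry matches the directory path."""
    return any(make_exclude_matcher(entry)(path) for entry in exclude_list)


exclude_list = [
    "$CONTAINS Temporary Files",
    r"$REGEX \.(TMP|TEMP)$",
    "archive",
]
for path in ["data/Temporary Files/cache", "build/out.TMP", "archive/2020", "docs"]:
    print(path, is_excluded(path, exclude_list))
```

A scanner that skips a matched directory never descends into it, so files and subdirectories below an excluded directory are skipped as well, even if their own paths would not match the macro.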