Use the Scanner CLI utility
The Scanner CLI utility lets you analyze a file system or NAS shares and gain insight into the file and directory structure. Before you run the Scanner CLI utility against NAS shares, mount the shares manually. The utility is bundled with the Hybrid Workloads agent, so installing the latest version of the agent gives you access to it. The utility reports information such as the number of folders and files present, the directory and file level, and the data change rate. The first run of the Scanner CLI utility performs a full scan; subsequent scans are incremental or full, depending on the parameters specified in the configuration file. You will notice a significant improvement in the incremental scans performed after the first full scan.
- Create a configuration file in the YAML format by copying the following snippet to a text file and saving the file in the YAML format. Or, you can download the following sample config.yml.
    root_paths: [<root path1>, <root path2>]
    fset_dir: <Specify fset directory path>
    scan_worker_count: 50
    sqlite_n_conns: 8
    results_threshold: 10000
    results_file: <Specify the location where the result file will be saved>
    processed_data_file: <Specify the location where the output file with formatted data will be saved>
    use_usn: false
    smart_scan: false
    force_scan: false
    ss_age_threshold: 0
    skip_acl: true
    statemap: false
    log_file: <Specify the location where the log file will be saved>
    db_file_path: <Specify the database file path>
    filters:
      exclude_folders: [<folder name>, <folder name>]
      exclude_extensions: ""
      include_extensions: ""
| Parameter | Description | Default |
| --- | --- | --- |
| root_paths | Specify the absolute (full) paths of the directories that you want to scan. For NAS, the root path should be the path where the share is mounted. | [ ] |
| fset_dir | Specify the drive letter. For Windows and CIFS shares, the fset directory is '<Drive letter:\>'. For Linux and NFS shares, the fset directory is '/'. | NA |
| scan_worker_count | The number of threads to be used for scanning. | 50 |
| sqlite_n_conns | The number of connections to be established with SQLite. | 8 (recommended) |
| results_threshold | Minimum batch size in which the output is displayed on the console when the utility is run. | 10000 (recommended) |
| results_file | Specify the location of the results file, which will contain information about the changed data. A timestamp is appended to the results file name after each Scanner CLI utility run. | ResultsFile_<FsetDir>_<Timestamp> |
| use_usn | Windows USN | false |
| smart_scan | Set to 'true' if you want to enable smart scan. Smart scan optimizes the scanning duration for backup. To know more about smart scan, refer to Enable Smart Scan. This is not applicable for the first full scan. | false |
| force_scan | Set to 'true' if you want to force a full scan instead of an incremental scan. The recommended value is 'false'. This is not applicable for the first full scan. | false |
| ss_age_threshold | Age threshold (in days) for the directory to be eligible for a smart scan. To know more about smart scan, refer to Enable Smart Scan. This is not applicable for the first full scan. | 0 |
| skip_acl | Set to 'true' to skip detecting Access Control List (ACL) changes. This is not applicable for the first full scan. | false |
| log_file | Specify the location where you want to save the scanner log files. | ScannerLog_<FsetDir>.log |
| statemap | Set to false. Note: This improves scan performance. If you are planning to run an incremental scan after the first full scan, this parameter needs to be set to 'true' for all runs, including the first full run. | false |
| db_file_path | Specify the location of the file that will be used to store the persistent state of the scanner. | DBFile_<FsetDir>.db |
| exclude_folders | List of folders to be excluded from the scan. For example, exclude_folders: [dev, /proc, /etc, Phoenix] | ["/sys", "/dev", "/tmp", "/lost+found", "/etc/Phoenix", "/var/Phoenix", "/selinux", "$Recycle.Bin", "ProgramData", "Recovery", "System Volume information", "RECYCLER", "C:\\Program Files (x86)", "C:\\Program Files", "C:\\Windows", ".snapshot", ".Snapshot", ".SNAPSHOT"] |
| exclude_extensions | List of file extensions to be excluded from the scan. The extensions must be separated by a semicolon. | |
| include_extensions | List of file extensions to be included in the scan. The extensions must be separated by a semicolon. | |
Note: If a file extension is added to the include list and exclude list, then the file extension will be excluded as the exclusion takes precedence over inclusion.
You can override the value of any of these parameters by adding it to the YAML file. Make sure to also add “fset_dir: <Specify fset directory path>” to the YAML file, as it is a mandatory parameter.
If you do not provide values for the results_file, processed_data_file, log_file, and db_file_path parameters in the YAML file, files with the default names are created in the location from which the utility is run.
Note: The statistics in the ProcessedDataFile are applicable only to the first run.
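The precedence rule for the include and exclude extension lists can be illustrated with a small sketch. This is not the utility's own code: the function names are hypothetical, and the sketch assumes that extensions are matched case-insensitively, that a leading dot is optional, and that an empty include list means no restriction.

```python
from pathlib import PurePath


def parse_extensions(value):
    """Split a semicolon-separated extension string into a normalized set."""
    return {
        ext.strip().lower().lstrip(".")
        for ext in value.split(";")
        if ext.strip()
    }


def is_scanned(path, include_extensions="", exclude_extensions=""):
    """Return True if a file passes the filters (hypothetical helper)."""
    ext = PurePath(path).suffix.lower().lstrip(".")
    included = parse_extensions(include_extensions)
    excluded = parse_extensions(exclude_extensions)
    if ext in excluded:
        # Exclusion takes precedence, even if the extension is also included.
        return False
    if included:
        # Assumption: a non-empty include list restricts the scan to its entries.
        return ext in included
    # Assumption: with no include list, every non-excluded file is scanned.
    return True
```

With "log" present in both lists, is_scanned("app.log", "log;txt", "log") returns False, while is_scanned("notes.txt", "log;txt", "log") returns True.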
- Download and install the latest version of the Hybrid Workloads agent from the Downloads page.
- On Linux, increase the file descriptor (FD) limit by using the following command:
ulimit -n 65000
- Run one of the following commands:
scanner-cli.exe <Configuration file path>
Or
scanner-cli.exe <Directory path for analysis> <Directory in which to create the output files>
If you use the second command, the default parameters are used. To override these parameters, you must create a configuration file and run the utility with it.
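The steps above can be scripted. The following is a minimal sketch, not an official wrapper: render_config and run_scanner are hypothetical helper names, the sketch writes only three of the documented parameters, and it assumes the executable is reachable as scanner-cli.exe (the name on other platforms is not stated here). Paths are single-quoted because YAML keeps backslashes literal in single-quoted strings, which suits Windows paths.

```python
import subprocess


def yaml_quote(value):
    """Single-quote a string for YAML; a literal quote is doubled."""
    return "'" + value.replace("'", "''") + "'"


def render_config(root_paths, fset_dir, force_scan=False):
    """Build a minimal configuration; unlisted parameters keep their defaults."""
    if not fset_dir:
        # fset_dir is documented as a mandatory parameter.
        raise ValueError("fset_dir is a mandatory parameter")
    roots = ", ".join(yaml_quote(path) for path in root_paths)
    lines = [
        f"root_paths: [{roots}]",
        f"fset_dir: {yaml_quote(fset_dir)}",
        f"force_scan: {'true' if force_scan else 'false'}",
    ]
    return "\n".join(lines) + "\n"


def run_scanner(config_path, executable="scanner-cli.exe"):
    """Run the utility with a configuration file path (first command form)."""
    # check=True raises CalledProcessError if the utility exits with an error.
    return subprocess.run([executable, config_path], check=True)
```

A typical use would be to write the text returned by render_config(["/mnt/share1"], "/") to config.yml and then call run_scanner("config.yml").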
Review scan result
Once the scan is complete:
- A result file is generated at the location specified in the configuration file. This result file contains the following information about the changed data:
ChangeType - Indicates the type of change, such as file added, file modified, file deleted.
ItemType - Indicates the type of item: 'F' indicates a file, 'D' indicates a directory, and 'L' indicates a link.
Mode - Indicates the Standard OS File Mode (uint32).
MTime - Indicates the modification time of the file or the folder.
Size - Indicates the size of the file in bytes.
Path - Indicates the full path of the file.
- A log file is generated at the location specified in the configuration file.
- An output file (processed_data_file) with the formatted data is generated that contains the following telemetry information.
    Scanned directory: D:\
    Include path(s):
    Exclude folders: /proc, /sys, /dev, /tmp, /lost+found, /etc/Phoenix, /var/Phoenix, /selinux, $Recycle.Bin, ProgramData, Recovery, System Volume information, RECYCLER, C:\Program Files (x86), C:\Program Files, C:\Windows, .snapshot, .Snapshot, .SNAPSHOT
    Exclude extensions: NA
    Include extensions: NA

    Summary
    Total Count (files and folders): 326407
    Directories/Folders Count: 60683
    Files Count: 265724
    Softlink Files Count: 0
    Total Size of the files: 16620968402 Bytes, or 15.48 GB
    Average file size: 62549.74 Bytes, or 61.08 KB

    Directory modification age distribution:
    Age distribution   Count    Count %
    0-90 Days          1        25.00 %
    90-180 Days        0        0.00 %
    180-270 Days       0        0.00 %
    270 Days-1 Year    0        0.00 %
    1-2 Years          3        75.00 %
    > 2 Years          0        0.00 %
    Total Folders Count: 4

    File size distribution:
    Size distribution  Count    Count %   Size        Size %    Avg Size
    0-1KB              94486    35.56 %   30.46 MB    0.19 %    338.06 B
    >1-10KB            118334   44.53 %   436.61 MB   2.75 %    3.78 KB
    >10-100KB          44421    16.72 %   1.16 GB     7.47 %    27.29 KB
    >100KB-1MB         7619     2.87 %    2.19 GB     14.15 %   301.54 KB
    >1-16MB            760      0.29 %    2.43 GB     15.72 %   3.28 MB
    >16MB              104      0.04 %    9.24 GB     59.71 %   91.01 MB

    File modification age distribution:
    Age distribution   Count    Count %   Size        Size %
    0-90 Days          80156    30.17 %   3.88 GB     25.06 %
    90-180 Days        45794    17.23 %   1.39 GB     9.00 %
    180-270 Days       3274     1.23 %    127.01 MB   0.80 %
    270 Days-1 Year    10248    3.86 %    1.69 GB     10.92 %
    1-2 Years          60932    22.93 %   1.67 GB     10.78 %
    > 2 Years          65320    24.58 %   6.72 GB     43.43 %
    Total Files Count: 265724

    Extensions list sorted by files count:
    Large ext: Files with >=5 chars filename extension
    No ext : files with no extension to the filename
    File extension     Count    Count %   Size        Size %
    .go                115489   43.46 %   1.95 GB     12.57 %
    .py                35434    13.33 %   362.03 MB   2.28 %
    No Ext             30585    11.51 %   459.28 MB   2.90 %
    .json              19842    7.47 %    603.52 MB   3.81 %
    .js                9356     3.52 %    85.59 MB    0.54 %
    Large Ext          8711     3.28 %    222.93 MB   1.41 %
    .md                3335     1.26 %    18.36 MB    0.12 %
    .sh                2660     1.00 %    4.86 MB     0.03 %
    .txt               2495     0.94 %    545.18 MB   3.44 %
    .html              2466     0.93 %    13.01 MB    0.08 %
    .png               2407     0.91 %    31.35 MB    0.20 %
    .mod               2325     0.87 %    441.99 KB   0.00 %
    .a                 1796     0.68 %    295.13 MB   1.86 %
    .s                 1699     0.64 %    6.46 MB     0.04 %
    .yml               1598     0.60 %    2.23 MB     0.01 %
    .dat               1280     0.48 %    30.87 MB    0.19 %
    .h                 1279     0.48 %    26.20 MB    0.17 %
    .lock              1260     0.47 %    205.96 KB   0.00 %
    .pyc               1169     0.44 %    16.11 MB    0.10 %
    .rst               1132     0.43 %    4.19 MB     0.03 %
    .xml               1078     0.41 %    4.98 MB     0.03 %
    .svg               873      0.33 %    13.64 MB    0.09 %

    Extensions list sorted by the size of files:
    File extension     Count    Count %   Size        Size %
    .pack              188      0.07 %    5.13 GB     33.17 %
    .go                115489   43.46 %   1.95 GB     12.57 %
    .zip               783      0.29 %    1.24 GB     7.98 %
    .rar               4        0.00 %    1.17 GB     7.57 %
    .lib               108      0.04 %    606.59 MB   3.83 %
    .json              19842    7.47 %    603.52 MB   3.81 %
    .dll               156      0.06 %    600.24 MB   3.79 %
    .exe               543      0.20 %    599.50 MB   3.78 %
    .txt               2495     0.94 %    545.18 MB   3.44 %
    No Ext             30585    11.51 %   459.28 MB   2.90 %
    .py                35434    13.33 %   362.03 MB   2.28 %
    .a                 1796     0.68 %    295.13 MB   1.86 %
    Large Ext          8711     3.28 %    222.93 MB   1.41 %
    .tgz               45       0.02 %    193.59 MB   1.22 %
    .pdb               465      0.17 %    193.24 MB   1.22 %
    .db                3        0.00 %    112.15 MB   0.71 %
    .idx               188      0.07 %    104.10 MB   0.66 %
    .bmp               152      0.06 %    98.91 MB    0.62 %
    .tar               101      0.04 %    97.18 MB    0.61 %
    .c                 703      0.26 %    91.81 MB    0.58 %
    .js                9356     3.52 %    85.59 MB    0.54 %
    .so                93       0.03 %    70.69 MB    0.45 %

    Average Width: 5 (average number of files in each directory)
    Average Depth: 8 (average directory depth)
    Maximum Depth: 20 (max directory depth found during the scan)
    Maximum Width: 3559 (max number of files found in a single directory)
    Scanning Rate: 9636 (files scanned per second)
    Scanning Time: 33 (total scan time in seconds)
| Parameter | Description |
| --- | --- |
| Scanned directory | Directory path for analysis. |
| Include path(s) | Path(s) to include under the scanned directory. |
| Exclude folders | Shows the folders to be excluded. |
| Exclude extensions | Shows the extensions to be excluded. |
| Include extensions | Shows the extensions to be included. |
| Total Count (files and folders) | Shows the total number of files and folders. |
| Directories/Folders Count | Shows the count of all the folders/directories. |
| Files Count | Shows the total count of files. |
| Softlink Files Count | Shows the count of soft link files. |
| Total Size of the files | Shows the total size of all the files in the directory. |
| Average file size | Shows the average size of a file in the directory. |
| File size distribution | Shows the size distribution of files in a backup set. A [0-1KB : 1] indicates that only a single file with a file size between 0 and 1 KB was encountered during the scan. |
| File modification age distribution | Shows the distribution of files according to their modification age. |
| Directory modification age distribution | Shows the distribution of directories according to their modification age. |
| Extensions list sorted by files count | Shows the list of file extensions sorted by file count. |
| Large ext | Shows the files that have five or more characters in the filename extension. |
| No ext | Shows the files that have no filename extension. |
| Extensions list sorted by the size of files | Shows the list of file extensions sorted by file size. |
| Average Width | Shows the average number of files in each directory. |
| Average Depth | Shows the average depth of the directory tree. |
| Maximum Depth | Shows the maximum depth of the directory tree found during the scan. |
| Maximum Width | Shows the maximum number of files found in a single directory. |
| Scanning Rate | Shows the rate (in files per second) at which the files were scanned. |
| Scanning Time | Shows the total scan duration (in seconds). |
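As a worked check on how the summary figures relate, the sketch below recomputes a few values from the sample output. The helper names are hypothetical. The use of binary units (1 KB = 1024 bytes) is an assumption inferred from the sample figures, which it reproduces.

```python
def human_size(num_bytes):
    """Format a byte count with binary units (assumed: 1 KB = 1024 bytes)."""
    units = ["Bytes", "KB", "MB", "GB", "TB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value drops below 1024.
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


def share(count, total):
    """Percentage of the total, rounded to two decimals."""
    return round(100 * count / total, 2)


total_size = 16620968402  # "Total Size of the files" in the sample output
files_count = 265724      # "Files Count" in the sample output

# Average file size is the total size divided by the number of files.
average = total_size / files_count

print(human_size(total_size))     # total size in a readable unit
print(f"{average:.2f} Bytes")     # average file size in bytes
print(human_size(average))        # average file size in a readable unit
print(share(94486, files_count))  # Count % of the 0-1KB bucket
```

The four printed values correspond to the 15.48 GB, 62549.74 Bytes, 61.08 KB, and 35.56 % entries in the sample output.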