
Druva Documentation

Scanner CLI utility

Use the Scanner CLI utility

The Scanner CLI utility analyzes a file system or NAS shares and gives you insight into the file and directory structure. Before you run the Scanner CLI utility on NAS shares, mount the shares manually. The utility is bundled with the Hybrid Workloads agent, so when you install the latest version of the agent, you get access to the Scanner CLI utility. The utility reports information such as the number of files and folders, the directory depth and width, and the data change rate. The first time you run the Scanner CLI utility, it performs a full scan. Subsequent scans are incremental or full, depending on the parameters in the configuration file, such as force_scan. Incremental scans performed after the first full scan are significantly faster.
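The difference between a full scan and an incremental scan can be pictured as comparing the state saved by the previous scan with the current file system. The following Python sketch is a hypothetical illustration of that idea, not the utility's implementation (the utility keeps its state in the database file set by db_file_path). It assumes each scan produces a map from path to (modification time, size), and it reports each path as added, modified, or deleted, matching the kinds of changes listed under ChangeType in the result file.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot format: path -> (mtime, size in bytes).
Snapshot = Dict[str, Tuple[float, int]]


def diff_snapshots(previous: Snapshot, current: Snapshot) -> List[Tuple[str, str]]:
    """Return (change_type, path) pairs sorted by path."""
    changes: List[Tuple[str, str]] = []
    for path, meta in current.items():
        if path not in previous:
            changes.append(("added", path))
        elif previous[path] != meta:
            # A different mtime or size is treated as a modification.
            changes.append(("modified", path))
    for path in previous:
        if path not in current:
            changes.append(("deleted", path))
    return sorted(changes, key=lambda change: change[1])


if __name__ == "__main__":
    first_scan = {"/data/a.txt": (100.0, 10), "/data/b.txt": (100.0, 20)}
    second_scan = {"/data/a.txt": (200.0, 12), "/data/c.txt": (150.0, 5)}
    for change_type, path in diff_snapshots(first_scan, second_scan):
        print(change_type, path)
```

With an empty previous snapshot, every path is reported as added, which corresponds to the first full scan.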

You can run the Scanner CLI utility by using either of the following procedures:

Command line interface procedure

Run one of the following commands:

scanner-cli.exe <Configuration file path>

Or

scanner-cli.exe <Directory path for analysis> <Directory in which to create the output files>

The second command runs the scan with the default parameters. To override these parameters, create a configuration file and pass its path to the first command.
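If you launch the scan from a script, the two invocation forms above can be kept apart with a small helper. The following Python sketch is hypothetical: build_scanner_command is an invented name, and it only assembles the argument list from the documented forms; it does not know anything else about the utility's options.

```python
from typing import List, Optional

# Executable name as given in this section; use the full path if it is not on PATH.
SCANNER = "scanner-cli.exe"


def build_scanner_command(config_path: Optional[str] = None,
                          scan_dir: Optional[str] = None,
                          output_dir: Optional[str] = None) -> List[str]:
    """Return the argument list for one of the two documented invocation forms."""
    if config_path is not None:
        if scan_dir is not None or output_dir is not None:
            raise ValueError("use either a configuration file or a directory pair, not both")
        # Form 1: all settings come from the configuration file.
        return [SCANNER, config_path]
    if scan_dir is None or output_dir is None:
        raise ValueError("scan_dir and output_dir are both required without a configuration file")
    # Form 2: default parameters with explicit input and output directories.
    return [SCANNER, scan_dir, output_dir]


if __name__ == "__main__":
    print(build_scanner_command(config_path="config.yml"))
    print(build_scanner_command(scan_dir="D:\\", output_dir="C:\\ScanOutput"))
```

To start the scan, pass the list to subprocess.run(cmd, check=True), which raises CalledProcessError if the process exits with a non-zero status.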

Configuration file procedure

Perform the following: 

  1. Create a configuration file in YAML format. Copy the following snippet into a text file and save it with a .yml extension, or download the sample config.yml file.
    root_paths: [<root path1>, <root path2>]
    fset_dir: <Specify fset directory path>
    scan_worker_count: 50
    sqlite_n_conns: 8
    results_threshold: 10000
    results_file: <Specify the location where the result file will be saved>
    processed_data_file: <Specify the location where the output file with formatted data will be saved>
    use_usn: false
    force_scan: false
    ss_age_threshold: 0
    skip_acl: true
    statemap: false 
    log_file: <Specify the location where the log file will be saved>
    db_file_path: <Specify the database file path>
    filters: 
        exclude_folders: [<folder name>, <folder name>]
        exclude_extensions: ""
        include_extensions: ""
    
    
    • root_paths: Specify the absolute paths of the directories that you want to scan. For NAS, specify the path where the share is mounted.
      Default: [ ]
    • fset_dir: Specify the drive letter (Windows) or the root directory (Linux).
      • For Windows and CIFS shares, the fset directory is '<Drive letter:\>'.
      • For Linux and NFS shares, the fset directory is '/'.
      Default: NA (this parameter is mandatory)
    • scan_worker_count: The number of threads used for scanning.
      Default: 50
    • sqlite_n_conns: The number of connections to open to the SQLite database.
      Default: 8 (recommended)
    • results_threshold: The minimum batch size at which results are displayed on the console while the utility runs.
      Default: 10000 (recommended)
    • results_file: Specify the location of the results file, which contains information about the changed data. A timestamp is appended to the results file name after each run.
      Default: ResultsFile_<FsetDir>_<Timestamp>
    • processed_data_file: Specify the location of the output file that contains the formatted scan statistics.
    • use_usn: Specifies whether to use the Windows Update Sequence Number (USN) journal.
      Default: false
    • force_scan: Set to 'true' to force a full scan instead of an incremental scan. The recommended value is 'false'. This parameter does not apply to the first scan, which is always a full scan.
      Default: false
    • skip_acl: Set to 'true' to skip detecting Access Control List (ACL) changes. This parameter does not apply to the first full scan.
      Default: false
    • log_file: Specify the location where you want to save the scanner log files.
      Default: ScannerLog_<FsetDir>.log
    • statemap: Set to 'false' for better scan performance. However, if you plan to run incremental scans after the first full scan, set this parameter to 'true' for all runs, including the first full scan.
      Default: false
    • db_file_path: Specify the location of the file that stores the persistent state of the scanner.
      Default: DBFile_<FsetDir>.db
    • exclude_folders: List of folders to exclude from the scan. For example:
      exclude_folders: [dev, /proc, /etc, Phoenix]
      Default: ["/proc", "/sys", "/dev", "/tmp", "/lost+found", "/etc/Phoenix",
                "/var/Phoenix", "/selinux", "$Recycle.Bin", "ProgramData",
                "Recovery", "System Volume information", "RECYCLER",
                "C:\\Program Files (x86)", "C:\\Program Files", "C:\\Windows",
                ".snapshot", ".Snapshot", ".SNAPSHOT"]
    • exclude_extensions: List of file extensions to exclude from the scan, separated by semicolons. For example:
      exclude_extensions: "*.log;*.bat"
      Default: " "
    • include_extensions: List of file extensions to include in the scan, separated by semicolons. For example:
      include_extensions: "*.txt;*.pdf"
      If a file extension appears in both the include and exclude lists, it is excluded, because exclusion takes precedence over inclusion.
      Default: " "
    Note:  

    To override the default value of any of these parameters, add the parameter to the YAML file. Always include the fset_dir parameter in the YAML file, because it is mandatory.

    If you do not specify values for the results_file, processed_data_file, log_file, and db_file_path parameters in the YAML file, the files are created with their default names in the directory from which you run the utility.

    Note: The statistics in the ProcessedDataFile apply only to the first run.
  2. Download and install the latest version of the Hybrid Workloads agent from the Downloads page.
  3. On Linux, increase the file descriptor (FD) limit in the shell from which you run the utility:
    ulimit -n 65000
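The configuration rules above (documented defaults, the mandatory fset_dir parameter, semicolon-separated extension lists, and exclusion taking precedence over inclusion) can be checked before a run. The following Python sketch is hypothetical: load_settings and is_included are invented names, and it assumes the YAML file has already been parsed into a dictionary (for example, with PyYAML's yaml.safe_load). It also assumes shell-style, case-sensitive matching and that an empty include list admits every extension; the utility's own matching may differ.

```python
import copy
import fnmatch
from typing import Any, Dict, List

# Scalar defaults documented for config.yml. fset_dir has no default because
# it is mandatory; the default exclude_folders list is omitted for brevity.
DEFAULTS: Dict[str, Any] = {
    "root_paths": [],
    "scan_worker_count": 50,
    "sqlite_n_conns": 8,
    "results_threshold": 10000,
    "use_usn": False,
    "force_scan": False,
    "skip_acl": False,
    "statemap": False,
}


def load_settings(user: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay parsed user settings on the documented defaults."""
    if not user.get("fset_dir"):
        raise ValueError("fset_dir is a mandatory parameter")
    settings = copy.deepcopy(DEFAULTS)  # never mutate the shared defaults
    settings.update(user)
    return settings


def _patterns(spec: str) -> List[str]:
    """Split a spec such as "*.log;*.bat"; a blank spec yields no patterns."""
    return [p.strip() for p in spec.split(";") if p.strip()]


def is_included(filename: str, include_spec: str = "", exclude_spec: str = "") -> bool:
    """Apply extension filters; an exclusion match always wins."""
    if any(fnmatch.fnmatchcase(filename, p) for p in _patterns(exclude_spec)):
        return False
    includes = _patterns(include_spec)
    # Assumption: no include patterns means every extension is included.
    return not includes or any(fnmatch.fnmatchcase(filename, p) for p in includes)


if __name__ == "__main__":
    cfg = load_settings({
        "fset_dir": "/",
        "root_paths": ["/mnt/nas"],
        "filters": {"exclude_extensions": "*.log;*.bat",
                    "include_extensions": "*.log;*.txt"},
    })
    flt = cfg["filters"]
    print("scan_worker_count:", cfg["scan_worker_count"])
    for name in ["app.log", "notes.txt", "run.bat", "photo.png"]:
        print(name, is_included(name, flt["include_extensions"], flt["exclude_extensions"]))
```

In this run, app.log is rejected even though it matches the include list, because exclusion wins, and photo.png is rejected because a non-empty include list admits only the listed patterns.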

Review scan result

After the scan is complete:

  • A result file is generated at the location specified in the configuration file. This result file contains the following information about the changed data:
    ChangeType - Indicates the type of change, such as file added, file modified, or file deleted.
    ItemType - Indicates the type of item: 'F' indicates a file, 'D' indicates a directory, and 'L' indicates a link.
    Mode - Indicates the standard OS file mode (uint32).
    MTime - Indicates the modification time of the file or the folder.
    Size - Indicates the size of the file in bytes.
    Path - Indicates the full path of the file.
  • A log file is generated at the location specified in the configuration file. The log file contains telemetry information about the scan.

  • An output file (processed_data_file) with formatted data is generated. It contains the following telemetry information:
     
    Scanned directory:    D:\ 
    Include path(s):      []
    Exclude folders:      /proc, /sys, /dev, /tmp, /lost+found, /etc/Phoenix, /var/Phoenix, /selinux, $Recycle.Bin, ProgramData, Recovery, System Volume information, RECYCLER, C:\Program Files (x86), C:\Program Files, C:\Windows, .snapshot, .Snapshot, .SNAPSHOT 
    Exclude extensions:   NA 
    Include extensions:   NA 
            
    Summary 
    Total Count (files and folders):            326407               
    Directories/Folders Count:                   60683                
    Files Count:                                265724               
    Softlink Files Count:                            0                     
    Total Size of the files:               16620968402 Bytes, or 15.48 GB  
    Average file size:                        62549.74 Bytes, or 61.08 KB
     
    Directory modification age distribution:  
    Age distribution        Count   Count %  
    0-90 Days                   1   25.00 %  
    90-180 Days                 0    0.00 %  
    180-270 Days                0    0.00 %  
    270 Days-1 Year             0    0.00 %  
    1-2 Years                   3   75.00 %  
    > 2 Years                   0    0.00 %  
    Total Folders Count:        4
     
    File size distribution: 
    Size distribution       Count   Count %    Size       Size %  Avg Size  
    0-1KB                   94486   35.56 %   30.46 MB    0.19 %  338.06 B  
    >1-10KB                118334   44.53 %  436.61 MB    2.75 %    3.78 KB 
    >10-100KB               44421   16.72 %    1.16 GB    7.47 %   27.29 KB 
    >100KB-1MB               7619    2.87 %    2.19 GB   14.15 %  301.54 KB 
    >1-16MB                   760    0.29 %    2.43 GB   15.72 %    3.28 MB 
    >16MB                     104    0.04 %    9.24 GB   59.71 %   91.01 MB 
                                                                           
    File modification age distribution: 
    Age distribution        Count   Count %    Size       Size % 
    0-90 Days               80156   30.17 %    3.88 GB   25.06 % 
    90-180 Days             45794   17.23 %    1.39 GB    9.00 % 
    180-270 Days             3274    1.23 %  127.01 MB    0.80 % 
    270 Days-1 Year         10248    3.86 %    1.69 GB   10.92 % 
    1-2 Years               60932   22.93 %    1.67 GB   10.78 % 
    > 2 Years               65320   24.58 %    6.72 GB   43.43 % 
    Total Files Count:     265724 
                                                             
    Extensions list sorted by files count:
    Large ext: Files with >=5 chars filename extension
    No ext : files with no extension to the filename
    File extension          Count   Count %    Size       Size % 
    .go                    115489   43.46 %    1.95 GB   12.57 % 
    .py                     35434   13.33 %  362.03 MB    2.28 % 
    No Ext                  30585   11.51 %  459.28 MB    2.90 % 
    .json                   19842    7.47 %  603.52 MB    3.81 % 
    .js                      9356    3.52 %   85.59 MB    0.54 % 
    Large Ext                8711    3.28 %  222.93 MB    1.41 % 
    .md                      3335    1.26 %   18.36 MB    0.12 % 
    .sh                      2660    1.00 %    4.86 MB    0.03 % 
    .txt                     2495    0.94 %  545.18 MB    3.44 % 
    .html                    2466    0.93 %   13.01 MB    0.08 % 
    .png                     2407    0.91 %   31.35 MB    0.20 % 
    .mod                     2325    0.87 %  441.99 KB    0.00 % 
    .a                       1796    0.68 %  295.13 MB    1.86 % 
    .s                       1699    0.64 %    6.46 MB    0.04 % 
    .yml                     1598    0.60 %    2.23 MB    0.01 % 
    .dat                     1280    0.48 %   30.87 MB    0.19 % 
    .h                       1279    0.48 %   26.20 MB    0.17 % 
    .lock                    1260    0.47 %  205.96 KB    0.00 % 
    .pyc                     1169    0.44 %   16.11 MB    0.10 % 
    .rst                     1132    0.43 %    4.19 MB    0.03 % 
    .xml                     1078    0.41 %    4.98 MB    0.03 % 
    .svg                      873    0.33 %   13.64 MB    0.09 % 
     
    Extensions list sorted by the size of files:
    File extension          Count   Count %    Size       Size % 
    .pack                     188    0.07 %    5.13 GB   33.17 % 
    .go                    115489   43.46 %    1.95 GB   12.57 % 
    .zip                      783    0.29 %    1.24 GB    7.98 % 
    .rar                        4    0.00 %    1.17 GB    7.57 % 
    .lib                      108    0.04 %  606.59 MB    3.83 % 
    .json                   19842    7.47 %  603.52 MB    3.81 % 
    .dll                      156    0.06 %  600.24 MB    3.79 % 
    .exe                      543    0.20 %  599.50 MB    3.78 % 
    .txt                     2495    0.94 %  545.18 MB    3.44 % 
    No Ext                  30585   11.51 %  459.28 MB    2.90 % 
    .py                     35434   13.33 %  362.03 MB    2.28 % 
    .a                       1796    0.68 %  295.13 MB    1.86 % 
    Large Ext                8711    3.28 %  222.93 MB    1.41 % 
    .tgz                       45    0.02 %  193.59 MB    1.22 % 
    .pdb                      465    0.17 %  193.24 MB    1.22 % 
    .db                         3    0.00 %  112.15 MB    0.71 % 
    .idx                      188    0.07 %  104.10 MB    0.66 % 
    .bmp                      152    0.06 %   98.91 MB    0.62 % 
    .tar                      101    0.04 %   97.18 MB    0.61 % 
    .c                        703    0.26 %   91.81 MB    0.58 % 
    .js                      9356    3.52 %   85.59 MB    0.54 % 
    .so                        93    0.03 %   70.69 MB    0.45 % 
                                                                                    
    Average Width:             5  (average number of files in each directory)       
    Average Depth:             8  (average directory depth)                         
    Maximum Depth:             20  (max directory depth found during the scan)       
    Maximum Width:           3559  (max number of files found in a single directory) 
    Scanning Rate:           9636  (files scanned per second)                        
    Scanning Time:             33  (total scan time in seconds)
    
    
     
    Parameter descriptions:
    Scanned directory - Directory path for analysis.
    Include path(s) - Path(s) to include under the scanned directory.
    Exclude folders - Shows the folders excluded from the scan.
    Exclude extensions - Shows the extensions excluded from the scan.
    Include extensions - Shows the extensions included in the scan.
    Total Count (files and folders) - Shows the total number of files and folders.
    Directories/Folders Count - Shows the count of all folders/directories.
    Files Count - Shows the total count of files.
    Softlink Files Count - Shows the count of soft link files.
    Total Size of the files - Shows the total size of all the files in the directory.
    Average file size - Shows the average size of a file in the directory.
    File size distribution - Shows the distribution of files by size. For example, a count of 1 in the 0-1KB row indicates that only one file between 0 and 1 KB was found during the scan.
    File modification age distribution - Shows the distribution of files by modification age.
    Directory modification age distribution - Shows the distribution of directories by modification age.
    Extensions list sorted by files count - Shows the list of file extensions sorted by file count.
    Large ext - Shows the files whose filename extension has five or more characters.
    No ext - Shows the files that have no filename extension.
    Extensions list sorted by the size of files - Shows the list of file extensions sorted by total file size.
    Average Width - Shows the average number of files in each directory.
    Average Depth - Shows the average depth of the directory tree.
    Maximum Depth - Shows the maximum depth of the directory tree found during the scan.
    Maximum Width - Shows the maximum number of files found in a single directory.
    Scanning Rate - Shows the rate (in files per second) at which the files were scanned.
    Scanning Time - Shows the total scan duration (in seconds).
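Several statistics in the processed data file follow simple rules that you can reproduce when post-processing your own file lists. The following Python sketch is a hypothetical illustration, not the utility's code: size_bucket, extension_category, and average_size are invented names. It assumes that 1 KB = 1024 bytes, which matches the sample output (16620968402 bytes is reported as 15.48 GB), and that the five-character threshold for Large Ext does not count the leading dot, since the sample lists .json and .html individually.

```python
import os
from typing import Tuple

KB, MB = 1024, 1024 * 1024

# Inclusive upper bounds for the rows of "File size distribution".
SIZE_BUCKETS: Tuple[Tuple[str, float], ...] = (
    ("0-1KB", 1 * KB),
    (">1-10KB", 10 * KB),
    (">10-100KB", 100 * KB),
    (">100KB-1MB", 1 * MB),
    (">1-16MB", 16 * MB),
    (">16MB", float("inf")),
)


def size_bucket(size: int) -> str:
    """Return the size-distribution row that a file of `size` bytes falls into."""
    for label, upper in SIZE_BUCKETS:
        if size <= upper:
            return label
    return SIZE_BUCKETS[-1][0]  # not reached: the last bound is infinite


def extension_category(filename: str) -> str:
    """Return the extension (e.g. '.go'), 'No Ext', or 'Large Ext'."""
    ext = os.path.splitext(filename)[1]  # includes the leading dot
    if not ext:
        return "No Ext"
    # Assumption: five or more characters, not counting the dot.
    return "Large Ext" if len(ext) - 1 >= 5 else ext


def average_size(total_bytes: int, file_count: int) -> float:
    """Average file size in bytes; 0.0 when no files were scanned."""
    return total_bytes / file_count if file_count else 0.0


if __name__ == "__main__":
    print(size_bucket(500), size_bucket(5 * KB), size_bucket(20 * MB))
    print(extension_category("main.go"), extension_category("Makefile"),
          extension_category("App.class"))
    avg = average_size(16620968402, 265724)  # totals from the sample output
    print(f"{avg:.2f} Bytes, or {avg / KB:.2f} KB")
```

The last line reproduces the sample's "Average file size" value of 62549.74 Bytes, or 61.08 KB.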