

Druva Documentation

Scenarios in which incremental VM backup performance degrades and the job fails with the Backup Window Expired error

 

Problem description

A VM incremental backup takes longer than expected, and the backup eventually fails with the Backup Window Expired error.

Phoenix uses an ever-incremental backup model. The first backup of the selected virtual machines is always a full backup. Subsequent backups are incremental if the CBT flag is enabled in the backup policy. The backup proxy uses quiescing to create a virtual machine snapshot: for the first automatic backup, it creates a snapshot for the full backup, and for all subsequent backups, it creates snapshots for the incremental backups. The transport mode used in the job workflow also plays a significant role in the performance of the VM backup job.

Changed Block Tracking

See Changed Block Tracking for a checklist of the CBT settings that can impact job performance. Phoenix uses VMware CBT technology to track changes for incremental backups. Verify the amount of change that CBT reports during the backup and check whether it is unusual. Use the following traceback in the job logs to determine the total CBT-reported changes and identify any discrepancies.

Traceback

  1. From the job logs, you can check the size of the changes that CBT reports for each disk:

    [2020-05-23 22:01:41,416] [INFO] blocks changed for disk [ABC.vmdk] reported by VMWare :1272315904

    [2020-05-23 22:01:41,505] [INFO] blocks changed for disk [ABC_1.vmdk] reported by VMWare :158867390464

    [2020-05-23 22:01:41,670] [INFO] blocks changed for disk [ABC_2.vmdk] reported by VMWare :363299078144

    [2020-05-23 22:01:41,763] [INFO] blocks changed for disk [ABC_3.vmdk] reported by VMWare :14992670720

    The values are in bytes. In the example above, the disks reported a combined change of 1272315904 + 158867390464 + 363299078144 + 14992670720 = 538431455232 bytes, about 501 GiB (roughly 538 GB in decimal units).

  2. You can also check the scan type in the logs to confirm that the scheduled incremental backup runs as an incremental backup and not as a full backup. The job logs must report the scan type as Incremental disk scan:

    [2020-04-20 22:01:34,723] [INFO] roboSyncer: scheck abort check is disabled

    [2020-04-20 22:01:34,723] [INFO] roboSyncer: Sending log to Phoenix server with message : 'Requested disk scan type: Incremental disk scan.'
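When a job log is long, both checks can be scripted. The following is a minimal Python sketch, assuming the job log is available as plain text and that the messages are worded exactly as in the sample lines above; the function name `summarize_cbt` and the regular expressions are hypothetical helpers, not part of any Druva tool. It sums the per-disk byte counts reported by VMware, converts the total to GiB, and reports the requested scan type.

```python
from __future__ import annotations

import re

# Hypothetical patterns modeled on the sample log lines above; the wording in
# other Phoenix versions may differ (assumption).
CBT_LINE = re.compile(
    r"blocks changed for disk \[(?P<disk>[^\]]+)\] reported by VMWare\s*:\s*(?P<bytes>\d+)"
)
SCAN_TYPE_LINE = re.compile(r"Requested disk scan type: (?P<scan>[^.']+)")


def summarize_cbt(log_text: str) -> dict:
    """Sum CBT-reported changed bytes per disk and capture the requested scan type."""
    per_disk: dict[str, int] = {}
    scan_type = None
    for line in log_text.splitlines():
        cbt = CBT_LINE.search(line)
        if cbt:
            # The log reports raw byte counts; if a disk appears twice, the last value wins.
            per_disk[cbt.group("disk")] = int(cbt.group("bytes"))
            continue
        scan = SCAN_TYPE_LINE.search(line)
        if scan:
            scan_type = scan.group("scan").strip()
    total = sum(per_disk.values())
    return {
        "per_disk": per_disk,
        "total_bytes": total,
        "total_gib": total / 1024**3,
        "scan_type": scan_type,
        "is_incremental": scan_type is not None
        and scan_type.lower().startswith("incremental"),
    }


# Sample lines copied from the examples above.
SAMPLE_LOG = """\
[2020-05-23 22:01:41,416] [INFO] blocks changed for disk [ABC.vmdk] reported by VMWare :1272315904
[2020-05-23 22:01:41,505] [INFO] blocks changed for disk [ABC_1.vmdk] reported by VMWare :158867390464
[2020-05-23 22:01:41,670] [INFO] blocks changed for disk [ABC_2.vmdk] reported by VMWare :363299078144
[2020-05-23 22:01:41,763] [INFO] blocks changed for disk [ABC_3.vmdk] reported by VMWare :14992670720
[2020-04-20 22:01:34,723] [INFO] roboSyncer: Sending log to Phoenix server with message : 'Requested disk scan type: Incremental disk scan.'
"""

if __name__ == "__main__":
    result = summarize_cbt(SAMPLE_LOG)
    print(result["total_bytes"])          # 538431455232
    print(round(result["total_gib"], 1))  # 501.5
    print(result["scan_type"])            # Incremental disk scan
```

Comparing `total_bytes` across several recent jobs for the same VM is one way to spot an unusual jump in reported changes.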

Transport Mode

See the Transport modes article to understand all the transport mode settings and how they impact job performance. Phoenix uses the HotAdd transport mode in most cases. However, in some scenarios the transport mode falls back to NBDSSL or NBD. This change in transport mode can degrade job performance, and the job can fail with the Backup Window Expired error.
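One way to see why a slower transport mode can push a job past its backup window is to estimate the transfer time as the changed data divided by the effective throughput. The following Python sketch is illustrative only: the throughput values and the four-hour window are hypothetical placeholders, not measured or published rates for HotAdd, NBDSSL, or NBD, and the helper names are not part of any Druva or VMware API.

```python
BYTES_PER_MIB = 1024**2


def estimated_hours(changed_bytes: int, throughput_mib_per_s: float) -> float:
    """Transfer time in hours = changed bytes / effective throughput."""
    if throughput_mib_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = changed_bytes / (throughput_mib_per_s * BYTES_PER_MIB)
    return seconds / 3600


def fits_window(changed_bytes: int, throughput_mib_per_s: float, window_hours: float) -> bool:
    """True if the estimated transfer time is within the backup window."""
    return estimated_hours(changed_bytes, throughput_mib_per_s) <= window_hours


if __name__ == "__main__":
    changed = 538_431_455_232  # total CBT-reported changes from the earlier example
    window = 4.0  # hypothetical backup window, in hours
    # Both rates are hypothetical placeholders chosen only to show the effect.
    for label, rate in [("faster mode (hypothetical)", 100.0), ("slower fallback (hypothetical)", 25.0)]:
        hours = estimated_hours(changed, rate)
        print(f"{label}: {hours:.1f} h, fits window: {fits_window(changed, rate, window)}")
```

With these placeholder values, the same 501 GiB of changes takes about 1.4 hours at the faster rate but about 5.7 hours at the slower one, which exceeds the four-hour window.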

Traceback

The following log messages help you identify the transport mode used:

[2020-04-20 22:01:37,449] [INFO] Actual VM transport mode pass to vddk lib: hotadd:nbdssl:nbd

5C:EC:ED:F1:14:BB:0C:E4:42:7D:86:6C:47:DD:A5:A7:38:33:51:0F, nfc host port: 0

[2020-04-20 22:01:40,064] [INFO] roboSyncer: Sending log to Phoenix server with message : 'Transport mode used for disk ABC.vmdk: HOTADD'
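To check the transport mode for every disk in a long job log, you can search for this message programmatically. The following is a minimal Python sketch, assuming the message is worded as in the sample line above; `transport_modes` and `non_hotadd_disks` are hypothetical helper names, and the second sample line (disk `ABC_1.vmdk` using NBDSSL) is invented to illustrate a fallback.

```python
from __future__ import annotations

import re

# Hypothetical pattern modeled on the sample "Transport mode used for disk ..." message;
# the wording in other Phoenix versions may differ (assumption).
TRANSPORT_LINE = re.compile(r"Transport mode used for disk (?P<disk>.+?): (?P<mode>[A-Za-z]+)")


def transport_modes(log_text: str) -> dict[str, str]:
    """Map each disk to the transport mode reported for it in the job log."""
    modes: dict[str, str] = {}
    for line in log_text.splitlines():
        match = TRANSPORT_LINE.search(line)
        if match:
            modes[match.group("disk")] = match.group("mode").upper()
    return modes


def non_hotadd_disks(log_text: str) -> list[str]:
    """Disks that did not use HotAdd, for example because of a fallback to NBDSSL or NBD."""
    return sorted(disk for disk, mode in transport_modes(log_text).items() if mode != "HOTADD")


# The first line is from the example above; the second is a hypothetical fallback.
SAMPLE_LOG = """\
[2020-04-20 22:01:40,064] [INFO] roboSyncer: Sending log to Phoenix server with message : 'Transport mode used for disk ABC.vmdk: HOTADD'
[2020-04-20 22:01:40,100] [INFO] roboSyncer: Sending log to Phoenix server with message : 'Transport mode used for disk ABC_1.vmdk: NBDSSL'
"""

if __name__ == "__main__":
    print(transport_modes(SAMPLE_LOG))   # {'ABC.vmdk': 'HOTADD', 'ABC_1.vmdk': 'NBDSSL'}
    print(non_hotadd_disks(SAMPLE_LOG))  # ['ABC_1.vmdk']
```

The lazy `.+?` match for the disk name stops at the first `": "`, so VMDK names that contain spaces are still captured whole.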

Resolution

To resolve performance issues due to Changed Block Tracking

If the CBT-reported changes are unusual or unusually large, there could be an issue with the CBT setting of the VM.

  1. CBT soft reset: Disable and then re-enable CBT on the affected virtual machines to resolve this issue.
  2. CBT hard reset: If the issue persists after a CBT soft reset, perform a CBT hard reset. After receiving confirmation from VMware support, follow the instructions in the Resetting Changed Block Tracking for VMware vSphere virtual machines article.
    Also determine whether a storage optimization task (such as defragmentation) is running on the VM. Such a task can inflate a thin disk and cause CBT to report a large volume of changes. Log a case with the VMware or OS team to troubleshoot the cause further.

To resolve performance issues due to the transport mode

HotAdd is the fastest of all the transport modes. Ensure that the HotAdd prerequisites are met so that the backup job uses HotAdd rather than falling back to NBDSSL or NBD.

 
