
Troubleshooting Instant Restore issues

Phoenix Editions: ✓ Business ✗ Enterprise ✓ Elite

 

This topic describes common workarounds for the issues that you might encounter while performing the following tasks:

  • Restoring virtual machines instantly
  • Migrating instantly restored VMs to production

  • Deleting instantly restored VMs

  • Other issues

Commands for debugging instant restore jobs

You can use the following commands and locations to debug issues related to instant restore jobs:

  • ps -ef | grep PhoenixIRAgent
    Check whether the IRAgent process is spawned on the Backup proxy.
  • ps -ef | grep PhoenixIRService
    Check whether the IRService process is running on Phoenix CloudCache.
  • ps -ef | grep PhoenixIRFS
    Check whether the IRFS process is spawned on Phoenix CloudCache.
  • /mnt/instantrestore
    NFS mount share location on the Backup proxy.
  • sqlite3 CCBmap.db
    View the SQLite database CCBmap.db on CloudCache at /mnt/data/instantrestore/{internal_job id}/bmap (see the example below).
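
For example, a minimal sketch for inspecting the block-map database of a job on CloudCache. The {internal_job id} value is a placeholder, and ".tables" is used only to list the tables because the CCBmap.db schema is not documented in this topic:

    cd /mnt/data/instantrestore/{internal_job id}/bmap
    sqlite3 CCBmap.db ".tables"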

Common issues

The following are some of the common issues that you might encounter while performing instant restore of VMs, or while migrating or deleting instantly restored VMs:

Issue: Migration of an instantly restored VM fails if you manually migrate the instantly restored VM to another datastore.
Resolution: On the Instant Restored VMs page, select the VM for which migration failed and perform the manual cleanup steps. For more information, see Steps for cleaning the datastore manually.

Issue: Migration of an instantly restored VM fails if a datastore other than the instantly restored datastore is attached to the instantly restored VM.
Resolution: Do either of the following:
  • Detach the disk and then trigger migration.
  • Delete the instantly restored VM.
Issue: Instant restore or migration to production fails when the operating system buffer cache consumes a large portion of RAM, which prevents the NFS server from starting, and the following traceback appears on the terminal:

root@cloudcache:~# service nfs-server start
Job for nfs-server.service canceled.

root@cloudcache:~# service nfs-server status
● nfs-server.service - NFS server and services
     Loaded: loaded (/lib/systemd/system/nfs-server.service; enabled; vendor preset: enabled)
    Drop-In: /run/systemd/generator/nfs-server.service.d
             └─order-with-mounts.conf
     Active: failed (Result: exit-code) since Wed 2021-06-09 12:08:22 UTC; 1min 27s ago
    Process: 3311885 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
    Process: 3311887 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE)
    Process: 3311889 ExecStopPost=/usr/sbin/exportfs -au (code=exited, status=0/SUCCESS)
    Process: 3311890 ExecStopPost=/usr/sbin/exportfs -f (code=exited, status=0/SUCCESS)
   Main PID: 3311887 (code=exited, status=1/FAILURE)

Jun 09 12:08:22 cloudcache systemd[1]: Starting NFS server and services...
Jun 09 12:08:22 cloudcache rpc.nfsd[3311887]: error starting threads: errno 12 (Cannot allocate memory)
Jun 09 12:08:22 cloudcache systemd[1]: nfs-server.service: Main process exited, code=exited, status=1/FAILURE
Jun 09 12:08:22 cloudcache systemd[1]: nfs-server.service: Failed with result 'exit-code'.
Jun 09 12:08:22 cloudcache systemd[1]: Stopped NFS server and services.
root@cloudcache:~# service nfs-server start

 

Resolution: Run the following commands to verify and fix the issue:

  1. Check the memory usage:

    root@cloudcache:~# free -m
                 total        used        free      shared  buff/cache   available
    Mem:         12003        3072         433           0        8497        8634
    Swap:         4095          23        4072

  2. Drop the buffer cache:

    sync && echo 3 > /proc/sys/vm/drop_caches

  3. Verify that the buffer cache has been released:

    root@cloudcache:~# free -m
                 total        used        free      shared  buff/cache   available
    Mem:         12003        3085        8662           0         255        8714
    Swap:         4095          23        4072

 

Issue: The instant restore and migration jobs fail if the PhoenixIRFS (FUSE) process cannot start and the log file shows entries similar to the following:


level=debug ts=2021-05-21T11:44:57.677353062Z filename=fsm.go:228 message="Fuse Process Creation Failed" ExportPath=/mnt/data/instantrestore/75/mnt Outputofsearch="root        6392  0.0  0.0   5648  3064 pts/0    S    11:44   0:00 /bin/bash -c ps -aux | grep -i '/mnt/data/instantrestore/75/mnt'\nroot        6394  0.0  0.0   5216  2528 pts/0    S    11:44   0:00 grep -i /mnt/data/instantrestore/75/mnt\n"

Resolution:
  • If there are multiple unused instantly restored datastores present on the ESXi host, delete them:
    • First, delete the instantly restored VM from the datastore.
    • If the VM is inaccessible, you cannot delete it. In that case, remove it from the inventory by using the VM settings (Actions button or right-click option).
    • Unmount the datastore by right-clicking the datastore and selecting the unmount option.
  • Additionally, you can update the maximum NFS mount limit on the ESXi host. For more information, see the following KB article: Increasing the default value that defines the maximum number of NFS mounts on an ESXi/ESX host (2239).
Issue: Instant restore fails while exporting the NFS share.
Resolution:
  • Remove the stale entries present in /etc/exports and restart the NFS service ("service nfs-server restart") on CloudCache, as shown in the sketch after this list.
  • Re-trigger the instant restore job.
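
A minimal sketch of that cleanup on CloudCache, assuming the stale lines are the instant restore export paths (review the file before editing it):

    grep instantrestore /etc/exports    # identify stale instant restore export entries
    vi /etc/exports                     # remove the stale lines
    service nfs-server restart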
Issue: The delete custom command fails if the instantly restored datastore is not attached to the instantly restored VM. This indicates that the VM has already been migrated, either manually or by the migration job.
Resolution: In case of manual migration, the instantly restored datastore gets detached but not deleted. You must manually clean up the datastore. For more information, see Steps for cleaning the datastore manually.
Issue: If the CloudCache service is restarted during instant restore or migration, the ongoing job might fail. For running instantly restored VMs that are not yet migrated or deleted, stopping or restarting the Phoenix Cache Server service kills all the IRFS processes running on CloudCache, due to which the instantly restored VMs and datastores go into an inaccessible state.
Resolution:
  • Either migrate the instantly restored VMs or delete them before restarting the Cache service (you can check for running IRFS processes as shown in the sketch after this list). IRService is restarted as part of the Phoenix Cache Server service restart.
  • If the VMs go into an inaccessible state, delete those VMs by using the delete custom command from the Phoenix Management Console.
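
Before restarting the Phoenix Cache Server service, a quick way to see whether any instantly restored VMs are still being served from CloudCache is to check for IRFS processes (a sketch reusing the debug command listed earlier in this topic; each live instant restore mount has its own PhoenixIRFS process):

    ps -ef | grep PhoenixIRFS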

Issue: After installing the latest RPM manually, you get an error while migrating credentials during a client upgrade or Phoenix service restart.
Resolution: Set the credentials again by using the vCenterDetail set command.
Issue: To validate communication between the Backup proxy, IRAgent, and the CloudCache IRService, a token is generated in cacheconfig and passed in each request triggered by IRAgent. IRService decrypts this token and validates the request based on the cache ID and the time within the token. The request might fail during token decryption or on token expiry.
Resolution: Re-trigger the instant restore or migration job.

Issue: Unable to mount the datastore because the maximum NFS datastore limit has been reached. The following error appears:

    Failed to mount to server 10.x.x.x mount point /mnt/nfs-share/subdir/subdir. NFS has reached the maximum number of supported volumes.

Resolution: This is due to a VMware/ESXi configuration. For more details, refer to the following VMware articles (a command-line sketch follows these links):

  • Maximum supported volumes reached
  • Increasing the default value that defines the maximum number of NFS mounts on an ESXi/ESX host
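
A minimal sketch for raising the limit directly on the ESXi host. The value 64 is only an example; confirm the supported maximum for your ESXi version in the VMware articles above:

    esxcli system settings advanced set -o /NFS/MaxVolumes -i 64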

Set custom export path on CloudCache

Perform the following steps to set the custom export path on Phoenix CloudCache:

  1. Open the /etc/PhoenixCloudCache/PhoenixCloudCache.cfg file.

  2. Set your desired path against the variable IR_CUSTOM_EXPORT_PATH, as shown in the example after these steps.

  3. Save the PhoenixCloudCache.cfg file and trigger the instant restore job.
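
A minimal sketch of the setting in /etc/PhoenixCloudCache/PhoenixCloudCache.cfg; the path shown is only an example, and the exact value format may differ in your configuration file:

    IR_CUSTOM_EXPORT_PATH = "/mnt/data/custom_ir_exports"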

Steps for cleaning the datastore manually

This topic covers the actions that you need to perform while cleaning the datastore manually.

Migration is successful with some errors

The following issues may occur after migrating an instantly restored VM to production:
 

Issue: Deletion of the datastore fails.
Resolution:
  • Delete the respective datastore from the datastore listing page. Make sure you note the datastore summary details, such as the full NFS path and the NFS version, before deletion.
  • Migrate the VM if it is present on the instantly restored datastore.
  • Perform CloudCache cleanup. For more information, see Steps to cleanup the CloudCache machine.

Issue: Cleanup of the CloudCache fails.
Resolution: Perform the CloudCache cleanup manually. For more information, see Steps to cleanup the CloudCache machine.

Deletion of instantly restored VM fails

The deletion of an instantly restored VM can fail in the following scenarios:

  • The instantly restored VM is already migrated to a different datastore.

  • A disk from a different datastore is attached to the instantly restored VM.

Resolution

  • Delete the instantly restored VM from the vCenter if it is not migrated to the production environment.

  • Delete the respective datastore from the datastore listing page.

  • Migrate the VM if it is present on the instantly restored datastore.

Steps to cleanup the CloudCache machine

After the instantly restored VM is deleted or migrated, perform the following steps to remove the respective entries from the CloudCache machine:

Step 1: Remove the specific export path from the /etc/exports file. If the instantly restored datastore exists in the vCenter, you can confirm the path on the datastore summary page.

Step 2: Unmount the export path by killing the IRFS process (a consolidated sketch follows these steps).
  • Search for the process as shown in the following example:
    ps -aux | grep /mnt/data/instantrestore/390/mnt

  • Delete the respective folder as shown in the following example:
    rm -rf /mnt/data/instantrestore/390
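
A consolidated sketch of the cleanup, assuming the internal job id is 390 as in the example above (find the PID from the ps output and substitute it for <pid>):

    grep "/mnt/data/instantrestore/390/mnt" /etc/exports   # confirm the stale export entry, then remove that line
    ps -aux | grep /mnt/data/instantrestore/390/mnt        # note the PhoenixIRFS <pid> serving this mount
    kill <pid>                                             # killing the IRFS process unmounts the export path
    rm -rf /mnt/data/instantrestore/390                    # delete the job folder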

Steps to recover a virtual machine

  1. Spawn the IRFS process manually.
    Search the IRService log file for the fuse command to get the respective command-line parameters for fuse:
    export ROOT_DIR="<path from log>"
    export BMAP_DIR="<path from log>"

    Execute the command from the log directly. After it starts, you can verify the mount as shown in the sketch after this step.
    Sample command:
    nohup /usr/bin/PhoenixIRFS  -f -o allow_other -o ccstoremapstr='{\"1\":\"/mnt/data/PhoenixCacheStore\"}' -o cachekey='MSK3AgSvLg91AQ/SErTniPuyv/+B4AzzOuLPNfktXn9mqIWmcYEZDf/u1VRjfv597x1uPvBGFjl3W7ELsRInBWuDz1eeFtLNu4wj96BJFhLYQisY+cpRMoffXtx9m+rHJ4PTC6BSq4dW+ygOka+cX4eC297IWxBtuM1kU47lKjLiHvAwv4EWSo7BVTT/ek8r' -o cacheid=1 -o storageid=3 -o csetid=1 -o jobid=5 -o cache_disabled=0 -o fips_enabled=0  -o auto_unmount /mnt/data/instantrestore/5/mnt &> /var/log/PhoenixCloudCache/irfs/1/libphoenixfs-5.log &
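
To confirm that the IRFS process spawned and the export path is mounted, a sketch assuming the same job id 5 and log path as in the sample command above:

    ps -aux | grep /mnt/data/instantrestore/5/mnt
    tail /var/log/PhoenixCloudCache/irfs/1/libphoenixfs-5.log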

  2. Perform the following steps depending on the VM power state:
     

    If the VM power state is OFF: No action needs to be taken.

    If the VM power state is ON:
    1. Power OFF the VM from the vSphere console.
    2. If the VM goes into an orphaned or invalid state, remove the VM and add it back to the inventory by performing the following steps:

      1. Check the datastore and VM folder that has the vmx file of the instantly restored VM.

      2. Remove this VM by clicking Actions > Remove from Inventory.

      3. From the datastore view, go to the respective VM folder.

      4. Select the vmx file and click Register VM.

    3. Power ON the VM.

     
