
Possible causes of the CloudCache decommissioning process getting stuck

This article applies to:

  • Product edition: Phoenix

Problem description

The Cloud Cache decommissioning process gets stuck.

Causes and Resolution

This article describes the different scenarios that can cause this issue. The stages of decommissioning are as follows (a sketch of this workflow follows the list):

  1. Backup sets are unmapped from CloudCache.
  2. Backups and restores to and from the Phoenix CloudCache stop.
  3. Phoenix CloudCache waits for the next scheduled synchronization operation to flush the unsynced data from CloudCache to Phoenix Cloud.
  4. Phoenix removes the data blocks from Cache Store.
  5. Phoenix removes the Cache Entries from Phoenix UI and database.
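
The following is a minimal Python sketch of this staged workflow. The function names and the dictionary-based state are hypothetical illustrations of the documented order of operations; they are not part of the actual Phoenix code.

# Hypothetical sketch of the decommissioning stages listed above.
def unmap_backup_sets(cache):
    cache["mapped_backup_sets"] = []        # stage 1: backup sets unmapped from CloudCache

def stop_backup_and_restore_jobs(cache):
    cache["accepting_jobs"] = False         # stage 2: backups/restores via the CloudCache stop

def flush_unsynced_data(cache):
    # stage 3: runs during the next scheduled sync; pending data goes to Phoenix Cloud
    cache["unsynced_blocks"] = []

def remove_cache_store_blocks(cache):
    cache["cache_store_blocks"] = []        # stage 4: data blocks removed from the Cache Store

def remove_cache_entries(cache):
    cache["registered"] = False             # stage 5: entries removed from the UI and database

def decommission(cache):
    unmap_backup_sets(cache)
    stop_backup_and_restore_jobs(cache)
    flush_unsynced_data(cache)
    remove_cache_store_blocks(cache)
    remove_cache_entries(cache)

# Example state for illustration only:
decommission({"mapped_backup_sets": ["set-01"], "accepting_jobs": True,
              "unsynced_blocks": ["blk-01"], "cache_store_blocks": ["blk-01"],
              "registered": True})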

Scenario-1: When the Cloud Cache is disconnected

Ensure that the Phoenix CloudCache remains connected to the Phoenix Cloud until the entire decommissioning process, including removal from the UI, is complete. Do not disconnect the CloudCache while decommissioning is in progress.

Traceback

[2019-03-11 14:47:52,650] [ERROR] [wpid 91-4228-1548748451] Failed to connect, error:Failed to connect to client. (#100000011)
[2019-03-11 14:47:52,650] [INFO] [wpid 91-4228-1548748451] CacheFlush activity disconnected. wid 0
[2019-03-11 14:47:52,650] [INFO] Cache CacheFlush activity disconnected. Bytes read 0.00 B. Bytes written 0.00 B.
[2019-03-11 14:47:55,650] [ERROR] [wpid 226-940-1481039643] Error <class 'inSyncLib.inSyncError.SyncError'>:Failed to connect to client. (#100000011). Traceback -Traceback (most recent call last):
  File "roboCacheWorker.py", line 284, in runserver
  File "inSyncLib\inSyncRPCServer.pyc", line 557, in serve
SyncError: Failed to connect to client. (#100000011)
Snip

Resolution

Ensure that the Phoenix CloudCache is connected to the Phoenix Cloud. Fix any connectivity issues to resume the decommissioning.

The Cache status remains stuck on Decommissioning in Progress if the CloudCache is disconnected.
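
As a quick way to verify connectivity from the CloudCache server, a simple TCP reachability check can be run. The host name and port below are placeholders, not actual Druva endpoints; substitute the Phoenix Cloud endpoint configured for your environment.

# Quick TCP reachability check from the CloudCache server toward Phoenix Cloud.
import socket

def can_reach(host, port, timeout=5.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as err:
        print("Connection to %s:%s failed: %s" % (host, port, err))
        return False

# Placeholder endpoint, not an actual Druva address.
print("reachable:", can_reach("phoenix-cloud.example.com", 443))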

Scenario-2: Allocated network bandwidth and sync schedule are insufficient

According to the decommissioning workflow, Phoenix syncs the pending data to Phoenix Cloud in the next cache schedule cycle. Since this takes time, how long CloudCache displays Decommissioning in Progress depends on:

  • The size of the data to be synchronized
  • The available network bandwidth
  • The duration specified in the sync schedule

Configure the CloudCache synchronization schedule for 24 hours a day, 7 days a week for an uninterrupted decommission. Ensure that you select the maximum available bandwidth in your environment (bandwidth is measured in Megabits per second).
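
To set expectations for how long the status may remain at Decommissioning in Progress, a rough flush-time estimate can be computed from the pending data size and the allocated bandwidth. The figures below are illustrative only.

# Rough estimate of how long the pending (unsynced) data will take to flush,
# given the allocated bandwidth in Megabits per second.
def estimated_flush_hours(pending_gb, bandwidth_mbps):
    pending_megabits = pending_gb * 1000 * 8   # GB -> Megabits (decimal units)
    return pending_megabits / bandwidth_mbps / 3600

# Example: 500 GB of unsynced data over a 100 Mbps allocation
print("%.1f hours" % estimated_flush_hours(500, 100))   # ~11.1 hours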

Scenario-3: Phoenix CacheStore is unavailable

The decommission can get stuck if the decommission process has been initiated and all the data has been synced to the cloud, but the PhoenixCacheStore is unavailable. This can occur when:

  • The volume on which the CacheStore resides has been formatted.
  • The disk on which the CacheStore resides has crashed.

Traceback

[2019-03-11 15:47:56,732] [ERROR] Error <type 'exceptions.Exception'>:CRITICAL: Cache store folder E:\PhoenixCacheStore does not exist on file system. Exiting. Traceback -Traceback (most recent call last):
  File "roboCacheServer.py", line 236, in server_main
  File "roboCacheServer.py", line 389, in _server_main
Exception: CRITICAL: Cache store folder E:\PhoenixCacheStore does not exist on file system. Exiting.

Resolution

Do not format the volume on which the Cache Store resides until the decommissioning process is complete. Contact Druva Support to troubleshoot this scenario further.
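
A quick way to confirm whether the Cache Store folder is still present is a simple existence check on the CloudCache server. The E:\PhoenixCacheStore path is taken from the traceback above; adjust it to the path configured in your environment.

# Check that the Cache Store folder still exists on the file system.
import os

cache_store = r"E:\PhoenixCacheStore"
if os.path.isdir(cache_store):
    print("Cache store folder %s is present." % cache_store)
else:
    print("CRITICAL: Cache store folder %s does not exist on the file system." % cache_store)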

Scenario-4: Mapped storage does not exist

CloudCache decommissioning can get stuck when the process is initiated but the storage mapped to the CloudCache has been deleted. In this case, unflushed data accumulates in the database and is never synced.

Resolution

Contact Druva Support to troubleshoot this scenario.