This article describes the best practices that must be followed while using Phoenix Disaster Recovery as a Service (DRaaS).
Phoenix AWS Proxy
The Phoenix AWS proxy, also referred to as the DR proxy, is an EC2 instance that runs in the customer’s AWS account. The Phoenix AWS proxy runs the Phoenix Disaster Recovery service and is responsible for orchestrating DR restore, DR failback, and DR failover. The DR proxy is deployed using the AWS CloudFormation template. The DR proxy deployment takes less than 10 minutes.
- Druva recommends that you deploy at least two DR proxies in separate availability zones for high availability.
Note: Each DR proxy can run three DR restore jobs concurrently.
- The recommended EC2 instance size for the Phoenix AWS proxy is c5.2xlarge.
|Instance type||vCPU||Memory (GiB)||Instance storage (GiB)||Network bandwidth (Gbps)||EBS bandwidth (Mbps)|
|c5.2xlarge||8||16||EBS-only||Up to 10||Up to 4,750|
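The sizing guidance above (three concurrent DR restore jobs per proxy, and at least two proxies for high availability) can be turned into a quick capacity estimate. The following is a minimal sketch only; the function name `proxies_needed` is hypothetical and is not part of any Druva tool.

```python
import math

# From the note above: each DR proxy runs three DR restore jobs concurrently.
JOBS_PER_PROXY = 3
# From the recommendation above: at least two DR proxies for high availability.
MIN_PROXIES = 2


def proxies_needed(concurrent_restore_jobs: int) -> int:
    """Estimate how many DR proxies cover the given number of concurrent DR restore jobs."""
    if concurrent_restore_jobs < 0:
        raise ValueError("concurrent_restore_jobs must be non-negative")
    # Round up so that a partially used proxy still counts as a whole proxy.
    by_capacity = math.ceil(concurrent_restore_jobs / JOBS_PER_PROXY)
    # Never go below the high-availability minimum.
    return max(MIN_PROXIES, by_capacity)
```

For example, seven concurrent DR restore jobs need three proxies by this estimate, while one to six jobs are covered by the recommended minimum of two.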
The DR proxy must have access to the following services:
- EC2-API, and
The Druva CloudFormation template creates endpoints that provide connectivity to these services over the AWS private network.
- Ensure that the EC2 key pair assigned to the Phoenix AWS Proxy is stored in a secure location. The key pair is used to access the Phoenix AWS Proxy for troubleshooting only.
While defining network mappings in a DR plan, you must map the vCenter source network to a VPC and subnet in the target AWS account.
- If you create a new Amazon VPC, you don’t need to attach an Internet Gateway (IGW) to it, because the Phoenix AWS Proxy uses AWS PrivateLink for all communication.
- Ensure that DNS hostnames and DNS resolution are enabled within the VPC.
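The two VPC DNS attributes can be checked programmatically. The sketch below assumes you pass in an EC2 client object such as the one returned by `boto3.client("ec2")`; the function name `vpc_dns_ready` is hypothetical. It relies only on the EC2 `describe_vpc_attribute` call, which returns one attribute per request.

```python
def vpc_dns_ready(ec2_client, vpc_id: str) -> bool:
    """Return True when both DNS resolution and DNS hostnames are enabled on the VPC.

    ec2_client is assumed to expose describe_vpc_attribute(), as a boto3 EC2 client does.
    """
    # DNS resolution corresponds to the enableDnsSupport attribute.
    support = ec2_client.describe_vpc_attribute(
        VpcId=vpc_id, Attribute="enableDnsSupport"
    )
    # DNS hostnames correspond to the enableDnsHostnames attribute.
    hostnames = ec2_client.describe_vpc_attribute(
        VpcId=vpc_id, Attribute="enableDnsHostnames"
    )
    return bool(
        support["EnableDnsSupport"]["Value"]
        and hostnames["EnableDnsHostnames"]["Value"]
    )
```

With boto3 installed and AWS credentials configured, a call might look like `vpc_dns_ready(boto3.client("ec2"), "vpc-0123456789abcdef0")`, where the VPC ID is a placeholder.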
DR prerequisite checks
DR prerequisite checks run while the VM backup is in progress and ensure that the VM meets all the DR failover and failback requirements. All the prerequisite checks must pass for a DR failback or failover to succeed.
When the DR prerequisite checks do not execute
The DR prerequisite checks may not execute at all for one or more of the following reasons:
- The VMware Backup proxy is unable to communicate with the ESX host on port 443. Enable communication between the backup proxy and the ESX host on port 443.
- The VMware Backup proxy is on a version older than 4.8.11. Upgrade the VMware backup proxy to the latest version, and ensure that the first VM backup after the proxy upgrade is successful.
- The VM cannot connect to the Druva download portal at https://downloads.druva.com/phoenix/ to download the prerequisite check executables while the VM backup is in progress. In this case, ensure that:
The URL https://downloads.druva.com/phoenix/ or *.druva.com is allowed through the network firewall.
If the VM in question is a Windows VM, disable UAC on the VM.
Exclude the DR prerequisite check executables from any antivirus software running on the VM. The following table lists the DR prerequisite check executables that must be excluded depending upon the VM operating system.
|Operating system||Prerequisite check executable|
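Two of the causes above are plain connectivity problems: the backup proxy reaching the ESX host on port 443, and the VM reaching the Druva download portal. A simple TCP probe can help narrow these down. This is a minimal sketch using only the Python standard library; the function name `can_connect` is hypothetical, and a successful TCP connection does not prove that a firewall allows the full HTTPS exchange.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run `can_connect("<your ESX host>", 443)` from the VMware backup proxy, and `can_connect("downloads.druva.com", 443)` from the VM, to test each path from the machine that actually uses it.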
Resolving prerequisite check errors
If the prerequisite checks fail or pass with warnings, resolve the errors or warnings before re-running the backup job.
- Credentials: Ensure that credentials are assigned to the VMs for which you want to perform disaster recovery. If credentials are not assigned to the virtual machines or are invalid, Phoenix will not perform prerequisite checks. The user account must have the following privileges:
Windows virtual machines
- The account must have local administrative privileges.
- UAC must be disabled on the virtual machine. See disabling UAC on Windows server for more information.
Linux virtual machines
- The account must have sudo privileges.
- Virtual Machines
- The VM must be running for the prerequisite check to work.
- The VM must have VMware tools installed on it.
- The VM must have at least 1 GB of free space on the boot partition.
- Ensure that all Druva processes are whitelisted in any antivirus software running on the virtual machine.
The following 14 Windows files must be whitelisted:
The following Linux files must be whitelisted (the /opt/druva files are installed by Druva as part of the DR failover operation):
- Pending reboots and check disks
The DR prerequisite checks detect pending Windows updates and scheduled chkdsk operations, and generate warnings. Resolve these warnings as soon as possible. Phoenix triggers a DR replication copy (also referred to as a DR copy) according to the DR plan's replication frequency, irrespective of the warnings. If the prerequisite checks detected pending Windows updates, the DR copy job fails immediately with the error DR24577. If the prerequisite checks detected pending chkdsk operations, the DR copy job completes. However, DR failovers from this copy may fail if the timeout value is reached while Windows chkdsk runs after the VM reboots.
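The behavior described above can be summarized as a small decision function. This is purely an illustrative sketch of the rules stated in this section; the function name and the returned strings are hypothetical and do not correspond to any Phoenix API or log output.

```python
def expected_dr_copy_outcome(pending_windows_updates: bool, pending_chkdsk: bool) -> str:
    """Summarize what this article says happens to the next DR copy job."""
    if pending_windows_updates:
        # Pending Windows updates cause the DR copy job to fail immediately.
        return "DR copy fails with DR24577"
    if pending_chkdsk:
        # The copy completes, but a later failover may hit the timeout during chkdsk.
        return "DR copy completes; failover from this copy may time out during chkdsk"
    return "DR copy completes"
```

Note that pending Windows updates take precedence: if both conditions are present, the DR copy job fails before the chkdsk concern becomes relevant.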
Add virtual machines to DR plan
A DR plan includes a group of virtual machines, the DR restore frequency, and all the disaster recovery settings that help you perform a single-click failover.
- When a VM is added to a DR plan, Phoenix automatically assigns a few default failover settings. The default settings are:
- instance_type = t2.micro
- public_ip = None
- private_ip = Auto Assign
These settings can be used to spin up the VM from the DR copy in case of a failover. You can update these settings based on source VM configuration for optimum failover times.
- While configuring failover settings for VMs added to the DR plan, ensure that the instance type is not smaller than the virtual machine you are trying to fail over. You can also use the auto-suggest instance type feature to let Phoenix choose the appropriate instance type.
- Ensure that the Recovery Point Actual (RPA) does not exceed the backup frequency duration. RPA is the time elapsed since the last successful VM snapshot that is available for failover. For more information, see Managing Recovery Point Actual.
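The RPA guidance above reduces to a time comparison: the time elapsed since the last successful VM snapshot available for failover should not exceed the backup frequency. The sketch below illustrates that comparison with hypothetical function names; in practice the snapshot timestamp would come from the Phoenix console, which is not modeled here.

```python
from datetime import datetime, timedelta, timezone


def recovery_point_actual(last_snapshot: datetime, now: datetime) -> timedelta:
    """RPA: time elapsed since the last successful snapshot available for failover."""
    return now - last_snapshot


def rpa_within_backup_frequency(
    last_snapshot: datetime, backup_frequency: timedelta, now: datetime
) -> bool:
    """Return True when the RPA does not exceed the backup frequency duration."""
    return recovery_point_actual(last_snapshot, now) <= backup_frequency


if __name__ == "__main__":
    # Example values only: a daily backup whose last snapshot is 30 hours old.
    now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    last = now - timedelta(hours=30)
    print(rpa_within_backup_frequency(last, timedelta(hours=24), now))
```

In the example values, the RPA of 30 hours exceeds the 24-hour backup frequency, so the check reports that the DR copy is falling behind.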
DR restore (also referred to as DR copy) is the process where the Phoenix AWS proxy reads the VM backup data from Druva cloud, replicates it to an EBS volume in the customer's AWS account, and creates an EBS snapshot of the EBS volume. The frequency with which the data is replicated is defined in the DR plan.
- Ensure that the retention period for backups of large virtual machines is longer than the time it can take to create the first full DR copy, that is, transfer the VM backup data from Druva cloud to the customer AWS account. The first DR restore can take longer. Subsequent incremental DR restores are faster.
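To judge whether a retention period is long enough, you can roughly estimate how long the first full DR copy might take. The sketch below assumes a sustained average throughput that you supply yourself; this article does not state a throughput figure, and the `safety_factor` parameter is likewise an assumption added for headroom. Both function names are hypothetical.

```python
def first_copy_hours(backup_size_gib: float, throughput_mbps: float) -> float:
    """Estimate hours needed to transfer backup_size_gib at throughput_mbps (megabits/s)."""
    if throughput_mbps <= 0:
        raise ValueError("throughput_mbps must be positive")
    size_bits = backup_size_gib * (2 ** 30) * 8       # GiB -> bits
    seconds = size_bits / (throughput_mbps * 10 ** 6)  # Mbps -> bits per second
    return seconds / 3600


def retention_covers_first_copy(
    retention_days: float,
    backup_size_gib: float,
    throughput_mbps: float,
    safety_factor: float = 2.0,  # assumed headroom, not a Druva recommendation
) -> bool:
    """Return True when the retention period exceeds the padded first-copy estimate."""
    needed_hours = first_copy_hours(backup_size_gib, throughput_mbps) * safety_factor
    return retention_days * 24 > needed_hours
```

For instance, 1,000 GiB at an assumed 500 Mbps works out to a little under five hours, so a one-day retention period would pass this check, whereas 10,000 GiB at the same rate would not.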
Failover is the process where the DR proxy creates an EC2 instance in the customer’s AWS account, creates an EBS volume from the EBS snapshot, attaches it to the EC2 instance, and finally spins up the instance after redirecting the network traffic to the IP addresses of the EC2 servers. A Linux VM failover takes between 15 and 30 minutes on average, while a Windows VM failover takes between 45 and 75 minutes. A failover can complete within this time provided the EC2 instance type that is spawned from the EBS snapshot is the same type and size as the source virtual machine.
Druva recommends using the Test Failover option to periodically test VM failovers. You specify the production and test failover settings while creating the DR plan. As part of Test Failover Settings, you specify the instance type, the IAM role, Volume Type and Instance Tags. You can also use the same failover settings as used in Production.
On the Disaster Recovery page, select the DR Plan. On the Overview Page, click Failover > Test Failover. For more information, see Manage disaster recovery failover.
When you initiate a DR failback, the VMware backup proxy creates a target VM in the on-premises infrastructure. This target VM connects to the failed-over EC2 instance and copies the data onto itself. Phoenix then boots up this VM.
- Ensure that the target virtual machine in your on-premises environment to which you will fail back has connectivity to the EC2 instance.
- Ensure that the target virtual machine in your on-premises environment used for failback is reachable from the VMware backup proxy.
- Ensure that the following ports are open on the target virtual machine:
- Linux: Port 22 for SSH
- Windows: Ports 445 (used for preflight checks and control messaging) and 50000 (used for the actual data transfer in the failback operation).
Note: You must manually enable the SMB port for communication. See DR8263.
- Ensure that the administrative shares of the source EC2 instance are reachable before attempting a failback. For more information, see error DR8263 and its resolution.
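The port requirements above can be verified before you start a failback. The following sketch probes the listed ports on the target virtual machine from a host that needs to reach it, for example the VMware backup proxy. The names `FAILBACK_PORTS`, `tcp_port_open`, and `closed_failback_ports` are hypothetical, and a successful TCP connection only shows that the port is reachable, not that SMB or SSH authentication will succeed.

```python
import socket
from typing import Callable, Dict, List

# Ports taken from the failback requirements listed above.
FAILBACK_PORTS: Dict[str, List[int]] = {
    "linux": [22],            # SSH
    "windows": [445, 50000],  # 445: preflight checks and control; 50000: data transfer
}


def tcp_port_open(host: str, port: int, timeout: float) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def closed_failback_ports(
    host: str,
    os_name: str,
    timeout: float = 5.0,
    probe: Callable[[str, int, float], bool] = tcp_port_open,
) -> List[int]:
    """Return the required failback ports on host that could not be reached."""
    try:
        ports = FAILBACK_PORTS[os_name.lower()]
    except KeyError:
        raise ValueError(f"unsupported operating system: {os_name!r}") from None
    return [port for port in ports if not probe(host, port, timeout)]
```

An empty list means every required port answered. The `probe` parameter exists so that the check can be exercised without a network, as a design convenience rather than a requirement.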
Billable AWS services
The following AWS services are deployed in your AWS account during the Phoenix AWS proxy deployment and are billable.
- The Amazon EC2 instance type (c5.2xlarge - recommended) used for the Phoenix AWS proxy.
The following AWS VPC endpoints that are configured as part of proxy deployment:
Druva Backup Service Endpoint
Druva Node Service Endpoint