SQS is unreachable: phase-1 has failed
This article applies to:
- Product edition: Phoenix
Problem description
DR failover fails with the DR4098 error code, with no SQS message in phase-1.
The SQS message is found in Phoenix-SQS-xxx.log.
Cause
Phoenix failover settings such as the subnet, VPC, or IP address are incorrect.
Traceback
[2019-04-29 17:38:08,515] [ERROR] Max attempts reached while waiting for SQS events, conversion_id = 281_2029_553602. Exiting
[2019-04-29 17:38:08,516] [INFO] Updated orch_status_info = {'dr_failover_state': 3,
'sqs_queue_name': 'phoenix_281_2029',
'sqs_url': 'https://queue.amazonaws.com/48604719...oenix_281_2029',
'status': 'started',
'steps': [{553602: {'ami_progress': 0,
'error_code': 4295692290,
'error_msg': 'AWS SQS is unreachable from failover instance',
'instance_id': 'i-0b21073a51f69b5dc',
'phase1_progress': 100,
'phase2_progress': 0,
'post_phase2_progress': 0,
'pre_phase1_progress': 100,
'private_ip': '172.24.21.56',
'restore_job_id': 1117,
'rm_conversion_id': '281_2029_553602',
'rp_name': u'Sun Mar 24 06:39:52 2019',
'status': 'failed',
'version': 'ebs',
'vm_failover_state': 99,
'vm_name': u'XYZ',
'volumes_info': [{'device': '/dev/sda1',
'volume_id': 'vol-0c17a30495a0dace1'},
{'device': '/dev/sdf',
'volume_id': 'vol-028e164ca85b6a115'},
{'device': '/dev/sdg',
'volume_id': 'vol-0e480d86ad9129909'}]}}]}
Resolution
If VPC Endpoint for SQS is not configured:
- Note the subnet, security group, and public IP settings chosen for the failover instance.
- In the customer’s AWS account, go to the VPC service.
- Under the Subnets section, enter the subnet-id.
- Under the Route Table tab, check the Target value corresponding to the Destination value 0.0.0.0/0.
If the target is igw-xxxx, then the subnet is a public subnet. For public subnets, the Public-IP settings must be Auto-Assign or <an_elastic_ip>.
If all the above findings are correct, then the issue might be related to RM conversion.
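The route table check above can also be scripted. The following is a minimal sketch, not part of the Phoenix product: the function names `default_route_target`, `is_public_subnet`, and `fetch_routes` are hypothetical, and the sample IDs are made up. It assumes the route entries have the shape returned by the boto3 EC2 `describe_route_tables` call, and that boto3 and credentials for the customer's AWS account are available when `fetch_routes` is used.

```python
def default_route_target(routes):
    """Return the target of the 0.0.0.0/0 route, or None if there is none."""
    for route in routes:
        if route.get("DestinationCidrBlock") == "0.0.0.0/0":
            # Only gateway-style targets are considered in this sketch.
            return route.get("GatewayId") or route.get("NatGatewayId")
    return None


def is_public_subnet(routes):
    """A subnet is treated as public when its default route targets an igw-xxxx."""
    target = default_route_target(routes)
    return bool(target) and target.startswith("igw-")


def fetch_routes(subnet_id, region):
    """Fetch the routes explicitly associated with a subnet (needs AWS credentials).

    Assumption: a subnet with no explicit association uses the VPC's main
    route table, which this sketch does not look up; it returns [] instead.
    """
    import boto3  # imported lazily so the helpers above work without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_route_tables(
        Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}]
    )
    tables = response["RouteTables"]
    return tables[0]["Routes"] if tables else []


# Hypothetical sample data in the describe_route_tables shape.
sample_routes = [
    {"DestinationCidrBlock": "172.24.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0123456789abcdef0"},
]
print(is_public_subnet(sample_routes))
```

If `is_public_subnet` returns `True`, the failover instance still needs the Public-IP setting described above to reach SQS.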
If VPC endpoint for SQS is configured:
- Note the subnet and security group settings chosen for the failover instance.
- Check whether the chosen subnet is among the subnets selected for the SQS endpoint.
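The subnet comparison above can be sketched in code as well. This is an illustrative example only: `sqs_endpoints_covering` and `fetch_endpoints` are hypothetical helper names, and the sample IDs are invented. It assumes the endpoint entries have the shape returned by the boto3 EC2 `describe_vpc_endpoints` call, where SQS interface endpoints carry a `ServiceName` ending in `.sqs`.

```python
def sqs_endpoints_covering(endpoints, subnet_id):
    """Return IDs of SQS VPC endpoints whose subnets include subnet_id."""
    return [
        endpoint["VpcEndpointId"]
        for endpoint in endpoints
        if endpoint.get("ServiceName", "").endswith(".sqs")
        and subnet_id in endpoint.get("SubnetIds", [])
    ]


def fetch_endpoints(vpc_id, region):
    """List the VPC endpoints of a VPC (needs AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_vpc_endpoints(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )
    return response["VpcEndpoints"]


# Hypothetical sample data in the describe_vpc_endpoints shape.
sample_endpoints = [
    {
        "VpcEndpointId": "vpce-0aaa",
        "ServiceName": "com.amazonaws.us-east-1.sqs",
        "SubnetIds": ["subnet-0111", "subnet-0222"],
    },
    {
        "VpcEndpointId": "vpce-0bbb",
        "ServiceName": "com.amazonaws.us-east-1.s3",
        "SubnetIds": [],
    },
]
print(sqs_endpoints_covering(sample_endpoints, "subnet-0111"))
```

An empty result for the failover subnet suggests that the instance cannot reach the SQS endpoint from that subnet.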