🛡️ Achieve High Availability: A Deep Dive into AWS Route 53 DNS Failover


👋 Hi there! I'm Balaji S, a passionate technologist with a focus on AWS, Linux, DevOps, and Kubernetes.



Uptime is non-negotiable: an outage can mean lost revenue, frustrated customers, and damage to your brand's reputation. AWS Route 53 DNS Failover is a defense against this, routing users to a healthy endpoint when the primary one fails.

​This blog post will walk you through the concept of Route 53 DNS failover, how it works, and a detailed step-by-step guide to setting up a robust Active-Passive failover configuration.

What is Route 53 DNS Failover?

AWS Route 53 DNS Failover is a feature that automatically redirects traffic from an unhealthy primary resource to a healthy secondary (backup) resource by changing the DNS answers it returns. It relies on Route 53 Health Checks, which continuously monitor the health and availability of your endpoints.

Active-Passive vs. Active-Active

​The most common implementation is Active-Passive Failover, which we'll focus on:

Active-Passive: You designate one endpoint as Primary and one as Secondary. Route 53 directs all traffic to the Primary as long as its health check passes. If the Primary fails its health check, Route 53 automatically switches to the Secondary.

Active-Active: Both resources are considered active. Route 53 routes traffic to all healthy resources based on other routing policies (like Weighted or Latency). If one resource becomes unhealthy, traffic is simply routed to the remaining healthy resources.
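The difference between the two modes can be sketched as a toy model in Python. This is only an illustration of the selection logic described above, not Route 53's implementation; the `Endpoint` class and both function names are made up for this example, and the model ignores the case where every endpoint is unhealthy.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Endpoint:
    name: str
    healthy: bool


def active_passive(primary: Endpoint, secondary: Endpoint) -> List[str]:
    """Return the endpoint that receives traffic under Active-Passive."""
    # All traffic goes to the primary while it is healthy, else to the secondary.
    return [primary.name] if primary.healthy else [secondary.name]


def active_active(endpoints: List[Endpoint]) -> List[str]:
    """Return every endpoint eligible for traffic under Active-Active."""
    # Unhealthy endpoints simply drop out of the candidate set.
    return [e.name for e in endpoints if e.healthy]


failed_primary = Endpoint("primary", healthy=False)
backup = Endpoint("secondary", healthy=True)

print(active_passive(failed_primary, backup))   # ['secondary']
print(active_active([failed_primary, backup]))  # ['secondary']
```

With a healthy primary, `active_passive` returns only the primary, while `active_active` returns both endpoints and leaves the split between them to the other routing policy (Weighted, Latency, and so on).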

Prerequisites for Setup

​Before you begin configuring failover, you need:

​A Public Hosted Zone in Route 53 for your domain (e.g., example.com).

Two endpoints, a Primary and a Secondary, in different Regions or Availability Zones (e.g., two Application Load Balancers (ALBs) or two EC2 instances with public IPs).

​⚙️ Step-by-Step Configuration Guide

​The process involves two main stages: creating health checks and creating the failover record set.

Step 1: Create Route 53 Health Checks

The health check is the 'brains' of the operation. It periodically probes your Primary endpoint.

  • Navigate to the Route 53 Console and click on Health checks.

  • Click Create health check.

  • Name: Give it a clear name, e.g., Primary-ALB-HealthCheck.

  • Endpoint: Choose Endpoint and provide the DNS name or IP address of your Primary resource (e.g., your Primary ALB's DNS name).

  • Protocol: Choose the appropriate protocol (HTTP, HTTPS, or TCP). For a web application behind an ALB, HTTP/HTTPS is typical.

  • Advanced configuration: Keep the defaults (30-second request interval, failure threshold of 3) or adjust them based on your failover speed requirements. With the defaults, an outage takes roughly 90 seconds (3 × 30) to be detected.

  • Click Create health check.

Note: You only strictly need a health check for the Primary endpoint in an Active-Passive setup. Route 53 treats a record with no health check as always healthy, so the Secondary is returned whenever the Primary is marked unhealthy. Adding a health check to the Secondary as well is optional.
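If you prefer to script this step, the same settings map onto the `HealthCheckConfig` structure of the Route 53 `CreateHealthCheck` API. The sketch below only builds the request as a plain dictionary so it runs without AWS credentials; the helper names and the host name `primary-alb.example.com` are placeholders invented for this example.

```python
import uuid


def build_health_check_request(fqdn, protocol="HTTPS", port=443, path="/",
                               interval_seconds=30, failure_threshold=3):
    """Build keyword arguments for boto3's route53 create_health_check()."""
    if protocol not in ("HTTP", "HTTPS", "TCP"):
        raise ValueError("protocol must be HTTP, HTTPS, or TCP")
    if interval_seconds not in (10, 30):
        raise ValueError("Route 53 accepts a request interval of 10 or 30 seconds")
    config = {
        "Type": protocol,
        "FullyQualifiedDomainName": fqdn,
        "Port": port,
        "RequestInterval": interval_seconds,
        "FailureThreshold": failure_threshold,
    }
    if protocol != "TCP":
        # A path is only meaningful for HTTP/HTTPS checks.
        config["ResourcePath"] = path
    # CallerReference must be unique per request; a random UUID is one way to get that.
    return {"CallerReference": str(uuid.uuid4()), "HealthCheckConfig": config}


def detection_seconds(interval_seconds=30, failure_threshold=3):
    """Rough time for the check to flip to unhealthy: interval x threshold."""
    return interval_seconds * failure_threshold


request = build_health_check_request("primary-alb.example.com")  # placeholder host
print(request["HealthCheckConfig"]["Type"])  # HTTPS
print(detection_seconds())                   # 90
```

Assuming boto3 is installed and credentials are configured, the request could then be sent with `boto3.client("route53").create_health_check(**request)`. The ID in the response is what the Primary record refers to in Step 2.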

Step 2: Create the Failover Record Set

Now, you will create two records (a Primary and a Secondary) with the same name and type.

  • Navigate to Hosted zones and select your domain's public hosted zone.

  • Click Create record.

  • Fill in the record for the name you want to protect (e.g., www.example.com, or the apex domain example.com):

Record Name: Enter the subdomain (e.g., www), or leave the field blank for the apex domain.

Record Type: Choose A (for IPv4) or AAAA (for IPv6).

Alias: Select Yes.

Route traffic to: Select the Primary resource (e.g., your Primary ALB).

Routing Policy: Choose Failover.

Primary Record Configuration:

  • Failover record type: Select Primary.

  • Set ID: Enter a unique identifier (e.g., Primary-Endpoint-Record).

  • Evaluate Target Health: Select Yes.

  • Health check: Select the health check you created in Step 1 (e.g., Primary-ALB-HealthCheck).

  • Click Create records.

Repeat the process to create the Secondary record. If the console lets you add another record to the same batch, do that; otherwise create a new record:

Record Name: Use the EXACT SAME name as the Primary record.

Record Type, Alias, Routing Policy: Same as above. For Route traffic to, select the Secondary resource.

Secondary Record Configuration:

  • Failover record type: Select Secondary.

  • Set ID: Enter a unique identifier (e.g., Secondary-Endpoint-Record).

  • Evaluate Target Health: Select No (unless you want Route 53 to take the Secondary's health into account as well).

  • Health check: Leave this blank. With no health check attached, the Secondary acts as the last resort.

  • Click Create records.
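For reference, the two records from this step can also be expressed as a change batch for the Route 53 `ChangeResourceRecordSets` API. The following is a minimal sketch that only assembles the payload; the helper names, DNS names, zone IDs, and health check ID are all hypothetical placeholders, not values from a real account.

```python
def failover_alias_record(name, role, set_id, target_dns_name, target_zone_id,
                          health_check_id=None, evaluate_target_health=True,
                          record_type="A"):
    """Build one UPSERT change for a failover alias record."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": record_type,
        "SetIdentifier": set_id,
        "Failover": role,
        "AliasTarget": {
            # This is the load balancer's own hosted zone ID, not your domain's.
            "HostedZoneId": target_zone_id,
            "DNSName": target_dns_name,
            "EvaluateTargetHealth": evaluate_target_health,
        },
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def build_change_batch(primary_change, secondary_change):
    """Combine both changes so they are applied together."""
    return {"Comment": "Active-Passive failover",
            "Changes": [primary_change, secondary_change]}


# All identifiers below are placeholders for illustration only.
primary = failover_alias_record(
    "www.example.com", "PRIMARY", "Primary-Endpoint-Record",
    "primary-alb.example.com", "ZPRIMARYALBZONE",
    health_check_id="hypothetical-health-check-id")
secondary = failover_alias_record(
    "www.example.com", "SECONDARY", "Secondary-Endpoint-Record",
    "secondary-alb.example.com", "ZSECONDARYALBZONE",
    evaluate_target_health=False)
batch = build_change_batch(primary, secondary)
print(len(batch["Changes"]))  # 2
```

Assuming boto3 and valid credentials, the batch could be submitted with `boto3.client("route53").change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`, where `HostedZoneId` is the ID of your domain's public hosted zone.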

​How Route 53 Manages Failover

​Once configured, the flow is simple yet powerful:

Steady State: Route 53's health checkers probe the Primary continuously, independent of DNS queries. When a nameserver receives a query for your domain and the Primary is currently marked Healthy, it answers with the Primary record, which resolves to the Primary resource's IP address.

Failure Event: The health check determines that the Primary endpoint has failed (e.g., the web server is down or returns an error code).

Failover: Route 53's nameservers detect the Primary is Unhealthy and automatically begin returning the Secondary resource's IP address for all incoming DNS queries.

​Failback (Automatic): When the Primary endpoint recovers and starts passing its health check again, Route 53 will automatically update the DNS response to point back to the Primary resource, returning to the steady state.
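The four phases above can be traced with a small simulation. It is a deliberately simplified model: it assumes a single health checker and applies the same failure threshold in both directions, whereas Route 53 aggregates results from many checkers. The function name `simulate` is invented for this example.

```python
def simulate(probe_results, failure_threshold=3):
    """Return which record is served after each health-check probe.

    probe_results is a sequence of booleans (True = the probe succeeded).
    The status only flips after `failure_threshold` consecutive probes
    disagree with the current status.
    """
    healthy = True
    streak = 0  # consecutive probes that contradict the current status
    served = []
    for ok in probe_results:
        if ok == healthy:
            streak = 0
        else:
            streak += 1
            if streak >= failure_threshold:
                healthy = ok
                streak = 0
        served.append("PRIMARY" if healthy else "SECONDARY")
    return served


# One good probe, four failures (outage), then three good probes (recovery).
timeline = simulate([True, False, False, False, False, True, True, True])
print(timeline)
# ['PRIMARY', 'PRIMARY', 'PRIMARY', 'SECONDARY', 'SECONDARY',
#  'SECONDARY', 'SECONDARY', 'PRIMARY']
```

The Primary keeps being served during the first two failed probes, which is why the interval and threshold settings from Step 1 directly affect how quickly failover and failback happen.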

​🔑 Key Takeaways

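TTL Matters: Resolvers cache DNS answers for the record's TTL (Time-To-Live), so clients can keep reaching a failed Primary until that cache expires. For non-alias records, use a low TTL (e.g., 60 seconds). Alias records, as used in this guide, do not let you set a TTL; Route 53 uses the TTL of the alias target, which is 60 seconds for Elastic Load Balancing load balancers.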

Health Checks are Crucial: The accuracy and configuration of your Route 53 Health Check directly determine the reliability and speed of your failover.

Test Your Failover: Always test your failover configuration by intentionally stopping or blocking access to your Primary endpoint to confirm traffic successfully shifts to the Secondary.
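One way to observe a failover test is to poll DNS while the Primary is down and note when the answer changes. The sketch below uses only the Python standard library; the function names are made up for this example, and the timing estimate is a rough figure (detection time plus TTL) rather than a guarantee. Keep in mind that load balancer IP addresses can change for reasons unrelated to failover, and that your local resolver may cache answers, so treat a change as a hint and compare it against the addresses of your Secondary.

```python
import socket
import time


def rough_failover_seconds(interval_seconds=30, failure_threshold=3, ttl_seconds=60):
    """Rough estimate: time to detect the failure plus time for caches to expire."""
    return interval_seconds * failure_threshold + ttl_seconds


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def watch_for_change(hostname, resolve=resolve_ipv4, poll_seconds=5,
                     max_polls=60, sleep=time.sleep):
    """Poll DNS until the answer differs from the first one seen.

    Returns (old_answer, new_answer, polls_taken), or None if nothing changed.
    """
    baseline = resolve(hostname)
    for poll in range(1, max_polls + 1):
        sleep(poll_seconds)
        current = resolve(hostname)
        if current != baseline:
            return baseline, current, poll
    return None


print(rough_failover_seconds())  # 150
```

During a test you might run `watch_for_change("www.example.com")` right after blocking the Primary; with the defaults it polls every 5 seconds for up to 5 minutes, which comfortably covers the rough 150-second estimate.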

​Implementing Route 53 DNS Failover is an essential step toward achieving a truly highly available and fault-tolerant architecture on AWS.