Scalable Cloud Infrastructure Modernization Using Amazon EKS, EC2 Databases, and AWS DevOps

2025-12-09
Cloud Modernisation

Case Study: Scalable Cloud Infrastructure Modernization Using Amazon EKS, EC2 Databases, and AWS DevOps 

Customer Background 

The customer is a fast-growing technology solutions provider offering CRM-based digital services, data-driven applications, and workflow automation for multiple industries. Their platform delivers critical operational capabilities such as user management, transactional processing, communication logs, and real-time service tracking. 

With expanding customer demand, the organization required a modern, scalable, and secure cloud environment capable of supporting large-scale workloads and high user concurrency. They aimed to move towards containerized microservices, improve application reliability, and establish a strong DevOps foundation to support continuous delivery. 

Customer Challenge 

The existing legacy on-premises hosting setup lacked scalability, struggled under peak load, and depended on manual deployments. Internal services were tightly coupled, database performance was unreliable, and there was no centralized monitoring. The CRM application's backend services (authentication, caching, messaging, and business logic) had to be deployed reliably with secure internal connectivity.

The customer also needed secure VPN access to its private database servers, which the legacy environment did not provide and which made administration difficult, along with automated backups and centralized monitoring for production stability.

The customer sought a modern cloud architecture that was containerized, scalable, secure, and fully observable. 

A major incident also occurred in production: the entire environment went down, including all nodes and pods in the Kubernetes cluster. The investigation found that incorrect IAM roles had been attached to multiple resources, causing widespread permission failures and service crashes. Immediate remediation was required to restore stability.

The customer needed a robust, fault-tolerant AWS environment with better security isolation, autoscaling capabilities, and end-to-end monitoring. 

Assessment 

Ancrew Global conducted a comprehensive assessment of their CRM workload, database architecture, internal networking, high availability requirements, and CI/CD processes. 

After evaluating multiple approaches, the team selected an architecture built on Amazon EKS, EC2-based database clusters, and AWS DevOps tools. This combination offered scalability, flexibility, secure internal routing, and automation aligned with the customer’s operational goals.

A fully automated infrastructure-as-code approach using Terraform, combined with AWS native services, was selected to simplify long-term management and accelerate deployments. 

Business Objectives 

  • Build a scalable, secure, and highly available AWS environment for CRM workloads. 
  • Containerize workloads and deploy them on Amazon EKS with internal-only service exposure. 
  • Implement secure EC2-based MongoDB and MySQL clusters with replication and backups. 
  • Enable remote administrators to access internal services securely using VPN. 
  • Establish CI/CD pipelines for automated builds and deployments. 
  • Implement strong observability using CloudWatch, Grafana, and Prometheus. 
  • Ensure strict IAM governance and security compliance to prevent outages. 

Proposed Solution 

Ancrew Global designed and deployed a fully automated, production-ready cloud environment using Amazon EKS, EC2, VPC networking, and AWS DevOps services. 

Key Features Delivered 

  • Fully configured VPC with public and private subnets across multiple AZs. 
  • MongoDB and MySQL replication clusters deployed on EC2 private subnets. 
  • Pritunl VPN deployed for secure internal access to DBs and services. 
  • Automated encrypted backups for MongoDB and MySQL stored in Amazon S3 with lifecycle policies. 
  • Dynamic autoscaling using Karpenter for efficient node provisioning. 
  • Microservices deployment (Auth, Cache, Core, Ledger, Assignment, Email) using Kubernetes manifest files. 
  • Application Load Balancer with path-based routing across services. 
  • Internal ALB for Email Service and Lambda integration. 
  • Monitoring stack using CloudWatch, Prometheus, and Grafana. 
  • AWS WAF for enhanced application security. 
  • CI/CD pipeline for the Email Service using CodePipeline & CodeBuild. 
  • Full Terraform automation for EKS and infrastructure setup. 
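
The path-based routing listed above can be illustrated with a small sketch. ALB listener rules are evaluated in priority order, lowest value first, with a default action when nothing matches. The service names below come from this case study, but the path patterns, priorities, and default target are hypothetical; the actual listener rules are not described here.

```python
from fnmatch import fnmatchcase

# Hypothetical listener rules: (priority, path pattern, target service).
# Service names are from the case study; paths and priorities are invented.
RULES = [
    (10, "/auth/*", "auth"),
    (20, "/ledger/*", "ledger"),
    (30, "/assignment/*", "assignment"),
    (40, "/core/*", "core"),
]
DEFAULT_TARGET = "core"  # assumed default action when no rule matches


def route(path, rules=RULES, default=DEFAULT_TARGET):
    """Return the target service for a request path.

    Rules are checked in ascending priority order and the first
    matching pattern wins; unmatched paths fall through to the default.
    """
    for _priority, pattern, target in sorted(rules):
        if fnmatchcase(path, pattern):  # case-sensitive wildcard match
            return target
    return default


if __name__ == "__main__":
    for p in ("/auth/login", "/ledger/entries/42", "/healthz"):
        print(p, "->", route(p))
```

Keeping the rules as data makes it easy to review which path reaches which service before the equivalent listener rules are written in Terraform.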

Production Escalation & Resolution (Major Incident)

During one critical production cycle, the entire environment went down unexpectedly—every EKS node, pod, and application service stopped functioning. All customer-facing APIs and internal workflows were impacted. 

The Ancrew Global team immediately began an emergency investigation, checking cluster logs, node health, pod restarts, networking, and load balancer behavior. A step-by-step analysis identified the root cause:

Root Cause: Misconfigured IAM Roles 

Multiple AWS resources, including the EKS cluster, worker nodes, Lambda functions, and application services, were assigned incorrect or conflicting IAM roles, causing permission failures that cascaded across the entire environment.

Resolution 

  • All IAM roles and policies were audited and corrected. 
  • Least-privilege access was configured for each resource.
  • New IAM governance and approval workflows were implemented. 
  • Additional monitoring alerts were added for IAM permission failures. 

The environment was restored successfully, with improved security and stability. 
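
An IAM audit of the kind described in the resolution steps can be partly automated. The sketch below is an illustrative heuristic, not the tooling used in this engagement: it scans an IAM policy document for Allow statements with wildcard actions or resources. The function names and the example policy are hypothetical, and some AWS actions legitimately require a "*" resource, so findings are prompts for review rather than errors.

```python
def _as_list(value):
    """IAM accepts a single string or a list for Action and Resource."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def find_broad_statements(policy):
    """Return (index, reason) pairs for Allow statements that use wildcards."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement object is also valid
        statements = [statements]

    findings = []
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append((i, "wildcard action"))
        if "*" in resources:
            findings.append((i, "wildcard resource"))
    return findings


# Hypothetical policy, for illustration only.
EXAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-backup-bucket/*",
        },
    ],
}

if __name__ == "__main__":
    for index, reason in find_broad_statements(EXAMPLE_POLICY):
        print(f"statement {index}: {reason}")
```

A check like this could run in a CI stage before Terraform applies a role change, which fits the approval workflow mentioned above.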

Architecture Components Used 

AWS Services 

  • Amazon EKS 
  • Amazon EC2 
  • Amazon S3 
  • Application Load Balancer (ALB) 
  • Amazon DynamoDB (internal metadata) 
  • AWS IAM 
  • Amazon VPC (subnets, NAT gateway, internet gateway)
  • CloudWatch Logs & Metrics 
  • AWS WAF 
  • Amazon API Gateway (Email Service integration) 
  • AWS CodePipeline & CodeBuild 
  • AWS Lambda (Email integration)

Design Factors 

Scalability 

Dynamic workload scaling using Karpenter for node provisioning and the Kubernetes Horizontal Pod Autoscaler (HPA) for pods.
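
For context, the HPA derives its target replica count as ceil(currentReplicas × currentMetric / targetMetric) and skips scaling when the ratio is within a tolerance (0.1 by default). The sketch below reproduces that arithmetic; the minimum and maximum replica bounds and the sample metric values are assumptions, not the customer's settings.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation for a single metric.

    min_replicas and max_replicas are illustrative defaults, not values
    taken from the case study.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target
    print(desired_replicas(4, 90, 60))
```

When the HPA requests more pods than the current nodes can hold, the pending pods are what prompt Karpenter to provision additional nodes.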

Security 

Private subnets, VPN-only DB access, WAF, IAM least privilege. 
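
One way to verify VPN-only database access is to check ingress rules against the VPN address range. The sketch below is a simplified illustration: the rule format, the VPN CIDR, and the function name are hypothetical, only IPv4 CIDR sources are handled, and real security groups may reference other security groups instead of CIDRs. Ports 3306 and 27017 are the MySQL and MongoDB defaults.

```python
import ipaddress

DB_PORTS = {3306: "MySQL", 27017: "MongoDB"}  # default database ports


def non_vpn_db_rules(ingress_rules, vpn_cidr):
    """Return rules that open a database port to sources outside the VPN range.

    Each rule is a dict with 'cidr', 'from_port', and 'to_port' keys
    (a simplified, hypothetical shape; IPv4 only).
    """
    vpn = ipaddress.ip_network(vpn_cidr)
    findings = []
    for rule in ingress_rules:
        source = ipaddress.ip_network(rule["cidr"])
        exposes_db = any(
            rule["from_port"] <= port <= rule["to_port"] for port in DB_PORTS
        )
        if exposes_db and not source.subnet_of(vpn):
            findings.append(rule)
    return findings


if __name__ == "__main__":
    rules = [
        {"cidr": "10.8.0.0/24", "from_port": 3306, "to_port": 3306},
        {"cidr": "0.0.0.0/0", "from_port": 27017, "to_port": 27017},
    ]
    for rule in non_vpn_db_rules(rules, "10.8.0.0/24"):
        print("open to non-VPN source:", rule)
```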

High Availability 

Multi-AZ deployment, replicated databases, resilient node provisioning. 
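
A multi-AZ layout with public and private subnets can be planned by carving the VPC range into equal blocks, one public and one private per Availability Zone. The sketch below assumes a 10.0.0.0/16 VPC, /20 subnets, and placeholder AZ names; none of these values come from the actual deployment.

```python
import ipaddress


def plan_subnets(vpc_cidr, azs, new_prefix=20):
    """Assign one public and one private subnet per AZ from a VPC CIDR."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = list(vpc.subnets(new_prefix=new_prefix))
    needed = 2 * len(azs)
    if len(blocks) < needed:
        raise ValueError(f"need {needed} subnets, VPC only yields {len(blocks)}")

    plan = []
    # The first len(azs) blocks are public, the next len(azs) are private.
    for i, az in enumerate(azs):
        plan.append({"az": az, "tier": "public", "cidr": str(blocks[i])})
    for i, az in enumerate(azs):
        plan.append(
            {"az": az, "tier": "private", "cidr": str(blocks[len(azs) + i])}
        )
    return plan


if __name__ == "__main__":
    # Placeholder AZ names; substitute the real zone identifiers.
    for subnet in plan_subnets("10.0.0.0/16", ["az-1", "az-2", "az-3"]):
        print(subnet)
```

In this arrangement the database nodes and EKS workers would sit in the private blocks, with only load balancers and NAT gateways in the public ones.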

Cost Efficiency 

Autoscaling, S3 lifecycle policies, serverless email processing.
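
As an illustration of how an S3 lifecycle policy keeps backup storage costs down, the sketch below builds a rule that archives objects under a prefix and later expires them. The dictionary follows the shape of an S3 lifecycle configuration as accepted by boto3's put_bucket_lifecycle_configuration, but the prefix, day counts, and storage class are assumptions; the actual retention settings are not stated in this case study.

```python
import json


def backup_lifecycle(prefix, archive_after_days, expire_after_days):
    """Build an S3 lifecycle configuration for backups under a key prefix."""
    if expire_after_days <= archive_after_days:
        raise ValueError("expiration must come after the archive transition")
    rule_id = "expire-" + prefix.strip("/").replace("/", "-")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move older backups to a cheaper storage class first...
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
                # ...then delete them once the retention window has passed.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical prefix and retention values.
    config = backup_lifecycle("backups/mongodb/", 30, 365)
    print(json.dumps(config, indent=2))
```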

Automation 

Terraform, CI/CD pipelines, auto backups, auto node provisioning. 

Outcomes & Metrics 

After implementation, the customer achieved: 

  • Improved infrastructure stability with proper IAM governance.
  • Highly scalable EKS environment handling peak workloads seamlessly.
  • Secure internal DB access with VPN-based control.
  • End-to-end visibility through CloudWatch, Prometheus, and Grafana.
  • Faster deployments through CI/CD automation.
  • Stronger database resilience through replication and automated backups.
  • Full recovery from the major IAM-related production outage.

 

Conclusion 

By leveraging Amazon EKS, EC2, AWS networking, and DevOps automation, Ancrew Global delivered a robust, secure, and production-grade cloud platform for the customer’s CRM application. The environment now supports rapid scaling, improved resilience, and continuous deployments with high operational visibility. 

The resolution of the major production outage further reinforced strong IAM governance, preventive monitoring, and infrastructure reliability. This engagement demonstrates how well-architected cloud environments can transform traditional workloads into scalable, secure, and future-ready platforms. 
