HEALTH-TREND-ANALYSIS-USING-GLUE

2025-12-15
Private

Executive Summary 
The customer is a healthcare provider operating a chain of hospitals and clinics across multiple cities. They had migrated patient records and other hospital management data to the AWS cloud, which significantly accelerated patient diagnosis by unifying historical lab reports with current clinical data. Their initial solution, while effective, relied on Amazon S3 for storing all patient records and on unmanaged ETL with Amazon Redshift. Diagnosis and treatment remained reactive, triggered only when a patient came in or a need arose.

To overcome this and incorporate new, real-time data streams from patients' smart wearables, the provider architected a modern, serverless data pipeline using AWS IoT Core, Amazon Kinesis, and AWS Glue. This evolution resulted in a robust, scalable infrastructure that gave doctors a holistic, near real-time view of patient health, leading to faster, better-informed diagnoses and improved patient outcomes.

 

The Challenge 
1. Limitations in scalability: The existing healthcare solution, while functional, was limited to caring for patients based on their ongoing health issues. Although the system improved diagnosis by reducing analysis time, the overall process was not proactive. Managing the underlying services required significant operational overhead for provisioning, scaling, and maintenance, and the ETL jobs could not scale efficiently with growing data volumes and increasing patient intake.

2. Integrating Real-Time Patient Data: To enhance diagnostic accuracy and proactiveness, the hospital management decided to integrate continuous health data from patient-owned smartwatches and other wearables.

This introduced two complexities: high-velocity ingestion of millions of near real-time data points from different devices, and data unification, that is, combining high-frequency IoT data with traditional batch-oriented lab reports to create a single source of truth.

 

The AWS Solution 
Ancrew Global, together with hospital management, architected a comprehensive, serverless solution on AWS that augmented the existing infrastructure and seamlessly integrated the new data streams. The solution is built on primary data pipelines that converge in an Amazon S3-based data lake.

1. Real-Time Data Ingestion Using AWS IoT Core and Amazon Kinesis: Smartwatches and wearable devices were securely connected to AWS IoT Core using the MQTT protocol. IoT rules forwarded the telemetry streams to Amazon Kinesis Data Firehose, which delivered near real-time data into Amazon S3 in a compressed, partitioned Parquet format.
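The forwarding step can be illustrated with a minimal sketch of an IoT topic rule payload, of the kind that could be passed to an AWS SDK call such as boto3's `create_topic_rule`. The topic filter, delivery stream name, and role ARN below are hypothetical placeholders; the source does not state the actual values or the exact rule used.

```python
import json


def build_firehose_rule_payload(topic_filter, delivery_stream, role_arn):
    """Return a topic-rule payload that forwards matching MQTT messages to Firehose."""
    return {
        # IoT SQL: keep every field of the message, add the device id taken
        # from the second topic segment, and add a server-side timestamp.
        "sql": (
            "SELECT *, topic(2) AS device_id, timestamp() AS received_at "
            f"FROM '{topic_filter}'"
        ),
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            {
                "firehose": {
                    "roleArn": role_arn,
                    "deliveryStreamName": delivery_stream,
                    # Newline-separated records keep each reading on its own line.
                    "separator": "\n",
                }
            }
        ],
    }


# All three arguments are assumed values for illustration only.
payload = build_firehose_rule_payload(
    "wearables/+/telemetry",
    "wearable-telemetry-stream",
    "arn:aws:iam::123456789012:role/iot-to-firehose",
)
print(json.dumps(payload, indent=2))
```

The sketch only builds the payload; creating the rule, the delivery stream, and the Parquet conversion settings would be separate steps in a real deployment.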

 
2. Unified Data Lake on Amazon S3: Amazon S3 served as a unified repository for lab reports in formats such as PDF, CSV, and JSON, for smartwatch telemetry streams, and for transformed datasets from ETL.
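One way to keep raw lab reports, raw telemetry, and curated ETL output apart in a single bucket is a date-partitioned key layout. The sketch below is an assumption for illustration: the zone prefixes and the `year=/month=/day=` scheme are hypothetical and are not taken from the customer's actual bucket.

```python
from datetime import datetime, timezone

# Hypothetical zone prefixes; the real layout is not described in the source.
ZONES = {
    "lab": "raw/lab-reports",
    "telemetry": "raw/telemetry",
    "curated": "curated",
}


def object_key(zone, event_time, filename):
    """Build a Hive-style, date-partitioned S3 key for a timezone-aware datetime."""
    t = event_time.astimezone(timezone.utc)
    return f"{ZONES[zone]}/year={t:%Y}/month={t:%m}/day={t:%d}/{filename}"


key = object_key(
    "lab", datetime(2025, 12, 15, 8, 30, tzinfo=timezone.utc), "report-001.pdf"
)
print(key)
```

Partitioning by date in the key lets crawlers and query engines skip objects outside the requested time range.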

3. Scalable ETL Using AWS Glue: AWS Glue crawlers automated schema detection for both lab reports and wearable data. Glue ETL jobs a) cleaned, normalized, and enriched IoT telemetry, b) merged historical lab reports with wearable trends for contextual insights, and c) converted all datasets into optimized Parquet format and wrote the curated datasets back to the S3 data lake.
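The clean, normalize, and merge logic can be sketched in plain Python. This is not the Glue job itself, which would typically run on Spark; the field names (`patient_id`, `date`, `heart_rate`, `result`) and the plausibility bounds are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def clean_telemetry(readings):
    """Drop incomplete or implausible readings and normalize field types."""
    cleaned = []
    for r in readings:
        if r.get("patient_id") is None or r.get("date") is None:
            continue
        try:
            hr = float(r.get("heart_rate"))
        except (TypeError, ValueError):
            continue
        if not 20 <= hr <= 250:  # assumed plausibility bounds, in beats per minute
            continue
        cleaned.append(
            {"patient_id": str(r["patient_id"]), "date": r["date"], "heart_rate": hr}
        )
    return cleaned


def daily_averages(cleaned):
    """Average heart rate per (patient, day)."""
    buckets = defaultdict(list)
    for r in cleaned:
        buckets[(r["patient_id"], r["date"])].append(r["heart_rate"])
    return {key: mean(values) for key, values in buckets.items()}


def merge_with_labs(averages, lab_reports):
    """Attach the most recent lab report on or before each day (ISO date strings)."""
    labs_by_patient = defaultdict(list)
    for lab in lab_reports:
        labs_by_patient[str(lab["patient_id"])].append(lab)
    merged = []
    for (pid, day), avg in sorted(averages.items()):
        prior = [lab for lab in labs_by_patient[pid] if lab["date"] <= day]
        latest = max(prior, key=lambda lab: lab["date"]) if prior else None
        merged.append(
            {
                "patient_id": pid,
                "date": day,
                "avg_heart_rate": avg,
                "latest_lab": latest["result"] if latest else None,
            }
        )
    return merged


readings = [
    {"patient_id": 1, "date": "2025-12-01", "heart_rate": "72"},
    {"patient_id": 1, "date": "2025-12-01", "heart_rate": 78},
    {"patient_id": 1, "date": "2025-12-01", "heart_rate": 0},     # implausible, dropped
    {"patient_id": 1, "date": "2025-12-02", "heart_rate": None},  # incomplete, dropped
]
labs = [{"patient_id": 1, "date": "2025-11-20", "result": {"hba1c": 6.1}}]
curated = merge_with_labs(daily_averages(clean_telemetry(readings)), labs)
print(curated)
```

In a Glue job the same three stages would map to a filter, an aggregation, and a join before the curated output is written as Parquet.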

4. Analytics and Visualization Using Redshift & QuickSight: Curated datasets from S3 were loaded into Amazon Redshift for advanced SQL analytics. Amazon QuickSight dashboards provided near real-time visualization of patient health trends, comparison graphs of lab reports versus smart device data, and predictive insights for high-risk patients.
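Loading curated Parquet data from S3 into Redshift is commonly done with a `COPY` statement. The helper below is a minimal sketch that only builds the SQL text; the table name, S3 prefix, and IAM role ARN are hypothetical, and the source does not say which load mechanism the team actually used.

```python
def build_copy_statement(table, s3_prefix, iam_role_arn):
    """Build a Redshift COPY statement for Parquet files under an S3 prefix.

    The arguments are interpolated directly, so they should come from
    trusted configuration and never from user input.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Assumed names for illustration only.
statement = build_copy_statement(
    "analytics.patient_daily_trends",
    "s3://example-health-data-lake/curated/",
    "arn:aws:iam::123456789012:role/redshift-copy",
)
print(statement)
```

The resulting statement would be executed against the cluster by whatever SQL client or scheduler the pipeline already uses.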

[Figure: Architecture Diagram]

 

The Business Outcome & Results 

1. Automatic schema detection reduced the need for continuous data engineering effort. 

2. Serverless ETL eliminated the burden of maintaining custom scripts or EC2-based data pipelines. 

3. Combining lab and wearable data created a comprehensive “360-degree view” of patient health. 

4. Enhanced analytics enabled physicians to prescribe medication with greater confidence. 

5. AWS Glue and the S3-based data lake allowed the ingestion of millions of IoT events without disruption, with no manual scaling or server management.

6. The solution also gave product and business teams immediate access to analytics, which led to more informed and strategic decision making.
7. The solution resulted in improved data reliability.
8. More data-driven products were launched.
9. Data pipeline automation freed up data engineers from maintenance to focus on innovation. 

Summary 

This scalable serverless solution reduces data processing time from hours to minutes, enables proactive health monitoring, and provides healthcare organizations with population-level insights. The pipeline handles massive data volumes cost-effectively and securely. The result is a seamless data journey from wearable devices to actionable health intelligence for both consumers and medical professionals.

 

 
