· This case study describes how a Bengaluru-based 3D and augmented reality (AR) technology company modernized its data analytics platform to support ML-integrated 3D scanning, rendering, and AR visualization across multiple industries.
· The customer enables enterprises to digitally scan physical environments and objects and deliver immersive AR experiences for interior design, furniture and appliance visualization, and automobile use cases such as used-car inspection and virtual showrooms.
· Each scan produces extremely large volumes of semi-structured and deeply nested data, including spatial coordinates, meshes, depth maps, object metadata, and ML-derived attributes, stored as BSON documents in MongoDB Atlas.
· The customer’s initial analytics approach relied on infrequent bulk processing and monolithic ETL jobs, leading to high processing costs, long data availability delays, and limited scalability as data volumes and schema complexity increased.
· As an AWS Partner, we designed and implemented a serverless, event-driven analytics architecture using AWS native services, introducing incremental CDC ingestion, multi-stage AWS Glue transformations, centralized orchestration, and a governed analytics layer in Amazon Redshift.
· The redesigned platform reduced data availability latency from hours to minutes, lowered analytics infrastructure costs by approximately 30–40%, and enabled analytics and ML teams to consistently access analytics-ready datasets for dashboards, feature engineering, and model training without increasing operational overhead.
· Name: AR 3D Industry Leader
· Sector: 3D Visualization, Augmented Reality (AR), and AI-driven Spatial Computing
· Business Overview:
The customer is a Bengaluru-based technology company specializing in ML-integrated 3D scanning, digital modelling, and AR visualization solutions. Their platform enables enterprises to digitize physical environments and objects and deliver interactive AR experiences at scale.
· Industry Use Cases:
The customer serves businesses across interior design, furniture and appliance retail, and the automotive sector. Their solutions support use cases such as virtual interior planning, product placement visualization, used-car inspection, and immersive digital showrooms.
· Data Footprint:
The platform generates and processes extremely large volumes of semi-structured scan data, including spatial coordinates, meshes, depth maps, and ML-derived attributes. This data is primarily stored as BSON documents in a NoSQL datastore and continuously grows with each scan and customer engagement.
· Massive data volume from 3D scanning workflows: Each 3D scan generated large BSON documents containing spatial coordinates, meshes, depth maps, point clouds, object metadata, and ML-derived attributes. At scale, this resulted in tens to hundreds of gigabytes per ingestion cycle, with rapid week-over-week growth.
· Highly nested and variable data formats: The scan data followed no fixed schema. Nested objects and arrays varied by object type, scan session, and ML model version, making downstream analytics and schema evolution difficult to manage.
· Distributed data sources across environments: Core scan data was stored in MongoDB Atlas outside AWS, while operational and reference data resided in relational databases on AWS. These environments were not analytically integrated, creating data silos.
· Batch-oriented ingestion strategy: The customer initially relied on infrequent bulk data dumps from MongoDB to object storage, followed by large batch ETL executions. This approach introduced significant delays between data generation and availability for analytics.
· Monolithic ETL design: A single, large AWS Glue job was responsible for data cleaning, flattening, enrichment, and loading. This design increased failure blast radius, complicated debugging, and made incremental improvements difficult.
· Limited orchestration and error handling: ETL jobs were triggered in a linear manner with minimal orchestration logic. Failures required manual intervention, and there was no automated retry, alerting, or incident creation mechanism.
· High compute cost and inefficiency: The bulk-processing approach required large Glue worker configurations to handle peak loads, even when only small portions of data had changed. This led to underutilized compute resources and elevated processing costs.
· Lack of incremental CDC processing: Without a proper change data capture mechanism, unchanged records were repeatedly reprocessed, increasing execution time and cost while providing no additional analytical value.
· Data skew and performance instability: Certain object types and scan attributes dominated data volume, causing uneven data distribution during Spark processing. This led to long-running stages, memory pressure, and occasional job failures.
· Insufficient governance and metadata management: While schemas were registered in the Glue Data Catalog, there was no clear strategy for metadata versioning, recovery, or cross-region availability, increasing operational risk during failures.
· Limited support for analytics and ML use cases: Due to delayed data availability and inconsistent transformations, analytics teams struggled to build reliable dashboards, and ML teams could not consistently access feature-ready datasets for model training.
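The schema-variability challenge above is easiest to see with a concrete illustration. The two documents below are hypothetical, simplified stand-ins for real scan payloads (field names and values are invented for this sketch), showing how documents in the same collection diverge by object type:

```python
# Hypothetical illustration of how two scan documents from the same
# collection can differ in shape: nesting depth, arrays, and
# ML-derived fields all vary by object type and model version.
furniture_scan = {
    "scan_id": "s-001",
    "object": {
        "type": "sofa",
        "mesh": {"vertices": [[0.0, 0.1, 0.2]], "faces": [[0, 1, 2]]},
        "ml_attributes": {"material": "fabric", "model_version": "v3"},
    },
}

vehicle_scan = {
    "scan_id": "s-002",
    "object": {
        "type": "car",
        "panels": [  # array that never appears in furniture scans
            {"name": "door_fl", "damage_score": 0.12},
            {"name": "hood", "damage_score": 0.87},
        ],
        "ml_attributes": {"model_version": "v5", "inspection_grade": "B"},
    },
}

# The symmetric difference of object-level keys is the schema drift a
# fixed relational schema would have to absorb on every new object type.
drift = set(furniture_scan["object"]) ^ set(vehicle_scan["object"])
```

Every such divergence forces either a schema change or a lossy flattening downstream, which is why discovery-driven design (described below) profiled this variability first.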
The customer selected AWS as the cloud platform for their analytics modernization because most of their application infrastructure and operational databases were already hosted on AWS, enabling seamless integration without introducing additional platforms or operational complexity. AWS provided a mature set of managed and serverless services for data ingestion, transformation, orchestration, and analytics, allowing the customer to scale processing dynamically based on scan volume while avoiding always-on infrastructure. Native services such as event-driven orchestration, managed Spark for complex transformations, and a fully managed analytics warehouse enabled the customer to implement a secure, resilient, and cost-optimized data platform that could be operated efficiently by a small engineering team.
The customer chose to engage the partner, Ancrew Global Services, based on an existing relationship in which the partner was already managing the customer’s application infrastructure and DevOps operations on AWS. Through ongoing collaboration, the partner gained deep visibility into the customer’s data flows, operational constraints, and growth challenges, which led to early identification of limitations in the existing analytics approach. This established trust, combined with hands-on understanding of the environment, positioned the partner to effectively assess the problem, conduct data discovery, and design a scalable analytics solution aligned with the customer’s technical and business goals.

· Discovery-driven solution design:
The engagement started with a structured data discovery and assessment phase. Representative BSON samples were analysed to understand data volume, nesting depth, schema variability, array cardinality, and skew across object types. Existing analytics and ML query patterns were also reviewed to identify access paths and feature requirements, which directly influenced partitioning strategy and ETL design.
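The document profiling described above can be sketched as a simple recursive walk. This is a minimal, self-contained approximation (the actual discovery tooling is not described in detail); it records maximum nesting depth and per-path array cardinality for any parsed document:

```python
def profile_document(doc, path="$", depth=0, stats=None):
    """Record nesting depth and array cardinality per key path.

    A minimal profiling sketch: the real discovery phase ran over
    representative BSON samples, but the same walk applies to any
    document parsed into Python dicts and lists.
    """
    if stats is None:
        stats = {"max_depth": 0, "array_lengths": {}}
    stats["max_depth"] = max(stats["max_depth"], depth)
    if isinstance(doc, dict):
        for key, value in doc.items():
            profile_document(value, f"{path}.{key}", depth + 1, stats)
    elif isinstance(doc, list):
        # Track how long arrays get at each path -- high cardinality
        # here signals skew risk and drives partitioning decisions.
        stats["array_lengths"].setdefault(path, []).append(len(doc))
        for item in doc:
            profile_document(item, f"{path}[*]", depth + 1, stats)
    return stats
```

Aggregating these statistics across a sample set surfaces exactly the properties named above: nesting depth, array cardinality, and which paths dominate volume.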
· Event-driven CDC ingestion architecture:
To address latency and cost challenges, the partner redesigned ingestion around an event-driven CDC model. Change events from the source datastore were routed through Amazon EventBridge via the MongoDB Atlas partner event bus, triggering lightweight AWS Lambda functions responsible only for coordination and metadata capture, which kept ingestion fast and stateless.
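A coordination Lambda in this pattern can be very small. The sketch below is an assumption-laden approximation: the field names under `event["detail"]` are modelled on MongoDB change-stream events and should be confirmed against the actual partner event payload, and the DynamoDB write and Step Functions start that would follow are deliberately omitted:

```python
from datetime import datetime, timezone

def handler(event, context=None):
    """Minimal sketch of the coordination Lambda behind EventBridge.

    Field names under event["detail"] are assumptions modelled on
    MongoDB change-stream events, not the confirmed payload shape.
    The function only captures metadata -- all heavy transformation
    is deferred to the Glue stages downstream.
    """
    detail = event.get("detail", {})
    record = {
        "collection": detail.get("ns", {}).get("coll", "unknown"),
        "operation": detail.get("operationType", "unknown"),
        "document_id": str(detail.get("documentKey", {}).get("_id", "")),
        "received_at": datetime.now(timezone.utc).isoformat(),
    }
    # In the deployed pipeline this record would be written to the
    # DynamoDB control table and a Step Functions execution started;
    # both AWS calls are omitted to keep the sketch self-contained.
    return record
```

Keeping the handler this thin is what makes ingestion "fast and stateless": per-event work is constant-time metadata capture, so Lambda concurrency scales with scan volume without long-running processes.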
· Error handling, retries, and incident management:
Each Step Functions state was configured with controlled retry policies and backoff thresholds. When retry limits were exceeded, failure events were captured in Amazon CloudWatch Logs. CloudWatch Log subscription filters triggered a Lambda-based error handler that enriched failure context with job metadata and relevant log excerpts. The handler published structured metrics to Datadog for monitoring, created incidents in ServiceNow, and sent notification emails containing log excerpts and execution details, enabling rapid root-cause analysis and reducing mean time to resolution for data engineering teams.
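The first step of that error handler is decoding the event CloudWatch Logs delivers to a subscribed Lambda. The base64-encoded, gzip-compressed envelope below is the standard subscription-filter format; the enrichment and the Datadog/ServiceNow integrations that followed it in this solution are omitted from the sketch:

```python
import base64
import gzip
import json

def decode_subscription_event(event):
    """Decode a CloudWatch Logs subscription-filter payload.

    CloudWatch delivers subscribed log batches as base64-encoded,
    gzip-compressed JSON under event["awslogs"]["data"]. Enrichment
    with job metadata and the downstream Datadog/ServiceNow calls
    are intentionally left out of this sketch.
    """
    payload = base64.b64decode(event["awslogs"]["data"])
    body = json.loads(gzip.decompress(payload))
    return {
        "log_group": body["logGroup"],
        "log_stream": body["logStream"],
        "messages": [e["message"] for e in body["logEvents"]],
    }
```

Once decoded, the log group and stream identify the failing Glue job or Lambda, and the message list supplies the excerpts attached to the incident and notification email.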
· Operational state management with DynamoDB:
DynamoDB was introduced as a centralized control plane to store CDC offsets, ingestion checkpoints, job execution metadata, and error states. This enabled reliable incremental processing, simplified restart logic, and provided a durable operational audit trail without adding infrastructure overhead.
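The checkpoint logic this control plane enables can be reduced to one invariant: an offset may only move forward. The sketch below uses a plain dict as a stand-in for the DynamoDB table; in production the same rule is enforced with a conditional `UpdateItem` so that concurrent or replayed events can never rewind a checkpoint:

```python
def advance_checkpoint(table, pipeline_id, new_offset):
    """Advance a CDC checkpoint only if it moves forward.

    `table` is a plain dict standing in for the DynamoDB control
    table. In production this is a conditional UpdateItem
    (e.g. ConditionExpression comparing the stored offset), which
    gives the same monotonic guarantee under concurrency.
    """
    current = table.get(pipeline_id, {"offset": 0})
    if new_offset <= current["offset"]:
        return False  # stale or duplicate event: safe no-op
    table[pipeline_id] = {"offset": new_offset}
    return True
```

Because duplicate or out-of-order events degrade to no-ops, restart logic becomes trivial: a resumed pipeline simply reads its last committed offset and continues, which is exactly the "simplified restart logic" called out above.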
· Layered AWS Glue transformation strategy:
Instead of a monolithic ETL job, the solution implemented a multi-layer, multi-stage transformation model using AWS Glue. Each layer was designed based on source characteristics and downstream consumption patterns, with consistent scaling and sharding practices applied across all stages.
o Layer 1: MongoDB (3D scan data) – Cleaning and Core ETL
This layer focused on processing high-volume, deeply nested BSON documents generated from 3D scans.
§ Stage M1 – Cleaning and normalization: Schema validation, removal of corrupt or incomplete records, deduplication using scan and object identifiers, and delta detection to isolate newly created or updated documents.
§ Stage M2 – Transformation and conversion: Controlled flattening of nested structures, conversion of JSON/BSON into columnar Parquet format, and preparation of analytics-ready datasets for downstream loading into Amazon Redshift.
o Layer 2: Relational databases (RDS) – BI and dashboard preparation
This layer addressed structured operational data used for reporting and BI use cases.
§ Stage R1 – PII discovery and redaction: Sensitive attributes were identified using Amazon Macie discovery reports. Identified PII fields were redacted or tokenized through Glue transformations before further processing or joins.
§ Stage R2 – BI-focused transformation: Cleaned relational data was transformed, standardized, and enriched to align with curated scan datasets, producing analytics-ready tables optimized for dashboards and reporting workloads.
o Common scaling and partitioning practices:
Across all stages, data was partitioned early based on discovery findings such as scan date, object type, and source system. Logical sharding of high-volume entities ensured balanced execution across Glue workers, enabling horizontal scalability while minimizing shuffle overhead and memory pressure.
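The "controlled flattening" in Stage M2 can be sketched in pure Python. In the pipeline itself this was expressed with Spark transforms inside Glue before writing partitioned Parquet; the function below is a simplified stand-in that shows the key choice of keeping arrays whole rather than exploding them in the same pass:

```python
def flatten(doc, prefix="", sep="_"):
    """Flatten a nested document into a single-level record.

    A pure-Python stand-in for the controlled flattening performed in
    Glue (Stage M2). Nested dicts collapse into delimited column
    names; arrays are deliberately kept intact here, since exploding
    them into child tables is a separate, explicit step.
    """
    flat = {}
    for key, value in doc.items():
        column = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat
```

Flattening this way yields stable, columnar-friendly field names even as nesting varies between scans, which is what makes the resulting Parquet datasets loadable into Redshift without per-object-type schemas.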
· Scaling, sharding, and skew remediation: Uneven data distribution and skewed keys were mitigated through controlled repartitioning and sharding of high-volume object types across Glue workers. Worker types and counts were tuned per job stage to balance performance and cost while avoiding memory pressure and out-of-memory failures during large shuffle operations.
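One common way to implement the sharding described above is key salting; the sketch below assumes that technique (the case study does not name the exact mechanism used). A hot key, such as a dominant object type, is spread across a fixed number of logical shards so no single worker receives the whole partition:

```python
import hashlib

def salted_key(object_type, object_id, num_shards=8):
    """Derive a salted partition key for skew-heavy object types.

    A sketch of key salting: documents for a hot object type are
    spread across `num_shards` logical shards via a stable hash of
    the object id, so shuffle partitions stay balanced. Downstream
    aggregations strip the '#<salt>' suffix before grouping.
    """
    digest = hashlib.md5(object_id.encode()).hexdigest()
    salt = int(digest, 16) % num_shards
    return f"{object_type}#{salt}"
```

Because the salt is derived from the object id, the mapping is deterministic across runs, which keeps incremental CDC reprocessing consistent with earlier loads.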
· Centralized orchestration using AWS Step Functions:
AWS Step Functions orchestrated the end-to-end pipeline, coordinating Lambda and Glue jobs with explicit state management, retries, and failure handling. Parallel branches enabled independent processing of different object types, reducing execution time and isolating failures.
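The shape of that state machine can be shown as a trimmed Amazon States Language definition, written here as a Python dict. State names, job names, and retry values are illustrative assumptions, not the customer's actual resources:

```python
# Shared retry policy: exponential backoff on task failure.
RETRY = [{
    "ErrorEquals": ["States.TaskFailed"],
    "IntervalSeconds": 30,
    "BackoffRate": 2.0,
    "MaxAttempts": 3,
}]

def glue_branch(job_name):
    """One parallel branch: a synchronous Glue job run with retries."""
    return {
        "StartAt": job_name,
        "States": {
            job_name: {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": job_name},
                "Retry": RETRY,
                "End": True,
            }
        },
    }

# Parallel branches per object type: a failure in one branch does not
# block the others, and each branch carries its own retry policy.
STATE_MACHINE = {
    "StartAt": "ProcessObjectTypes",
    "States": {
        "ProcessObjectTypes": {
            "Type": "Parallel",
            "Branches": [glue_branch("clean-furniture-scans"),
                         glue_branch("clean-vehicle-scans")],
            "End": True,
        }
    },
}
```

The `Parallel` state is what delivers the failure isolation described above: each object type's branch fails and retries independently, and only exhausted retries surface to the error-handling path.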
· Architecture overview:
The overall solution integrates event-driven ingestion, multi-stage transformation, and governed analytics within a secure AWS environment.

· Glue transformation and orchestration detail:
The transformation layer consists of independently scalable Glue jobs coordinated through Step Functions, enabling fine-grained control over execution order, parallelism, and retries.

· Governance, security, and analytics delivery:
Transformed datasets were written in optimized, partitioned formats and registered in the AWS Glue Data Catalog, governed by AWS Lake Formation. Final datasets were loaded into Amazon Redshift, enabling consistent, secure, and analytics-ready access for dashboards and ML feature extraction.
· Data availability latency reduced from 6–12 hours under the previous weekly batch cycle to 5–20 minutes with event-driven CDC and incremental processing.
· ETL execution time reduced by 60–70%, enabled by parallel, multi-stage AWS Glue pipelines instead of a single monolithic job.
· AWS Glue compute usage optimized, reducing DPU-hours by ~35%, by isolating heavy transformations to targeted stages (M2, R2) instead of always running large jobs.
· Reprocessing volume reduced by >80%, as only changed or newly ingested records are processed through CDC rather than full dataset reloads.
· Pipeline failure recovery time (MTTR) reduced from hours to <30 minutes, through Step Functions retries and automated incident creation.
· Operational visibility improved, with 100% of ETL job failures automatically detected and alerted via CloudWatch → Lambda → Datadog → ServiceNow.
· Improved data reliability and consistency, with partitioned, validated datasets reducing downstream query failures and manual data corrections.
· Secure analytics enablement, with 100% identified PII fields masked or tokenized prior to BI and ML consumption using Macie-driven discovery.