SageMaker Unified Studio — The DataZone Migration Decision Framework for Data Engineers
Your DataZone environment works. AWS says SMUS is the future. Do you upgrade, go greenfield, or wait? Here is the coverage matrix, the 70–85 percent pipeline replacement reality, and the multi-account mesh architecture you actually need.
If you’ve been running AWS DataZone for the last eighteen months and built a working data-mesh with it, the SMUS (SageMaker Unified Studio) announcement creates an awkward decision: upgrade, go greenfield, or wait.
AWS messaging says SMUS is the natural successor. In practice, SMUS is a bigger product — it absorbs DataZone’s governance model but adds IDE-style analytics, Visual ETL, GenAI/Bedrock integration, and a revised multi-account architecture. The question isn’t just “should I migrate?” It’s “what does SMUS actually cover vs. classic SageMaker AI, and what percentage of my pipelines survive the move?”
This post is the decision framework I’ve used with data-engineering-heavy teams (80% data engineers, 20% data scientists) who care about the answer.
SMUS vs SageMaker AI — The Coverage Matrix
SMUS is not a replacement for SageMaker AI. It’s a replacement for DataZone and for the fragmented “bring your own IDE” developer experience. Several SageMaker AI capabilities remain outside SMUS.
| Capability | Fully in SMUS | Shared | SageMaker AI only |
|---|---|---|---|
| SQL analytics (Athena, Redshift) | ✅ | — | — |
| Visual ETL (Glue Studio) | ✅ | — | — |
| GenAI / Bedrock integration | ✅ | — | — |
| Governance (domain, projects, catalog) | ✅ | — | — |
| JupyterLab notebooks | — | ✅ | — |
| ML training (classical) | — | ✅ | — |
| Distributed training (PyTorch FSDP, DeepSpeed) | — | — | ✅ |
| Real-time inference endpoints | — | — | ✅ |
| Feature Store | — | — | ✅ |
| AutoML (SageMaker Canvas / Autopilot) | — | — | ✅ |
| ML Pipelines (SageMaker Pipelines) | — | — | ✅ |
The big takeaway: if your team does training, real-time inference, Feature Store, or AutoML, you stay in SageMaker AI for those workloads. SMUS is the unified governance + analytics + GenAI layer on top.
The Pipeline Replacement Reality
The single hardest question from data engineers: “Can I migrate all my Glue/Airflow/EMR pipelines into SMUS?”
Honest answer: 70–85% of traditional batch pipelines can migrate to SMUS (Visual ETL + MWAA Serverless). Here is the breakdown, including the 15–30% that stays outside:
| Pattern | SMUS Support | Stays Outside SMUS |
|---|---|---|
| Batch SQL analytics | ✅ Full | — |
| Batch Spark ETL | ✅ Via Visual ETL / Glue | — |
| Orchestration (Airflow-style) | ✅ MWAA Serverless integration | — |
| Streaming (Kinesis, Flink) | ❌ | ✅ Stays native |
| CI/CD for pipelines (CDK, Terraform) | ⚠️ Partial | ✅ Some custom SDK work needed |
| Cross-BU non-data orchestration | ❌ | ✅ Step Functions / EventBridge |
Budget for the fact that your streaming estate and your pipeline CI/CD stay where they are. SMUS is for the batch analytics and governance plane; it’s not the universal orchestrator.
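To turn the 70–85% estimate into a number for your own estate, a quick inventory tally helps. This is an illustrative sketch: the pattern labels mirror the table above, and the inventory counts are made up — in practice you would pull them from resource tags or a pipeline registry.

```python
# Rough migratability estimate for a pipeline inventory.
# Pattern labels mirror the table above; the counts are hypothetical --
# in practice you'd pull them from tags or a pipeline registry.

SMUS_SUPPORTED = {"batch_sql", "batch_spark_etl", "airflow_orchestration"}
STAYS_NATIVE = {"streaming", "cicd", "non_data_orchestration"}

def migratable_share(inventory: dict) -> float:
    """Return the fraction of pipelines that can move to SMUS."""
    total = sum(inventory.values())
    movable = sum(n for pattern, n in inventory.items()
                  if pattern in SMUS_SUPPORTED)
    return movable / total if total else 0.0

# Example inventory (made up for illustration):
inventory = {
    "batch_sql": 40,
    "batch_spark_etl": 30,
    "airflow_orchestration": 10,
    "streaming": 15,
    "cicd": 5,
}
print(f"{migratable_share(inventory):.0%} of pipelines can move to SMUS")
```

If your number comes out well below 70%, that usually means a streaming-heavy or CI/CD-heavy estate — and the table above tells you those stay native regardless.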
The Multi-Account Mesh Architecture
This is where SMUS differs significantly from classic DataZone. The target topology:
┌────────────────────────────────────────────────────────┐
│ GOVERNANCE ACCOUNT │
│ ┌──────────────────────────────────────────────┐ │
│ │ SMUS Domain │ │
│ │ ├── Blueprints (infrastructure templates) │ │
│ │ ├── Project Profiles (capability sets) │ │
│ │ └── Data Catalog │ │
│ └──────────────────────────────────────────────┘ │
│ │ │
│ │ AWS RAM sharing │
│ ▼ │
└────────────────────────────────────────────────────────┘
│ │
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ PRODUCER ACCOUNT │ │ CONSUMER ACCOUNT │
│ (BU A) │ │ (BU B) │
│ - Projects │ ◄────► │ - Projects │
│ - Assets │ │ - Consumes BU A │
│ - Data sources │ │ - Own compute │
└──────────────────┘ └──────────────────┘
Three primitives worth understanding
Blueprints — reusable infra templates (IAM roles, S3 buckets, Glue configurations) that SMUS applies when a project provisions. Define once in the governance account, consume from anywhere. This is what replaces the hand-rolled CDK constructs teams wrote for DataZone.
Project Profiles — capability bundles (e.g. “SQL Analytics project” vs. “ML project”). A profile defines which compute is available, which roles are created, which blueprints are applied. Profiles are the right unit to align with your BU capability matrix.
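Programmatically, SMUS projects are still created through the DataZone API surface. A hedged sketch of the request shape — I'm assuming the `projectProfileId` parameter on `create_project` (introduced alongside SMUS profiles), and all IDs are placeholders; verify against the current boto3 `datazone` client docs before relying on it.

```python
# Build a create-project request that selects a project profile.
# ASSUMPTION: the boto3 "datazone" client's create_project accepts a
# projectProfileId parameter; the IDs below are placeholders.

def build_create_project_request(domain_id: str, name: str,
                                 profile_id: str) -> dict:
    return {
        "domainIdentifier": domain_id,
        "name": name,
        "projectProfileId": profile_id,  # selects the capability bundle
    }

req = build_create_project_request(
    "dzd-EXAMPLE",            # placeholder domain ID
    "bu-a-sql-analytics",
    "pp-EXAMPLE",             # placeholder profile ID
)

# Actual call (requires credentials and real IDs):
# import boto3
# boto3.client("datazone").create_project(**req)
```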
AWS RAM for cross-account sharing — data sources in producer accounts are shared with consumer accounts via AWS RAM. Lake Formation is the enforcement point. The pattern is reversed from classic DataZone: producers explicitly share, consumers explicitly subscribe.
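The RAM side of producer-to-consumer sharing can be sketched with boto3. The kwargs below match the `ram` client's `create_resource_share` operation; the Glue database ARN and account IDs are placeholders.

```python
# Share a producer account's Glue database with a consumer account
# via AWS RAM. The ARN and account IDs below are placeholders.

def build_resource_share(name: str, resource_arns: list,
                         consumer_accounts: list) -> dict:
    return {
        "name": name,
        "resourceArns": resource_arns,
        "principals": consumer_accounts,   # consumer account IDs
        "allowExternalPrincipals": False,  # keep sharing inside the org
    }

share = build_resource_share(
    "bu-a-sales-data",
    ["arn:aws:glue:eu-west-1:111111111111:database/sales"],
    ["222222222222"],
)

# Actual call (requires credentials):
# import boto3
# boto3.client("ram").create_resource_share(**share)
```

Lake Formation permissions still gate what the consumer can actually see; RAM only makes the resource visible across the account boundary.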
Greenfield vs. Upgrade
Greenfield is almost always the right call. Create a new SMUS domain, run it in parallel with your existing DataZone, and migrate project by project. The reasons:
- SMUS’s project and profile model is richer than DataZone’s — migrating metadata in-place loses fidelity
- You get to redo your blueprint strategy with lessons learned
- Coexistence is explicitly supported — same accounts, both domains, no conflict
In-place upgrade only makes sense if your DataZone usage is very small (one or two domains, a handful of projects) and you want to avoid running two systems.
The Requester Pays Question
A question that always comes up in a data-mesh: when a consumer BU queries a producer BU’s data, who pays for the compute?
SMUS’s default is that the consumer pays for query compute — they’re running their own Athena/Redshift/Glue job against shared data. The producer pays only for storage.
This works for most inter-BU models but creates a wrinkle: a producer BU with very popular data pays only flat storage costs, while every consumer BU pays for each query it runs. Some organizations want the inverse (the producer bears more of the cost, because it benefits from being "the data BU").
There’s no native “requester pays” toggle in SMUS today. The workaround is cost-center tagging of S3/compute and a monthly chargeback process — awkward but manageable. If you have complex inter-BU chargeback requirements, model this out before committing to SMUS.
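A minimal sketch of the monthly chargeback query, assuming compute resources are tagged with a `cost-center` key per consuming BU — the tag key is an org convention (hypothetical here), not anything AWS defines. The kwargs match Cost Explorer's `get_cost_and_usage` operation.

```python
# Monthly chargeback: group spend by a cost-center tag.
# The "cost-center" tag key is a hypothetical org convention,
# not an AWS default.

def build_chargeback_query(start: str, end: str,
                           tag_key: str = "cost-center") -> dict:
    return {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "TAG", "Key": tag_key}],
    }

query = build_chargeback_query("2026-01-01", "2026-02-01")

# Actual call (requires credentials; Cost Explorer must be enabled):
# import boto3
# boto3.client("ce").get_cost_and_usage(**query)
```

The awkward part isn't the query — it's enforcing the tag on every Athena workgroup, Redshift workgroup, and Glue job a consumer BU spins up. Blueprints are the natural place to bake that in.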
S3 Tables + Lake Formation — The New Lakehouse Primitive
SMUS’s analytics stack integrates natively with S3 Tables (Apache Iceberg tables on S3) and Lake Formation for fine-grained access control. This matters because:
- S3 Tables give you ACID transactions, time travel, and schema evolution on your lake data
- Lake Formation enforces row-level and column-level security across all SMUS compute engines (Athena, Redshift, Spark)
- The combination replaces a lot of custom Delta Lake / Hudi glue you may have built on top of plain S3
If you’re greenfielding a lakehouse in 2026, S3 Tables + Lake Formation + SMUS is the default stack.
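Time travel is queryable straight from Athena (engine v3) against Iceberg tables. A sketch — the table name and timestamp are placeholders, and the query-builder helper is mine, not an AWS API:

```python
# Build an Athena time-travel query against an Iceberg / S3 Tables table.
# Athena engine v3 supports FOR TIMESTAMP AS OF on Iceberg tables;
# the table name and timestamp below are placeholders.

def time_travel_query(table: str, ts: str) -> str:
    return (f"SELECT * FROM {table} "
            f"FOR TIMESTAMP AS OF TIMESTAMP '{ts}'")

sql = time_travel_query("sales.orders", "2026-01-15 00:00:00")
print(sql)

# Actual execution (requires credentials and a results bucket):
# import boto3
# boto3.client("athena").start_query_execution(
#     QueryString=sql,
#     WorkGroup="primary",
#     ResultConfiguration={"OutputLocation": "s3://my-results-bucket/"},
# )
```

Lake Formation's row/column filters apply to this query exactly as they do to a current-snapshot read, which is the point: governance doesn't fork when you time travel.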
Decision Framework
Go greenfield SMUS if:
- You have >50 projects in DataZone or plan to scale aggressively
- Your data engineers want Visual ETL and integrated notebooks in one place
- You need Bedrock/GenAI in your analytics loop
- You are building a multi-account data mesh from scratch
Stay on DataZone for now if:
- Your DataZone is small and stable (1–2 domains, <20 projects)
- You have heavy customization that depends on DataZone-specific APIs
- You depend on features SMUS hasn't shipped yet (pipeline CI/CD tooling, advanced streaming integration)
Do not use SMUS for:
- Streaming pipelines (keep Kinesis/Flink native)
- Real-time inference / Feature Store / distributed training (keep SageMaker AI)
- Non-data orchestration (keep Step Functions)
Key Takeaways
- SMUS is not a SageMaker AI replacement — it’s a DataZone successor plus a unified analytics + GenAI layer.
- 70–85% of batch pipelines migrate; streaming and pipeline CI/CD stay outside.
- The multi-account mesh architecture (governance account + AWS RAM + blueprints + project profiles) is richer than DataZone’s — greenfield is the way.
- Requester-Pays chargeback is not native today; model your inter-BU cost model before committing.
- S3 Tables + Lake Formation is the new lakehouse primitive; prefer it for greenfield.