
SageMaker Unified Studio — The DataZone Migration Decision Framework for Data Engineers

Your DataZone environment works. AWS says SMUS is the future. Do you upgrade, go greenfield, or wait? Here is the coverage matrix, the 70–85 percent pipeline replacement reality, and the multi-account mesh architecture you actually need.

Alexandre Agius

AWS Solutions Architect

7 min read

If you’ve been running AWS DataZone for the last eighteen months and have built a working data mesh with it, the SMUS (SageMaker Unified Studio) announcement creates an awkward decision: upgrade, go greenfield, or wait.

AWS messaging says SMUS is the natural successor. In practice, SMUS is a bigger product — it absorbs DataZone’s governance model but adds IDE-style analytics, Visual ETL, GenAI/Bedrock integration, and a revised multi-account architecture. The question isn’t just “should I migrate?” It’s “what does SMUS actually cover vs. classic SageMaker AI, and what percentage of my pipelines survive the move?”

This post is the decision framework I’ve used with data-engineering-heavy teams (80% data engineers, 20% data scientists) who care about the answer.

SMUS vs SageMaker AI — The Coverage Matrix

SMUS is not a replacement for SageMaker AI. It’s a replacement for DataZone and for the fragmented “bring your own IDE” developer experience. Several SageMaker AI capabilities remain outside SMUS.

| Capability | 100% in SMUS | Shared | SM AI only |
|---|:---:|:---:|:---:|
| SQL analytics (Athena, Redshift) | ✅ | | |
| Visual ETL (Glue Studio) | ✅ | | |
| GenAI / Bedrock integration | ✅ | | |
| Governance (domain, projects, catalog) | ✅ | | |
| JupyterLab notebooks | | ✅ | |
| ML training (classical) | | ✅ | |
| Distributed training (PyTorch FSDP, DeepSpeed) | | | ✅ |
| Real-time inference endpoints | | | ✅ |
| Feature Store | | | ✅ |
| AutoML (SageMaker Canvas / Autopilot) | | | ✅ |
| ML Pipelines (SageMaker Pipelines) | | | ✅ |

The big takeaway: if your team does training, real-time inference, Feature Store, or AutoML, those workloads stay in SageMaker AI. SMUS is the unified governance + analytics + GenAI layer on top.
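As a quick reference, the matrix can be encoded as a lookup. The assignments below are my reading of the split (the exact cell-by-cell values are a judgment call, not an official AWS statement):

```python
# Rough encoding of the coverage matrix above: which plane owns each
# capability. My reading of the split, not an official support statement.
COVERAGE = {
    "sql_analytics": "smus",
    "visual_etl": "smus",
    "genai_bedrock": "smus",
    "governance": "smus",
    "jupyterlab": "shared",
    "ml_training_classical": "shared",
    "distributed_training": "sagemaker_ai",
    "realtime_inference": "sagemaker_ai",
    "feature_store": "sagemaker_ai",
    "automl": "sagemaker_ai",
    "ml_pipelines": "sagemaker_ai",
}

def workloads_staying_in_sagemaker_ai(team_capabilities):
    """Return the subset of a team's capabilities that stay in SageMaker AI."""
    return sorted(c for c in team_capabilities
                  if COVERAGE.get(c) == "sagemaker_ai")

print(workloads_staying_in_sagemaker_ai(
    ["sql_analytics", "feature_store", "realtime_inference"]
))  # → ['feature_store', 'realtime_inference']
```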

The Pipeline Replacement Reality

The single hardest question from data engineers: “Can I migrate all my Glue/Airflow/EMR pipelines into SMUS?”

Honest answer: 70–85% of traditional batch pipelines can migrate to SMUS (Visual ETL + MWAA Serverless). Here is how the patterns break down:

| Pattern | SMUS support | Stays outside SMUS |
|---|---|---|
| Batch SQL analytics | ✅ Full | |
| Batch Spark ETL | ✅ Via Visual ETL / Glue | |
| Orchestration (Airflow-style) | ✅ MWAA Serverless integration | |
| Streaming (Kinesis, Flink) | | ✅ Stays native |
| CI/CD for pipelines (CDK, Terraform) | ⚠️ Partial | ✅ Some custom SDK work needed |
| Cross-BU non-data orchestration | | ✅ Step Functions / EventBridge |

Budget for the fact that your streaming estate and your pipeline CI/CD stay where they are. SMUS is for the batch analytics and governance plane; it’s not the universal orchestrator.
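To turn the 70–85% figure into a number for your own estate, run a triage pass over your pipeline inventory. A minimal sketch, where the classification rules mirror the pattern table above and the inventory record shape is hypothetical:

```python
# Triage a pipeline inventory into "migrate to SMUS" vs "stays outside",
# following the pattern table above. The record format is hypothetical.
MIGRATABLE = {"batch_sql", "batch_spark", "airflow_orchestration"}
STAYS_OUTSIDE = {"streaming", "non_data_orchestration"}

def triage(pipelines):
    buckets = {"migrate": [], "stay": [], "review": []}
    for p in pipelines:
        if p["pattern"] in MIGRATABLE:
            buckets["migrate"].append(p["name"])
        elif p["pattern"] in STAYS_OUTSIDE:
            buckets["stay"].append(p["name"])
        else:
            # e.g. CI/CD-heavy pipelines: partial support, needs a human look
            buckets["review"].append(p["name"])
    return buckets

inventory = [
    {"name": "daily_sales_agg", "pattern": "batch_sql"},
    {"name": "clickstream_ingest", "pattern": "streaming"},
    {"name": "dbt_deploy", "pattern": "cicd"},
]
print(triage(inventory))
```

The `review` bucket is where the real migration effort hides: it is usually small in count but large in engineering time.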

The Multi-Account Mesh Architecture

This is where SMUS differs significantly from classic DataZone. The target topology:

┌────────────────────────────────────────────────────────┐
│                  GOVERNANCE ACCOUNT                    │
│  ┌──────────────────────────────────────────────────┐  │
│  │  SMUS Domain                                     │  │
│  │    ├── Blueprints (infrastructure templates)     │  │
│  │    ├── Project Profiles (capability sets)        │  │
│  │    └── Data Catalog                              │  │
│  └──────────────────────────────────────────────────┘  │
│                          │                             │
│                          │ AWS RAM sharing             │
│                          ▼                             │
└────────────────────────────────────────────────────────┘
         │                          │
         ▼                          ▼
┌──────────────────┐         ┌──────────────────┐
│ PRODUCER ACCOUNT │         │ CONSUMER ACCOUNT │
│  (BU A)          │         │  (BU B)          │
│  - Projects      │  ◄────► │  - Projects      │
│  - Assets        │         │  - Consumes BU A │
│  - Data sources  │         │  - Own compute   │
└──────────────────┘         └──────────────────┘

Three primitives worth understanding

Blueprints — reusable infra templates (IAM roles, S3 buckets, Glue configurations) that SMUS applies when a project provisions. Define once in the governance account, consume from anywhere. This is what replaces the hand-rolled CDK constructs teams wrote for DataZone.

Project Profiles — capability bundles (e.g. “SQL Analytics project” vs. “ML project”). A profile defines which compute is available, which roles are created, which blueprints are applied. Profiles are the right unit to align with your BU capability matrix.

AWS RAM for cross-account sharing — data sources in producer accounts are shared with consumer accounts via AWS RAM. Lake Formation is the enforcement point. The pattern is reversed from classic DataZone: producers explicitly share, consumers explicitly subscribe.
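The reversed pattern is easy to misread, so here it is as a tiny in-memory model: a consumer cannot read an asset until the producer has shared it *and* the consumer has subscribed. This is purely illustrative (the names are mine; the real mechanism is AWS RAM for the share and Lake Formation for enforcement, not this code):

```python
# Minimal model of the SMUS cross-account pattern: producers explicitly
# share, consumers explicitly subscribe, and access requires both.
# Illustrative only — the real flow goes through AWS RAM + Lake Formation.
class Mesh:
    def __init__(self):
        self.shared = set()      # (producer, asset) pairs shared via "RAM"
        self.subscribed = set()  # (consumer, asset) approved subscriptions

    def share(self, producer, asset):
        self.shared.add((producer, asset))

    def subscribe(self, consumer, producer, asset):
        if (producer, asset) not in self.shared:
            raise PermissionError(f"{producer} has not shared {asset}")
        self.subscribed.add((consumer, asset))

    def can_read(self, consumer, asset):
        return (consumer, asset) in self.subscribed

mesh = Mesh()
mesh.share("bu_a", "orders")
mesh.subscribe("bu_b", "bu_a", "orders")
print(mesh.can_read("bu_b", "orders"))  # → True
print(mesh.can_read("bu_c", "orders"))  # → False
```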

Greenfield vs upgrade

Greenfield is almost always the right call. Create a new SMUS domain, run it in parallel with your existing DataZone, and migrate project by project. The reasons:

  • SMUS’s project and profile model is richer than DataZone’s — migrating metadata in-place loses fidelity
  • You get to redo your blueprint strategy with lessons learned
  • Coexistence is explicitly supported — same accounts, both domains, no conflict

In-place upgrade only makes sense if your DataZone usage is very small (one or two domains, a handful of projects) and you want to avoid running two systems.

The Requester Pays Question

A question that always comes up in a data-mesh: when a consumer BU queries a producer BU’s data, who pays for the compute?

SMUS’s default is that the consumer pays for query compute — they’re running their own Athena/Redshift/Glue job against shared data. The producer pays only for storage.

This works for most inter-BU models but creates a wrinkle: a producer BU whose data is very popular gets storage costs amortized over time, while a consumer BU pays every time they query. Some organizations want the inverse (producer bears more of the cost because they benefit from being “the data BU”).

There’s no native “requester pays” toggle in SMUS today. The workaround is cost-center tagging of S3/compute and a monthly chargeback process — awkward but manageable. If you have complex inter-BU chargeback requirements, model this out before committing to SMUS.
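Absent a native toggle, the tagging workaround reduces to a monthly aggregation over cost line items. A sketch of that chargeback pass, assuming each line item already carries a `cost_center` tag and a `kind` of storage or compute (both field names are my own; in practice the input would come from Cost and Usage Report data):

```python
from collections import defaultdict

# Monthly chargeback from cost-center-tagged line items: compute bills to
# the consumer that ran the query, storage bills to the producer. The
# tags already encode who pays, so the pass is a simple sum per center.
# Line-item shape is hypothetical; in practice this comes from CUR data.
def chargeback(line_items):
    bill = defaultdict(float)
    for item in line_items:
        bill[item["cost_center"]] += item["usd"]
    return dict(bill)

items = [
    {"cost_center": "bu_a", "kind": "storage", "usd": 120.0},  # producer's S3
    {"cost_center": "bu_b", "kind": "compute", "usd": 45.5},   # consumer's Athena
    {"cost_center": "bu_b", "kind": "compute", "usd": 14.5},
]
print(chargeback(items))  # → {'bu_a': 120.0, 'bu_b': 60.0}
```

If your organization wants the inverse model (producer bears query cost), the same pass works; you just re-tag compute line items to the producer's cost center before summing.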

S3 Tables + Lake Formation — The New Lakehouse Primitive

SMUS’s analytics stack integrates natively with S3 Tables (Apache Iceberg tables on S3) and Lake Formation for fine-grained access control. This matters because:

  • S3 Tables give you ACID transactions, time travel, and schema evolution on your lake data
  • Lake Formation enforces row-level and column-level security across all SMUS compute engines (Athena, Redshift, Spark)
  • The combination replaces a lot of custom Delta Lake / Hudi glue you may have built on top of plain S3

If you’re greenfielding a lakehouse in 2026, S3 Tables + Lake Formation + SMUS is the default stack.
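What time travel buys you in practice: on Iceberg tables, Athena lets you query the table as of a past timestamp directly in SQL. A sketch of the kind of statements involved, assembled as strings (the table name and timestamp are made up; the `FOR TIMESTAMP AS OF` clause follows Athena's Iceberg support):

```python
# Illustrative Athena SQL against an Iceberg (S3 Tables) table.
# Table name and timestamp are invented for the example.
time_travel = """
SELECT order_id, amount
FROM lake.orders
FOR TIMESTAMP AS OF TIMESTAMP '2026-01-01 00:00:00 UTC'
""".strip()

# Schema evolution is a plain DDL statement, no table rewrite required.
schema_evolution = "ALTER TABLE lake.orders ADD COLUMNS (channel string)"

print(time_travel)
print(schema_evolution)
```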

Decision Framework

Go greenfield SMUS if:

  • You have >50 projects in DataZone or plan to scale aggressively
  • Your data engineers want Visual ETL and integrated notebooks in one place
  • You need Bedrock/GenAI in your analytics loop
  • You are building a multi-account data mesh from scratch

Stay on DataZone for now if:

  • Your DataZone is small and stable (1–2 domains, <20 projects)
  • You have heavy customization that depends on DataZone-specific APIs
  • You depend on SMUS features that are still pending (CI/CD tooling, advanced streaming integration)

Do not use SMUS for:

  • Streaming pipelines (keep Kinesis/Flink native)
  • Real-time inference / Feature Store / distributed training (keep SageMaker AI)
  • Non-data orchestration (keep Step Functions)
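The greenfield-vs-stay lists above collapse into a small decision function. A sketch, with the thresholds taken from the lists and the input field names my own:

```python
# Encode the decision framework above. Inputs describe your current
# DataZone estate; output is one of the two recommendations. The "do not
# use SMUS for" list is workload-level and sits outside this function.
def recommend(projects, needs_genai, heavy_dz_customization,
              needs_pending_features):
    if heavy_dz_customization or needs_pending_features:
        return "stay on DataZone for now"
    if projects > 50 or needs_genai:
        return "greenfield SMUS"
    if projects < 20:
        return "stay on DataZone for now"
    return "greenfield SMUS"  # mid-size estates: parallel-run and migrate

print(recommend(projects=80, needs_genai=False,
                heavy_dz_customization=False,
                needs_pending_features=False))  # → greenfield SMUS
```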

Key Takeaways

  • SMUS is not a SageMaker AI replacement — it’s a DataZone successor plus a unified analytics + GenAI layer.
  • 70–85% of batch pipelines migrate; streaming and pipeline CI/CD stay outside.
  • The multi-account mesh architecture (governance account + AWS RAM + blueprints + project profiles) is richer than DataZone’s — greenfield is the way.
  • Requester-Pays chargeback is not native today; model your inter-BU cost model before committing.
  • S3 Tables + Lake Formation is the new lakehouse primitive; prefer it for greenfield.
