The Modern Data Architecture Accelerator (MDAA) is designed to accelerate the implementation of a secure, compliant, and fully capable Modern Data Architecture on AWS, allowing organizations of all sizes and levels of sophistication to quickly focus on driving business outcomes from their data while maintaining high assurance of security compliance. Specifically, it enables organizations to rapidly solve data-driven problems using both traditional analytics and contemporary capabilities such as generative AI.
MDAA provides rapid deployment of all major elements of a Modern Data Architecture, such as Ingest, Persistence, Governance, DataOps, Consumption, Visual Analytics, Data Science, and AI/ML. Additionally, MDAA has been designed to accelerate compliance with the AWS Solutions, NIST 800-53 Rev5 (US), HIPAA, and PCI-DSS CDK Nag rulesets, as well as ITSG-33 (Canada) security control requirements. Terraform modules are compliant with standard Checkov security policies. This combination of integral compliance and broad, configuration-driven capability allows for rapid design and deployment of simple to complex data analytics environments (including Lake House and Data Mesh architectures) while minimizing security compliance risk.
- Any organization looking to rapidly deploy a secure Modern Data Architecture in support of data-driven business/mission requirements, such as Analytics, Business Intelligence, AI/ML, and Generative AI.
- Large organizations looking to design and deploy complex Modern Data Architectures such as Lake House or Data Mesh.
- Small to Medium organizations looking for code-free, configuration-driven deployment of a Data Analytics platform.
- Builder organizations building custom, code-driven data analytics architectures through the use of reusable, compliant constructs across multiple languages.
- Any organization with elevated compliance/regulatory requirements.
Getting started with MDAA requires the following steps:
- Architecture and Design - A physical platform architecture is defined, either from scratch or derived from an AWS/MDAA reference design.
- Configuration - One or more MDAA configuration files are authored, along with individual configuration files for each MDAA module.
- (Optional) Customization - Resources and stacks can optionally be customized through code-based escape hatches before deployment (see the sketch after this list).
- Predeployment Preparation - In this step, the MDAA NPM packages are built and published to a private NPM repo.
- Deployment - Each MDAA configuration file is deployed either manually or automatically (via CI/CD).
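For the optional customization step, CDK-based modules can be adjusted through the standard CDK escape hatches. The snippet below is a minimal, hypothetical sketch (the construct ID and the overridden property are illustrative, not actual MDAA names) of how a synthesized resource could be tweaked before deployment:

```typescript
import { Stack } from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';

// Hypothetical customization: reach through a deployed bucket construct to its
// underlying CloudFormation resource and override a property not exposed as a prop.
function overrideBucketLifecycle(stack: Stack, bucketId: string): void {
  const bucket = stack.node.findChild(bucketId) as s3.Bucket;
  const cfnBucket = bucket.node.defaultChild as s3.CfnBucket;
  cfnBucket.addPropertyOverride(
    'LifecycleConfiguration.Rules.0.ExpirationInDays',
    365 // illustrative value
  );
}
```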
Alternatively, you can jump directly into a set of sample architectures and configurations. Note that these sample configurations can be used as a starting point for much more sophisticated architectures.
- Basic DataLake with Glue - A basic S3 Data Lake with Glue database and crawler
- Basic Terraform DataLake - A basic S3 Data Lake built with the MDAA Terraform module
- Fine-grained Access Control DataLake - An S3 Data Lake with fine-grained access control using LakeFormation
- Data Warehouse - A standalone Redshift Data Warehouse
- Lakehouse - A full Lakehouse implementation, with a Data Lake, Data Ops layers (using NYC taxi data), and a Redshift data warehouse
- AI Development Platform - A standalone SageMaker AI Studio Data Science Platform
- GenAI Platform - A standalone GAIA GenAI Platform
Additionally, once your Modern Data Architecture is deployed, you can use these sample Data Operations blueprints, including MDAA configs and DataOps code, to start solving your data-driven problems.
- Basic Crawler - A basic crawler blueprint
- Event-Driven CSV to Parquet Lambda - A blueprint for transforming small-to-medium CSV files into Parquet as they are uploaded into a data lake.
- Schedule-Driven CSV to Parquet Glue - A blueprint for transforming larger CSV files into Parquet on a scheduled basis using Glue ETL.
MDAA is designed as a set of logical architectural layers, each composed of a set of functional 'modules'. Each module configures and deploys a set of resources which together make up the data analytics environment. Modules may have logical dependencies on each other, and may also leverage non-MDAA resources deployed within the environment, such as those deployed via Landing Zone Accelerator.
While MDAA can be used to implement a comprehensive, end-to-end data analytics platform, it does not result in a closed system. MDAA may be freely integrated with non-MDAA deployed platform elements and analytics capabilities. Any individual layer or module of MDAA can be replaced by a non-MDAA component, and the remaining layers/modules will continue to function (assuming basic functional parity with the replaced MDAA module/layer).
MDAA is conceptually, architecturally, and technically similar to the Landing Zone Accelerator (LZA), providing similar functionality for analytics platform configuration and deployment as LZA does for general cloud platform configuration and deployment. The logical layers of MDAA are specifically designed to be deployed on top of a general-purpose, secure cloud platform such as that deployed by LZA.
See MDAA Security
- Leverages Infrastructure as Code (CDK/CloudFormation, Terraform) as the single agent of deployment and change within the target AWS accounts
- Optional governed, secure self-service deployments via Service Catalog
- Consistent but customizable naming convention across all deployed resources
- Consistent tagging of all generated resources
- Flexible, YAML configuration-driven deployments (CDK Apps) with implicit application of security controls in code
- Ability to orchestrate architectures with both Terraform and CDK-based modules
- Optional publishing of Service Catalog products for end-user self-service of compliant infrastructure
- Reusable CDK L2 and L3 Constructs, and Terraform Modules for consistent application of security controls across modules
- Extensibility through multi-language support using the same approach as CDK itself (via JSII)
- TypeScript/Node.js
- Python 3.x
- Java
- .Net
MDAA is implemented as a set of compliant modules which can be deployed via a unified Deployment/Orchestration layer.
- MDAA CDK Modules - A set of configuration-driven CDK Apps which leverage the MDAA CDK Constructs to define and deploy compliant data analytics environment components as CloudFormation stacks. These apps can be executed directly and independently using the CDK CLI, or composed and orchestrated via the MDAA CLI.
- MDAA Terraform Modules (Preview) - A set of standardized Terraform modules which adhere to security control requirements. These modules can be executed directly and independently using the Terraform CLI, or composed and orchestrated via the MDAA CLI. Note that Terraform integration is currently in preview, and not all MDAA functionality is available.
- MDAA CDK L2 and L3 Constructs - A set of reusable CDK constructs which are leveraged by the rest of the MDAA codebase, but can also be reused to build additional compliant CDK constructs, stacks, or apps. These constructs are each designed for compliance with the AWS Solutions, HIPAA, PCI-DSS, and NIST 800-53 R5 CDK Nag rulesets. Like the CDK codebase MDAA is built on, MDAA constructs are available with bindings for multiple languages, currently including TypeScript/Node.js and Python 3.
- MDAA CLI (Deployment/Orchestration) App - A configuration-driven CLI application which allows for composition and orchestration of multiple MDAA Modules (CDK and Terraform) in order to deploy a compliant, end-to-end data analytics environment. It also ensures that each MDAA Module is deployed with the specified configuration into the specified accounts, while accounting for dependencies between modules.
- (Preview) SageMaker Catalog - Allows SageMaker Catalog domains to be deployed.
- (Preview) DataZone - Allows DataZone domains and environment blueprints to be deployed.
- (Preview) Macie Session - Allows Macie sessions to be deployed at the account level.
- LakeFormation Data Lake Settings - Allows LF Settings to be administered using IaC.
- LakeFormation Access Controls - Allows LF Access Controls to be administered using IaC
- Glue Catalog - Configures the Encryption at Rest settings for Glue Catalog at the account level. Additionally, configures Glue catalogs for cross-account access required by a Data Mesh architecture.
- IAM Roles and Policies - Generates IAM roles for use within the Data Environment
- Audit - Generates Audit resources to use as target for audit data and for querying audit data via Athena
- Audit Trail - Generates CloudTrail to capture S3 Data Events into Audit Bucket
- Service Catalog - Allows Service Catalog Portfolios to be deployed and access granted to principals
- Datalake KMS and Buckets - Generates a set of encrypted data lake buckets and bucket policies. Bucket policies are suitable for direct access via IAM and/or federated roles, as well as indirect access via LakeFormation/Athena.
- Athena Workgroup - Generates Athena Workgroups for use on the Data Lake
- Data Ops Project - Generates shared secure resources for use in Data Ops pipelines, such as Glue Databases, LakeFormation grants, and DataZone Projects/Environments/DataSources
- Data Ops Crawlers - Generates Glue crawlers for use in Data Ops pipelines
- Data Ops Jobs - Generates Glue jobs for use in Data Ops pipelines
- Data Ops Workflows - Generates Glue workflows for orchestrating Data Ops pipelines
- Data Ops Step Functions - Generates Step Functions for orchestrating Data Ops pipelines
- Data Ops Lambda - Deploys Lambda functions for reacting to data events and performing smaller scale data processing
- Data Ops DataBrew - Generates Glue DataBrew resources (Jobs, Recipes) for performing data profiling and cleansing
- (Preview) Data Ops Nifi - Generates Apache Nifi clusters for building event-driven data flows
- (Preview) Data Ops Database Migration Service (DMS) - Generates DMS Replication Instances, Endpoints, and Tasks
- Redshift Data Warehouse - Deploys secure Redshift Data Warehouse clusters
- OpenSearch Domain - Deploys secure OpenSearch Domains and OpenSearch Dashboards
- QuickSight Account - Deploys resources required to provision a QuickSight account
- QuickSight Namespace - Deploys QuickSight namespaces into an account, allowing QuickSight multi-tenancy within the same QuickSight/AWS account
- QuickSight Project - Deploys QuickSight Shared Folders and permissions
- SageMaker Studio Domain - Deploys a secured SageMaker Studio Domain
- SageMaker Notebooks - Deploys secured SageMaker Notebooks
- Data Science Team/Project - Deploys resources to support a team's Data Science activities
- Generative AI Accelerator - Deploys resources for an authenticated GenAI-powered ChatBot
- EC2 - Generates secure EC2 instances and security groups
- SFTP Transfer Family Server - Deploys SFTP Transfer Family service for loading data into the Data Lake
- SFTP Transfer Family User Administrator - Allows SFTP Transfer Family users to be administered in IaC
- DataSync - Deploys DataSync resources for moving data between on-premises storage systems and cloud-based storage services
- EventBridge - Deploys EventBridge resources such as EventBuses
These constructs are specifically designed to be compliant with the AWS Solutions, HIPAA, PCI-DSS, and NIST 800-53 R5 CDK Nag rulesets and are used throughout the MDAA codebase. Additionally, these compliant constructs can be directly leveraged to build new constructs outside of the MDAA codebase (a sketch of this pattern follows the list below).
- Athena Workgroup Constructs
- EC2 Constructs
- (Preview) ECS Constructs
- (Preview) EKS Constructs
- Glue Crawlers, Jobs, and Security Configuration Constructs
- Glue DataBrew Job and Recipe Constructs
- IAM Role Construct
- KMS CMK Construct
- Lambda Role and Function Constructs
- Redshift Cluster Construct
- S3 Bucket Construct
- SageMaker Constructs (Studio and Notebooks)
- OpenSearch Constructs
- SQS Queue Construct
- SNS Topic Construct
- SFTP Transfer Family Server Construct
- (Preview) RDS Aurora Constructs
- (Preview) DynamoDB Construct
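As an illustration of how such constructs can seed new compliant infrastructure code, the sketch below shows the general pattern of a compliant-by-default L2 wrapper. The class name and interface are hypothetical, not the actual MDAA construct API; only the underlying CDK properties are standard:

```typescript
import { Construct } from 'constructs';
import * as kms from 'aws-cdk-lib/aws-kms';
import * as s3 from 'aws-cdk-lib/aws-s3';

// Hypothetical compliant-by-default bucket in the style of the MDAA S3 Bucket
// Construct: security-relevant properties are fixed rather than caller-supplied.
export class CompliantBucket extends s3.Bucket {
  constructor(scope: Construct, id: string, encryptionKey: kms.IKey) {
    super(scope, id, {
      encryption: s3.BucketEncryption.KMS, // CMK encryption at rest
      encryptionKey,
      enforceSSL: true, // deny non-TLS access via bucket policy
      blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
      versioned: true,
    });
  }
}
```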
These modules are specifically designed to be compliant with standard Checkov rules. Each Terraform module will have Checkov applied at plan/deploy time. Note that these modules are managed in a separate MDAA Terraform Git Repo.
- Athena Workgroups
- S3 Datalake
- Data Science Team
- Glue Catalog Settings
- DataOps Glue Crawlers
- DataOps Glue Jobs
- DataOps Glue Workflow
- DataOps Projects
MDAA can be used and extended in the following ways:
- Configuration-driven, compliant, end-to-end Analytics Environments can be configured and deployed using MDAA config files and the MDAA CLI
  - Suited to organizations with minimal IaC development and support capability or bandwidth
  - Accessible by all roles
  - No-code, YAML configurations
  - Simple to complex configurations and deployments
  - High end-to-end compliance assurance
- Custom, code-driven, end-to-end Analytics Environments can be authored and deployed using MDAA reusable constructs
  - Suited to organizations with IaC development and support capability
  - Accessible by developers and builders
  - Multi-language support
  - High compliance assurance for resources deployed via MDAA constructs
- Custom-developed and deployed data-driven applications/workloads can be configured to leverage MDAA-deployed resources via the standard set of SSM params published by all MDAA modules (as sketched below)
  - Independently developed in Terraform, CDK, or CloudFormation
  - Loosely coupled with MDAA via SSM params
  - Workload/application compliance independently validated
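For example, a workload stack developed outside MDAA might resolve an MDAA-published SSM parameter to locate a data lake bucket. This is a minimal sketch: the parameter path below is hypothetical, as actual paths depend on your MDAA naming configuration.

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as ssm from 'aws-cdk-lib/aws-ssm';
import { Construct } from 'constructs';

// Hypothetical parameter path published by an MDAA data lake module.
const RAW_BUCKET_PARAM = '/sample-org/datalake/raw-bucket-name';

export class WorkloadStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
    // Resolve the bucket name at deploy time, keeping the coupling loose.
    const bucketName = ssm.StringParameter.valueForStringParameter(this, RAW_BUCKET_PARAM);
    const rawBucket = s3.Bucket.fromBucketName(this, 'RawBucket', bucketName);
    // ...wire rawBucket into application resources (e.g., grant read to a Lambda).
  }
}
```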
This solution collects anonymous operational metrics to help AWS improve the quality and features of the solution. For more information, including how to disable this capability, please see the [implementation guide](https://docs.aws.amazon.com/cdk/latest/guide/cli.html#version_reporting).
MDAA includes comprehensive testing for both TypeScript/CDK code and Python Lambda/Glue functions:
- TypeScript Testing: CDK unit tests using the CDK Assertions framework (an illustrative test sketch follows the commands below)
- Python Testing: Modern `uv`-based testing with pytest for Lambda functions and Glue jobs
- CI/CD Integration: Automated testing in build pipelines
```bash
# Run all tests
./scripts/test.sh                   # Both TypeScript and Python tests

# Run specific test types
lerna run test --stream             # TypeScript tests only
npm run test:python:all             # Python tests only

# Development workflow
lerna run build && lerna run test   # Build and test TypeScript
uv run pytest                       # Run Python tests (from python-tests/ dir)
```
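For reference, a TypeScript unit test in this style uses the CDK Assertions framework to check synthesized templates. The test below is illustrative only (a generic stack and bucket, not an actual MDAA module) and assumes Jest as the test runner:

```typescript
import { App, Stack } from 'aws-cdk-lib';
import { Template } from 'aws-cdk-lib/assertions';
import * as s3 from 'aws-cdk-lib/aws-s3';

test('bucket blocks public access and enforces SSL', () => {
  const stack = new Stack(new App(), 'TestStack');
  new s3.Bucket(stack, 'Bucket', {
    enforceSSL: true,
    blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
  });

  // Assert against the synthesized CloudFormation template.
  const template = Template.fromStack(stack);
  template.hasResourceProperties('AWS::S3::Bucket', {
    PublicAccessBlockConfiguration: { BlockPublicAcls: true },
  });
});
```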
For detailed development and testing information, see:
- DEVELOPMENT.md - Development setup and testing guide
- PYTHON_TESTING.md - Comprehensive Python testing documentation
- CONTRIBUTING.md - Contribution guidelines
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.