aws/modern-data-architecture-accelerator

Modern Data Architecture Accelerator (MDAA)

MDAA Overview

The Modern Data Architecture Accelerator (MDAA) is designed to accelerate the implementation of a secure, compliant, and fully capable Modern Data Architecture on AWS. It allows organizations of all sizes and levels of sophistication to focus quickly on driving business outcomes from their data while maintaining high assurance of security compliance. Specifically, organizations can rapidly solve data-driven problems using both traditional analytics and contemporary capabilities such as generative AI.

MDAA provides rapid deployment of all major elements of a Modern Data Architecture, such as Ingest, Persistence, Governance, DataOps, Consumption, Visual Analytics, Data Science, and AI/ML. Additionally, MDAA has been designed to accelerate compliance with AWS Solutions, NIST 800-53 Rev5 (US), HIPAA, PCI-DSS CDK Nag Rulesets, as well as ITSG-33 (Canada) security control requirements. Terraform modules are compliant with standard Checkov security policies. This combination of integral compliance and broad, configuration-driven capability allows for rapid design and deployment of simple to complex data analytics environments--including Lake House and Data Mesh architectures--while minimizing security compliance risks.

Target Usage

  • Any organization looking to rapidly deploy a secure Modern Data Architecture in support of data-driven business/mission requirements, such as Analytics, Business Intelligence, AI/ML, and Generative AI
  • Large organizations looking to design and deploy complex Modern Data Architectures such as Lake House or Data Mesh.
  • Small to Medium organizations looking for code-free, configuration-driven deployment of a Data Analytics platform.
  • Builder organizations who are building custom, code-driven data analytics architectures through use of reusable compliant constructs across multiple languages.
  • Any organization with elevated compliance/regulatory requirements.

Getting Started

Getting started with MDAA requires the following steps:

  1. Architecture and Design - A physical platform architecture should be defined either from scratch, or derived from an AWS/MDAA reference design.
  2. Configuration - One or more MDAA configuration files are authored, along with individual configuration files for each MDAA module.
  3. (Optional) Customization - Optionally, resources and stacks can be customized through code-based escape hatches before deployment.
  4. Predeployment Preparation - In this step, the MDAA NPM packages are built and published to a private NPM repo.
  5. Deployment - Each MDAA configuration file is deployed either manually or automatically (via CI/CD).

Sample Architectures

Alternatively, you can jump directly into a set of sample architectures and configurations. Note that these sample configurations can be used as a starting point for much more sophisticated architectures.

Sample DataOps Blueprints

Additionally, once your Modern Data Architecture is deployed, you can use these sample Data Operations blueprints, including MDAA configs and DataOps code, to start solving your data-driven problems.


Logical Design

MDAA is designed as a set of logical architectural layers, each constituted by a set of functional 'modules'. Each module configures and deploys a set of resources which constitute the data analytics environment. Modules may have logical dependencies on each other, and may also leverage non-MDAA resources deployed within the environment, such as those deployed via Landing Zone Accelerator.

While MDAA can be used to implement a comprehensive, end to end data analytics platform, it does not result in a closed system. MDAA may be freely integrated with non-MDAA deployed platform elements and analytics capabilities. Any individual layer or module of MDAA can be replaced by a non-MDAA component, and the remaining layers/modules will continue to function (assuming basic functional parity with the replaced MDAA module/layer).

MDAA is conceptually, architecturally, and technically similar in nature to the Landing Zone Accelerator (LZA), providing similar functionality for analytics platform configuration and deployment as LZA does for general cloud platform configuration and deployment. The logical layers of MDAA are specifically designed to be deployed on top of a general purpose, secure cloud platform such as that deployed by LZA.

Mdaa Logical Architecture


Design Principles

Security and Compliance

See MDAA Security

Governance

  • Leverage Infrastructure as Code (CDK/CloudFormation, Terraform) as the single agent of deployment and change within the target AWS accounts
  • Optional governed, secure self-service deployments via Service Catalog
  • Consistent but customizable naming convention across all deployed resources
  • Consistent tagging of all generated resources
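As an illustration of what a consistent naming and tagging convention can look like in code, here is a minimal Python sketch. The name layout (`org-env-domain-module-resource`), the tag keys, the 64-character limit, and both function names are assumptions made for this example; they are not taken from the MDAA codebase.

```python
import re


def resource_name(org, env, domain, module, resource, max_length=64):
    """Join naming parts into one lowercase, hyphen-separated name.

    The part order and the length limit are hypothetical choices.
    """
    parts = [org, env, domain, module, resource]
    # Lowercase each part and collapse anything non-alphanumeric into "-".
    cleaned = [re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-") for part in parts]
    name = "-".join(part for part in cleaned if part)
    if len(name) > max_length:
        raise ValueError(f"name exceeds {max_length} characters: {name}")
    return name


def standard_tags(org, env, domain, module):
    """Return the same tag set for every resource a module generates."""
    return {"org": org, "env": env, "domain": domain, "module": module}


if __name__ == "__main__":
    print(resource_name("Acme", "dev", "shared", "datalake", "Raw Bucket"))
    print(standard_tags("acme", "dev", "shared", "datalake"))
```

Deriving every name and tag set from one pair of helpers is what keeps the convention consistent while leaving a single place to customize it.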

Accessibility, Flexibility and Extensibility

  • Flexible, YAML configuration-driven deployments (CDK Apps) with implicit application of security controls in code
  • Ability to orchestrate architectures with both Terraform and CDK-based modules
  • Optional publishing of Service Catalog products for end-user self-service of compliant infrastructure
  • Reusable CDK L2 and L3 Constructs, and Terraform Modules for consistent application of security controls across modules
  • Extensibility through multi-language support using the same approach as CDK itself (via JSII)
    • TypeScript/Node.js
    • Python 3.x
    • Java
    • .NET

MDAA Components

MDAA is implemented as a set of compliant modules which can be deployed via a unified Deployment/Orchestration layer.

  • MDAA CDK Modules - A set of configuration-driven CDK Apps, which leverage the MDAA CDK Constructs in order to define and deploy compliant data analytics environment components as CloudFormation stacks. These apps can be executed directly and independently using the CDK CLI, or composed and orchestrated via the MDAA CLI.

  • MDAA Terraform Modules (Preview) - A set of standardized Terraform modules which adhere to security control requirements. These modules can be executed directly and independently using the Terraform CLI, or composed and orchestrated via the MDAA CLI. Note that Terraform integration is currently in preview, and not all MDAA functionality is available.

  • MDAA CDK L2 and L3 Constructs - A set of reusable CDK constructs which are leveraged by the rest of the MDAA codebase, but can also be reused to build additional compliant CDK constructs, stacks, or apps. These constructs are each designed for compliance with AWS Solutions, HIPAA, PCI-DSS and NIST 800-53 R5 CDK Nag rulesets. Similar to the CDK codebase MDAA is built on, MDAA constructs are available with bindings for multiple languages, currently including TypeScript/Node.js and Python 3.

  • MDAA CLI (Deployment/Orchestration) App - A configuration-driven CLI application which allows for composition and orchestration of multiple MDAA Modules (CDK and Terraform) in order to deploy a compliant end-to-end data analytics environment. It also ensures that each MDAA Module is deployed with the specified configuration into the specified accounts while also accounting for dependencies between modules.
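To make "accounting for dependencies between modules" concrete, the following Python sketch orders a set of modules into deployment stages using a topological sort from the standard library. It is an illustration of the general technique only, not the MDAA CLI's actual implementation; the module names and the dependency map are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical modules mapped to the modules they depend on.
MODULE_DEPENDENCIES = {
    "roles": set(),
    "glue-catalog": set(),
    "datalake": {"roles"},
    "athena": {"datalake"},
    "lakeformation": {"datalake", "glue-catalog"},
}


def deployment_stages(dependencies):
    """Group modules into stages that depend only on earlier stages."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    stages = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies deployed.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    stages = deployment_stages(MODULE_DEPENDENCIES)
    for number, stage in enumerate(stages, start=1):
        print(f"stage {number}: {', '.join(stage)}")
```

Modules within one stage have no dependencies on each other, so an orchestrator could deploy them in any order, or in parallel.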

MDAA Code Architecture


Available MDAA Modules (CDK Apps and L3 Constructs)

Governance Modules (CDK Apps and L3 Constructs)

  • (Preview) SageMaker Catalog - Allows SageMaker Catalog domains to be deployed.
  • (Preview) DataZone - Allows DataZone domains and environment blueprints to be deployed.
  • (Preview) Macie Session - Allows Macie sessions to be deployed at the account level.
  • LakeFormation Data Lake Settings - Allows LF Settings to be administered using IaC.
  • LakeFormation Access Controls - Allows LF Access Controls to be administered using IaC
  • Glue Catalog - Configures the Encryption at Rest settings for Glue Catalog at the account level. Additionally, configures Glue catalogs for cross account access required by a Data Mesh architecture.
  • IAM Roles and Policies - Generates IAM roles for use within the Data Environment
  • Audit - Generates Audit resources to use as target for audit data and for querying audit data via Athena
  • Audit Trail - Generates CloudTrail to capture S3 Data Events into Audit Bucket
  • Service Catalog - Allows Service Catalog Portfolios to be deployed and access granted to principals

Data Lake Modules (CDK Apps and L3 Constructs)

  • Datalake KMS and Buckets - Generates a set of encrypted data lake buckets and bucket policies. Bucket policies are suitable for direct access via IAM and/or federated roles, as well as indirect access via LakeFormation/Athena.
  • Athena Workgroup - Generates Athena Workgroups for use on the Data Lake

Data Ops Modules (CDK Apps and L3 Constructs)

Data Analytics Modules (CDK Apps and L3 Constructs)

AI/Data Science Modules (CDK Apps and L3 Constructs)

Core/Utility Modules (CDK Apps and L3 Constructs)

  • EC2 - Generates secure EC2 instances and Security groups
  • SFTP Transfer Family Server - Deploys SFTP Transfer Family service for loading data into the Data Lake
  • SFTP Transfer Family User Administrator - Allows SFTP Transfer Family users to be administered in IaC
  • DataSync - Deploys DataSync resources for moving data between on-premises storage systems and cloud-based storage services
  • EventBridge - Deploys EventBridge resources such as EventBuses

Available MDAA Reusable CDK L2 Constructs

These constructs are specifically designed to be compliant with the AWSSolutions, HIPAA, PCI-DSS, and NIST 800-53 R5 CDK Nag Rulesets and are used throughout the MDAA codebase. Additionally, these compliant constructs can be directly leveraged to build new constructs outside of the MDAA codebase.


Available MDAA Reusable Terraform Modules (Preview)

These modules are specifically designed to be compliant with standard Checkov rules. Each Terraform module will have Checkov applied at plan/deploy time. Note that these modules are managed in a separate MDAA Terraform Git Repo.

  • Athena Workgroups
  • S3 Datalake
  • Data Science Team
  • Glue Catalog Settings
  • DataOps Glue Crawlers
  • DataOps Glue Jobs
  • DataOps Glue Workflow
  • DataOps Projects

Using/Extending MDAA Overview

MDAA can be used and extended in the following ways:

  • Configuration-driven, compliant, end to end Analytics Environments can be configured and deployed using MDAA config files and the MDAA CLI

    • Organizations with minimal IaC development and support capability or bandwidth
    • Accessible by all roles
      • No code; YAML configurations only
    • Simple to complex configurations and deployments
    • High end to end compliance assurance
  • Custom, code-driven end to end Analytics Environments can be authored and deployed using MDAA reusable constructs

    • Organizations with IaC development and support capability
    • Accessible by Developers and Builders
    • Multi-language support
    • High compliance assurance for resources deployed via MDAA constructs
  • Custom-developed and deployed data-driven applications/workloads can be configured to leverage MDAA-deployed resources via the standard set of SSM params which are published by all MDAA modules

    • Independently developed in Terraform, CDK or CFN
    • Loosely coupled with MDAA via SSM Params
    • Workload/Application compliance independently validated
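The loose coupling via SSM parameters can be sketched as follows in Python: a workload looks up a resource identifier by parameter name at run time instead of hard-coding it. The parameter path layout (`/org/domain/module/key`) and the helper names are assumptions for illustration; check the parameters your deployed modules actually publish. The `get_parameter(Name=...)` call and its response shape match the boto3 SSM client, and a small in-memory stand-in is included so the sketch runs without AWS access.

```python
def ssm_param_name(org, domain, module, key):
    """Build a parameter path. This layout is a hypothetical example."""
    return "/" + "/".join([org, domain, module, key])


def resolve_mdaa_resource(ssm_client, org, domain, module, key):
    """Fetch one published value through an SSM-style client."""
    name = ssm_param_name(org, domain, module, key)
    response = ssm_client.get_parameter(Name=name)
    return response["Parameter"]["Value"]


class FakeSsmClient:
    """In-memory stand-in that mimics the get_parameter response shape."""

    def __init__(self, values):
        self._values = values

    def get_parameter(self, Name):
        return {"Parameter": {"Name": Name, "Value": self._values[Name]}}


if __name__ == "__main__":
    # Hypothetical parameter name and value.
    client = FakeSsmClient({"/acme/shared/datalake/bucket/raw/name": "acme-raw"})
    print(resolve_mdaa_resource(client, "acme", "shared", "datalake", "bucket/raw/name"))
```

In a real workload, passing `boto3.client("ssm")` in place of `FakeSsmClient` would perform the same lookup against AWS Systems Manager Parameter Store.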

MDAA Usage and Extension

Metrics collection

This solution collects anonymous operational metrics to help AWS improve the quality and features of the solution. For more information, including how to disable this capability, please see the [implementation guide](https://docs.aws.amazon.com/cdk/latest/guide/cli.html#version_reporting).

Development and Testing

MDAA includes comprehensive testing for both TypeScript/CDK code and Python Lambda/Glue functions:

  • TypeScript Testing: CDK unit tests using CDK Assertions framework
  • Python Testing: Modern uv-based testing with pytest for Lambda functions and Glue jobs
  • CI/CD Integration: Automated testing in build pipelines
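For orientation, a pytest-style unit test for a Lambda function might look like the minimal sketch below. The handler and its behavior are hypothetical and not taken from the MDAA codebase; only the `Records[].s3.object.key` layout follows the standard S3 event notification structure.

```python
import json


def handler(event, context):
    """Hypothetical Lambda handler: reports the S3 object keys in an event."""
    records = event.get("Records", [])
    keys = [record["s3"]["object"]["key"] for record in records]
    return {"statusCode": 200, "body": json.dumps({"count": len(keys), "keys": keys})}


def test_handler_counts_s3_records():
    event = {
        "Records": [
            {"s3": {"object": {"key": "raw/a.csv"}}},
            {"s3": {"object": {"key": "raw/b.csv"}}},
        ]
    }
    result = handler(event, None)
    assert result["statusCode"] == 200
    assert json.loads(result["body"]) == {"count": 2, "keys": ["raw/a.csv", "raw/b.csv"]}


def test_handler_accepts_empty_event():
    result = handler({}, None)
    assert json.loads(result["body"]) == {"count": 0, "keys": []}
```

pytest discovers functions prefixed with `test_` automatically, so a file like this needs no extra registration to be picked up by a test run.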

Quick Start for Developers

# Run all tests
./scripts/test.sh              # Both TypeScript and Python tests

# Run specific test types
lerna run test --stream        # TypeScript tests only
npm run test:python:all        # Python tests only

# Development workflow
lerna run build && lerna run test    # Build and test TypeScript
uv run pytest                       # Run Python tests (from python-tests/ dir)

For detailed development and testing information, see:

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.
