AWS Lake Formation A Complete Guide to Centralized Data Management Anusha Dwivedula

  • Published Jun 22, 2023
  • AWS Lake Formation A Complete Guide to Centralized Data Management
    Anusha Dwivedula
    AWS Community Day Midwest 2023
    Chicago AWS user group
    Morningstar, a leading investment research firm, is known for being a data powerhouse, and to truly unlock the value of our data, we embarked on a journey to build a data lake on AWS in 2019. The data lake is a central repository for all of Morningstar's data, combining data from thousands of disparate systems. It allowed us to ingest data quickly, make it accessible via the AWS Glue Data Catalog, and grant consumers access to query the data via Amazon Athena. Data security is at the solution's core, and to enforce our access permissions, we chose AWS Lake Formation tag-based access control (TBAC).
    However, our consumers pushed us for better query performance and enhanced analytical capabilities. We realized we needed a data warehouse to cater to these consumer requirements, so we evaluated Amazon Redshift. Amazon Redshift provides us with features that we could use to work with our consumers and enable their analytical needs:
    ● Better performance for consumers’ analytical requirements
    ● Ability to tune query performance with user-specified sort keys and distribution keys
    ● Ability to have different representations of the same data via views and materialized views
    ● Consistent query performance, regardless of concurrency
    Because our Lake Formation-enforced data lake is a central data repository for all our data, it makes sense for us to flow the data permissions from the data lake into Amazon Redshift. We utilize AWS Identity and Access Management (IAM) authentication and want to centralize the governance of permissions based on IAM roles and groups. For each AWS Glue database and table, we have a corresponding Amazon Redshift schema and table. Our goal was to ensure customers with access to AWS Glue tables via Lake Formation also have access to the related tables in Amazon Redshift.
    However, we faced a problem with user-based entitlements as we moved to Amazon Redshift.
    Entitlements Technical Challenge:
    Amazon Redshift supports resource-based entitlements but doesn’t support tag-based entitlements. The challenge we had to overcome was mapping our existing tag-based entitlements in Lake Formation to the resource-based entitlements in Amazon Redshift.
    Solution Overview:
    We wanted to synchronize our Lake Formation tag ontology and classifications to the Amazon Redshift permission model to solve this mismatch. To do this, we mapped Lake Formation tags and grants to Amazon Redshift grants with the following steps:
    1. Map all the resources (databases, schemas, tables, and more) in Lake Formation that are tagged to their equivalent Amazon Redshift tables.
    2. Translate each policy in Lake Formation on a tag expression to a set of Amazon Redshift table grants and revokes.
    The net result is that when a tag or policy changes in Lake Formation, a corresponding set of grants or revokes is applied to the equivalent Amazon Redshift tables to keep our entitlements in sync.
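
The two steps above can be sketched roughly as follows. This is only a minimal illustration, not the components described in the talk: the names `TagPolicy`, `matches`, `desired_grants`, and `sync_statements` are hypothetical, and so are the example tags and group names. It assumes each Glue `database.table` maps to a Redshift `schema.table` of the same name, that each principal corresponds to a Redshift group, and that a tag expression matches a table when every key in the expression carries one of the allowed values. The sketch only builds SQL strings; it does not call Lake Formation or Redshift.

```python
from dataclasses import dataclass

# Hypothetical sketch: every name here is illustrative, not from the talk.

@dataclass
class TagPolicy:
    principal: str    # assumed to correspond to a Redshift group
    expression: dict  # tag key -> set of allowed tag values


def matches(table_tags: dict, expression: dict) -> bool:
    """True when every key in the expression has an allowed value on the table."""
    return all(table_tags.get(key) in allowed for key, allowed in expression.items())


def desired_grants(tagged_tables: dict, policies: list) -> set:
    """Step 1 and 2: expand tag policies into (principal, 'schema.table') pairs."""
    return {
        (policy.principal, table)
        for policy in policies
        for table, tags in tagged_tables.items()
        if matches(tags, policy.expression)
    }


def sync_statements(current: set, desired: set) -> list:
    """Diff current and desired entitlements into GRANT and REVOKE statements."""
    grants = [
        f"GRANT SELECT ON TABLE {table} TO GROUP {principal};"
        for principal, table in sorted(desired - current)
    ]
    revokes = [
        f"REVOKE SELECT ON TABLE {table} FROM GROUP {principal};"
        for principal, table in sorted(current - desired)
    ]
    return grants + revokes


if __name__ == "__main__":
    # Made-up example data: two tagged tables and one tag policy.
    tagged_tables = {
        "sales.orders": {"domain": "sales", "sensitivity": "public"},
        "sales.customers": {"domain": "sales", "sensitivity": "pii"},
    }
    policies = [
        TagPolicy("analysts", {"domain": {"sales"}, "sensitivity": {"public"}}),
    ]
    # Entitlements assumed to exist in Redshift before the sync.
    current = {("analysts", "sales.customers")}
    for statement in sync_statements(current, desired_grants(tagged_tables, policies)):
        print(statement)
```

Computing the full desired set and diffing it against the current one keeps the sync idempotent: rerunning it after a tag or policy change yields only the grants and revokes that are still missing.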
    Conclusion:
    As part of the solution, we developed two components that map the tag-based access controls to Amazon Redshift permissions. The solution improved the time to market for our data and provided consistent entitlements across different business-driven categories.
    Reference:
    aws.amazon.com... e-formation-to-manage-permissions-for-an-amazon-redshift-data-warehouse/
    AWS lake formation
