Glue Data Catalog for S3

info

Adding an Ahana data source creates a definition of the connection between Ahana and the data source you want Ahana to connect to. See Data Sources Overview.

Ahana data source definitions do not move or create database objects or data, and do not require the data source to exist when the definition is created.

Ahana Cloud supports user-managed external catalogs. AWS Glue is a popular AWS service that includes the Glue Data Catalog, which manages metadata for structured data stored in an Amazon S3 data lake. This page describes how to add a Glue Data Catalog to Ahana Cloud for Presto.

Step 1: Select AWS Glue Data Catalog as the Connector Type

  1. In the Ahana SaaS Console, select Data Sources.
  2. Select Add Data Source.

  3. Select AWS Glue Data Catalog for Amazon S3.

Step 2: Configure data source general information

Enter Glue catalog name

Ahana derives the Presto catalog name from the Name you enter by removing any special characters that might not be Presto-friendly. A data source name can contain only lowercase letters, numbers, and underscores. The table in the Data Sources tab of the Ahana SaaS Console shows both the name entered here and the derived catalog name. See Data Sources Overview.
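As a rough illustration of the naming rule above, the following sketch checks whether a candidate name uses only lowercase letters, numbers, and underscores. The function name and the regular expression are assumptions made for this example; how Ahana actually derives the catalog name is not specified here, so this only pre-checks the stated character rule.

```python
import re

# Assumed pattern for the stated rule: lowercase letters, digits, underscores.
_NAME_RE = re.compile(r"[a-z0-9_]+")


def is_valid_data_source_name(name: str) -> bool:
    """Return True if name contains only lowercase letters, digits, and underscores."""
    return _NAME_RE.fullmatch(name) is not None
```

A name such as glue_sales_2023 passes this check, while Glue-Sales does not.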

(Optional) Enter Glue catalog description

Enter a Description of the Glue catalog.

Step 3: Configure data source access details

In Access details:

  • Enable or disable AWS Lake Formation
  • Configure the connection information for the Glue Data Catalog

Enable AWS Lake Formation

Select Enable AWS Lake Formation to configure this data source to use AWS Lake Formation to access Glue and S3.

Lake Formation is an AWS service to simplify and secure data lakes backed by AWS Glue and Amazon S3. Ahana-managed Presto clusters can integrate with Lake Formation to enforce data permissions centrally defined in the service. See AWS Lake Formation Integration.

By default, Enable AWS Lake Formation is not selected, and access to Glue and the underlying S3 data is controlled directly by AWS IAM roles.

note

Selecting Enable AWS Lake Formation disables Glue Data Catalog ID.

Connection Configuration

In Connection Configuration, use the AWS Region drop-down to select the AWS Region of the Glue data catalog.

If the Glue Data Catalog is in a different AWS account than the IAM role, enter the 12-digit ID of the AWS account that contains the Glue Data Catalog in Glue Data Catalog ID.
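Before pasting a value into Glue Data Catalog ID, it may help to confirm that it has the expected 12-digit shape. This is a minimal sketch; the function name is hypothetical, and it checks only the format, not whether the account exists.

```python
import re


def is_aws_account_id(value: str) -> bool:
    """Return True if value is exactly 12 ASCII digits."""
    return re.fullmatch(r"[0-9]{12}", value) is not None
```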

Directly access Glue and S3

If you want to directly access Glue and S3, leave the Enable AWS Lake Formation checkbox unchecked.

When you access Glue and S3 directly, access is controlled through IAM. You can use either a separate IAM Role for each service or a single IAM Role for both. Ahana provides a CloudFormation template to help you create the IAM Role, but you can also create the IAM Role manually. Decide whether to set up the IAM Role using CloudFormation (recommended) or manually; you do not need to do both.

Set up your IAM Role using CloudFormation

  1. In Roles Setup, select CloudFormation.

  2. In CloudFormation Quick Create, select Open CloudFormation to launch a new tab to redirect you to the CloudFormation Quick create stack interface. You may be asked to log into your AWS account.
  3. Check the acknowledgment checkbox at the bottom of the screen and select the Create stack button. After a few minutes, the stack will complete and the IAM Role will be created.

  4. As indicated in the user interface, once the CloudFormation stack is created, you need to obtain the IAM Role ARN. Go to the Outputs tab of the CloudFormation stack and copy the value of the AhanaCloudDataSourceGlueS3RoleArn key. Paste the value into the Glue/S3 Role ARN input box.
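If you prefer to read the stack output from the command line rather than the Outputs tab, the JSON printed by aws cloudformation describe-stacks contains the same key and value. The sketch below, with a hypothetical helper name, pulls the AhanaCloudDataSourceGlueS3RoleArn value out of such a response once it has been parsed into a dictionary. It assumes the standard Stacks / Outputs / OutputKey / OutputValue layout of that response and does not call AWS itself.

```python
from typing import Any, Optional

OUTPUT_KEY = "AhanaCloudDataSourceGlueS3RoleArn"


def find_stack_output(response: dict[str, Any], key: str = OUTPUT_KEY) -> Optional[str]:
    """Return the value of the named stack output, or None if it is absent."""
    for stack in response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == key:
                return output.get("OutputValue")
    return None
```

The returned string is the value to paste into the Glue/S3 Role ARN input box.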

Set up your IAM Role manually

  1. Under Roles Setup, select the Manual radio button. If you need instructions on how to create the IAM Role manually, please see the Appendix.

  2. In Glue Role ARN, enter the IAM Role ARN.

  3. If you are using a single IAM Role for both Glue and S3 access, select the Same as Glue role radio button in the S3 Role ARN section and leave the input box empty.

  4. If you are using a separate IAM Role for S3 access, select the Use different role radio button in the S3 Role ARN section and enter that IAM Role ARN in the S3 Role ARN input box.

Access via Lake Formation

To enable the Glue Data Source to use Lake Formation, check the Enable AWS Lake Formation checkbox. This will change the user interface for Lake Formation configuration.

Configure Ahana Integration with CloudFormation

With a CloudFormation stack, Ahana can automatically configure your AWS Lake Formation service to work with Ahana-managed clusters. If you want to do the setup manually or get the details on the setup steps, please see the Appendix.

  1. Click the Open CloudFormation button. This will launch a new tab to redirect you to the CloudFormation Quick create stack interface. You may be asked to log into your AWS account.
  2. Check the acknowledgment checkbox at the bottom of the screen and press the Create stack button. After a few minutes, the stack will complete and Ahana Lake Formation integration will be configured.

Map Presto users to IAM Roles

With Lake Formation enabled, Presto users in Ahana-managed Presto clusters can assume IAM Roles as Lake Formation principals. The data permission policies associated with the assumed IAM Roles are enforced.

The Available Presto Users section lists all available Presto users in Ahana, each with an optional input box for an IAM Role ARN. Any well-formed, non-blank IAM Role ARN is mapped to the Presto user in the same row when the Glue data source is attached to a cluster. When that Presto user uses this Glue catalog backed by Lake Formation, the mapped IAM Role is used as the Lake Formation principal, so all the data permissions associated with that principal apply. See the Appendix for the requirements on IAM Roles for Lake Formation.

  1. Enter a well-formed IAM Role ARN for each Presto User you want to have Lake Formation access. Any blank IAM Role ARN field will be ignored and no mapping entry will be created for the corresponding Presto user. An unmapped Presto user will not have access to any resources in the underlying data catalog.
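The mapping behavior described above can be sketched as follows: blank entries produce no mapping, and non-blank entries must look like an IAM Role ARN. The function name and the regular expression are assumptions for illustration; the console's exact definition of "well-formed" is not documented on this page.

```python
import re

# Assumed shape of a role ARN: arn:aws:iam::<12-digit account>:role/<name or path>.
_ROLE_ARN_RE = re.compile(r"arn:aws:iam::[0-9]{12}:role/[A-Za-z0-9_+=,.@/-]+")


def build_user_role_mapping(rows: dict[str, str]) -> dict[str, str]:
    """Map Presto users to IAM Role ARNs, skipping users whose ARN field is blank."""
    mapping: dict[str, str] = {}
    for user, arn in rows.items():
        arn = arn.strip()
        if not arn:
            continue  # Blank field: no mapping entry, so no Lake Formation access.
        if _ROLE_ARN_RE.fullmatch(arn) is None:
            raise ValueError(f"IAM Role ARN for {user!r} is not well-formed: {arn!r}")
        mapping[user] = arn
    return mapping
```

Users missing from the returned dictionary correspond to the unmapped Presto users that have no access to resources in the underlying data catalog.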

Authorization Service

In Access Details, use the Authorization Service drop-down to select the authorization service name.

info

If Enable AWS Lake Formation is selected, or if no Apache Ranger authorization service definition exists, then the Authorization Service drop-down is not displayed.

See Add an Apache Ranger Authorization Service to create an Apache Ranger authorization service definition.

Step 4: Add the data source

When you have configured the data source, select Add Data Source.

You can now connect to this data source from your Presto clusters.

Appendix

Create AWS IAM Roles for AWS Glue and Amazon S3 access

The Presto clusters running in the Ahana Compute Plane need access to your AWS Glue catalog for the metadata as well as your Amazon S3 buckets for the data.

Ahana uses named AWS IAM Roles with Presto. Even though the Ahana Compute Plane and the Presto clusters are deployed in your account, AWS requires that you grant the role being used access to AWS Glue and Amazon S3.

You can use the same role or different roles for AWS Glue and Amazon S3.

  • Go to the IAM Roles page in the AWS console.

  • Select the Glue role that you configured in Ahana's "Glue Role ARN" field (see above), and then click Attach Policies. If you don't want to use an existing role, you can create a new role.

  • Filter on S3 and select AmazonS3FullAccess from the list of policies.

  • Next, filter on Glue and select AWSGlueConsoleFullAccess from the list of policies.

  • Attach both policies.

  • Next, go to Trust Relationships and click Edit trust relationship.

  • Copy and paste the following into the JSON editor. This allows the role to be assumed, so that any Ahana Presto cluster can access AWS Glue or Amazon S3. If you want to grant access to only certain Ahana Presto clusters, see the section below.

important

Remember to replace <accountNumber> in the JSON below with your AWS account number.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<accountNumber>:root"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
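To avoid hand-editing the placeholder, the same trust policy can be generated from an account number. This is a minimal sketch using only the Python standard library; the function name is hypothetical and the account number shown is a dummy value.

```python
import json


def account_trust_policy(account_number: str) -> dict:
    """Build a trust policy that lets principals in the given account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_number}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the trust relationship editor.
    print(json.dumps(account_trust_policy("123456789000"), indent=2))
```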

Restrict AWS Glue and Amazon S3 access to only specific clusters

  • To restrict access to AWS Glue and Amazon S3 to only certain clusters, you can provide the ARN of the nodeInstance role that Ahana creates for each cluster.

  • Select the role you configured in Ahana for AWS Glue or Amazon S3, go to Trust Relationships, and click Edit trust relationship.

  • Copy and paste the following into the JSON editor, replacing the placeholder with the node instance role ARN of the cluster. This allows the role to be assumed only by that Ahana Presto cluster.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<YOUR Ahana Cluster Node instance ARN>"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

  • To find the ARN that replaces <YOUR Ahana Cluster Node instance ARN> in the JSON above, filter on AHANA-CF-EKSNG-STACK- on the Roles screen.

  • Open the role for the cluster that you want to grant access to AWS Glue or Amazon S3, copy its role ARN, and paste it into the JSON.

Example: arn:aws:iam::123456789000:role/AHANA-CF-EKSNG-STACK-YOUR-CLUSTER-NAME-NodeInstanceRole-SOME-HASH

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789000:role/AHANA-CF-EKSNG-STACK-YOUR-CLUSTER-NAME-NodeInstanceRole-SOME-HASH"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
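When several clusters need access, the filtering and pasting steps can be sketched in code. The helpers below are hypothetical: the first keeps only role ARNs whose role name starts with the AHANA-CF-EKSNG-STACK- prefix (assuming the roles have no IAM path), and the second builds a trust policy that lists those ARNs as principals. IAM accepts a list of ARNs in the Principal element.

```python
NODE_ROLE_PREFIX = "AHANA-CF-EKSNG-STACK-"


def node_instance_role_arns(role_arns: list[str]) -> list[str]:
    """Keep only ARNs whose role name starts with the Ahana node-role prefix."""
    selected = []
    for arn in role_arns:
        # The role name is everything after ":role/" (assumes no IAM path).
        _, sep, role_name = arn.partition(":role/")
        if sep and role_name.startswith(NODE_ROLE_PREFIX):
            selected.append(arn)
    return selected


def cluster_trust_policy(node_role_arns: list[str]) -> dict:
    """Build a trust policy that only the given node instance roles can assume."""
    if not node_role_arns:
        raise ValueError("at least one node instance role ARN is required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": list(node_role_arns)},
                "Action": "sts:AssumeRole",
            }
        ],
    }
```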
tip

The AWS Glue Data Catalog can be populated by defining databases and tables or by using a crawler. See the AWS Glue documentation for details.

Create IAM Roles for AWS Lake Formation

When creating IAM Roles for AWS Lake Formation that will be mapped to Presto users, the following permissions are required:

  1. AWSGlueConsoleFullAccess
  2. lakeformation:GetDataAccess

In addition, the IAM Role trust policy must allow the following actions:

  1. sts:AssumeRole
  2. sts:TagSession

The principal of the trust policy should be the AWS account associated with your Ahana compute plane.

Sample trust policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": ["arn:aws:iam::<your-aws-account>:root"]
      },
      "Action": ["sts:AssumeRole", "sts:TagSession"],
      "Condition": {}
    }
  ]
}
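A quick way to confirm that a trust policy document allows both required actions is sketched below. The function name is hypothetical, and the check is deliberately simple: it looks only at literal action names in Allow statements and does not interpret wildcards such as sts:* or any conditions.

```python
REQUIRED_ACTIONS = {"sts:AssumeRole", "sts:TagSession"}


def missing_trust_actions(policy: dict) -> set[str]:
    """Return the required actions that no Allow statement in the policy lists."""
    allowed: set[str] = set()
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]  # IAM accepts a single string or a list.
        allowed.update(actions)
    return REQUIRED_ACTIONS - allowed
```

An empty result means both sts:AssumeRole and sts:TagSession are present.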

Configure Ahana Lake Formation integration manually

To set up Ahana to integrate with your AWS Lake Formation service, you need to make changes to your Lake Formation Data Lake settings.

  1. Fetch the current Data Lake Settings using the AWS CLI:
aws --profile <profile> --region <region> lakeformation get-data-lake-settings

The following is an example response:

{
  "DataLakeSettings": {
    "DataLakeAdmins": [
      {
        "DataLakePrincipalIdentifier": "arn:aws:iam::<your-aws-account>:user/<your-aws-user>"
      }
    ],
    "CreateDatabaseDefaultPermissions": [],
    "CreateTableDefaultPermissions": [],
    "TrustedResourceOwners": [],
    "AllowExternalDataFiltering": false,
    "ExternalDataFilteringAllowList": [],
    "AuthorizedSessionTagValueList": []
  }
}
  2. Modify the Data Lake Settings retrieved above to allow your account to perform data filtering and to add Ahana as a trusted partner.
    • AllowExternalDataFiltering: Boolean. If true, clusters such as Ahana can access data in Amazon S3 locations that are registered with Lake Formation. If false or null, clusters cannot access data in those locations. Set this to true.
    • ExternalDataFilteringAllowList: An array of DataLakePrincipal objects (no more than 10) that lists the IDs of the AWS accounts that are to perform data filtering. Provide the AWS account IDs of the assumed principals (that is, the IAM Roles) that will have data permissions and data filters applied.
    • AuthorizedSessionTagValueList: An array of UTF-8 strings that represent the tags of accepted, trusted third-party integrators, such as Ahana.
{
  "DataLakeSettings": {
    "DataLakeAdmins": [
      {
        "DataLakePrincipalIdentifier": "arn:aws:iam::<your-aws-account>:user/<your-aws-user>"
      }
    ],
    "CreateDatabaseDefaultPermissions": [],
    "CreateTableDefaultPermissions": [],
    "TrustedResourceOwners": [],
    "AllowExternalDataFiltering": true,
    "ExternalDataFilteringAllowList": [
      {
        "DataLakePrincipalIdentifier": "<your-aws-account>"
      }
    ],
    "AuthorizedSessionTagValueList": ["ahana"]
  }
}
  3. Update the Data Lake Settings using the AWS CLI. Pass only the contents of DataLakeSettings, and keep the opening quote on the same line as --data-lake-settings so that the shell reads it as one command:
aws --profile <profile> --region <region> lakeformation put-data-lake-settings --catalog-id '<your-aws-account>' --data-lake-settings '{
  "DataLakeAdmins": [
    {
      "DataLakePrincipalIdentifier": "arn:aws:iam::<your-aws-account>:user/<your-aws-user>"
    }
  ],
  "CreateDatabaseDefaultPermissions": [],
  "CreateTableDefaultPermissions": [],
  "TrustedResourceOwners": [],
  "AllowExternalDataFiltering": true,
  "ExternalDataFilteringAllowList": [
    {
      "DataLakePrincipalIdentifier": "<your-aws-account>"
    }
  ],
  "AuthorizedSessionTagValueList": [
    "ahana"
  ]
}'
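The manual edit of the settings can also be scripted. The sketch below takes the parsed get-data-lake-settings response and returns the inner settings object with the three changes applied, ready to be serialized and passed to --data-lake-settings. The function name is hypothetical, and the code only transforms the dictionary; it does not call AWS.

```python
import copy
import json


def enable_ahana_filtering(response: dict, account_id: str, tag: str = "ahana") -> dict:
    """Return updated Data Lake Settings without modifying the input response."""
    settings = copy.deepcopy(response["DataLakeSettings"])
    settings["AllowExternalDataFiltering"] = True

    # Add the account to the allow list only if it is not already present.
    allow_list = settings.setdefault("ExternalDataFilteringAllowList", [])
    entry = {"DataLakePrincipalIdentifier": account_id}
    if entry not in allow_list:
        allow_list.append(entry)

    # Add the session tag only if it is not already present.
    tags = settings.setdefault("AuthorizedSessionTagValueList", [])
    if tag not in tags:
        tags.append(tag)
    return settings


if __name__ == "__main__":
    current = {
        "DataLakeSettings": {
            "DataLakeAdmins": [],
            "AllowExternalDataFiltering": False,
            "ExternalDataFilteringAllowList": [],
            "AuthorizedSessionTagValueList": [],
        }
    }
    print(json.dumps(enable_ahana_filtering(current, "123456789000"), indent=2))
```

Because existing entries are kept and duplicates are skipped, running the function on settings that are already configured leaves them unchanged.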