External Hive Metastore for S3

Adding a data source in Ahana creates a definition of the connection between Ahana and that data source. See Data Sources Overview.

Ahana data source definitions do not move or create database objects or data, and do not require the data source to exist when the definition is created.

The external Hive metastore for S3 is a central repository containing Apache Hive metadata. It stores the data that describes Hive tables and partitions in a relational database.

A Hive metastore consists of two fundamental units:

  • A service that provides metastore access to other Apache Hive services.
  • Disk storage for the Hive metadata, which is separate from HDFS storage.

This page walks you through adding a Hive Metastore for Amazon S3 as a data source in Ahana Cloud for Presto.

Step 0: Locate all necessary connection information

Find your Thrift endpoints:

  • Locate the endpoints on your Hive Metastore hosts. The URI of the Hive metastore is in the format:

    thrift://<host_name>:9083
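Before entering an endpoint, it can help to check that it has the expected shape. The following is a minimal sketch, not part of Ahana: the function names `parse_thrift_uri` and `is_reachable` are hypothetical, and the fallback to port 9083 is an assumption taken from the URI format shown above.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def parse_thrift_uri(uri: str) -> Tuple[str, int]:
    """Split a thrift://host:port URI into (host, port)."""
    parts = urlsplit(uri.strip())
    if parts.scheme != "thrift":
        raise ValueError(f"expected a thrift:// URI, got {uri!r}")
    if not parts.hostname:
        raise ValueError(f"missing host name in {uri!r}")
    # Assumption: fall back to 9083, the port shown in the format above.
    port = parts.port if parts.port is not None else 9083
    return parts.hostname, port


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_reachable(*parse_thrift_uri("thrift://hms:9083"))` attempts a TCP connection to host `hms` on port 9083. This only tests reachability from the machine running the script; it does not show that your Presto clusters can reach the metastore.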

To find the ARN of the IAM role that can connect to your S3 bucket:

  1. Sign in to the AWS Management Console and open the IAM console at https://console.aws.amazon.com/iam/.
  2. In the navigation pane, choose Roles.
  3. Choose the name of the intended role, and then copy the ARN from the role summary.
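A copied ARN can be checked for the expected role format before it is pasted into Ahana. This is an illustrative sketch only: `parse_role_arn` is a hypothetical helper, the pattern covers the common AWS partitions, and it checks the format of the string, not whether the role exists or can access your bucket.

```python
import re

# Format check only; partitions listed here are aws, aws-cn, and aws-us-gov.
ROLE_ARN_RE = re.compile(
    r"arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/([A-Za-z0-9+=,.@_/-]+)"
)


def parse_role_arn(arn: str) -> dict:
    """Return the parts of an IAM role ARN, or raise ValueError."""
    match = ROLE_ARN_RE.fullmatch(arn.strip())
    if match is None:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    partition, account_id, role_path = match.groups()
    return {
        "partition": partition,
        "account_id": account_id,
        # A role may include a path; the name is the last segment.
        "role_name": role_path.rsplit("/", 1)[-1],
    }
```

A user ARN such as `arn:aws:iam::123456789012:user/alice` is rejected, which catches the common mistake of copying an ARN from the Users page instead of the Roles page.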

To list the access keys for an IAM user with permission to connect to your S3 bucket:

  1. Sign in to the AWS Management Console and open the IAM console at https://console.aws.amazon.com/iam/.
  2. In the navigation pane, choose Users.
  3. Choose the name of the intended user, and then choose the Security credentials tab. The user's access keys and the status of each key are displayed.

Only the user's access key ID is visible. The secret access key can only be retrieved when the key is created.

Step 1: Select Hive Metastore for Amazon S3 as the Connector Type

  1. In the Ahana SaaS Console, select Data Sources.
  2. Select Add Data Source.

  3. Select Hive Metastore for Amazon S3.

Step 2: Configure the data source details

  1. Enter a Name.
    The name is used to derive the catalog name used for Presto. A data source name can contain only lowercase letters, numbers, and underscores. The table in the Data Sources tab of the Ahana SaaS Console shows both the name entered here and the derived catalog name. See Data Sources Overview.
  2. Enter a Description for this data source.
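The naming rule for the Name field (lowercase letters, numbers, and underscores only) can be expressed as a short check. This is a sketch with a hypothetical function name; it covers only the character rule stated above and does not reproduce how Ahana derives the catalog name.

```python
import re

# Lowercase letters, digits, and underscores, at least one character.
NAME_RE = re.compile(r"[a-z0-9_]+")


def is_valid_data_source_name(name: str) -> bool:
    """Return True if the name uses only the allowed characters."""
    return NAME_RE.fullmatch(name) is not None
```

For example, `hive_s3_prod` passes, while `Hive-S3` fails because of the uppercase letters and the hyphen.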

Step 3: Configure the Hive Metastore cluster access details

  1. Enter the Thrift Endpoint URI(s) for your Hive Metastore cluster.

    You may enter multiple URIs by separating them with commas. For example:

      thrift://hms:9083, thrift://hive-metastore:9083
  2. In S3 Credentials, select either IAM Role or Access key.

    • For IAM Role:
      • Enter the Role ARN. For example: arn:aws:iam::123456789012:role/my-s3-role
    • For Access key:
      1. Enter the Access Key ID to use.
        Typically looks like: AKIAIOSFODNN7EXAMPLE
      2. Enter the Secret Access Key.
        Typically looks like: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

  3. If an Apache Ranger authorization service is configured, use the Authorization Service drop-down to select the authorization service name.

If no Apache Ranger authorization service definition exists, then the Authorization Service drop-down is not displayed.

See Add an Apache Ranger Authorization Service to create an Apache Ranger authorization service definition.
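The access details in this step follow two rules: the Thrift Endpoint URI(s) field is a comma-separated list, and S3 credentials are either an IAM role or an access key. A minimal sketch of those rules follows. The class `HiveS3DataSourceInput` and its field names are hypothetical stand-ins for the form fields, not an Ahana API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HiveS3DataSourceInput:
    """Hypothetical container for the Step 3 form fields (not an Ahana API)."""

    thrift_endpoints: str
    role_arn: Optional[str] = None
    access_key_id: Optional[str] = None
    secret_access_key: Optional[str] = None

    def endpoint_list(self) -> List[str]:
        """Split the comma-separated field and check each entry's scheme."""
        uris = [part.strip() for part in self.thrift_endpoints.split(",")]
        uris = [uri for uri in uris if uri]
        if not uris:
            raise ValueError("at least one Thrift endpoint URI is required")
        for uri in uris:
            if not uri.startswith("thrift://") or uri == "thrift://":
                raise ValueError(f"not a thrift:// URI: {uri!r}")
        return uris

    def credential_mode(self) -> str:
        """Return 'iam_role' or 'access_key'; reject both or neither."""
        has_role = bool(self.role_arn)
        has_key = bool(self.access_key_id or self.secret_access_key)
        if has_role == has_key:
            raise ValueError("choose exactly one of IAM Role or Access key")
        if has_key and not (self.access_key_id and self.secret_access_key):
            raise ValueError("Access key requires both the ID and the secret")
        return "iam_role" if has_role else "access_key"
```

With the example values from this step, `endpoint_list()` returns the two URIs with surrounding spaces removed, and `credential_mode()` reports which of the two credential options was filled in.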

Step 4: Add the data source

When you have configured the data source, select Add Data Source.

You can now connect to this data source from your Presto clusters.