Presto Cluster Storage

Overview

Each Presto cluster comes pre-attached with an Ahana-managed Hive Metastore. The metastore is a data catalog that maps database concepts such as databases, tables, and columns to files stored in data lakes such as Amazon S3. The metastore is pre-configured to use an Amazon S3 bucket for storage, so you can start using Presto as a data warehouse right away. No additional configuration or changes to config properties are needed.

Architecture for metadata & storage

The Ahana-managed Hive Metastore (HMS) lets you create managed (internal) tables. Tables created in Presto using the ahana_hive catalog are treated as managed tables in Hive, and Hive assumes that it owns the data for these tables. Each cluster has its own pre-created Amazon S3 bucket, and Ahana configures the HMS to store managed tables in that bucket.

The Ahana-managed S3 bucket is created in your AWS account, and you can access its data using the Amazon S3 console. The bucket name includes a shortened form of the name of the cluster that the bucket is attached to. Example: s3a://ahana-ahanademo5-hourly-batch-hms-j123456abcd12.

Example of a managed (internal) table:

CREATE TABLE orders (
    orderkey bigint,
    orderstatus varchar,
    totalprice double,
    orderdate date
);
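
Once the table exists, you can inspect how the metastore has recorded it from Presto itself. The statements below are a minimal sketch, assuming the orders table above was created in a schema named demo; that schema name is hypothetical, so substitute your own.

```sql
-- List the schemas (databases) the metastore exposes through the ahana_hive catalog.
SHOW SCHEMAS FROM ahana_hive;

-- List the tables registered in one schema ("demo" is a hypothetical schema name).
SHOW TABLES FROM ahana_hive.demo;

-- Show the column names and types stored in the metastore for the table.
DESCRIBE ahana_hive.demo.orders;

-- Show the full table definition, including any table properties.
SHOW CREATE TABLE ahana_hive.demo.orders;
```
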
Tip: To locate all Amazon S3 buckets created by Ahana, search for the prefix ahana-cf-eksng-stack in your Amazon S3 console.

Ahana sets the Hive Metastore's hive.metastore.warehouse.dir property to the S3 bucket created for each cluster. By default, data for managed tables created in this Hive Metastore is stored in a folder path similar to /databasename.db/tablename/ in the S3 bucket. If a managed table or partition is dropped, both the metadata and the data for that table or partition are deleted, including the files that were stored in S3.
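
The managed-table lifecycle described above can be sketched in Presto SQL. This is an illustration, not an Ahana-prescribed procedure: the schema demo and the table orders_tmp are hypothetical names, and the sketch assumes that the ahana_hive catalog exposes the Hive connector's hidden "$path" column.

```sql
-- Hypothetical schema; its managed tables are expected under a path
-- similar to /demo.db/ in the cluster's S3 bucket.
CREATE SCHEMA IF NOT EXISTS ahana_hive.demo;

-- Managed table: no external_location, so Hive owns the data.
CREATE TABLE ahana_hive.demo.orders_tmp (
    orderkey bigint,
    orderstatus varchar,
    totalprice double,
    orderdate date
);

-- Write one row so that at least one data file exists.
INSERT INTO ahana_hive.demo.orders_tmp
VALUES (1, 'O', DOUBLE '172.5', DATE '2021-01-01');

-- Show which files back the table (assumes the hidden "$path" column is available).
SELECT DISTINCT "$path" FROM ahana_hive.demo.orders_tmp;

-- Dropping a managed table removes the metadata and the files in S3.
DROP TABLE ahana_hive.demo.orders_tmp;
```

Because the drop also deletes the underlying files, copy any data you still need before dropping a managed table.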

Note: External tables point to a location that is different from the pre-configured storage location. The Hive Metastore must be able to access the storage location.

You can create external tables in the Ahana-managed Hive Metastore using Presto, either from the Presto CLI or from any other tool that connects to Presto. If an external table or partition is dropped, only the metadata associated with that table or partition is deleted from the Ahana-managed Hive Metastore. The files in the external location are left in place.

Example of an external table:

CREATE TABLE airports (
    iata varchar,
    airport varchar,
    city varchar,
    state varchar,
    country varchar,
    lat real,
    long real
)
WITH (
    format = 'PARQUET',
    external_location = 's3a://ahana-test10001/presto/'
);
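
Because dropping an external table removes only its metadata, the same files can be registered again afterwards. The statements below are a hedged sketch of that round trip, assuming the airports table lives in a hypothetical schema named demo and that the Hive Metastore can still read the S3 location.

```sql
-- Confirm the definition; the external_location property should show the S3 path.
SHOW CREATE TABLE ahana_hive.demo.airports;

-- Removes only the metastore entry; the Parquet files stay in S3.
DROP TABLE ahana_hive.demo.airports;

-- Register the same files again by recreating the table over the same location.
CREATE TABLE ahana_hive.demo.airports (
    iata varchar,
    airport varchar,
    city varchar,
    state varchar,
    country varchar,
    lat real,
    long real
)
WITH (
    format = 'PARQUET',
    external_location = 's3a://ahana-test10001/presto/'
);

-- The data is queryable again without reloading anything.
SELECT count(*) FROM ahana_hive.demo.airports;
```
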
Tip: For more information, see Managed vs. External Tables.

Managing Ahana-created S3 buckets

Ahana creates one Amazon S3 bucket for each cluster. The bucket is pre-configured as the backend data warehouse storage that the Hive Metastore uses for managed tables.

Presto clusters with S3 buckets

  • Like the Ahana-managed Hive Metastore, the cluster's S3 bucket is retained whenever the Presto cluster is stopped or restarted.

  • When a Presto cluster is deleted, the Ahana-managed Hive Metastore, the coordinator node, and all worker nodes are deleted.

  • A cluster's S3 bucket is not deleted by default when a cluster is deleted, but you can choose to delete the cluster's S3 bucket. See Delete a Presto Cluster.

  • An S3 bucket that was not deleted with its cluster can be deleted in Ahana, or managed in the AWS Console. See Ahana-managed S3 buckets.