Create a Presto Cluster
This page walks you through creating a Presto cluster in the Ahana Compute Plane in your AWS account.
Step 1: Go to the Create Cluster Page
In the Ahana SaaS Console, select Clusters, then select Create Cluster.
Step 2: Provision a Presto Cluster
tip
Select the icon to get additional help.
Enter Cluster Details
Enter a Cluster Name
The name must be unique within your Ahana Compute Plane. Use a descriptive name to help you identify the cluster. The name also becomes part of the cluster endpoints; for example, a cluster named telemetry has an endpoint such as https://telemetry.tenant.cp.ahana.cloud.
The cluster name must begin and end with a letter or number and can be at most 63 characters long.
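As a rough illustration, the naming rules above can be checked before you submit the form. This is a minimal sketch. The page only documents the first/last-character rule and the 63-character limit, so the rule that interior characters may be letters, digits, or hyphens is an assumption. The endpoint format is inferred from the telemetry example, with `tenant` standing in for your own tenant name. `validate_cluster_name` and `cluster_endpoint` are hypothetical helpers, not part of any Ahana API, and uniqueness within the Compute Plane is not checked here.

```python
import re

MAX_NAME_LENGTH = 63
# Assumption: characters between the first and last may be letters, digits, or hyphens.
# Documented rules: begin and end with a letter or number, at most 63 characters.
_NAME_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def validate_cluster_name(name: str) -> bool:
    """Return True if `name` follows the documented (plus assumed) naming rules."""
    return len(name) <= MAX_NAME_LENGTH and _NAME_RE.fullmatch(name) is not None


def cluster_endpoint(name: str, tenant: str) -> str:
    """Build an endpoint shaped like the documented example (format inferred, not confirmed)."""
    if not validate_cluster_name(name):
        raise ValueError(f"invalid cluster name: {name!r}")
    return f"https://{name}.{tenant}.cp.ahana.cloud"


print(cluster_endpoint("telemetry", "tenant"))  # https://telemetry.tenant.cp.ahana.cloud
print(validate_cluster_name("-telemetry"))      # False: starts with a hyphen
print(validate_cluster_name("a" * 64))          # False: longer than 63 characters
```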
Select the Workload Profile
Select the Workload Profile based on the number of concurrent queries you expect to run on this cluster. There are two workload profiles.
Low Concurrency suits clusters that run a small number of queries at a time, such as a few large, complex queries or heavy ETL jobs.
High Concurrency suits clusters that run many queries at the same time, such as dashboard and reporting queries or A/B testing analytics.
Concurrency here means the number of queries executing at the same time on a given cluster. Each profile applies a curated set of session properties tuned for its level of concurrency.
You can change the workload profile after the cluster is created without restarting the cluster. The change applies only to new queries.
Enter the Cluster Settings
Select the Presto Coordinator AWS Instance Type
Select the AWS EC2 instance type to provision for the Presto coordinator. Because a Presto cluster has only one coordinator, choose an instance that can handle the full workload. The recommended instance type is r5.4xlarge.
Select the Presto Worker AWS Instance Type
Select the AWS EC2 instance type to provision for the Presto workers. The recommended instance type is r5.2xlarge.
tip
We recommend a minimum AWS instance type of r5.4xlarge for the Presto coordinator and r5.2xlarge for the Presto workers. For more information about the R5 instance class, see the Amazon EC2 R5 instance documentation.
The M5, R5, and C5 instance types provide the best price performance, drawing on the underlying Intel Cascade Lake processor technology.
To learn more about Intel-optimized instances, visit the AWS and Intel Partner Page.
Select the Scaling Strategy
There are currently two scaling strategies you can use to manage your cluster:
- Static
- Scale Out only (CPU)
With the Static strategy, the number of worker nodes stays constant while the cluster is in use.
With the Scale Out only (CPU) strategy, the number of worker nodes increases based on the average CPU utilization of the worker nodes.
Both strategies can also scale in to the minimum number of worker nodes when the cluster has been idle for a user-specified amount of time.
For more details, see the Presto Cluster Autoscaling page.
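To make the difference between the two strategies concrete, here is a minimal sketch of how a controller might pick the next worker count. The CPU threshold (75%), the one-worker scale-out step, and the exact decision order are hypothetical values chosen for illustration. The page does not document Ahana's actual thresholds or algorithm, and `ScalingConfig` and `target_workers` are not Ahana APIs.

```python
from dataclasses import dataclass


@dataclass
class ScalingConfig:
    strategy: str                  # "static" or "scale_out_cpu"
    min_workers: int
    max_workers: int
    idle_minutes_to_scale_in: int  # user-specified idle period
    cpu_threshold: float = 0.75    # hypothetical: scale out above 75% average CPU
    scale_out_step: int = 1        # hypothetical: add one worker at a time

    def __post_init__(self) -> None:
        if self.strategy not in {"static", "scale_out_cpu"}:
            raise ValueError(f"unknown strategy: {self.strategy!r}")


def target_workers(cfg: ScalingConfig, current: int, avg_cpu: float, idle_minutes: int) -> int:
    """Return the worker count a controller might aim for next (illustrative only)."""
    # Both strategies scale in to the minimum once the idle period has elapsed.
    if idle_minutes >= cfg.idle_minutes_to_scale_in:
        return cfg.min_workers
    # Scale Out only (CPU): add workers while average CPU is high, capped at the maximum.
    if cfg.strategy == "scale_out_cpu" and avg_cpu > cfg.cpu_threshold:
        return min(current + cfg.scale_out_step, cfg.max_workers)
    # Static, or CPU below the threshold: keep the current count.
    return current


cfg = ScalingConfig("scale_out_cpu", min_workers=2, max_workers=10, idle_minutes_to_scale_in=30)
print(target_workers(cfg, current=4, avg_cpu=0.9, idle_minutes=0))   # 5
print(target_workers(cfg, current=4, avg_cpu=0.1, idle_minutes=45))  # 2
```

A "scale out only" strategy never removes workers in response to low CPU, which is why the only path to a smaller cluster in this sketch is the idle timeout.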
Enter the number of Presto Workers
Enter the number of worker nodes for the Presto cluster, from 1 to 100. A coordinator is provisioned by default in addition to the workers and is not included in this worker node count.
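To estimate how many EC2 instances a single Presto cluster uses, add the coordinator to the worker count. If an Ahana-managed Hive Metastore is attached, add that too, since each of these runs on its own instance. This is a minimal sketch that assumes the Hive Metastore uses a single instance and ignores any other nodes in the Compute Plane. `total_instances` is a hypothetical helper.

```python
MIN_WORKERS, MAX_WORKERS = 1, 100


def total_instances(workers: int, with_hive_metastore: bool = False) -> int:
    """Count per-cluster EC2 instances: workers + 1 coordinator (+ 1 Hive Metastore, assumed)."""
    if not MIN_WORKERS <= workers <= MAX_WORKERS:
        raise ValueError(f"worker count must be between {MIN_WORKERS} and {MAX_WORKERS}")
    coordinator = 1  # always provisioned, never counted as a worker
    metastore = 1 if with_hive_metastore else 0  # assumption: one instance
    return workers + coordinator + metastore


print(total_instances(4))                            # 5
print(total_instances(4, with_hive_metastore=True))  # 6
```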
Enter the Query Termination Grace Period
When you manually reduce the number of Presto workers on a cluster, the removed workers shut down gracefully so that running queries do not fail because of the scale-in. The query termination grace period is the maximum time that existing query tasks are allowed to complete on those workers before the workers are forcibly terminated. The default is 10 minutes, and you can set any period from 1 minute to 120 minutes (two hours). You cannot change this period after it is set; to use a different query termination grace period, you must create a new cluster.
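The grace period rules can be written as a small check that also shows when a draining worker would be force-terminated. This is a sketch for illustration only. `forced_termination_time` is a hypothetical helper, and it does not model how Ahana tracks individual query tasks. A worker whose tasks finish early may shut down before this deadline.

```python
from datetime import datetime, timedelta
from typing import Optional

DEFAULT_GRACE_MINUTES = 10
MIN_GRACE_MINUTES = 1
MAX_GRACE_MINUTES = 120  # two hours


def forced_termination_time(scale_in_started: datetime, grace_minutes: Optional[int] = None) -> datetime:
    """Latest time a draining worker may keep running before it is forcibly terminated."""
    minutes = DEFAULT_GRACE_MINUTES if grace_minutes is None else grace_minutes
    if not MIN_GRACE_MINUTES <= minutes <= MAX_GRACE_MINUTES:
        raise ValueError(
            f"grace period must be {MIN_GRACE_MINUTES}-{MAX_GRACE_MINUTES} minutes, got {minutes}"
        )
    return scale_in_started + timedelta(minutes=minutes)


start = datetime(2024, 1, 1, 12, 0)
print(forced_termination_time(start))       # 2024-01-01 12:10:00
print(forced_termination_time(start, 120))  # 2024-01-01 14:00:00
```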
Set the Data Lake Settings
Ahana-managed Hive Metastore
Select Attach an Ahana Hive Metastore to have Ahana provision a Hive Metastore for the Presto cluster.
Use the Hive Metastore Instance Type drop-down to select the AWS EC2 instance type for the Hive Metastore. The recommended instance type is r5.4xlarge.
important
If you choose to create an Ahana-managed Hive Metastore, Ahana provisions a Hive Metastore named ahana_hive that is preconfigured and attached to the cluster. It is also pre-integrated with an S3 bucket; you can find information about this bucket on the cluster page. Use the name ahana_hive in endpoints to connect to the Hive Metastore (HMS).
Presto Query Log
The Presto query log is always stored in an S3 bucket. Select Enable cluster query log to also attach the log to the Ahana-managed Hive Metastore so that you can query it with Presto. When this option is selected, Ahana creates an external table and a view in the attached Hive Metastore for easy access to the log.
note
Enable cluster query log is available only if Attach an Ahana Hive Metastore is selected.
Data Lake Caching
Select Enable Data IO Cache to configure a local AWS EBS SSD volume for each Presto worker node. The volume size is three times the memory size of the selected Worker Node Instance Type for the Presto cluster.
Select Enable Intermediate Result Set Cache to cache partially computed result sets on each worker node's local AWS EBS SSD volume. This avoids repeating the same computation across multiple queries, which improves query performance and reduces CPU usage.
note
Intermediate result set cache is only beneficial for workloads with aggregation queries.
The AWS EBS SSD volume for the intermediate result set cache is twice the memory size of the selected Worker Node Instance Type for the Presto cluster.
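As a worked example, the two sizing rules (3x worker memory for the Data IO Cache, 2x for the Intermediate Result Set Cache) give 192 GiB and 128 GiB for an r5.2xlarge worker with 64 GiB of memory. A minimal sketch of that arithmetic follows. The memory table covers only the two recommended R5 types, using AWS's published figures, and the assumption that the two caches get separate volumes follows the wording above. `cache_volume_sizes_gib` is a hypothetical helper, and it applies only to EBS-backed (non-d) instance types.

```python
# Memory (GiB) per instance, from AWS's published EC2 specifications.
INSTANCE_MEMORY_GIB = {
    "r5.2xlarge": 64,
    "r5.4xlarge": 128,
}

DATA_IO_CACHE_MULTIPLIER = 3       # Data IO Cache volume: 3x worker memory
INTERMEDIATE_CACHE_MULTIPLIER = 2  # Intermediate Result Set Cache volume: 2x worker memory


def cache_volume_sizes_gib(worker_instance_type: str, data_io: bool, intermediate: bool) -> dict:
    """Per-worker EBS SSD volume sizes (GiB) for the enabled caches."""
    memory = INSTANCE_MEMORY_GIB[worker_instance_type]
    sizes = {}
    if data_io:
        sizes["data_io_cache"] = DATA_IO_CACHE_MULTIPLIER * memory
    if intermediate:
        sizes["intermediate_result_set_cache"] = INTERMEDIATE_CACHE_MULTIPLIER * memory
    return sizes


print(cache_volume_sizes_gib("r5.2xlarge", data_io=True, intermediate=True))
# {'data_io_cache': 192, 'intermediate_result_set_cache': 128}
```

Multiply the per-worker figures by the worker count to estimate the total EBS capacity for the cluster.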
note
If the selected Worker Node Instance Type is a d-type instance (for example, c5d.xlarge), both Enable Data IO Cache and Enable Intermediate Result Set Cache are enabled automatically and use the instance storage instead of AWS EBS SSD volumes.
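A d-type instance can usually be recognized from its instance type name. The sketch below uses a naming heuristic: a `d` in the suffix after the generation number, as in c5d or r5dn. This is an assumption about AWS naming, not a check that Ahana documents, so confirm against the AWS instance type specifications for your instance. `has_instance_storage` and `cache_backing` are hypothetical helpers.

```python
import re

# Heuristic (assumption): family = letters, generation digits, then an optional suffix.
# A "d" in that suffix marks local instance storage, e.g. c5d, m5d, r5d, r5dn, m5ad.
_FAMILY_RE = re.compile(r"([a-z]+)(\d+)([a-z-]*)")


def has_instance_storage(instance_type: str) -> bool:
    """Return True if the instance type name looks like a d-type (instance storage) type."""
    family = instance_type.split(".", 1)[0]
    match = _FAMILY_RE.fullmatch(family)
    if match is None:
        raise ValueError(f"unrecognized instance type: {instance_type!r}")
    return "d" in match.group(3)


def cache_backing(instance_type: str) -> str:
    """Where the Data IO and Intermediate Result Set caches would be placed."""
    return "instance storage" if has_instance_storage(instance_type) else "EBS SSD volumes"


print(cache_backing("c5d.xlarge"))  # instance storage
print(cache_backing("r5.2xlarge"))  # EBS SSD volumes
```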
Select Presto Users
Each Presto cluster must have at least one Presto user. If you have already created Presto users, select the ones you want to add to the cluster. You can also add or remove Presto users after the cluster is created.
If there are no Presto users, or the Presto user you want is not listed, select Create Presto User to create one.
After you create the new Presto user, you can add it to your cluster.
All Presto user authentication takes place over HTTPS, which secures the connection from clients such as the Presto CLI, JDBC drivers, and Superset.
Set Presto Cluster Credentials
Presto clusters created with Ahana Compute Plane versions earlier than 3.0 support only a single Presto user per cluster. For these clusters, set a single set of credentials by entering a username and password for the cluster.
info
If your Ahana Compute Plane is on a version earlier than 3.0 and you would like to upgrade, contact your Ahana representative or Ahana Support. After the upgrade, existing single-user Presto clusters can be migrated to use multiple Presto users.
Cluster Provisioning
After you select Create Cluster, the Clusters page initially shows the new cluster in the Pending state in the Pending Clusters table.
When provisioning is complete, the cluster moves to the Active state and appears in the Active Clusters table.
Presto clusters are provisioned on the Amazon EKS cluster that was created in your Ahana Compute Plane. Each node (the coordinator, each worker, and the Hive Metastore) is provisioned on its own instance. As a result, each instance in the Kubernetes cluster runs only one container, which gives complete isolation and dedicates the instance's resources to that node.