Create a Presto Cluster
To create an Ahana-managed Presto cluster in the Ahana Compute Plane of your AWS account:
In the Ahana SaaS Console, select Clusters, then select Create Cluster.
The Create a Cluster page is displayed.
tip
Select the icon for additional help.
General Settings
Enter a Cluster Name
Enter a Name for the cluster. The Cluster Name:
- must be unique across your Ahana Compute Plane
- must begin and end with a letter or number
- may contain letters, numbers, spaces, and hyphens
- must be no longer than 63 characters
Ahana recommends entering a descriptive name to help identify the cluster.
The cluster name is used as part of the cluster endpoints. For example, a cluster name telemetry would be used to form the Presto endpoint https://telemetry.tenant.cp.ahana.cloud and the JDBC endpoint jdbc:presto://telemetry.tenant.cp.ahana.cloud:443. For more information about how the endpoints are defined, see Endpoints.
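As a rough illustration of the naming rules and the endpoint pattern above, the following Python sketch checks a candidate name and builds the two endpoint strings. It is not Ahana's own validation: the function names are hypothetical, "letters" and "numbers" are assumed to mean ASCII characters, and `tenant` is the placeholder from the example rather than a real value. Uniqueness across the Compute Plane cannot be checked locally, and because this page does not say how names containing spaces map to a hostname, the sketch declines to build endpoints for them.

```python
import re

# Assumption: "letters" and "numbers" mean ASCII letters and digits.
# First and last characters are alphanumeric; the middle may also contain
# spaces and hyphens. 1 + 61 + 1 = 63 characters at most.
_NAME_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9 -]{0,61}[A-Za-z0-9])?")


def is_valid_cluster_name(name: str) -> bool:
    """Check the documented character and length rules (not uniqueness)."""
    return _NAME_RE.fullmatch(name) is not None


def cluster_endpoints(name: str, tenant: str = "tenant") -> dict:
    """Build endpoint strings following the pattern of the telemetry example.

    `tenant` is a placeholder; the real value depends on your account.
    """
    if not is_valid_cluster_name(name):
        raise ValueError(f"invalid cluster name: {name!r}")
    if " " in name:
        # The page does not describe how spaces are mapped into a hostname.
        raise ValueError("endpoint mapping for names with spaces is not documented here")
    host = f"{name}.{tenant}.cp.ahana.cloud"
    return {
        "presto": f"https://{host}",
        "jdbc": f"jdbc:presto://{host}:443",
    }
```

For example, `cluster_endpoints("telemetry")` reproduces the two endpoint strings shown in the example above.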
Select the Workload Profile
Concurrency is the number of queries executing at the same time in a cluster. Ahana has identified workload profiles based on the number of concurrent queries and curated a set of tuned session properties for each profile.
Select the Workload Profile based on the number of concurrent queries expected to run on the cluster.
Low Concurrency is useful for clusters that run a limited number of queries or a few large, complex queries. Low concurrency also supports bigger and heavier ETL jobs.
High Concurrency is better for running multiple queries at the same time, such as dashboard and reporting queries or A/B testing analytics.
note
The workload profile of an existing cluster can be changed, and the change takes effect without requiring the cluster to restart. The change only applies to queries that begin after the change is made.
Cluster Settings
Select the Node Instance Types
Select the AWS EC2 instance type to be provisioned for the Coordinator Instance Type. Because Presto has only one coordinator node, it is important to have an instance that can support the workload. The recommended Coordinator Instance Type is r5.4xlarge.
Select the AWS EC2 instance type to be provisioned for the Worker Node Instance Type. The recommended Worker Node Instance Type is r5.2xlarge.
tip
For more information about the R5 instance class, see Amazon EC2 R5 Instances.
The M5, R5, and C5 instance types provide the best price performance from the underlying Intel Cascade Lake Process Technology.
To learn more about Intel-optimized instances, visit the AWS and Intel Partner Page.
Configure Cluster Scaling
There are two types of scaling strategy available:
- Static: A Static scaling strategy means that the number of worker nodes is constant while the cluster is being used. See Configure Static Scaling.
- Scale Out only (CPU): A Scale Out only (CPU) scaling strategy means that the number of worker nodes begins at a minimum and increases to a maximum based on the worker nodes' average CPU utilization. See Configure Scale Out only (CPU) Scaling.
For more information, see Presto Cluster Autoscaling.
note
The choice of Static or Scale Out only (CPU) scaling strategy, Scale to a single worker node when idle, and the Query Termination Grace Period cannot be modified after the Presto cluster is created.
Configure Static Scaling
In Scaling Strategy, select Static.
Enter the Default Worker Node Count for the number of worker nodes in the Presto cluster. Choose a number between 1 and 100.
Optionally, select Scale to a single worker node when idle to scale the cluster to a single worker node when the cluster is idle for a user-specified amount of time.
If Scale to a single worker node when idle is enabled, the cluster idle time limit can be set in Time window before scaling to a single worker node. The default value is 30 minutes.
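The static scaling settings above can be modeled in a few lines of Python. This is only a sketch to make the behavior concrete: the class and field names are hypothetical, the 1 to 100 range is assumed to be inclusive, and how Ahana actually measures idle time is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticScaling:
    """Hypothetical model of the Static scaling settings."""

    default_worker_node_count: int
    scale_to_single_worker_when_idle: bool = False
    idle_window_minutes: int = 30  # documented default

    def __post_init__(self):
        # Assumption: "between 1 and 100" includes both bounds.
        if not 1 <= self.default_worker_node_count <= 100:
            raise ValueError("Default Worker Node Count must be between 1 and 100")
        if self.idle_window_minutes <= 0:
            raise ValueError("idle window must be positive")

    def worker_count(self, idle_minutes: float) -> int:
        """Number of workers expected after the cluster has been idle this long."""
        if (self.scale_to_single_worker_when_idle
                and idle_minutes >= self.idle_window_minutes):
            return 1
        return self.default_worker_node_count
```

With `StaticScaling(5, scale_to_single_worker_when_idle=True)`, the model keeps five workers until the cluster has been idle for 30 minutes, then drops to one. Without the idle option, the count stays at five regardless of idle time.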
Configure Scale Out only (CPU) Scaling
In Scaling Strategy, select Scale Out only (CPU).
Enter the:
- Minimum Worker Node Count
- Maximum Worker Node Count
- Scale Out Step Size
The Presto cluster starts with the number of worker nodes set in Minimum Worker Node Count. If the average CPU utilization of the worker nodes stays above 75% for a period of 15 minutes, new worker nodes are added in increments of the Scale Out Step Size, up to the Maximum Worker Node Count. See When does autoscaling occur?
Optionally, set the Time window before scaling to minimum worker node count. The default value is 30 minutes.
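To make the scale-out rule concrete, here is a minimal Python sketch of the decision for this strategy. It is a simplified model, not Ahana's implementation: the function name is hypothetical, it assumes one average-CPU reading per minute, and it interprets "above 75% for a period of 15 minutes" as every one of the last 15 readings exceeding the threshold.

```python
def next_worker_count(current, cpu_samples, step, maximum,
                      threshold=75.0, window=15):
    """Return the worker count after evaluating the scale-out rule once.

    cpu_samples: average worker CPU utilization in percent, one reading
    per minute, oldest first (an assumption of this sketch).
    """
    recent = cpu_samples[-window:]
    # Scale out only when a full window of readings is above the threshold.
    sustained = len(recent) == window and all(s > threshold for s in recent)
    if sustained:
        # Add one step of workers, never exceeding the configured maximum.
        return min(current + step, maximum)
    return current
```

For instance, with a step size of 2 and a maximum of 10, a cluster at 3 workers that sees 15 consecutive readings of 80% moves to 5 workers, while a cluster already at 9 is capped at 10.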
Enter the Query Termination Grace Period
Optionally, set the Query Termination Grace Period value.
Reducing Presto workers on a cluster gracefully shuts down worker nodes so that any running queries do not fail due to the scale in. The Query Termination Grace Period is the maximum time window that is allowed for existing query tasks to complete on Presto workers before forcefully terminating those workers. The default is 10 minutes. The range is between 1 minute and 120 minutes.
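The effect of the grace period can be illustrated with a small Python model. This is a sketch under stated assumptions, not a description of how Presto or Ahana tracks tasks: the function name is hypothetical, each task is represented only by its remaining run time in minutes, and a task that needs exactly the grace period is assumed to finish in time.

```python
def drain_worker(task_remaining_minutes, grace_minutes=10):
    """Split a worker's tasks into those that finish and those cut off.

    grace_minutes defaults to the documented 10 and must lie in the
    documented range of 1 to 120 minutes.
    """
    if not 1 <= grace_minutes <= 120:
        raise ValueError("Query Termination Grace Period must be 1 to 120 minutes")
    completed = [t for t in task_remaining_minutes if t <= grace_minutes]
    terminated = [t for t in task_remaining_minutes if t > grace_minutes]
    return completed, terminated
```

With the default of 10 minutes, tasks needing 2 and 9.5 more minutes complete, while a task needing 30 minutes is terminated along with the worker. Raising the grace period to 120 minutes would let that task finish.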
Data Lake Settings
Configure the Hive Metastore
Select Attach an Ahana Hive Metastore to provision a Hive Metastore named ahana_hive that is pre-configured and attached to the Presto cluster.
Select the AWS EC2 instance type to be provisioned for the Hive Metastore Instance Type. The recommended instance type is m5.xlarge.
note
The provisioned Ahana Hive Metastore is pre-integrated with an S3 bucket. You can find information about the S3 bucket at Ahana-managed Hive Metastore and Amazon S3 storage. Use the ahana_hive name in endpoints to connect to the Hive Metastore.
Enable cluster query log
Select Enable cluster query log to attach the Presto query log to the Ahana-managed Hive Metastore. The Presto Query Log is stored in an S3 bucket by default. The Presto Query Log can optionally be attached to the Presto cluster. Selecting this option creates an external table and view in the attached Hive Metastore for easy access to the query log.
note
Enable cluster query log is available only if Attach an Ahana Hive Metastore is selected.
Configure Data Lake Caching
note
If the selected Worker Node Instance Type is a type d instance (for example, c5d.xlarge), then both Enable Data IO Cache and Enable Intermediate Result Set Cache are automatically enabled, and use the instance storage instead of AWS EBS SSD volumes.
Select Enable Data IO Cache to configure a local AWS EBS SSD drive for each worker node. The volume size of the configured AWS EBS SSD is three times the size of the memory of the selected Worker Node Instance Type for the Presto cluster.
Select Enable Intermediate Result Set Cache to cache partially computed result sets on the worker node's local AWS EBS SSD. This prevents duplicated computation across multiple queries, which improves query performance and decreases CPU usage. The volume size of the AWS EBS SSD for the intermediate result set cache is two times the size of the memory of the selected Worker Node Instance Type for the Presto cluster.
note
Enable Intermediate Result Set Cache is only beneficial for workloads with aggregation queries.
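The cache sizing rules above are simple multiples of worker memory, which the following Python sketch works through. The function name is hypothetical, and the memory figure must be supplied by the caller; the 64 GiB used in the example is the memory AWS lists for r5.2xlarge, the recommended worker type. The sketch covers EBS-backed caches only, since type d instances use instance storage instead.

```python
def cache_volume_sizes_gib(memory_gib, data_io_cache=True,
                           intermediate_result_set_cache=True):
    """Per-worker EBS volume sizes implied by the documented multipliers."""
    sizes = {}
    if data_io_cache:
        # Data IO cache: three times the worker's memory.
        sizes["data_io_cache"] = 3 * memory_gib
    if intermediate_result_set_cache:
        # Intermediate result set cache: two times the worker's memory.
        sizes["intermediate_result_set_cache"] = 2 * memory_gib
    return sizes
```

For a worker with 64 GiB of memory and both caches enabled, this gives a 192 GiB data IO cache volume and a 128 GiB intermediate result set cache volume per worker node.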
Identity Provider
If an Identity Provider is configured, the Identity Provider pane is displayed and Enable authentication through OIDC is selected.
note
To enable identity providers in Ahana, contact Ahana Support.
Presto Users
Each Presto cluster must have at least one Presto user. Select the Selected checkbox for a Presto user to add that user to the cluster.
Select Create Presto User to create a new Presto user. After you create a Presto user, you can add it to your cluster.
You can also add or remove Presto users after the cluster is created.
Presto user authentication is done over HTTPS to secure your connection to clients such as the Presto CLI, JDBC drivers, and Apache Superset.
info
If you are creating a cluster using an Ahana Compute Plane version below 3.0, the Presto Users pane is not present because Presto clusters created with Ahana Compute Plane versions below 3.0 support only one Presto user per cluster. See Create Single Presto User Cluster Credentials, then return to this page to finish creating the Presto cluster.
Create the Cluster
Select Create Cluster to create the cluster.