Ahana Cloud for Presto
Ahana Cloud for Presto offers a cloud-native managed service for Presto on AWS that simplifies open data lake analytics by integrating the components of an open data lakehouse through a unified management console.
Presto is an open source distributed SQL engine that was created at Facebook. Used at Facebook, Uber, Twitter, and thousands more, Presto is the de facto standard for fast distributed SQL processing on data lakes and lakehouses. Ahana Cloud for Presto enables data platform teams to provide high-performance SQL analytics on their Amazon S3 data lakes and other data sources. For more information, see What is Presto?
One challenge faced by data analytics teams is the use of different engines for different workloads and of separate query languages for relational and NoSQL databases. This context switching between engines and languages is sometimes called replatforming, and it is expensive in time and expertise. A second challenge is the need to move or ingest data into data warehouses before it can be queried. A third challenge is that data sources are constantly growing in size, and even smaller businesses can have data measured in petabytes. Querying data sources at Internet scale needs to be both fast and efficient to serve multiple business needs, including providing insight for data-driven decision-making.
Presto addresses these challenges with:
- ANSI-standard SQL, a single language for batch and interactive queries, so teams do not need to switch between contexts
- A data source connector design that supports querying both structured and unstructured data
- An in-memory distributed SQL engine design that gives Presto the speed and efficiency to scale to data sources of immense size
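To make the first two points concrete: Presto addresses every table by a fully qualified name of the form catalog.schema.table, where each catalog corresponds to a configured data source connector, so a single SQL statement can join data from different sources. The following is a minimal sketch that only builds such a query as a string. The catalog, schema, table, and column names (hive, mysql, sales, crm, and so on) are hypothetical and chosen for illustration; they are not taken from any Ahana configuration.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a Presto fully qualified table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


# Hypothetical names, for illustration only.
orders = qualified("hive", "sales", "orders")        # e.g. files in an S3 data lake
customers = qualified("mysql", "crm", "customers")   # e.g. an operational database

# One ANSI SQL statement that joins tables from two different catalogs.
FEDERATED_QUERY = f"""
SELECT c.region, SUM(o.total) AS revenue
FROM {orders} AS o
JOIN {customers} AS c ON o.customer_id = c.id
GROUP BY c.region
ORDER BY revenue DESC
""".strip()

if __name__ == "__main__":
    print(FEDERATED_QUERY)
```

The resulting text could be submitted through any Presto client. Which catalogs are actually available depends on the data sources attached to the cluster.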
The Open Data Lakehouse
The open data lakehouse concept combines the reliability and performance of a data warehouse with the flexibility and price-performance of a data lake. The data lake is the central location to store structured, semi-structured, and unstructured data from disparate sources.
Presto’s in-memory distributed design lets you run SQL data warehouse workloads on your data lake with efficient price-performance. Consolidate your structured and unstructured data into your data lake and then query it all in the ways you need - interactive, batch, real-time, or streaming - efficiently with Presto using ANSI-standard SQL.
By using Presto and ANSI-standard SQL as a unified interface for interactive, batch, and other workloads, the friction and inefficiency of replatforming from one engine or language to another is avoided. Open source technologies such as Presto and TensorFlow and open formats such as Apache Parquet and ORC avoid vendor lock-in.
For more information, see The Open Data Lakehouse.
Ahana Cloud for Presto
While Presto’s features make it a powerful component of the open data lakehouse solution, Presto is complicated to both configure and manage. Presto also requires a metadata catalog for use with data lakes.
Ahana Cloud for Presto simplifies Presto cluster and data source connector configuration and management, providing control of compute costs, flexible workloads, and queries in open formats. Ahana integrates the components of the open data lakehouse stack, provisioning Presto clusters with a pre-integrated Amazon S3 data lake bucket, an optional Hive metadata catalog, and an instance of Apache Superset to test cluster connectivity to configured data sources.
RaptorX caching delivers performance similar to that of solutions that collocate data and compute. Low latency widens the range of use cases for interactive queries, and faster batch processing lowers cost.
How Does Ahana Cloud for Presto Work?
The Ahana SaaS Console, also referred to as the Ahana Control Plane, provides management of resources deployed into your Ahana Compute Plane in your AWS account. Use the Ahana SaaS Console to provision Presto clusters with optional metadata catalogs into your AWS account. Users querying the Presto clusters can use a range of clients such as Looker, Tableau, or Jupyter to access the Presto clusters, and those users do not need to use or access the Ahana SaaS Console.
In addition to deploying and managing Presto clusters, use the Ahana SaaS Console to configure data source connectors, to define Presto users, to integrate authorization services such as Apache Ranger and AWS Lake Formation, and to integrate OIDC identity providers such as Okta.
For more information on Ahana Cloud for Presto architecture, see What is Ahana Cloud for Presto?
Why Choose Ahana?
Ease of use
Presto has many advantages. As an in-memory distributed SQL engine, Presto is fast. Presto can perform federated queries across relational and NoSQL databases, data warehouses, and data lakes. Because Presto queries data in place, you don’t have to move or ingest data.
However, Presto’s origins in the Hadoop ecosystem mean that configuring Presto is complex, involving many parameters and configuration files. Presto needs a metadata catalog to work with today’s data lakes, and configuring and managing Presto clusters can be challenging.
Ahana provides fast setup of Presto in your own AWS account using the Ahana SaaS Console and cross-account roles. The Ahana SaaS Console simplifies end-to-end life cycle management of Presto clusters with a pre-built and integrated optional metadata catalog.
When you make a configuration change to a Presto cluster in the Ahana SaaS Console, Ahana applies the change and restarts the cluster’s nodes as required for the change to become active. Ahana-managed cluster configuration and metadata information, such as data source connector configurations and Presto users, are preserved in the Ahana SaaS Console when clusters are de-provisioned.
A metadata catalog is only one piece of a broader stack of disaggregated services required for Presto SQL workloads on top of a data lake or federated across data sources. Another important service is data access control, which governs which users have access to which tables, columns, and rows. Ahana simplifies Presto integration with open and cloud-native data access control options such as Apache Ranger and AWS Lake Formation, and supports identity provider integration with OIDC services.
Ahana also provisions an instance of Apache Superset into your Ahana Compute Plane as an administrator tool to test cluster connectivity to data sources. To configure the provided Apache Superset to work with an Ahana-managed Presto cluster, see Query Presto Cluster with Apache Superset.
The Ahana SaaS Console is built natively for AWS and runs in the Ahana AWS account. Using the Ahana SaaS Console and the AWS-recommended best practice of cross-account roles, you deploy and manage Presto clusters in your Ahana Compute Plane in your AWS account. Your Presto clusters run in containers on Amazon EKS for high scalability, availability, and manageability.
Ahana is a leading member of the Presto community and the Linux Foundation’s Presto Foundation. Ahana is a sponsor of PrestoCon and Ahana people are active in the Presto community, including the Presto Slack channel.
Ahana’s contributions to the community include the Presto Query Analyzer by Ahana. The Presto Query Analyzer by Ahana can help identify what type of queries are run on a Presto cluster, and what profile best fits a Presto cluster’s workload. Use the Presto Query Analyzer by Ahana to understand and optimize query performance in Presto clusters.
Ahana delivers great price-performance in several ways:
Ahana uses RaptorX’s hierarchical caching to reduce query latency and improve performance. For more information see RaptorX: Building a 10x Faster Presto and Configuring RaptorX - a multi-level caching with Presto.
To enable data lake caching when creating a Presto cluster in Ahana see Configure Data Lake Caching.
Workload profile optimizing
For each Presto cluster, choose a workload profile of either low concurrency or high concurrency that fits the type of queries you plan to run, and Ahana sets the correct parameters to fit your choice.
For more information, see Announcing the workload profile feature in Ahana Cloud. To set the workload profile when creating an Ahana-managed Presto cluster see Select the Workload Profile.
Both Intel and Ahana are primary contributors to the Velox open source project. Velox is a C++ database acceleration library. As part of the Intel Disruptor Program, Ahana and Intel offer an Open Data Lake Analytics Accelerator Package that is available to Ahana Cloud users who use Intel on AWS. For more information, contact Ahana.
Ahana Cloud for Presto supports AWS Graviton2-based Amazon Elastic Compute Cloud (Amazon EC2) instances for Presto coordinator and worker nodes in Ahana-managed Presto clusters. Graviton2 processors are designed by AWS to provide high price-performance for cloud workloads running in Amazon EC2. For more information, see AWS Graviton Processor.
The Ahana SaaS Console, also called the Ahana Control Plane, runs in Ahana’s AWS account. Using the AWS-recommended best practice of cross-account roles, you deploy AWS VPC and EKS into your AWS account. Those provisioned resources in your AWS account form your Ahana Compute Plane. You use the Ahana SaaS Console to deploy Presto clusters into your Ahana Compute Plane in your AWS account.
All of your AWS resources run in containers in Amazon EKS in your AWS account. Each Presto cluster is provisioned as a set of containers inside its own private subnet in your AWS account.
Open Source Table Format Integration
A lack of generalized abstractions for metadata - such as schema, versions, and statistics - leads to separate connectors for different table formats and no clear way to ensure consistency and leverage engine optimizations. Open source table formats such as Apache Hudi, Apache Iceberg, and Delta Lake address transaction management needs such as ingesting incremental data, managing data capture for inserts and deletions, and ACID transactions. Integrating with these open source table formats also helps avoid vendor lock-in. For more information, see Building an Open Data Lakehouse with Presto, Hudi, and AWS S3.
Apache Ranger is an open source authorization solution that provides access control and audit capabilities for big data platforms through centralized security administration. Apache Ranger’s open data governance model and plugin architecture enable the extension of access control to projects beyond the Hadoop ecosystem, and the platform is widely accepted among major cloud vendors including AWS, Azure, and GCP.
Ahana-managed Presto clusters can be configured to use Apache Ranger to enforce access control policies defined in Apache Ranger, including:
- role level
- group level
- table level
- column level
and others. You can enable Apache Ranger integration with a Hive Metastore or an AWS Glue data source to apply and audit fine-grained data access control across these catalogs.
To use Apache Ranger with Ahana, see Apache Ranger Integration.
AWS Lake Formation
AWS Lake Formation supports data governance through centralized security management, data discovery, and data sharing. Ahana-managed Presto clusters can use AWS Lake Formation to enforce data permissions and access control policies defined in AWS Lake Formation, such as column-level and row-level security.
To use AWS Lake Formation with Ahana, see AWS Lake Formation Integration.
OIDC, or OpenID Connect, is an identity layer built on top of the OAuth 2.0 protocol. OIDC clients use OIDC to verify user identity based on authentication that is performed by an authorization server.
To work with OIDC and Ahana, see Add an Identity Provider.
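As background on how an OIDC client learns who the user is: the authorization server returns an ID token, which is a JSON Web Token made of three base64url-encoded segments (header, payload, signature) separated by dots. The sketch below decodes the payload to show standard claims such as iss, sub, and aud. It is an illustration only and does not describe how Ahana or any identity provider implements OIDC. The sample token, issuer URL, and client name are made up, and the code deliberately does not verify the signature, which a real client must do before trusting any claim.

```python
import base64
import json


def decode_segment(segment: str) -> dict:
    """Decode one base64url-encoded JSON segment of a JWT."""
    padded = segment + "=" * (-len(segment) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


def unverified_claims(id_token: str) -> dict:
    """Return the payload claims of an ID token WITHOUT verifying its signature."""
    _header_b64, payload_b64, _signature_b64 = id_token.split(".")
    return decode_segment(payload_b64)


def _encode(obj: dict) -> str:
    """Helper used only to build the made-up sample token below."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical token for illustration; the signature segment is a placeholder.
SAMPLE_TOKEN = ".".join([
    _encode({"alg": "RS256", "typ": "JWT"}),
    _encode({"iss": "https://idp.example.com", "sub": "user-123", "aud": "presto-client"}),
    "signature-placeholder",
])

if __name__ == "__main__":
    print(unverified_claims(SAMPLE_TOKEN))
```

In practice, an OIDC library validates the signature against the identity provider's published keys and checks the issuer, audience, and expiry before the claims are used.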
Start Using Ahana
You have two ways to start with Ahana:
Free Trial Subscription
You can sign up for an Ahana trial subscription that is free for fourteen days. During the trial period, all Ahana Cloud for Presto features are enabled and you are not billed for use of Ahana Cloud for Presto. You may be billed for AWS resources that you provision beyond the limits of the AWS Free Tier.
At any time during or after the trial period you can subscribe to Ahana in AWS Marketplace, either in a pay-as-you-go subscription or a contract subscription. For more information see Subscriptions.
To sign up for a fourteen-day free trial subscription see Create an Ahana account.
Ahana Cloud for Presto Community Edition
You can also sign up for Ahana Cloud for Presto Community Edition. A Community Edition subscription is free forever for your use of Ahana Cloud for Presto. You may be billed for AWS resources that you provision beyond the limits of the AWS Free Tier.
Some Ahana Cloud for Presto features that are available in the free trial or in a paid subscription are limited in Ahana Cloud for Presto Community Edition. For a comparison, see Ahana Cloud for Presto Community Edition Sign Up.
To sign up for Ahana Cloud for Presto Community Edition see Get Started with Ahana Cloud Community Edition.