Optimize Temporary Table on Presto/Hive SQL

Optimize Temporary Table on Presto/Hive SQL

Optimize Temporary Table on Presto/Hive SQL

Redshift is relational database, Presto is a distributed SQL Query Engine. Presto currently doesnt support the creation of temporary tables and also not the creation of indexes. But you may create tables based on a SQL statement via CREATE TABLE ASPresto Documentation

You optimize the performance of Presto in two ways:

  • Optimizing the query itself
  • Optimizing how the underlying data is stored

One of the best articles around is Top 10 Performance Tuning Tips for Amazon Athena – Athena is a AWS Service based on Presto 0.172 and therefore the tips should also work for Presto.

I am not a Redshift expert but it seems you want to precompute a data set, distributing it and sorting by selected columns, so that it is faster to query.

This corresponds to Presto Hive connector ability to:

  • partition data — data with same value in partitioning column(s) will form a single partition, which is a folder on storage; do not use partitioning on high cardinality columns. This is defined using partitioned_by table property
  • bucket data — data is grouped in files using hash of bucketing column(s); this is similar to partitioning to a certain extent. This is defined using bucketed_by and bucket_count table properties.
  • sort data — within data file, data is sorted by given column(s). This is defined using sorted_by table property.

See examples in Trino (formerly Presto SQL) Hive connector documentation

Note: while i realize documentation is scarce at the moment, i filed an issue to improve it. In the meantime, you can get additional information on Trino (formerly Presto SQL) community slack.

Optimize Temporary Table on Presto/Hive SQL

Related posts on SQL  :

Leave a Reply

Your email address will not be published.