PostgreSQL’s GREATEST (and LEAST) function

PostgreSQL has two functions that are not standard SQL, but are very helpful in certain situations : GREATEST and LEAST. PostgreSQL does not have scalar functions for MIN and MAX (those are aggregate functions that operate on all rows in a query), but GREATEST and LEAST accomplish that and then some. Both allow for more than two values to be passed in, and also handle NULLs. From the docs:

The GREATEST and LEAST functions select the largest or smallest value from a list of any number of expressions. The expressions must all be convertible to a common data type, which will be the type of the result (see Section 10.5 for details). NULL values in the list are ignored. The result will be NULL only if all the expressions evaluate to NULL.

PostgreSQL 11 Documentation

I was recently working on a system that provided a VIEW that joined together multiple tables, and had an UPDATED_AT column which represents the most recent UPDATED_AT value from all of the tables that backed the view. This was necessary because the view was being used to snapshot data into another system, and would periodically be checked by comparing the UPDATED_AT field (ie: WHERE UPDATED_AT > :some_value).

Here is an example of what that might look like in a different system (ie: the system I am working on does not have customers, orders, line items).

CREATE VIEW vw_customer_order_line_items (customer_id, order_id, amount, sku,
item_price, item_quantity, most_recent_update)
AS
  SELECT c.id,
         o.id,
         o.amount,
         li.sku,
         li.price,
         li.quantity,
         (SELECT Max(updatecheck.updated_at)
          FROM   ( VALUES (c.updated_at),
                          (o.updated_at),
                          (li.updated_at) ) updatecheck(updated_at))
  FROM   customers c
         JOIN orders o
           ON o.customer_id = c.id
         JOIN line_items li
           ON li.orders_id = o.id;

SELECT *
FROM   vw_customer_order_line_items
WHERE  updated_at > '2019-12-01'; 

The nested SELECT statement creates an aliased nested query, updatecheck, with a single column – updated_at. Let’s look at the query plan to see what is going on:

  • SELECT
    • Merge Join : 119200
      • Index Scan : orders_pk : 18047
      • Temporary (Materialize) : 111767
        • Sort: 111413
          • Hash Join : 86875
            • Full Scan : customers : 14310
            • Transformation Hash : 24128
              • Full scan : line_items : 24128
            • Aggregate : 0.4
              • Value : 0.3
      • Aggregate : 0.4
        • Value: 0.3

A query plan cost of 119,200? That is pretty high, but, not surprising given the lack of other WHERE clauses. With different WHERE clauses added, like on a customer_id, that cost would come down to 25.

I was working on trying to optimize this query, and one of the portions that stood out was the sub-select for computing the maximum updated_at value. I knew that GREATEST was an option, so I changed the query to use it and re-ran the plan to see what changed:

CREATE VIEW vw_customer_order_line_items (customer_id, order_id, amount, sku,
item_price, item_quantity, most_recent_update)
AS
  SELECT c.id,
         o.id,
         o.amount,
         li.sku,
         li.price,
         li.quantity,
         Greatest(c.updated_at, o.updated_at, li.updated_at)
  FROM   customers c
         JOIN orders o
           ON o.customer_id = c.id
         JOIN line_items li
           ON li.orders_id = o.id;

SELECT *
FROM   vw_customer_order_line_items
WHERE  updated_at > '2019-12-01'; 

After optimization:

  • SELECT
    • Gather : 81701
      • Hash Join : 66558
        • Hash Join : 50999
          • Full Scan : line_items : 19226
          • Transformation Hash : 11718
            • Full Scan : orders : 11718
      • Transformation Hash : 8252
        • Full Scan : customers : 8252

The cost went from 119,200 down to 81,701, a roughly 32% reduction. What is interesting is that the query optimizer chose to do three full scans this time, where as previously it only needed to do two plus an index scan. But, for a quick one-line change to the query, it still was a good improvement in query execution time.

There are other optimizations that I made to this view that further reduced the overall cost and execution time, but the simple change to GREATEST helped with the performance, not to mention the readability of the SQL.

Next time you need to find the largest (or smallest) value in a list of scalar values, give GREATEST and LEAST a try.

Leave a Reply