SQL Aggregation Tutorial

Introduction

PostgreSQL provides a variety of built-in aggregation functions that allow you to calculate values such as count, sum, average, minimum, and maximum values for a set of rows. In this post, I’m going to walk you through the basic to advance aggregation functions.

Let’s get started

Basic Aggregation Functions

You can use aggregation functions such as COUNT, SUM, AVG, MAX, MIN to get some insights into your data. For example, given the order items with the following data (more rows below):

The order_items table
The order_items table

If you want to calculate the total items ordered in the table, you can use this:

SELECT sum(quantity) as total_items_ordered FROM order_items;

Select total items ordered
Select total items ordered

Of course, you can use the WHERE condition to do further filtering.

To get the average of a column (price for example), you can run this query:

select AVG(price) as average_price from order_items;

To get the MAX/MIN value of the price column, you can use the following queries:

select MAX(price) as max_price from order_items;
select MIN(price) as min_price from order_items;

And of course, you can use both MIN and MAX in a single query:

select MIN(price) as min_price, MAX(price) as max_price from order_items;
Using MIN and MAX to query values from a table in a single query
Using MIN and MAX to query values from a table in a single query

If you want to calculate the total items of a specific order, you can use the SUM function.

select SUM(quantity) as total_order_quantity from order_items where order_id = 1;
Using sum to calculate total of items in a single order
Using sum to calculate total of items in a single order

Using GROUP BY clause

The GROUP BY clause is quite useful when you want to do aggregation on fields grouped by certain criteria. Sounds confusing? It need not be.

Given the order_items table above, you can use the GROUP BY clause to:

  • Calculate the total items of each order:
SELECT order_id, SUM(quantity) total_items from order_items GROUP BY order_id; 
Using GROUP BY to do calculation on the quantity column
Using GROUP BY to do calculations on the quantity column

You can also use GROUP BY to calculate the value of each order:

SELECT order_id, SUM(quantity) total_items, SUM(quantity * price) as order_total from order_items GROUP BY order_id;

This would produce:

Use GROUP BY to sum quantity and order value
Use GROUP BY to sum quantity and order value

There are a few things you need to remember when using GROUP BY:

  • You cannot use * in the SELECT statement (why? reasons below)
  • The columns between your SELECT and FROM are either appear in the GROUP BY statement or aggregate functions (SUM/AVG/MIN/MAX)

In other words, any field that is not being aggregated must be included in the GROUP BY clause.

Filtering Data with the HAVING Clause

The HAVING clause is used in conjunction with the GROUP BY clause to filter data returned from GROUP BY. You can imagine what HAVING does to GROUP BY is similar to what WHERE does to SELECT.

In the GROUP BY example above, when you have the total value of the order back in the column order_total, you may want to filter only orders that have value > 50.

You can use HAVING to achieve this:

SELECT order_id, SUM(quantity) total_items, SUM(quantity * price) as order_total from order_items GROUP BY order_id HAVING SUM(quantity * price) > 50;

You may wonder, why did I need to do the calculation again in the HAVING clause but not use the alias order_total? It is because the aliases only available after GROUP BY and HAVING are complete so using aliases in HAVING would result in an error.

Conclusion

In this post, I’ve shown you how to use various aggregate functions in SQL to extract valuable data from your database. Using these functions can help you quickly draw insights from your raw data.

Leave a Comment