Table of Contents
Introduction
PostgreSQL provides a variety of built-in aggregation functions that allow you to calculate values such as count, sum, average, minimum, and maximum values for a set of rows. In this post, I’m going to walk you through the basic to advance aggregation functions.
Let’s get started
Basic Aggregation Functions
You can use aggregation functions such as COUNT, SUM, AVG, MAX, MIN to get some insights into your data. For example, given the order items with the following data (more rows below):
If you want to calculate the total items ordered in the table, you can use this:
SELECT sum(quantity) as total_items_ordered FROM order_items;
Of course, you can use the WHERE condition to do further filtering.
To get the average of a column (price for example), you can run this query:
select AVG(price) as average_price from order_items;
To get the MAX/MIN value of the price column, you can use the following queries:
select MAX(price) as max_price from order_items; select MIN(price) as min_price from order_items;
And of course, you can use both MIN and MAX in a single query:
select MIN(price) as min_price, MAX(price) as max_price from order_items;
If you want to calculate the total items of a specific order, you can use the SUM function.
select SUM(quantity) as total_order_quantity from order_items where order_id = 1;
Using GROUP BY clause
The GROUP BY clause is quite useful when you want to do aggregation on fields grouped by certain criteria. Sounds confusing? It need not be.
Given the order_items table above, you can use the GROUP BY clause to:
- Calculate the total items of each order:
SELECT order_id, SUM(quantity) total_items from order_items GROUP BY order_id;
You can also use GROUP BY to calculate the value of each order:
SELECT order_id, SUM(quantity) total_items, SUM(quantity * price) as order_total from order_items GROUP BY order_id;
This would produce:
There are a few things you need to remember when using GROUP BY:
- You cannot use * in the SELECT statement (why? reasons below)
- The columns between your SELECT and FROM are either appear in the GROUP BY statement or aggregate functions (SUM/AVG/MIN/MAX)
In other words, any field that is not being aggregated must be included in the GROUP BY clause.
Filtering Data with the HAVING Clause
The HAVING clause is used in conjunction with the GROUP BY clause to filter data returned from GROUP BY. You can imagine what HAVING does to GROUP BY is similar to what WHERE does to SELECT.
In the GROUP BY example above, when you have the total value of the order back in the column order_total, you may want to filter only orders that have value > 50.
You can use HAVING to achieve this:
SELECT order_id, SUM(quantity) total_items, SUM(quantity * price) as order_total from order_items GROUP BY order_id HAVING SUM(quantity * price) > 50;
You may wonder, why did I need to do the calculation again in the HAVING clause but not use the alias order_total
? It is because the aliases only available after GROUP BY and HAVING are complete so using aliases in HAVING would result in an error.
Conclusion
In this post, I’ve shown you how to use various aggregate functions in SQL to extract valuable data from your database. Using these functions can help you quickly draw insights from your raw data.
I build softwares that solve problems. I also love writing/documenting things I learn/want to learn.