In the previous lesson 4. Basic Data Retrieval, you were (gently) introduced to: how to run your practice SQL and learning the basics of data retrieval. This lesson now builds on your first real foray into the world of SQL by showing you:
- A little more advanced data filtering
- How to aggregate your retrieved data
Table of contents
- Example data for this lesson
- More ways to filter your data retrieval
- Aggregating your retrieved data
- Grouping your retrieved data
- Summary
- Test Yourself
1. Example data for this lesson
For this lesson, we will be using the same Table and Data used in the last lesson. So, to get started, if you have cleared SqlFiddle.com let’s create things again and do a SELECT to show the data that you will be working with.
CREATE TABLE account_transaction (
trans_id INT (7) PRIMARY KEY,
account_name VARCHAR (30) NOT NULL,
trans_date DATE NOT NULL,
payee VARCHAR (30) NOT NULL,
category VARCHAR (40) NOT NULL,
description VARCHAR (100) NOT NULL,
income DECIMAL (7,2) NOT NULL,
expense DECIMAL (7,2) NOT NULL,
reconciled VARCHAR (1)
)
;
INSERT INTO account_transaction
(
trans_id
,account_name
,trans_date
,payee
,category
,description
,income
,expense
,reconciled
)
VALUES
(1, 'VISA', '2022-01-01', 'OpeningBalance', 'Opening Balance', ' ', 2.00, 0.00, 'Y')
,(2, 'VISA', '2022-01-01', 'The Gas Company', 'Utilities:Gas', ' ', 0.00, 37.50, 'Y')
,(3, 'VISA', '2022-01-02', 'The Big Supermarket', 'Weekly Shop', 'our shopping', 0.00, 52.39, 'Y')
,(4, 'VISA', '2022-02-12', 'The Big Supermarket', 'Weekly Shop', 'mom shopping', 0.00, 29.99, NULL)
,(5, 'VISA', '2022-02-17', 'The Football Club', 'Misc Expenses', ' ', 0.00, 30.00, NULL)
,(6, 'VISA', '2022-02-22', 'The Big Bank', 'Refund', ' ', 39.00, 0.00, NULL)
,(7, 'VISA', '2022-02-28', 'The Corner Shop', 'Weekly Shop', 'holiday shop', 0.00, 10.46, NULL)
,(8, 'VISA', '2022-03-17', 'The Football Club', 'Misc Expenses', ' ', 0.00, 30.00, NULL)
;
select *
from account_transaction
;
- Right then…now we have data for the rest of this lesson 🙂 !
You will see from the above Expected Result, that you have 8 rows of Account Transaction data to play with. To help you refer back to this Result’s Table, you can view it in a new browser tab by clicking on here.
2. More ways to filter your data retrieval
2.1 IN operator
We have seen how you can use the OR operator to retrieve specific data. However, the IN operator allows you to specify multiple OR statements more simply!
SELECT *
FROM account_transaction
WHERE category IN ('Refund', 'Opening Balance', 'Misc Expenses')
;
- Instead of using 3 OR statements to retrieve rows for ‘Refund’, ‘Opening Balance’ and ‘Misc Expenses’, we have used just one statement…great isn’t it! However, please be aware that you cannot use wildcard searches, see Note below
The disadvantage of using IN rather than OR is that you can only use it for exact matches, so wildcard searches (percent sign (%) in the search value) cannot be used
2.2 BETWEEN operator
BETWEEN is used mainly for integer (INT) and DATE (which we will come onto in the next section) searches, where we are looking for data matches between certain numbers and dates.
SELECT *
FROM account_transaction
WHERE expense BETWEEN 29.99 and 35
;
- Here, we have retrieved 3 rows where the expense is between £29.99 and £35.00
- Note: even though the clause is between, it has actually retrieved a row where the expense matches the first BETWEEN integer (29.99)
2.3 DATE function
As shown in lesson 3. Database Concepts, there are many ways to store data as Data Types (INT, VARCHAR, DATE etc) on a database. One of these data types being the DATE format. So how do we search on this data type?
SELECT *
FROM account_transaction
WHERE trans_date = '2022-01-01'
;
SELECT *
FROM account_transaction
WHERE trans_date BETWEEN '2022-02-12' and '2022-02-28'
;
- To search on a date in MySql, you simply put the search value in a ‘YYYY-MM-DD’ format, where YYYY is the year, MM is the month and DD is the day
- The 1st statement shows a straight search on a specific date
- The 2nd statement shows how we can do a search BETWEEN 2 dates
As you know…we are using the MySql Database format for all our examples, as set at the top of the SqlFiddle.com application, shown below. We are using MySql because it is an easy introduction to learning SQL! The Date query is very simple in MySql, however, you will find that other DBMSs are maybe a little more difficult to use
3. Aggregating your retrieved data
Sometimes you will just want to summarise data from a table instead of retrieving it all, for example:
- Finding out how many rows there are on a table or for a selection of data
- Calculating the sum total of a customer’s bank transactions
- Working out the average number of customers that open a bank account each month
- Finding the maximum and minimum amount a customer has paid into their bank account
This section will show you how to do all this, by using these SQL functions:
- COUNT()
- SUM()
- AVG()
- MAX()
- MIN()
3.1 COUNT() function
COUNT() is used to find out how many rows have been retrieved by a SELECT statement. This is a very useful function that I find myself using a great deal of.
As a Test Analyst, I am always being asked questions such as:
- So, how many customers do we have?
- How many customers do we have that haven’t paid their bills this month?
- How many customers are in credit?
- How many customers who pay > £100 each month are in debit?
SELECT *
FROM account_transaction
WHERE trans_date > '2022-02-01'
;
SELECT COUNT(*)
FROM account_transaction
WHERE trans_date > '2022-02-01'
;
- The first statement is just showing the data that would be retrieved for a SELECT * statement that we are using in the 2nd (COUNT) statement. From this, you can see all the rows that have a trans_date > 2022-02-01 (YYYY-MM-DD)
- The 2nd statement shows us the number (COUNT) of rows retrieved
3.1.1 Using Aliases
As you can see in the above result, the calculated column does not have a specific name, it just uses the function name (COUNT(*)).
Wouldn’t it be handy to perhaps retrieve the data and give the resulting column name a more meaningful title, say number_of_transactions…
SELECT COUNT(*) AS number_of_transactions
FROM account_transaction
WHERE trans_date > '2022-02-01'
;
- By using the AS keyword, we are giving the resulting column an alias (number_of_transactions). This is also handier than just using as a column title, as you will see in 5.4.2 HAVING clause, where it is actually used as a search condition
3.2 SUM() function
Now, what if we want to show the sum total of transaction expenses for rows that have a trans_date > 2022-02-01?
SELECT *
FROM account_transaction
WHERE trans_date > '2022-02-01'
;
SELECT SUM(expense)
FROM account_transaction
WHERE trans_date > '2022-02-01'
;
- Again, the first statement is just showing the data that would be retrieved for the SELECT * statement we are using in the 2nd (SUM) statement. From this, you can see all the rows that have a date > 2022-02-01 (YYYY-MM-DD)
- The 2nd statement shows us the sum total of the expense (SUM) of rows retrieved. So, if you add up the 5 row’s expenses, it would come to £100.45
3.3 AVG() / MAX() / MIN() functions
Hopefully, you get the picture of how functions work in SQL. Just to finish off, here are some further examples of using these very useful tools.
SELECT *
FROM account_transaction
WHERE trans_date > '2022-02-01'
;
SELECT AVG(expense)
FROM account_transaction
WHERE trans_date > '2022-02-01'
;
SELECT MAX(expense)
FROM account_transaction
WHERE trans_date > '2022-02-01'
;
SELECT MIN(expense)
FROM account_transaction
WHERE trans_date > '2022-02-01'
;
- The first statement is just showing the data that would be retrieved for the SELECT * statement we are using in the other (AVG, MAX, MIN) statements. From this, you can see all the rows that have a trans_date > 2022-02-01 (YYYY-MM-DD)
- The 2nd statement displays the average (AVG) amount of the transactions’ expenses that have been retrieved, this being £20.09 (=29.99 + 30 + 0 + 10.46 + 30 / 5 (rows))
- The 3rd statement simply displays the maximum expense amount for the SELECTed 5 rows £30
- And the 4th statement simply displays the minimum expense amount for the SELECTed 5 rows £0
4. Grouping your retrieved data
4.1 GROUP BY statement
There will be times when we may want to know something about groups of data on a database. Using our account_transaction table as an example, if we want to know how much we have spent on each of our Categories (Weekly Shop, Misc Expenses etc), we can use the Group By statement, along with the Sum() function.
SELECT category, SUM(income - expense)
FROM account_transaction
GROUP BY category
;
- In order to use GROUP BY, we must specify which column(s) we are going to group our data on. In this instance, we have just used one column (category)
- By running this statement, each of the categories will be grouped together on one line
- For each category, we are then calculating the sum of all its transactions (income – expense) by using the SUM() function
4.2 HAVING clause
To complement the GROUP BY clause, we can use the HAVING clause.
This clause works like a WHERE clause. It sets the search conditions for GROUPED rows that are returned. So why not just use a WHERE clause? Well, the advantage of using a HAVING clause is that we can include aggregate functions (COUNT(), SUM(), AVG() etc) in the search condition, which are applied after the groups are formed (see the 3rd SELECT statement below).
You MUST be aware, however, that the WHERE clause filters data before a GROUP BY, whereas HAVING actually filters the data that has already been grouped! So, just to reiterate:
- Rows that have been excluded by the WHERE clause will not be used by the GROUP BY clause
- Rows that have been grouped are then further filtered by using the HAVING clause
The HAVING clause can use all the WHERE clause operators as shown in section 4.5.2 WHERE clause operators (=, >, <>, BETWEEN, LIKE etc)
Let’s take a look…
SELECT category, SUM(income - expense) AS trans_total
FROM account_transaction
GROUP BY category
;
SELECT category, SUM(income - expense) AS trans_total
FROM account_transaction
GROUP BY category
HAVING SUM(income - expense) < 0
;
SELECT category, SUM(income - expense) AS trans_total
FROM account_transaction
GROUP BY category
HAVING trans_total > 0
;
- The 1st statement shows what would be returned by a GROUP BY without a HAVING clause…all categories are grouped and returned
- The 2nd statement adds a HAVING clause to the GROUP BY clause, which applies a filter after the groups are formed, therefore, only returning grouped categories that have a sum (income – expense) less than zero
- The 3rd statement shows an aggregated amount aliased as trans_total being used in the HAVING clause search condition. This time, only grouped categories that have a sum > zero are displayed
5. Summary
This lesson has completed our learning of the Data Retrieval basics and has hopefully given you a good deal of confidence to tackle the next, more advanced, lesson of retrieving data from multiple tables.
You will have learned how to…
- Filter your data retrieval using IN, BETWEEN and DATE
- Summarise data using aggregate functions
- Use aliases for retrieved columns
- Group retrieved data
- Use the HAVING clause to act as a WHERE clause, allowing aggregate functions in search conditions
And…some additional common SQL Keywords:
- IN
- BETWEEN
- DATE()
- COUNT()
- SUM()
- AVG()
- MAX()
- MIN()
- GROUP BY
- HAVING
6. Test Yourself
For this test, you do not have as many questions as previous tests, as the content was shorter, however…
Important to do well…
As with the other tests, once complete check your answers, if you do not score that well (let’s say not getting at least 10 correct), please either go back over the tutorial’s section again, perhaps making more notes as you go, or re-do the test to ensure you have a good understanding.
It is always important that you understand the section that you have read / learned before you move on 🙂 !
Click here to Test Yourself.
Prev lesson ⇒ 4. Basic Data Retrieval
Next lesson ⇒ 6. Joining tables