Up to now, we have been retrieving data from a single table, but in the real world, you will almost certainly be looking to retrieve data from multiple tables so as to get the information you want.
SQL joins are a very important operation for Test Analysts to be able to execute, so as to query databases effectively. Before we try to practice JOINing tables, let’s first understand a little about what a relational database is.
Table of contents
1. A very basic look at relational databases / tables
One of the main goals of designing a DB is to remove data redundancy (duplicate data). If we take our account_transaction table as an example, we have a column called account_name, this column corresponds to a certain type of account. Now, we can see that there are multiple transactions with the same account… account_name = VISA:
What if we have other account information that we need to store on the Database (e.g. description, start_date and end_date), where would this data be stored? Well, if we were to store it on the account_transaction table, then there would be a great deal of data redundancy, which would break the main goal of design, as mentioned above! Note: certainly, if this were a Bank Database’s table, then there would be millions and millions amounts of redundant data!
If all the data were to be stored on the account_transaction table, notice all the unnecessary repeated data for account_desc, account_start_date and account_end_date as follows:
So, to overcome this data redundancy problem, we can split the account information out into a new relational table called account, so as to hold the account’s information just once! It would be relational, as it would relate to the account_transaction table by the account_name (the PRIMARY KEY of the account table). We can now, if we wish, link the 2 tables using this column through our SQL queries!
We actually call a table like account a lookup table, as we can lookup stored information
With a new account table, the data from the 2 tables would now look like the following, where the 1st table is the account_transaction and the 2nd is the account table. Can you notice the reduction in duplicate data…we now only have to store the account information just once and the account_transaction table then only stores its own transaction information!
Okay, but how do we get the account information, if needed, in a query when querying the account_transaction table? Well, that’s where SQL JOINs come in 🙂 !
2. JOINing table data
2.1 SQL JOINs
The benefits of having relational tables are that they are easy to use and understand, they reduce data redundancy, therefore, making them powerful and more efficient than non-relational tables! However, with this structured manner, we are left with data in separate tables.
So, how do we retrieve data from multiple tables in a SELECT statement? We use a JOIN 🙂 ! Let’s look at some examples to explain. First, let’s create 2 tables to use account and account_transaction …
CREATE TABLE account (
account_id INT (7) PRIMARY KEY,
account_name VARCHAR (30) NOT NULL,
account_desc VARCHAR (100) NOT NULL,
account_start_date DATE NOT NULL,
account_end_date DATE
)
;
INSERT INTO account
VALUES
(1, 'VISA', 'VISA Credit Card Account', '2022-02-01', NULL)
,(2, 'CurrentAccount', 'Current Account', '2022-03-01', NULL)
,(3, 'SavingsAccount', 'Savings Account', '2022-05-01', '2022-05-27')
;
CREATE TABLE account_transaction (
trans_id INT (7) PRIMARY KEY,
account_id INT (7) NOT NULL,
trans_date DATE NOT NULL,
payee VARCHAR (30) NOT NULL,
category VARCHAR (40) NOT NULL,
description VARCHAR (100) NOT NULL,
income DECIMAL (7,2) NOT NULL,
expense DECIMAL (7,2) NOT NULL,
reconciled VARCHAR (1)
)
;
INSERT INTO account_transaction
(
trans_id
,account_id
,trans_date
,payee
,category
,description
,income
,expense
,reconciled
)
VALUES
(1, 1, '2022-01-01', 'OpeningBalance', 'Opening Balance', ' ', 2.00, 0.00, 'Y')
,(2, 1, '2022-01-01', 'The Gas Company', 'Utilities:Gas', ' ', 0.00, 37.50, 'Y')
,(3, 1, '2022-01-02', 'The Big Supermarket', 'Weekly Shop', 'our shopping', 0.00, 52.39, 'Y')
,(4, 1, '2022-02-12', 'The Big Supermarket', 'Weekly Shop', 'mom shopping', 0.00, 29.99, NULL)
,(5, 1, '2022-02-17', 'The Football Club', 'Misc Expenses', ' ', 0.00, 30.00, NULL)
,(6, 1, '2022-02-22', 'The Big Bank', 'Refund', ' ', 39.00, 0.00, NULL)
,(7, 1, '2022-02-28', 'The Corner Shop', 'Weekly Shop', 'holiday shop', 0.00, 10.46, NULL)
,(8, 1, '2022-03-17', 'The Football Club', 'Misc Expenses', ' ', 0.00, 30.00, NULL)
;
Now, let’s see what data we have on these tables:
SELECT *
FROM account
;
SELECT *
FROM account_transaction
;
- Okay, we now have the 2 tables, account and account_transaction
- The connection between the 2 tables is the account_id, which is the Primary key (unique identifier) on the account table
To help you refer back to the above Result’s 2 Tables, you can view it in a new browser tab by clicking on here.
2.2 Let’s retrieve information from more than one table
Don’t be afraid of using a join! Yes, it’s a little more involved than what we have faced so far in this tutorial, but if you just patiently follow the rest of this lesson…all will become very clear and you will have a really powerful skill under your belt 🙂 !
In order to retrieve the data from more than one Table, you will simply need to specify:
- the columns, in the SELECT clause, that you wish to retrieve
- the table names to be joined in the FROM clause
- a WHERE clause with a join condition between the tables
SELECT trans_id, account_name, income, expense
FROM account_transaction, account
WHERE account.account_id = account_transaction.account_id
;
This explanation will take a little more following than we’ve previously experienced in this tutorial, but just be patient and all will make sense…
- The difference with this SELECT statement to all our previous ones in this tutorial is that we are now retrieving columns from more than one table. We are fetching columns from:
- account_transaction table: trans _id, income and expense
- account table: account_name
- The FROM clause has 2 tables specified, account_transaction and account, which are the tables to be joined in the WHERE clause
- The WHERE clause is the all important clause for specifying the JOIN relationship between tables
- We have to actually tell the DBMS what JOIN we want, otherwise, it won’t know. For our example the join (relationship) is the account_id for both tables
- Note that the account_id has been prefixed by each table (e.g. account.account_id and account_transaction.account_id), this tells the DBMS which table’s column you are referring to
- When we JOIN the tables, we are attempting to match every row in the first table with every row in the 2nd table using account_id
- When the query is executed, the WHERE clause only returns the data that satisfies the condition
- For our example query, above:
- The 1st row in the account_transaction table is retrieved and an attempt is made to match every row in the account table by account_id:
- We have to actually tell the DBMS what JOIN we want, otherwise, it won’t know. For our example the join (relationship) is the account_id for both tables
- As only 1 row exists in account where account_id = 1, the account_name will be returned for that row and (along with the account_transaction data that has already been retrieved) it will be displayed. So…the first row will be shown in our results as:
- Then…as for this 1st row, the same will then be performed for all the other rows in account_transaction, where it attempts to match account_id in the account table so as to get the account_name
2.3 So, what if we don’t use a JOIN?
Well, simply put, we would be getting a lot more than we bargained for! We would be actually matching, unnecessarily, every row in the account_transaction table with every row in the account table, even with account types that are not being used by transactions, as the following results show! So…ENSURE you specify a JOIN!
SELECT trans_id, account_name, account_desc, account_start_date, income, expense
FROM account_transaction trans, account acct
;
2.4 Table aliases
The 2 previous SQL queries may appear a little busy in that the WHERE clause is using the full table names before all the column names! Can we make it easier to understand? Yes 🙂 ! As we saw previously with aliasing column names in the SELECT statement 5.3.1.1 Using Aliases, we can also use aliases for tables too!
SELECT t.trans_id, a.account_name, t.income, t.expense
FROM account_transaction as t, account as a
WHERE a.account_id = t.account_id
;
- Previously, we’d coded the statement above, with table prefixes before the account_id columns in the WHERE clause, as:
SELECT trans_id, account_name, income, expense
FROM account_transaction, account
WHERE account.account_id = account_transaction.account_id
;
- This time:
- We again linked the 2 tables together by account_id:
- But instead of using the table names as prefixes to identify which account_id belongs to which table in the WHERE clause, we have coded an alias for each table in the FROM clause (t for account_transaction and a for account)
- We can then use these aliases in the WHERE clause and the SELECT clause to identify which column belongs to which table
We could call the aliases whatever we like. Me…personally, I would give the prefixes more meaningful names, such as trans for account_transaction and acct for account 🙂
2.5 Types of JOINs (and UNIONs)
You may hear the terms INNER JOIN and OUTER JOIN. The join we have been using is an INNER join, which is set implicitly, meaning that you don’t have to code JOIN in the SQL query…which I don’t! The OUTER joins (LEFT, RIGHT and FULL)…in order to use them, you do have to explicitly code them in SQL queries.
The way I have shown you how to join tables, from my experience, is the most commonly used in the development world.
I will NOT be teaching you how to use OUTER joins!
I really don’t want to confuse you with all the different types of JOINs! Why, because I have seen so many inexperienced people being confused by attempting to use the OUTER joins, when they really don’t need to! I have solely been using the INNER join (without specifying the actual word INNER in queries) for over 30 years! Oh…how I do love to keep things simple 🙂 !
You may also come across the term UNION. I have deliberately not taught this. Again, this is a beginners tutorial. If you wish to know more, then there will be sufficient material on the internet for you to learn.
2.6 More table join examples
Before we actually finish our lesson, I would just like to cement our understanding of joining tables, by showing you more examples.
Let’s start with creating another lookup table called category, along with our other 2 tables (account_transaction and account), by running the following:
CREATE TABLE account_transaction (
trans_id INT (7) PRIMARY KEY,
account_id INT (7) NOT NULL,
trans_date DATE NOT NULL,
payee VARCHAR (30) NOT NULL,
category_id INT (7),
description VARCHAR (100) NOT NULL,
income DECIMAL (7,2) NOT NULL,
expense DECIMAL (7,2) NOT NULL,
reconciled VARCHAR (1)
)
;
INSERT INTO account_transaction
(
trans_id
,account_id
,trans_date
,payee
,category_id
,description
,income
,expense
,reconciled
)
VALUES
(1, 3, '2022-01-01', 'OpeningBalance', 1, ' ', 2.00, 0.00, 'Y')
,(2, 1, '2022-01-01', 'The Gas Company', 2, ' ', 0.00, 37.50, 'Y')
,(3, 1, '2022-01-02', 'The Big Supermarket', 3, 'our shopping', 0.00, 52.39, 'Y')
,(4, 1, '2022-02-12', 'The Big Supermarket', 3, 'mom shopping', 0.00, 29.99, NULL)
,(5, 2, '2022-02-17', 'The Football Club', 4, ' ', 0.00, 30.00, NULL)
,(6, 1, '2022-02-22', 'The Big Bank', 5, ' ', 39.00, 0.00, NULL)
,(7, 1, '2022-02-28', 'The Corner Shop', 3, 'holiday shop', 0.00, 10.46, NULL)
,(8, 2, '2022-03-17', 'The Football Club', 4, ' ', 0.00, 30.00, NULL)
;
CREATE TABLE account (
account_id INT (7) PRIMARY KEY,
account_name VARCHAR (30) NOT NULL,
account_desc VARCHAR (100) NOT NULL,
account_start_date DATE NOT NULL,
account_end_date DATE
)
;
INSERT INTO account
VALUES
(1, 'VISA', 'VISA Credit Card Account', '2022-02-01', NULL)
,(2, 'Current', 'Current Account', '2022-03-01', NULL)
,(3, 'Savings', 'Savings Account', '2022-05-01', '2022-05-27')
;
CREATE TABLE category (
category_id INT (7) PRIMARY KEY,
category_name VARCHAR (30) NOT NULL,
category_desc VARCHAR (100) NOT NULL,
date_created DATE NOT NULL
)
;
INSERT INTO category
VALUES
(1, 'Opening Balance', 'An Account Opening Balance', '2022-01-01')
,(2, 'Utilities:Gas', 'Gas Utility', '2022-02-01')
,(3, 'Weekly Shop', 'Weekly Shoppin
g', '2022-01-01')
,(4, 'Misc Expenses', 'Miscellaneous Expenses', '2022-02-01')
,(5, 'Refund', 'Account Refund', '2022-04-01')
;
Now, let’s see what data we have on these tables:
SELECT *
FROM account_transaction
;
SELECT *
FROM account
;
SELECT *
FROM category
;
- So, now we have the 3 tables, account_transaction, account and category, for us to practice against
Again, to help you refer back to the above Result’s 3 Tables, you can view it in a new browser tab by clicking on here.
2.6.1 Joining 3 tables in a query
For this next example, yep…you’ve guessed it, we’re going to link 3 tables in a single query:
SELECT trans.trans_id
, acct.account_name
, acct.account_desc
, acct.account_start_date
, trans.income
, trans.expense
, cat.category_name
FROM account_transaction trans
, account acct
, category cat
WHERE acct.account_id = trans.account_id
AND cat.category_id = trans.category_id
ORDER BY trans.trans_id
;
- We have successfully brought back data from all 3 tables
- The WHERE clause and the AND clause are instructing the DBMS to join account_transaction to both the account table and the category table (via account_id and category_id respectively), so that data from them both can be retrieved
- The Expected Result shows the data retrieved from the account_transaction table (trans_id, income and expense), along with the related data from account (account_name, account_desc and account_start_date) and from category (category_name)
2.6.2 Using other conditions, with JOINs, in the WHERE clause
SELECT trans.trans_id
, acct.account_name
FROM account_transaction trans
, account acct
WHERE trans.trans_id = 8
AND acct.account_id = trans.account_id
;
- Now, we have combined the JOIN with another condition
- The WHERE trans.trans_id = 8 is telling the DBMS to only return an account_transaction row where its trans_id = 8
- The AND clause is telling the DBMS to match any returned account_transaction rows (in this case row 8) with the account table via account_id. When the match is performed, the account_name can then be returned
2.6.3 JOINs and multiple conditions
It is important to know that any number of conditions can be set for any table. The list is endless, but just to give you a flavour of things…
SELECT trans.trans_id
, acct.account_name
FROM account_transaction trans
, account acct
WHERE trans.trans_id IN (3, 4, 5)
AND acct.account_id = trans.account_id
AND acct.account_name <> 'Current'
;
- We are using 2 tables account_transaction and account
- In the WHERE clause, we are retrieving only transactions where trans_id equals 3, 4 or 5
- The 1st AND clause tells the DBMS that we want to join the rows we retrieved with the WHERE clause (trans_id 3, 4 and 5) with the account table, using account_id, however:
- The 2nd AND clause is saying that if the account table’s account_name is equal to “Current”, then no row will be displayed for “trans_id = 5” in the Expected Results
- The 2nd AND clause is saying that if the account table’s account_name is equal to “Current”, then no row will be displayed for “trans_id = 5” in the Expected Results
The end of joins!
So, this brings us to the end of this brief, albeit intense, lesson of learning about table joins!
To get the most out of your learning, you will need to practice yourself and it really will only be experience that will teach you what you can and can’t do when combining conditions.
3. Summary
Gosh…that was quite a lesson. What you have learned here will really stand you in good stead! As mentioned before, SQL joins are a very important operation for Test Analysts to be able to query databases effectively.
In this lesson you will have learned how:
- Relational databases and tables work at a basic level
- To join multiple tables
- Table aliases work
- There are different types of JOINs
- An INNER JOIN works implicitly
And…some of the common SQL Keywords:
- INNER
- OUTER
- JOIN
- AS
4. Test Yourself
Again, I have kept the questions to a minimum in this test, however, please still take time to complete them and understand the answers.
Important to do well…
As with the other tests, once complete check your answers, if you do not score that well (let’s say not getting at least 6 correct), please either go back over the tutorial’s section again, perhaps making more notes as you go, or re-do the test to ensure you have a good understanding.
It is always important that you understand the section that you have read / learned before you move on 🙂 !
Click here to Test Yourself.
5. What next?
We have learned a great deal throughout this tutorial in a really short period of time! Will you have retained it all? NO…I shouldn’t imagine so!
Again, the trick really is to practice, practice and practice some more.
So, we have looked at the SQL that you, as a Test Analyst, should start with! Is there more you can learn to make you an even better SQL practitioner…yes! However, this tutorial wasn’t about that, it was about giving you enough knowledge to be a Professional Test Analyst and this really is a great start!
Feeling brave?
Are you feeling brave now that you have mastered the SQL basics? Then you could try to download MySql and create your own database and practice a little more. This would show you how we really connect to a database and run SQL in a DBMS. To show you how to do this, I will shortly be publishing a tutorial…please watch this space by Subscribing 🙂 …
Prev lesson ⇒ 5. More Data Retrieval