SQL Queries and their Order of Operation.
Understanding SQL order of operation is critical in working with SQL. The syntactical order of operations (clauses) doesn’t always correspond to the logical order of operation or order of execution. However, sometimes the syntactical order of operations may correspond to the logical operations. In this post, we are going to discuss the order of operations of any SQL query and along the way, we will see query examples and pseudocodes.
- FROM, including JOINs:
- WHERE
- GROUP BY
- AGGREGATION
- HAVING
- WINDOW
- SELECT
- DISTINCT
- UNION/UNION ALL
- ORDER BY
- LIMIT/OFFSET/FETCH
SQL Order of Operations
- FROM, including JOINs
This is the first thing that happens logically during query execution. Before we pull the rows or columns of a table, we first locate the table or tables. The JOIN clause comes subsequently to the FROM clause. Actually, JOINS is a relational algebra operator, just like + and — operators in arithmetics. FROM and JOIN are first executed to determine the total working set of data that is being queried. Under the hood, it creates a temporary table containing all the rows and columns of the table that is being joined.
Show the salary of the employees for all of the departments in the school?
SELECT d.name, e.salary
FROM Department AS d
JOIN Employee AS e
ON d.department_id = e.department_id;
2. WHERE
Once we load all the rows from the tables above, we can now use the WHERE clause to filter some of the rows based on some logical condition.
The following query will filter out employee salaries greater than 100000 from the department table.
SELECT Employee, salary
FROM Department
WHERE total > 100000;
3. GROUP BY
In the above step, we used the WHERE clause to filter out some of the rows. Then, if you want, you can take the remaining rows and put them in groups or buckets, where each group contains the same value.
The following query will show the number of employees in each department. We use an aggregate function COUNT and a GROUP BY clause. Don’t use a GROUP BY without an aggregate function.
SELECT DeptName, COUNT(*) AS total
FROM Department
GROUP BY DeptName
4. AGGREGATION
After grouping the records (rows), we apply aggregation functions. You can’t put an aggregation function in the WHERE clause, because the WHERE clause logically happens before the aggregation step. The aggregation function accesses the rows that have been grouped into buckets. Some aggregation functions include but are not limited to AVG, COUNT, MIN, MAX, SUM. There are other aggregation functions depending on the RDBMS of choice.
A query that displays the name of the department and the average salary in that department.
SELECT d.name, AVG(e.salary)
FROM Department AS d
JOIN Employee AS e
ON d.department_id = e.department_id
GROUP BY d.name;
5. HAVING
After aggregating the rows, we use the HAVING clause to filter out the aggregated values. We can also use HAVING to filter the results produced by GROUP BY. The expressions used in the HAVING clause should either appear in the GROUP BY clause or must be the result of an aggregate function. Let’s see the following examples to demonstrate the usage in each case.
- Aggregation function in the SELECT list:
SELECT DeptName, SUM(salary) AS total
FROM Department
GROUP BY DeptName
HAVING total > 15000;
- Aggregation function in the HAVING list:
SELECT DeptName
FROM Department
GROUP BY DeptName
HAVING SUM(salary) > 15000;
- Aggregation function in the SELECT list and HAVING clause:
SELECT DeptName, COUNT(*)
FROM Department
GROUP BY DeptName
HAVING SUM(salary) > 1500;
6. WINDOW
Windows function is one of the coolest features of SQL. After calculating the aggregate functions logically, we can nest the output in WINDOW functions.
For instance, we can write a query that shows how to calculate the top 3 salaries for each department:
SELECT name, salary
FROM (
SELECT d.name, e.salary, DENSE_RANK() OVER(PARTITION BY d.name ORDER BY salary DESC) AS rank
FROM Department AS d
JOIN Employee AS e
ON d.department_id = e.department_id) AS a
WHERE a.rank <=3;
7. SELECT
After doing some grouping, aggregation, and filtering, we use the SELECT clause to select. A SELECT list will show the columns that the query will return.
SELECT name, department_id, salary
FROM Employees
SELECT * will show all columns from the query.
SELECT *
FROM Employees
8. DISTINCT
Distinct happens after SELECT even though it appears before the SELECT column list. If you think about it, it makes sense. Syntactically, it appears before the list, but the order of execution happens after you SELECT the columns. Therefore, after you SELECT your DISTINCT column, if any of the remaining columns have duplicate values, those values will be discarded.
SELECT DISTINCT column1, column2, ...
FROM table_name;
In the above query, we care for the unique values of column 1, hence, we apply the DISTINCT statement.
9. UNION/UNION ALL
A union is an operator that connects two subqueries. It combines the records from two tables row-wise. It is important that the two tables have the same column number and columns else the union will not take place. If you use UNION instead of UNION ALL, duplicates will be discarded.
SELECT column_1, column_2, column_3 ... FROM table1
UNION ALL
SELECT column_1, column_2, column_3 ... FROM table2;
When DISTINCT and UNION are used together, first the union is computed and then the DISTINCT is computed.
10. ORDER BY
If a query statement is specified by the ORDER BY clause, the rows will be sorted either in ascending or descending order. By default, ORDER BY sorts the values in ascending order.
SELECT col_1, col_2
FROM table_1
Order by col_1 ASC, col_2 DESC
11. LIMIT, FETCH and OFFSET
Finally, we limit the output of our query with LIMIT, FETCH and OFFSET. If the rows fall outside the range of the specified by LIMIT, FETCH, and OFFSET, then it is discarded.
SELECT DeptName, SUM(salary) AS total
FROM Department
LIMIT 10
Stitching it all together we can have a complete query that looks like this:
SELECT DISTINCT column, AGG_FUNC(column_or_expression), ...
FROM mytable
JOIN another_table
ON mytable.column = another_table.columnWHERE constraint_expression
GROUP BY column
HAVING constraint_expression
ORDER BY column ASC/DESC
LIMIT count OFFSET COUNT;
CONCLUSION
Not every query needs all of the above clauses or operations. Oftentimes, we only use a subset of the clauses or operations. SQL is very flexible as it allows developers and analysts to quickly manipulate data. In this post, we have discussed a brief description of the order of operation and examples of the clauses. We also saw examples and syntaxes of the SQL clauses. I hope this will help you understand how SQL works under the hood as orders of operation are crucial when writing a query.