File: advanced_sql_class_notes.md
Lines changed: 91 additions & 1 deletion
@@ -567,7 +567,7 @@ GROUP BY 1,2,3,4,5,6,7,8
```

-## 4.5 Self Joins
+## 4.5A Self Joins

We can join a table to itself by invoking it twice with two aliases. This can be useful, for example, to look up the previous day's order quantity (if any) for a given `CUSTOMER_ID` and `PRODUCT_ID`:

@@ -610,6 +610,96 @@ QUANTITY,
FROM CUSTOMER_ORDER c1
```

## 4.5B Recursive Self Joins

At some point in your career, you may encounter a table that is inherently designed to be self-joined. For instance, run this query:

```sql
SELECT * FROM EMPLOYEE;
```

This is a table containing employee information, including each employee's manager via a `MANAGER_ID` field. Here is a sample of the results.

| ID | FIRST_NAME | LAST_NAME | TITLE | DEPARTMENT | MANAGER_ID |
This `MANAGER_ID` points to another `EMPLOYEE` record. If you want to bring in Daniel and his superior's information, this isn't hard to do with a self join.
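
As a minimal sketch of that one-level lookup, the query below joins `EMPLOYEE` to itself: alias `e` is the employee and alias `m` is the manager record that `e.MANAGER_ID` points to. It assumes Daniel's `ID` is 21 (the value used as the seed later in this section) and that the columns match the header shown above; the aliases and output column names are only illustrative.

```sql
-- One-level self join: an employee alongside their direct manager.
-- Assumes Daniel's ID is 21 and the EMPLOYEE columns listed above.
SELECT e.ID,
       e.FIRST_NAME,
       e.LAST_NAME,
       e.TITLE,
       m.FIRST_NAME AS MANAGER_FIRST_NAME,
       m.LAST_NAME  AS MANAGER_LAST_NAME,
       m.TITLE      AS MANAGER_TITLE
FROM EMPLOYEE e
LEFT JOIN EMPLOYEE m   -- LEFT JOIN keeps employees who have no manager
    ON e.MANAGER_ID = m.ID
WHERE e.ID = 21;
```

Each extra level of the hierarchy would need one more join of this kind.
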

But what if you wanted to display the entire hierarchy above Daniel? Well shoot, this is hard, because now you have to chain several self joins to work your way to the top. What makes it even harder is that you don't know in advance how many self joins you will need. For cases like this, it can be helpful to leverage recursive queries.

A recursive query is a special type of common table expression (CTE). Typically, you "seed" a starting value and then use `UNION` or `UNION ALL` to append the results of a query that reads from each "seed". Those results then become the next seed, and the process repeats until the query returns no more rows.

In this case, we will use a `RECURSIVE` common table expression to seed Daniel's ID, and then append the `MANAGER_ID` of each `EMPLOYEE` record whose `ID` matches the seed. This will give a set of IDs for the employees in Daniel's hierarchy. We can then use these IDs to navigate Daniel's hierarchy via `JOIN`, `IN`, or other SQL operators.

```sql
-- Generates a list of employee IDs in the hierarchy above Daniel

WITH RECURSIVE hierarchy_of_daniel(x) AS (
    SELECT 21 -- start with Daniel's ID
    UNION ALL -- append each manager ID recursively
    SELECT MANAGER_ID
    FROM hierarchy_of_daniel
    INNER JOIN EMPLOYEE
        ON EMPLOYEE.ID = hierarchy_of_daniel.x -- employee ID must equal the value from the previous recursion
)

SELECT * FROM EMPLOYEE
WHERE ID IN (SELECT x FROM hierarchy_of_daniel);
```
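
To see how each result becomes the next seed, it can help to carry a counter along with the ID. The sketch below is one way to do that in SQLite, not the only one. The `EMPLOYEE_DEMO` table, its three rows, the `chain` CTE, and the `depth` column are all hypothetical, made up so the example runs without the class database.

```sql
-- Hypothetical sample rows, only so the sketch runs on its own.
CREATE TEMP TABLE EMPLOYEE_DEMO (
    ID         INTEGER PRIMARY KEY,
    FIRST_NAME TEXT,
    MANAGER_ID INTEGER   -- NULL for the person at the top
);

INSERT INTO EMPLOYEE_DEMO (ID, FIRST_NAME, MANAGER_ID) VALUES
    (1, 'Top',    NULL),
    (2, 'Middle', 1),
    (3, 'Bottom', 2);

-- depth 0 is the starting employee, 1 is their manager,
-- 2 is the manager's manager, and so on.
WITH RECURSIVE chain(ID, depth) AS (
    SELECT 3, 0                      -- seed: start from employee 3
    UNION ALL
    SELECT e.MANAGER_ID, chain.depth + 1
    FROM chain
    INNER JOIN EMPLOYEE_DEMO e ON e.ID = chain.ID
    WHERE e.MANAGER_ID IS NOT NULL   -- stop at the top instead of appending NULL
)
SELECT chain.depth, e.ID, e.FIRST_NAME
FROM chain
INNER JOIN EMPLOYEE_DEMO e ON e.ID = chain.ID
ORDER BY chain.depth;
```

With these three rows the query walks from employee 3 up to employee 1, one level per recursion step, and ends when it reaches the row whose `MANAGER_ID` is `NULL`.
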

Recursive queries are a bit tricky to get right, but they are worth practicing if you have tables structured like this. Note that they can also be used to improvise a set of consecutive values without creating a table. For instance, here is how you generate the integers from 1 to 1000:

```sql
WITH RECURSIVE my_integers(x) AS (
    SELECT 1
    UNION ALL
    SELECT x + 1
    FROM my_integers
    WHERE x < 1000
)
SELECT * FROM my_integers;
```

You can apply the same concept to generate a set of chronological dates. This recursive query will generate all dates from today to '2030-12-31':

```sql
WITH RECURSIVE my_dates(x) AS (
    SELECT date('now')
    UNION ALL
    SELECT date(x, '+1 day')
    FROM my_dates
    WHERE x < '2030-12-31'
)
SELECT * FROM my_dates;
```

## 4.6 Cross Joins

Sometimes it can be helpful to generate a "Cartesian product", or every possible combination between two or more data sets, using a `CROSS JOIN`. This is often done to generate a data set that fills in gaps for another query. Not every calendar date has orders, nor does every order date have an entry for every product, as shown in this query: