-
Notifications
You must be signed in to change notification settings - Fork 1
SQL Windows Functions QNA
Ans:
1. Rank Employees by Sales within Each Department:
- Question: Rank employees based on their sales performance within each department.
-
Query:
SELECT employee_id, employee_name, department_id, sales, RANK() OVER (PARTITION BY department_id ORDER BY sales DESC) AS sales_rank FROM employees;
2. Calculate Running Total of Sales:
- Question: Calculate the running total of sales over time.
-
Query:
SELECT sale_date, sales, SUM(sales) OVER (ORDER BY sale_date) AS running_total FROM sales;
3. Determine Employees with Above-Average Salary:
- Question: Identify employees whose salary is above the average salary.
- Query:
WITH SalaryStats AS (
SELECT
employee_id,
employee_name,
salary,
AVG(salary) OVER () AS avg_salary
FROM employees
)
SELECT
employee_id,
employee_name,
salary
FROM SalaryStats
WHERE salary > avg_salary;
Note: Using AVG(salary) without an OVER() clause would compute one single aggregated value for all rows, which means you would lose the individual employee details. The window function (AVG(salary) OVER ()) calculates the overall average but still retains each row’s data, allowing you to compare each employee’s salary against that overall average.
Alternative simpler query:
SELECT
employee_id,
employee_name,
salary
FROM employees
WHERE salary > (SELECT avg(salary) from employees)
** Q3.1: What's the difference between an empty OVER() clause and specifying a column with ORDER BY inside the OVER() clause in the window function?
Let's consider a simple table named employees
with the following data:
employee_id | salary |
---|---|
1 | 100 |
2 | 200 |
3 | 300 |
When you use the aggregate function with an empty OVER()
clause, it computes the aggregate over all rows for each row. For example:
SELECT
employee_id,
salary,
AVG(salary) OVER () AS overall_avg
FROM employees;
Result:
employee_id | salary | overall_avg |
---|---|---|
1 | 100 | 200 |
2 | 200 | 200 |
3 | 300 | 200 |
Explanation:
- The overall average salary is calculated as (100 + 200 + 300) / 3 = 200.
- This same value (200) is shown for every row.
When you add an ORDER BY
clause without specifying a window frame, the default frame is from the start of the partition to the current row. This creates a running or cumulative average:
SELECT
employee_id,
salary,
AVG(salary) OVER (ORDER BY employee_id) AS running_avg
FROM employees;
Result:
employee_id | salary | running_avg |
---|---|---|
1 | 100 | 100 |
2 | 200 | 150 |
3 | 300 | 200 |
Explanation:
- For employee 1: The running average is just 100 (only one row: 100).
- For employee 2: The running average is (100 + 200) / 2 = 150.
- For employee 3: The running average is (100 + 200 + 300) / 3 = 200.
-
Empty
OVER()
Clause:- Calculates the aggregate (average) across the entire dataset for each row.
- Every row displays the same overall average (in this case, 200).
-
OVER (ORDER BY ...)
Clause:- Computes a cumulative (running) aggregate by default, from the beginning of the ordered set up to the current row.
- The average can change with each row as more rows are included in the calculation.
If you want to include an ORDER BY
for some specific ordering but still calculate the overall average for every row, you need to explicitly define the frame like this:
SELECT
employee_id,
salary,
AVG(salary) OVER (
ORDER BY employee_id
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS overall_avg
FROM employees;
This will force the calculation to consider all rows, yielding the overall average (200) for every row regardless of the ordering.
** Q3.2: What will be the impact of adding PARTITION BY inside OVER Clause for above query or in general?
Consider a table with unique employee_id
values (i.e., employee_id
is the primary key):
employee_id | salary |
---|---|
1 | 100 |
2 | 200 |
3 | 300 |
SELECT
employee_id,
salary,
AVG(salary) OVER (ORDER BY employee_id) AS running_avg
FROM employees;
How It Works:
- Running Average: Since there's no partitioning, the window function computes a cumulative average over the entire ordered result set.
Result:
employee_id | salary | running_avg |
---|---|---|
1 | 100 | 100 |
2 | 200 | 150 |
3 | 300 | 200 |
SELECT
employee_id,
salary,
AVG(salary) OVER (PARTITION BY employee_id ORDER BY employee_id) AS running_avg
FROM employees;
How It Works:
-
Partitioning Effect: Because
employee_id
is unique, each partition contains exactly one row. - Running Average: In each partition, the average is computed over that single row. Thus, the running average equals the salary itself.
Result:
employee_id | salary | running_avg |
---|---|---|
1 | 100 | 100 |
2 | 200 | 200 |
3 | 300 | 300 |
-
Without
PARTITION BY
:- The running average is cumulative over the entire dataset.
- The average grows as more rows are included.
-
With
PARTITION BY employee_id
:- Each row is in its own partition (since
employee_id
is unique). - The running average for each row is simply the row’s salary.
- Each row is in its own partition (since
This demonstrates that when employee_id
is a primary key, adding PARTITION BY employee_id
effectively resets the window for each row, leading to a running average equal to the individual salary rather than a cumulative average.
Question 4:Find the percentage of total revenue contributed by each product.
SELECT
product_id,
product_name,
revenue,
revenue / SUM(revenue) OVER () * 100 AS revenue_percentage
FROM products;
Method 2: Alternatively, you can use a correlated subquery directly in the SELECT list:
SELECT
product_id,
product_name,
revenue,
revenue / (SELECT SUM(revenue) FROM products) * 100 AS revenue_percentage
FROM products;
Question 5: Retrieve employees with the highest salary within each department.
SELECT
employee_id,
employee_name,
department_id,
salary,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS salary_rank
FROM employees
WHERE salary_rank = 1;
Question 6: Calculate the 3-month moving average of sales for each month, assuming sale_date has sales on month level.
SELECT
sale_date,
sales,
AVG(sales) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_sales
FROM sales;
Question 7: Retrieve the top N products based on their sales.
SELECT
product_id,
product_name,
sales,
DENSE_RANK() OVER (ORDER BY sales DESC) AS sales_rank
FROM products
WHERE sales_rank <= 5;
Question 8: Determine the percentile rank of each employee's salary.**
SELECT
employee_id,
employee_name,
salary,
PERCENT_RANK() OVER (ORDER BY salary) AS salary_percentile
FROM employees;
Question 9: Find employees with the lowest sales in each department.**
SELECT
employee_id,
employee_name,
department_id,
sales,
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY sales ASC) AS sales_rank
FROM employees
WHERE sales_rank = 1;
Question 10: Calculate the 3-month moving sum of revenue for each product.**
SELECT
product_id,
product_name,
revenue,
SUM(revenue) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_sum_revenue
FROM products;
These questions and queries demonstrate the versatility of window functions in SQL, covering scenarios such as ranking, running totals, percentages, moving averages, and more. Adjust the queries based on your specific database schema and requirements.