Skip to content

SQL Windows Functions QNA

Praveen Kumar Anwla edited this page Feb 13, 2025 · 11 revisions

Q1: Top 5 real-time questions along with their SQL queries for Windows functions in SQL.

Ans:

1. Rank Employees by Sales within Each Department:

  • Question: Rank employees based on their sales performance within each department.
  • Query:
    SELECT
        employee_id,
        employee_name,
        department_id,
        sales,
        RANK() OVER (PARTITION BY department_id ORDER BY sales DESC) AS sales_rank
    FROM employees;

2. Calculate Running Total of Sales:

  • Question: Calculate the running total of sales over time.
  • Query:
    SELECT
        sale_date,
        sales,
        SUM(sales) OVER (ORDER BY sale_date) AS running_total
    FROM sales;

3. Determine Employees with Above-Average Salary:

  • Question: Identify employees whose salary is above the average salary.
  • Query:
WITH SalaryStats AS (
    SELECT
        employee_id,
        employee_name,
        salary,
        AVG(salary) OVER () AS avg_salary
    FROM employees
)
SELECT
    employee_id,
    employee_name,
    salary
FROM SalaryStats
WHERE salary > avg_salary;

Note: Using AVG(salary) without an OVER() clause would compute one single aggregated value for all rows, which means you would lose the individual employee details. The window function (AVG(salary) OVER ()) calculates the overall average but still retains each row’s data, allowing you to compare each employee’s salary against that overall average.

Alternative simpler query:

SELECT
    employee_id,
    employee_name,
    salary
FROM employees
WHERE salary > (SELECT avg(salary) from employees)

** Q3.1: What's the difference between an empty OVER() clause and specifying a column with ORDER BY inside the OVER() clause in the window function?

Understanding Window Functions: Overall vs. Running Average

Let's consider a simple table named employees with the following data:

employee_id salary
1 100
2 200
3 300

Example 1: Overall Average (Empty OVER() Clause)

When you use the aggregate function with an empty OVER() clause, it computes the aggregate over all rows for each row. For example:

SELECT
    employee_id,
    salary,
    AVG(salary) OVER () AS overall_avg
FROM employees;

Result:

employee_id salary overall_avg
1 100 200
2 200 200
3 300 200

Explanation:

  • The overall average salary is calculated as (100 + 200 + 300) / 3 = 200.
  • This same value (200) is shown for every row.

Example 2: Running (Cumulative) Average (With ORDER BY in the OVER Clause)

When you add an ORDER BY clause without specifying a window frame, the default frame is from the start of the partition to the current row. This creates a running or cumulative average:

SELECT
    employee_id,
    salary,
    AVG(salary) OVER (ORDER BY employee_id) AS running_avg
FROM employees;

Result:

employee_id salary running_avg
1 100 100
2 200 150
3 300 200

Explanation:

  • For employee 1: The running average is just 100 (only one row: 100).
  • For employee 2: The running average is (100 + 200) / 2 = 150.
  • For employee 3: The running average is (100 + 200 + 300) / 3 = 200.

Summary

  • Empty OVER() Clause:

    • Calculates the aggregate (average) across the entire dataset for each row.
    • Every row displays the same overall average (in this case, 200).
  • OVER (ORDER BY ...) Clause:

    • Computes a cumulative (running) aggregate by default, from the beginning of the ordered set up to the current row.
    • The average can change with each row as more rows are included in the calculation.

If you want to include an ORDER BY for some specific ordering but still calculate the overall average for every row, you need to explicitly define the frame like this:

SELECT
    employee_id,
    salary,
    AVG(salary) OVER (
        ORDER BY employee_id
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS overall_avg
FROM employees;

This will force the calculation to consider all rows, yielding the overall average (200) for every row regardless of the ordering.

** Q3.2: What will be the impact of adding PARTITION BY inside OVER Clause for above query or in general?

Impact of Adding PARTITION BY When employee_id Is a Primary Key

Consider a table with unique employee_id values (i.e., employee_id is the primary key):

employee_id salary
1 100
2 200
3 300

Query Without PARTITION BY

SELECT 
    employee_id,
    salary,
    AVG(salary) OVER (ORDER BY employee_id) AS running_avg
FROM employees;

How It Works:

  • Running Average: Since there's no partitioning, the window function computes a cumulative average over the entire ordered result set.

Result:

employee_id salary running_avg
1 100 100
2 200 150
3 300 200

Query With PARTITION BY employee_id

SELECT 
    employee_id,
    salary,
    AVG(salary) OVER (PARTITION BY employee_id ORDER BY employee_id) AS running_avg
FROM employees;

How It Works:

  • Partitioning Effect: Because employee_id is unique, each partition contains exactly one row.
  • Running Average: In each partition, the average is computed over that single row. Thus, the running average equals the salary itself.

Result:

employee_id salary running_avg
1 100 100
2 200 200
3 300 300

Summary

  • Without PARTITION BY:

    • The running average is cumulative over the entire dataset.
    • The average grows as more rows are included.
  • With PARTITION BY employee_id:

    • Each row is in its own partition (since employee_id is unique).
    • The running average for each row is simply the row’s salary.

This demonstrates that when employee_id is a primary key, adding PARTITION BY employee_id effectively resets the window for each row, leading to a running average equal to the individual salary rather than a cumulative average.

Question 4:Find the percentage of total revenue contributed by each product.

     SELECT
         product_id,
         product_name,
         revenue,
         revenue / SUM(revenue) OVER () * 100 AS revenue_percentage
     FROM products;

Method 2: Alternatively, you can use a correlated subquery directly in the SELECT list:

SELECT
    product_id,
    product_name,
    revenue,
    revenue / (SELECT SUM(revenue) FROM products) * 100 AS revenue_percentage
FROM products;

Question 5: Retrieve employees with the highest salary within each department.

     SELECT
         employee_id,
         employee_name,
         department_id,
         salary,
         ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS salary_rank
     FROM employees
     WHERE salary_rank = 1;

Question 6: Calculate the 3-month moving average of sales for each month, assuming sale_date has sales on month level.

     SELECT
         sale_date,
         sales,
         AVG(sales) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_sales
     FROM sales;

Question 7: Retrieve the top N products based on their sales.

     SELECT
         product_id,
         product_name,
         sales,
         DENSE_RANK() OVER (ORDER BY sales DESC) AS sales_rank
     FROM products
     WHERE sales_rank <= 5;

Question 8: Determine the percentile rank of each employee's salary.**

     SELECT
         employee_id,
         employee_name,
         salary,
         PERCENT_RANK() OVER (ORDER BY salary) AS salary_percentile
     FROM employees;

Question 9: Find employees with the lowest sales in each department.**

     SELECT
         employee_id,
         employee_name,
         department_id,
         sales,
         DENSE_RANK() OVER (PARTITION BY department_id ORDER BY sales ASC) AS sales_rank
     FROM employees
     WHERE sales_rank = 1;

Question 10: Calculate the 3-month moving sum of revenue for each product.**

     SELECT
         product_id,
         product_name,
         revenue,
         SUM(revenue) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_sum_revenue
     FROM products;

These questions and queries demonstrate the versatility of window functions in SQL, covering scenarios such as ranking, running totals, percentages, moving averages, and more. Adjust the queries based on your specific database schema and requirements.

Clone this wiki locally