Window functions (also known as analytic functions) are a powerful SQL feature – and a favorite topic in many technical interviews, especially for data roles. Why? Because they enable you to perform calculations across rows of a result set without collapsing them into groups. This allows for more advanced analytics like running totals, rankings, moving averages, and comparing row values with others. Interviewers might ask about window functions to see if you can handle these advanced SQL concepts.
Let’s break down what you need to know, with common questions and examples focused on window functions:
A window function performs a calculation across a set of table rows that are somehow related to the current row. Unlike GROUP BY aggregates, window functions do not merge the rows into a single output – they produce a value for each original row, possibly based on a group of rows (the “window” of rows) around itsimplilearn.com.
In simpler terms, a window function can give you aggregated or ranked information while still retaining the detail of each row. For example, you can use a window function to add a column that shows the average value of all rows up to the current one, or the rank of the current row compared to others, and so on.
Key components of window functions:
Common window functions include:
Why interviewers ask: Window functions are considered an advanced SQL topic. By asking about them, interviewers identify candidates who can perform complex analytics in SQL – something particularly important for data analyst and data engineer roles. It’s also a way to gauge your familiarity with SQL beyond basic SELECTs. They might ask conceptual questions (“What’s the difference between RANK and DENSE_RANK?”) or give you a problem best solved with a window function (“Find the running total of sales by date”).
1. Ranking and ROW_NUMBER(): “How would you retrieve the top 3 salespeople by sales amount in each region from a sales table?”
This question is about ranking within partitions (each region) and then filtering by rank. You’d use ROW_NUMBER()
(or RANK/DENSE_RANK) in a subquery or CTE, then filter.
Solution outline: Use a window function partitioned by region, ordered by sales desc to assign a rank, then select rank 1–3.
WITH SalesRank AS (
SELECT
salesperson, region, total_sales,
RANK() OVER (PARTITION BY region ORDER BY total_sales DESC) AS sales_rank
FROM SalesSummary
)
SELECT region, salesperson, total_sales
FROM SalesRank
WHERE sales_rank <= 3;
This will give the top 3 salespeople in each region by total sales. We used RANK()
abovedatalemur.com – note: if there’s a tie, RANK will give ties the same rank and might yield more than 3 per region (skipping to 4th rank appropriately). If the interviewer expects exactly 3 per region even with ties, they might prefer ROW_NUMBER()
(which breaks ties arbitrarily) or DENSE_RANK()
(which like RANK, but without gaps). It’s a great chance to clarify requirements and show you know the differencesdatalemur.com. You could explain: “If ties are not an issue or we want exactly 3, I’d use ROW_NUMBER(). If we consider ties and don’t want to break them arbitrarily, RANK() or DENSE_RANK() could be used – but then we might end up with more than 3 if there’s a tie at 3rd place.”
Follow-up: “What’s the difference between RANK() and DENSE_RANK()?” This is frequently asked. Answer: RANK() will skip a rank if there’s a tie (e.g., 1, 2, 2, 4...), whereas DENSE_RANK() does not skip (1, 2, 2, 3...)datalemur.com. Provide a quick example to demonstrate you truly get it (e.g., “If two employees tie for highest salary, RANK gives them both rank 1 and the next gets rank 3; DENSE_RANK would give the next rank 2”).
2. Running total / cumulative sum: “Given a table of daily sales, how would you compute a running total (cumulative sales) by date?”
This is a classic use of a window with ORDER BY
and a frame. A solution using SUM as a window function:
SELECT
sale_date,
SUM(amount) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_sales
FROM DailySales
ORDER BY sale_date;
This assumes one row per day in DailySales
(if multiple sales per day, you might need PARTITION BY sale_date to sum that day, or better, aggregate first then do running sum). The frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
means “from the start up to the current row” which yields a running totalcodesignal.comcodesignal.com. Many databases default to this frame for cumulative sums if ORDER BY is given and no frame specified, so you might omit it depending on SQL dialect (e.g., in SQL Server or Postgres, SUM(amount) OVER (ORDER BY sale_date)
is enough for a running total).
Interviewers ask this to see if you know window frames. If you forget frame syntax, you can conceptually say “sum from the beginning to current row” – they’ll likely accept that phrasing, but it’s even better if you know the exact syntax.
3. Using LAG/LEAD: “How can you compare each row to the previous row in SQL – for example, to find month-over-month growth in sales?”
Here they’re looking for the LAG() function (or LEAD for next row). LAG allows you to access the prior row’s value in the current row’s context, given an ordering. Example solution:
SELECT
month,
sales,
LAG(sales, 1) OVER (ORDER BY month) AS prev_month_sales,
(sales - LAG(sales, 1) OVER (ORDER BY month)) / NULLIF(LAG(sales, 1) OVER (ORDER BY month), 0)
AS sales_growth_rate
FROM MonthlySales;
We partition by nothing (assuming the table is for one product/overall; if it was by product, we’d PARTITION BY product). The LAG(sales, 1)
gives last month’s salesalmabetter.com for the current month row. We then compute growth rate. (Used NULLIF
to avoid division by zero, and it will yield NULL growth if previous is 0).
Interviewer might then ask: “What is the difference between LAG and LEAD?” – Easy: LEAD gives the next row’s value instead of previous. Also mention both can take an offset (lag 2 = two rows behind) and a default value if there is no previous/next (for first or last row edge cases).
4. FIRST_VALUE()/LAST_VALUE(): “In each group of records, how do you retrieve the first and last entry according to a certain ordering?”
This tests understanding of window frames. Example: “List each employee and the date of their first and last order.” Using FIRST_VALUE and LAST_VALUE over partition by employee, ordered by order_date:
SELECT
employee_id,
FIRST_VALUE(order_date) OVER (
PARTITION BY employee_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS first_order_date,
LAST_VALUE(order_date) OVER (
PARTITION BY employee_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_order_date
FROM Orders;
Why the weird frame? By default, without specifying, FIRST_VALUE
would give the first value from start to current row (so actually it would act like a running minimum), and LAST_VALUE
would give last value from start to current row (basically just the current row in default frame!). To get the absolute first or last within the partition, you specify the frame as unbounded preceding to unbounded following (the whole partition)codesignal.comcodesignal.com. This ensures every row for that employee shows the same first and last date (the true first and last). Alternatively, one could use a subquery or combination of MIN()/MAX() since first/last in a group are essentially min/max by date. But the interviewer probably wants to see if you know