Episode - SQL Deep Dive — Everything Interviews Ask You

Shreyesh kolhe

June 23, 2026 June 23, 2026

Episode - SQL Deep Dive — Everything Interviews Ask You

Hey everyone! Welcome back. Today we go deep into SQL — the way interviewers actually test it. Not just syntax, but HOW things work under the hood, edge cases, and the tricky questions that separate beginners from seniors!

What we will cover:

SQL Basics Refresher — Tables, Keys, Constraints
All Types of JOINs — With Visual Diagrams
Indexing — What it is, How it Works, When to Use
WHERE vs HAVING — The Most Confused Pair
UNION vs UNION ALL
Subqueries and Correlated Subqueries
Window Functions — ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD
GROUP BY Deep Dive
Transactions and ACID Properties
Normalization (1NF, 2NF, 3NF)
Stored Procedures vs Functions
Interview Questions

Sample Tables — Used Throughout This Blog

We'll use these tables for all examples:

── employees ──────────────────────────────────────────
 emp_id | name       | dept_id | salary | manager_id
────────────────────────────────────────────────────────
  1     | Shreyesh   |   10    | 75000  |    NULL
  2     | Arjun      |   10    | 60000  |    1
  3     | Priya      |   20    | 80000  |    1
  4     | Zara       |   20    | 55000  |    3
  5     | Rahul      |   30    | 90000  |    NULL
  6     | Nisha      |   NULL  | 45000  |    5

── departments ────────────────────────────────────────
 dept_id | dept_name  | location
─────────────────────────────────
   10    | Engineering | Bangalore
   20    | Marketing   | Mumbai
   30    | Finance     | Delhi
   40    | HR          | Chennai   ← No employees here!

── orders ─────────────────────────────────────────────
 order_id | emp_id | amount  | order_date
──────────────────────────────────────────
    1     |   1    | 5000.00 | 2024-01-15
    2     |   2    | 3200.00 | 2024-01-20
    3     |   1    | 7500.00 | 2024-02-10
    4     |   3    | 1200.00 | 2024-02-15
    5     |   7    | 9900.00 | 2024-03-01   ← emp_id 7 doesn't exist!

1. All Types of JOINs

A JOIN combines rows from two or more tables based on a related column.

VISUAL DIAGRAM — All JOINs:
=============================

   Table A           Table B
  ┌───────┐         ┌───────┐
  │ A ∩ B │         │ A ∩ B │
  │  only │ A ∩ B   │  only │
  └───────┘         └───────┘

INNER JOIN     → only matching rows (A ∩ B)
LEFT JOIN      → all of A + matching B (NULL if no match)
RIGHT JOIN     → all of B + matching A (NULL if no match)
FULL OUTER JOIN → all of A + all of B (NULL where no match)
CROSS JOIN     → every row of A × every row of B (cartesian product)
SELF JOIN      → table joined with itself

INNER JOIN

Returns ONLY rows where there is a match in BOTH tables.
Rows with no match on either side are EXCLUDED.

SELECT e.name, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;

RESULT:
 name      | dept_name
────────────────────────
 Shreyesh  | Engineering
 Arjun     | Engineering
 Priya     | Marketing
 Zara      | Marketing
 Rahul     | Finance

Nisha is EXCLUDED → dept_id is NULL, no match in departments
HR department is EXCLUDED → no employees in dept 40

LEFT JOIN (LEFT OUTER JOIN)

Returns ALL rows from the LEFT table.
If no match in right table → NULLs for right table columns.

SELECT e.name, d.dept_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id;

RESULT:
 name      | dept_name
────────────────────────
 Shreyesh  | Engineering
 Arjun     | Engineering
 Priya     | Marketing
 Zara      | Marketing
 Rahul     | Finance
 Nisha     | NULL          ← Nisha included! dept_name is NULL

Nisha IS included (left table = employees)
HR department still NOT shown (right table = departments)

RIGHT JOIN (RIGHT OUTER JOIN)

Returns ALL rows from the RIGHT table.
If no match in left table → NULLs for left table columns.

SELECT e.name, d.dept_name
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.dept_id;

RESULT:
 name      | dept_name
────────────────────────
 Shreyesh  | Engineering
 Arjun     | Engineering
 Priya     | Marketing
 Zara      | Marketing
 Rahul     | Finance
 NULL      | HR            ← HR included! name is NULL

HR IS included (right table = departments)
Nisha NOT shown (left table = employees, no dept match)

NOTE: RIGHT JOIN is rarely used — you can always rewrite it
      as a LEFT JOIN by swapping table order!

-- These are equivalent:
SELECT * FROM A RIGHT JOIN B ON A.id = B.id;
SELECT * FROM B LEFT JOIN A ON A.id = B.id;

FULL OUTER JOIN

Returns ALL rows from BOTH tables.
NULL where there is no match on either side.

SELECT e.name, d.dept_name
FROM employees e
FULL OUTER JOIN departments d ON e.dept_id = d.dept_id;

RESULT:
 name      | dept_name
────────────────────────
 Shreyesh  | Engineering
 Arjun     | Engineering
 Priya     | Marketing
 Zara      | Marketing
 Rahul     | Finance
 Nisha     | NULL          ← No dept match
 NULL      | HR            ← No employee match

-- MySQL doesn't support FULL OUTER JOIN directly!
-- Emulate it with UNION:
SELECT e.name, d.dept_name FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id
UNION
SELECT e.name, d.dept_name FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.dept_id;

CROSS JOIN

Returns every combination of rows from both tables.
Also called Cartesian Product.
Result rows = (rows in A) × (rows in B)

SELECT e.name, d.dept_name
FROM employees e
CROSS JOIN departments d;

RESULT: 6 employees × 4 departments = 24 rows!
(Every employee paired with every department)

 name      | dept_name
────────────────────────
 Shreyesh  | Engineering
 Shreyesh  | Marketing
 Shreyesh  | Finance
 Shreyesh  | HR
 Arjun     | Engineering
 Arjun     | Marketing
 ... (24 rows total)

Use cases:
- Generating test data combinations
- Generating a calendar (cross join days × months)
- Seat assignment (rows × columns in cinema)

SELF JOIN

A table joined with ITSELF.
Use case: hierarchical data (employee-manager), org charts.

-- Find each employee and their manager's name
SELECT
    e.name       AS employee,
    m.name       AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id;

RESULT:
 employee  | manager
──────────────────────
 Shreyesh  | NULL      ← Top-level, no manager
 Arjun     | Shreyesh
 Priya     | Shreyesh
 Zara      | Priya
 Rahul     | NULL      ← Top-level
 Nisha     | Rahul

HOW: We treat 'employees' as TWO tables — 'e' (employee) and 'm' (manager)
     We join e.manager_id = m.emp_id to find who the manager is

INTERVIEW TRICK — Find Rows with No Match (Anti-Join)

-- Find departments with NO employees
SELECT d.dept_name
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id
WHERE e.emp_id IS NULL;

RESULT:
 dept_name
───────────
 HR

-- Alternative with NOT IN:
SELECT dept_name FROM departments
WHERE dept_id NOT IN (
    SELECT DISTINCT dept_id FROM employees WHERE dept_id IS NOT NULL
);

-- Alternative with NOT EXISTS:
SELECT d.dept_name FROM departments d
WHERE NOT EXISTS (
    SELECT 1 FROM employees e WHERE e.dept_id = d.dept_id
);

All three give the same result.
LEFT JOIN + IS NULL is usually fastest!

2. Indexing — How Databases Go From Slow to Lightning Fast

An index is a data structure (usually a B-Tree) that helps the database find rows without scanning every row.

Without Index — Full Table Scan:
==================================

SELECT * FROM employees WHERE name = "Priya";

Database reads EVERY row → checks if name = "Priya"
For 1 million rows → checks 1 million rows!
Time: O(n) — linear scan

This is called a FULL TABLE SCAN → SLOW!

With Index — B-Tree Lookup:
==============================

CREATE INDEX idx_name ON employees(name);

Now the database has a sorted B-Tree of names.
It jumps directly to "Priya" using binary search.
Time: O(log n) — from 1 million → only 20 comparisons!

B-Tree structure for names:
          [Nisha]
         /       \
    [Arjun]   [Rahul]
    /     \      \
[Arjun] [Nisha] [Shreyesh]

Types of Indexes:
==================

1. PRIMARY KEY INDEX (Clustered Index)
   - Automatically created on PRIMARY KEY
   - Data rows are PHYSICALLY sorted by this key
   - Only ONE per table (data can only be sorted one way)
   - Fastest lookup — data and index are together

2. UNIQUE INDEX
   - Ensures all values in column are unique
   - Auto-created on UNIQUE constraint columns
   CREATE UNIQUE INDEX idx_email ON employees(email);

3. COMPOSITE INDEX (Multi-column Index)
   - Index on multiple columns together
   CREATE INDEX idx_dept_salary ON employees(dept_id, salary);
   -- Works for: WHERE dept_id = 10
   -- Works for: WHERE dept_id = 10 AND salary > 50000
   -- DOES NOT help: WHERE salary > 50000 (dept_id must come first!)
   -- This is the "leftmost prefix rule"!

4. FULL-TEXT INDEX
   - For searching text content (LIKE "%word%" is slow)
   CREATE FULLTEXT INDEX idx_bio ON articles(content);
   SELECT * FROM articles WHERE MATCH(content) AGAINST("JavaScript");

5. PARTIAL INDEX (PostgreSQL/SQLite)
   - Index only a subset of rows (saves space!)
   CREATE INDEX idx_active_users ON users(email) WHERE is_active = true;

When Indexes HELP vs HURT:
============================

✅ INDEXES HELP:
- High cardinality columns (many unique values): email, emp_id, phone
- Columns in WHERE clause: WHERE salary > 50000
- Columns in JOIN conditions: ON a.dept_id = b.dept_id
- Columns in ORDER BY (avoids sorting)
- Columns in GROUP BY

❌ INDEXES HURT (or don't help):
- Low cardinality columns: is_active (only true/false) → not useful!
- Small tables: index overhead not worth it
- Columns with many NULLs
- Frequently updated columns: index must be updated on every INSERT/UPDATE/DELETE
- Using functions on indexed column in WHERE:
    WHERE YEAR(hire_date) = 2024   ← index NOT used!
    WHERE hire_date >= '2024-01-01' ← index IS used ✅

INDEXING RULES:
- Don't index every column — each index costs write performance
- Index columns you frequently filter, join, or sort on
- Use EXPLAIN to see if your query uses an index!

EXPLAIN — See Your Query Execution Plan:
==========================================

EXPLAIN SELECT * FROM employees WHERE name = "Priya";

Output without index:
  type: ALL    ← Full Table Scan! BAD!
  rows: 6      ← scanned all 6 rows

After: CREATE INDEX idx_name ON employees(name);

Output with index:
  type: ref    ← Index lookup! GOOD!
  rows: 1      ← found directly!
  key: idx_name ← which index was used

EXPLAIN keywords to know:
  ALL   → full table scan → add an index!
  index → full index scan → still slow for large indexes
  range → index range scan → good for BETWEEN, >, <
  ref   → non-unique index lookup → good
  const → primary key lookup → fastest!

3. WHERE vs HAVING — The Most Confused Pair in SQL!

Simple Rule:
=============

WHERE   → filters ROWS before grouping
HAVING  → filters GROUPS after grouping

WHERE  works on individual ROW data
HAVING works on AGGREGATED data (SUM, COUNT, AVG, MAX, MIN)

-- WHERE: Filter rows BEFORE GROUP BY
-- "Show me the average salary per department, but only for Engineering"

SELECT dept_id, AVG(salary) AS avg_salary
FROM employees
WHERE dept_id = 10           ← filters ROWS first (only dept 10 rows kept)
GROUP BY dept_id;

STEP BY STEP:
1. WHERE filters: only rows where dept_id = 10 remain
2. GROUP BY groups those rows by dept_id
3. AVG(salary) calculated on the filtered rows

RESULT:
 dept_id | avg_salary
──────────────────────
   10    | 67500.00

-- HAVING: Filter GROUPS after GROUP BY
-- "Show me departments where average salary > 60000"

SELECT dept_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id
HAVING AVG(salary) > 60000;  ← filters GROUPS after aggregation

STEP BY STEP:
1. GROUP BY groups all employees by dept_id
2. AVG(salary) calculated per group
3. HAVING filters: only groups where avg_salary > 60000 survive

RESULT:
 dept_id | avg_salary
──────────────────────
   10    | 67500.00    ← 67500 > 60000 ✅
   20    | 67500.00    ← 67500 > 60000 ✅
   30    | 90000.00    ← 90000 > 60000 ✅

-- Can you use WHERE here instead?
-- ❌ NO! This is WRONG:
SELECT dept_id, AVG(salary) AS avg_salary
FROM employees
WHERE AVG(salary) > 60000    ← ERROR! Can't use aggregate in WHERE!
GROUP BY dept_id;

-- WHY? WHERE runs BEFORE aggregation.
-- AVG(salary) doesn't exist yet when WHERE runs!

-- USING BOTH WHERE and HAVING together:
-- "Among employees with salary > 40000,
--  show departments where average salary > 60000"

SELECT dept_id, AVG(salary) AS avg_salary, COUNT(*) AS emp_count
FROM employees
WHERE salary > 40000             ← 1. Filter rows: exclude salary ≤ 40000
GROUP BY dept_id                 ← 2. Group remaining rows
HAVING AVG(salary) > 60000       ← 3. Filter groups by average
ORDER BY avg_salary DESC;        ← 4. Sort result

EXECUTION ORDER (very important for interviews!):
1. FROM         → get the table
2. WHERE        → filter rows
3. GROUP BY     → group filtered rows
4. HAVING       → filter groups
5. SELECT       → compute output columns
6. ORDER BY     → sort
7. LIMIT        → return top N

Memorize: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT

Feature Comparison:
====================

Feature         WHERE                   HAVING
─────────────────────────────────────────────────────────────
Acts on         Individual rows         Groups (after GROUP BY)
When it runs    BEFORE GROUP BY         AFTER GROUP BY
Can use         Column values           Column values + Aggregates
Aggregates      ❌ NO (SUM, COUNT etc)  ✅ YES (SUM, COUNT, AVG...)
Index usage     ✅ Uses indexes         ❌ Can't use indexes (groups)
Performance     Faster (fewer rows)     Slower (filters after grouping)

TRICK: If you CAN use WHERE instead of HAVING, always use WHERE!
It filters rows early → less data for GROUP BY to process → FASTER!

4. UNION vs UNION ALL

Both combine results of two SELECT statements vertically (stacking rows).

UNION     → combines and REMOVES DUPLICATES (like SELECT DISTINCT)
UNION ALL → combines and KEEPS ALL ROWS including duplicates (faster!)

-- Sample data for this section
-- Table A: active_users       Table B: premium_users
  id | name                   id | name
  1  | Shreyesh               1  | Shreyesh   ← same!
  2  | Arjun                  3  | Priya
  3  | Priya                  5  | Rahul

-- UNION — removes duplicates
SELECT id, name FROM active_users
UNION
SELECT id, name FROM premium_users;

RESULT:
 id | name
────────────
  1 | Shreyesh   ← appears once (duplicate removed)
  2 | Arjun
  3 | Priya      ← appears once (duplicate removed)
  5 | Rahul

-- UNION ALL — keeps ALL rows including duplicates
SELECT id, name FROM active_users
UNION ALL
SELECT id, name FROM premium_users;

RESULT:
 id | name
────────────
  1 | Shreyesh   ← from active_users
  2 | Arjun
  3 | Priya
  1 | Shreyesh   ← from premium_users (duplicate kept!)
  3 | Priya      ← from premium_users (duplicate kept!)
  5 | Rahul

Total: 6 rows (3 + 3)

Rules for UNION / UNION ALL:
==============================

1. Both queries must have the SAME number of columns
2. Corresponding columns must have compatible data types
3. Column names in result come from the FIRST query

-- ❌ WRONG: different number of columns
SELECT id, name FROM active_users
UNION
SELECT id FROM premium_users;   ← Error!

-- ✅ Column names come from FIRST SELECT
SELECT id AS user_id, name AS user_name FROM active_users
UNION
SELECT id, name FROM premium_users;

Result columns are named: user_id, user_name (from first query)

-- ORDER BY applies to the FINAL result (not individual queries)
SELECT id, name FROM active_users
UNION ALL
SELECT id, name FROM premium_users
ORDER BY name ASC;  ← orders the combined result

Performance Comparison:
========================

UNION:     Sorts and removes duplicates → extra sort operation → SLOWER
UNION ALL: No sorting, no dedup → just concatenates → FASTER

WHEN TO USE WHICH:
UNION ALL → when you KNOW there are no duplicates (or don't care)
            when combining non-overlapping data (Jan sales + Feb sales)
UNION     → when you need to remove duplicates
            when combining overlapping data (active + premium users list)

REAL WORLD UNION ALL USE CASE:
-- Combine sales from all regional tables (non-overlapping data)
SELECT sale_id, amount, 'North' AS region FROM north_sales
UNION ALL
SELECT sale_id, amount, 'South' AS region FROM south_sales
UNION ALL
SELECT sale_id, amount, 'East'  AS region FROM east_sales
UNION ALL
SELECT sale_id, amount, 'West'  AS region FROM west_sales;
-- No duplicates possible → UNION ALL is correct AND faster!

UNION vs JOIN — They're different!:
=====================================

JOIN  → combines columns HORIZONTALLY (adds more columns)
UNION → combines rows VERTICALLY (adds more rows)

-- JOIN: Employees WITH their department name (new columns)
SELECT e.name, d.dept_name
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id;

-- UNION: List of names from both tables (new rows)
SELECT name FROM employees
UNION
SELECT dept_name FROM departments;

5. Subqueries and Correlated Subqueries

A subquery is a query INSIDE another query.

-- Find employees who earn more than the average salary
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- The inner query runs first:
--   SELECT AVG(salary) FROM employees → 67500
-- Then outer query uses that value:
--   WHERE salary > 67500

RESULT:
 name   | salary
──────────────────
 Priya  | 80000
 Rahul  | 90000

Types of Subqueries:
=====================

1. SCALAR SUBQUERY — returns ONE value
   SELECT name, salary,
          (SELECT AVG(salary) FROM employees) AS avg_salary
   FROM employees;

2. ROW SUBQUERY — returns ONE row
   SELECT * FROM employees
   WHERE (dept_id, salary) = (SELECT dept_id, MAX(salary)
                               FROM employees WHERE dept_id = 10);

3. TABLE SUBQUERY (Derived Table) — returns a table
   SELECT dept_id, avg_sal
   FROM (SELECT dept_id, AVG(salary) AS avg_sal
         FROM employees GROUP BY dept_id) AS dept_avg
   WHERE avg_sal > 60000;

4. CORRELATED SUBQUERY — references outer query (runs once per row!)
   SELECT name, salary
   FROM employees e1
   WHERE salary > (SELECT AVG(salary)
                   FROM employees e2
                   WHERE e2.dept_id = e1.dept_id);  ← uses e1!

   This runs the inner query ONCE FOR EACH ROW of outer query!
   For 1 million rows → 1 million inner queries → SLOW!
   Use JOIN with GROUP BY instead for better performance.

-- Find employees earning MORE than their department's average
-- Correlated Subquery (slower):
SELECT name, salary, dept_id
FROM employees e
WHERE salary > (
    SELECT AVG(salary) FROM employees
    WHERE dept_id = e.dept_id   ← correlated to outer query
);

-- Better with JOIN (faster):
SELECT e.name, e.salary, e.dept_id
FROM employees e
JOIN (
    SELECT dept_id, AVG(salary) AS dept_avg
    FROM employees
    GROUP BY dept_id
) dept_stats ON e.dept_id = dept_stats.dept_id
WHERE e.salary > dept_stats.dept_avg;

RESULT:
 name    | salary | dept_id
──────────────────────────────
 Rahul   | 90000  | 30       ← only one in dept 30
 Shreyesh| 75000  | 10       ← above dept 10 avg (67500)
 Priya   | 80000  | 20       ← above dept 20 avg (67500)

6. Window Functions — The Interview Favorite!

Window functions perform calculations across a set of rows related to the current row — WITHOUT collapsing rows like GROUP BY does!

Syntax:
========
function_name() OVER (
    PARTITION BY column   ← divide into groups (like GROUP BY)
    ORDER BY column       ← order within each group
    ROWS/RANGE frame      ← optional window frame
)

ROW_NUMBER() — Unique sequential number per partition:
========================================================

SELECT name, dept_id, salary,
       ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS row_num
FROM employees;

RESULT:
 name      | dept_id | salary | row_num
─────────────────────────────────────────
 Shreyesh  |   10    | 75000  |    1
 Arjun     |   10    | 60000  |    2
 Priya     |   20    | 80000  |    1
 Zara      |   20    | 55000  |    2
 Rahul     |   30    | 90000  |    1
 Nisha     |  NULL   | 45000  |    1

-- COMMON INTERVIEW USE: Find top N per group (e.g., top earner per dept)
SELECT name, dept_id, salary
FROM (
    SELECT name, dept_id, salary,
           ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rn
    FROM employees
) ranked
WHERE rn = 1;

RESULT:
 name     | dept_id | salary
──────────────────────────────
 Shreyesh |   10    | 75000
 Priya    |   20    | 80000
 Rahul    |   30    | 90000

RANK() vs DENSE_RANK() vs ROW_NUMBER() — Critical Difference!:
================================================================

ROW_NUMBER → Always unique: 1, 2, 3, 4, 5
RANK       → Skips after ties: 1, 2, 2, 4, 5  (skips 3!)
DENSE_RANK → No skip after ties: 1, 2, 2, 3, 4 (no gap)

-- Example with tied salaries
salary: 90000, 75000, 75000, 60000, 55000

ROW_NUMBER: 1, 2, 3, 4, 5    (always unique)
RANK:       1, 2, 2, 4, 5    (skips 3 because of tie at rank 2)
DENSE_RANK: 1, 2, 2, 3, 4    (no gaps even with ties)

SELECT name, salary,
       ROW_NUMBER()  OVER (ORDER BY salary DESC) AS row_num,
       RANK()        OVER (ORDER BY salary DESC) AS rnk,
       DENSE_RANK()  OVER (ORDER BY salary DESC) AS dense_rnk
FROM employees
ORDER BY salary DESC;

LAG() and LEAD() — Access Previous and Next Row:
==================================================

LAG(col, n)  → value from n rows BEFORE current row
LEAD(col, n) → value from n rows AFTER current row

-- Compare each employee's salary with the previous employee's salary
SELECT name, salary,
       LAG(salary, 1) OVER (ORDER BY salary) AS prev_salary,
       salary - LAG(salary, 1) OVER (ORDER BY salary) AS diff
FROM employees
ORDER BY salary;

RESULT:
 name      | salary | prev_salary | diff
──────────────────────────────────────────
 Nisha     | 45000  | NULL        | NULL
 Zara      | 55000  | 45000       | 10000
 Arjun     | 60000  | 55000       | 5000
 Shreyesh  | 75000  | 60000       | 15000
 Priya     | 80000  | 75000       | 5000
 Rahul     | 90000  | 80000       | 10000

-- Real-world: Month over month revenue change
SELECT month, revenue,
       LAG(revenue) OVER (ORDER BY month) AS prev_month,
       revenue - LAG(revenue) OVER (ORDER BY month) AS growth
FROM monthly_sales;

SUM() OVER — Running Total:
=============================

SELECT name, salary,
       SUM(salary) OVER (ORDER BY emp_id) AS running_total
FROM employees;

RESULT:
 name      | salary | running_total
────────────────────────────────────
 Shreyesh  | 75000  | 75000
 Arjun     | 60000  | 135000
 Priya     | 80000  | 215000
 Zara      | 55000  | 270000
 Rahul     | 90000  | 360000
 Nisha     | 45000  | 405000

-- Running total PER DEPARTMENT
SELECT name, dept_id, salary,
       SUM(salary) OVER (PARTITION BY dept_id ORDER BY emp_id) AS dept_running_total
FROM employees;

7. Transactions and ACID Properties

A TRANSACTION is a sequence of SQL operations treated as ONE unit.
Either ALL succeed, or NONE do — no partial states!

-- Classic example: Bank transfer
BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 1000 WHERE acc_id = 101;  -- Debit
UPDATE accounts SET balance = balance + 1000 WHERE acc_id = 202;  -- Credit

-- If BOTH succeed:
COMMIT;  -- Make changes permanent

-- If ANYTHING fails:
ROLLBACK;  -- Undo ALL changes (balance restored!)

-- Without transaction: debit succeeds, credit fails → money DISAPPEARS!
-- With transaction: either both happen or neither → data stays consistent!

ACID Properties (Every interview asks this!):
==============================================

A — ATOMICITY
    Transaction is ALL or NOTHING.
    If any step fails → entire transaction rolls back.
    "Atom" = indivisible unit

C — CONSISTENCY
    Database goes from one VALID state to another VALID state.
    All constraints (NOT NULL, FOREIGN KEY, CHECK) are respected.
    Example: account balance can never go negative (if constraint exists)

I — ISOLATION
    Concurrent transactions don't interfere with each other.
    Each transaction sees the database as if it were alone.
    Levels: READ UNCOMMITTED → READ COMMITTED → REPEATABLE READ → SERIALIZABLE

D — DURABILITY
    Committed transactions are PERMANENT.
    Even if server crashes immediately after COMMIT — data is safe.
    Achieved via write-ahead logging (WAL) and disk persistence.

Isolation Levels — Interview Deep Dive:
=========================================

Dirty Read    → reading uncommitted changes from another transaction
Non-repeatable → same query returns different values in same transaction
Phantom Read  → new rows appear between two identical queries in same transaction

Level                Dirty Read  Non-Repeatable  Phantom Read
──────────────────────────────────────────────────────────────
READ UNCOMMITTED      Possible    Possible        Possible
READ COMMITTED        Prevented   Possible        Possible   (default in most DB)
REPEATABLE READ       Prevented   Prevented       Possible   (default in MySQL)
SERIALIZABLE          Prevented   Prevented       Prevented  (strictest, slowest)

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;
-- Your queries here
COMMIT;

8. Normalization — 1NF, 2NF, 3NF

Normalization = organizing tables to reduce REDUNDANCY and improve INTEGRITY.

Un-normalized (BAD) table:
===========================

 order_id | customer | customer_email   | products
──────────────────────────────────────────────────────────────
    1     | Shreyesh | s@gmail.com      | Laptop, Mouse
    2     | Arjun    | a@gmail.com      | Keyboard
    3     | Shreyesh | s@gmail.com      | Monitor        ← email repeated!

Problems: Duplicate data, update anomalies, insert anomalies, delete anomalies

FIRST NORMAL FORM (1NF):
==========================
Rule: Each cell must contain ATOMIC (single) values. No repeating groups.

Before 1NF:
  order_id | products
  1        | Laptop, Mouse   ← NOT atomic!

After 1NF:
  order_id | product
  1        | Laptop
  1        | Mouse           ← separate rows for each product ✅
  2        | Keyboard

Rule: No arrays or lists in a single cell.

SECOND NORMAL FORM (2NF):
==========================
Rule: Must be in 1NF AND every non-key column must depend on the WHOLE primary key.
Violation = Partial Dependency (non-key column depends on PART of composite key)

Table with partial dependency:
  order_id | product_id | product_name | quantity
  (PK = order_id + product_id)

  product_name depends ONLY on product_id → not the whole PK!
  This is PARTIAL DEPENDENCY → violates 2NF!

Fix: Split into two tables:
  orders_products: order_id, product_id, quantity
  products:        product_id, product_name

THIRD NORMAL FORM (3NF):
==========================
Rule: Must be in 2NF AND no non-key column should depend on ANOTHER non-key column.
Violation = Transitive Dependency

Table with transitive dependency:
  emp_id | dept_id | dept_name
  dept_name depends on dept_id (a non-key) → not directly on emp_id!
  emp_id → dept_id → dept_name = TRANSITIVE DEPENDENCY → violates 3NF!

Fix: Split into two tables:
  employees:   emp_id, dept_id
  departments: dept_id, dept_name

3NF Summary: "Every non-key attribute must depend on the KEY,
              the WHOLE KEY, and NOTHING BUT THE KEY!"

9. Stored Procedures vs Functions

Feature          Stored Procedure          Function
────────────────────────────────────────────────────────────────
Returns          0 or more result sets     Must return ONE value
Use in SELECT    ❌ Cannot                  ✅ Can use in SELECT
DML (INSERT etc) ✅ Can have               ❌ Usually restricted
Transaction      ✅ Can manage             ❌ Cannot start/commit
Call with        CALL / EXEC               SELECT / in expressions
Purpose          Complex business logic    Reusable calculations

-- STORED PROCEDURE: Get employees by department
DELIMITER //
CREATE PROCEDURE GetDeptEmployees(IN dept_id_param INT)
BEGIN
    SELECT name, salary
    FROM employees
    WHERE dept_id = dept_id_param
    ORDER BY salary DESC;
END //
DELIMITER ;

-- Calling it:
CALL GetDeptEmployees(10);

-- FUNCTION: Calculate tax for a salary
DELIMITER //
CREATE FUNCTION CalculateTax(salary DECIMAL(10,2))
RETURNS DECIMAL(10,2)
DETERMINISTIC
BEGIN
    RETURN salary * 0.30;
END //
DELIMITER ;

-- Using it in SELECT:
SELECT name, salary, CalculateTax(salary) AS tax
FROM employees;

Interview Questions

Q1: What is the difference between INNER JOIN and LEFT JOIN?

"INNER JOIN returns only the rows where there is a matching value in both tables — rows with no match are excluded from both sides. LEFT JOIN returns ALL rows from the left table plus matching rows from the right table. If there is no match in the right table, the right table columns return NULL. Use INNER JOIN when you only care about matched data, and LEFT JOIN when you need all rows from the left table regardless of whether they have a match."

Q2: What is the difference between WHERE and HAVING?

"WHERE filters individual rows BEFORE the GROUP BY clause is applied — it cannot use aggregate functions. HAVING filters groups AFTER the GROUP BY is applied — it can use aggregate functions like SUM, COUNT, AVG. The SQL execution order is: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY. If you can achieve the same result with WHERE, prefer it over HAVING because WHERE filters early and reduces the data that GROUP BY has to process."

Q3: What is the difference between UNION and UNION ALL?

"Both UNION and UNION ALL combine the result sets of two SELECT statements vertically. UNION removes duplicate rows from the combined result, which requires a sort/deduplication step and is slower. UNION ALL keeps all rows including duplicates and is faster because it skips the deduplication step. Use UNION ALL when you know there are no duplicates or when duplicates are acceptable, as it is significantly more efficient."

Q4: What is an index and how does it work?

"An index is a data structure — typically a B-Tree — that the database builds on a column to speed up data retrieval. Without an index, a query does a full table scan, reading every row — O(n). With an index, the database uses binary search on the B-Tree to find the row in O(log n) time. Indexes speed up SELECT but slow down INSERT, UPDATE, and DELETE because the index must be updated on every write. You should index columns used frequently in WHERE clauses, JOIN conditions, and ORDER BY clauses."

Q5: What is a correlated subquery? How is it different from a regular subquery?

"A regular subquery runs independently — it executes once and its result is used by the outer query. A correlated subquery references columns from the outer query, so it must be re-evaluated once for every row processed by the outer query. If the outer query has 1 million rows, the correlated subquery runs 1 million times, making it very slow. Regular subqueries run once regardless of outer row count. Correlated subqueries should be replaced with JOINs and GROUP BY wherever possible for better performance."

Q6: What are ACID properties?

"ACID stands for: Atomicity — a transaction is all or nothing, if any step fails the whole transaction rolls back. Consistency — the database goes from one valid state to another, all constraints are maintained. Isolation — concurrent transactions don't interfere with each other, each sees a consistent snapshot. Durability — once a transaction is committed, it is permanently saved even if the system crashes immediately after, thanks to write-ahead logging."

Q7: What is the difference between RANK, DENSE_RANK, and ROW_NUMBER?

"All three are window functions that assign numbers to rows within a partition. ROW_NUMBER assigns a unique sequential number with no gaps, even for ties. RANK assigns the same number to tied rows but skips the next numbers — if two rows tie at rank 2, the next rank is 4. DENSE_RANK also assigns the same number to ties but does NOT skip — if two rows tie at rank 2, the next rank is 3. ROW_NUMBER is best for pagination, RANK for leaderboards where gaps matter, DENSE_RANK for leaderboards where you don't want gaps."

Q8: Write a query to find the second highest salary.

"There are multiple ways. Using DENSE_RANK: SELECT salary FROM (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk FROM employees) ranked WHERE rnk = 2; — this handles ties correctly. Using LIMIT/OFFSET: SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1; — simple but OFFSET-based pagination can be slow on large tables. Using subquery: SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees); — clean and readable."

Quick Recap — All Joins

JOIN Type	Returns	NULL Rows	Use When
INNER JOIN	Matched rows only	None	You only want data with matches on both sides
LEFT JOIN	All left + matched right	Right side	All rows from left, even without matches
RIGHT JOIN	All right + matched left	Left side	All rows from right (rewrite as LEFT JOIN instead)
FULL OUTER JOIN	All rows from both	Both sides	All rows from both tables regardless of match
CROSS JOIN	Every A × every B	None	Cartesian product, combinations
SELF JOIN	Table joined with itself	Depends	Hierarchical data, employee-manager

Key Points to Remember

SQL execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
WHERE filters rows BEFORE grouping | HAVING filters groups AFTER grouping
WHERE cannot use aggregates (SUM, COUNT) | HAVING can
UNION removes duplicates (slower) | UNION ALL keeps all (faster)
INNER JOIN excludes non-matching rows from both sides
LEFT JOIN + IS NULL on right side = find rows with NO match (Anti-Join)
Index speeds up SELECT but slows down INSERT/UPDATE/DELETE
Composite index leftmost prefix rule — (dept, salary) index helps WHERE dept = 10, not WHERE salary > X
EXPLAIN shows if your query uses an index — look for "ALL" (bad) vs "ref" (good)
ROW_NUMBER = always unique | RANK = skips after tie | DENSE_RANK = no skip
LAG() accesses previous row | LEAD() accesses next row
Correlated subquery runs once per outer row — replace with JOIN for performance
ACID: Atomicity, Consistency, Isolation, Durability — memorize this!
3NF rule: depend on the key, the whole key, and nothing but the key
UNION ALL is always preferred over UNION when duplicates don't matter

Keep coding, keep learning! See you in the next one!

sql

Shreyash Kolhe

Episode - SQL Deep Dive — Everything Interviews Ask You

Episode - SQL Deep Dive — Everything Interviews Ask You

Sample Tables — Used Throughout This Blog

1. All Types of JOINs

INNER JOIN

LEFT JOIN (LEFT OUTER JOIN)

RIGHT JOIN (RIGHT OUTER JOIN)

FULL OUTER JOIN

CROSS JOIN

SELF JOIN

INTERVIEW TRICK — Find Rows with No Match (Anti-Join)

2. Indexing — How Databases Go From Slow to Lightning Fast

3. WHERE vs HAVING — The Most Confused Pair in SQL!

4. UNION vs UNION ALL

5. Subqueries and Correlated Subqueries

6. Window Functions — The Interview Favorite!

7. Transactions and ACID Properties

8. Normalization — 1NF, 2NF, 3NF

9. Stored Procedures vs Functions

Interview Questions

Quick Recap — All Joins

Key Points to Remember

Post a Comment

Episode - SQL Deep Dive — Everything Interviews Ask You

Episode - SQL Deep Dive — Everything Interviews Ask You

Sample Tables — Used Throughout This Blog

1. All Types of JOINs

INNER JOIN

LEFT JOIN (LEFT OUTER JOIN)

RIGHT JOIN (RIGHT OUTER JOIN)

FULL OUTER JOIN

CROSS JOIN

SELF JOIN

INTERVIEW TRICK — Find Rows with No Match (Anti-Join)

2. Indexing — How Databases Go From Slow to Lightning Fast

3. WHERE vs HAVING — The Most Confused Pair in SQL!

4. UNION vs UNION ALL

5. Subqueries and Correlated Subqueries

6. Window Functions — The Interview Favorite!

7. Transactions and ACID Properties

8. Normalization — 1NF, 2NF, 3NF

9. Stored Procedures vs Functions

Interview Questions

Quick Recap — All Joins

Key Points to Remember

You may like these posts

Post a Comment