Episode - SQL Deep Dive — Everything Interviews Ask You
Episode - SQL Deep Dive — Everything Interviews Ask You
Hey everyone! Welcome back. Today we go deep into SQL — the way interviewers actually test it. Not just syntax, but HOW things work under the hood, edge cases, and the tricky questions that separate beginners from seniors!
What we will cover:
- SQL Basics Refresher — Tables, Keys, Constraints
- All Types of JOINs — With Visual Diagrams
- Indexing — What it is, How it Works, When to Use
- WHERE vs HAVING — The Most Confused Pair
- UNION vs UNION ALL
- Subqueries and Correlated Subqueries
- Window Functions — ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD
- GROUP BY Deep Dive
- Transactions and ACID Properties
- Normalization (1NF, 2NF, 3NF)
- Stored Procedures vs Functions
- Interview Questions
Sample Tables — Used Throughout This Blog
We'll use these tables for all examples:
── employees ──────────────────────────────────────────
emp_id | name | dept_id | salary | manager_id
────────────────────────────────────────────────────────
1 | Shreyesh | 10 | 75000 | NULL
2 | Arjun | 10 | 60000 | 1
3 | Priya | 20 | 80000 | 1
4 | Zara | 20 | 55000 | 3
5 | Rahul | 30 | 90000 | NULL
6 | Nisha | NULL | 45000 | 5
── departments ────────────────────────────────────────
dept_id | dept_name | location
─────────────────────────────────
10 | Engineering | Bangalore
20 | Marketing | Mumbai
30 | Finance | Delhi
40 | HR | Chennai ← No employees here!
── orders ─────────────────────────────────────────────
order_id | emp_id | amount | order_date
──────────────────────────────────────────
1 | 1 | 5000.00 | 2024-01-15
2 | 2 | 3200.00 | 2024-01-20
3 | 1 | 7500.00 | 2024-02-10
4 | 3 | 1200.00 | 2024-02-15
5 | 7 | 9900.00 | 2024-03-01 ← emp_id 7 doesn't exist!
1. All Types of JOINs
A JOIN combines rows from two or more tables based on a related column.
VISUAL DIAGRAM — All JOINs: ============================= Table A Table B ┌───────┐ ┌───────┐ │ A ∩ B │ │ A ∩ B │ │ only │ A ∩ B │ only │ └───────┘ └───────┘ INNER JOIN → only matching rows (A ∩ B) LEFT JOIN → all of A + matching B (NULL if no match) RIGHT JOIN → all of B + matching A (NULL if no match) FULL OUTER JOIN → all of A + all of B (NULL where no match) CROSS JOIN → every row of A × every row of B (cartesian product) SELF JOIN → table joined with itself
INNER JOIN
Returns ONLY rows where there is a match in BOTH tables. Rows with no match on either side are EXCLUDED. SELECT e.name, d.dept_name FROM employees e INNER JOIN departments d ON e.dept_id = d.dept_id; RESULT: name | dept_name ──────────────────────── Shreyesh | Engineering Arjun | Engineering Priya | Marketing Zara | Marketing Rahul | Finance Nisha is EXCLUDED → dept_id is NULL, no match in departments HR department is EXCLUDED → no employees in dept 40
LEFT JOIN (LEFT OUTER JOIN)
Returns ALL rows from the LEFT table. If no match in right table → NULLs for right table columns. SELECT e.name, d.dept_name FROM employees e LEFT JOIN departments d ON e.dept_id = d.dept_id; RESULT: name | dept_name ──────────────────────── Shreyesh | Engineering Arjun | Engineering Priya | Marketing Zara | Marketing Rahul | Finance Nisha | NULL ← Nisha included! dept_name is NULL Nisha IS included (left table = employees) HR department still NOT shown (right table = departments)
RIGHT JOIN (RIGHT OUTER JOIN)
Returns ALL rows from the RIGHT table.
If no match in left table → NULLs for left table columns.
SELECT e.name, d.dept_name
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.dept_id;
RESULT:
name | dept_name
────────────────────────
Shreyesh | Engineering
Arjun | Engineering
Priya | Marketing
Zara | Marketing
Rahul | Finance
NULL | HR ← HR included! name is NULL
HR IS included (right table = departments)
Nisha NOT shown (left table = employees, no dept match)
NOTE: RIGHT JOIN is rarely used — you can always rewrite it
as a LEFT JOIN by swapping table order!
-- These are equivalent:
SELECT * FROM A RIGHT JOIN B ON A.id = B.id;
SELECT * FROM B LEFT JOIN A ON A.id = B.id;
FULL OUTER JOIN
Returns ALL rows from BOTH tables. NULL where there is no match on either side. SELECT e.name, d.dept_name FROM employees e FULL OUTER JOIN departments d ON e.dept_id = d.dept_id; RESULT: name | dept_name ──────────────────────── Shreyesh | Engineering Arjun | Engineering Priya | Marketing Zara | Marketing Rahul | Finance Nisha | NULL ← No dept match NULL | HR ← No employee match -- MySQL doesn't support FULL OUTER JOIN directly! -- Emulate it with UNION: SELECT e.name, d.dept_name FROM employees e LEFT JOIN departments d ON e.dept_id = d.dept_id UNION SELECT e.name, d.dept_name FROM employees e RIGHT JOIN departments d ON e.dept_id = d.dept_id;
CROSS JOIN
Returns every combination of rows from both tables. Also called Cartesian Product. Result rows = (rows in A) × (rows in B) SELECT e.name, d.dept_name FROM employees e CROSS JOIN departments d; RESULT: 6 employees × 4 departments = 24 rows! (Every employee paired with every department) name | dept_name ──────────────────────── Shreyesh | Engineering Shreyesh | Marketing Shreyesh | Finance Shreyesh | HR Arjun | Engineering Arjun | Marketing ... (24 rows total) Use cases: - Generating test data combinations - Generating a calendar (cross join days × months) - Seat assignment (rows × columns in cinema)
SELF JOIN
A table joined with ITSELF.
Use case: hierarchical data (employee-manager), org charts.
-- Find each employee and their manager's name
SELECT
e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id;
RESULT:
employee | manager
──────────────────────
Shreyesh | NULL ← Top-level, no manager
Arjun | Shreyesh
Priya | Shreyesh
Zara | Priya
Rahul | NULL ← Top-level
Nisha | Rahul
HOW: We treat 'employees' as TWO tables — 'e' (employee) and 'm' (manager)
We join e.manager_id = m.emp_id to find who the manager is
INTERVIEW TRICK — Find Rows with No Match (Anti-Join)
-- Find departments with NO employees
SELECT d.dept_name
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id
WHERE e.emp_id IS NULL;
RESULT:
dept_name
───────────
HR
-- Alternative with NOT IN:
SELECT dept_name FROM departments
WHERE dept_id NOT IN (
SELECT DISTINCT dept_id FROM employees WHERE dept_id IS NOT NULL
);
-- Alternative with NOT EXISTS:
SELECT d.dept_name FROM departments d
WHERE NOT EXISTS (
SELECT 1 FROM employees e WHERE e.dept_id = d.dept_id
);
All three give the same result.
LEFT JOIN + IS NULL is usually fastest!
2. Indexing — How Databases Go From Slow to Lightning Fast
An index is a data structure (usually a B-Tree) that helps the database find rows without scanning every row.
Without Index — Full Table Scan:
==================================
SELECT * FROM employees WHERE name = "Priya";
Database reads EVERY row → checks if name = "Priya"
For 1 million rows → checks 1 million rows!
Time: O(n) — linear scan
This is called a FULL TABLE SCAN → SLOW!
With Index — B-Tree Lookup:
==============================
CREATE INDEX idx_name ON employees(name);
Now the database has a sorted B-Tree of names.
It jumps directly to "Priya" using binary search.
Time: O(log n) — from 1 million → only 20 comparisons!
B-Tree structure for names:
[Nisha]
/ \
[Arjun] [Rahul]
/ \ \
[Arjun] [Nisha] [Shreyesh]
Types of Indexes:
==================
1. PRIMARY KEY INDEX (Clustered Index)
- Automatically created on PRIMARY KEY
- Data rows are PHYSICALLY sorted by this key
- Only ONE per table (data can only be sorted one way)
- Fastest lookup — data and index are together
2. UNIQUE INDEX
- Ensures all values in column are unique
- Auto-created on UNIQUE constraint columns
CREATE UNIQUE INDEX idx_email ON employees(email);
3. COMPOSITE INDEX (Multi-column Index)
- Index on multiple columns together
CREATE INDEX idx_dept_salary ON employees(dept_id, salary);
-- Works for: WHERE dept_id = 10
-- Works for: WHERE dept_id = 10 AND salary > 50000
-- DOES NOT help: WHERE salary > 50000 (dept_id must come first!)
-- This is the "leftmost prefix rule"!
4. FULL-TEXT INDEX
- For searching text content (LIKE "%word%" is slow)
CREATE FULLTEXT INDEX idx_bio ON articles(content);
SELECT * FROM articles WHERE MATCH(content) AGAINST("JavaScript");
5. PARTIAL INDEX (PostgreSQL/SQLite)
- Index only a subset of rows (saves space!)
CREATE INDEX idx_active_users ON users(email) WHERE is_active = true;
When Indexes HELP vs HURT:
============================
✅ INDEXES HELP:
- High cardinality columns (many unique values): email, emp_id, phone
- Columns in WHERE clause: WHERE salary > 50000
- Columns in JOIN conditions: ON a.dept_id = b.dept_id
- Columns in ORDER BY (avoids sorting)
- Columns in GROUP BY
❌ INDEXES HURT (or don't help):
- Low cardinality columns: is_active (only true/false) → not useful!
- Small tables: index overhead not worth it
- Columns with many NULLs
- Frequently updated columns: index must be updated on every INSERT/UPDATE/DELETE
- Using functions on indexed column in WHERE:
WHERE YEAR(hire_date) = 2024 ← index NOT used!
WHERE hire_date >= '2024-01-01' ← index IS used ✅
INDEXING RULES:
- Don't index every column — each index costs write performance
- Index columns you frequently filter, join, or sort on
- Use EXPLAIN to see if your query uses an index!
EXPLAIN — See Your Query Execution Plan: ========================================== EXPLAIN SELECT * FROM employees WHERE name = "Priya"; Output without index: type: ALL ← Full Table Scan! BAD! rows: 6 ← scanned all 6 rows After: CREATE INDEX idx_name ON employees(name); Output with index: type: ref ← Index lookup! GOOD! rows: 1 ← found directly! key: idx_name ← which index was used EXPLAIN keywords to know: ALL → full table scan → add an index! index → full index scan → still slow for large indexes range → index range scan → good for BETWEEN, >, < ref → non-unique index lookup → good const → primary key lookup → fastest!
3. WHERE vs HAVING — The Most Confused Pair in SQL!
Simple Rule: ============= WHERE → filters ROWS before grouping HAVING → filters GROUPS after grouping WHERE works on individual ROW data HAVING works on AGGREGATED data (SUM, COUNT, AVG, MAX, MIN)
-- WHERE: Filter rows BEFORE GROUP BY -- "Show me the average salary per department, but only for Engineering" SELECT dept_id, AVG(salary) AS avg_salary FROM employees WHERE dept_id = 10 ← filters ROWS first (only dept 10 rows kept) GROUP BY dept_id; STEP BY STEP: 1. WHERE filters: only rows where dept_id = 10 remain 2. GROUP BY groups those rows by dept_id 3. AVG(salary) calculated on the filtered rows RESULT: dept_id | avg_salary ────────────────────── 10 | 67500.00
-- HAVING: Filter GROUPS after GROUP BY -- "Show me departments where average salary > 60000" SELECT dept_id, AVG(salary) AS avg_salary FROM employees GROUP BY dept_id HAVING AVG(salary) > 60000; ← filters GROUPS after aggregation STEP BY STEP: 1. GROUP BY groups all employees by dept_id 2. AVG(salary) calculated per group 3. HAVING filters: only groups where avg_salary > 60000 survive RESULT: dept_id | avg_salary ────────────────────── 10 | 67500.00 ← 67500 > 60000 ✅ 20 | 67500.00 ← 67500 > 60000 ✅ 30 | 90000.00 ← 90000 > 60000 ✅ -- Can you use WHERE here instead? -- ❌ NO! This is WRONG: SELECT dept_id, AVG(salary) AS avg_salary FROM employees WHERE AVG(salary) > 60000 ← ERROR! Can't use aggregate in WHERE! GROUP BY dept_id; -- WHY? WHERE runs BEFORE aggregation. -- AVG(salary) doesn't exist yet when WHERE runs!
-- USING BOTH WHERE and HAVING together: -- "Among employees with salary > 40000, -- show departments where average salary > 60000" SELECT dept_id, AVG(salary) AS avg_salary, COUNT(*) AS emp_count FROM employees WHERE salary > 40000 ← 1. Filter rows: exclude salary ≤ 40000 GROUP BY dept_id ← 2. Group remaining rows HAVING AVG(salary) > 60000 ← 3. Filter groups by average ORDER BY avg_salary DESC; ← 4. Sort result EXECUTION ORDER (very important for interviews!): 1. FROM → get the table 2. WHERE → filter rows 3. GROUP BY → group filtered rows 4. HAVING → filter groups 5. SELECT → compute output columns 6. ORDER BY → sort 7. LIMIT → return top N Memorize: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
Feature Comparison: ==================== Feature WHERE HAVING ───────────────────────────────────────────────────────────── Acts on Individual rows Groups (after GROUP BY) When it runs BEFORE GROUP BY AFTER GROUP BY Can use Column values Column values + Aggregates Aggregates ❌ NO (SUM, COUNT etc) ✅ YES (SUM, COUNT, AVG...) Index usage ✅ Uses indexes ❌ Can't use indexes (groups) Performance Faster (fewer rows) Slower (filters after grouping) TRICK: If you CAN use WHERE instead of HAVING, always use WHERE! It filters rows early → less data for GROUP BY to process → FASTER!
4. UNION vs UNION ALL
Both combine results of two SELECT statements vertically (stacking rows). UNION → combines and REMOVES DUPLICATES (like SELECT DISTINCT) UNION ALL → combines and KEEPS ALL ROWS including duplicates (faster!)
-- Sample data for this section -- Table A: active_users Table B: premium_users id | name id | name 1 | Shreyesh 1 | Shreyesh ← same! 2 | Arjun 3 | Priya 3 | Priya 5 | Rahul -- UNION — removes duplicates SELECT id, name FROM active_users UNION SELECT id, name FROM premium_users; RESULT: id | name ──────────── 1 | Shreyesh ← appears once (duplicate removed) 2 | Arjun 3 | Priya ← appears once (duplicate removed) 5 | Rahul -- UNION ALL — keeps ALL rows including duplicates SELECT id, name FROM active_users UNION ALL SELECT id, name FROM premium_users; RESULT: id | name ──────────── 1 | Shreyesh ← from active_users 2 | Arjun 3 | Priya 1 | Shreyesh ← from premium_users (duplicate kept!) 3 | Priya ← from premium_users (duplicate kept!) 5 | Rahul Total: 6 rows (3 + 3)
Rules for UNION / UNION ALL: ============================== 1. Both queries must have the SAME number of columns 2. Corresponding columns must have compatible data types 3. Column names in result come from the FIRST query -- ❌ WRONG: different number of columns SELECT id, name FROM active_users UNION SELECT id FROM premium_users; ← Error! -- ✅ Column names come from FIRST SELECT SELECT id AS user_id, name AS user_name FROM active_users UNION SELECT id, name FROM premium_users; Result columns are named: user_id, user_name (from first query) -- ORDER BY applies to the FINAL result (not individual queries) SELECT id, name FROM active_users UNION ALL SELECT id, name FROM premium_users ORDER BY name ASC; ← orders the combined result
Performance Comparison:
========================
UNION: Sorts and removes duplicates → extra sort operation → SLOWER
UNION ALL: No sorting, no dedup → just concatenates → FASTER
WHEN TO USE WHICH:
UNION ALL → when you KNOW there are no duplicates (or don't care)
when combining non-overlapping data (Jan sales + Feb sales)
UNION → when you need to remove duplicates
when combining overlapping data (active + premium users list)
REAL WORLD UNION ALL USE CASE:
-- Combine sales from all regional tables (non-overlapping data)
SELECT sale_id, amount, 'North' AS region FROM north_sales
UNION ALL
SELECT sale_id, amount, 'South' AS region FROM south_sales
UNION ALL
SELECT sale_id, amount, 'East' AS region FROM east_sales
UNION ALL
SELECT sale_id, amount, 'West' AS region FROM west_sales;
-- No duplicates possible → UNION ALL is correct AND faster!
UNION vs JOIN — They're different!: ===================================== JOIN → combines columns HORIZONTALLY (adds more columns) UNION → combines rows VERTICALLY (adds more rows) -- JOIN: Employees WITH their department name (new columns) SELECT e.name, d.dept_name FROM employees e JOIN departments d ON e.dept_id = d.dept_id; -- UNION: List of names from both tables (new rows) SELECT name FROM employees UNION SELECT dept_name FROM departments;
5. Subqueries and Correlated Subqueries
A subquery is a query INSIDE another query. -- Find employees who earn more than the average salary SELECT name, salary FROM employees WHERE salary > (SELECT AVG(salary) FROM employees); -- The inner query runs first: -- SELECT AVG(salary) FROM employees → 67500 -- Then outer query uses that value: -- WHERE salary > 67500 RESULT: name | salary ────────────────── Priya | 80000 Rahul | 90000
Types of Subqueries:
=====================
1. SCALAR SUBQUERY — returns ONE value
SELECT name, salary,
(SELECT AVG(salary) FROM employees) AS avg_salary
FROM employees;
2. ROW SUBQUERY — returns ONE row
SELECT * FROM employees
WHERE (dept_id, salary) = (SELECT dept_id, MAX(salary)
FROM employees WHERE dept_id = 10);
3. TABLE SUBQUERY (Derived Table) — returns a table
SELECT dept_id, avg_sal
FROM (SELECT dept_id, AVG(salary) AS avg_sal
FROM employees GROUP BY dept_id) AS dept_avg
WHERE avg_sal > 60000;
4. CORRELATED SUBQUERY — references outer query (runs once per row!)
SELECT name, salary
FROM employees e1
WHERE salary > (SELECT AVG(salary)
FROM employees e2
WHERE e2.dept_id = e1.dept_id); ← uses e1!
This runs the inner query ONCE FOR EACH ROW of outer query!
For 1 million rows → 1 million inner queries → SLOW!
Use JOIN with GROUP BY instead for better performance.
-- Find employees earning MORE than their department's average
-- Correlated Subquery (slower):
SELECT name, salary, dept_id
FROM employees e
WHERE salary > (
SELECT AVG(salary) FROM employees
WHERE dept_id = e.dept_id ← correlated to outer query
);
-- Better with JOIN (faster):
SELECT e.name, e.salary, e.dept_id
FROM employees e
JOIN (
SELECT dept_id, AVG(salary) AS dept_avg
FROM employees
GROUP BY dept_id
) dept_stats ON e.dept_id = dept_stats.dept_id
WHERE e.salary > dept_stats.dept_avg;
RESULT:
name | salary | dept_id
──────────────────────────────
Rahul | 90000 | 30 ← only one in dept 30
Shreyesh| 75000 | 10 ← above dept 10 avg (67500)
Priya | 80000 | 20 ← above dept 20 avg (67500)
6. Window Functions — The Interview Favorite!
Window functions perform calculations across a set of rows related to the current row — WITHOUT collapsing rows like GROUP BY does!
Syntax:
========
function_name() OVER (
PARTITION BY column ← divide into groups (like GROUP BY)
ORDER BY column ← order within each group
ROWS/RANGE frame ← optional window frame
)
ROW_NUMBER() — Unique sequential number per partition:
========================================================
SELECT name, dept_id, salary,
ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS row_num
FROM employees;
RESULT:
name | dept_id | salary | row_num
─────────────────────────────────────────
Shreyesh | 10 | 75000 | 1
Arjun | 10 | 60000 | 2
Priya | 20 | 80000 | 1
Zara | 20 | 55000 | 2
Rahul | 30 | 90000 | 1
Nisha | NULL | 45000 | 1
-- COMMON INTERVIEW USE: Find top N per group (e.g., top earner per dept)
SELECT name, dept_id, salary
FROM (
SELECT name, dept_id, salary,
ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rn
FROM employees
) ranked
WHERE rn = 1;
RESULT:
name | dept_id | salary
──────────────────────────────
Shreyesh | 10 | 75000
Priya | 20 | 80000
Rahul | 30 | 90000
RANK() vs DENSE_RANK() vs ROW_NUMBER() — Critical Difference!:
================================================================
ROW_NUMBER → Always unique: 1, 2, 3, 4, 5
RANK → Skips after ties: 1, 2, 2, 4, 5 (skips 3!)
DENSE_RANK → No skip after ties: 1, 2, 2, 3, 4 (no gap)
-- Example with tied salaries
salary: 90000, 75000, 75000, 60000, 55000
ROW_NUMBER: 1, 2, 3, 4, 5 (always unique)
RANK: 1, 2, 2, 4, 5 (skips 3 because of tie at rank 2)
DENSE_RANK: 1, 2, 2, 3, 4 (no gaps even with ties)
SELECT name, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
RANK() OVER (ORDER BY salary DESC) AS rnk,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
FROM employees
ORDER BY salary DESC;
LAG() and LEAD() — Access Previous and Next Row:
==================================================
LAG(col, n) → value from n rows BEFORE current row
LEAD(col, n) → value from n rows AFTER current row
-- Compare each employee's salary with the previous employee's salary
SELECT name, salary,
LAG(salary, 1) OVER (ORDER BY salary) AS prev_salary,
salary - LAG(salary, 1) OVER (ORDER BY salary) AS diff
FROM employees
ORDER BY salary;
RESULT:
name | salary | prev_salary | diff
──────────────────────────────────────────
Nisha | 45000 | NULL | NULL
Zara | 55000 | 45000 | 10000
Arjun | 60000 | 55000 | 5000
Shreyesh | 75000 | 60000 | 15000
Priya | 80000 | 75000 | 5000
Rahul | 90000 | 80000 | 10000
-- Real-world: Month over month revenue change
SELECT month, revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month,
revenue - LAG(revenue) OVER (ORDER BY month) AS growth
FROM monthly_sales;
SUM() OVER — Running Total:
=============================
SELECT name, salary,
SUM(salary) OVER (ORDER BY emp_id) AS running_total
FROM employees;
RESULT:
name | salary | running_total
────────────────────────────────────
Shreyesh | 75000 | 75000
Arjun | 60000 | 135000
Priya | 80000 | 215000
Zara | 55000 | 270000
Rahul | 90000 | 360000
Nisha | 45000 | 405000
-- Running total PER DEPARTMENT
SELECT name, dept_id, salary,
SUM(salary) OVER (PARTITION BY dept_id ORDER BY emp_id) AS dept_running_total
FROM employees;
7. Transactions and ACID Properties
A TRANSACTION is a sequence of SQL operations treated as ONE unit. Either ALL succeed, or NONE do — no partial states! -- Classic example: Bank transfer BEGIN TRANSACTION; UPDATE accounts SET balance = balance - 1000 WHERE acc_id = 101; -- Debit UPDATE accounts SET balance = balance + 1000 WHERE acc_id = 202; -- Credit -- If BOTH succeed: COMMIT; -- Make changes permanent -- If ANYTHING fails: ROLLBACK; -- Undo ALL changes (balance restored!) -- Without transaction: debit succeeds, credit fails → money DISAPPEARS! -- With transaction: either both happen or neither → data stays consistent!
ACID Properties (Every interview asks this!):
==============================================
A — ATOMICITY
Transaction is ALL or NOTHING.
If any step fails → entire transaction rolls back.
"Atom" = indivisible unit
C — CONSISTENCY
Database goes from one VALID state to another VALID state.
All constraints (NOT NULL, FOREIGN KEY, CHECK) are respected.
Example: account balance can never go negative (if constraint exists)
I — ISOLATION
Concurrent transactions don't interfere with each other.
Each transaction sees the database as if it were alone.
Levels: READ UNCOMMITTED → READ COMMITTED → REPEATABLE READ → SERIALIZABLE
D — DURABILITY
Committed transactions are PERMANENT.
Even if server crashes immediately after COMMIT — data is safe.
Achieved via write-ahead logging (WAL) and disk persistence.
Isolation Levels — Interview Deep Dive: ========================================= Dirty Read → reading uncommitted changes from another transaction Non-repeatable → same query returns different values in same transaction Phantom Read → new rows appear between two identical queries in same transaction Level Dirty Read Non-Repeatable Phantom Read ────────────────────────────────────────────────────────────── READ UNCOMMITTED Possible Possible Possible READ COMMITTED Prevented Possible Possible (default in most DB) REPEATABLE READ Prevented Prevented Possible (default in MySQL) SERIALIZABLE Prevented Prevented Prevented (strictest, slowest) SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; BEGIN; -- Your queries here COMMIT;
8. Normalization — 1NF, 2NF, 3NF
Normalization = organizing tables to reduce REDUNDANCY and improve INTEGRITY.
Un-normalized (BAD) table:
===========================
order_id | customer | customer_email | products
──────────────────────────────────────────────────────────────
1 | Shreyesh | s@gmail.com | Laptop, Mouse
2 | Arjun | a@gmail.com | Keyboard
3 | Shreyesh | s@gmail.com | Monitor ← email repeated!
Problems: Duplicate data, update anomalies, insert anomalies, delete anomalies
FIRST NORMAL FORM (1NF): ========================== Rule: Each cell must contain ATOMIC (single) values. No repeating groups. Before 1NF: order_id | products 1 | Laptop, Mouse ← NOT atomic! After 1NF: order_id | product 1 | Laptop 1 | Mouse ← separate rows for each product ✅ 2 | Keyboard Rule: No arrays or lists in a single cell.
SECOND NORMAL FORM (2NF): ========================== Rule: Must be in 1NF AND every non-key column must depend on the WHOLE primary key. Violation = Partial Dependency (non-key column depends on PART of composite key) Table with partial dependency: order_id | product_id | product_name | quantity (PK = order_id + product_id) product_name depends ONLY on product_id → not the whole PK! This is PARTIAL DEPENDENCY → violates 2NF! Fix: Split into two tables: orders_products: order_id, product_id, quantity products: product_id, product_name
THIRD NORMAL FORM (3NF):
==========================
Rule: Must be in 2NF AND no non-key column should depend on ANOTHER non-key column.
Violation = Transitive Dependency
Table with transitive dependency:
emp_id | dept_id | dept_name
dept_name depends on dept_id (a non-key) → not directly on emp_id!
emp_id → dept_id → dept_name = TRANSITIVE DEPENDENCY → violates 3NF!
Fix: Split into two tables:
employees: emp_id, dept_id
departments: dept_id, dept_name
3NF Summary: "Every non-key attribute must depend on the KEY,
the WHOLE KEY, and NOTHING BUT THE KEY!"
9. Stored Procedures vs Functions
Feature Stored Procedure Function
────────────────────────────────────────────────────────────────
Returns 0 or more result sets Must return ONE value
Use in SELECT ❌ Cannot ✅ Can use in SELECT
DML (INSERT etc) ✅ Can have ❌ Usually restricted
Transaction ✅ Can manage ❌ Cannot start/commit
Call with CALL / EXEC SELECT / in expressions
Purpose Complex business logic Reusable calculations
-- STORED PROCEDURE: Get employees by department
DELIMITER //
CREATE PROCEDURE GetDeptEmployees(IN dept_id_param INT)
BEGIN
SELECT name, salary
FROM employees
WHERE dept_id = dept_id_param
ORDER BY salary DESC;
END //
DELIMITER ;
-- Calling it:
CALL GetDeptEmployees(10);
-- FUNCTION: Calculate tax for a salary
DELIMITER //
CREATE FUNCTION CalculateTax(salary DECIMAL(10,2))
RETURNS DECIMAL(10,2)
DETERMINISTIC
BEGIN
RETURN salary * 0.30;
END //
DELIMITER ;
-- Using it in SELECT:
SELECT name, salary, CalculateTax(salary) AS tax
FROM employees;
Interview Questions
Q1: What is the difference between INNER JOIN and LEFT JOIN?
"INNER JOIN returns only the rows where there is a matching value in both tables — rows with no match are excluded from both sides. LEFT JOIN returns ALL rows from the left table plus matching rows from the right table. If there is no match in the right table, the right table columns return NULL. Use INNER JOIN when you only care about matched data, and LEFT JOIN when you need all rows from the left table regardless of whether they have a match."
Q2: What is the difference between WHERE and HAVING?
"WHERE filters individual rows BEFORE the GROUP BY clause is applied — it cannot use aggregate functions. HAVING filters groups AFTER the GROUP BY is applied — it can use aggregate functions like SUM, COUNT, AVG. The SQL execution order is: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY. If you can achieve the same result with WHERE, prefer it over HAVING because WHERE filters early and reduces the data that GROUP BY has to process."
Q3: What is the difference between UNION and UNION ALL?
"Both UNION and UNION ALL combine the result sets of two SELECT statements vertically. UNION removes duplicate rows from the combined result, which requires a sort/deduplication step and is slower. UNION ALL keeps all rows including duplicates and is faster because it skips the deduplication step. Use UNION ALL when you know there are no duplicates or when duplicates are acceptable, as it is significantly more efficient."
Q4: What is an index and how does it work?
"An index is a data structure — typically a B-Tree — that the database builds on a column to speed up data retrieval. Without an index, a query does a full table scan, reading every row — O(n). With an index, the database uses binary search on the B-Tree to find the row in O(log n) time. Indexes speed up SELECT but slow down INSERT, UPDATE, and DELETE because the index must be updated on every write. You should index columns used frequently in WHERE clauses, JOIN conditions, and ORDER BY clauses."
Q5: What is a correlated subquery? How is it different from a regular subquery?
"A regular subquery runs independently — it executes once and its result is used by the outer query. A correlated subquery references columns from the outer query, so it must be re-evaluated once for every row processed by the outer query. If the outer query has 1 million rows, the correlated subquery runs 1 million times, making it very slow. Regular subqueries run once regardless of outer row count. Correlated subqueries should be replaced with JOINs and GROUP BY wherever possible for better performance."
Q6: What are ACID properties?
"ACID stands for: Atomicity — a transaction is all or nothing, if any step fails the whole transaction rolls back. Consistency — the database goes from one valid state to another, all constraints are maintained. Isolation — concurrent transactions don't interfere with each other, each sees a consistent snapshot. Durability — once a transaction is committed, it is permanently saved even if the system crashes immediately after, thanks to write-ahead logging."
Q7: What is the difference between RANK, DENSE_RANK, and ROW_NUMBER?
"All three are window functions that assign numbers to rows within a partition. ROW_NUMBER assigns a unique sequential number with no gaps, even for ties. RANK assigns the same number to tied rows but skips the next numbers — if two rows tie at rank 2, the next rank is 4. DENSE_RANK also assigns the same number to ties but does NOT skip — if two rows tie at rank 2, the next rank is 3. ROW_NUMBER is best for pagination, RANK for leaderboards where gaps matter, DENSE_RANK for leaderboards where you don't want gaps."
Q8: Write a query to find the second highest salary.
"There are multiple ways. Using DENSE_RANK: SELECT salary FROM (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk FROM employees) ranked WHERE rnk = 2; — this handles ties correctly. Using LIMIT/OFFSET: SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1; — simple but OFFSET-based pagination can be slow on large tables. Using subquery: SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees); — clean and readable."
Quick Recap — All Joins
| JOIN Type | Returns | NULL Rows | Use When |
|---|---|---|---|
| INNER JOIN | Matched rows only | None | You only want data with matches on both sides |
| LEFT JOIN | All left + matched right | Right side | All rows from left, even without matches |
| RIGHT JOIN | All right + matched left | Left side | All rows from right (rewrite as LEFT JOIN instead) |
| FULL OUTER JOIN | All rows from both | Both sides | All rows from both tables regardless of match |
| CROSS JOIN | Every A × every B | None | Cartesian product, combinations |
| SELF JOIN | Table joined with itself | Depends | Hierarchical data, employee-manager |
Key Points to Remember
- SQL execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
- WHERE filters rows BEFORE grouping | HAVING filters groups AFTER grouping
- WHERE cannot use aggregates (SUM, COUNT) | HAVING can
- UNION removes duplicates (slower) | UNION ALL keeps all (faster)
- INNER JOIN excludes non-matching rows from both sides
- LEFT JOIN + IS NULL on right side = find rows with NO match (Anti-Join)
- Index speeds up SELECT but slows down INSERT/UPDATE/DELETE
- Composite index leftmost prefix rule — (dept, salary) index helps WHERE dept = 10, not WHERE salary > X
- EXPLAIN shows if your query uses an index — look for "ALL" (bad) vs "ref" (good)
- ROW_NUMBER = always unique | RANK = skips after tie | DENSE_RANK = no skip
- LAG() accesses previous row | LEAD() accesses next row
- Correlated subquery runs once per outer row — replace with JOIN for performance
- ACID: Atomicity, Consistency, Isolation, Durability — memorize this!
- 3NF rule: depend on the key, the whole key, and nothing but the key
- UNION ALL is always preferred over UNION when duplicates don't matter
Keep coding, keep learning! See you in the next one!
Post a Comment